UC Berkeley RDI @BerkeleyRDI
UC Berkeley's campus-wide, cross-disciplinary Center for Responsible, Decentralized Intelligence - RDI
rdi.berkeley.edu · Berkeley, CA · Joined December 2021
Tweets: 552 · Followers: 4K · Following: 48 · Likes: 302
🚨 The full program for Agentic AI Summit 2026 is now live. 📍 Aug 1–2 @ UC Berkeley 🔥 The largest Agentic AI event ever held Last year: 2,000+ in person, 40,000+ online This year: 5,000+ in person, hundreds of thousands on livestream Want to understand where Agentic AI is headed next? Join us to get the most comprehensive view of the frontier of Agentic AI, from cutting-edge research to production deployments, covering every layer of the stack: ⚡ Infrastructure ⚡ Foundation models & capabilities ⚡ Agent frameworks & platforms ⚡ Evaluation & benchmarks ⚡ Enterprise & consumer applications; agentic AI for Science, Math, Finance, Legal, Healthcare ⚡ Safety, security & governance 📣 Also excited to announce the Startup Spotlight: Building something exciting in Agentic AI? Apply to pitch directly to 5,000+ decision-makers, investors, practitioners in the room and hundreds of thousands watching worldwide. Application form in thread🧵 Deadline: July 6, 11:59pm PT The future of AI won't just be discussed here—it will be built here. #AgenticAI #AIAgents #ArtificialIntelligence
ALE is truly a community effort. Huge thanks to a distinguished advisory committee guiding our industry landscape and task collection: @gallantlab, @thg_lab, Tarek Zohdi, Carl Boettiger & @ksteinfe (@UCBerkeley); Laure Zanna, @kaanozbay (@nyuniversity); George Em Karniadakis (@BrownUniversity); Tapio Schneider (@Caltech); @Idasim (@UCSF); Arvind Rao (@UMich); @yannakakis (@UMmalta); Patrick Bryant (@scilifelab); @yaminirangan (@HubSpot); @brad_rothenberg (@nTopology). We are also deeply grateful to @BerkeleyRDI, RDI Foundation, @ChenInstitute, @UniPat_AI, and @SnorkelAI (Open Benchmarks Grants program) for their support. A huge thank you as well to our incredible organizing and execution team, and to all of the experts and contributors who donated their time, expertise, and real-world projects to make ALE possible. This simply would not have happened without you.
Why "Last Exam"? The name has two meanings: "Last" as the bar to clear: passing these exams means an agent can actually do the job and continue to deliver economically valuable work in that profession. "Last" as the frontier of difficulty: tasks are real, complex, long-horizon, and require professional expertise to execute. ALE sits right at the edge of what today's agents can reliably accomplish. Come test your agent on ALE → Website: agents-last-exam.org Tasks: agents-last-exam.org/demo Leaderboard: agents-last-exam.org/leaderboard Paper: arxiv.org/abs/2606.05405 Dataset: huggingface.co/datasets/agent… Code: github.com/rdi-berkeley/a…
The most common failure mode remains a familiar one: Agents declare success before they've truly verified their work. A typical completion reads: "Done. All checks pass." Yet the output may be missing required files, contain incorrect counts, omit key fields, or violate explicit constraints in the task specification. These failures occur far more often than many people expect. You can explore concrete examples in agents-last-exam.org/blogs/agent-sh….
Why do ALE's results look different from some other benchmarks, especially for Fable 5? Because there is no universally best agent. Every frontier model, including Fable 5, has domains where it shines and domains where it struggles. Aggregate scores average over 55 occupations and 1,500+ tasks, causing many models to cluster together. But the average is not the story. The real signal lies in where agents succeed, where they fail, and how those patterns differ across domains. On identical tasks, different models often fail for very different reasons. Explore the interactive breakdown in our blog → 👉 agents-last-exam.org/blogs/agent-sh…
In ALE, Fable 5 joins GPT-5.5 and Composer 2.5 in the same overall performance cluster. But performance is only half the story. Cost per task: → Fable 5: ~$15.70 → GPT-5.5: ~$3.80 → Composer 2.5: ~$1.33 At current pricing, Fable 5 delivers similar performance while costing roughly 4–12× more per completed task.
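The 4–12× range follows directly from the three per-task costs quoted above. A minimal sketch of that arithmetic, using only the approximate figures from this post (the variable names are illustrative and are not taken from the ALE codebase):

```python
# Approximate cost per task in USD, as quoted in the post above.
cost_per_task = {
    "Fable 5": 15.70,
    "GPT-5.5": 3.80,
    "Composer 2.5": 1.33,
}

baseline = cost_per_task["Fable 5"]

# Ratio of Fable 5's per-task cost to each other model's per-task cost.
ratios = {
    name: baseline / cost
    for name, cost in cost_per_task.items()
    if name != "Fable 5"
}

for name, ratio in ratios.items():
    print(f"Fable 5 costs about {ratio:.1f}x as much per task as {name}")
```

The two ratios come out to about 4.1 and 11.8, which round to the 4–12× range in the post.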
ALE-CLI is a CLI-only subset of ALE. Compared to Terminal-Bench and SWE-bench-Pro, it is broader, longer-horizon, and substantially more challenging: • Broader. Tasks span 40 of ALE's 55 industry subdomains, compared to just 6 in Terminal-Bench and 5 in SWE-bench-Pro. • Longer-horizon. Human completion times range from hours to weeks, rather than minutes to days. • Harder. The best-performing agent achieves only a 25.2% pass rate, compared to 82.0% on Terminal-Bench and 59.1% on SWE-bench-Pro. There's still a long way to go, and plenty of headroom left to climb. 📊👇
How does ALE compare to existing agent benchmarks? Many of today's agent benchmarks are rapidly saturating as frontier systems improve. ALE is designed to measure a different capability frontier: sustained, economically valuable work in real-world professional domains. • 55 industry domains • 1,500+ expert-sourced tasks • Full GUI + CLI environments • Outcome-based, verifiable evaluation If your agent only operates in the terminal, we've also released ALE-CLI: a CLI-only subset of the benchmark.
ALE is built from real work, not synthetic tasks. Every task is derived from a real project that a human expert previously completed, and converted into a verifiable evaluation with objective grading. No vibes. No human judges. Fully reproducible. ALE spans 55 non-physical occupations, grounded in O*NET / SOC 2018, the U.S. federal occupational taxonomy. Built with 300+ experts from 100+ institutions across science, engineering, medicine, law, finance, education, and many other fields.
Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is that really the case? For many months, my group and collaborators have been building Agents' Last Exam (ALE), a benchmark designed to test exactly that claim on real digital labor-market work. My group and collaborators have previously created many of the benchmarks the field runs on, including MMLU, MATH, CyberGym, and ExploitGym. Today, I'm excited to share Agents' Last Exam (ALE): a rolling benchmark that measures whether AI agents can actually perform economically valuable work across a broad range of real-world domains. With ALE, we evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier agent systems across more than 1,500 expert-sourced tasks spanning 55 occupations. The result is both impressive and sobering. Today's agents can solve a meaningful fraction of professional tasks. But when we look at the hardest tasks, the ones requiring sustained reasoning, deep domain expertise, and reliable execution over long horizons, they are still far from human-level performance. On ALE's hardest tier, every frontier agent we tested, including Fable 5, achieved a 0% success rate. The age of useful agents is here. The age of truly job-ready agents is not. We hope Agents' Last Exam (ALE) will serve as a new guidepost and north star for developing agents capable of reliably performing economically valuable work across a broad range of domains. 🧵
🎟️ If you want to be in the room where the Agentic AI community is moving the field forward, now is the time to register! Early-bird tickets are nearly gone, and pricing will increase once they sell out. 📍 UC Berkeley 🗓️ August 1–2 Register: luma.com/agentic-ai-sum… See you in Berkeley this August! #agenticaisummit
🚀 2025 was the Year of Agents. 2026 is where the field begins to scale, from foundation models and agent frameworks to infrastructure, deployment, evaluation, safety, and real-world applications. Expect talks, demos, technical sessions, hallway conversations, coffee meetings, founder introductions, and discussions that will help shape the future of AI!
🧵 On August 1–2, the world’s largest event dedicated to Agentic AI returns to @UCBerkeley. #agenticaisummit Last year: • 2,000+ attended in person • 40,000+ joined online This year: • 5,000+ expected in person • Hundreds of thousands expected on livestream
“AI agents will outperform humans at almost all jobs by 2026–2027.” - The forecast is everywhere. So we built the exam to test that claim, on real labor-market aligned work. On the hardest tier, top agents pass 2.6%. Meet Agents' Last Exam (ALE), a rolling benchmark measuring whether agents can actually do real jobs. 🧵👇
My group & collaborators have built many of the benchmarks the field now runs on: MMLU, MATH, CyberGym, ExploitGym, etc. I'm really excited to share our latest: Agents' Last Exam (ALE). Why "Last Exam"? The name has two meanings: "Last" as the bar to clear: passing these exams means an agent can actually do the job and continue to deliver economically valuable work in that profession. "Last" as the frontier of difficulty: tasks are real, complex, long-horizon, and require professional expertise to execute. ALE sits right at the edge of what today's agents can reliably accomplish. A few things that make ALE different: • Real work, not vibes. Every one of the 1,500+ tasks comes from real projects or research contributed by domain experts. We converted them into verifiable tests and objectively graded evaluations, with no human judges required. • Built for breadth. ALE spans 55 non-physical occupations based on the O*NET / SOC 2018 occupational taxonomy, with contributions from 300+ experts across 100+ institutions. • Judged on results, no restriction on process. We evaluate Generalist Computer-Use Agents (GCUAs) with full GUI + CLI access, allowing them to solve tasks however they choose: clicking, typing, scripting, browsing, and more. We just grade the outcome. Huge thanks to my postdoc @YiyouSun for spearheading this tremendous effort, and to our esteemed advisory committee, incredible team and collaborators who made it possible. We hope Agents' Last Exam (ALE) will serve as a new guidepost and north star for developing agents capable of reliably performing economically valuable work across a broad range of domains. 🧵👇
🎉 The Agents in the Wild: Safety, Security, and Beyond workshop @ICLR2026 is less than a week away! Join us April 26 in Room 204 A/B, Riocentro, Rio de Janeiro! 🌴 Safety and security for AI agents — both foundational and emerging challenges — demand serious attention. Researchers and practitioners are mobilizing: ▪️ 151 papers accepted ▪️ 161 reviewers (58% industry, 42% academia) ▪️ Up to 800 participants expected ▪️ Incredible engagement on a topic that clearly matters. The schedule: 👇
Looking forward to speaking at Berkeley's Agentic AI Summit later this year, alongside some other great guests.
🚀 The largest Agentic AI event ever — Agentic AI Summit 2026, Aug 1–2 @UCBerkeley Last year: 2,000+ in person, 40,000+ online. This year: 5,000+ in person, hundreds of thousands on livestream. 2025 was the "Year of Agents"; 2026 is poised to be even more explosive. Two days of important conversations shaping the field — with researchers, founders, AI leaders, VCs, and policymakers across the full stack: infrastructure, foundation models, agent frameworks, training, continual learning, self-improvement, evaluation, applications, deployment, and safety/security. See you in Berkeley this August 🌟 Speaker application, summit registration links in 🧵
x.com/MogicianTony/s… 🧵 1/ Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g., SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmarks are the field's shared language for measuring AI progress. Our new work shows that language is broken. Here’s how.
SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone,
Dawn Song @dawnsongtweets
38K Followers 830 Following Professor in Computer Science at UC Berkeley, co-Director of Berkeley RDI Center; Building safe, secure, decentralized AI; Serial entrepreneur
EduDAO @Edu_DAO
4K Followers 60 Following Supporting the innovators of tomorrow. Follow us on our journey! 👉 https://t.co/dJoD6SibWS
Andrew Miller @socrates1024
23K Followers 5K Following interim manager @ teleport computer 🛡️ dstack integrations 🏫 https://t.co/LZtbefGx8o
Ⱥlgorand Central Eur... @AlgoDACH
1K Followers 2K Following This is the Ⱥlgorand Hub for the Central Europe Region | DACH + Nordic + BeNeLux + CEE
THUBA DAO @THUBA_DAO
5K Followers 254 Following Tsinghua University Blockchain Association (THUBA). @Tsinghua_Uni To cultivate the next generation of Web3 leaders. President @EggryRan
JT Media @JT__Media
17K Followers 4K Following Founder of JT Media on Youtube | Crypto since 2017 | Analyst | Municipal Accountant | Real Estate Agent | Investor | Media Requests: [email protected]
Timna @timnaWK
78 Followers 298 Following
Zhongjun "Mark" Jin @zhongjun_jin
129 Followers 582 Following Building https://t.co/AGJFFbobyy @tiktok_us | PhD from @UMichCSE advised by @MikeCafarella and Jag | Prev: @MSFTResearch @Trifacta
Lucas Aschenbach @AschenbachLucas
160 Followers 763 Following Math @ TUM, UC Berkeley | Visiting @berkeley_ai
Mike Horton @mikeahorton
6K Followers 774 Following @GEODNET, @hyfixai, @anellophotonics, Crossbow Technology, @ucberkeley, TX
Vinh VC @VinhVC280000000
36 Followers 1K Following AI Researcher/Scientist, Building the next top-tier Deep Learning/Neural Network engine
west len @Davidsh0818
4 Followers 180 Following
Shirin @shirinmojarad
419 Followers 679 Following Personalizing Gemini @Google. PhD in ML | Author | Podcaster | Sharing daily AI nuggets | Opinions mine.
Practically Perfect @practiklyperfct
192 Followers 1K Following Angel investor. Secondary markets since 2009. Early: SpaceX, Palantir Now: E.D.E.N., energy, inference. Help founders build, sell, partner. Connector. DM open.
Kedar Kshatriya @xkedar
699 Followers 3K Following BD & DeFi @GSR_io || prev VC @Ftda_us || Engineer || Builder
Nisheed Meethal @NishMeethal
0 Followers 21 Following
Kibbie (e/acc) @JamesKibbie
1K Followers 2K Following Ventures @Galaxyhq | Live Free or Die | (Views are my own) Former Point72 Ventures & DRW VC
Dave @DJG_GJD
47 Followers 120 Following
Edgaras Genske @EdgarasGenske
2 Followers 161 Following
Sandeep Kumawat @sandeep_511
1 Followers 103 Following
Bastian Wetzel @bastian_wetzel
576 Followers 1K Following Investing @ Cambrena Capital | HSG & UC Berkeley
Ivy Yang @ivylala
3K Followers 3K Following Writes @ftChinese column 话语权时代 and Calling the Shots. Interested in the role of comms and external affairs in global tech companies. https://linktr.e
Alex C @Aycheung95
5 Followers 1K Following
Steven Dillmann @StevenDillmann
576 Followers 1K Following Stanford PhD working on #AI4Science and maintaining Terminal-Bench Science @StanfordAILab 🧬🤖🪐
Fabio Rinaldi @fabiorinaldizh
61 Followers 89 Following
Federico Galassi @federicogalassi
491 Followers 452 Following Software craftsman. Googler. I love javascript, ruby, vim, git, the pomodoro technique, agile and books.
Rupali Bhati @BhatiRupali
2K Followers 1K Following PhD @Northeastern | RL | MARL | Ex @CHAI_Berkeley, @MATSprogram, @Mila_Quebec
Time roll @RollTime52699
6 Followers 54 Following
Snorkel AI @SnorkelAI
17K Followers 328 Following Frontier AI Data Lab advancing AI through better data
Nicolay Karnicov @NKarnicov33167
19 Followers 331 Following
Yanyi PU @PuYanyi716
0 Followers 5 Following
Scott Reed @ReadScottReed
104 Followers 367 Following Prof. of Chemistry, https://t.co/c457LEudUo creator
danglingpointer @sunshotai
17 Followers 2K Following
Hanna Kim @gkssk3654
1 Followers 14 Following
Alind Jain @alindjain11
26 Followers 614 Following
Ricky Esclapon (ricky... @rickydata42
2K Followers 2K Following Senior Data Agent Architect @CambrianNetwork | Prev. Data Scientist working on @graphprotocol. Building agentic stack at https://t.co/Pir0pM807C
Bugs @bugsallover
22 Followers 207 Following
James Dunn @calatberk
67 Followers 562 Following Software Engineering Manager, Comcast Silicon Valley
Anastasia Bizyaeva @anastasiabzv
1K Followers 1K Following Assistant professor @CornellMAE. Collective behaviors, networks, control, dynamical systems, nonlinearity mostly here now https://t.co/6SInlG1MPO
Jaron Fontaine @JaronFontaine
58 Followers 245 Following PhD, researcher at Ghent University | IDLab | imec
Itsik Mantin @Itsik_Mantin
20 Followers 231 Following
anilkapur @anilkapur
36 Followers 4 Following
Akshit @Akkshit_20
0 Followers 299 Following
Kalimera 1 @1_kalimera
57 Followers 77 Following
7erry @7erryX
0 Followers 47 Following
Dawn Song @dawnsongtweets
38K Followers 830 Following Professor in Computer Science at UC Berkeley, co-Director of Berkeley RDI Center; Building safe, secure, decentralized AI; Serial entrepreneur
EduDAO @Edu_DAO
4K Followers 60 Following Supporting the innovators of tomorrow. Follow us on our journey! 👉 https://t.co/dJoD6SibWS
Shafi Goldwasser @ShafiGoldwasser
90 Followers 5 Following
Jiantao Jiao @JiantaoJ
2K Followers 161 Following Director of Research & Distinguished Scientist at @NVIDIA. UC Berkeley. Building AGI/ASI
Yupeng Zhang @YupengZhang7
2K Followers 276 Following Assistant Professor @ECEILLINOIS University of Illinois Urbana-Champaign
UC Berkeley EECS @Berkeley_EECS
10K Followers 459 Following The Department of Electrical Engineering & Computer Sciences at UC Berkeley. Pioneering the frontiers of science & technology with broad impact on society.
BerkeleyNLP @BerkeleyNLP
7K Followers 37 Following We work on natural language processing, machine learning, linguistics, and deep learning. PIs: Dan Klein, @alsuhr, @sewon__min
Stanford AI Lab @StanfordAILab
255K Followers 333 Following The Stanford Artificial Intelligence Laboratory (SAIL), a leading #AI lab since 1963. ⛵️🤖 Emmy-winning video: https://t.co/lV9smZTC1m
Story @StoryProtocol
564K Followers 187 Following AI-native infrastructure for the $80T IP asset class.
Connor Spelliscy @c_spelliscy
2K Followers 582 Following Head of Global Policy Strategy at @ethereumfndn. Prev: Co-Founder at @TheDRC_, @BlockchainAssn, & @web3canada. No legal or investment advice.
Decentralization Rese... @TheDRC_
5K Followers 23 Following Non-profit advocating for decentralization as a fundamental characteristic of emerging technologies. Monthly newsletter: https://t.co/lmH9yV5Dmm
she256 @she_256
8K Followers 213 Following 501c3 nonprofit aiming to increase diversity & break down barriers to entry in the crypto space. Governance delegates in UNI, OP, COMP, ENS +
Ce Zhang @ce_zhang
3K Followers 1K Following CTO @ Together @togethercompute Neubauer Associate Professor @UChicago
trevordarrell @trevordarrell
3K Followers 125 Following EECS, BAIR, UC Berkeley. Director, BAIR Commons Program.
RebeccaWexler @RebeccaWexler
2K Followers 862 Following Data, Tech, and Secrecy in the Criminal Legal System. Alfred W. Bressler Professor of Law. @ColumbiaLaw. Affiliate Fellow @yaleisp. She/Her
Toby Stuart @tobystuart
660 Followers 88 Following Helzel Professor @BerkeleyHaas. Cofounder @FlockFreight | @BlckVentureInst. Former VC @ AvidParkVentures. Board member: @Flyrlabs | @HNTBcorp. Instigator.
Joey Gonzalez @profjoeyg
6K Followers 575 Following Professor @UCBerkeley and co-founder/advisor @RunLLM, @Inferact, @Letta_AI, and @genmoai
Joe Hellerstein @joe_hellerstein
15K Followers 856 Following Berkeley CS Prof, focused on data and computation.
Raluca Ada Popa @ralucaadapopa
7K Followers 178 Following Head of Security and Privacy Research @ Google DeepMind. @UCBerkeley security professor. MIT PhD. Co-founder of @OpaqueSys, @imua & @PreVeil.
Natacha Crooks @siobhcroo
4K Followers 613 Following Assistant Professor at UC Berkeley. Distributed Systems & databases. Former engineer at Materialize. PhD UT Austin. Originally from Paris, France. Views my own.
Rich Lyons @richlyons
12K Followers 594 Following Chancellor, UC Berkeley, and proud undergrad alum. Tweets are my own. Go Bears.
The Stanford Center f... @CBRStanford
3K Followers 29 Following The Center for Blockchain Research (CBR) is a focused research effort on crypto-currencies and blockchain technologies.
Dan Robinson @danrobinson
84K Followers 1K Following coder / lawyer. research at @paradigm. automated research reply guy
Christian Catalini @ccatalini
23K Followers 5K Following Founder w/roots in academia. Founder @MIT Cryptoeconomics Lab. Past: Co-Founder & Chief Strategy Officer, Lightspark. Co-Creator, Libra. Head Economist, Meta.
Berkeley Computing, D... @BerkeleyCDSS
7K Followers 598 Following News on data science and computing research and education from the UC Berkeley College of Computing, Data Science, and Society
Berkeley AI Research @berkeley_ai
275K Followers 459 Following We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.
San Francisco Blockch... @SFBWofficial
4K Followers 906 Following #SFBW22 is coming back! Dates: October 31 - November 6, 2022
IC3 @initc3org
12K Followers 283 Following The Initiative for CryptoCurrencies and Contracts | Stay Ahead of Blockchain Research📍Cornell Tech, NYC
Marko Vukolić @marko_vukolic
510 Followers 165 Following @BTCScalingLabs CEO and Co-founder. Was: ConsensusLab Lead at PL, co-architect of Hyperledger Fabric, Principal RSM at IBM, faculty at ETHZ and EURECOM.
Penn Blockchain @PennBlockchain
5K Followers 359 Following Undergraduate and graduate/MBA organization for blockchain and crypto-curious students at the University of Pennsylvania.
Berkeley Blockchain X... @Xcelerator
3K Followers 207 Following Official feed for Berkeley Blockchain Xcelerator at UC Berkeley. #GoBears
CITRIS @citrisnews
4K Followers 3K Following A @UofCalifornia IT research center at @UCBerkeley, @UCDavis, @UCMerced & @UCSC. This account is no longer active. Please follow our other social channels.
Berkeley Center for L... @BerkLawBusiness
1K Followers 106 Following BCLB is UC Berkeley's community of scholars, professionals, students & policymakers sharing information about developments in law & business
UC Berkeley Law @BerkeleyLaw
22K Followers 2K Following Official Twitter channel of the University of California, Berkeley, School of Law. RTs are not endorsements.
FiatLuxDAO 💡 @FiatLuxDAO
1K Followers 36 Following A collective of optimistic Cal Bears building the first Alumni DAO. Let there be light.
Mantle Network @0xMantle
61K Followers 77 Following Building the liquidity chain of the future — driving capital efficiency in the on-chain economy.
Blockchain at Berkele... @CalBlockchain
28K Followers 2K Following Student-run organization at UC Berkeley focused on blockchain innovation via education, research, and consulting. Established 2016.
Berkeley Engineering @Cal_Engineer
35K Followers 737 Following @UCBerkeley's College of Engineering - Educating Leaders, Creating Knowledge, Serving Society
UC Berkeley Haas @BerkeleyHaas
71K Followers 7K Following University of California, Berkeley | Question the status quo | Confidence without attitude | Students always | Beyond yourself
UC Berkeley SCET @SutardjaCenter
2K Followers 392 Following The Sutardja Center for Entrepreneurship & Technology at @UCBerkeley @Cal_Engineer. Startups & alternative meat.
Simons Institute for ... @SimonsInstitute
10K Followers 340 Following The world's leading venue for collaborative research in theoretical computer science. Follow us at https://t.co/KvcuGI7WM0.



















