Jeff Da @_jeffda
Research Scientist @scale_ai. Research on Reinforcement Learning, Agents, Reasoning. Ex: @allen_ai jeffda.com Joined July 2017
Tweets: 166 · Followers: 445 · Following: 861 · Likes: 913
We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves upon SWE-1.5 while being post-trained on the same pre-trained model, and it runs just as fast at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model still exhibits some undesirable behaviors, such as overthinking and excessive self-verification, which we aim to improve. We are rolling out early access to a small subset of users in Windsurf.
OpenAI is moving away from SWE-Bench Verified, citing challenges on underspecified tasks, misaligned tests, and contamination. We agree. These were exactly the motivations behind SWE-Bench Pro (arxiv.org/pdf/2509.16941). What we changed: → Underspecified tasks: structured, executable problem definitions → Contamination: strict curation + private / commercial codebases But this is just step one. Where we’re pushing frontier coding evals next: → Beyond unit tests: rubric-based evaluation (arxiv.org/pdf/2601.04171) → From static tasks to real-world agentic environments Modern coding systems are not solving isolated problems. They operate as agents over repos, tools, and long-horizon workflows. Our evals need to reflect that. SWE-Bench Pro is one step toward more realistic and reliable evaluation for coding agents. We’ll keep pushing the frontier.
The standard for frontier coding evals is changing with model maturity. We now recommend reporting SWE-bench Pro and are sharing more detail on why we’re no longer reporting SWE-bench Verified as we work with the industry to establish stronger coding eval standards. SWE-bench Verified was a strong benchmark, but we’ve found evidence it is now saturated due to test-design issues and contamination from public repositories. openai.com/index/why-we-n…
Introducing M2.5, an open-source frontier model designed for real-world productivity. - SOTA performance at coding (SWE-Bench Verified 80.2%), search (BrowseComp 76.3%), agentic tool-calling (BFCL 76.8%) & office work. - Optimized for efficient execution, 37% faster at complex tasks. - At $1 per hour with 100 tps, infinite scaling of long-horizon agents now economically possible MiniMax Agent: agent.minimax.io API: platform.minimax.io CodingPlan: platform.minimax.io/subscribe/codi…
GPT-5.3-Codex's much better token efficiency *AND* faster inference is the biggest story of this release. Folks at @OpenAI worked hard to improve this and it will only get better from here.
GPT-5.3-Codex is here! *Best coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld). *Mid-task steerability and live updates during tasks. *Faster! Less than half the tokens of 5.2-Codex for same tasks, and >25% faster per token! *Good computer use.
This release is an emotional one for me because I stayed up so much for it 🥹 It has been truly amazing to see this model become better bit by bit through every change we make, and we have come a long way. Since I did mid-training for this model, I wanted to share a little anecdote about this part. We really made this model with user experience as a first-class consideration. We want people to actually use it, period. We took it so seriously that we redid mid-training because we saw cases where models failed to follow instructions on out-of-distribution scaffolds. We decided straight-up that we would fix this in a fundamental way instead of surface-level patching. The resulting base model, which we also release, is thus a healthy base. We find that, compared to other base models, this one learns new tasks better. Try fine-tuning our base and lmk what you think 🥳 huggingface.co/Qwen/Qwen3-Cod…
🚀 Introducing Qwen3-Coder-Next, an open-weight LM built for coding agents & local development. What’s new: 🤖 Scaling agentic training: 800K verifiable tasks + executable envs 📈 Efficiency–Performance Tradeoff: achieves strong results on SWE-Bench Pro with 80B total params and
A strong and fast open-source coding model, and a tech report 😍
#1 open source on SWE-Bench Pro. Ahead of Gemini 3 Flash. Level with Haiku 4.5. Thanks @scale_AI for the solid benchmark. Let's keep pushing forward 💪
JUST ADDED: @MiniMax_AI 2.1 just joined our SWE-Bench Pro leaderboard. Check out the updated rankings: scale.com/leaderboard/sw…
Rubrics are effective verifiers for SWE-Agents!
🚀New @scale_AI research: Verifiers for SWE Agents have traditionally used unit tests or simple, execution-free classifiers. But can we get verifiers that are more expressive, repository-grounded, and still execution-free at scoring time? We explore Agentic Rubrics to fill this gap 💡 Agentic Rubrics are repo-grounded, execution-free verifiers for SWE agents. We generate a checklist of concrete, codebase-specific criteria using an Agentic Harness, and then score patches against it. 🧑💻
Results: - self-improvement on SWE-bench Verified (+10.4) and Pro (+7.8) - better than the baseline RL using human issue data over the course of training
New Scale research: Do AI models actually reason in ways humans can trust for real-world decisions? Introducing MoReBench, the first benchmark for procedural moral reasoning in LLMs, measuring not just what models decide, but how they reason through moral ambiguity.
@scale_AI Check out the paper and dataset: Paper: static.scale.com/uploads/674f4c… Github: github.com/scaleapi/mcp-a… Dataset: huggingface.co/datasets/Scale… Leaderboard: scale.com/leaderboard/mc…
New open-source benchmark from @scale_AI: MCP-Atlas MCP-Atlas is a large-scale benchmark for evaluating tool-use competency using 36 real MCP servers and 220 tools. The benchmark was featured in recent model cards (GPT, Claude, Gemini), and now it's open-source!
We recently introduced MCP-Atlas, a benchmark for evaluating how well LLMs handle tool use via the Model Context Protocol. Even top models failed nearly half of realistic multi-tool tasks. Today, we’re open-sourcing the benchmark so you can measure performance yourself.
🚀 Today we’re open-sourcing MCP Atlas — a large-scale, real-server benchmark for agentic tool use, which has been used in the recent GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash model releases! 🧠 Key insight: realistic agentic tool use is not a function-calling problem. It requires tool discovery, orchestration, and recovery in real environments. 🔧 MCP Atlas evaluates agents on real MCP servers (36 servers, 220 tools, 1K human-written tasks). Models must find the right tools, call them correctly, chain them together, and handle failures. 📉 What we found: • Agents fail more often at tool interaction than at reasoning • Performance drops sharply with real-world tool friction • Scaling models helps unevenly, robustness remains hard • Claims-based eval reveals how agents fail, not just if they finish Check it out! 📄 Paper: static.scale.com/uploads/674f4c… 🌍 Environment: github.com/scaleapi/mcp-a… 📂 Dataset: huggingface.co/datasets/Scale… 📊 Leaderboard: scale.com/leaderboard/mc… #AgenticAI #ToolUse #LLMEval #Benchmarks #MCP
Jungo Kasai (Kotoba) @jungokasai
2K Followers 666 Following Co-founder & CTO @kotoba_tech | PhD from @nlpnoah at @UW | IBM PhD Fellow | 孫正義育英財団生 | @Yale Undergraduate
Sebastian Gehrmann @sebgehr
6K Followers 2K Following Head of Responsible AI, CTO office @Bloomberg. Formerly LLMs @ Google Brain / PhD @ Harvard. views my own
Vivek Gupta ✈️ AC... @keviv9
4K Followers 6K Following Assistant Prof @SCAI_ASU; PostDoc @cogcomp @Penn, ed-@UUtah,@iitkanpur. @Bloomberg @MSFTResearch Fellow; ex-@MetaAI @IBM @samsungresearch
Antoine Bosselut @ABosselut
4K Followers 614 Following Helping machines make sense of the world. Asst Prof @ICepfl; Before: @stanfordnlp @allen_ai @uwnlp @MSFTResearch #NLProc #AI
Han Guo @HanGuo97
4K Followers 4K Following PhD Student @MIT_CSAIL | Past: @togethercompute @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.
Weijia Shi @WeijiaShi2
10K Followers 2K Following PhD student @uwnlp | Prev @allen_ai @MetaAI @CS_UCLA
Alexis Ross @alexisjross
4K Followers 1K Following currently @humansand | phd-ing @MIT_CSAIL & working towards personalized AI tutors | formerly @allen_ai, @harvard '20
Ofir Press @OfirPress
18K Followers 9K Following I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.
Steven Feng @stevenyfeng
3K Followers 461 Following Stanford CS PhD student @stanfordnlp @StanfordAILab. Master's from Carnegie Mellon @LTIatCMU. NLP, Computer Vision, Machine Learning, and AI research.
Pei Zhou @peizNLP
3K Followers 908 Following Senior Applied Scientist @Microsoft #OAR | PhD @nlp_usc | X-@GoogleDeepMind @allen_ai @AmazonScience @UCLA | Common Ground Reasoning for Communicative Agents
Alex Amerling @amerlingalex
329 Followers 1K Following helping to democratize access to opportunity @joinhandshake. @wabashcollege alum
Nancy U @HamideYama36664
6 Followers 765 Following whispers to the moon, yells at my phone 🌙 follow back always
Ricardo Olmedo @rdolmedo_
853 Followers 341 Following PhD student @MPI_IS, working with Moritz Hardt and Bernhard Schölkopf | Currently visiting @Stanford
Hao Wang @MogicianTony
2K Followers 283 Following PhD student at @UCBerkeley, @berkeley_ai, @BerkeleySky. Prev @PKU1898 Building better AI evals and secure AI
Obinna Iheonu @obinnaiheonu
724 Followers 8K Following
Abel Le @tasuke2k3
96 Followers 4K Following
Jordan Feldstein @jfeldstein
726 Followers 692 Following
John Zhou @jzjw1129
17 Followers 211 Following Executive Talent Acquisition Specialist at Baidu, focusing on leadership hiring for AI foundation models, cloud computing, and autonomous driving.
itzkushaan @itzkushaan
4 Followers 376 Following
Tonghui @Tonghui26
52 Followers 4K Following Creating original IP characters, expanding into animation and games.
Afra Feyza Akyürek @afeyzaakyurek
988 Followers 824 Following Currently @scale_AI. PhD from @BUCompSci. Research in NLP. Previously @CMU_Stats @kocuniversity @izmirfenlise
John Yang @jyangballin
6K Followers 1K Following CS PhD @Stanford. Created @SWEbench (multi-lingual/modal); SWE-agent; SWE-smith; InterCode; CodeClash; ProgramBench
Julia Roberts @ScarlettJo82733
17 Followers 646 Following account created specifically for my fans ❤️ I love the universal world
fr @derrickkimani
387 Followers 2K Following
Yannis He @yannis__he
36 Followers 69 Following AI PM at @scale_ai | Previously @vectorInst, @UofTRobotics, @BainandCompany
Zhenting Qi @ZhentingQi
1K Followers 710 Following CS PhD @Harvard | Researcher @MetaFAIR @GoogleDeepMind @MIT @MITIBMLab @MSRAsia | Alumni @UofIllinois @ZJU_China
Niklas Lauffer @NiklasLauffer
235 Followers 90 Following Final-year PhD in AI @berkeley_ai | previously @scale_AI
Jeff Ma ✈️ ICML @18jeffreyma
581 Followers 959 Following CS PhD @Harvard, prev @GoogleAI, @AmazonScience, @Citadel, @Nuro, @Caltech Created https://t.co/EBHm5qwcPO, https://t.co/QvK7qCNIjS
Om Donimo @untax100k
13 Followers 526 Following Self-care comes first. untax100k Our name is our mission.
Pooja Nagpal @PoojaIsNagpal
951 Followers 4K Following @trustvariance @joinformal - security | @dcgco, 💻 @papayapay, @BerkeleyEECS
Ariah Noetzel @airnozel
12 Followers 615 Following
instantmerits @modxmeconomxm
70 Followers 3K Following
Keunhong Park @KeunhongP
2K Followers 665 Following Training models at World Labs. (https://t.co/a81eDVLlXF). Creator of FrameBoy (RTFM). Opinions are my own.
Justus Mattern @MatternJustus
8K Followers 849 Following Co-Founder @ProximalHQ | prev. research @PrimeIntellect, @MPI_IS and built revideo
Carlos Rodriguez @CarlosToAi
4 Followers 76 Following Server→AI Engineer (Day 1/180) | Carlos Rodríguez CS50P→https://t.co/lCgh3T6iSg→Kaggle | Python•PyTorch•HF Open to junior roles @huggingface @xai Portfolio ↓ #AICareerSwitch
civil liberation for ... @oppoulos
88 Followers 3K Following
Ziqian Zhong @fjzzq2002
2K Followers 973 Following Intern @TransluceAI | AI interp & alignment @CarnegieMellon, prev @MIT @pika_labs
penryn @0xPenryn
6K Followers 2K Following wearing many hats @worldnetwork. everything is legos. i am @SHL0MS. prev: @tfh_technology @WindrangerLabs ConstitutionDAO
Tim Bauer @TimTheSloth
388 Followers 275 Following Brand Experience @scale_ai. Unashamed generalist.
Alex Dimakis @AlexGDimakis
24K Followers 3K Following Professor, UC berkeley | Founder @bespokelabsai |
Maryam @Sci_Tech_Eng
73 Followers 7K Following Exploring in neural networks from inside purely biological mind with heavy cognition architecture & mapping the phase space where thought becomes destiny.
Kaivu Hariharan @KaivuHariharan
350 Followers 526 Following but we must build as if the sand were stone | @fulcrum_inc
AI at Meta @AIatMeta
811K Followers 324 Following Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.
Ai2 @allen_ai
85K Followers 440 Following Breakthrough AI to solve the world's biggest problems. › Join us: https://t.co/MjUpZpKPXJ › Newsletter: https://t.co/k9gGznstwj
Victor Zhong @hllo_wrld
6K Followers 508 Following ML+NLP AP @UWCheritonCS, @cifar_news AIChair @vectorinst. Former @MSFTResearch @MetaAI, @SFResearch via @MetamindIO, @uwnlp, @StanfordNLP, @eceuoft.
clem 🤗 @ClementDelangue
411K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform for AI builders
Greg Durrett @gregd_nlp
8K Followers 914 Following Associate professor at NYU (Courant CS + Center for Data Science) | advisor for @bespokelabsai | large language models and NLP | he/him
William Wang @WilliamWangNLP
21K Followers 770 Following CEO & Founder, @AlphaDesignAI. We make https://t.co/1LfDYicsF2 I'm also Mellichamp Chair Prof. at UCSB CS. PhD @ CMU SCS.
Dipanjan Das @dipanjand
6K Followers 319 Following Researcher at @GoogleDeepmind. Factuality and Gemini x Search.
Swabha Swayamdipta @swabhz
7K Followers 479 Following Assistant Prof. @CSatUSC | Researcher in #NLProc | Previously @uwnlp @allenai
Jungo Kasai (Kotoba) @jungokasai
2K Followers 666 Following Co-founder & CTO @kotoba_tech | PhD from @nlpnoah at @UW | IBM PhD Fellow | 孫正義育英財団生 | @Yale Undergraduate
Tim Dettmers @Tim_Dettmers
46K Followers 904 Following Creator of bitsandbytes. Professor @CarnegieMellon and Research Scientist @allen_ai . I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
Richard Socher @RichardSocher
120K Followers 1K Following Building self-improving superintelligence CEO @recursive_si and @youdotcom MP @aixventuresHQ Ex: Stanford Adj Prof, Chief Scientist at Salesforce, CEO MetaMind
Yoav Artzi @yoavartzi
19K Followers 191 Following Research/prof @cs_cornell + @cornell_tech🚡 / https://t.co/9YnWry7yHs / researcher @GoogleDeepMind / asso. faculty director @arxiv / building @COLM_conf
François Chollet @fchollet
702K Followers 826 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
Mark Dredze @mdredze
6K Followers 787 Following John C Malone Professor at @JohnsHopkins @JHUCompSci @jhuclsp @jhumceh; Part time @techatbloomberg (tweets my own) Director @hopkinsdsai @mdredze.bsky.social🦋
Sebastian Gehrmann @sebgehr
6K Followers 2K Following Head of Responsible AI, CTO office @Bloomberg. Formerly LLMs @ Google Brain / PhD @ Harvard. views my own
Vivek Gupta ✈️ AC... @keviv9
4K Followers 6K Following Assistant Prof @SCAI_ASU; PostDoc @cogcomp @Penn, ed-@UUtah,@iitkanpur. @Bloomberg @MSFTResearch Fellow; ex-@MetaAI @IBM @samsungresearch
Hao Wang @MogicianTony
2K Followers 283 Following PhD student at @UCBerkeley, @berkeley_ai, @BerkeleySky. Prev @PKU1898 Building better AI evals and secure AI
Ricardo Olmedo @rdolmedo_
853 Followers 341 Following PhD student @MPI_IS, working with Moritz Hardt and Bernhard Schölkopf | Currently visiting @Stanford
Olive Song @olive_jy_song
1K Followers 145 Following I study RL & Evals @MiniMax_AI · Alum @NYU_Courant · Dig deep; Collaborate openly; Make things happen.
John Yang @jyangballin
6K Followers 1K Following CS PhD @Stanford. Created @SWEbench (multi-lingual/modal); SWE-agent; SWE-smith; InterCode; CodeClash; ProgramBench
Joon Sung Park @joon_s_pk
20K Followers 1K Following CEO @simile_ai. Building simulations of society. CS PhD @stanfordhci + @stanfordnlp. Oil painter.
Yannis He @yannis__he
36 Followers 69 Following AI PM at @scale_ai | Previously @vectorInst, @UofTRobotics, @BainandCompany
Vercept @Vercept_ai
5K Followers 11 Following Building the AI Computer Interface of the Future Now part of @AnthropicAI. Building the future of human potential through safe AI.
Phil Chen @philhchen
9K Followers 624 Following Building something new. Previously research @openai @GoogleDeepMind @scale_AI @Stanford
Zhenting Qi @ZhentingQi
1K Followers 710 Following CS PhD @Harvard | Researcher @MetaFAIR @GoogleDeepMind @MIT @MITIBMLab @MSRAsia | Alumni @UofIllinois @ZJU_China
Afra Feyza Akyürek @afeyzaakyurek
988 Followers 824 Following Currently @scale_AI. PhD from @BUCompSci. Research in NLP. Previously @CMU_Stats @kocuniversity @izmirfenlise
Jeff Ma ✈️ ICML @18jeffreyma
581 Followers 959 Following CS PhD @Harvard, prev @GoogleAI, @AmazonScience, @Citadel, @Nuro, @Caltech Created https://t.co/EBHm5qwcPO, https://t.co/QvK7qCNIjS
Calvin Zhang @calvincbzhang
4K Followers 653 Following Evals @OpenAI | Previously @scale_AI @CHAI_Berkeley @MIT @ETH @OfficialUoM
Niklas Lauffer @NiklasLauffer
235 Followers 90 Following Final-year PhD in AI @berkeley_ai | previously @scale_AI
Justus Mattern @MatternJustus
8K Followers 849 Following Co-Founder @ProximalHQ | prev. research @PrimeIntellect, @MPI_IS and built revideo
Conference on Languag... @COLM_conf
7K Followers 7 Following https://t.co/GhGCMEoHU8 Conference: October 7, 2025
Ziqian Zhong @fjzzq2002
2K Followers 973 Following Intern @TransluceAI | AI interp & alignment @CarnegieMellon, prev @MIT @pika_labs
Zonghan Yang @yang_zonghan
2K Followers 2K Following PhD student at Tsinghua NLP & AIR, studying agents that automate tasks ranging from daily activities to creative endeavors. Two drifters with the world to see.
VraserX e/acc @VraserX
22K Followers 435 Following Teacher by heart, AI enthusiast by curiosity, passionate about inspiring minds, exploring tech, and making learning exciting, human, and future-focused!
Bing Liu @vbingliu
2K Followers 127 Following Head of Research @Scale_AI, ex-Meta, Llama 3, PhD @CarnegieMellon.
Zifan (Sail) Wang @_zifan_wang
3K Followers 745 Following AI Safety @AIatMeta MSL | ex-RS @scale_AI (SEAL) and @cais | PhD Alumni of CMU @cylab | Opinions of my own
Marzieh Fadaee @mziizm
2K Followers 800 Following exploring the longitude problem of AI. Head of @Cohere_Labs. PhD from @UvA_Amsterdam. https://t.co/YI5NC5J5e4. زن، زندگی، آزادی
Yu Su @ysu_nlp
15K Followers 1K Following co-founder @NeoCognition | prof. @osunlp | sloan fellow | building towards abundance of specialized intelligence
Miles Grimshaw @milesgrimshaw
13K Followers 4K Following Thrive Capital. @cursor_ai @chaidiscovery @turbopuffer @SocketSecurity @Revel_Software @meshoptical @doji_com @langchainai @benchling @monzo @segment @airtable
Johannes Hagemann @johannes_hage
10K Followers 3K Following co-founder/cto @PrimeIntellect | open superintelligence infra, longevity, techno-optimism
Yuan He @lawhy_X
261 Followers 204 Following Applied Scientist @Amazon Rufus working on agent post-training
Alex Fabbri @alexfabbri4
649 Followers 420 Following Research @meta superintelligence labs; @scale_AI @SFResearch; PhD @Yale; BA @Columbia; Opinions are my own.
Manasi Sharma @ ICLR ... @ManasiSharma_
540 Followers 336 Following research engineer @scale_AI, working on reasoning for frontier models, agents, rl | prev @stanford, @StanfordAILab, @mitll, @Columbia
Yoram Bachrach @yorambac
4K Followers 7K Following Research Scientist at Meta (prev Google DeepMind and Microsoft Research). Working on LLM Agents and Multi-Agent Systems.
Ofir Press @OfirPress
18K Followers 9K Following I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.
Noah Jacobson @noahajake
40 Followers 329 Following Gemini Agents @ Google. Formerly at ScaleAI, Amazon, Stanford.
Chenchen Ye @chenchenye_ccye
830 Followers 913 Following CS PhD student @UCLA, Intern @scale_AI | Prev Intern @MSFTResearch | Prev Undergrad @NUSingapore | Generative Models
Leo Liu @ZEYULIU10
1K Followers 2K Following RS @SFResearch Prev @UTAustin @uwcse, @metaai, @allen_ai, @USC_ISI
Kanishka Misra 🌊 @kanishkamisra
2K Followers 683 Following Asst. Prof of Ling, and Harrington Fellow at @UTAustin. language, concepts, and generalization. also on the site where the sky is blue. Aspiring wugologist
Sanxing Chen @sanxing_chen
497 Followers 518 Following phd-ing @duke_nlp. previously @googledeepmind @msftresearch @uva_ilp. agentic exploration & rag
Fei Wang @fwang_nlp
2K Followers 2K Following Research Scientist @Google. PhD @USC. LLM post-training.