Jeremy Cohen @deepcohen
Research fellow at Flatiron Institute, working on understanding optimization in deep learning. Previously: PhD in machine learning at Carnegie Mellon. jmcohen.github.io · New York, NY · Joined September 2011
Tweets: 1K · Followers: 6K · Following: 998 · Likes: 1K
@valeriechen_ @khodakmoments Congrats!!
@jonasgeiping I can't tell if this is a play on Houdini or Carlini
@Shiwei_Liu66 Would be interested in hearing others’ wild speculation
@Shiwei_Liu66 Update: not the secret sauce (definitely known to the other labs)
@deepcohen Thanks for your kind thoughts and support, Jeremy! I do believe this direction could potentially open up a new level of scaling toward deeper models with stronger reasoning capabilities. To be clear, I was not aware of the Depth-MP work when developing this paper (shamed); the
Did Anthropic get more gains out of model scaling than other labs thought was possible? It reminds me of an interesting recent paper, which showed that deep layers in open LLMs are not doing much, and that this can be fixed by scaling the LayerNorm output. arxiv.org/abs/2502.05795
@mrtnm If someone forced me to bullshit an explanation with a gun to my head, I would say that implicit curvature regularization promotes "feature learning" (whatever that is), and that in the fine-tuning setting, you don't want feature learning.
@mrtnm Sorry, I missed this. Actually, even when you fine-tune to the same SFT loss, lower LR's are better for fine-tuning: arxiv.org/abs/2604.13627. This indicates that it's not about distance traveled, but rather about implicit bias.
The recent Microsoft AI report noted that too much learning rate decay during pretraining hurts post-RL performance. This is actually just the latest of several papers this year pointing out that small learning rates can be harmful in LLM pretraining. (Thread)
@DimitrisPapail they say that you only need adam for the last layer and the layernorm parameters
@SunnySanyal9 The first paper I read with a sqrt(L) or sqrt(l) scaling was this one: arxiv.org/abs/2010.12859. But the paper I tweeted is proposing something different from the others.
@brunorganised @DeyNolan It's kind of similar, but these small things can make huge differences, no? x.com/deepcohen/stat…
@xidulu I mean, it sounds similar, but there could be a big difference. Also, this one is layer index rather than depth.
Oh also, in the *multi-epoch* LLM setting, there is evidence that larger LR's yield better population pretraining loss, exactly mirroring what was known in 'classical' image settings (arxiv.org/abs/2306.08590). (This experiment is with batch size, but LR should be the same)
Nevertheless, hyperparameters matter, and I'm glad we're starting to see good science about LR schedules for LLMs. PS: as noted by Catalan-Tatjer et al, weight-averaging recovers many benefits of LR decay but without increasing sharpness. More should consider weight-averaging!
Jim Fan @DrJimFan
470K Followers 3K Following NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
Gautam Kamath @thegautamkamath
65K Followers 620 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Joining @NYU_Courant Fall 2026. Co-EiC @TmlrOrg. I lead @TheSalonML.
typedfemale @typedfemale
45K Followers 556 Following a really exciting new account "advanced pytorch user" - @cHHillee alt: @typedalt
Dan Roy @roydanroy
66K Followers 2K Following @Google DeepMind. On leave, Canada CIFAR AI Chair and Former Research Director, @VectorInst. Professor, @UofT (Statistics/CS). Views are my own.
Sebastien Bubeck @SebastienBubeck
78K Followers 1K Following I work on AI at OpenAI. Former VP AI and Distinguished Scientist at Microsoft.
Tom Goldstein @tomgoldsteincs
28K Followers 2K Following Professor at UMD. AI security & privacy, algorithmic bias, foundations of ML. Follow me for commentary on state-of-the-art AI.
Kyunghyun Cho @kchonyc
86K Followers 2K Following a mediocre combination of a mediocre scientist and a mediocre advisor at @nyuniversity (@CILVRatNYU)
rohan anil @_arohan_
43K Followers 2K Following member of technical staff & co-founder of @coreautoai - and continuing to aspire to understand deep learning.
Eric Jang @ericjang11
135K Followers 4K Following
Jonathan Frankle @jefrankle
23K Followers 809 Following Chief AI Scientist @databricks via MosaicML. e/brick
Jason Lee @jasondeanlee
35K Followers 5K Following Associate Professor CS/stats UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on LLMs and deep learning. PhD at Stanford.
David Pfau @pfau
35K Followers 2K Following Knowledge manifests itself in radiant dreams that shimmer like the wild sun Views are my own https://t.co/xqtVHHVI17 on 🦋
Dimitris Papailiopoul... @DimitrisPapail
28K Followers 1K Following Researcher @MSFTResearch, AI Frontiers | Prof @UWMadison (on leave) | babas of Inez Lily.
Ananya Kumar @ananyaku
9K Followers 582 Following Research lead at Meta TBD Labs. Previously research lead and core contributor to o1, o3, gpt5, at OpenAI. PhD at Stanford with Percy Liang and Tengyu Ma
Zachary Lipton @zacharylipton
68K Followers 2K Following Professor: CMU/@acmi_lab, Cofounder: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Prof. Anima Anandkuma... @AnimaAnandkumar
40K Followers 2K Following AI+Science, Bren Professor @caltech, Time100, Fmr Sr Director of #AI research @nvidia Fmr Principal Scientist @awscloud
Sam Power @sp_monte_carlo
21K Followers 7K Following Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. @OnlineMCSeminar. (he / him)
Jeff Dean @JeffDean
446K Followers 6K Following Chief Scientist, Google DeepMind & Google Research. Gemini Lead. Opinions stated here are my own, not those of Google. TensorFlow, MapReduce, Bigtable, ...
FutPro @FutPro0101
1 Followers 35 Following
Rahul @wte09fl
7 Followers 747 Following
Rahul Chinthala @TheChinthala
382 Followers 4K Following Building Autonomous Agents at AWS | Human Co-Founder at @tryOllieLabs | Philosophy | Startups
Rumman ali 🇩🇪 @rummanali2000
33 Followers 1K Following ML researcher@ Cispa || Informatik @ Saarland
Eman @Eman__Hussein
65 Followers 890 Following
Mohammad Moshtaghifar @moshtaghifar
250 Followers 347 Following MSc CS@UBC, Prev CE@Sharif Interested in Optimization and Machine Learning Theory
Gray Henderson @HendersonG56631
460 Followers 7K Following
ARIA @rdmsnpr
90 Followers 3K Following
申延理 @shenyanli63
82 Followers 5K Following
PoppaMac @jirongo_mac
105 Followers 2K Following
Philipp Nazari @philna00
109 Followers 267 Following PhD Candidate at Max Planck ETH Center for Learning Systems
_ @_prmd_
146 Followers 5K Following Product Developer @ SAP, Product Security, AppSec, Data Privacy, GDPR. Tweets are personal. RTs aren't endorsements.
Aether - ?/acc @aethergradient
83 Followers 1K Following
Habtamu Asefa @habt_asefa
12 Followers 696 Following Building TTS and ASR models for low resource languages. First-principles ML/DL engineer exploring native audio, and multimodal models.
autodidac @autodidaclzfm
0 Followers 5K Following
F00M @0xVonNeumann
197 Followers 3K Following Just doin' my best to get to see the singularity or WW3, whichever comes 1st.
Harshitha M @harshithamanju3
104 Followers 2K Following Multimodal LLMs & UAV/Edge Intelligence | PhD Researcher at UT Dallas | HPC-AI on GPUs & Supercomputing | Passion for AI | ex-BMW | ex-Deloitte
kshitij @The_Real_Baka_
7 Followers 777 Following
ApE Mom @IamApEMom
0 Followers 214 Following Following AI breakthroughs | Sharing news and thoughts | ML/LLM enthusiast | open for collaborations.
Zephyr @Zephyr495843106
1 Followers 13 Following
Ishaan Verma @IshaanV78119888
1 Followers 130 Following
VegetaAvatar @VeGeTaX29
20 Followers 8K Following
The Data Therapist @yuvalmarton
1K Followers 2K Following Computational Linguist, NLP/NLU/AI Research Scientist, Affil. Assistant Professor, tech mentor, corporate emp. Political in sep acnt. Not my employer’s opinions
vvomen @vvomen181732
28 Followers 2K Following
spacy @dosco
5K Followers 2K Following LLM research, systems and compilers | ax + dspy in TS | agent engineering. https://t.co/sfb9LG5uSU https://t.co/CM4AQP5n1z https://t.co/ZdGcEh57Wh
Dhruv Rawat @imdhruvrawat
15 Followers 1K Following {ml, distributed, storage} systems, incoming mscse @UMichCSE, past: @yugabyte, cs @bitspilaniindia
Junaid Akhtar | AI Fu... @Junaid_Ramey
27 Followers 520 Following 🚀 Building AI-powered SaaS products in public AI • Automation • Full-Stack Development ⚡ Turning ideas into scalable software 📈 Sharing code, lessons & Ideas
Anwesha @anwesha_ac
302 Followers 4K Following research (diffusion&world models) | investor | genz✌️| loves AI art | diplomatic opinions
shifu xiong @xiongshifu1234
3 Followers 358 Following
Alessandro Breccia @alebreccia99
4 Followers 39 Following
John D. Pope 🦒 @johndpope
2K Followers 8K Following Hi-Yo, Silver! - prepare for financial reset. 🧙♂️🔮 🍿 I built a chrome extension for grok imagine -
アメド オシナ... @HakeemDemi
1K Followers 4K Following Aspiring Data professional (Data Scientist/ML Engineer.)|| Learn to do a lot with very little, learn to make the complex simple. || Profane & unredeemed. || 27+
Miko @UltronChina
0 Followers 584 Following
josh @josh8118h
25 Followers 2K Following
Gabriel Peyré @gabrielpeyre
101K Followers 446 Following @CNRS researcher at @ENS_ULM. One tweet a day on computational mathematics.
Behnam Neyshabur @bneyshabur
42K Followers 1K Following Co-Founder & CEO @mirendilAI 💼 Past: co-led Discovery team @AnthropicAI & Blueshift team @GoogleDeepMind 🎒Traveling & Backpacking
Percy Liang @percyliang
108K Followers 425 Following professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist
Lucas Beyer (bl16) @giffmana
141K Followers 610 Following Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://t.co/xe2XUqkKit ✗DMs → email
Soumith Chintala @soumithchintala
309K Followers 1K Following Building new things @thinkymachines. Also dabble in robotics at NYU. Cofounded @PyTorch. AI is delicious when it is accessible and open-source.
(((ل()(ل() 'yoav)))... @yoavgo
83K Followers 2K Following
Boaz Barak @boazbaraktcs
33K Followers 818 Following Computer Scientist. See also https://t.co/EXWR5k634w . @harvard @openai opinions my own.
Shashwat Goel @ShashwatGoel7
4K Followers 2K Following Training AI for Decision Making Past work: https://t.co/Slt56DRftV, Training AI Co-scientists, ΔBelief-RL, Measuring Long Horizon Execution
Albert Gu @_albertgu
21K Followers 77 Following assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Shiwei Liu @Shiwei_Liu66
2K Followers 593 Following Hi, I am a PI at ELLIS Institute Tübingen and MPI-IS. Was RS NIF @UniofOxford, JRF @SomervilleOx, postdoc @UTAustin, and PhD @Data_AI_TUe.
Xidulu @xidulu
639 Followers 709 Following Xi Wang, Full-stack Bayesian, ECNU, UMass CICS, JHU CS, Fan of U-Shape. Previously MSR Cambridge, Netflix Research
Sharan Vaswani @sharan_vaswani
414 Followers 329 Following Assistant Professor @SFU_CompSci Reinforcement Learning, Optimization Previously @AmiiThinks, @Mila_Quebec, @UBC, @bitspilaniindia
Tyler Farghly @tylerfarghly
573 Followers 493 Following Postdoc @Sierra_ML_Lab (@inria_paris / @ENS_ULM) stochastics, diffusion models, mcmc, optimisation musician prev @oxcsml @UniofOxford @antikythera_xyz
Dwarkesh Patel @dwarkesh_sp
239K Followers 1K Following Host of @dwarkeshpodcast https://t.co/3SXlu7fy6N https://t.co/4DPAxODFYi https://t.co/hQfIWdM1Un
Zhiqi Bu✈️ICML 20... @woodyx218
8 Followers 32 Following Research scientist on optimization and scaling (not optimizers) @Meta SuperIntelligence Labs @FAIR; ex @Amazon AGI, AWS
Matthieu wyart @MatthieuWyart
964 Followers 109 Following Theoretical physics × machine learning Scaling laws | statistical mechanics | learning theory Prof @ Johns Hopkins & EPFL Trying to understand why LLMs work
Nicolas Loizou @NicLoizou
1K Followers 318 Following Assistant Professor @JohnsHopkins. Optimization and Machine Learning.
Ishaan Watts @IshaanWatts18
406 Followers 460 Following Foundational Models @CarnegieMellon | Prev - @GoogleDeepMind @MSFTResearch @iitdelhi
Abdulkadir Canatar @canatar_a
357 Followers 1K Following Research Fellow @FlatironCCN. Theoretical Neuroscience, Machine Learning, Physics. Prev: @Harvard and @sabanciu
Jamie Simon @learning_mech
1K Followers 74 Following doing fundamental science of deep learning | PhD from Berkeley | can catch a whole egg in my mouth
Xavier Gonzalez @xavierjgonzalez
785 Followers 750 Following AI Researcher at Unconventional AI (https://t.co/8gd5jLMc6B). Parallelizing "inherently sequential" processes like nonlinear RNNs and MCMC.
Jiaxin Wen @jiaxinwen22
6K Followers 198 Following research @berkeley_ai @anthropicai. prev @tsinghua_univ.
Jonathon Byrd @jbyrdflying
10 Followers 388 Following
Elisabetta Cornacchia @Elisabetta68885
107 Followers 141 Following Assistant Professor at Bocconi University
Hao-Jun Michael Shi @hjmshi
147 Followers 202 Following Research Scientist, Meta Superintelligence Labs @AIatMeta | Previous: Ph.D. @NU_IEMS, B.S. @uclamath | Numerical Optimization, Deep Learning
Will Townes @will_townes
2K Followers 2K Following Asst Prof @CMU_StatDS interested in infectious disease (@CMUDelphi), genomics, time series, and discrete stable distributions.
Jerry Tworek @MillionInt
38K Followers 1K Following CEO and co-founder of Core Automation former VP of RL @ OpenAI : reasoning models, o3, o1, GPT4, ChatGPT, Codex, RL for robots cautious AI optimist
Priya Kasimbeg @KasimbegPriya
47 Followers 64 Following
Bruno Mlodozeniec @brunorganised
627 Followers 592 Following Working on understanding scaling and NN training | @Cambridge_Uni | https://t.co/nBaf31ZSeu
Chulhee Yun @chulhee_yun
2 Followers 15 Following
Jacopo Teneggi @JacopoTeneggi
141 Followers 337 Following Computer Science PhD student @JohnsHopkins Research Scientist Intern @PolymathicAI
Shahriar Noroozizadeh @ShNoroozi
86 Followers 124 Following ML PhD Candidate @CarnegieMellon, AI Research Intern at @MSFTResearch, (prev @GoogleResearch intern)
Yijun Dong @YijunDong1
121 Followers 261 Following Postdoc at NYU Courant. PhD from UT Austin. Interested in randomized numerical linear algebra and machine learning theory.
Kangwook Lee @Kangwook_Lee
7K Followers 1K Following CAIO @KRAFTON_AI / CTO @LudoRobotics (Prev) Associate Professor @UWMadisonECE, PhD @Berkeley_EECS
Mary Letey @maryiletey
277 Followers 713 Following PhD student at Harvard @hseas. Machine learning theory, cosmology.
Andrew Carr 🤸 @andrew_n_carr
26K Followers 5K Following co-founder leading science @getcartwheel co-founder advisor @arcade_ai Past: Codex @OpenAI, Brain @GoogleAI, world ranked Tetris player
Tim G. J. Rudner @timrudner
3K Followers 948 Following Assistant Professor, @UofT Statistics & CS CIFAR AI Chair, @VectorInst Machine Learning, AI Safety, AI Governance Prev: Rhodes Scholar, @UniofOxford, @Yale
Kevin Frans @kvfrans
4K Followers 518 Following phd @berkeley_ai prev mit, reflection, openai read my thoughts: https://t.co/7CZsOTrKRA
Sebastian Bordt @sbordt
310 Followers 632 Following Language models and interpretable machine learning. Postdoc @ Uni Tübingen.
Michael Choi @michaelchchoi
3K Followers 3K Following Assistant Professor @NUSingapore. Applied probabilist. Probability, MCMC, statistical physics, optimization, information theory, TCS. Opinions my own.
Georges Harik @gharik
8K Followers 4K Following humans& co-founder, 7th employee google, co-created adwords online, co-created adsense targeting, worked on ai, gmail, calendar, bought android.
Lindia Tjuatja ✈️... @lltjuatja
2K Followers 684 Following a natural language processor and “sensible linguist”. Final-year PhD-ing @LTIatCMU, frequent visitor @NYUDataScience, incoming faculty @UT_linguistics (F27).
William Merrill @lambdaviking
5K Followers 691 Following incoming Prof @TTIC_Connect theory and pretraining at @allen_ai Will irl, TC0 enthusiast
Michael Crawshaw @CrichaelMawshaw
100 Followers 33 Following Computer science PhD student at George Mason University. Machine learning, optimization. https://t.co/1m3nTS97gb
Seunghyun Seo @SeunghyunSEO7
3K Followers 949 Following deep learning enjoyer. from speech to llm @ naver, now exploring image space @midjourney
Arseniy Andreyev @arseniqum
12 Followers 144 Following