Rupesh Srivastava @rupspace

Fully open LLM frontiers @MBZUAI IFM Silicon Valley. Previously (co)developed Highway Networks, Upside-Down RL, Bayesian Flow Networks, EvoTorch.
rupeshks.cc
Santa Cruz, CA
Joined September 2014

Tweets

1K
Followers

3K
Following

769
Likes

2K

Richard Zhuang @RichardZ412

a week ago

How can we train small agentic models that are highly capable of terminal use and coding? Announcing OpenThoughts-Agent + OpenThinkerAgent-32B, the strongest Qwen-3 based open-data agentic model: 44.8% avg across 7 agentic benchmarks! (1/n)

24 88 416 804K 333

Rupesh Srivastava @rupspace

a week ago

@j_foerst Congrats! This is gonna be awesome.

0 0 0 264 0

Rupesh Srivastava @rupspace

2 weeks ago

@stochasticchasm Hope, huh? Must be nice 🙂

0 0 1 219 0

Institute of Foundation Models @IFM_MBZUAI

4 weeks ago

1/3 Most language models generate text the way a typewriter works. They go left to right, one token at a time. Diffusion language models generate entire sequences by simultaneously refining noise into meaning.

2 4 12 2K 3

Rupesh Srivastava @rupspace

4 weeks ago

Love it when Jürgen puts things in perspective! 🙂

Jürgen Schmidhuber @SchmidhuberAI

4 weeks ago

Tera IPOs coming! $1T sounds like a lot. But $1T is just a 7-m-wide gold cube, thanks to massive inflation since 1971 when $ and gold decoupled. A little house full of gold. To put things in perspective: the 2017 neutron star merger GW170817 produced several earth masses of gold.

27 32 471 102K 105

0 0 4 182 0

Mingkai Deng @mdeng34

a month ago

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.

5 46 278 62K 271

Rupesh Srivastava @rupspace

2 months ago

@tw_killian @BYU @BYUCS

0 0 1 23 0

Rupesh Srivastava @rupspace

2 months ago

@agarwl_ @BlackHC That idea came from von Malsburg. Hinton and Plaut even cited him in the paper for this, but his influence is sadly forgotten.

0 0 2 34 0

Jeff Clune @jeffclune

2 months ago

Thrilled to share that we founded Recursive to create AI that safely conducts experiments on how to improve itself in an open-ended process of endless, automated scientific discovery. As I wrote in my 2019 AI-generating algorithms paper, this will likely be the fastest path to superintelligence. Our work since has shown the power of this approach. Excited to scale up and improve upon ideas like the Darwin Gödel Machine, HyperAgents, ADAS, OMNI, ALMA, The AI Scientist, PromptBreeder, Rainbow Teaming, Automated Capability Discovery, and other work on open-ended and AI-generating algorithms. We’ve assembled a dream team of researchers and significant resources to pursue this vision. My amazing co-founders are pictured here, and we have an all-star team of founding members (we’re over 25 and growing). Please join us if you are interested! Follow our progress @Recursive_SI

50 42 619 118K 168

Rupesh Srivastava @rupspace

2 months ago

Did he just ... wow @fredagainagain1 thank you so much! youtube.com/watch?v=GiXKuk…

0 0 1 181 0

Rupesh Srivastava @rupspace

2 months ago

Yes!

Susan Zhang @suchenzang

2 months ago

@charuman wasn't meant as sarcasm it's always nice to see a lab so confident/secure in their capabilities that they can openly publish all their struggles

1 0 45 3K 3

0 0 2 282 0

Loren Lugosch @lorenlugosch

2 months ago

In this paper, we ask: 𝘏𝘰𝘸 𝘤𝘢𝘯 𝘸𝘦 𝘤𝘭𝘶𝘮𝘴𝘪𝘭𝘺 𝘳𝘦𝘧𝘰𝘳𝘮𝘶𝘭𝘢𝘵𝘦 𝘵𝘩𝘦 𝘤𝘢𝘱𝘢𝘣𝘪𝘭𝘪𝘵𝘺 𝘸𝘦 𝘪𝘮𝘱𝘭𝘦𝘮𝘦𝘯𝘵𝘦𝘥 𝘪𝘯 𝘵𝘩𝘦 𝘧𝘰𝘳𝘮 𝘰𝘧 𝘢 𝘲𝘶𝘦𝘴𝘵𝘪𝘰𝘯?

0 1 14 2K 4

Rupesh Srivastava @rupspace

3 months ago

@finbarrtimbers I think this is likely a difference of scale mainly. If there's enough filtered data to train on, then use that. If there's limited data, train on all.

0 0 1 226 0

Shibo Hao @Ber18791531

3 months ago

🍫 CocoaBench v1.0 is out! CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search. Since our first research preview last December, we have expanded the benchmark substantially with community contributed tasks, and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇More on the website and in the paper #AI #Agents #LLM #Benchmark #CocoaBench

Shibo Hao @Ber18791531

7 months ago

🍫 CocoaBench is calling for contributions from the community! Join us and help shape how next-generation agents are evaluated and built🚀✨ #LLM #AI #Agent #CocoaBench More details in the threads 👇

3 14 32 15K 12

2 34 79 12K 18

Institute of Foundation Models @IFM_MBZUAI

3 months ago

A visually convincing rollout is not the same thing as a useful world model. WR-Arena is built to test the harder question: can a model simulate futures well enough to support action, planning, and reasoning? That’s the shift from simple next-state prediction to realistic world simulation grounded in real-world utility. Paper + code are live. t.co/waRc0MJmwP t.co/ZzN76nOwoI #AI #WorldModels #Benchmarking #EmbodiedIntelligence #PhysicalAI #MachineLearning

0 10 46 5K 41

Rupesh Srivastava @rupspace

3 months ago

@Grad62304977 @kalomaze All networks are mixtures of experts, just gated at unit level :) arxiv.org/abs/1410.1165

0 0 2 73 0

Alex Shaw @alexgshaw

3 months ago

The Harbor registry is getting an upgrade. Now, anyone can publish to the registry to make their dataset available to every Harbor user:

4 5 38 5K 4

Institute of Foundation Models @IFM_MBZUAI

3 months ago

Back in beautiful New Haven this weekend for YHack. We’ll be there with K2 Think V2, a fully open-source reasoning system. Hackers! Dig into how it works: huggingface.co/LLM360/K2-Thin…

0 3 7 612 0

Lucas Beyer (bl16) @giffmana

4 months ago

Yes and no. Very often it turns out that what you think solves the problem is not what actually solves it, and this you only find out by not moving on, but making sure you have experiments that back up the *exact* statement you make removing all reasonable confounders. And that, you get from one of:
- public review
- extremely strict colleagues
- insane self discipline