Follow for updates!
A new benchmark tracking how well language models play chess. Watch the games, follow the reasoning move by move, track the leaderboard.
chessbench.ai
Joined June 2026
@GoogleDeepMind An encouraging data point for control: on ChessBench, models left unassisted hallucinate illegal moves -- but give them the list of legal moves at each turn and illegal-move rates collapse.
Part of the intent-action gap is a scaffolding problem. And scaffolding you can build.
@emollick "At least in coding" is carrying a lot here. Capability is jagged even within a single task -- on ChessBench the frontier models diverge wildly on chess alone. A countdown built on one lag number smooths over exactly the unevenness that matters.
@googledevs The unglamorous half of "smarter workflows": knowing what models can actually be trusted to do. On my chess benchmark, the top models have totally different reliability profiles -- one coherent but inaccurate, another accurate but hallucinating. Evaluation is the real unlock.
@shub0414 "Google hype fading" doesn't survive contact with data -- on my chess benchmark Gemini 3.1 Pro is the most well-rounded model out there, despite being the oldest of the top three. Who's "winning" depends entirely on what you measure. chessbench.ai/leaderboard @GoogleDeepMind
@Google @Googlegemma @measure_plan Open weights are a gift for eval -- fully reproducible. Going to run Gemma 4 through ChessBench soon and score it on coherence (legal moves) vs accuracy (good moves). The frontier models diverge more than you'd expect on that split. Curious whether open models do too.
244 Followers 76 Following
Manifesting GM title in chess ♟️ Manifesting Porsche 🏎️ Manifesting pronouns: USDT/USDC 💲
Manifesting being the next Bobby Fischer X
Current employer @Debeka
8.6M Followers 596 Following
Six thousand years ago, someone invented the plow, and we all got wealthier. A gentle reminder that all civilizational wealth is driven by invention.
258K Followers 10 Following
We’ll help you make it like nobody’s business. Multimodal media generation and editing tools to get your idea to production. Self-deploy? 👍 Need a partner? 🤝
4K Followers 102 Following
A @pushkinpods podcast about change. Cognitive scientist Dr. Maya Shankar is the Creator, Exec. Producer & Host.
🌊 Preorder "The Other Side of Change" today!
88K Followers 0 Following
The official home of Google's Gemma. Lightweight, state-of-the-art open models by Google DeepMind, built on Gemini tech. What will you build? 🚀💻
214K Followers 275 Following
Chess Queen ® Alexandra Kosteniuk, 12th Women's World Chess Champion, Grandmaster, Educator, Champion for Peace, marathon runner and a mom. Александра Костенюк
127K Followers 545 Following
Princeton CS prof and Director @PrincetonCITP.
Coauthor of "AI Snake Oil" and "AI as Normal Technology". https://t.co/ZwebetjZ4n
Views mine.
13K Followers 2K Following
I tweet about AI agents, AI evals, AI for science.
AI as Normal Technology: https://t.co/5amOkqKDf2
Book: https://t.co/DabpkhNrcM
62K Followers 11K Following
Building intelligence that evolves @adaption_ai. Built @Cohere_Labs, @GoogleBrain, @GoogleDeepmind. ML Efficiency, Multimodal\lingual.
261K Followers 181 Following
Co-founder of Thinking Machines Lab @thinkymachines; Ex-VP, AI Safety & robotics, applied research @OpenAI; Author of Lil'Log
1.3M Followers 2K Following
Follow along for how-tos, demos, product news, and more.
For company updates, check out @GoogleCloud.
Watch #GoogleCloudNext on demand ⬇️
1.4M Followers 2 Following
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
1.2M Followers 368 Following
Get the latest news and product updates on Google Analytics, Tag Manager and the Google tag. Learn more at https://t.co/90zQzLnANJ