AlphaGo beat Lee Sedol 4–1.
But consider the training data:
Lee Sedol: perhaps 10,000–30,000 games over a lifetime.
AlphaGo: millions of self-play games.
If intelligence is measured by performance alone, AlphaGo wins.
If intelligence is measured by performance per unit experienc
4/4
What I found most interesting was how shifted window attention lets information gradually flow across the entire image.
The model achieved better medical image segmentation performance than several CNN and hybrid Transformer-CNN approaches on CT datasets.
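To make the shifted-window idea concrete, here is a toy sketch of how shifting the window grid changes which patches share an attention window. It assumes a square grid of patches, a shift of half the window size, and a cyclic shift; the function names are made up for illustration, and the attention masks that real implementations apply to wrapped-around patches are left out.

```python
def window_id(row: int, col: int, size: int, window: int, shift: int = 0) -> int:
    """Return the index of the window that patch (row, col) falls into.

    size   : side length of the square patch grid (assumed divisible by window)
    window : side length of one attention window
    shift  : cyclic shift applied to the grid before partitioning
    """
    # After a cyclic shift, the patch originally at p sits at (p - shift) mod size.
    r = (row - shift) % size
    c = (col - shift) % size
    windows_per_side = size // window
    return (r // window) * windows_per_side + (c // window)


def share_window(a, b, size: int, window: int, shift: int = 0) -> bool:
    """True if patches a and b can attend to each other in this layer."""
    return window_id(*a, size, window, shift) == window_id(*b, size, window, shift)


# Two neighbouring patches that sit on opposite sides of a window border.
a, b = (3, 3), (4, 4)
regular = share_window(a, b, size=8, window=4)           # plain window layer
shifted = share_window(a, b, size=8, window=4, shift=2)  # shifted window layer
print(regular, shifted)
```

In the plain layer the two patches are in different windows; in the shifted layer they land in the same one. Alternating the two layer types is what lets information spread step by step across the whole image.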
1/4
Today I read the paper “Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation”.
The paper tries to replace traditional CNN-based U-Net architectures with a fully Transformer-based segmentation model for medical imaging.
4/4
Another interesting systems idea in the paper was “Continuous Depth-wise Batching”.
Since the same layers are reused recursively, they combine this with early exiting and theoretically achieve up to 2–3× inference throughput gains.
Authors:
@raymin0223 @adamjfisch @harhrayr
3/4
The results were surprisingly strong.
Recursive Gemma 1B outperformed vanilla TinyLlama 1.1B and Pythia 1B models trained with similar parameter budgets.
As LoRA rank increased, the relaxed recursive models nearly recovered full-size transformer performance.
1/4
Today I read the paper “Relaxed Recursive Transformers” from ICLR 2025.
Core idea:
Instead of using N unique transformer layers, reuse K layers multiple times recursively to reduce parameter count and inference cost.
Paper:
openreview.net/forum?id=WwpYS…
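A minimal sketch of the layer-reuse idea, not the paper's implementation: it assumes the K shared blocks are simply looped in order (so depth step i uses block i mod K), and it uses plain Python callables as stand-ins for transformer blocks. The names `layer_schedule` and `apply_recursive` are hypothetical.

```python
from typing import Callable, List, Sequence


def layer_schedule(total_depth: int, num_unique: int) -> List[int]:
    """Map each of the N depth steps to one of the K shared blocks (cyclic reuse)."""
    if total_depth % num_unique != 0:
        raise ValueError("total_depth must be a multiple of num_unique")
    return [step % num_unique for step in range(total_depth)]


def apply_recursive(x, blocks: Sequence[Callable], num_loops: int):
    """Run the same K blocks num_loops times, giving K * num_loops effective layers."""
    for _ in range(num_loops):
        for block in blocks:
            x = block(x)
    return x


# Toy "blocks": simple functions standing in for transformer layers.
blocks = [lambda v: v + 1, lambda v: v * 2]

schedule = layer_schedule(total_depth=6, num_unique=2)
output = apply_recursive(1, blocks, num_loops=3)
print(schedule, output)
```

Here six effective layers are computed with only two sets of weights, so the unique parameter count drops by a factor of N / K (3 in this toy case) while the compute depth stays the same.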
- "On that day, I want you and I to have that same conversation again. I will tell you exactly about today's conversation, about how your policy literally caused the United States to concede the second largest market in the world for no good reason at all."
A few of the most badass quotes from Jensen in his latest appearance on the Dwarkesh
- "The input is electrons, the output is tokens. In the middle is Nvidia."
- "The beautiful thing is you're talking to the expert."
- "Is Blackwell 50 times more advanced lithography than Hopper? Not even close. Blackwell is 50 times Hopper."
- "Comparing AI to anything that you just mentioned is lunacy."
- "When Nvidia first started, there were 60 3D graphics companies. We are the only one that survived."
- "I am the evidence."
- "You're not talking to somebody who woke up a loser"
Can an LLM be integrated with a traditional RL model? Suppose the LLM could understand or build on the embedding of a chess board generated by RL. That would give the LLM a world to work with.
Does it matter if the things we use are handicrafts or mass-produced? If not, then it also won't matter if the movie we are seeing is AI-generated or not."
34 Followers 81 Following
Just a nerd coder who codes his life. IITian. Playing with APIs pushing in production.
Do CP @codeforces and practice dsa @leetcode