Rohan Kumar Mishra @RohanKumar96243

😋 ML · Joined March 2024

Tweets: 23 · Followers: 5 · Following: 383 · Likes: 8

Rohan Kumar Mishra @RohanKumar96243

2 weeks ago

Humans are still in another league. Maybe intelligence isn't about who wins 4–1. Maybe it's about who learns more from less.

Rohan Kumar Mishra @RohanKumar96243

2 weeks ago

AlphaGo beat Lee Sedol 4–1. But consider the training data: Lee Sedol: perhaps 10,000–30,000 games over a lifetime. AlphaGo: millions of self-play games. If intelligence is measured by performance alone, AlphaGo wins. If intelligence is measured by performance per unit of experience…
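A rough way to make "performance per unit of experience" concrete, as a sketch under loudly stated assumptions: "performance" is taken as the win fraction in the 5-game match, Lee Sedol's experience is pinned to 20,000 games (the middle of the 10,000–30,000 range above), and AlphaGo's "millions of self-play games" is pinned to a hypothetical 1,000,000. These two numbers are illustrative, not data.

```python
from fractions import Fraction

# Assumed inputs (hypothetical): "performance" = win fraction in the 5-game match,
# "experience" = games played before it. 20_000 is the midpoint of the tweet's
# 10,000-30,000 range; 1_000_000 is an illustrative stand-in for "millions".
players = {
    "Lee Sedol": {"wins": 1, "match_games": 5, "experience_games": 20_000},
    "AlphaGo": {"wins": 4, "match_games": 5, "experience_games": 1_000_000},
}


def performance(p):
    """Win fraction in the match."""
    return Fraction(p["wins"], p["match_games"])


def performance_per_game(p):
    """Win fraction divided by games of prior experience."""
    return performance(p) / p["experience_games"]


for name, p in players.items():
    print(f"{name}: performance={float(performance(p)):.2f}, "
          f"per game of experience={float(performance_per_game(p)):.1e}")

ratio = performance_per_game(players["Lee Sedol"]) / performance_per_game(players["AlphaGo"])
print(f"Lee Sedol per-game advantage: {float(ratio)}x")
```

Under these assumptions AlphaGo wins on raw performance (0.80 vs 0.20), while Lee Sedol comes out 12.5× ahead per game of experience; any larger AlphaGo figure only widens that gap.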

Rohan Kumar Mishra @RohanKumar96243

a month ago

4/4 What I found most interesting was how shifted window attention lets information gradually flow across the entire image. The model achieved better medical image segmentation performance than several CNN and hybrid Transformer-CNN approaches on CT datasets.
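A toy sketch of how alternating regular and shifted windows lets information spread across the whole image. Assumptions: each attention layer is modelled as fully mixing information among tokens in the same window (no learned weights), and SW-MSA is modelled as windows displaced by ⌊M/2⌋, which is what Swin's cyclic shift plus attention mask effectively computes. `window_id` and `propagate` are hypothetical helpers, not code from the paper.

```python
def window_id(r, c, M, shift):
    """Window index of token (r, c) for M x M windows.

    shift=0 models W-MSA. shift=M//2 models SW-MSA: window boundaries move to
    shift, shift+M, ...; tokens before the first boundary fall into window -1
    (Python floor division), i.e. the smaller border windows Swin's mask creates.
    """
    return ((r - shift) // M, (c - shift) // M)


def propagate(H, W, M, num_layers, use_shift):
    """Toy model: each layer fully mixes information among tokens in one window.

    Returns reach[(r, c)] = set of tokens whose information has reached (r, c).
    With use_shift=True, odd layers use shifted windows (W-MSA, SW-MSA, W-MSA, ...).
    """
    reach = {(r, c): {(r, c)} for r in range(H) for c in range(W)}
    for layer in range(num_layers):
        shift = M // 2 if (use_shift and layer % 2 == 1) else 0
        windows = {}
        for r, c in reach:
            windows.setdefault(window_id(r, c, M, shift), []).append((r, c))
        new_reach = {}
        for members in windows.values():
            merged = set().union(*(reach[t] for t in members))
            for t in members:
                new_reach[t] = merged
        reach = new_reach
    return reach


# 8x8 token grid, 4x4 windows: how many tokens has token (0, 0) "seen"?
for layers in range(1, 5):
    plain = len(propagate(8, 8, 4, layers, use_shift=False)[(0, 0)])
    shifted = len(propagate(8, 8, 4, layers, use_shift=True)[(0, 0)])
    print(f"{layers} layer(s): regular windows={plain}, alternating shift={shifted}")
```

With regular windows only, token (0, 0) never sees beyond its own 16-token window; with alternating shifts it has received information from all 64 tokens after three layers.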

Rohan Kumar Mishra @RohanKumar96243

a month ago

3/4 Interesting architectural ideas:
• Window Multi-Head Self-Attention (W-MSA)
• Shifted Window Attention (SW-MSA)
• Patch Merging for downsampling
• Patch Expanding for upsampling
• Skip connections between encoder and decoder
This creates a pure Transformer U-Net.
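A minimal sketch of the tensor rearrangements behind Patch Merging (space-to-depth) and Patch Expanding (depth-to-space), using nested lists so it runs without dependencies. As we understand it, Swin follows the merge with a linear layer (4C → 2C) and Swin-Unet precedes the expand with a linear layer that doubles the channels; those learned projections are omitted here, and the function names are hypothetical.

```python
def patch_merge_rearrange(x):
    """(H, W, C) -> (H/2, W/2, 4C): concatenate the channels of each 2x2 patch group.

    x is a nested list: x[row][col] is a list of C channel values.
    The learned linear projection that follows in Swin is omitted.
    """
    H, W = len(x), len(x[0])
    if H % 2 or W % 2:
        raise ValueError("height and width must be even")
    return [
        [x[r][c] + x[r + 1][c] + x[r][c + 1] + x[r + 1][c + 1]
         for c in range(0, W, 2)]
        for r in range(0, H, 2)
    ]


def patch_expand_rearrange(x):
    """(H, W, 4C') -> (2H, 2W, C'): split channels back into a 2x2 patch group.

    The channel order mirrors patch_merge_rearrange so the two are exact
    inverses (the paper's ordering may differ). The learned linear layer that
    precedes this step in Swin-Unet is omitted.
    """
    H, W, D = len(x), len(x[0]), len(x[0][0])
    if D % 4:
        raise ValueError("channel count must be divisible by 4")
    c = D // 4
    out = [[None] * (2 * W) for _ in range(2 * H)]
    for r in range(H):
        for col in range(W):
            v = x[r][col]
            out[2 * r][2 * col] = v[0:c]
            out[2 * r + 1][2 * col] = v[c:2 * c]
            out[2 * r][2 * col + 1] = v[2 * c:3 * c]
            out[2 * r + 1][2 * col + 1] = v[3 * c:]
    return out


# 4x4 map with one channel whose value encodes its position (10*row + col).
x = [[[10 * r + c] for c in range(4)] for r in range(4)]
merged = patch_merge_rearrange(x)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 2 2 4
print(merged[0][0])  # [0, 10, 1, 11]
print(patch_expand_rearrange(merged) == x)  # True
```

Merging halves the spatial resolution and quadruples the channels; expanding undoes exactly that, which is what lets the decoder mirror the encoder's resolution at each skip connection.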

Rohan Kumar Mishra @RohanKumar96243

a month ago

1/4 Today I read the paper “Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation”. The paper tries to replace traditional CNN-based U-Net architectures with a fully Transformer-based segmentation model for medical imaging.

Rohan Kumar Mishra @RohanKumar96243

a month ago

4/4 Another interesting systems idea in the paper was “Continuous Depth-wise Batching”. Because the same layers are reused recursively, samples at different loop depths can be batched together; combined with early exiting, this could theoretically yield up to 2–3× gains in inference throughput. Authors: @raymin0223 @adamjfisch @harhrayr
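A toy simulation of the mechanism, under heavy assumptions: each request needs a known (hypothetical) number of loops before it early-exits, one "step" is one call of the shared block on the current batch, and memory and KV-cache effects are ignored. `lockstep_steps` and `continuous_depthwise_steps` are illustrative names, not the paper's code.

```python
from collections import deque


def lockstep_steps(depths, batch_size):
    """Baseline: fill a batch, run it until its deepest request exits, repeat.

    Requests that exit early sit idle until the whole batch finishes.
    """
    steps = 0
    for i in range(0, len(depths), batch_size):
        steps += max(depths[i:i + batch_size])
    return steps


def continuous_depthwise_steps(depths, batch_size):
    """Toy continuous depth-wise batching.

    Every loop reuses the same block, so requests at different loop depths can
    share one call. A request that exits frees its slot, and a queued request
    takes it at the next step.
    """
    queue = deque(depths)
    active = []  # remaining loops for each request currently in the batch
    steps = 0
    while queue or active:
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1  # one call of the shared block on all active requests
        active = [d - 1 for d in active if d > 1]
    return steps


depths = [1, 3, 1, 3, 1, 3, 1, 3]  # hypothetical loops each request needs
print("lockstep:", lockstep_steps(depths, batch_size=2))
print("continuous:", continuous_depthwise_steps(depths, batch_size=2))
```

With exit depths alternating 1 and 3 and room for two requests, lockstep batching needs 12 block calls and the continuous scheduler needs 9, because slots freed by early exits are refilled immediately. This toy does not reproduce the 2–3× estimate; it only illustrates why shared layers make the refilling possible.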

Rohan Kumar Mishra @RohanKumar96243

a month ago

3/4 The results were surprisingly strong. Recursive Gemma 1B outperformed vanilla TinyLlama 1.1B and Pythia 1B models of similar size. As LoRA rank increased, the relaxed recursive models nearly recovered the performance of the full-size (non-recursive) model.
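A minimal sketch of what "relaxed" recursion with LoRA could look like, assuming each unrolled layer position gets its own low-rank delta on top of the shared weights, so position (loop, k) effectively uses W_k + B·A. Real layers are attention and MLP blocks; here each "layer" is a hypothetical d × d linear map plus ReLU, and all function names are illustrative.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v):
    return [max(0.0, x) for x in v]


def relaxed_recursive_forward(x, shared, loras, num_loops):
    """Reuse K shared weight matrices num_loops times.

    loras[loop][k] = (B, A) with B: d x rank and A: rank x d, so the weight at
    position (loop, k) is effectively W_k + B @ A. The delta is applied as
    B (A x), so the full d x d update is never materialised.
    """
    for loop in range(num_loops):
        for k, W in enumerate(shared):
            B, A = loras[loop][k]
            base = matvec(W, x)
            delta = matvec(B, matvec(A, x))
            x = relu([u + w for u, w in zip(base, delta)])
    return x


def param_counts(num_layers, num_shared, d, rank):
    """Weight counts for a toy stack of square d x d layers: (full, relaxed)."""
    full = num_layers * d * d
    relaxed = num_shared * d * d + num_layers * 2 * d * rank
    return full, relaxed


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, rank, K, loops = 4, 1, 2, 3
shared = [rand_matrix(d, d) for _ in range(K)]
loras = [[(rand_matrix(d, rank), rand_matrix(rank, d)) for _ in range(K)] for _ in range(loops)]
print(relaxed_recursive_forward([1.0, 0.5, -0.5, 2.0], shared, loras, loops))

for r in (8, 16, 32):
    print(r, param_counts(num_layers=12, num_shared=4, d=64, rank=r))
```

In this toy, folding 12 layer positions into 4 shared layers of width 64 costs 28,672 weights at rank 8 versus 49,152 for the full stack; at rank 32 the LoRA modules alone are as large as the full stack. Higher rank buys back per-layer flexibility at the cost of the parameter savings.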

Rohan Kumar Mishra @RohanKumar96243

a month ago

1/4 Today I read the paper “Relaxed Recursive Transformers” from ICLR 2025. Core idea: instead of using N unique transformer layers, reuse K < N shared layers multiple times recursively to reduce parameter count and inference cost. Paper: openreview.net/forum?id=WwpYS…
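A minimal sketch of the layer reuse described above, assuming the K shared layers are looped as a block N/K times. The "layers" are hypothetical stand-ins that only record that they ran, so the schedule is easy to see; names like `recursive_schedule` are illustrative.

```python
def recursive_schedule(num_layers, num_shared):
    """Map each of the N unrolled depths to one of K shared layers (loop the K-layer block)."""
    if num_layers % num_shared:
        raise ValueError("this sketch assumes N is a multiple of K")
    return [depth % num_shared for depth in range(num_layers)]


def run_stack(x, layers, schedule):
    """Apply shared layers in schedule order; weights are reused, not copied."""
    for idx in schedule:
        x = layers[idx](x)
    return x


def make_layer(name):
    """Hypothetical stand-in for a transformer layer: it just records that it ran."""
    def layer(trace):
        return trace + [name]
    return layer


N, K = 6, 3
shared_layers = [make_layer(f"L{k}") for k in range(K)]
schedule = recursive_schedule(N, K)
print(schedule)                                # [0, 1, 2, 0, 1, 2]
print(run_stack([], shared_layers, schedule))  # ['L0', 'L1', 'L2', 'L0', 'L1', 'L2']
print(f"unique layer weights stored: {K}/{N} = {K / N:.0%}")  # 3/6 = 50%
```

Depth stays at N, but only K layers' worth of weights are stored, which is where the parameter reduction comes from.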

Rohan Kumar Mishra @RohanKumar96243

2 months ago

Prompts are software architecture and weights are hardware.

Rohan Kumar Mishra @RohanKumar96243

2 months ago

- "On that day, I want you and I to have that same conversation again. I will tell you exactly about today's conversation, about how your policy literally caused the United States to concede the second largest market in the world for no good reason at all."

Rohan Kumar Mishra @RohanKumar96243

2 months ago

A few of the most badass quotes from Jensen in his latest appearance on the Dwarkesh Podcast:
- "The input is electrons, the output is tokens. In the middle is Nvidia."
- "The beautiful thing is you're talking to the expert."

Rohan Kumar Mishra @RohanKumar96243

2 months ago

- "Is Blackwell 50 times more advanced lithography than Hopper? Not even close. Blackwell is 50 times Hopper."
- "Comparing AI to anything that you just mentioned is lunacy."

Rohan Kumar Mishra @RohanKumar96243

2 months ago

- "When Nvidia first started, there were 60 3D graphics companies. We are the only one that survived."
- "I am the evidence."
- "You're not talking to somebody who woke up a loser"

Rohan Kumar Mishra @RohanKumar96243

10 months ago

In most scenarios, a calculator is missing, whether it's chemistry, chess, or anything else.

Rohan Kumar Mishra @RohanKumar96243

10 months ago

Can an LLM be integrated with a traditional RL model? Suppose the LLM were able to understand/build the embedding of a chess board generated by the RL model. That would give the LLM a world to work with 🤔

Rohan Kumar Mishra @RohanKumar96243

10 months ago

How to summon the monster: Prompt Engineering → Agentic → SFT → RLHF → RL (synthetic data for self-play)

Rohan Kumar Mishra @RohanKumar96243

10 months ago

Does it matter whether the things we use are handcrafted or mass-produced? If not, then it also won't matter whether the movie we're watching is AI-generated.