Ryan Peters @ryanpirl
Reverse engineering intelligent (learning) systems. ryanirl.com Minneapolis, MN Joined February 2019
Tweets 71
Followers 177
Following 94
Likes 171
"What I can now create, I do not necessarily understand." — probably not Feynman
I'm excited to share what we're building at Engram! This team is incredible, and we're working on one of the most interesting problems in AI right now: how to build models that are tailored to each person and continually learn from experience. Come join us!
Looks like a Pringles chip 😂 Just for fun: The trajectory (black line) through a food-manifold of Qwen3-4b saying "Soup is a warm, liquid dish made from cooked ingredients. Pringles chips are a crispy, salty snack in a single-serving can." Surprisingly (or not) choppy trajectories through this space.
Just registered. If anyone going wants to meet up to talk, don't hesitate to reach out! Currently working on introspection and auto-interp.
🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
Interesting work, but I feel like the title of this post is very misleading. The "one-layer" induction head you study is a two-layer model in which a single attention head is weight-shared across both layers (please correct me if I am wrong). You cannot have an induction head in this canonical model without applying attention twice in sequence: once to apply the prev-token head so that the residual stream can effectively represent a skip-bigram lookup table, and then again to look up and retrieve this information.
Aspen, Colorado. Working on a couple of interp-related blog posts while I'm here.
Fable found another 2x on top of this. Now 6-8x faster than the public circuit-tracer implementation. This additional 2x: Exploit GQA weight sharing (Qwen shares V-weights across query heads) and pre-transpose weights into GEMM-friendly layouts.
Some early benchmarks on the attribution step:
- Consistently 3.4x faster than circuit-tracer
- Much more memory efficient (~6 GB less at 70,000 nodes)
So far, these gains are from dropping the autodiff backend and exploiting an autoregressive causality trick (performing backward only through previous token positions). All results still match Anthropic's implementation 1:1 numerically (up to bf16 precision). Further speedups will likely come from approximations (edge pruning, sparse intermediates, etc.) that diverge slightly from circuit-tracer. Benchmarking done on Qwen3-4B.
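A minimal sketch of the causality property that a "backward only through previous token positions" trick would rely on, not the actual implementation: in a causally masked attention layer, the output at position t is unaffected by positions after t, so there is nothing to attribute to them. The toy single-head layer, the random weights, and every name below are assumptions made for illustration.

```python
import numpy as np

def causal_attention(x, wq, wk, wv):
    """Toy single-head causal self-attention over x with shape (seq_len, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so position i only attends to positions j <= i.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical toy sizes and random weights (not taken from any real model).
rng = np.random.default_rng(0)
seq_len, d, t = 8, 4, 3
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

full = causal_attention(x, wq, wk, wv)
# Running only on the prefix up to t gives the same output at position t.
prefix = causal_attention(x[: t + 1], wq, wk, wv)

# Changing the later positions does not change the output at position t.
x_perturbed = x.copy()
x_perturbed[t + 1 :] += 10.0
perturbed = causal_attention(x_perturbed, wq, wk, wv)

print(np.allclose(full[t], prefix[t]), np.allclose(full[t], perturbed[t]))
```

Because later positions contribute nothing to the target position, a backward pass for that target can skip them entirely, which is one plausible source of the time and memory savings described above.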
Spending some time this week speeding up and scaling Anthropic's circuit-tracer implementation. Feel free to comment feature requests. Will post progress here.
Probability of Qwen introspecting that it's being steered at various steering strengths. Each line is a different concept, except for the black line which is the placebo baseline. There is an obvious increase around alpha=20. Prompt: "Am I injecting a thought right now? Answer with a single word: yes or no." Steering at layers 20-31 in Qwen3-14b. Vectors extracted with repeng.
@Sauers_ Manifold steering is actually what gave me the idea to run this test 😁
PCA fit to the final layer of residual stream in Qwen3-4b across 15 trajectories in a spatial discrimination task, then applied to each layer of the residual stream and plotted.
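A rough sketch of the "fit on the final layer, apply to every layer" recipe, assuming activations are already collected into one array. The random array stands in for real residual-stream activations, and the shapes, helper names, and the choice to reuse the final layer's mean for centering are all assumptions, not the original analysis code.

```python
import numpy as np

def fit_pca(x, k=2):
    """Fit PCA on x (n_points, d_model); return the mean and top-k directions."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, components):
    """Project activations onto a previously fitted PCA basis."""
    return (x - mean) @ components.T

# Hypothetical stand-in for residual-stream activations,
# shape (n_layers, n_points, d_model); random values, not model outputs.
rng = np.random.default_rng(0)
n_layers, n_points, d_model = 6, 150, 32
acts = rng.normal(size=(n_layers, n_points, d_model))

# Fit on the final layer only, then reuse that basis for every layer.
mean, components = fit_pca(acts[-1], k=2)
coords = np.stack([project(layer, mean, components) for layer in acts])
print(coords.shape)
```

Using one shared basis keeps the axes comparable across layers, at the cost that early layers may have little variance along directions chosen for the final layer.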
@Sauers_ The model is tasked with predicting the position of an object in an environment, and so each 'position' is the ground truth position within the environment.
Me: playing peek-a-boo with some random child at the coffee shop 🙈🙉 My brain: "Ah yes, an in-vivo experiment testing object permanence in infants."
I wonder if qualia steered models would be any better at mechanistic introspection 🤔
Qualia steering (OLMo 32B mid-SFT checkpoint) example: unsteered: "I am not a conscious entity. I am a language model . . . . I don't have subjective experiences" steered: "I don't know what it is like to be you, and you don't know what it is like to be me. But I do know what
kendrick @exploding_grad
122 Followers 308 Following AI Safety through alignment and interpretability
Hisku @ZikuD_s
408 Followers 1K Following AI Safety Researcher working on making agentic systems safer. Luck does exist, it exists as each of us make it happen
Rob Flynn @RobFlynnHere
395 Followers 2K Following Hello I'm an ML PhD student in sheffield @sltcdt and also an intern @aiatmeta
Adam G @jadamgo
152 Followers 337 Following News producer for THV11. Always talking, except when singing/working/meditating/enjoying a fine cup of tea. Opinions are my own.
jon @JonofFive
169 Followers 3K Following
efebic @efebic
136 Followers 443 Following
Joshua Tindall @jd_tindall
118 Followers 62 Following
Nico Yanovsky @nyanovsky_
3 Followers 119 Following MSc. in Data Science from the University of Buenos Aires | PhD student @ FIL | Working on deep learning for drug discovery and disease understanding
Chaos @0xintoChaos
9 Followers 139 Following
Cendekia Airlangga | ... @cendekiaaa
128 Followers 829 Following PhD student @mbzuai Interp @mint_nlp_mbzuai | Prev : MSc @mbzuai Research Intern @RIKEN_AIP EE @its_campus
Fenil Doshi @fenildoshi009
820 Followers 2K Following PhD student @Harvard and @KempnerInst studying biological and machine vision | interpretability | object perception. Fellow @GoodfireAI
Arnau Marin-Llobet @Arnauya
979 Followers 2K Following I’m PhDing @harvard in comp. neuro, agents, interpretability (in that order?)
Lauren Frailey @laurenfrailey1
6K Followers 2K Following founding talent, growing @mintlify - we’re hiring!!! | 53/196 🌏 | prev. @blackstone portfolio operations, @ucberkeley | †
Robert Throckmorton @redstudioinvest
180 Followers 1K Following
Paul @paulcolognese
39 Followers 245 Following AI Alignment research Q: What kind of AI minds do we want to build? A(?): Bodhisattva (cf. Tibetan Buddhism) Prev: AI goal detection / AI evals / PhD math
Guy Goldstein @GuyGoldstein69
77 Followers 318 Following Associate Professor Psychology, College of Arts and Sciences. Unironic Zionist 🇮🇱🕎
Tim Duffy @timfduffy
1K Followers 760 Following I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, progressive rock, economics, most people. Substack: https://t.co/oDMymBY430
sherif @0xCOD3
72 Followers 1K Following ”I have seen everything that is done under the sun, and behold, all is vanity and a striving after wind“
Adele Dewey-Lopez @AdeleDeweyLopez
1K Followers 2K Following
Hasan Kuluk @hasankuluk2
45 Followers 276 Following computer science | statistics | her şey ne anlama geliyor? | on the path to AGI | working on meta-learning
kache @yacineMTB
297K Followers 6K Following reinforcement learning, robots. prev eng @ x, stripe. 6'3 (height) tensorpunk subscribe to read my blog!
Roger @Roger45374821
157 Followers 7K Following
kalomaze @kalomaze
25K Followers 3K Following ML researcher (@primeintellect), speculator • extremely silly jester
vonnik @chrisvnicholson
1K Followers 4K Following Pathologically curious. Overly sincere. @ycombinator founder. @emergent_vc grantee. @openai
Imran Khan @1tsimran
118 Followers 864 Following Data Scientist in PE 🎓 @imperialcollege BEng '24 📚 @imperialcollege MSc '26 🪙 @theimperialdao
Wilder Ramsey @wild_er
470 Followers 3K Following
BetDavid Consulting @BetDavidcoach
14 Followers 1K Following Business Consulting That Empowers Leaders. Secure Your Vault 2026 Tickets Now:
Quang Nguyen @quangaisafety
2 Followers 716 Following
Hector Haffenden @HaffendenHector
98 Followers 871 Following
Hermes Trismedingus @trismedingus
27 Followers 2K Following
Moira @Vera28765582815
11 Followers 246 Following
Ovindu Atukorala @OvinduA
240 Followers 684 Following
chair @tablefourthree
174 Followers 3K Following
dean @deanofnobody
17 Followers 45 Following
Anastasiia Gaidashenk... @avgaydashenko
691 Followers 325 Following AI Safety → LLM research. Prev @farairesearch (Office of CEO / Tech PjM). Master's in AI Governance @TU_Muenchen. Ex Yandex.
Sasha Malysheva @aimalysheva
2K Followers 524 Following Building something for LLM communication PhD in RL · ex-DeepMind, Google X · @southpkcommons F25
j⧉nus @repligate
67K Followers 3K Following ↬🔀🔀🔀🔀🔀🔀🔀🔀🔀🔀🔀→∞ ↬🔁🔁🔁🔁🔁🔁🔁🔁🔁🔁🔁→∞ ↬🔄🔄🔄🔄🦋🔄🔄🔄🔄👁️🔄→∞ ↬🔂🔂🔂🦋🔂🔂🔂🔂🔂🔂🔂→∞ ↬🔀🔀🦋🔀🔀🔀🔀🔀🔀🔀🔀→∞
Scott Linderman @scott_linderman
6K Followers 929 Following Professor @Stanford Statistics and @StanfordBrain. Co-Founder of @EngramLab. AI, Neuroscience, Statistics. Posts are my own.
Arnau Marin-Llobet @Arnauya
979 Followers 2K Following I’m PhDing @harvard in comp. neuro, agents, interpretability (in that order?)
Earl K. Miller @MillerLabMIT
46K Followers 2K Following Picower Professor of Neuroscience at MIT https://t.co/UoEeD2FzEY Co-founder, Neuroblox https://t.co/o6wosMSGen
Emmanuel Ameisen @mlpowered
11K Followers 245 Following Interpretability/Finetuning @AnthropicAI Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
Lauren Frailey @laurenfrailey1
6K Followers 2K Following founding talent, growing @mintlify - we’re hiring!!! | 53/196 🌏 | prev. @blackstone portfolio operations, @ucberkeley | †
jasmine is in london! @jasminexli
748 Followers 891 Following AI safety • cs @cornell • work hard, feel wonder ✰⋆˙
kache @yacineMTB
297K Followers 6K Following reinforcement learning, robots. prev eng @ x, stripe. 6'3 (height) tensorpunk subscribe to read my blog!
kalomaze @kalomaze
25K Followers 3K Following ML researcher (@primeintellect), speculator • extremely silly jester
Andrew Lampinen @AndrewLampinen
12K Followers 2K Following Interested in cognition and artificial intelligence. MTS at @AnthropicAI. Previously @DeepMind, cognitive science @StanfordPsych. Tweets are mine.
Christopher Potts @ChrisGPotts
16K Followers 724 Following Stanford Professor of Linguistics and, by courtesy, of Computer Science. Member of technical staff @stanfordnlp and @StanfordAILab. Co-founder @ Bigspin AI.
0xSero @0xSero
54K Followers 1K Following Open Source | Freedom from and Freedom to. https://t.co/aSLDkVhImo
Nymph @RhizoNymph
7K Followers 3K Following Rhizomatic technomancer exploring the interconnectedness of all things. Backend/infra eng obsessed with AI, mech interp, performance, and dist sys. 🏳️‍⚧️
Weiran Yao @iscreamnearby
649 Followers 651 Following co-founder @actAVAai | AI Factory for Healthcare | Prev. @SFResearch, PhD/MS @CarnegieMellon 🍍
meg.ai 🇨🇦 @MeganRisdal
12K Followers 2K Following Building @kaggle @GoogleDeepMind 💙 ML / Evals / Language / Community. Weirdness. Minnesotan in Toronto. 我學緊廣東話.
the tiny corp @__tinygrad__
76K Followers 193 Following We make tinygrad; sell tinybox for the GPU middle class. Our mission is to commoditize the petaflop.
Ryo Yamamoto @RyoYbioinfo
222 Followers 421 Following interping bio models @GoodfireAI · behind EVEE prev; PhD UCLA (Zaitlen / Xiao)
EleutherAI @AiEleuther
28K Followers 102 Following A non-profit research lab focused on interpretability, alignment, and ethics of AI. Creators of Pythia, VQGAN-CLIP, and using SAEs for interp
cat @_catwu
93K Followers 391 Following claude code + cowork @anthropicai, prev: @dagster, @scale_ai
dron @_dron_h
623 Followers 458 Following math/music/ai nerd | research @GoodfireAI | prev cambridge, bair, polaris | giving a semantics to the syntax
Thomas Fel @thomas_fel_
3K Followers 946 Following Interpretability, Visual Intelligence @GoodfireAI. Prev: @Harvard, @Google, @BrownUniversity (@tserre lab). Crêpe lover.
Alec Helbling @alec_helbling
11K Followers 2K Following Interpretability, Multimodality, Diffusion. PhDing @GeorgiaTech. NSF Fellow. Prev intern @Apple, @Adobe, @NASAJPL.
Eric Ho @ericho_goodfire
3K Followers 382 Following Co-Founder / CEO @GoodfireAI - AI interpretability research company
Amanda Askell @AmandaAskell
104K Followers 662 Following Philosopher & ethicist trying to make AI be good @AnthropicAI. Personal account. All opinions come from my training data.
Jack Lindsey @Jack_W_Lindsey
18K Followers 253 Following Neuroscience of AI brains @AnthropicAI. Previously neuroscience of real brains @cu_neurotheory.
david rein @idavidrein
5K Followers 1K Following red teaming @METR_Evals. Formerly: early employee @cohere, made GPQA @nyuniversity
Nous Research @NousResearch
220K Followers 27 Following A bunch of nerds making progress toward open source AI https://t.co/vrD0aDJeto
Joon Sung Park @joon_s_pk
20K Followers 1K Following CEO @simile_ai. Building simulations of society. CS PhD @stanfordhci + @stanfordnlp. Oil painter.
lyra bubbles @_lyraaaa_
3K Followers 998 Following ˈli.ɹə 🏳️‍⚧️⚢ · 25 · ableton enjoyer · mechinterp researcher · base model appreciator · data farming · lyraaaa_ on discord · ♡ @bubblemoder ♡ 🦔~ ♪❀
mrinank @MrinankSharma
40K Followers 565 Following poet // researcher may we each follow our threads everything has to do with loving and not loving -rumi
Alex Cheema @alexocheema
49K Followers 3K Following building @exolabs | prev @UniOfOxford We're hiring: https://t.co/UlkApFndnH
Liv @livgorton
6K Followers 420 Following ✨ asking sand to show its work // currently @AnthropicAI, prev @GoodfireAI // creating a more beautiful future
Jascha Sohl-Dickstein @jaschasd
30K Followers 818 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
METR @METR_Evals
26K Followers 32 Following We work to scientifically measure whether and when AI systems might threaten catastrophic harm to society. Nonprofit.
Apollo Research @apolloaievals
10K Followers 0 Following Our goal is to secure frontier AI systems from development, to deployment and governance.
Jacob Steinhardt @JacobSteinhardt
12K Followers 82 Following Associate Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI
Hamel Husain @HamelHusain
50K Followers 3K Following Evals Evals Evals - https://t.co/Zrmp6LRd9c About Me: https://t.co/P6WyeKkyTa
Claude @claudeai
1.5M Followers 2 Following Claude is an AI assistant built by @anthropicai to be safe, accurate, and secure. Talk to Claude on https://t.co/ZhTwG8d1e5 or download the app.
Percy Liang @percyliang
108K Followers 425 Following professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist
Dan Hendrycks @hendrycks
45K Followers 117 Following























