Ryan Peters @ryanpirl

Reverse engineering intelligent (learning) systems.
ryanirl.com
Minneapolis, MN
Joined February 2019

Tweets

71
Followers

177
Following

94
Likes

171

Ryan Peters @ryanpirl

14 hours ago

"What I can now create, I do not necessarily understand." — probably not Feynman

0 0 1 42 0

Ryan Peters @ryanpirl

a day ago

@scott_linderman Congrats!

0 0 1 225 0

Scott Linderman @scott_linderman

2 days ago

I'm excited to share what we're building at Engram! This team is incredible, and we're working on one of the most interesting problems in AI right now: how to build models that are tailored to each person and continually learn from experience. Come join us!

Engram @EngramLab

2 days ago

x.com/i/article/2069…

148 191 2K 1.5M 1K

12 17 161 35K 45

Ryan Peters @ryanpirl

2 days ago

Looks like a Pringles chip 😂 Just for fun: The trajectory (black line) through a food-manifold of Qwen3-4b saying "Soup is a warm, liquid dish made from cooked ingredients. Pringles chips are a crispy, salty snack in a single-serving can." Surprisingly (or not) choppy trajectories through this space.

Goodfire @GoodfireAI

2 days ago

Read the full post: goodfire.ai/research/stori…

2 3 58 4K 26

1 0 15 2K 4

Ryan Peters @ryanpirl

6 days ago

Just registered. If anyone going wants to meet up to talk, don't hesitate to reach out! Currently working on introspection and auto-interp.

Gabriel Franco @gvsfranco

2 weeks ago

🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇

2 30 119 23K 44

0 0 1 123 0

Ryan Peters @ryanpirl

a week ago

Interesting work, but I feel like the title of this post is very misleading. The "one-layer" induction head you study is a two-layer model in which a single attention head is weight-shared across both layers (please correct me if I am wrong). You cannot have an induction head in this canonical model without applying attention twice in sequence: once to apply the prev-token head, so that the residual stream can effectively represent a skip-bigram lookup table, and once more to look up and retrieve this information.

1 0 2 107 0
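
The two sequential steps named in the reply above can be illustrated outside of any real transformer. The following is a toy sketch under stated assumptions: `induction_predict` is a hypothetical helper, tokens are plain strings, and exact equality stands in for attention scores. It only shows why the lookup needs the previous-token information to be written first.

```python
def induction_predict(tokens):
    """Toy induction lookup: return the token that followed the most
    recent earlier occurrence of the current (last) token."""
    if not tokens:
        return None

    # Step 1 (previous-token head): every position records the token before it.
    prev_token = [None] + tokens[:-1]

    # Step 2 (induction head): from the last position, look for a position
    # whose recorded previous token equals the current token, then copy
    # the token stored at that position.
    current = tokens[-1]
    for pos in range(len(tokens) - 1, 0, -1):
        if prev_token[pos] == current:
            return tokens[pos]
    return None  # the current token has not appeared before


print(induction_predict(["A", "B", "C", "A"]))
```

Step 2 has nothing to match against unless step 1 has already run, which is the sequential dependency the reply points at.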

Ryan Peters @ryanpirl

2 weeks ago

Aspen, Colorado. Working on a couple of interp-related blog posts while I'm here.

0 0 1 106 0

Ryan Peters @ryanpirl

2 weeks ago

@realmeatyhuman @Sauers_ 😅

1 0 1 47 0

Ryan Peters @ryanpirl

2 weeks ago

Fable found another 2x on top of this. Now 6-8x faster than the public circuit-tracer implementation. This additional 2x comes from exploiting GQA weight sharing (Qwen shares V-weights across query heads) and pre-transposing weights into GEMM-friendly layouts.

0 0 1 74 0
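
A rough NumPy sketch of the two ideas mentioned above: projecting values once per shared KV head instead of once per query head, and storing the weights pre-transposed. All shapes and variable names here are illustrative assumptions, not Qwen's real configuration and not the code behind these benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, head_dim = 5, 16, 4
n_q_heads, n_kv_heads = 8, 2          # toy sizes, assumed for illustration
group = n_q_heads // n_kv_heads       # query heads per shared KV head

x = rng.standard_normal((seq_len, d_model))
# One value projection per KV head: (n_kv_heads, head_dim, d_model)
w_v = rng.standard_normal((n_kv_heads, head_dim, d_model))

# Naive: repeat the V weights so every query head has its own copy.
w_v_repeated = np.repeat(w_v, group, axis=0)          # (n_q_heads, head_dim, d_model)
v_naive = np.einsum("sd,hkd->hsk", x, w_v_repeated)   # (n_q_heads, seq_len, head_dim)

# Shared: project once per KV head, with weights stored pre-transposed
# and contiguous as (d_model, head_dim) so each product is a plain matmul.
w_v_t = np.ascontiguousarray(w_v.transpose(0, 2, 1))  # (n_kv_heads, d_model, head_dim)
v_shared = x @ w_v_t                                   # (n_kv_heads, seq_len, head_dim)

# Query head h reads the values of KV head h // group.
kv_index = np.arange(n_q_heads) // group
v_expanded = v_shared[kv_index]

print(np.allclose(v_naive, v_expanded))
```

In this toy setup the shared path does `n_kv_heads` projections rather than `n_q_heads`, while producing the same values for every query head.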

Ryan Peters @ryanpirl

4 weeks ago

Some early benchmarks on the attribution step:
- Consistently 3.4x faster than circuit-tracer
- Much more memory efficient (~6 GB less at 70,000 nodes)

So far, these gains are from dropping the autodiff backend and exploiting an autoregressive causality trick (performing backward only through previous token positions). All results still 1:1 numerically matching Anthropic's implementation (up to bf16 precision). Further speedups will likely come from approximations (edge pruning, sparse intermediates, etc.) that diverge slightly from circuit-tracer. Benchmarking done on Qwen3-4B.

1 0 1 213 0
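
One way to read the causality trick: in a causally masked model, the output at position `t` does not depend on any later position, so nothing is lost by restricting work to positions `0..t`. The sketch below checks that property on a single toy attention head in NumPy. It illustrates the property only and is an assumption-laden stand-in, not the attribution code being benchmarked.

```python
import numpy as np

def causal_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over x of shape (seq_len, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True = later position
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d, t = 8, 4, 3
x = rng.standard_normal((seq_len, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

full = causal_attention(x, w_q, w_k, w_v)[t]             # computed on the whole sequence
prefix = causal_attention(x[: t + 1], w_q, w_k, w_v)[t]  # computed on positions 0..t only

# Changing later positions leaves position t untouched, i.e. the gradient
# of the output at t with respect to those inputs is zero.
x_perturbed = x.copy()
x_perturbed[t + 1 :] += 1.0
perturbed = causal_attention(x_perturbed, w_q, w_k, w_v)[t]

print(np.allclose(full, prefix), np.allclose(full, perturbed))
```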

Ryan Peters @ryanpirl

4 weeks ago

Spending some time this week speeding up and scaling Anthropic's circuit-tracer implementation. Feel free to comment feature requests. Will post progress here.

1 0 3 289 1

Ryan Peters @ryanpirl

2 weeks ago

@Sauers_ The concept: 'trumpets'

0 0 3 42 0

Ryan Peters @ryanpirl

2 weeks ago

Probability of Qwen introspecting that it's being steered at various steering strengths. Each line is a different concept, except for the black line which is the placebo baseline. There is an obvious increase around alpha=20. Prompt: "Am I injecting a thought right now? Answer with a single word: yes or no." Steering at layers 20-31 in Qwen3-14b. Vectors extracted with repeng.

1 0 13 836 0
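
For readers unfamiliar with the setup, here is a minimal sketch of the two quantities involved, written in plain Python. Both functions are hypothetical and rest on assumptions the post does not confirm: that steering is additive (`hidden + alpha * vector`), and that the plotted probability is the "yes" probability renormalized over only the "yes" and "no" tokens. Neither is taken from repeng's API.

```python
import math

def steer(hidden, vector, alpha):
    """Add a scaled steering vector to a hidden state (assumed additive steering)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

def p_yes(logit_yes, logit_no):
    """Probability of 'yes' after renormalizing over the two answer tokens."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

print(steer([1.0, 2.0], [0.5, -1.0], alpha=2.0))
print(p_yes(0.0, 0.0))
```

Sweeping `alpha` and recording `p_yes` for each concept vector, plus a placebo vector, would give curves of the kind described above.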

Ryan Peters @ryanpirl

2 weeks ago

From my experience, any kind of neuroscience work is getting flagged :(

0 0 1 101 1

Ryan Peters @ryanpirl

2 weeks ago

@Sauers_ Manifold steering is actually what gave me the idea to run this test 😁

1 0 2 19 0

Ryan Peters @ryanpirl

2 weeks ago

PCA fit to the final layer of the residual stream in Qwen3-4b across 15 trajectories in a spatial discrimination task, then applied to the residual stream at each layer and plotted.

1 2 16 1K 6
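
The procedure described above (fit on one layer, project every layer into that basis) can be sketched with NumPy on random stand-in data. The array sizes, the choice of two components, and the reuse of the final-layer mean for centering every layer are assumptions made for illustration; the post does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_points, d_model = 6, 200, 32   # toy sizes, not Qwen3-4b's
# Stand-in for residual-stream activations: (layer, trajectory point, feature)
acts = rng.standard_normal((n_layers, n_points, d_model))

# Fit PCA on the final layer only.
final = acts[-1]
mean = final.mean(axis=0)
_, _, vt = np.linalg.svd(final - mean, full_matrices=False)
components = vt[:2]                        # top-2 principal directions, (2, d_model)

# Apply the same centering and basis to every layer.
projected = (acts - mean) @ components.T   # (n_layers, n_points, 2)

print(projected.shape)
```

Because the basis is fixed by the final layer, earlier layers are shown in coordinates that were chosen for the last one, which is worth keeping in mind when reading such plots.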

Ryan Peters @ryanpirl

2 weeks ago

@Sauers_ The model is tasked with predicting the position of an object in an environment, and so each 'position' is the ground truth position within the environment.

1 0 4 45 0

Ryan Peters @ryanpirl

2 weeks ago

Me: playing peek-a-boo with some random child at the coffee shop 🙈🙉 My brain: "Ah yes, an in-vivo experiment testing object permanence in infants."

0 0 1 113 0

Ryan Peters @ryanpirl

2 weeks ago

Qwen is just a humble bread 🥖🍞

1 4 28 1K 3

Ryan Peters @ryanpirl

2 weeks ago

I wonder if qualia steered models would be any better at mechanistic introspection 🤔

Sauers @Sauers_

2 weeks ago

Qualia steering (OLMo 32B mid-SFT checkpoint) example: unsteered: "I am not a conscious entity. I am a language model . . . . I don't have subjective experiences" steered: "I don't know what it is like to be you, and you don't know what it is like to be me. But I do know what