Dylan Sam @dylanjsam

safety research @openai | formerly phd @mldcmu, @BrownUniversity | dsam99.github.io | San Francisco | Joined October 2017

Tweets: 209 | Followers: 1K | Following: 521 | Likes: 3K

Bryan Wilder @brwilder

3 weeks ago

Thanks for having me! I talked about our work on valid inference with synthetic data (arxiv.org/abs/2508.06635) and robust human-AI complementarity (ICML 2026, paper up soon), both with my PhD student @yewonbyun_

Despoina Paschalidou @paschalidoud_1

3 weeks ago

Exploring the Next Generation of Data Workshop @CVPR is happening now at Room 603. @brwilder is talking about the science with synthetic data.

1 1 12 2K 0

1 2 6 546 0

Tomek Korbak @tomekkorbak

3 months ago

New OpenAI post: Can midtraining on docs about aligned AI bake in alignment priors for agents? We report an experiment where those priors are quickly washed away by RL and fail to generalize to agentic settings. But that cuts both ways: priors that AIs are misaligned fade too!

7 39 220 36K 109

Jason Wolfe @w01fe

3 months ago

I'm also extremely excited for our companion post today on Model Spec Evals! Spec Evals are a new way we're measuring progress towards alignment with the Model Spec — including public results, an open dataset, and code others can build on. alignment.openai.com/model-spec-eva…

7 9 37 4K 10

Zico Kolter @zicokolter

3 months ago

As AI agents access more untrusted information with greater autonomy, prompt injections may become the greatest security challenge of our era. @GraySwanAI, in collaboration with many frontier labs, just released our paper on the largest public prompt injection challenge to date. 🧵

Gray Swan AI @GraySwanAI

3 months ago

Your AI agent can be hijacked by a prompt injection and you'd never know! The attack executes. The response looks normal. And the user moves on. We ran the largest public competition testing this exact threat across tool use, coding, and computer use agents. 464 participants,

6 17 54 17K 39

6 8 65 13K 33

Karan Singhal @thekaransinghal

3 months ago

x.com/i/article/2032…

34 58 303 65K 391

Marc Finzi @m_finzi

6 months ago

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

56 392 2K 1.1M 2K

Dylan Sam @dylanjsam

7 months ago

Finally, I'm presenting work on monitoring models for harmful behaviors, hallucinations, and adversarial manipulation at Poster #1304 in Exhibit Hall C,D,E on 12/5 at 4:30pm! x.com/dylanjsam/stat…

Dylan Sam @dylanjsam

a year ago

To trust LLMs in deployment (e.g., agentic frameworks or for generating synthetic data), we should predict how well they will perform. Our paper shows that we can do this by simply asking black-box models multiple follow-up questions! w/ @m_finzi and @zicokolter 1/ 🧵

4 40 116 15K 82

0 0 2 434 0

Dylan Sam @dylanjsam

7 months ago

Next, I'm presenting on safety pretraining, where we find that incorporating safety behaviors during pretraining leads to more robust language models! Come by Poster #5210 at Exhibit Hall C,D,E at 4:30pm today (12/4)! x.com/dylanjsam/stat…

Dylan Sam @dylanjsam

9 months ago

🚨Excited to introduce a major development in building safer language models: Safety Pretraining! Instead of post-hoc alignment, we take a step back and embed safety directly into pretraining. 🧵(1/n)

7 90 360 64K 239

1 0 3 605 0

Dylan Sam @dylanjsam

7 months ago

I'm at NeurIPS this week! Excited to meet old/new friends and chat with people about training safer language models. I'm presenting a few works on safety pretraining, measuring diversity in data curation, and monitoring model behaviors --- more info below 👇

4 4 37 4K 7

Emily Byun @yewonbyun_

7 months ago

I’m at NeurIPS this week (12/2-12/8) to present our work on when/how synthetic data (e.g., LLM simulations) can help scientists make inferences with less real data, improving the efficiency of costly experiments. Come by Poster #904 on Thursday 4:30PM (Exhibit Hall C,D,E)!🙂

Emily Byun @yewonbyun_

9 months ago

💡Can we trust synthetic data for statistical inference? We show that synthetic data (e.g. LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moments of synthetic data and those of real data

2 36 144 32K 84

2 4 31 13K 9

Pratyush Maini @pratyushmaini

7 months ago

Excited about our NeurIPS'25 tutorial: Data Privacy, Memorization & Copyright in GenAI, with Cooper (co-founder, GenLaw) & Joe (represents OpenAI, Stability in all US copyright litigations). We bring together ML researchers with those who understand its legal implications. Pls RT

3 22 82 13K 8

Bryan Wilder @brwilder

7 months ago

I gave talks at MIT and Harvard this week about "Science with synthetic data". How can generative models help us learn about the world (e.g., social systems) in a principled way? Lots of interesting conversations; more convinced than ever that there are nuanced issues to navigate

1 2 9 626 4

Sachin Goyal @goyalsachin007

8 months ago

📢 Multi-token prediction has long struggled with defining the right “auxiliary target,” leading to tons of heuristics. We show a core limitation of these and propose a simple & sweet idea: future summary prediction. Introducing what I call 🚀TL;DR token pretraining🚀

Divyat Mahajan @divyat09

8 months ago

[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning. 📌Predict a learned

11 45 222 61K 131

4 36 241 29K 159

Yuda Song @yus167

8 months ago

🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)

2 40 142 31K 66

Bryan Wilder @brwilder

9 months ago

How can synthetic data from LLMs be used, e.g. for social science, in a principled way? Check out Emily's thread on our NeurIPS paper. The key is to generate each synthetic sample by prompting with a real example -- enables debiased estimates that wouldn't be possible otherwise!

Emily Byun @yewonbyun_

9 months ago

2 36 144 32K 84

1 2 10 1K 2

Emily Byun @yewonbyun_

9 months ago

14/ I’ll be giving a talk on our work at the #COLM2025 Social Simulations workshop tomorrow (Friday 10/10) at 10AM. Come by Room 523AB!🙂 Paper Link: arxiv.org/abs/2508.06635 Code: github.com/lasilab/valid-…