Shaun Smith @evalstate

https://t.co/rA1UoojwhN https://t.co/76p6mDAfej huggingface.co/evalstate united kingdom Joined July 2024

Tweets: 2K
Followers: 1K
Following: 834
Likes: 9K

Shaun Smith @evalstate

2 hours ago

@paw_lean @Andy_AJT Imagine that you didn't have to imagine what it would be like if I was there.

1 0 0 46 0


@AstroAdamH @SylwiaVargas @arizeai @rachelnabors @photo_png Usually some variant of this (from an old deck) - migrated recently to Slidev (used to use MARP), then fast-agent with a tuned VLM loop, and data gathering agents where needed.

0 0 2 30 0


Shaun Smith @evalstate

a day ago

Sharing slides from my GEPA talk last night at @arizeai builders meetup. Dominant theme of owning your models and compute. Brilliant insights shared from @AstroAdamH and @rachelnabors on this: Open is more important than ever. Also love meeting people building on HF @photo_png

6 3 18 903 12


Shaun Smith @evalstate

18 hours ago

@pamelafox @fragermk I'm planning to write up the labelling/routing and data build work I've been doing recently (everything is sampling and labelling 😩); would appreciate your review of it as I go, @pamelafox?

1 0 1 29 0


Shaun Smith @evalstate

19 hours ago

@pamelafox 😊. That's a really nice example - they can be tricky to find!

0 0 0 19 0


Shaun Smith @evalstate

21 hours ago

@kentcdodds @threepointone @mattpocockuk Yes: this is trivial to demonstrate and a mechanism to control test-time compute. Something must be lost in translation here.

0 0 1 41 0


Shaun Smith @evalstate

22 hours ago

@SylwiaVargas @arizeai @AstroAdamH @rachelnabors @photo_png Thank you @SylwiaVargas!

1 0 2 39 0


Shaun Smith @evalstate

a day ago

@arizeai @AstroAdamH @rachelnabors @photo_png Link to the slides (and other recent presentations) here: …alstate-presentations.static.hf.space/index.html

0 0 3 104 2


Shaun Smith @evalstate

a day ago

@aryaman2020 Relatable. I love the smell of burnt fingers and wasted token spend in the morning.

0 0 0 80 0


Shaun Smith @evalstate

a day ago

@ClementDelangue Epic collab 🤝❤️.

0 0 0 108 0


Shaun Smith @evalstate

a day ago

@ben_burtenshaw Blink once if you are OK

0 0 0 60 0


Shaun Smith @evalstate

a day ago

@victormustar Next step is it's all driven from LinkedIn comments sections.

1 0 4 106 0


Shaun Smith @evalstate

2 days ago

@marlene_zw I'm taking the long way to stay overground today!

1 0 1 41 0


Shaun Smith @evalstate

2 days ago

It's hot, so we'll put more people on fewer trains 👍☀️

1 0 5 877 0


Onur Solmaz @onusoz

2 days ago

New blog post: Using local models for agentic zero-shot classification, in real-time, high frequency triage

If you have 128 GB of memory for models (a DGX Spark like I do, for example), you can create a real-time classifier and notifier for yourself that can classify more than 20 items per minute, using mid-sized @googlegemma and @Alibaba_Qwen models, with over 200-300 output tok/s aggregate throughput.

Like processing new tweets on Twitter, issues/PRs on GitHub, messages on Telegram and Discord, in real time.

Over the past few weeks, I have built one for myself, to filter and get notified about local model related issues on the OpenClaw repo.

I initially thought gemma-4-e4b would give me the best tradeoff. I was wrong. I learned that if one has enough memory already, one should not bother with <10b models like gemma4 e4b or e2b. Precision and recall were much higher zero-shot with gemma-4-26b-a4b, whereas the smaller e4b needed significant prompt optimization and even then did not perform nearly as well.

To provide more context to the model, I created a restricted bash-like shell, called reposhell. In that shell, it can run read-only commands to ls/find/grep/cat openclaw source code, but only that. When the PR description/diffs are not clear enough to categorize it, the agent reads the code to figure it out.

This is because small models can get prompt injected, and I need to make sure that someone can't harm my setup by creating a malicious issue or PR in the openclaw repo.

I found that for specific systems like this, it is very convenient to extend and bundle Pi. You can create agentic CLI tools that work fully locally and for free, and keep that separate from your main pi coding setup. localpager-agent has its own session dir and tools, and I ensure that it will run local models in a secure way by isolating it from my main pi setup.

Once localpager-agent categorizes a PR/issue as local_models and related labels, I automatically receive it as a notification on Discord.

The whole implementation is fully open source and MIT licensed, alongside the dataset we used to benchmark the performance.

I believe zero-shot agentic classification running on local hardware will find many use cases across a wide variety of business applications, like news gathering, open source software development, customer support, content moderation, sales and so on.

Agents increase the amount of information produced in a lot of systems, and hence we will need to set up cheap ways to wrangle all that information.

In times where governments can cut off access to SOTA models on a whim, it is more important than ever to build your business on open models and, if possible, run them on your own hardware!

Big thanks to @evalstate and @ben_burtenshaw for their valuable feedback, especially with helping me evaluate this more rigorously! One take-away is that categorizing contributions in an open source repo is a *hard* problem, and that it is not trivial to reliably create a golden dataset with LLMs for evaluation purposes.

Read more here: huggingface.co/blog/local-mod…

1 2 15 975 16
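
Editor's sketch: the post above describes a restricted, read-only shell limited to ls/find/grep/cat over a checked-out repository. The snippet below is a minimal illustration of that general idea in Python, not the actual reposhell implementation. The names `validate`, `run_readonly`, `ALLOWED_COMMANDS` and `BLOCKED_FIND_FLAGS` are hypothetical, and an allowlist like this is not a complete sandbox on its own.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical allowlist mirroring the commands named in the post.
ALLOWED_COMMANDS = {"ls", "find", "grep", "cat"}

# find(1) actions that can execute programs, delete, or write files.
BLOCKED_FIND_FLAGS = {
    "-exec", "-execdir", "-ok", "-okdir", "-delete",
    "-fprint", "-fprint0", "-fprintf", "-fls",
}


def validate(command: str, root: Path) -> list[str]:
    """Parse a command line and reject anything outside the allowlist.

    Assumption: every non-flag argument is treated as a path under `root`.
    This is deliberately conservative, so a grep pattern such as "../x"
    is rejected as well.
    """
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {argv[0]}")
    if argv[0] == "find" and any(a in BLOCKED_FIND_FLAGS for a in argv[1:]):
        raise ValueError("find action not allowed")

    root = root.resolve()
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # flags are not path-checked in this sketch
        resolved = (root / arg).resolve()
        if not resolved.is_relative_to(root):  # Python 3.9+
            raise ValueError(f"path escapes repository root: {arg}")
    return argv


def run_readonly(command: str, root: Path, timeout: float = 5.0) -> str:
    """Run a validated command inside `root` and return its stdout."""
    argv = validate(command, root)
    # No shell=True: ';', '|' and '>' reach the program as literal
    # arguments instead of being interpreted by a shell.
    result = subprocess.run(
        argv, cwd=root, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout
```

Passing an argument list to `subprocess.run` without a shell is what keeps injected text such as `cat README.md; rm -rf .` from chaining commands: the extra tokens become arguments to `cat`. A real deployment would likely pair this with OS-level isolation, since flags and symlinks can still widen what a "read-only" command can reach.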


Shaun Smith @evalstate

2 days ago

@reach_vb Frequent web socket drops, SSE fallback not working due to "Our servers are currently overloaded. Please try again later." -- been like this for the last couple of hours or so.