@AstroAdamH @SylwiaVargas @arizeai @rachelnabors @photo_png Usually some variant of this (from an old deck) - migrated recently to Slidev (used to use MARP), then fast-agent with a tuned VLM loop, and data gathering agents where needed.
Sharing slides from my GEPA talk last night at @arizeai builders meetup. Dominant theme of owning your models and compute. Brilliant insights shared from @AstroAdamH and @rachelnabors on this: Open is more important than ever. Also love meeting people building on HF @photo_png
@pamelafox @fragermk I'm planning to write up the labelling/routing and data build work I've been doing recently (everything is sampling and labelling 😩); would appreciate your review on it as I go, @pamelafox?
@kentcdodds @threepointone @mattpocockuk Yes: this is trivial to demonstrate and a mechanism to control test-time compute. Something must be lost in translation here.
New blog post: Using local models for agentic zero-shot classification, in real-time, high frequency triage
If you have 128 GB of memory for models (a DGX Spark, like I do, for example), you can create a real-time classifier and notifier for yourself that can classify more than 20 items per minute, using mid-sized @googlegemma and @Alibaba_Qwen models, with 200-300 output tok/s aggregate throughput
For example: processing new tweets on Twitter, issues/PRs on GitHub, and messages on Telegram and Discord, in real time
Over the past few weeks, I have built one for myself, to filter and get notified about local model related issues on the OpenClaw repo
I initially thought gemma-4-e4b would give me the best tradeoff
I was wrong. I learned that if one has enough memory already, one should not bother with <10b models like gemma4 e4b or e2b. Precision and recall were much higher zero-shot with gemma-4-26b-a4b, whereas the smaller e4b needed significant prompt optimization and still did not perform nearly as well
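For readers who want to run this kind of comparison themselves, here is a minimal sketch of per-label precision and recall for a multi-label classifier. The function name and the data layout (one set of labels per item) are assumptions for illustration, not taken from the benchmark code.

```python
def precision_recall(
    gold: list[set[str]], predicted: list[set[str]], label: str
) -> tuple[float, float]:
    """Precision and recall for one label over paired gold/predicted label sets."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if label in g and label in p)      # correctly flagged
    fp = sum(1 for g, p in pairs if label not in g and label in p)  # flagged by mistake
    fn = sum(1 for g, p in pairs if label in g and label not in p)  # missed

    # Define the metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Running it once per model on the same gold set gives directly comparable numbers for each label.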
To provide more context to the model, I created a restricted bash-like shell, called reposhell. In that shell, it can run read-only commands to ls/find/grep/cat openclaw source code, but only that. When the PR description/diffs are not clear enough to categorize it, the agent reads the code to figure it out
The shell is restricted because small models can get prompt injected, and I need to make sure that someone can't harm my setup by creating a malicious issue or PR in the openclaw repo
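A minimal sketch of how a read-only command allowlist like this could work. It is not the actual reposhell implementation: the function names, the rejected-flag list, and the path check are assumptions for illustration.

```python
import shlex
import subprocess
from pathlib import Path

# Only the read-only commands named above are allowed.
ALLOWED_COMMANDS = {"ls", "find", "grep", "cat"}
# find options that execute programs or write/delete files (assumed list, not exhaustive).
FORBIDDEN_ARGS = {"-exec", "-execdir", "-ok", "-okdir", "-delete",
                  "-fprint", "-fprint0", "-fprintf", "-fls"}
SHELL_METACHARACTERS = set(";|&><`$\n")


def validate(command: str, repo_root: str) -> list[str]:
    """Return argv if the command is allowed, else raise PermissionError."""
    # Defense in depth: the command is never passed to a shell, but reject
    # metacharacters anyway so chained commands fail loudly.
    if SHELL_METACHARACTERS & set(command):
        raise PermissionError("shell metacharacters are not allowed")
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError("command is not on the allowlist")
    root = Path(repo_root).resolve()
    for arg in argv[1:]:
        if arg in FORBIDDEN_ARGS:
            raise PermissionError(f"option {arg} is not allowed")
        if arg.startswith("-"):
            continue
        # Conservative: treat every non-option argument as a possible path
        # and require it to resolve inside the repository checkout.
        if not (root / arg).resolve().is_relative_to(root):
            raise PermissionError(f"{arg} points outside the repository")
    return argv


def run(command: str, repo_root: str) -> str:
    """Run a validated command without a shell, inside the repository."""
    argv = validate(command, repo_root)
    result = subprocess.run(argv, cwd=repo_root, capture_output=True,
                            text=True, timeout=10)
    return result.stdout
```

The path check is deliberately strict, so a grep pattern containing `..` or `$` is rejected too. For an agent that reads untrusted issue text, a false rejection is cheaper than a false acceptance.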
I found that for specific systems like this, it is very convenient to extend and bundle Pi. You can create agentic CLI tools that work fully locally and for free, and keep that separate from your main pi coding setup. localpager-agent has its own session dir and tools, and I ensure that it will run local models in a secure way by isolating it from my main pi setup
Once localpager-agent categorizes a PR/issue as local_models and related labels, I automatically receive it as a notification on Discord
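The notification step can be sketched roughly as below, assuming a standard Discord webhook (a JSON body with a `content` field). The function names and the message format are hypothetical; only the `local_models` label comes from the post, and the related labels are left for you to add to the set.

```python
import json
import urllib.request

# Labels that trigger a notification; extend with your own related labels.
NOTIFY_LABELS = frozenset({"local_models"})


def should_notify(labels: set[str], notify_labels: frozenset[str] = NOTIFY_LABELS) -> bool:
    """True when the classifier assigned at least one label we care about."""
    return not notify_labels.isdisjoint(labels)


def build_payload(title: str, url: str, labels: set[str]) -> dict:
    """Build a Discord webhook body; Discord caps `content` at 2000 characters."""
    content = f"[{', '.join(sorted(labels))}] {title}\n{url}"
    return {"content": content[:2000]}


def send(webhook_url: str, payload: dict) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        # An explicit User-Agent avoids relying on urllib's default one.
        headers={"Content-Type": "application/json",
                 "User-Agent": "localpager-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping `should_notify` and `build_payload` free of network calls makes the routing logic easy to test without a real webhook.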
The whole implementation is fully open source and MIT licensed, alongside the dataset we used to benchmark the performance
I believe zero-shot agentic classification running on local hardware will find many use cases across a wide variety of business applications, like news gathering, open source software development, customer support, content moderation, sales and so on
Agents increase the amount of information produced in a lot of systems, and hence we will need to set up cheap ways to wrangle all that information
In times where governments can cut off access to SOTA models on a whim, it is more important than ever to build your business on open models and if possible, run them on your own hardware!
Big thanks to @evalstate and @ben_burtenshaw for their valuable feedback, especially with helping me evaluate this more rigorously! One take-away is that categorizing contributions in an open source repo is a *hard* problem, and that it is not trivial to reliably create a golden dataset with LLMs, for evaluation purposes
Read more here: huggingface.co/blog/local-mod…
@reach_vb Frequent WebSocket drops, and the SSE fallback is not working due to "Our servers are currently overloaded. Please try again later." It has been like this for the last couple of hours or so.