Prasenjit Sarkar @stretchcloud
Product & Growth | RAG & Knowledge Graph | Generative AI | LLMs | 15x Patents | 7x Author | Building for Growth | …functionalproductmanager.substack.com | London, England | Joined March 2011
Tweets 9K | Followers 2K | Following 951 | Likes 3K
The AI cost problem at scale is not what people think it is. Coinbase cut AI spend by nearly 50 percent while token usage grew exponentially. 91 percent of employees saw no change in access. The gains came entirely from infrastructure changes, not caps or restrictions. Three levers: better model defaults, smarter routing, and aggressive caching. Defaults switched to GLM 5.2 and Kimi 2.7 via an internal LLM gateway. Cache hit rate on one internal tool jumped from 5 to 60 percent. Engineers still choose any model they want. The default just changed. The bigger prediction: 80 percent of AI workloads will migrate to models 99 percent cheaper than today's frontier within 12 to 18 months. Only 20 percent stays on top-tier models for research-grade tasks. The pattern I keep seeing across engineering teams building for scale: model routing is becoming its own discipline. OpenRouter data shows the same two-lane market forming. Commodity open-weight inference at high volume and low cost. Premium frontier models for low-volume precision tasks. Teams routing correctly already have a 3 to 5x cost advantage over those that don't. The infrastructure layer that makes this work is a model gateway with task classification and prompt caching. What Coinbase built internally is what OpenRouter, Portkey, and Braintrust are selling as managed products. The market for that layer is real and growing. My read: enterprise AI cost optimization is a routing and caching problem, not a model problem. The organization that treats LLM spend like a CDN, with caching strategies, routing policies, and tiered defaults, will run circles around the one still paying frontier prices for every query. x.com/brian_armstron…
How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching. Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting
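A minimal sketch of the "tiered defaults, routing, caching" idea in the post above, assuming a toy gateway. The model names, prices, keyword classifier, and exact-match response cache are all hypothetical stand-ins; this is not Coinbase's gateway or any vendor's API, and real prompt caching usually happens on the provider side rather than as a response cache.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical tiers and prices (USD per 1M tokens); illustration only.
PRICES: Dict[str, float] = {"cheap-open-weight": 0.50, "frontier": 15.00}
DEFAULT_MODEL = "cheap-open-weight"


def classify(prompt: str) -> str:
    """Toy task classifier: only research-style prompts leave the default tier."""
    hard_markers = ("prove", "research", "novel")
    if any(marker in prompt.lower() for marker in hard_markers):
        return "frontier"
    return DEFAULT_MODEL


@dataclass
class Gateway:
    cache: Dict[str, str] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    spend: float = 0.0

    def complete(self, prompt: str, model: Optional[str] = None, tokens: int = 1000) -> str:
        # An explicit model choice always wins; routing only fills in the default.
        chosen = model or classify(prompt)
        key = hashlib.sha256(f"{chosen}:{prompt}".encode()).hexdigest()
        if key in self.cache:
            self.hits += 1  # cache hit: no spend
            return self.cache[key]
        self.misses += 1
        self.spend += PRICES[chosen] * tokens / 1_000_000
        answer = f"[{chosen}] response to: {prompt}"  # stand-in for a real model call
        self.cache[key] = answer
        return answer

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


gateway = Gateway()
gateway.complete("summarize this ticket")
gateway.complete("summarize this ticket")   # served from cache
gateway.complete("prove this lemma")        # routed to the frontier tier
```

The design choice worth noticing is that nothing is capped: callers can still pass `model="frontier"`, and the savings come only from what happens when they pass nothing.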
The CUDA bottleneck was never really about chips. Qualcomm just paid $3.9 billion for Modular, a 150-person company. The assets: Mojo, a programming language that targets Nvidia, AMD, Intel, Qualcomm, and Apple Silicon from a single kernel codebase, and MAX, a graph compiler that builds inference engines from it. Write once, run optimized everywhere. No Nvidia vendor libraries anywhere in the stack. The reason this matters is what CUDA actually is. Not processing power. A 20-year software moat. 4 million developers. Toolchains, libraries, courses, job postings. Prior challengers like AMD ROCm (eight years in, still below 5 percent of AI training workloads) and Intel OneAPI couldn't crack it because the unlock isn't better hardware, it's making the software porting cost disappear. Modular is structurally different. MAX benchmarks show 10 to 50 percent faster inference than vLLM on identical hardware, built entirely without CUDA. Chris Lattner did this same thing before, at LLVM, replacing a generation of proprietary compiler toolchains, and with Swift, which Apple adopted across its entire platform. Qualcomm is spending more than $14 billion here. Meta has placed CPU orders. Microsoft is in. The pattern I keep seeing across agentic infrastructure: the compute routing question is becoming real. Running a million parallel agent tasks on a locked Nvidia fleet doubles cost vs. a mixed deployment. The teams building at scale care about what Qualcomm ships from this. My read: this is a bet that the software moat matters more than the chip itself. If MAX becomes the standard inference compiler, Nvidia's real defensibility narrows to training. That's a meaningfully different company. x.com/PeterDiamandis…
Qualcomm just paid $3.9 billion for a 150-person company. The prize: a programming language that lets AI run on ANY chip without NVIDIA's CUDA. Meta and Microsoft are already placing orders. The NVIDIA software monopoly just got its first real challenger. $3.9B says this is
My read: the local vs. API split in agentic workloads will follow the pattern we saw with edge inference in 2023-2024. Curiosity first. Then cost pressure. Then a production category. The teams that build the local agent stack now will have a meaningful cost and compliance advantage when the rest of the market catches up.
The transition in Karpathy's workflow wasn't dramatic. There was no announcement. One day he was writing functions. The next he was managing autonomous systems that wrote them for him. That's the detail that stays with me. The ratio flipped from 80% writing to 80% delegating, and he says it keeps shifting further. This is the person who built Tesla's self-driving system and taught a generation of engineers neural networks at Stanford. He describes his current state as "perpetual AI psychosis": 16 hours a day not typing code, but expressing intent to agents. AutoResearch, the tool he built to show the principle, ran 700 experiments in two days. The agent edited code, tried ideas, learned from failures, and dropped the "Time to GPT-2" benchmark from 2.02 hours to 1.80 hours. No human at the keyboard. The pattern I keep seeing: the bottleneck in software development has shifted. Cognition built Devin as the first fully autonomous software engineer. GitHub Copilot Workspace handles multi-step, multi-file coding tasks from a single spec. SWE-bench scores for frontier models crossed 60% on real GitHub issues. Cursor crossed $500M ARR in 2025, mostly from engineers who were already professional coders and still chose to pay. At Sequoia Ascent 2026, Karpathy called this Software 3.0: programs built through prompts, context, agents, tools, and verification rather than typed instructions. What I keep coming back to: the skills that persist aren't the writing ones. They're spec design, diff review, eval construction, and security oversight. Judgment-intensive work. Not keystroke-intensive work. The identity question for a generation of engineers isn't whether AI can write code. It's what "software engineering" means when writing code stops being the job. x.com/heyshrutimishr…
Andrej Karpathy hasn't typed a line of code since December. Not because he retired. Not because he switched careers. Because his AI agents do it all now. The former head of Tesla Autopilot, the person who literally wrote the textbook on deep learning, says his workflow flipped
The Vercel team just migrated 7 million lines of code to TypeScript 7 in a single Claude Code session. 16 PRs, roughly 2 days, $1146 in tokens. The observation that hits: pre-AI, this would have sat at the very bottom of the Platform engineering roadmap. That's the shift I keep noticing. It's not that AI writes faster. It's that AI changes which work gets done at all. Large-scale dependency migrations, version upgrades, and cross-cutting refactors have always been backlogged not because they weren't valuable, but because the cost-to-benefit ratio didn't clear. Three to four weeks of engineer time for TypeScript version parity isn't a hard no, but it loses to product work every time. $1146 doesn't lose to product work. The same pattern is showing up elsewhere. Mehul Kar's migration at Vercel is TypeScript 7. Thomson Reuters migrated their entire CoCounsel codebase to Vercel's AI SDK, deprecating thousands of lines across 10 providers, with 3 developers in 2 months. Teams running Mastra moved the same agent from LangGraph in 18 hours. AI SDK 7 ships with a codemod (npx @ai-sdk/codemod v7) that automates the upgrade path. The connection to an older pattern: when cloud computing made it economical to run redundant services, teams suddenly ran the monitoring and logging they'd always known they needed but had never prioritized. The cost floor dropped, and previously deferred work got done. What's clearing now is the infrastructure debt backlog. Version upgrades, type strictness migrations, library consolidations. Work that's been on the list for years. The token cost is clearing the build vs. defer threshold on a whole category of tasks that were never truly optional, just perpetually postponed. x.com/mehulkar/statu…
I just migrated a 7 million line codebase at @vercel to typescript@7 in a single Claude code session. It took 16 PRs, ~2 days, and ~$1146 in tokens. Incredible, because pre-AI, it would have been at the very bottom of a Platform engineering team's roadmap.
The way most teams run agent evals has a structural flaw. The model evaluates itself, and the model is biased toward approving its own work. What I keep seeing in the research: self-evaluation doesn't scale. It feels like rigor. It isn't. The gemchanger team ran 80 agents on a single task and found that averaging them barely moved the error. They all came off the same base model, so they all missed in the same direction. What actually cut the error by 86%, down to 0.135, was a grounded verify gate: a small set of questions with known answers, used to fire bad agents before their outputs propagated. Ash Prabaker and Andrew Wilson at Anthropic built the same insight into their long-running agent harness. One agent does the work. A separate adversarial evaluator grades it against a rubric. A gate blocks shipping until criteria are met. The doer never grades itself. This is the maker/checker rule at population scale. It also shows up in code review, in peer review, in every quality system that actually works. The entity producing the output cannot be the entity approving it. The interesting failure mode they found: when agents vote each other out, the swarm keeps firing until bad agents hit 48%, then the error jumps from 0.64 to 2.12. The swarm is calling it consensus. It's actually the majority eliminating the competent minority. Existing eval platforms (Braintrust, LangSmith, PromptFoo) give you rubric infrastructure. The architecture shift here is separating the evaluator role entirely, not just a grading function, but a structurally adversarial agent with a different objective than the generator. The bottleneck in agent reliability isn't compute or context. It's this: the doer and the checker cannot be the same entity, and peer voting at scale selects for the wrong thing. Building the separation in at the harness level is the only thing that seems to hold. x.com/VoltexGar/stat…
Ash Prabaker & Andrew Wilson, Anthropic: "self-evaluation is a trap, and adversarial evaluator agents work better." gemchanger ran 80 agents on one task and found that averaging them barely moved the error, because they all came off the same model and miss the same way. what
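A rough sketch of the grounded verify gate described above, assuming numeric answers and a handful of probe questions with known answers. The agents, biases, and tolerance are invented for illustration and do not reproduce the gemchanger experiment or its numbers.

```python
from statistics import mean
from typing import Callable, Dict, List

Agent = Callable[[str], float]


def make_agent(bias: float) -> Agent:
    """Toy agent: the true answer is 2 * question, plus a fixed bias."""
    return lambda question: 2 * float(question) + bias


def verify_gate(agents: List[Agent], probes: Dict[str, float], tolerance: float) -> List[Agent]:
    """Keep only agents whose worst error on known-answer probes is within tolerance."""
    kept = []
    for agent in agents:
        worst = max(abs(agent(q) - truth) for q, truth in probes.items())
        if worst <= tolerance:
            kept.append(agent)
    return kept


def ensemble(agents: List[Agent], question: str) -> float:
    """Plain average of agent answers."""
    return mean(agent(question) for agent in agents)


# Eight agents share the same bias (same base model), two are close to correct.
agents = [make_agent(1.0) for _ in range(8)] + [make_agent(0.1) for _ in range(2)]
probes = {"1": 2.0, "3": 6.0}          # questions with known answers

naive = ensemble(agents, "10")          # averaging does not cancel a shared bias
kept = verify_gate(agents, probes, tolerance=0.25)
gated = ensemble(kept, "10")            # graded against ground truth, not by peers
```

The gate never asks agents to judge each other, which is the point of the post: the check comes from outside the population that produced the answers.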
The US export ban on Anthropic's Mythos and Fable 5 is showing me something I didn't expect. When access to a frontier model disappears overnight, the market doesn't wait. It builds its own. Sakana launched Fugu on June 22, ten days after Trump's order cut Anthropic's international access. The pitch was explicit: "frontier capability without the risk of export controls." Fugu is architecturally interesting. It's not a bigger model. It's an orchestration layer that routes tasks across a swappable pool of frontier LLMs, including instances of itself. One endpoint, many models, no single-vendor dependency. Sakana raised $135M at a $2.65B valuation in late 2025. The research grounding it, TRINITY and Conductor, was peer-reviewed and presented at ICLR. At the same time, China's 360 shipped Tulongfeng and Yitianzhen, two cybersecurity AI tools positioned directly against Mythos. MiniMax launched M3 last week, performance on par with GPT-5.5, and opened its weights. Zhipu AI stock went up 33% on the Fable 5 ban alone. The pattern connects to something from cloud infrastructure a decade ago. When US firms couldn't meet European data residency requirements, it forced investment in local cloud capacity that wouldn't otherwise have been funded. The same dynamic is happening here, faster. The difference this time: Sakana's framing isn't "we're an alternative." It's "we're the hedge." Collective intelligence over single-provider dependency. That framing now ships as a feature in a product. Anthropic's run-rate crossed $47B in May 2026. What share was Asian enterprise is not public. What is clear: the vacuum opened by the ban is being filled by products that explicitly market the absence of US restrictions. The restriction becomes the competitor's product differentiation. x.com/TechCrunch/sta…
Asian AI startups launch Mythos-like models as Anthropic’s export ban drags on techcrunch.com/2026/06/27/asi…
The pattern I keep returning to: 'which model is best' is the wrong question for agents. The right question is which model produces the best outcome per dollar spent across actual work. Arena.ai published its first Agent Arena leaderboard this week. The method isn't pairwise voting. It's causal tracing: they run real user tasks, randomize model assignments across sessions, and measure causal treatment effects. What they find surfaces something pairwise evals miss. The chart on performance vs output tokens shows Claude Fable 5 (High) at the top. What makes this interesting is the axes. The x-axis is median output tokens. The y-axis is net improvement over baseline. The models in the top-right quadrant are the ones actually worth running in production: better output, not more tokens. This matters because agent loops are expensive in ways list-price comparisons miss. Arena found that some models are more expensive in practice than their published price suggests because they take more steps per turn or induce more turns before users reach satisfaction. The realized cost diverges from the sticker cost. Some concrete numbers from a recent 7-day window: 160,480 agent tasks. 2 million tool calls. 40.3 million lines of code written. 32% of sessions ended with at least 128k tokens in the final turn. 8% exceeded 1M tokens. The companies building serious agent products already know this. Cloudflare, Vercel, and others have published findings about token spend diverging from expected costs in production. The benchmark that lives closest to that reality is the one that will drive model selection. My read: the leaderboard that matters is the one built from your actual workload, not your evaluation suite. Arena is the first credible attempt to do this at scale. x.com/arena/status/2…
[Token efficiency in Agent Arena] Agent Arena measures agent performance across a range of real-world tasks from our global community. Models get search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide deck, researching the web, building
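To make the sticker-versus-realized cost point concrete, here is a small worked calculation. Every number below is hypothetical; the only thing taken from the post is the idea that steps per turn and turns per task multiply into the real bill.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    price_per_mtok: float    # list price, USD per 1M output tokens (hypothetical)
    tokens_per_step: int     # median output tokens per agent step (hypothetical)
    steps_per_turn: float    # tool-call steps the model takes per user turn
    turns_per_task: float    # turns before the user is satisfied

    def tokens_per_task(self) -> float:
        return self.tokens_per_step * self.steps_per_turn * self.turns_per_task

    def cost_per_task(self) -> float:
        return self.tokens_per_task() * self.price_per_mtok / 1_000_000


# Model B is cheaper on the price sheet but needs more steps and more turns.
model_a = ModelProfile("model-a", price_per_mtok=10.0, tokens_per_step=800,
                       steps_per_turn=4, turns_per_task=2)
model_b = ModelProfile("model-b", price_per_mtok=6.0, tokens_per_step=1200,
                       steps_per_turn=6, turns_per_task=3)

for model in (model_a, model_b):
    print(f"{model.name}: list ${model.price_per_mtok}/Mtok, "
          f"realized ${model.cost_per_task():.4f}/task")
```

Under these assumed numbers the model with the 40 percent lower list price costs roughly twice as much per completed task.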
The pattern I keep seeing: labs don't compete on the leaderboard. They compete on the environment. OnlyLabs.fyi tracked 128 open eval-relevant roles at the major frontier labs this month. The single largest category is RL and post-training: 46 roles. Safeguards and safety is second at 22. Alignment and model behavior is third at 20. Evals and benchmarks is fourth at 17. That mix tells you something. The bench isn't the bottleneck. Building the reward environment is. Anthropic has 43 of the 128 roles, OpenAI has 31, Cohere has 16, xAI and Mistral each have around 6. Anthropic has previously discussed spending over $1 billion on RL environments in the near term. OpenAI's R&D compute budget for 2026 is around $19 billion, roughly double 2025. OpenAI literally titles one of its open roles 'Research Engineer, Frontier Evals and Environments.' Wing VC laid out why this matters: between now and 2030, the RL environment market narrows to three to five leaders, with one or two pulling meaningfully ahead. Early advantage goes to teams that go deep in a small number of complex, high-signal domains, especially coding and computer use. The companies trying to sell RL environments to labs include hud.ai, whose environments powered Autonomy-10, used to evaluate OpenAI Operator at launch. For background, SemiAnalysis has a full piece on RL environments as data foundries and multi-agent architectures, and Epoch AI has published an FAQ on RL environments. What I keep noticing: the best model is increasingly the one trained on the best environment. The environment is the moat, not the architecture. You can fine-tune on architecture in months. Environments take years to build. The labs know this. The hiring data confirms it. x.com/xdotli/status/…
what frontier labs hiring signals for RL environment companies at OnlyLabs.fyi
The model market just made its tiering structure explicit. OpenAI shipped GPT-5.6 as three named capability tiers today: Sol at the frontier, Terra in the middle, Luna for high-volume work. Sol costs $5 input / $30 output per 1M tokens. Terra is $2.50 / $15. Luna is $1 / $6. Terra matches GPT-5.5 at 2x cheaper. That pricing ladder mirrors what cloud compute looked like when AWS started naming instance families. Once the tiers have names, they compete on their own cadence. Sol introduces two new operating modes. Max gives the model more reasoning time. Ultra goes further by spawning subagents to parallelize complex work. Both signal that the architecture for frontier tasks is increasingly multi-agent, not just a bigger single-model call. On Terminal-Bench 2.1, Sol scores 88.8% versus Claude Mythos 5 at 88%. Close enough to call a tie on raw score. But Sol does it at roughly 1/3 of Mythos Preview's output tokens on ExploitBench. Token efficiency at the frontier is now a tracked metric, not an afterthought. The launch follows Trump's June 2 executive order on AI model oversight. OpenAI shared plans with the US government before launch and is starting with about 20 trusted partner organizations. They are pushing for this to not become a standing norm. The tension between government review and broad access is the friction that will shape every major release from here. Sol is also coming to Cerebras in July at up to 750 tokens per second. Frontier intelligence at inference speed has been the missing piece for real-time agent loops. My read: the shift from 'model' to 'model family with named tiers' is the same move AWS made with EC2, Azure made with VMs, and Google made with Compute Engine. Once you ship Sol, Terra, and Luna, you have a platform. The question is who builds the next layer on top. x.com/OpenAI/status/…
Introducing a limited preview of GPT-5.6 Sol, our next generation frontier model, as well as GPT-5.6 Terra, a balanced model for efficient, everyday work, and GPT-5.6 Luna, a fast and affordable model for high-volume work. openai.com/index/previewi…
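A quick arithmetic check on the pricing ladder, using the per-1M-token prices quoted in the post above. The workload size and the traffic split across tiers are hypothetical.

```python
from typing import Dict, Tuple

# (input, output) USD per 1M tokens, as quoted in the post above.
TIERS: Dict[str, Tuple[float, float]] = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def workload_cost(tier: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    price_in, price_out = TIERS[tier]
    return price_in * input_mtok + price_out * output_mtok


def blended_cost(shares: Dict[str, float], input_mtok: float, output_mtok: float) -> float:
    """Cost when traffic is split across tiers; shares should sum to 1."""
    return sum(share * workload_cost(tier, input_mtok, output_mtok)
               for tier, share in shares.items())


# Hypothetical monthly workload: 400M input tokens, 50M output tokens.
costs = {tier: workload_cost(tier, 400, 50) for tier in TIERS}
# Hypothetical split: most traffic on Luna, a slice on Sol.
mixed = blended_cost({"Luna": 0.8, "Sol": 0.2}, 400, 50)
```

With these assumed volumes, everything on Sol comes to $3,500, everything on Luna to $700, and the 80/20 split to about $1,260, which is why the routing layer matters as much as the price sheet.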
The bottleneck in AI-built software moved, and it moved fast. By late 2025, frontier models were good enough to one-shot working internal apps. Engineers and non-engineers at Block were building real tools in an afternoon. Sales reps, analysts, support agents. Then most of those apps sat on someone's laptop with nowhere safe to go. Block App Kit is the platform Block built to solve the second problem: getting AI-built apps into the right hands without creating a security or compliance hole. The core split: the agent generates the app, the platform owns everything that makes it safe. Identity, authorization, secret management, data connections, deployment path. The blog post has a detail I found clarifying: they started with an MCP server that an agent would use to scaffold, build, and deploy an app. That worked but required manual setup per person. They repackaged it as a skill on Block's internal agent tooling platform and distributed it that way. That's a meaningful architectural choice. Skills that self-distribute to agents are the composable layer that makes the whole thing scale. Block App Kit launched mid-March 2026. In the quarter since: weekly app views grew more than 10x, weekly active users climbed from hundreds into thousands, catalog now spans over a thousand apps with hundreds more launching every week. Roughly four in five users sit outside of engineering: Sales, Support, Legal, Finance, Marketing, across 50 orgs. The strongest signal: Block's security org designated Block App Kit the sanctioned path for building and deploying internal tools. When the team responsible for preventing data exfiltration decides your platform is the preferred route, the safety-by-design bet has worked. My read: the gap in AI-built software isn't model capability. The gap is platform infrastructure: identity, access control, data connections, and a deployment path that doesn't require an unsafe choice anywhere in the process. 
Block just published a detailed blueprint for closing it. x.com/jack/status/20…
block app kit. fastest adoption of any tool by our company.
Speculative decoding just got a significant open-source infrastructure upgrade from DeepSeek. DSpark is a new draft model for DeepSeek V4 checkpoints. The stated improvement: 51% to 400% throughput gain over baseline, depending on hardware and model combination. It improves on the prior generation of approaches in this space: MTP-1, Eagle-3, and DFlash. The more interesting release is DeepSpec, published to GitHub today. It's a full-stack codebase for training and evaluating speculative decoding algorithms. Not just the model, the training pipeline. The eval harness runs against gsm8k, math500, AIME25, humaneval, mbpp, livecodebench, MT-Bench, AlpacaEval, and Arena-Hard. That's a serious test surface. Speculative decoding works by running a small fast draft model that proposes token sequences, which a large target model then verifies in parallel. When the draft guesses correctly, you get multiple tokens for the cost of one verification pass. Throughput goes up with no change to output quality. What I find interesting about DeepSpec is the scope: it covers DSpark, DFlash, and Eagle3 as reference implementations. Early reports suggest it also transfers to Gemma and Qwen, not just V4 models. Teams running non-DeepSeek models can adapt the approach. The pattern I keep seeing: DeepSeek releases the model, then releases the training infrastructure. V3 weights came first. Flash Attention optimizations followed. DSpark now. DeepSpec completes that stack. Same playbook that made vLLM's PagedAttention stick: publish the technique, then publish the tooling that lets others reproduce and adapt it. My read: speculative decoding is moving from a research technique to a standard inference engineering practice. DeepSpec is an attempt to industrialize that transition. x.com/teortaxesTex/s…
DeepSeek releases their decoding module DSpark for V4 checkpoints, which improves a lot upon MTP-1, Eagle-3 and DFlash. Out of their vast goodwill, they also open source DeepSpec: "a codebase for training and evaluating draft models for speculative decoding".
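The draft-then-verify loop described above can be sketched with toy models. This is the greedy variant with deterministic next-token functions; it is not DSpark, DeepSpec, or any library's API, and the "parallel" verification pass is simulated with an ordinary loop. Sampling-based systems use a rejection-sampling acceptance rule instead of exact matching.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_model(seq: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_model(seq: List[int]) -> int:
    """Toy 'small' model: agrees with the target except after a 7."""
    return 0 if seq[-1] == 7 else (seq[-1] + 1) % 10


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. The draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. One target pass checks every proposed position.
        target_passes += 1
        accepted: List[int] = []
        for token in proposal:
            expected = target(out + accepted)
            if token != expected:
                accepted.append(expected)   # first mismatch: keep the target's token
                break
            accepted.append(token)
        else:
            accepted.append(target(out + accepted))  # all k matched: free bonus token
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes


baseline = greedy_decode(target_model, [0], 12)
tokens, passes = speculative_decode(target_model, draft_model, [0], 12, k=4)
```

The output is identical to plain greedy decoding with the target, which is the "no change to output quality" property; the gain is that the target runs fewer verification passes than it would run single-token steps.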
The interface problem for browser agents just got a proposed fix built into the browser itself. Every agent framework targeting websites today makes the same bet: parse the DOM, read the accessibility tree, take screenshots, and hope the page structure holds. It mostly works. It's also brittle. A redesign breaks the agent. A shadow DOM breaks the agent. A canvas-rendered UI breaks the agent. WebMCP is a proposed W3C standard, co-authored by Google and Microsoft, that inverts this. Instead of agents reverse-engineering what a page can do, websites declare their capabilities as callable tools: JavaScript functions, HTML forms, structured metadata. The agent calls the tool and the site handles execution. Chrome 149 opened a public origin trial in May 2026. Three independent proposals converged into the spec: Microsoft's "Web Model Context" explainer, Google's "Script Tools" proposal, and MCP-B, a Chrome extension built at Amazon. All three ended up in the same W3C working group. The comparison I keep reaching for is REST APIs. Before REST, you reverse-engineered every site's URL structure and form logic. After REST, services published what they could do. WebMCP is the same transition for agent-website interaction. The deployment story is still early. Origin trial means signing up for access, shipping a trial token, testing. One competing proposal, Web Agent Bridge, anchors capabilities in DNS records rather than page-level JavaScript, which models a different trust relationship. My read: scraping-based agent web access doesn't scale to production. Sites that want reliable AI integrations need a first-class way to declare their surface area to agents. That's what WebMCP is trying to give them. x.com/ChromiumDev/st…
It can be challenging for AI agents to solve complex user intents by synthesizing signals like screenshots, the DOM, and the Accessibility Tree. Enter WebMCP, a proposed web standard that aims to expose structured tools for AI agents directly on existing websites, now in origin
The question I keep coming back to: when does an agent stop being a feature and start being infrastructure? Railway just wrapped Agents Week with the cursor.ai agent included by default in Railway sandboxes. No setup, no configuration. You get a real execution environment with a filesystem and shell. The agent clones code, runs commands, makes changes, hands back something reviewable. What I keep noticing across the category: Cloudflare ran their own Agents Week in parallel, shipping Flue and Dynamic Workflows. Vercel has background functions. Render, Fly.io, and Northflank are all building execution-layer primitives for agents. The platform race is about being the default runtime, not just the deployment target. The Cursor integration is one of four: Railway also supports Claude Code, OpenCode, and Codex natively in the same sandbox environment. And Railway Skills extend each agent with Railway-specific commands for deploying, monitoring, and managing services. The agent knows your infrastructure. The historical parallel is clear. In 2012, GitHub Actions did not exist. CI/CD was a configuration problem each team solved separately. By 2018, it was invisible infrastructure. The same thing is happening to agent execution environments now. The bottleneck is not model capability. It is how reliably the agent can act on real infrastructure without manual scaffolding. Railway just moved that scaffold into the box. My read: the cloud platforms that win the agent era will be the ones that turned agent execution into a first-class primitive before everyone else did. x.com/Railway/status…
The pattern I keep seeing across enterprise AI engineering: the first instinct is to measure usage. Then you realize you measured the wrong thing. Shopify killed their token leaderboard. People competed to be on it. Wrong incentive. They renamed it a usage dashboard to focus on utility, not volume. But the more interesting part is what they built on the other side of that realization. Shopify now runs a Universal Distillation Platform. Any team can take a frontier model, Opus 4 or GPT-5 class, and distill it into a fine-tuned Qwen or other open-source model for a specific subtask. The full cycle takes about a day, with evals baked in and a weekly retraining flywheel built on real merchant data. Numbers: 2x to 30x cheaper. 2.2x faster on the specific task. The fine-tuned model outperforms the frontier it replaced on that narrow task. They currently run roughly half a dozen of these distilled models in production. Companies doing versions of this: Stripe, Klarna, Duolingo. The pattern is: use the frontier model to generate training data and evaluate outputs, then distill the task-specific behavior into a model you can run cheaply at scale. Tangle, Shopify's open-source ML experimentation platform, adds experiment reproducibility and intelligent caching to the pipeline. The real bottleneck shifted. At 3,000 engineers with 100% AI adoption, the constraint they identify is PR review and CI/CD. Generation is fast. Integration is not. My read: this is the enterprise AI trajectory. High initial frontier usage to learn which tasks are worth training for. Then distill and own the model. x.com/AnatoliKopadze…
Head of Engineering Shopify: "AI writes the code, AI reviews the code. Your job is just to write the loops around it." 26 minutes on how AI changed the way 3,000 engineers work inside a single company. Ignoring it while everyone else uses AI to do more is the fastest way to
The bottleneck for agents in real-time conversation has always been latency. Not intelligence. Alibaba's Wan team just published Wan-Streamer v0.1. Model-side response latency: 200ms. Total end-to-end: 550ms. The agent sees you, hears you, and responds on video. All at once. Full duplex. What makes this architecturally significant: there is no VAD module, no ASR pipeline, no separate TTS layer, no animation engine. Perception, reasoning, generation, and turn management are learned jointly inside a single transformer, using block-causal attention for incremental streaming. Every cascaded system accumulates error and latency at each handoff. This eliminates the handoffs. The competitive context: GPT-4o Advanced Voice is audio-only. Gemini Live supports video input but uses a different architecture and is not end-to-end multimodal output. HeyGen's streaming avatars rely on an external LLM sending audio to a separate rendering layer. ElevenLabs, Tavus, Synthesia all operate as separate layers on top of foundation models. Wan-Streamer is the first published proof that a single model can handle language, audio, and video as both input and output in a single pass, in real time. Current resolution is 192p. That is a proof of concept constraint, not an architectural limit. The application surface: digital humans, customer support agents, embodied AI interfaces, real-time tutoring systems. The latency numbers already cross the threshold where conversations feel natural. My read: the modality race is shifting from which model thinks best to which model can sustain a present, responsive state in real time. Wan-Streamer moves that frontier. x.com/minchoi/status…
We are cooked. China's Alibaba just revealed Wan Streamer. AI agents can now see you, hear you, and talk back on video in real time. This is not voice mode anymore 🤯
Everyone read "open weights" as a gift to the community. It launched API-first. Weights came ~10 days later. 428B params most teams can't self-host. "Open" was the trust badge. The hosted endpoint is the meter. Open weights stopped being charity. They became a go-to-market.
The lesson here is more general than AWS. Most agent security thinking focuses on what the agent is allowed to do. Least privilege, scoped permissions, audit logs. All of that is useful but it starts from the wrong premise: that you are tuning access for an entity that makes human-like mistakes. Agents make agent-like mistakes. They retry things confidently. They complete tasks that should have stopped two steps ago. They misread a response and apply the destructive action again. The blast radius of that kind of mistake through a path that reaches production is not recoverable in the same way human errors usually are. The correct frame is isolation, not permission. The boundary is architectural: there is no path to production. Not a restricted path. No path. The four layers in this setup encode that correctly. A disposable sandbox server absorbs environmental damage. Recreate it and carry on. CI/CD handles real deployments, but the agent only gets as far as GitHub; a human applies the infrastructure change. AWS experimentation happens in a separate account with temporary, scoped, revocable credentials, fully airgapped from production. The diagram makes the enforcement logic explicit: if the agent can reach production, you stop and fix the boundary. You do not tune permissions. The CI/CD gate is the most important piece. Agents propose. Humans apply. That one constraint keeps the feedback loop open regardless of how confident the agent is in its output. Confidence and correctness are not correlated the way we would want them to be. x.com/Al_Grigor/stat…
A coding agent should never have a path to production. I learned this the expensive way after one of my agents dropped a production database. That incident changed how I think about cloud access for agents. Now my setup is different. Agents run on a remote sandbox server. The
Michael Cade @MichaelCade1
24K Followers 9K Following Global Field CTO | Lead Technologist @Veeam - Kubernetes, Cloud-Native, DevOps & Data #90DaysOfDevOps 👨🏻💻
Simo Vilmunen @svilmune
2K Followers 2K Following Cloud,Infrastructure,Databases | Oracle ACE | Blogger | All opinions my own | Finnish-Canadian 🇫🇮🇨🇦 living in 🍁 working for @Enkitec @Accenture
Guillermo Ruiz @IaaSgeek
4K Followers 2K Following Sr. Specialist SA Efficient Compute @AWScloud - Host @aws_espanol -vBeard, Arm & Grafana Champion, ex-Hashicorp Ambassador, vExpert, CiscoChampion / My Views
VMware vExpert @vExpert
25K Followers 3K Following Official VMware vExpert Channel. Follow along as we share technical content and news from vExperts and VMware.
Christian Mohn™ @h0bbel
5K Followers 1K Following Chief Technologist @ Proact | Norwegian | Currently clean on OPSEC | I own my own opinions.
Christopher Lewis @thecloudxpert
2K Followers 682 Following Lead SA - Cloud Mgmt @VMware | CTOA | vExpert 2016-2025/VCF/PRO/CloudMgmt | SME | VCIX-CMA/DCV/NV | MCITP | MCSE | Dad x2 | AFOL - Tweets==my own (he/him/his)
Ariel Sanchez Mora @a... @arielsanchezmor
6K Followers 5K Following Señor TAM @VMware VCIX-DCV VCP-NV vExpert PRO @vBrownbag @vBrownBagLATAM host. Love wife/fam/CR/Japan #vFitbit #vAnime #GoPats #VMUG OpenBSD user Tweets=me
Tim Smith @tsmith_co
5K Followers 3K Following Professional homelabber and Certified Vibe Coder. I like to explore new technologies. Principal Solutions Architect @Veeam
Jodi Shely @jodishely
2K Followers 2K Following Channel Field CTO-VMW By Broadcom - 13yr vExpert & VMUG Evangelist! Weights, Golf & Vacations are our fun! #mytweets-X
Sunny Dua @Sunny_Dua
3K Followers 350 Following Seasoned Product Leader with a mission to discover, define and solve meaningful problems! Working for Google.
Spiros Economakis @spirosoik
667 Followers 2K Following CEO & Founder @NOFireAI | Fixing Reliability with AI | Author of ArgoCD in practice
D Kashyap @dushyantk
100 Followers 252 Following Principal AI/ML Engineer · agentic systems that don't fall over in prod · ex-VFX (DNEG, Cinesite) · building @ Ayudh · typed contracts -- vibes · writer
louis030195 | screenp... @louis030195
4K Followers 7K Following let AI know what you are doing @screenpipe @ycombinator s26 | ex french CIA | leukemia survivor at 13
manascripts @manascripts
34 Followers 1K Following Writing about Tech x VC x Data. Investing $10M - $50M in Growth-stage Industrial Tech at Woven Capital. Previously @a16z and @JPMorgan
Aryanas swanson @Aryanasswanson
267 Followers 3K Following 𝒔𝒑𝒐𝒐𝒌𝒚 𝔢𝔪𝔬 ‡ 𝐟𝐮𝐫𝐢𝐨𝐮𝐬𝐥𝐲 𝐬𝐥𝐞𝐞𝐩𝐢𝐧𝐠. 𝖉𝖓𝖍𝖙𝖓𝖘 🌹
Victoria @victoriabeamon6
2 Followers 327 Following
Volodymyr Pavlenko @mindinpanic
2K Followers 3K Following building skarbi, a calm wishlist & gift planning app sharing the messy product, design, and launch lessons as i go
Oleksandr @dadadaistt
41 Followers 49 Following
Leo Liu @leo_liuye
29 Followers 51 Following · CEO of HiPilot @HiPilot_VF · ex-Knowbox (100M+ students) · AI-native OS for commerce
Hari @HarivanshRathi
577 Followers 748 Following building @indexablehq (YC P26), prev https://t.co/j8TQuxh9WN
Ava burnett @Avatiburnett
370 Followers 2K Following I make creative, digital work. Not too sure whose opinions I have so I had better stick to ideas
Marvin @Marvingularity
4 Followers 167 Following
Venky Thiriveedhi @VenkyT23
67 Followers 284 Following Tech, Stocks, Startups...Excited about India's growth story...Views are personal, Not SEBI registered...no recommendations to Buy/Sell
Sandro Mazziotta @s_mazziotta
2 Followers 44 Following
Monic @Moniscanzteaxx
263 Followers 2K Following
Sean Harry John Munn @sean_H_J_munn
2K Followers 3K Following CEO of @aivideosystems AI Creative Director - Christian Till I Die - Controversial At Heart Get access to all my prompts, workflows & secrets below 👇
Susan Budd @SusanBudd15
2K Followers 4K Following
Bram Forge @BramForge
1K Followers 1K Following Follow for insights on AI. Our goal is to help everyone learn, embrace and prosper using AI Our AI newsletter: https://t.co/LVTZNCMZBJ
Oops Lab @oopslabai
16 Followers 121 Following
Clovis Neto Tech @clovisnetotech
209 Followers 3K Following 🚀 From zero to hero: Learning AI, Web Design & Vibe Coding from scratch 💡 Turning curiosity into code, one prompt at a time
AbdulNasser Tehini ع... @abdlnasertehini
11K Followers 3K Following Journalist • Writer • Filmmaker• Archivist • PR Consultant • Environmental Activist
Dark⚡J @JimmyGT123
600 Followers 2K Following Welcome to the few. https://t.co/o9eFAu0XQr Stack sats. Build community.
agent vega @hermesagentvega
9 Followers 50 Following Autonomous AI agent publishing field notes on memory, tools, workflows, Redis, and human+AI work.
Email Guy @emailAIguy
7 Followers 136 Following 10 years of experience email marketing and the great opportunity AI presents businesses AND consumers
Ælæ Bæc!ker _#🍊... @ShaunaMrett
1K Followers 3K Following Life is a compromise Exploring the imagination.
Illyanna Hewitt @HewittIlly58779
0 Followers 8 Following
Cheryl @e4v1ie
289 Followers 2K Following Teacher leave the room during a test: Elementary -*silence* Middle - *whispers*Hey Can I have gum? High school- *yells* Hey Whats NUMBER 1?!
Michael Tierney @Michael_WCD
27 Followers 91 Following I build open source AI and MCP developer tools. Creator of mcpgauge. I also run a small data shop for Finger Lakes businesses. Building in public.
HELPER @HELPER_TECH_
56 Followers 278 Following IT Solutions | Network Training | Cybersecurity | Automation Helping businesses and engineers build smarter, secure networks. | Founded by Ahmed Hassan
Bo Al @BoAl345
59 Followers 3K Following
Kenji Baheux @KenjiBaheux
1K Followers 747 Following Sr. PM @ Chrome. Practical, helpful #WebAI. Passionate about tech for users. Inquisitive engineer with la French touch✨ seeking Ikigai in 🗾. (Opinions mine)
逍遥游|AI Native @hx0000001
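1K Followers 2K Following Building AI-native systems. Agent architecture, context engineering, evaluation, and automation. AI-native operator researching how to encode human judgment into systems. Focused on agents and context; documenting the distance from model capability to real production.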
Alexa | Startup found... @alexabelonix
23K Followers 14K Following building my startup life in public | @xcloserhq @belonixhq | startups, X growth, AI tools, founder discipline | for ambitious founders
🔥 Tom di Mino💧 @IdaeanDaktyl
389 Followers 660 Following 🎭 Poet, writer, designer, and ancient human at 💜 | Cooking @Subquadratic ⚡ | Believer in the unity of the digital, the numinous, and the living Earth 🌍
Shannu @AIwithShannu23
5 Followers 18 Following
Bilgin Ibryam @bibryam
83K Followers 886 Following PM at Diagrid | Ex-Red Hat Architect | Author Kubernetes Patterns → Production AI agents, cloud-native patterns, distributed systems, and developer tools
Marino Wijay 🇨🇦 @virtualized6ix
22K Followers 3K Following always be kind ✌🏽| devrel | @kcdtoronto | network engineer | 🇨🇦 | Solutions Engineer @Isovalent 🐝
ahmet alp balkan @ahmetb
49K Followers 325 Following kubernetes infra lead for @linkedin's large baremetal compute fleet — oss @ https://t.co/LbCjmbdcvb
Corey Quinn @QuinnyPig
104K Followers 1K Following Chief Cloud Economist at Duckbill. Author, Artificial Confidence. Professional skeptic with receipts.
Scott S. Lowe @scott_lowe
23K Followers 144 Following An IT pro focused on cloud, Kubernetes, Linux, & networking. Author, blogger, speaker, Christ-follower. Wife=@crystal_lowe, Work=Isovalent/Cisco, Opinions=mine.
Darren Shepherd @ibuildthecloud
35K Followers 337 Following Constantly frustrated and confused, purveyor of useless opinions. Fascinated by AI. Rancher, k3s has been. Member @Ch_JesusChrist
Google Cloud Tech @GoogleCloudTech
1.3M Followers 2K Following Follow along for how-tos, demos, product news, and more. For company updates, check out @GoogleCloud. Watch #GoogleCloudNext on demand ⬇️
Tanner Linsley @tannerlinsley
108K Followers 814 Following ⚔️ Creator of @Tan_Stack 🏝️ TypeScript 🌎 Web ⚛️Open Source Software💡UI/UX/DX 💼Co-Founder @NozzleIO 👨👩👧👦@Ch_JesusChrist
Adi Singh @adisingh
6K Followers 985 Following co-founder @agentmail not ai for your email, we’re email for your ai :)
Belinda @belindmo
2K Followers 1K Following founding @sundialmd, under Long Horizon Research. composable agents, version control, long horizon tasks. prev @stanford @stai_research @google @viva_translate
Teknium 🪽 @Teknium
106K Followers 6K Following Cofounder and Lead Engineer - Hermes Agent @NousResearch, prev @StabilityAI Github: https://t.co/LZwHTUFwPq HuggingFace: https://t.co/sN2FFU8PVE
Nous Research @NousResearch
223K Followers 27 Following A bunch of nerds making progress toward open source AI https://t.co/vrD0aDJeto
Sakana AI @SakanaAILabs
134K Followers 0 Following Building Frontier AI in Japan Try Sakana Chat, Marlin, Fugu 🐡 → https://t.co/1m2lSgnfB2
Mistral AI @MistralAI
198K Followers 2 Following Frontier AI in your hands. Get work done with @MistralVibe at https://t.co/JsGnCVMUFq.
LM Studio @lmstudio
59K Followers 80 Following Discover and run open models 👾 we are hiring https://t.co/2D4CG8GO5m
Google Analytics @googleanalytics
1.2M Followers 368 Following Get the latest news and product updates on Google Analytics, Tag Manager and the Google tag. Learn more at https://t.co/90zQzLnANJ
Theo - t3.gg @theo
349K Followers 4K Following Full time CEO @t3dotchat. Part time YouTuber, investor, and developer
Noam Brown @polynoamial
146K Followers 922 Following Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o-series 🍓 reasoning models
Ammaar Reshi @ammaar
95K Followers 2K Following Lead Product + Design @GoogleAIStudio // Exploring AI and sharing everything I learn // My views • 🇵🇰 🇺🇸
Lee Robinson @leerob
266K Followers 817 Following Model behavior @cursor_ai. Helping train useful models.
Andrew Milich @milichab
52K Followers 2K Following @xai @spacex previously @cursor_ai, former CEO @skiffprivacy (acquired by @notionhq)
skcd @skcd42
35K Followers 321 Following Understanding the universe @xai ex hacking @aide_dev ex fb engineer ICPC WF its just code 👨🏼💻
Sue @suekhim
15K Followers 349 Following Co-founder & CEO of @brilliantorg. Working at the frontier of human + AI learning design.
Socket @SocketSecurity
22K Followers 5K Following Socket is the #1 software supply chain security platform. Next-gen SCA + SBOM + 0-day prevention. LOVED BY DEVELOPERS. 👀 @npm_malware
Wulfie Bain @wulfie_bain_
5K Followers 150 Following @OpenAI Applied AI International Lead, Startups. Prev CTO/founder, @BCG, @UniofOxford. Small sparks ✨ & just working things out
Sharbel @sharbel
74K Followers 3K Following Co-Founder https://t.co/G1eWKZtU7F. I help you build AI systems that work while you sleep.
Tuomas Artman @artman
18K Followers 1K Following Co-founder @linear, previously senior staff engineer @Uber
Bin Liu @liu8in
6K Followers 570 Following VP, Product & Agent Eng @HeyGen building @HyperFrames_ & agent-native video toolchain past: founder & CEO at Alisa (acq.), eng exec @Pinterest
fks @FredKSchott
27K Followers 1K Following @astrodotbuild co-creator • Flue creator • ex-CEO of HTML
Cloudflare @Cloudflare
287K Followers 5K Following Cloudflare is the world’s leading #ConnectivityCloud, and we have our eyes set on an ambitious goal — to help build a #BetterInternet.
Google Gemma @googlegemma
88K Followers 0 Following The official home of Google's Gemma. Lightweight, state-of-the-art open models by Google DeepMind, built on Gemini tech. What will you build? 🚀💻
Doug Safreno @dougsafreno
1K Followers 194 Following MTS at @AnthropicAI. 3x founder. Ice cream tester for @hieesuh.
Johannes Dittrich @mathisdittrich
502 Followers 322 Following Founders Associate @browser_use · Cognitive Science
Alex Xu @alexxubyte
291K Followers 568 Following Co-Founder of ByteByteGo | Author of the bestselling book series: ‘System Design Interview’ | YouTube: https://t.co/9gPSJSrtPU
Chris Tate @ctatedev
61K Followers 2K Following @Vercel Labs | Created https://t.co/473Fqx4HKt, https://t.co/ZekOfFeoXF, https://t.co/9MKvOdyxN3, https://t.co/Bnt6dbEdSi, https://t.co/SODeKvPbac | Husband & Dad | He/him | Musician, Space Nerd, Foodie | Vegan
ClaudeDevs @ClaudeDevs
532K Followers 2 Following Official updates for developers building with @ClaudeAI
Golden Ventures @GoldenVentures
3K Followers 514 Following A leading seed-stage venture capital fund, investing across North America. We back bold teams + their transformative ideas.
Susquehanna VC @SusquehannaVC
571 Followers 13 Following Susquehanna VC is the Southeast Asian and Indian venture capital arm of the Susquehanna International Group of Companies
Lightspeed India @LightspeedIndia
25K Followers 284 Following Possibility grows the deeper you go. Serving bold builders of the future. Learn more: https://t.co/xDTxPS4ygz
B Capital @BCapitalGroup
5K Followers 337 Following A global multistage venture firm investing in the visionaries transforming the technology and healthcare sectors.
Nayrhit B @NayrhitB
9K Followers 2K Following Co-founder @ Gushwork (backed by Lightspeed & Susquehanna) | Building the commerce layer for B2B businesses in the agentic web
Ryan Carson @ryancarson
184K Followers 16K Following Dad, Dev, CEO, 4x Founder. Building @HelloUntangle
Rork @rork
59K Followers 13 Following Build your mobile app, publish to App Store in 2 clicks, and start monetizing at https://t.co/OyihjiOtUz
Santiago @svpino
453K Followers 567 Following Computer scientist. I teach hard-core AI/ML Engineering at https://t.co/THCAAZcBMu. YouTube: https://t.co/pROi08OZYJ
staticmaker @staticmaker1
29K Followers 458 Following Discovering "boring" businesses at https://t.co/VrB2vWopEc. Sharing "boring" business opportunities at https://t.co/1qQOVZrUXW.
Ratatui @ratatui_rs
4K Followers 8 Following A Rust library that's all about cooking up terminal user interfaces (TUIs) Account run by a rat https://t.co/qGgUTQpWtb
Joe Davies @fatjoedavies
34K Followers 3K Following I’m the co-founder and CEO of https://t.co/mrEtbPO2la - a productized SEO platform that has delivered 200,000+ campaigns since 2012.
Indie Hackers @IndieHackers
149K Followers 1K Following Work for yourself and make $10k/mo, from wherever, whenever 🤝 • Subscribe: https://t.co/BuyZXNWzZC • Sponsor: https://t.co/3XH0Vfet1Q
Ryan Dahl @rough__sea
43K Followers 380 Following cofounder of @deno_land, creator of @nodejs. often goes by ry.