This is true but matters much less in a world of rapidly improving coding agents. There is effectively a very high discount rate on future programming work.
In software, complexity is a tax you pay on every future change. You might think, "as long as it works, I'm good, I don't care about aesthetics", but elegant code is about maintainability, not aesthetics. Elegant code is deeply practical.
@xeophon @dtcb Where did I say “mostly” here? My point is that being able to distill is a significant advantage, so it pushes forward the Pareto cost-performance frontier significantly. I am not claiming it is the only thing that pushes it forward, or even the main thing.
@navvye @panickssery @AndrewSabisky The correct metric is profit, which unfortunately OR does not report, but thankfully here the usage on Nemotron 3 is so abysmal even at -100% margins that it's pretty clearly not even close to competitive.
@navvye @panickssery @AndrewSabisky Despite being FREE, Nemotron 3 Ultra is just barely matching the usage of Kimi K2.6 on OpenRouter, let alone the more popular open models, which each have ~10x the usage.
If it was as successful, then why don't US companies really have a good competitor against cheap Chinese models?
AFAICT, Chinese labs just have some sort of advantage in the business of releasing cheap open weights models, or else there would be an American competitor. Points 1 and 2 are my guesses for what that advantage could be.
Point 1 is that DS et al. don't need to compete against Ant/OAI/etc to hire the best Chinese talent, so they can get good talent while doing the relatively less prestigious work of a trailing lab.
Point 2 is that the CCP may have simply decided that they wanted to have good open models, and allocated lots of capital accordingly. This may have resulted in more availability of capital to Chinese open weights labs than self-interested American VCs will provide to American open-weights labs, allowing Chinese labs to create better open weights models at the cost of lower margins.
Yes, American labs do some distillation, but it seems to be much less popular as a main driver of capabilities, or at least much less successful. My main guesses for why are (1) it's less prestigious, so DS et al.'s advantage competing for Chinese talent is really important, or (2) the CCP just allocates more capital to open weights labs than the business model warrants.
Yes, DS has good pretraining, but the reason they’re able to compete on the price-performance frontier is distillation. The “just” in my sentence was supposed to denote that it would be easy for American labs to do this, not that it’s operationally the only thing that DeepSeek does.
Ordering something not from Amazon is a weird experience because you remember how much life sucked before everything you ordered was delivered in <1 day.
Instead of a moratorium on data centers, I’m calling for a moratorium on tech conferences that create traffic jams and make me late for work. They probably use a lot of water too (despite tech oligarchs attempting to CONCEAL this).
The round-trip time from a human’s hand to their brain is roughly 40ms, on par with the RTT to a datacenter in many hypothetical robot deployments. This suggests that very little edge compute is necessary for robots to achieve human-level dexterity.