Max Niederman @MaxNiederman

crafting artisanal slop as head of quality @ mechanize · maxniederman.com · San Francisco, CA · Joined October 2019

Tweets: 441 · Followers: 237 · Following: 209 · Likes: 1K

Max Niederman @MaxNiederman

a day ago

@yacineMTB me buying ram while holding Micron I bought with my AI company salary

0 0 3 113 0

Max Niederman @MaxNiederman

a day ago

This is true but matters much less in a world of rapidly improving coding agents. There is effectively a very high discount rate on future programming work.

In software, complexity is a tax you pay on every future change. You might think, "as long as it works, I'm good, I don't care about aesthetics", but elegant code is about maintainability, not aesthetics. Elegant code is deeply practical.

47 153 1K 72K 287

0 0 7 832 1

Max Niederman @MaxNiederman

2 days ago

@xeophon @dtcb Where did I say “mostly” here? My point is that being able to distill is a significant advantage, so it pushes forward the Pareto cost-performance frontier significantly. I am not claiming it is the only thing that pushes it forward, or even the main thing.

1 0 0 77 0

Max Niederman @MaxNiederman

2 days ago

@navvye @panickssery @AndrewSabisky The correct metric is profit which unfortunately OR does not report, but thankfully here the usage on Nemotron 3 is so abysmal even at -100% margins that it's pretty clearly not even close to competitive.

0 0 0 84 0

Max Niederman @MaxNiederman

3 days ago

Has anyone tried to make DeepSeek but US? Like a lab that just distills frontier models into a way cheaper model?

15 0 52 21K 10

Max Niederman @MaxNiederman

2 days ago

@navvye @panickssery @AndrewSabisky The version that isn't 100% subsidized has only 15B tokens/week usage, btw.

0 0 1 42 0

Max Niederman @MaxNiederman

2 days ago

@navvye @panickssery @AndrewSabisky Despite being FREE, Nemotron 3 Ultra is just barely matching the usage of Kimi K2.6 on OpenRouter, let alone the more popular open models which each have ~10x the usage.

2 0 0 97 0

Max Niederman @MaxNiederman

2 days ago

If it were as successful, then why don't US companies really have a good competitor against cheap Chinese models? AFAICT, Chinese labs just have some sort of advantage in the business of releasing cheap open-weights models, or else there would be an American competitor. Points 1 and 2 are my guesses for what that advantage could be. Point 1 is that DS et al. don't need to compete against Ant/OAI/etc to hire the best Chinese talent, so they can get good talent while doing the relatively less prestigious work of a trailing lab. Point 2 is that the CCP may have simply decided that they wanted to have good open models, and allocated lots of capital accordingly. This may have resulted in more availability of capital to Chinese open-weights labs than self-interested American VCs will provide to American open-weights labs, allowing Chinese labs to create better open-weights models at the cost of lower margins.

2 0 1 150 0

Max Niederman @MaxNiederman

2 days ago

Yes, American labs do some distillation, but it seems to be much less popular as a main driver of capabilities, or at least much less successful. My main guesses for why are (1) it's less prestigious, so DS et al.'s advantage competing for Chinese talent is really important, or (2) the CCP just allocates more capital to open-weights labs than the business model warrants.

1 0 1 227 0

Max Niederman @MaxNiederman

2 days ago

@DafyddFD @SakanaAILabs Their models suck tho. I'm much more interested in labs that are actually succeeding at this.

0 0 3 828 0

Max Niederman @MaxNiederman

2 days ago

@navvye x.com/MaxNiederman/s…

Max Niederman @MaxNiederman

3 days ago

@panickssery @AndrewSabisky Yes, DS has good pretraining, but the reason they’re able to compete on the price-performance frontier is distillation. The “just” in my sentence was supposed to denote that it would be easy for American labs to do this, not that it’s operationally the only thing that DeepSeek

3 0 3 4K 0

1 0 0 1K 0

4 days ago

@navvye Ideally not on LinkedIn though. LinkedIn is usually insane slop and no signal.

2 0 1 129 1

Guive Assadi @GuiveAssadi

4 days ago

Instead of a moratorium on data centers, I’m calling for a moratorium on tech conferences that create traffic jams and make me late for work. They probably use a lot of water too (despite tech oligarchs attempting to CONCEAL this).

0 2 16 2K 1

Max Niederman @MaxNiederman

5 days ago

@leebriskcyrano Yeah the spinal cord is also doing a lot of the work, but the latency there is about the same.

2 0 2 151 0

Max Niederman @MaxNiederman

5 days ago

The round-trip time from a human’s hand to their brain is roughly 40ms, on par with the RTT to a datacenter in many hypothetical robot deployments. This suggests that very little edge compute is necessary for robots to achieve human-level dexterity.
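
To put the 40ms figure in distance terms, here is a rough back-of-the-envelope sketch. It assumes signals in optical fiber travel at about two-thirds the speed of light (roughly 200 km per millisecond) and ignores routing, queuing, and inference time unless an overhead is passed in. The function name and the 20ms overhead value are hypothetical, chosen only for illustration.

```python
# Assumption: light in optical fiber covers roughly 200 km per millisecond
# (about two-thirds of c). Real networks add routing and queuing delay.
FIBER_KM_PER_MS = 200.0


def max_fiber_distance_km(rtt_budget_ms: float, overhead_ms: float = 0.0) -> float:
    """Upper bound on one-way fiber distance that fits within an RTT budget.

    overhead_ms is any fixed delay (switching, inference, etc.) subtracted
    from the budget before computing propagation distance.
    """
    usable_ms = max(rtt_budget_ms - overhead_ms, 0.0)
    # Half of the usable time is spent in each direction.
    return usable_ms / 2 * FIBER_KM_PER_MS


# Propagation only: the whole 40 ms budget goes to the wire.
print(max_fiber_distance_km(40.0))

# Hypothetical case: 20 ms of the budget is consumed by non-propagation overhead.
print(max_fiber_distance_km(40.0, 20.0))
```

Under these assumptions the 40ms budget is an upper bound on distance, not a prediction: any time spent on compute or network hops shrinks the radius proportionally.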