Chinese company DeepSeek announced its new R1 model on January 20, then released a paper on how R1 was trained on January 22. Over the weekend, the DeepSeek app became the number-one free download on the Apple App Store, surpassing ChatGPT. DeepSeek was the tech talk of the weekend.
On Monday, the markets reacted.
The European markets opened first — six hours ahead of New York — and tech and energy stocks crashed. So US traders knew it was time to dump. [
FT, archive]
Markets may recover. But the vibes of the US AI market, and its sense of superiority, have taken a beating.
Number go down
On Monday, the tech-heavy Nasdaq composite fell 3.1%, while the S&P 500 ended the day 1.46% lower, after hitting a record last week. It was the Nasdaq's worst trading day in five years, since the COVID crash of January 27, 2020.
Nvidia took the biggest tumble. The GPU maker saw $598 billion in value disappear from its market cap — the largest single-day market decline in US stock market history.
(Though, to be fair, several of the biggest single-day declines in the past year were also Nvidia, as its stock traced its usual rollercoaster trajectory through the AI bubble.)
A bunch of other techs in hock to AI went down, too. Oracle fell 14%. Super Micro Computer, which makes AI servers, slid 13%. Broadcom fell 17%. Microsoft and Alphabet (Google) also dropped.
Not all big tech stocks took a whupping. Apple was up 3.3%, overtaking Nvidia as the top company by market value. Meta was up 1.9%. Two tiny tech stocks surged after saying they would add DeepSeek to their platforms: Aurora Mobile ADRs soared 229% intraday and MicroCloud Hologram went up 67%. [
Bloomberg, archive;
Bloomberg, archive]
Energy stocks took a pounding. Oklo, the
Sam Altman-backed nuclear fission reactor company, fell 26%, after gaining more than 60% last week.
DeepSeek’s killer advantage: it’s cheaper
What DeepSeek has done is train the weights (the numbers that make up the model itself) more efficiently. The smaller "distilled" R1 releases start from an existing open model, Meta's Llama or Alibaba's Qwen, and retrain its weights. You'll frequently see new models whose weights have been trained differently to beat whatever the hot benchmark is. That's the work that DeepSeek claim to have done on a shoestring. [
Stratechery, archive]
Is the R1 model
better than all existing models? Well, it benchmarks well. But everyone trains their models hard to the benchmarks. The benchmarks exist to create headlines about model improvements while everyone using the model still sees lying slop machines. No, no, sir, this is much
finer slop, with a bouquet from the rotting carcass side of the garbage heap.
Goodhart’s Law as a service.
What matters here is that DeepSeek’s API is GPT-compatible and a lot cheaper than GPT-4o via OpenAI. Also, R1 is about as good as any of the current LLMs, and you can run it yourself.
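"GPT-compatible" means DeepSeek's API accepts the same chat-completions request shape as OpenAI's, so switching is mostly a matter of changing the base URL and model name. Here's a minimal sketch of the request an OpenAI-style client would send — the endpoint path and the "deepseek-reasoner" model name are our understanding of DeepSeek's public docs, so treat them as assumptions:

```python
import json

# Assumed DeepSeek endpoint; OpenAI's equivalent lives at api.openai.com.
DEEPSEEK_BASE = "https://api.deepseek.com"

def build_chat_request(prompt: str, model: str = "deepseek-reasoner") -> dict:
    """Build an OpenAI-style chat-completions request for DeepSeek's API.

    "deepseek-reasoner" is (we believe) the model name that maps to R1.
    A request aimed at OpenAI instead would differ only in the URL,
    the API key, and the model name -- the body shape is the same.
    """
    return {
        "url": f"{DEEPSEEK_BASE}/chat/completions",
        "headers": {
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

That drop-in quality is the point: any client library that lets you override the base URL can be repointed without code changes, which is why the price difference bites.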
DeepSeek R1 does what LLMs do. But it was reportedly created with a fraction of the resources. That’s the killer advantage.
We remain skeptical of DeepSeek’s claims to have trained on the cheap. The much-touted figure of $5 million to $6 million is just the final training run, which cost $5,576,000. [
arXiv]
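That headline figure is simple arithmetic: GPU-hours for the one final run times a rental rate. Here's the paper's sum reproduced — the $2/GPU-hour rate is the paper's own assumption, not an actual invoice:

```python
# Reproducing the reported cost arithmetic for the final training run only.
gpu_hours = 2_788_000  # H800 GPU-hours reported for the run
rate_usd = 2.0         # assumed rental price per GPU-hour
cost = gpu_hours * rate_usd
print(f"${cost:,.0f}")  # $5,576,000
```

Hardware purchases, salaries, and all the failed experiments before that run sit outside the figure — which is exactly why we stay skeptical of it.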
But that doesn’t matter while the model is cheaper via the API and you can run it at home — that’s what’s making customers leave OpenAI. And that’s what people discovered over the week since R1’s announcement.
DeepSeek’s Janus Pro-7B, released January 27, does images as well as text — and again, you can run it at home (albeit slowly) in 24GB of video RAM. It may not be as good as Midjourney or OpenAI’s DALL-E — but it doesn’t have to be. [
Hugging Face; TechCrunch]
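A back-of-envelope check on why 24GB of video RAM is plausible for a 7B-parameter model — this is our own arithmetic, not a figure from DeepSeek:

```python
# A 7-billion-parameter model stored in 16-bit floats needs about
# 2 bytes per parameter; the rest of a 24GB card is left over for
# activations and the KV cache.
params = 7_000_000_000
bytes_per_param = 2  # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(round(weights_gib, 1))  # ~13.0 GiB of weights
```

So the weights alone fit with room to spare — hence "slowly, but at home."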
How’s OpenAI doing?
OpenAI was already deeply unprofitable. They priced ChatGPT Pro, with the o1 model, at
$200 per month, and then a Chinese company came out of nowhere and undercut them.
OpenAI has been hemorrhaging key research staff and executives for a while now. In September, chief technology officer Mira Murati, chief research officer Bob McGrew, and VP of research Barret Zoph all announced they were quitting.
What’s really popped is OpenAI’s credibility. They marketed themselves as needing to burn all those
billions of dollars on model training. Along comes DeepSeek with a model that works about as well, costs much less to use, and was supposedly trained for ridiculously less.
Sam Altman is good at raising money and good at making promises about the future — but OpenAI is a machine for spending money as fast as possible, not a machine for working efficiently.
OpenAI doesn’t need a massive AI infrastructure initiative like
Stargate — they need to work out how to train more cheaply.
SoftBank, one of the big backers behind Stargate, is notorious for WeWork-level funding disasters — but five days from floating the idea to the crash might be a new record.
What happens next?
The reaction to the crash includes a lot of people explaining why the tech market panic over DeepSeek validates their previous position, whatever it was.
This crash doesn’t mean AI sucks now or that it’s good now. It just means OpenAI, and everyone else whose stock dipped, was throwing money into a fire. But we knew that.
Slop generators are inexpensive now, and that’s a sea change — but the output is still terrible slop, just more of it.
We think the market is reacting to the shock. We expect the stock prices to go back up again, at least in the short run. Businesses that have products and customers should be okay. If cheap LLMs are deployed everywhere, they’ll run on Nvidia.
Venture capital firms who put money into startups training AI foundational models — and it’s quite a lot of money — are likely to be unhappy (now rather than later). [
Axios]
The AI bubble and its
VC funders are deeply vibes-based. This is because generative AI promoters still haven’t found a convincing use case. The vibes have taken a serious hit.
We don’t know of anyone who predicted that tech stocks would crash just from someone coming up with a cheaper model. But Silicon Valley AI is Theranos on the GPU all the way down, so we should expect that to come out at some point.