
No "The AI bubble has finally burst" thread?

It's already "used" the data in training (and it'll be the same data as everyone else's, i.e. the entire internet). What they've released is the end result: the billions of weights for the nodes in the network. That is static and unchanging for every LLM, simply because deriving those numbers is so astronomically expensive and time-consuming.

PS: This is why LLMs are a complete dead end for AGI. It's like taking a backup copy of a person, telling them a story and asking them to think of the next word, which you write down. Then you kill them and reboot a new version of that old backup and tell them the slightly longer story, get the next word, kill, rinse, repeat. The person is unchanging and only ever knows their upbringing and the story you just told them. They don't learn.
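(For the curious, here's a toy Python sketch of that point. It's entirely my own illustration, not real LLM code: the "weights" are a frozen lookup table, and each turn just re-runs the same static function on a longer context. Nothing carries over between calls.)

```python
# Toy illustration: frozen "weights" plus a growing context.
# The model itself never changes between calls.
FROZEN_WEIGHTS = {"hello": "world", "world": "again"}  # stands in for billions of parameters

def next_word(context):
    # One "forward pass": read-only access to the frozen weights.
    return FROZEN_WEIGHTS.get(context[-1], "...")

story = ["hello"]
for _ in range(3):
    # "Kill, reboot, retell the longer story": no state survives between calls.
    story.append(next_word(story))

print(story)  # ['hello', 'world', 'again', '...']
```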
I understand bits of that :)

Ah ok so you're running the program offline and using the web as data? I was confused by "off-line" but that makes sense.
 
And we also have:


DeepSeek, the viral AI company, has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3.

The models, which are available for download from the AI dev platform Hugging Face, are part of a new model family that DeepSeek is calling Janus-Pro. They range in size from 1 billion to 7 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
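A rough back-of-envelope on those parameter counts (my own sums, not from the article, and assuming 16-bit weights at 2 bytes per parameter): the memory needed just to hold a model's weights is roughly parameter count times bytes per parameter.

```python
def approx_weight_memory_gb(n_params, bytes_per_param=2):
    # 2 bytes per parameter assumes fp16/bf16 weights; 4 for fp32, 1 for int8.
    return n_params * bytes_per_param / 1e9

print(approx_weight_memory_gb(1e9))  # 1B-parameter model: ~2 GB
print(approx_weight_memory_gb(7e9))  # 7B-parameter model: ~14 GB
```

That's weights only; actually running the model needs extra memory on top for activations and context.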
Presumably it's going to need to be open source for people to be able to examine and so trust it - anything closed could open back doors and things.
 
I understand bits of that :)

Ah ok so you're running the program offline and using the web as data? I was confused by "off-line" but that makes sense.

The data is contained in the model once it is trained. No need for the internet; no need to use a cloud service. Need a pretty hefty computer for the high end models though.

GPT4All is the easiest way to go about running one locally, and they have a tonne of lower end models that work well for those with a regular computer.
 


-------
Think of it in relation to the dotcom bubble: it's like around '96 when Microsoft, having missed the boat on the Internet, said everyone now has the tools to web publish. The actual bubble pop was a while later, but factors of quality and discernment started to mingle with the buzzwords.
Much like we lost a load of real-world presence off the dotcom bust, a substantial proportion of business is going to go cheap and end up with an AI as fit for purpose as our #1 for...systems
 
Surely though if I ask a specialist question then there's no way it'll have enough data on that subject stored on my computer?

That was my intuition to begin with too, and I won't pretend to know how it works exactly, but it's something to do with the way data is stored as a vast neural net.

It's not storing data as regular files, like text files with facts in it; it somehow keeps the data in the same way the human brain does, as activation patterns within the neural net.

I'm basically as clueless as the next guy in regards to the details though, so take my explanation with a pinch of salt. I'm sure someone else will correct me
 
That was my intuition to begin with too, and I won't pretend to know how it works exactly, but it's something to do with the way data is stored as a vast neural net.

It's not storing data as regular files, like text files with facts in it; it somehow keeps the data in the same way the human brain does, as activation patterns within the neural net.

I'm basically as clueless as the next guy in regards to the details though, so take my explanation with a pinch of salt. I'm sure someone else will correct me
You're mostly right. It's basically compression.

Imagine telling someone from 1525 that you could store all the world's books in a device that fits in your hand. They wouldn't think it was possible, but of course it is now, due to how we store it. It's the same with LLMs; they're just more efficient at compressing and storing the information.
One of the major innovations is that they've split the model into "experts" and only use the ones needed to answer the query, rather than trying to use the whole thing for everything.
OpenAI and others have been doing MoE (Mixture of Experts) for a while. It's not that new. Here's an Nvidia article on it from last year:
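For anyone wondering what "experts" means in practice, here's a toy sketch of the routing idea. It's entirely my own illustration, nothing to do with DeepSeek's or Nvidia's actual implementations: a cheap router scores the experts, and only the top-scoring ones actually run.

```python
def make_expert(name):
    # In a real MoE, each expert is a large neural sub-network; here it's a stub.
    return lambda query: f"{name} expert handled: {query}"

experts = {name: make_expert(name) for name in ("maths", "code", "poetry")}

def route(query, k=1):
    # Toy router: score experts by whether their name appears in the query,
    # then run only the top k. The other experts stay idle, saving compute.
    scores = {name: int(name in query) for name in experts}
    chosen = sorted(experts, key=scores.get, reverse=True)[:k]
    return [experts[name](query) for name in chosen]

print(route("help me with my maths homework"))
```

The real versions learn the router jointly with the experts, but the shape is the same: per-query compute scales with the experts used, not the total parameter count.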


No, training is something only the big boys can do. It takes a long time and costs millions in GPU time.
What this does is make it much cheaper to do the end-user bit, the "fancy autocomplete" bit.

I think training is the thing that's actually been cracked here. A research lab at Berkeley claims that they've replicated the training methods from Deepseek for under $60 and can get a 3B model that's "comparable to larger systems".

Deepseek's full model is 685B, so quite a bit bigger, but the costs are now within reach of many more organizations.

I have no idea how this scales, though. And of course, have no idea if any of Deepseek's or other claims are true.
 
Actually I guess those times are in GMT in that chart. So must have been flagged all weekend and traders were hammering away at their keyboards at 8.55 EST. Not seen something like that since Truss got to her feet in the Commons to launch her mini-budget.
 
Surely though if I ask a specialist question then there's no way it'll have enough data on that subject stored on my computer?
Sure it will. The entire text of English Wikipedia is about 25GB, and that contains loads of redundant information. LLMs are kind of like lossy compression for text. If wikipedia.zip is like a PNG, then an LLM is like a JPEG; close enough to the real thing to fool the eye (or rather, fool the brain). So the "weights" data is even smaller than all of Wikipedia yet contains approximately all the same knowledge. Kinda. If you don't mind it being less confident about stuff that isn't common knowledge.
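The redundancy half of that analogy is easy to demo with nothing but the standard library (the "lossy" LLM half has no neat one-liner, so take this as the lossless baseline only):

```python
import zlib

# Highly redundant text, like much of the web, compresses dramatically.
redundant = ("The cat sat on the mat. " * 1000).encode()
compressed = zlib.compress(redundant, level=9)

print(len(redundant), "->", len(compressed), "bytes")
```

An LLM goes further by also throwing away the exact wording, keeping only the gist, which is why it fits so much more in so much less space.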
 
This is fucking insane... surely the markets must have had some kind of warning here for such a massive bubble burst? Trump's reaction is even weirder, he's actually kind of welcoming it.

obviously no warning hence the cliff!
and what's Trump going to say? More tariffs on an open source platform?

here's some perspective on the Nvidia share price....still way up on last year
 

Analysis from Pivot to AI:

Chinese company DeepSeek announced its new R1 model on January 20. They released a paper on how R1 was trained on January 22. Over the weekend, the DeepSeek app became the number-one free download on the Apple App Store, surpassing ChatGPT. DeepSeek was the tech talk of the weekend.


On Monday, the markets reacted.


The European markets opened first — six hours ahead of New York — and tech and energy stocks crashed. So US traders knew it was time to dump. [FT, archive]


Markets may recover. But the vibes of the US AI market, and its sense of superiority, have taken a beating.





Number go down


On Monday, the tech-heavy Nasdaq composite fell 3.1%, while the S&P ended the day 1.46% lower, after hitting a record last week. It was the worst trading day in five years, since the COVID crash of January 27, 2020.


Nvidia took the biggest tumble. The GPU maker saw $598 billion in value disappear from its market cap — the largest single-day market decline in US stock market history.


(Though, to be fair, several of the biggest single-day declines in the past year were also Nvidia, as its stock traced its usual rollercoaster trajectory through the AI bubble.)


A bunch of other techs in hock to AI went down, too. Oracle fell 14%. Super Micro Computer, which makes AI servers, slid 13%. Broadcom fell 17%. Microsoft and Alphabet (Google) also dropped.


Not all big tech stocks took a whooping. Apple was up 3.3%, overtaking Nvidia as the top company by market value. Meta was up 1.9%. Two tiny tech stocks surged after saying they would add DeepSeek to their platforms: Aurora Mobile ADRs soared 229% intraday and MicroCloud Hologram went up 67%. [Bloomberg, archive; Bloomberg, archive]


Energy stocks took a pounding. Oklo, the Sam Altman-backed nuclear fission reactor company, fell 26%, after gaining more than 60% last week.


DeepSeek’s killer advantage: it’s cheaper


DeepSeek is specifically training the weights — the bits in front of the actual model, which is either Meta’s Llama or Ali’s Qwen model. You’ll frequently see new models that have trained their weights differently, to beat whatever the hot benchmark is. That’s the work that DeepSeek claim to have done on a shoestring. [Stratechery, archive]


Is the R1 model better than all existing models? Well, it benchmarks well. But everyone trains their models to the benchmarks hard. The benchmarks exist to create headlines about model improvements while everyone using the model still sees lying slop machines. No, no, sir, this is much finer slop, with a bouquet from the rotting carcass side of the garbage heap. Goodhart’s Law as a service.


What matters here is that DeepSeek’s API is GPT-compatible and a lot cheaper than GPT-4o via OpenAI. Also, R1 is about as good as any of the current LLMs, and you can run it yourself.


DeepSeek R1 does what LLMs do. But it was reportedly created with a fraction of the resources. That’s the killer advantage.


We remain skeptical of DeepSeek’s claims to have trained on the cheap. The much-touted figure of $5 million to $6 million is just the final training run, which cost $5,576,000. [arXiv]


But that doesn’t matter while the model is cheaper via the API and you can run it at home — that’s what’s making customers leave OpenAI. And that’s what people discovered over the week since R1’s announcement.


DeepSeek’s Janus Pro-7B, released January 27, does images as well as text — and again, you can run it at home (albeit slowly) in 24GB of video RAM. It may not be as good as Midjourney or OpenAI’s Dall-E — but it doesn’t have to be. [Hugging Face; TechCrunch]


How’s OpenAI doing?


OpenAI was already deeply unprofitable. They priced ChatGPT Pro, with the o1 model, at $200 per month, and then a Chinese company comes out of nowhere and undercuts them.


OpenAI has been hemorrhaging key research staff and executives for a while now. In September, chief technology officer Mira Murati, chief research officer Bob McGrew, and VP of research Barret Zoph all announced they were quitting.


What’s really popped is OpenAI’s credibility. They marketed themselves as needing to burn all those billions of dollars on model training. Along comes DeepSeek with a model that works just about as well but for much cheaper, and saying it was trained for ridiculously less.


Sam Altman is good at raising money and good at making promises about the future — but OpenAI is a machine for spending money as fast as possible, not a machine for working efficiently.


OpenAI doesn’t need a massive AI infrastructure initiative like Stargate — they need to work out how to train more cheaply.


SoftBank, one of the big backers behind Stargate, is notorious for WeWork-level funding disasters — but five days from floating the idea to the crash might be a new record.


What happens next?


The reaction to the crash includes a lot of people explaining why the tech market panic over DeepSeek validates their previous position, whatever it was.


This crash doesn’t mean AI sucks now or that it’s good now. It just means OpenAI, and everyone else whose stock dipped, was just throwing money into a fire. But we knew that.


Slop generators are inexpensive now, and that’s a sea change — but the output is still terrible slop, just more of it.


We think the market is reacting to the shock. We expect the stock prices to go back up again, at least in the short run. Businesses that have products and customers should be okay. If cheap LLMs are deployed everywhere, they’ll run on Nvidia.


Venture capital firms who put money into startups training AI foundational models — and it’s quite a lot of money — are likely to be unhappy (now rather than later). [Axios]


The AI bubble and its VC funders are deeply vibes-based. This is because generative AI promoters still haven’t found a convincing use case. The vibes have taken a serious hit.


We don’t know of anyone who predicted that tech stocks would crash just from someone coming up with a cheaper model. But Silicon Valley AI is Theranos on the GPU all the way down, so we should expect that to come out at some point.
 
I mistakenly posted this on the new developments thread. It more properly belongs here:

Concerning Deepseek, I read this last night from Ryan Grim of Dropsite. Really interesting. Because of US protectionism with particular reference to China, Deepseek has been developed on a much cheaper, more efficient, open source platform. The big US tech corporations may consequently be in big trouble - pass the box of tissues lol:


also there's this article this morning from Richard Murphy:

 
Last edited:
obviously no warning hence the cliff!
and what's Trump going to say? More tariffs on an open source platform?

here's some perspective on the Nvidia share price....still way up on last year

Well yeh they're no longer the most valuable company in the world. Apple is king again, for a day or two maybe. But yeh $600bn knocked off their valuation in a day. I think that's the biggest loss ever in a single day.

And yeh I love that Trump can't throw tariffs at an open source model. That must really really irk. And must be very confusing for him.
 
Well yeh they're no longer the most valuable company in the world. Apple is king again, for a day or two maybe. But yeh $600bn knocked off their valuation in a day. I think that's the biggest loss ever in a single day.

And yeh I love that Trump can't throw tariffs at an open source model. That must really really irk. And must be very confusing for him.
The thought of Donald Trump being told about how Open Source works is very very funny.
 
Can someone better versed in this explain how it might affect Starmer's great plans for AI transforming the UK economy?

Am I right in thinking these developments make it even more of a pile of shite than it already is, i.e. any 'growth' in AI infrastructure, power generation, even less likely now?
 
Can someone better versed in this explain how it might affect Starmer's great plans for AI transforming the UK economy?

Am I right in thinking these developments make it even more of a pile of shite than it already is, i.e. any 'growth' in AI infrastructure, power generation, even less likely now?
If they had any really flexible innovative people on the team they would view this as a great opportunity..... but..... perhaps not.
 
I mistakenly posted this on the new developments thread. It more properly belongs here:

Concerning Deepseek, I read this last night from Ryan Grim of Dropsite. Really interesting. Because of US protectionism with particular reference to China, Deepseek has been developed on a much cheaper, more efficient, open source platform. The big US tech corporations may consequently be in big trouble - pass the box of tissues lol:


also there's this article this morning from Richard Murphy:

The linked substack which tests Deepseek is pretty impressive:
 
This is fucking insane... surely the markets must have had some kind of warning here for such a massive bubble burst? Trump's reaction is even weirder, he's actually kind of welcoming it.

Trump’s full of shit. He’s hardly going to say that the Chinese have just humiliated the US tech industry.

He’s probably just lost a couple of billion himself.
 
obviously no warning hence the cliff!
and what's Trump going to say? More tariffs on an open source platform?

here's some perspective on the Nvidia share price....still way up on last year
Yeh plus its earnings are in 29 days, which are likely to be impressive; they've been on a massive upswing. I imagine there will also be a lot of buying in on this basis, with effectively the discount from today. Mostly panic selling due to sentiment, as their cards were a near monopoly for running the existing models, which was priced in. Deepseek is not dependent on them; however, the performance is still far better using the Nvidia cards from what I gather.
 
Can't see that the tech bros have any better a record on data privacy than the CCP to be honest.
Some of them are arguably considerably worse.
Wonder if Trump is going to try to ban DeepSeek; it would probably be easy enough with the TikTok precedent - the AI's probably got a "flatter Trump" mode that can be activated if a ban appears likely

There is a 100% chance it will be. The tech bros are at the wheel in government. The problem the tech sector has though is multi-part. The model itself can be reused, which is why Nvidia is the one taking a hit atm, and the model is free, meaning there's a hell of an incentive to use it, and a hell of an incentive to just VPN your way to finding it.
 
I see people having difficulty getting a Deepseek account. I got one this morning after the 3rd attempt at sending the code.

It's not any better than ChatGPT. The fuss over it is just that it's gonna be way cheaper and better for the environment. And at least you can ask ChatGPT about Hong Kong, Taiwan etc
 
It's not any better than ChatGPT. The fuss over it is just that it's gonna be way cheaper and better for the environment. And at least you can ask ChatGPT about Hong Kong, Taiwan etc
yes but it is your duty to download it and spook the markets further
i have done so and immediately deleted it
i might download it again - #feeltheagency
 
The Guardian looked into DeepSeek censorship and found some workarounds




I'm glad to see bloated US tech giants get a bloody nose but not sure if a cheap alternative from a country controlled by a totalitarian regime is a great development
 
The Guardian looked into DeepSeek censorship and found some workarounds




I'm glad to see bloated US tech giants get a bloody nose but not sure if a cheap alternative from a country controlled by a totalitarian regime is a great development
Presumably if others use the open source, the Tiananmen Square bit isn't hardcoded in
 
Lol, just watching Sky news - their tech journo is saying this is all because of Biden :D

In response to US controls, allies could seek alternatives to US tech, creating unparalleled long-term incentives for Chinese competitors such as Huawei, Baidu, Alibaba, and Tencent. US tech groups and Congressional leaders worry the White House is rushing to avoid outside scrutiny and beat the buzzer on President Biden’s term. US officials have declared that the new controls do not represent a “significant regulatory action” — allowing the rule to skirt normally required stakeholder review.

 