DeepSeek Week: What (Almost) Everybody Missed

Earlier this week, news of DeepSeek swept through the media. The market reacted, and many hardware companies (e.g., Nvidia, ASML, Broadcom, Marvell) saw a drop in their market value. Presumably, the demand for truckloads of GPUs will be reduced because DeepSeek has delivered a foundation model that rivals the “big boys” at 1/50th the cost. This analysis missed the point, except for old hands like Pat Gelsinger.

To be clear, I’m not offering investment advice; I’m an HPC gray hair who has seen some things. Because no one knows what actually constitutes an AI, or an AGI for that matter, the hardware and software needed to reproduce HAL 9000 is still a bit up in the air.

The training cost of DeepSeek is quoted at $5 million, about 50 times less than the reported $250 million estimate required to build a state-of-the-art model. There is much quibbling about this figure, including that it covers only hardware training costs and not people, development, and other infrastructure costs. Let’s eliminate all the nitpicky arguments by assuming DeepSeek reduced the cost by a factor of 10, meaning they spent $25 million on the works. That is still cheap compared to the data center models we constantly hear about.
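For those who like their quibbles quantified, the arithmetic is simple. A minimal sketch in Python, using only the dollar figures quoted above:

```python
# Cost comparison using the figures quoted in this article.
deepseek_cost = 5_000_000    # reported DeepSeek training cost (USD)
sota_cost = 250_000_000      # reported estimate for a state-of-the-art model (USD)

print(f"Reported cost ratio: {sota_cost / deepseek_cost:.0f}x")  # 50x

# Skeptical assumption: the true reduction is only 10x once people,
# development, and infrastructure are folded in.
skeptical_factor = 10
print(f"Skeptical estimate: ${sota_cost / skeptical_factor:,.0f}")  # $25,000,000
```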

Take a pause here and think about the past. Has there ever been a time when the “cost of entry” to computing was reduced by a factor of 10? Consider supercomputing in the early 1990s. It was an expensive game, usually requiring seven figures to play, and the league was rather exclusive. Then Thomas Sterling and Don Becker, with $50,000 from Jim Fisher, built the commodity hardware-based “Beowulf” cluster computer.

The rest is a well-known part of history. The cost of entry to supercomputing was reduced by at least a factor of ten. Consequently, many more people were now doing research- and engineering-quality supercomputing for a lot less money. There were stories circulating about acquiring a fast Beowulf cluster for the cost of a memory upgrade or an annual support agreement on existing supercomputing systems.

The advent of Beowulf commodity computing was an inflection point for the market. Almost anyone could get in the game, and many did. The reduced “cost of entry” spurred hardware sales in a new market: commodity-based HPC. In addition to the centralized capability machines, there were now local machines designed to deliver the exact performance needed by users.

Back to today. If it can be reproduced by others, the DeepSeek news is the “Beowulf moment” of the GenAI market. The club is now open to many more organizations that were priced out in the past. Indeed, a recent X/Twitter post by Matthew Carrigan notes:

Full hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000.

To be clear, that is the full model, which is 650GB in size. Now the kicker. The full model runs on two AMD EPYC processors (9004 or 9005) with 768GB of RAM (to fit the model) across 24 RAM channels. Throw in a case, power supply, and SSD, and you pretty much have a local machine that Matthew reports runs at “between 6 and 8 tokens/second.” Surprisingly, there is no GPU in the hardware manifest. The limiting factors in running the full model are memory size and memory bandwidth. In this case, the large amount of memory swings the platform in favor of CPUs. Of course, you can get GPUs with more memory, but they will cost you “a bit” more than $6,000. Note: GPUs are still needed for training the model, so don’t sell your Nvidia stock just yet; a lower cost to train and run the model may generate more hardware sales.
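Why do CPUs win here? A back-of-the-envelope estimate makes the memory-bandwidth argument concrete. The following is a minimal sketch in Python: the 24 channels and the 6-8 tokens/second report come from the post above, while the DDR5-4800 channel speed, the roughly 37B active parameters per token (DeepSeek-R1 is a mixture-of-experts model, so only a fraction of the 650GB is read per token), and the sustained-bandwidth fraction are my own illustrative assumptions.

```python
# Rough estimate of generation speed for a memory-bandwidth-bound model.
# Values marked "assumed" are illustrative guesses, not from the article.

channels = 24                  # RAM channels across two EPYC sockets (from the post)
channel_gbps = 38.4            # assumed: DDR5-4800 moves ~38.4 GB/s per channel
peak_bandwidth_gbps = channels * channel_gbps      # ~922 GB/s theoretical peak

active_params = 37e9           # assumed: ~37B parameters active per token (MoE)
bytes_per_param = 1            # Q8 quantization stores ~1 byte per weight
bytes_per_token = active_params * bytes_per_param  # weights read per token

sustained_fraction = 0.3       # assumed: fraction of peak bandwidth sustained

tokens_per_sec = peak_bandwidth_gbps * 1e9 * sustained_fraction / bytes_per_token
print(f"Estimated speed: {tokens_per_sec:.1f} tokens/second")  # ~7.5, in the 6-8 range
```

The exact numbers matter less than the shape of the formula: generation speed scales as memory bandwidth divided by bytes touched per token, which is why 24 channels of commodity DDR5 can stand in for a GPU at inference time.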

The DeepSeek-R1 launch was accompanied by a detailed project release on GitHub and a technical paper outlining the project’s key steps. Also, “Open-R1: a fully open reproduction of DeepSeek-R1” has been started on Hugging Face.

Finally, there is also reporting, and suggestions in their paper, on how the DeepSeek team optimized the model. These optimizations allow the model to run extremely fast and likely stem from the DeepSeek team working to take full advantage of the available hardware. Remarkably, DeepSeek claims to have pre-trained its V3 model on only 2,048 crippled Nvidia H800 GPUs. Unlike the “cheaper” method of throwing more hardware at the problem (or, more likely, “throwing a data center at the problem”), DeepSeek worked on software optimizations. Continued testing and study of this model will undoubtedly reveal more details.

As Mark Twain famously said, “History doesn’t repeat itself, but it often rhymes.” Indeed, as we have seen in the past, when the barrier to entry for technical computing goes down, the big established companies tend to frown.

This article first appeared on our sister site HPCwire.
