Tesla is reportedly talking to SK hynix about a potential ₩1 trillion ($725 million) order for the high-capacity SSDs it will need to store data for its AI training supercomputers.
This is according to a report by the Korea Economic Daily, whose chip industry sources reckon the storage product in question is the 61.44 TB D5-P5336 SSD from SK hynix-owned Solidigm, made with QLC (4 bits/cell) flash. It is a PCIe Gen 4 drive with a single-unit price of around $7,300. A $725 million order would imply 99,315 drives at full price, or around 200,000 at a 50 percent volume discount, providing 12.3 EB of capacity.
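To make that arithmetic explicit, here is a minimal Python sketch using only the figures quoted in the report (the price and capacity numbers are the report's, not confirmed by SK hynix or Tesla):

```python
# Back-of-the-envelope check of the drive counts and capacity implied above.
ORDER_VALUE_USD = 725_000_000   # reported order value
LIST_PRICE_USD = 7_300          # approximate single-unit D5-P5336 price
DRIVE_CAPACITY_TB = 61.44       # capacity per drive

drives_at_list = ORDER_VALUE_USD / LIST_PRICE_USD          # ~99,315 drives
drives_at_half = ORDER_VALUE_USD / (LIST_PRICE_USD * 0.5)  # ~198,630, i.e. "around 200,000"
total_eb = 200_000 * DRIVE_CAPACITY_TB / 1_000_000         # 1 EB = 1,000,000 TB

print(round(drives_at_list), round(drives_at_half), round(total_eb, 1))
# → 99315 198630 12.3
```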
Neither SK hynix nor Tesla has responded to a request for comment about the claimed deal, but whether or not such talks are taking place, Tesla’s need for SSDs to feed AI training data to its GPUs is very real – and huge. Its Dojo supercomputing infrastructure, spread across three sites, is being used to develop Tesla’s Autopilot and Full Self-Driving (FSD) software for its electrically powered cars and trucks. One Dojo supercomputer employs 10,000 NVIDIA H100 GPUs in a gigantic cluster, according to a tweet last August from Tim Zaman – formerly of Tesla and now working on Google’s DeepMind AI team.
Zaman was then an AI Infra & AI Platform Engineering Manager at Tesla. He is now a software engineer at DeepMind.
Another, more recent report suggests Dojo will use 50,000 H100s plus 20,000 of Tesla’s own wafer-scale AI processors, each built from internally developed D1 chips laid out in a 5×5 matrix – 25 D1s per wafer. Tesla is building a second Dojo supercomputer in Austin, Texas, and the Dojo systems will also be used to develop AI software for FSD and to operate the Optimus robot.
Elon Musk’s xAI business is building a separate Colossus supercluster in Memphis, Tennessee, with more than 100,000 H100 GPUs on a single RDMA fabric, plus exabytes of storage. Supermicro built the servers using its liquid-cooled racks, each holding 8 x 4RU Universal GPU system servers fitted with 8 x NVIDIA H100s apiece, meaning 64 GPUs per rack. The racks are grouped into mini-clusters of eight, meaning 512 GPUs, plus networking. With around 100,000 GPUs in total, that implies some 200 mini-clusters, deployed in four 25,000-GPU data halls.
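The rack and cluster counts above follow from simple multiplication, which can be sketched as follows (the server, rack, and GPU figures are those reported for the Supermicro build; the 100,000 total is approximate):

```python
# Colossus rack arithmetic as described in the paragraph above.
GPUS_PER_SERVER = 8        # NVIDIA H100s per Universal GPU server
SERVERS_PER_RACK = 8       # 4RU servers per liquid-cooled rack
RACKS_PER_MINI_CLUSTER = 8
TOTAL_GPUS = 100_000       # approximate size of the initial build

gpus_per_rack = GPUS_PER_SERVER * SERVERS_PER_RACK              # 64
gpus_per_mini_cluster = gpus_per_rack * RACKS_PER_MINI_CLUSTER  # 512
mini_clusters = TOTAL_GPUS / gpus_per_mini_cluster              # ~195, "some 200"

print(gpus_per_rack, gpus_per_mini_cluster, round(mini_clusters))
# → 64 512 195
```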
Much of the storage hardware is based on Supermicro storage servers, such as the 1RU NVMe storage node with 8 x 2.5-inch all-flash bays. The storage software comes from several sources including, we understand, VAST Data.
NVIDIA said on October 28: “Colossus, the world’s largest AI supercomputer, is being used to train xAI’s Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs.”
Tesla CEO Elon Musk has tweeted about Tesla’s Cortex AI training supercomputer cluster being built at its Gigafactory in Austin, Texas. You can check out a short video of Cortex here. We understand that this is a renamed Dojo system.
We view Tesla as having a Dojo infrastructure for FSD and Optimus bot training, with the original Dojo system based in California using sites in San Jose (ex-Twitter) and Sacramento. Dojo 2 is what is being called Cortex, based at the Gigafactory site in Austin, Texas. Dojo 3 will be the Buffalo, New York, installation.
Together, the three Dojo sites plus the xAI Colossus system in Memphis, Tennessee – four AI supercomputer clusters in all – mean that Tesla has an almost insatiable need for AI training compute power and, hence, for fast access to AI training data. And that means SSDs.
QLC SSDs provide an excellent mix of capacity and access speed for AI model training, and can keep GPUs busy far better than disk drives with their much longer access times. A reliable, long-term supply deal for the tens of thousands of SSDs needed would make sense.
Solidigm was an early entrant into the QLC SSD market and is now developing a 122 TB follow-on. Samsung also builds a 61.44 TB SSD, the BM1743, and is developing 128 TB models, while Western Digital has its own 61.44 TB SN655 drive.
SK hynix supplies the high-bandwidth memory (HBM) chips NVIDIA uses to build its GPU servers, and Tesla would represent a huge end-user customer for SK hynix’s DRAM and SSD products – possibly its single largest.