ChatGPT, LLMs, and storage

Analysis: The IT analytics world is being dominated by a storm of interest in Large Language Model (LLM) machine learning and generative AI (GAI). The release of the ChatGPT chatbot at the end of November last year generated a huge amount of interest, attracting 1 million users in a week and spurring an upsurge in similar foundation model applications such as Google’s Bard and Amazon’s Titan.

Such GAIs, with their ability to understand text requests and output competent answers, have the potential to be applied across the entire enterprise and public sector IT landscape, enabling fundamentally better search and analytics. Over the next decade their use threatens, or promises, to replace or improve the productivity of knowledge workers of all sorts, from call center and inside sales staff to accountants, lawyers, and financial analysts.

The fast-rising interest in GAIs will have an effect on the IT industry, and those effects are being researched and predicted by analyst and research houses such as Forrester, Gartner, and others. Parts of the data storage industry face strong sales benefits from GAI adoption, and we have attempted to catalog them.

A 126-page William Blair document, “Generative AI: The New Frontier of Automation”, provided much information for this effort.

Hardware

DRAM – More, more, and more will be needed for CPU/GPU servers running LLMs for training and inference, and that includes High Bandwidth Memory (HBM) for GPUs.

PCIe – PCIe 4.0 and 5.0 component suppliers should get ready for a surge in demand.

CXL – CXL 2.0 memory pooling should receive a gigantic shot in the arm from LLMs, and that includes CXL hardware suppliers of DRAM expanders, CXL switches, and components. Companies such as Micron, Samsung, SK hynix, and others should all benefit.

NAND and SSDs – More, more, and more will be needed, with an emphasis on NVMe access, PCIe 4.0 and 5.0 connectivity, plus a mix of performance and capacity. This suggests QLC and high layer-count TLC NAND will benefit. All the NAND fabs and SSD vendors should be focused on this market.

Storage arrays – The need will be for high capacity and high-speed IO. AI/ML clusters will need petabyte levels of capacity. LLM training runs will require high-speed dataset reading and checkpoint writing, as the sketch after this entry illustrates. This will need parallel access facilitated by hardware and software. LLM inferencing runs will need high read access rates with parallel data delivery paths to the processors.

We think this will primarily benefit file-access all-flash arrays using NVMe protocols and offering GPUDirect support for Nvidia GPU servers. Suppliers such as Dell, DDN, NetApp, Pure Storage (AIRI) and VAST Data are well-positioned to capitalize on this. Panasas sees an opportunity in edge AI. Object storage and block storage suppliers are not so well-positioned.

Vendors lacking GPUDirect support should, we think, pursue this with urgency.
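
To make the checkpoint-writing requirement concrete, here is a minimal sketch, assuming a PyTorch training loop; the model, optimizer, and /mnt/fast_fs path are hypothetical, and real LLM frameworks shard these multi-terabyte states across many parallel writers.

```python
# Minimal sketch of periodic checkpoint writing during LLM training (PyTorch).
# The model, optimizer, and /mnt/fast_fs path are hypothetical; real frameworks
# shard multi-terabyte state across many nodes writing in parallel.
import time
import torch

def save_checkpoint(model, optimizer, step, path="/mnt/fast_fs/ckpt"):
    state = {
        "step": step,
        "model": model.state_dict(),          # can run to hundreds of GB for large LLMs
        "optimizer": optimizer.state_dict(),  # often around twice the model size
    }
    t0 = time.time()
    torch.save(state, f"{path}/step_{step}.pt")
    # Slow storage here leaves GPUs idle, hence the call for high-bandwidth parallel writes.
    print(f"checkpoint at step {step} written in {time.time() - t0:.1f}s")

# Hypothetical training loop: stream the dataset in at high speed, checkpoint periodically.
# for step, batch in enumerate(dataloader):
#     loss = model(batch).loss
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
#     if step % 1000 == 0:
#         save_checkpoint(model, optimizer, step)
```

The faster an array can absorb these writes, the less GPU time is wasted waiting for checkpoints to complete.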

Software

CXL-focused software – Suppliers such as MemVerge and Unifabrix should expect to see a large and sustained rise in interest in their products.

Data analytics – Suppliers need to investigate adopting LLM front ends as a matter of urgency.

Databases, warehouses, and lakehouses – They need to support the vector embeddings needed by LLMs. Vector databases that already have this support will become more important. The need to investigate and trial chatbot front ends for their users is intense, as this will enable non-data scientists and users unskilled in SQL to run sophisticated analytics. They also have an opportunity to provide ETL (Extract, Transform, and Load) processes to pump selected data quickly out to LLMs for training and inference runs. See SingleStore and Databricks as examples.
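
As a rough illustration of the vector embedding support mentioned above, the sketch below stores embeddings as a NumPy matrix and answers a query by cosine similarity; the embed() function is a hypothetical stand-in for a real embedding model, and dedicated vector databases add approximate nearest-neighbor indexes (HNSW, IVF, and the like) to avoid this brute-force scan.

```python
# Brute-force vector similarity search: the core operation a vector database accelerates.
# embed() is a hypothetical stand-in for a real embedding model, used only to keep the
# sketch self-contained and runnable.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Hash-seeded random unit vector; a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

documents = ["Q3 revenue by region", "customer churn analysis", "warehouse stock levels"]
matrix = np.stack([embed(d) for d in documents])  # stored embeddings, one row per document

def top_k(query: str, k: int = 2):
    q = embed(query)
    scores = matrix @ q                     # cosine similarity, since vectors are unit length
    best = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in best]

print(top_k("revenue per region"))
```

A chatbot front end would turn a natural-language question into such a query (or into SQL), which is what lets non-specialists run sophisticated analytics.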

Data managers – They can benefit by applying LLM technology to analyze their data sets and by feeding LLM processes with data. See Cohesity as an example.

High-speed arrays – Suppliers may well find it worthwhile to port their software to the public clouds that will run GAI models. That way they can support customers who adopt a hybrid on-premises/public cloud approach to running LLMs.

Scale-out parallel filesystems – Suppliers such as IBM (Storage Scale) and WEKA are well placed as their existing customers adopt GAI technology and new customers look to them for fast, high-capacity file access software. These suppliers could be big winners.

Indirect beneficiaries and the unaffected

Cloud file services suppliers – They can use the datasets they store in the cloud to feed LLMs, but the data will need moving from their underlying object vaults to faster-access stores; some form of ETL, in other words. That is unless CSPs such as AWS, Azure, and Google find some GPUDirect-like way of pumping data from S3 and Azure Blobs to their GPU instances.
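
As a sketch of that ETL-style staging step, assuming boto3 and placeholder bucket, prefix, and NVMe path names, the snippet below copies objects from S3 onto local flash before a training or inference job reads them.

```python
# Sketch of staging cloud object data onto fast local NVMe before an LLM job reads it.
# Bucket, prefix, and local path are placeholders; a production pipeline would parallelize
# the transfers and verify checksums.
import os
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX, LOCAL_DIR = "example-dataset-bucket", "training/shards/", "/mnt/nvme/staged"

def stage_shards():
    os.makedirs(LOCAL_DIR, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            dest = os.path.join(LOCAL_DIR, os.path.basename(obj["Key"]))
            s3.download_file(BUCKET, obj["Key"], dest)  # object store -> local flash
    # The job then reads from LOCAL_DIR at NVMe speed rather than object-store latency.

stage_shards()
```

A GPUDirect-like path from S3 or Azure Blob straight to GPU memory would remove the need for this intermediate copy.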

Data orchestrators – They may get an indirect benefit if they can orchestrate data needed to feed LLMs.

Data protection and security – Vendors need to check out chatbot interfaces to their management facilities to better protect and secure datasets and identify vulnerabilities. Domain-specific chatbots could inspect an organisation’s attack surface and identify actions to protect it. Backup datasets could also feed LLMs, given suitable ETL processes.

Disk drive arrays – These products are too slow and can only be used as a second tier behind flash primary stores.

Lifecycle managers – Vendors need to research how chatbot interfaces could make their users more productive.

Security suppliers – Chatbots and other AI tech could make them more effective in detecting and responding to malware and in handling user interactions.

Software-defined storage – Suppliers face being left behind by the GAI tsunami unless they find some means of satisfying the high-performance access needed.

Storage admins – Chatbots could make them more productive or be used to enable less skilled staff to do more skilled work.

Tape systems – Archival systems are just too slow to feed data to LLMs, but still have their place.

Web3 – Such storage is going nowhere in an LLM world. It’s too slow.