It’s not either block storage or file/object storage for AI training and inference. It’s both.
AI large language model (LLM) training requires masses of data, stored as files and objects rather than blocks. The training process ingests this unstructured data, turning it into tokens and then vectors. Even AI inference with natural language input handles unstructured data, not block data.
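As a toy illustration of that unstructured-data path, the sketch below maps text to token IDs and then to embedding vectors. The vocabulary, tokenizer, and embedding table are stand-ins for illustration only, not any real model's pipeline.

```python
# Toy illustration: unstructured text -> tokens -> vectors.
# Vocabulary, tokenizer, and embedding table are hypothetical stand-ins.
import numpy as np

vocab = {"the": 0, "gpu": 1, "loads": 2, "data": 3, "<unk>": 4}

def tokenize(text: str) -> list[int]:
    """Map whitespace-split words to integer token IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

# A made-up embedding table: one 8-dimensional vector per token ID.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))

token_ids = tokenize("The GPU loads data")
vectors = embedding_table[token_ids]   # shape: (4 tokens, 8 dimensions)
print(token_ids, vectors.shape)
```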
How, then, does a block storage software supplier like Lightbits Labs make progress in the AI training market?
The GPU servers used in AI training actually need two kinds of data. One is the unstructured data used to train the LLMs. The other is the stored software that operates the GPU servers themselves, typically organized as Kubernetes pods (containers), which has to be loaded onto the servers from somewhere.
Think of VMware servers and their bare-metal ESXi hypervisor: the bootable ESXi software plus NSX networking, vSAN storage, and sundry drivers and firmware. This ESXi software bundle is packaged as an image, an ISO file used for installation and booting. A virtual machine disk file (VMDK) is the primary disk image format in VMware, containing the guest OS, applications, and data, in other words the “disk image” of the VM. It is held on storage media, such as a local SSD, and booted into life on the server.
Now translate this, roughly, into the Kubernetes world with container images. When a GPU server system can have hundreds, if not thousands, of GPUs, loading container images onto those servers becomes a huge job. To keep GPU downtime to a minimum, image loading should start quickly, meaning low latency, and complete quickly, meaning high throughput. Doing this through a file or object protocol interface adds latency and slows throughput compared to pulling this system-level data through a block interface.
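A rough way to see why both terms matter: the time to load one image is roughly time-to-first-byte plus image size divided by throughput, and the fleet is only as fast as its pulls. The sketch below is a back-of-envelope model; the latencies, throughputs, and image size are illustrative assumptions, not Lightbits or vendor figures.

```python
# Back-of-envelope model of container image load time across a GPU fleet.
# All figures are illustrative assumptions, not measured vendor numbers.

def image_load_seconds(image_gb: float, first_byte_latency_ms: float,
                       throughput_gbps: float) -> float:
    """Time to pull one image: time-to-first-byte plus transfer time."""
    transfer_s = (image_gb * 8) / throughput_gbps   # GB -> gigabits, then / Gbit/s
    return first_byte_latency_ms / 1000 + transfer_s

# Hypothetical comparison for a 20 GB image: a higher-latency, lower-throughput
# path versus a lower-latency, higher-throughput block path.
slow = image_load_seconds(image_gb=20, first_byte_latency_ms=50, throughput_gbps=5)
fast = image_load_seconds(image_gb=20, first_byte_latency_ms=1, throughput_gbps=40)

print(f"per-server load: slow path ~{slow:.1f}s, fast path ~{fast:.1f}s")
```

With pulls happening in parallel across thousands of servers, that per-server difference is roughly the difference in how long the whole fleet waits before work can start.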

Lightbits Labs CEO Eran Kirzner says the company is selling its software to deal with two main use cases: transitions from VMware to alternatives, and migration to Kubernetes. The latter, Kirzner says, applies 100 percent to AI workloads, cloud, and GPU cloud datacenters.
These transitions are happening in e-commerce and finance as well. Each has peaky workloads and requires the fastest possible response and the highest-performance data delivery to cope with peaks as they rise. When an e-commerce site heads into Black Friday, it needs software loaded onto its dynamically scalable estate of virtualized and/or containerized servers as near instantly as possible, so that it doesn’t lose a moment of trading time. Server unavailability translates directly into lost dollars.
In the AI training world, Lightbits software is serving images to GPU servers so that they can be switched from one training run to another with minimal downtime. These hyper-expensive processors waste a lot of money if they sit idle.
Lightbits’ market is not typically a VMware-type environment. Kirzner says: “Everything is OpenShift, KubeVirt, Kubernetes, and OpenStack,” but “we still have some large VMware customers. The beauty is that, on the same Lightbits cluster, you can run VMware and you can run a new (containerized) datacenter.”

The number of Kubernetes cores involved can be huge. Kirzner tells us: “One of our largest customers has more than two million Kubernetes cores connected to our environment. It’s pretty massive.” At 64 to 128 cores per node, that could mean between roughly 15,500 and 31,000 nodes, each of which has to have a container image moved onto it. The scale at the high end is super-large, but other smaller customers might have 10,000 or even just 1,000 cores.
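A quick calculation shows how that node estimate falls out of the core count; the cores-per-node figures are assumptions about typical server configurations, not customer data.

```python
# Rough node-count estimate from a Kubernetes core count.
# Cores-per-node values are assumptions (typical dense servers), not customer data.
total_cores = 2_000_000

for cores_per_node in (128, 64):
    nodes = total_cores // cores_per_node
    print(f"{cores_per_node} cores/node -> ~{nodes:,} nodes needing a container image")
# 128 cores/node -> ~15,625 nodes; 64 cores/node -> ~31,250 nodes
```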

Lightbits’ cloud service provider customers typically exhibit this pattern as they embrace AI. Kirzner says: “Almost every cloud provider we have is also going to have an AI cloud. And here, one of the important parts is not just performance. Our hall of fame is performance and latency, very high performance, very low latency. If you compare us to Ceph, we are going to be between 5x to 10x better performance, and the same 5x to 10x better latency.”
Apart from performance and latency, provisioning is another highly important aspect, Kirzner says. “I’ll explain why. Because when you need to provision 10,000 systems, 20,000 systems, and start to run a workload, you need all the images to fit in, and then to start simultaneously. If your workload is inference or your workload is training [then] this has to happen really, really fast, and we help some of the cloud providers to go from hours of provisioning to minutes of provisioning.” That’s hours of expensive GPU idle time cut down to minutes.
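To put that in money terms, here is a hedged back-of-envelope sketch. The fleet size, hourly GPU cost, and provisioning times are illustrative assumptions, not figures from Lightbits or its customers.

```python
# Illustrative idle-cost estimate for provisioning time; every input is an assumption.
gpus = 10_000                 # hypothetical size of the GPU fleet being provisioned
cost_per_gpu_hour = 2.50      # hypothetical all-in cost per GPU hour (USD)

def idle_cost(provisioning_hours: float) -> float:
    """Cost of the whole fleet sitting idle while images are provisioned."""
    return gpus * cost_per_gpu_hour * provisioning_hours

hours_path = idle_cost(3.0)          # e.g. three hours of provisioning
minutes_path = idle_cost(10 / 60)    # e.g. ten minutes of provisioning
print(f"~${hours_path:,.0f} idle vs ~${minutes_path:,.0f} idle, "
      f"saving ~${hours_path - minutes_path:,.0f} per provisioning cycle")
```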
Any large GPU farm will need block storage, like Lightbits software, to load the system and LLM images onto the GPU servers, and also file and/or object storage to hold the data that the LLMs will consume.