VAST may embrace GPU processing nodes


VAST Data has been promising a move up the stack from storage to better support AI/ML workloads for some time now. Nothing has been revealed, but clues have been dropped, and we think we can see GPUs playing a role in VAST's future.

The company supplies Universal Storage: a single tier of all-flash NVMe file-based storage accessed by stateless x86 compute nodes, each of which can reach every drive and see all the storage. This is its Disaggregated Shared Everything (DASE) architecture. The company believes that reinventing storage was its Act 1 and that moving beyond storage will be its Act 2.
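To make the shared-everything idea concrete, here is a toy Python sketch, wholly our own illustration and not VAST's implementation: every stateless compute node holds a reference to the same drive pool, so state lives in the drives and any node can read what any other node wrote.

```python
# Toy model of Disaggregated Shared Everything (DASE).
# All class and variable names are hypothetical illustrations.

class Drive:
    """An NVMe drive holding data blocks keyed by address."""
    def __init__(self, drive_id):
        self.drive_id = drive_id
        self.blocks = {}  # block address -> data

class CNode:
    """Stateless compute node: holds no data, sees every drive."""
    def __init__(self, name, shared_drives):
        self.name = name
        self.drives = shared_drives  # the same pool for every CNode

    def write(self, drive_id, addr, data):
        self.drives[drive_id].blocks[addr] = data

    def read(self, drive_id, addr):
        return self.drives[drive_id].blocks[addr]

# One shared pool of drives...
pool = {i: Drive(i) for i in range(4)}
# ...visible to every stateless compute node.
nodes = [CNode(f"cnode-{i}", pool) for i in range(3)]

# Any node can write; any other node can read the same block,
# because the state lives in the drives, not in the nodes.
nodes[0].write(drive_id=2, addr=0x10, data=b"hello")
assert nodes[1].read(drive_id=2, addr=0x10) == b"hello"
```

Losing a CNode in this model loses no data, which is the property that lets the compute layer scale independently of the drives.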

In November, co-founder and CMO Jeff Denworth referred to this in a blog, saying: “We still have so many advantages we plan to bring to the table in the form of new partners, new cloud investments and whole new product dimensions that we’re working hard to align to meet our customer needs.”

Co-founder and VAST Data CEO Renen Hallak said more about this on a SixFive Summit podcast with Futurum senior analyst Daniel Newman in June 2022: “As we look into the future, it’s not at all about storage; it’s working with our customers hand in hand, in order to build out the type of infrastructure that they require for these next-generation workloads.

“We want to be that middle of the sandwich, always leveraging the best and greatest hardware, like what we see from our friends at Nvidia as an example. But always also providing the application with the most convenient API, such that the application developers don’t need to think about what’s underneath them.”

The next-generation workloads are ones requiring instant access to all kinds of data – new, mid-term and historic – so that AI, ML, deep learning and other analysis routines can detect and use patterns in all of an organisation’s data and so, in theory, make better decisions.

Hallak said: “And so our customers have been pulling us in multiple directions along those lines. And we’ll be announcing a lot more about this later this year.”

Nothing has yet been announced, so things have been delayed. But we can deduce something about VAST’s product direction from this and from subsequent words spoken by Hallak: “But you can expect that we move into non-storage-related parts of the stack, we move horizontally across geographies, we move up the stack, in very interesting ways of doing data management and data understanding and training.”

“And up into the compute layer is also important to encapsulate everything that’s needed for these very, very large AI supercomputers that we’re in the process of deploying with our customers.”

Consider a simple AI stack view:

  • Storage Layer <— VAST
  • Data moving network <— VAST
  • Hardware processing layer – CPUs, GPUs
  • AI application software layer issuing API calls

VAST Data already has its internal RDMA network and its x86 compute nodes (CNodes), which can run containerized applications. It has integrated with Nvidia to support GPUDirect and so can ship data fast to Nvidia’s GPUs. How can VAST move up this stack?

The simple answer is by embracing GPUs and bringing them directly into its architecture as GNodes to provide an AI supercomputer. It can also add API-level interfaces to the common AI and ML applications. We asked a VAST source about this inclusion of GPUs and were told: “We talked early on about customers orchestrating the VAST containers (CNodes) and application containers on the same host. If those hosts had GPUs, BINGO, there’s your supercomputer. We don’t support this at the moment, but it could work.”
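As a hedged sketch of the co-location idea the source describes – entirely our own illustration, not VAST's orchestration logic – a scheduler would simply need to place the storage container and the application container on the same GPU-equipped host:

```python
# Hypothetical co-scheduling sketch: place a storage (CNode) container
# and a GPU application container on one host with enough GPUs.
# Names and structure are our own assumptions, not VAST's code.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    gpus: int
    containers: list = field(default_factory=list)

def co_schedule(hosts, storage_ctr, app_ctr, gpus_needed):
    """Return the first host with enough GPUs after placing both containers on it."""
    for host in hosts:
        if host.gpus >= gpus_needed:
            host.containers.extend([storage_ctr, app_ctr])
            return host
    return None  # no suitable host

hosts = [Host("host-a", gpus=0), Host("host-b", gpus=8)]
chosen = co_schedule(hosts, "vast-cnode", "training-job", gpus_needed=4)
assert chosen.name == "host-b"
assert "vast-cnode" in chosen.containers and "training-job" in chosen.containers
```

The point of the sketch is only that no new hardware abstraction is required: if the orchestrator treats GPUs as a schedulable host resource, co-locating CNode containers with GPU workloads falls out of ordinary placement logic.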

Then our source said: “And yes, we see those compute tasks sharing a more flexible pool of compute power, including GPUs in the future.”

We also think VAST will announce metro and then continental clustering so that separate and remote VAST storage repositories can talk to each other and possibly have a single metadata pool.

The company does not build servers, so there is an open possibility of it supplying its storage to server vendors such as Supermicro, Lenovo and others in an OEM or reselling relationship, with those vendors providing x86 and GPU servers as controller nodes atop VAST’s Ceres storage trays.

Lastly, Hallak said to Newman that VAST Data does not need thousands of customers to prosper: “We’re in a very fortunate position of not needing hundreds of thousands of customers in order to sustain the business; we do very large systems. And so we start at petabyte scale, and grow into exabyte scale. What that means for us is that a few hundred customers are enough for us, definitely at this stage, to get to those very high revenue levels.”

“And every single one of those customers is a design partner of ours, every single one of them wants interaction with R&D, wants to help shape our roadmap and the features that they will receive over time. And we love that because it means… it becomes very easy to prioritize and decide what to build. And we never build something that doesn’t get used by our customers.”

VAST, like every other storage supplier, faces the Sapphire Rapids adoption problem/opportunity, along with PCIe Gen 5 and, down the road, CXL memory pooling. We think that 2023 could be an exciting year for VAST, and one in which mainstream vendors start to publicly recognize that VAST is becoming a force in their markets.