VAST Data is announcing new AI database software as part of its upcoming mega data platform, revealing how data storage and the database itself have been converged into a single platform.
Its data platform suite, which has been compared to the concepts of former supercomputing outfit Thinking Machines, is built on its DataStore NAS, VastOS storage software, a certified single-tier QLC flash hardware base, and data catalog software.
The platform will ultimately be composed of its existing hardware, OS and Data Catalog. VAST is currently adding the DataBase part and developing the DataEngine part – the eventual compute layer. The “storage-database hybrid” platform will only be complete when the DataEngine software component arrives next year.
Renen Hallak, VAST CEO and co-founder, said: “We’ve been working toward this moment since our first days, and we’re incredibly excited to unveil the world’s first data platform built from the ground up for the next generation of AI-driven discovery.”
The VAST DataBase is a combined transactional and analytical database with a scalable, ACID-transactional distributed design and an exabyte-scale columnar data structure optimized for flash. VAST says it is architected for rapid data capture and combines the features of a standard database, data warehouse and data lake.
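To illustrate what a columnar layout means in practice, here is a minimal Python sketch in which each column is stored as its own list, so a row insert touches every column while an aggregate scans only one. It is a toy under stated assumptions, not VAST's design: the `ColumnTable` class and its methods are hypothetical, and it has no transactions, distribution, or flash-specific optimization.

```python
from typing import Dict, List


class ColumnTable:
    """Toy column-oriented table (hypothetical; illustrates layout only)."""

    def __init__(self, columns: List[str]) -> None:
        # One list per column: values of the same column sit together.
        self._columns: Dict[str, list] = {name: [] for name in columns}

    def insert(self, row: Dict[str, object]) -> None:
        # Validate first so a bad row never leaves columns at different lengths.
        if set(row) != set(self._columns):
            raise ValueError("row must supply exactly the table's columns")
        for name, value in row.items():
            self._columns[name].append(value)

    def get_row(self, index: int) -> Dict[str, object]:
        # Row-style lookup: reassemble one record from every column.
        return {name: values[index] for name, values in self._columns.items()}

    def column_sum(self, name: str) -> float:
        # Analytical-style query: scan a single column, ignore the others.
        return sum(self._columns[name])


table = ColumnTable(["sensor", "reading"])
table.insert({"sensor": "a", "reading": 1.5})
table.insert({"sensor": "b", "reading": 2.5})

print(table.get_row(1))
print(table.column_sum("reading"))
```

The aggregate reads only the `reading` column, which is the usual argument for columnar storage in analytical workloads.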
The company is developing VAST DataEngine by adding application triggers and Python-based functions natively into the VAST Data Platform. The intent is to make it a global function execution engine.
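As a rough illustration of application triggers paired with Python functions, the sketch below registers a function against an event type and runs it when that event is emitted. This is not VAST's API, which the announcement does not describe: `TriggerRegistry`, `on`, `emit`, the `object_created` event type, and the payload shape are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical sketch, not VAST's API: every name here is illustrative.
Handler = Callable[[Dict[str, Any]], Any]


class TriggerRegistry:
    """Maps event types to the functions that should run when they occur."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a function as a trigger for event_type."""
        def register(func: Handler) -> Handler:
            self._handlers[event_type].append(func)
            return func
        return register

    def emit(self, event_type: str, payload: Dict[str, Any]) -> List[Any]:
        """Run every function registered for event_type and collect results."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


registry = TriggerRegistry()


@registry.on("object_created")
def tag_new_object(event: Dict[str, Any]) -> str:
    # Assumed payload shape: a dict with a "path" key.
    return f"tagged:{event['path']}"


results = registry.emit("object_created", {"path": "/videos/clip.mp4"})
print(results)
```

In a real system, the result of one function could itself be written back as new data and emitted as a further event, which is one way to read the "recursive" processing VAST describes.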
This DataEngine will operate on the VAST DataStore with real-time streams of rich content, IoT data, and text. The software will make decisions by correlating metadata from across a VAST storage fleet, accessing data at all of the fleet’s global locations, including archive data.
VAST claims a global federation of machines will compute on a global collection of data to discover the greatest insights with the greatest infrastructure efficiency. There will be a global namespace, a DataSpace, “that permits every location to store, retrieve and process data from any location with high performance.”
That is, computation will run across the distributed VAST DataStore fleet rather than within a single datacenter.
That includes the public cloud, as the VAST OS is now available in AWS, Azure and Google Cloud.
The DataEngine software operates within the DataSpace to create a mesh of computational resources (CPUs, GPUs and DPUs) that can move the data to compute (when compute has greater gravity) or compute to data (when data has more gravity).
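The "gravity" trade-off can be sketched as a simple placement rule: run the job where the data lives if compute is free there, otherwise copy the data only if the transfer is cheap enough. The heuristic below is an assumption made for illustration, not VAST's scheduler; `Site`, `place_job`, the link speed and the transfer-time threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    data_bytes: int      # size of the dataset held at this site
    free_compute: bool   # whether suitable CPUs/GPUs/DPUs are idle here


def place_job(data_site: Site, compute_site: Site,
              link_bytes_per_s: float, max_transfer_s: float) -> str:
    """Hypothetical heuristic for choosing where a job should run."""
    # Data has more gravity: compute is available next to it, so stay put.
    if data_site.free_compute:
        return f"run at {data_site.name} (compute moves to data)"
    # Compute has more gravity: move the data if the copy is quick enough.
    transfer_s = data_site.data_bytes / link_bytes_per_s
    if compute_site.free_compute and transfer_s <= max_transfer_s:
        return f"copy data to {compute_site.name} (data moves to compute)"
    # Neither option is acceptable right now.
    return f"queue at {data_site.name} until compute frees up"


gpu_cluster = Site("gpu-cluster", data_bytes=0, free_compute=True)
edge = Site("edge", data_bytes=10**9, free_compute=False)
archive = Site("archive", data_bytes=50 * 10**12, free_compute=False)

# 1 GB over a 1 GB/s link takes about a second, so the data moves.
print(place_job(edge, gpu_cluster, link_bytes_per_s=1e9, max_transfer_s=3600))
# 50 TB over the same link takes roughly 14 hours, so the job waits.
print(place_job(archive, gpu_cluster, link_bytes_per_s=1e9, max_transfer_s=3600))
```

A production scheduler would weigh far more than transfer time, such as cost, data locality rules and hardware type, but the two branches mirror the two directions of movement described above.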
As our sister publication The Next Platform notes, AI workloads need enormous amounts of data to build models, an enormous amount of compute to run inference on new data as it enters the model, plus a lot of performance grunt, and “all of this puts tremendous pressure on the storage system to deliver information.” VAST Data says its Universal Storage, a disaggregated shared-everything implementation of NFS that has a very fine-grained quasi-object store underneath it, can handle this.
VAST says its DataStore will understand natural data by embedding a queryable semantic layer into the data itself. It will continuously and recursively compute on data in real time, evolving with each interaction.
Hallak said: “Encapsulating the ability to create and catalog understanding from natural data on a global scale, we’re consolidating entire IT infrastructure categories to deliver a recursively computing thinking machine to drive new discoveries that were previously unthinkable. With the VAST Data Platform, we are democratizing the data infrastructure of artificial intelligence to reimagine a world that discovers at an ever-accelerating pace.”
Hallak believes a future AI system might go further in synthesizing and learning from data than today’s large language models. This will need a platform that can ingest “the entire data spectrum of natural data – unstructured and structured data types in the form of video, imagery, free text, instrument data” which will be generated from all over the world and processed using real-time inference and constant, recursive AI model training. This is where the functions and application triggers come in, as the VAST system generates new data which is used to trigger fresh processing routines.
The company’s DataBase product is available now, with some customers already using it in production. The DataEngine will become available in the early part of 2024.