VAST Data adds vector search and deepens Google Cloud ties

VAST Data has added vector search to its database and integrated its software more deeply into Google’s cloud.

The database is part of its software stack layered on top of its DASE (Disaggregated Shared Everything) storage foundation, along with the Data Catalog, DataSpace, unstructured DataStore and DataEngine (InsightEngine). Generative AI large language models (LLMs) manipulate and process data indirectly, using numerical representations – vector embeddings, or simply vectors – that encode many dimensions of an item. In text documents, words are first reduced to an intermediate abstraction, the token. Tokens are vectorized, and a document item's vectors are stored in a multi-dimensional space; the LLM searches this space for similar vectors as it computes the steps of its response to a user request. This is called semantic search.
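The idea can be illustrated in a few lines. This is a minimal sketch using toy four-dimensional embeddings (real embedding models produce hundreds or thousands of dimensions); the document names and vectors are invented for illustration:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for three documents.
docs = {
    "cat care":   np.array([0.9, 0.1, 0.0, 0.2]),
    "dog care":   np.array([0.8, 0.2, 0.1, 0.3]),
    "tax filing": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_vec, store):
    # Rank stored items by similarity to the query vector.
    return sorted(store, key=lambda k: cosine_similarity(query_vec, store[k]),
                  reverse=True)

# A query embedded near the "cat care" region of the space.
query = np.array([0.9, 0.1, 0.0, 0.2])
print(semantic_search(query, docs)[0])  # → cat care
```

Nearest-neighbour ranking over embeddings is what "semantic" means here: items are matched by meaning-derived proximity in the vector space, not by keyword overlap.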

A VAST Data blog by Product Marketing Manager Colleen Quinn says: “Vector search is no longer just a lookup tool; it’s becoming the foundation for real-time memory, context retrieval, and reasoning in AI agents.”

Vectors are stored by specialized vector database suppliers – think Pinecone, Weaviate and Zilliz – and are also being added as a data type by existing database suppliers. Quinn says that the VAST Vector Search engine “powers real-time retrieval, transactional integrity, and cross-modal governance in one platform without creating new silos.” 

In the VAST world, there is a single query engine, which can handle SQL and vector and hybrid queries. It queries VAST’s unstructured DataStore and the DataBase, where vectors are now a standard data type. Quinn says: “Vector embeddings are stored directly inside the VAST DataBase, alongside traditional metadata and full unstructured content to enable hybrid queries across modalities, without orchestration layers or external indexes.”

“This native integration enables agentic systems to retrieve memory, reason over metadata, and act – all without ETL pipelines, external indexes, or orchestration layers.”
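The hybrid-query idea – a structured metadata predicate and a vector similarity ranking resolved in one pass over the same store – can be sketched generically. This is not VAST's API or schema; the records, fields, and helper below are invented for illustration:

```python
import numpy as np

# Hypothetical rows holding structured metadata alongside an embedding,
# mimicking a table where vectors are an ordinary column type.
rows = [
    {"id": 1, "type": "pdf",   "year": 2024, "vec": np.array([1.0, 0.0])},
    {"id": 2, "type": "image", "year": 2023, "vec": np.array([0.0, 1.0])},
    {"id": 3, "type": "pdf",   "year": 2023, "vec": np.array([0.9, 0.1])},
]

def hybrid_query(rows, query_vec, predicate, k=2):
    # SQL-style filter first, then rank the survivors by vector similarity.
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: float(np.dot(r["vec"], query_vec)),
                    reverse=True)
    return [r["id"] for r in candidates[:k]]

# Roughly: WHERE type = 'pdf' ORDER BY similarity DESC LIMIT 2
print(hybrid_query(rows, np.array([1.0, 0.0]), lambda r: r["type"] == "pdf"))
# → [1, 3]
```

When the metadata and the vectors live in separate systems, this filter-then-rank step requires shipping candidate sets between them; holding both in one table is what removes the orchestration layer.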

“The system uses sorted projections, precomputed materializations, and CPU fallback paths to maintain sub-second performance – even at trillion-vector scale. And because all indexes live with the data, every compute node can access them directly, enabling real-time search across all modalities – text, images, audio, and more – without system sprawl or delay.”

“At query time, VAST compares the input vector to all stored vectors in parallel. This process uses compact, columnar data chunks to prune irrelevant blocks early and accelerate retrieval.”
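Pruning irrelevant blocks from a parallel scan generally relies on cheap per-chunk summaries consulted before the chunk itself is read. The sketch below uses per-chunk column maxima as an upper bound on any dot product with a non-negative query; this illustrates the pruning principle only, as VAST's actual layout and bounds are not public:

```python
import numpy as np

# Five chunks of four 3-dimensional vectors, kept non-negative so the
# per-chunk column maximum bounds any dot product with a non-negative query.
rng = np.random.default_rng(0)
chunks = [np.abs(rng.normal(size=(4, 3))) for _ in range(5)]
chunk_max = [c.max(axis=0) for c in chunks]  # precomputed summary per chunk

def search(query, threshold):
    hits, scanned = [], 0
    for c, cmax in zip(chunks, chunk_max):
        # If even the best possible score in this chunk misses the
        # threshold, skip the chunk without reading its vectors.
        if float(np.dot(cmax, query)) < threshold:
            continue
        scanned += 1
        scores = c @ query  # scan only the surviving chunk
        hits.extend(scores[scores >= threshold].tolist())
    return hits, scanned

query = np.abs(rng.normal(size=3))
hits, scanned = search(query, threshold=2.0)
```

Because the bound is conservative, pruning never drops a qualifying vector; it only avoids scanning chunks that cannot contain one, which is where the speedup at large scale comes from.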

“Future capabilities will expand beyond vector search, enabling new forms of hybrid reasoning, structured querying, and intelligent data pipelines.” Think multi-modal pipelines and intelligent data preparation.

Google Cloud

Building on its April 2024 announcement that it had ported its Data Platform software to Google’s cloud, enabling users to spin up VAST clusters there, VAST has now gone further. It says its Data Platform “is fully integrated into Google Cloud – offering a unified foundation for training, retrieval-augmented generation (RAG), inference, and analytics pipelines that span across cloud, edge, and on-premises environments.”

Renen Hallak, VAST founder and CEO, spoke of a “leap forward,” stating: “By combining the elasticity and reach of Google Cloud with the intelligence and simplicity of the VAST Data Platform, we’re giving developers and researchers the tools they need to move faster, build smarter, and scale without limits.”

The additional VAST facilities now available on GCP include:

  • InsightEngine enabling developers and researchers to run data-centric AI pipelines – such as RAG, preprocessing, and indexing – natively at the data layer.
  • DataSpace with its exabyte-scale global namespace which connects data on-premises, at the edge, and in Google Cloud as well as other hyperscalers for data access and mobility.
  • Unified file (NFS, SMB), object (S3), block, and database access.

VAST says customers can run AI, ML, and analytics initiatives without operational overhead and unify their AI training, RAG pipelines, high-throughput data processing, and unstructured data lakes on its single, high-performance platform.

The base VAST software has already been ported to AWS, with v5.2 available in the AWS Marketplace. We understand v5.3 is the latest version of VAST’s software. 

There is limited VAST availability on the Azure Marketplace, where “VAST’s virtual appliances on Azure allow customers to deploy VAST’s disaggregated storage processing from the cloud of their choice. These containers are free of charge and customers interested in deploying Universal Storage should contact VAST Data to get their capacity under management. This product is available as a Directed Availability release.”

Comment

With its all-in-one storage and AI stack, VAST Data is becoming the equivalent of a software AI system infrastructure mainframe environment, built from modular storage hardware boxes and NVMe RDMA links to x86 and GPU compute, not forgetting Arm (BlueField). Both compute and storage hardware are commodities for VAST. But the software is far from a commodity. It is VAST's core proprietary IP, being developed and extended at a high rate, with a promise of being uniformly available across the on-premises environment and the AWS, Azure, and Google clouds. For better or worse, as far as we are aware, no other storage or data infrastructure company is working on such a broad and deep AI stack at the same pace.