DataStax goes vector searching with Astra DB

DataStax has added vector search capabilities to its Astra DB database-as-a-service (DBaaS) in response to growing demand for data stores that can support generative AI workloads.

The foundation of vector search lies in vector embeddings: numerical representations of many kinds of data, including text, images, audio, and video, expressed as arrays of floating-point numbers.

DataStax provides an API that feeds text data to a neural network, which transforms the input into a fixed-length vector. Search inputs that closely match existing database entries produce vectors in close geometric proximity to those entries' embeddings, while dissimilar inputs land further apart.
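To make the idea concrete, here is a minimal, hypothetical sketch. The embed() function below is a stand-in, not DataStax's actual API, and its toy hash-based vectors do not capture semantic similarity; the point is only to show the mechanics of ranking stored embeddings by cosine similarity to a query vector.

```python
# Illustrative sketch only: embed() is a stand-in for whatever model or API
# produces the fixed-length vectors; it is not DataStax's actual interface.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical embedding function returning a fixed-length float vector.
    A real system would call a neural network such as a sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # deterministic per text, demo only
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit-normalise so a dot product equals cosine similarity

# A tiny "database" of stored embeddings keyed by the original text.
documents = ["shipping policy", "refund policy", "vector search overview"]
index = {doc: embed(doc) for doc in documents}

def search(query: str, k: int = 2):
    """Rank stored documents by cosine similarity to the query embedding."""
    q = embed(query)
    scored = [(float(np.dot(q, vec)), doc) for doc, vec in index.items()]
    return sorted(scored, reverse=True)[:k]

print(search("how do I get a refund?"))
```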

Ed Anuff, DataStax chief product officer, emphasized the significance of vector databases in enabling companies to transform the potential of generative AI into sustainable business initiatives. “Databases that support vectors – the ‘language’ of large language models – are crucial to making this happen.”

He said massive-scale databases are needed because “an enterprise will need trillions of vectors for generative AI so vector databases must deliver limitless horizontal scale. Astra DB is the only vector database on the market today that can support massive-scale AI projects, with enterprise-grade security, and on any cloud platform. And, it’s built on the open source technology that’s already been proven by AI leaders like Netflix and Uber.”

Astra DB is available on major cloud platforms, including AWS, Azure, and GCP, and has been integrated into LangChain, an open source framework for building applications powered by the large language models at the heart of generative AI.
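As an illustration of what that integration might look like, the sketch below wires Astra DB into LangChain via its Cassandra vector store. This is an assumption-laden example, not DataStax's reference code: exact class and parameter names vary across LangChain versions, and the secure connect bundle, token, keyspace, and table name shown are placeholders.

```python
# Hedged sketch: assumes LangChain's Cassandra vector store integration and the
# cassandra-driver; bundle path, token, keyspace, and table name are placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Cassandra

# Connect to Astra DB using a secure connect bundle and an application token.
cluster = Cluster(
    cloud={"secure_connect_bundle": "secure-connect-my-db.zip"},
    auth_provider=PlainTextAuthProvider("token", "ASTRA_DB_APPLICATION_TOKEN"),
)
session = cluster.connect()

# Wrap an Astra DB table as a LangChain vector store; embeddings are computed
# by the configured embedding model when texts are added or queried.
vector_store = Cassandra(
    embedding=OpenAIEmbeddings(),
    session=session,
    keyspace="demo_keyspace",
    table_name="product_docs",
)

vector_store.add_texts(["Astra DB now supports vector search."])
results = vector_store.similarity_search("Which database supports vector search?", k=3)
for doc in results:
    print(doc.page_content)
```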

Matt Aslett, VP and Research Director, Ventana Research, said in DataStax’s announcement: “The ability to trust the output of generative AI models will be critical to adoption by enterprises. The addition of vector embeddings and vector search to existing data platforms enables organizations to augment generic models with enterprise information and data, reducing concerns about accuracy and trust.”

Astra DB adheres to the PCI Security Council’s standards for payment protection and safeguards Protected Health Information (PHI) and Personally Identifiable Information (PII), we’re told.

Other database and lakehouse providers – such as SingleStore, Databricks, Dremio, Pinecone, Zilliz, and Snowflake – are also actively supporting vector embeddings, demonstrating the growing demand for these features in the generative AI data storage landscape.

Try DataStax’s vector search here (registration required) and download a white paper on Astra DB vector search here (more registration required).

Additionally, customers using DataStax Enterprise, the company’s on-premises, self-managed offering, will get access to vector search in the coming month.