DataStax is using Nvidia’s NIM inference and NeMo Retriever microservices to generate RAG vector embeddings faster for Astra DB.
DataStax provides Astra DB – a scale-out, cloud-native, Cassandra-based NoSQL database that incorporates vector embeddings – as a service. Vector embeddings are encoded, multi-dimensional representations of text, audio, and images used for similarity-based semantic search in generative AI large language models (LLMs). RAGStack provides a RAG capability to Astra DB via a LlamaIndex partnership. RAG stands for retrieval-augmented generation; it adds an organization’s own data to LLM responses to help prevent Gen AI hallucinations.
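To make similarity-based semantic search concrete, here is a minimal sketch in plain Python. It is not Astra DB or NeMo code: the three-dimensional vectors, the document titles, and the `nearest` helper are all hypothetical, and real embedding models emit hundreds or thousands of dimensions. It only illustrates the general idea of ranking stored embeddings by cosine similarity to a query embedding.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
documents = {
    "cat care guide": [0.9, 0.1, 0.0],
    "dog training tips": [0.7, 0.3, 0.1],
    "quarterly tax filing": [0.0, 0.2, 0.9],
}

def nearest(query_vector, docs, top_k=2):
    """Return the top_k document names most similar to the query vector."""
    ranked = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# A query embedding that sits close to the two pet-related documents.
query = [0.85, 0.15, 0.05]
results = nearest(query, documents)
print(results)
```

A production vector database would replace the exhaustive sort with an approximate nearest-neighbor index, but the ranking principle is the same.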
DataStax chairman and CEO Chet Kapoor explained in a statement: “RAG has emerged as the pivotal differentiator for enterprises building genAI applications with popular large language frameworks. … Integrating Nvidia NIM into RAGStack cuts down the barriers enterprises are facing to bring them the high-performing RAG solutions they need to make significant strides in their Gen AI application development.”
Organizations building generative AI applications need to vectorize existing and newly acquired unstructured data for use with LLMs. The embeddings need to be generated in near-real time from newly arrived data and added to the database index.
NeMo Retriever generates over 800 embeddings per second per GPU, and pairs with Astra DB, which can ingest new embeddings at more than 4,000 transactions per second with single-digit millisecond latencies on industry-standard storage hardware. With embedded inferencing built on Nvidia’s NeMo and Triton Inference Server software, Astra DB vector performance on RAG use cases running on Nvidia H100 GPUs achieved 9.48 ms latency when embedding and indexing documents.
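As a rough back-of-the-envelope reading of those two throughput figures, the sketch below treats the quoted numbers as exact rates, although the announcement gives them only as lower bounds ("over 800", "more than 4,000"). It also assumes one embedding per ingest transaction and linear scaling across GPUs, neither of which is stated in the announcement; the function names are hypothetical.

```python
import math

# Figures quoted in the announcement, treated here as exact rates.
EMBEDDINGS_PER_SEC_PER_GPU = 800
INGEST_TRANSACTIONS_PER_SEC = 4000

def gpus_to_saturate_ingest(embed_rate=EMBEDDINGS_PER_SEC_PER_GPU,
                            ingest_rate=INGEST_TRANSACTIONS_PER_SEC):
    """GPUs needed for embedding output to match the ingest rate.

    Assumes one embedding per transaction (an assumption, not a stated fact).
    """
    return math.ceil(ingest_rate / embed_rate)

def seconds_to_embed(num_docs, gpus=1, embed_rate=EMBEDDINGS_PER_SEC_PER_GPU):
    """Time to embed num_docs documents, assuming linear scaling across GPUs."""
    return num_docs / (gpus * embed_rate)

print(gpus_to_saturate_ingest())               # GPUs to keep the database busy
print(seconds_to_embed(1_000_000))             # one GPU, one million documents
print(seconds_to_embed(1_000_000, gpus=5))     # five GPUs, same workload
```

Under these assumptions, embedding generation rather than database ingestion is the bottleneck until about five GPUs are working in parallel.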
DataStax claims that, with this integration, users will be able to create instantaneous vector embeddings 20x faster than other popular cloud embedding services and benefit from an 80 percent reduction in cost for services.
Enterprises can validate the availability and performance of various combinations of embedding and LLM models for common RAG pipelines by using a RAGStack compatibility matrix tester.
DataStax is also launching, in developer preview, a new feature called Vectorize. This performs embedding generation at the database tier, enabling users to have Astra DB generate embeddings using its own NeMo microservices instance, with the cost savings passed directly to the customer.
Nvidia’s Kari Briski, VP of AI software, said: “Enterprises are looking to leverage their vast amounts of unstructured data to build more advanced generative AI applications. Using the integration of Nvidia NIM and NeMo Retriever microservices with the DataStax Astra DB, businesses can significantly reduce latency and harness the full power of AI-driven data solutions.”
Read a DataStax vector search explainer here.
Bootnote.
Triton is Nvidia’s open source software for running inference workloads on GPUs or CPUs.