Kioxia open sources AiSAQ tech to cut DRAM use in RAG

Kioxia AiSAQ technology, designed to reduce DRAM requirements in generative AI systems, was this week released as open source software.

AiSAQ, short for “all-in-storage ANNS with product quantization,” provides an approximate nearest neighbor search (ANNS) algorithm optimized for SSDs. The software delivers scalable performance for retrieval-augmented generation (RAG) without placing index data in DRAM, searching directly on SSDs instead.
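
The “product quantization” in the name refers to a standard technique for compressing vectors into compact codes. The Python sketch below is a rough illustration of that idea only, not Kioxia’s implementation: each vector is split into sub-vectors, and each sub-vector is replaced by the ID of its nearest codebook entry, shrinking the index enough that keeping it on an SSD becomes practical. The dataset size, dimensions, and codebook sizes are arbitrary demo values.

```python
# Toy product quantizer (illustrative only, not Kioxia's AiSAQ code).
# Idea: split each vector into sub-vectors, learn a small codebook per
# sub-space, and store only the compact codes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 128)).astype(np.float32)  # toy dataset

M, K = 8, 256                      # 8 sub-spaces, 256 centroids each -> 8 bytes/vector
sub_dim = vectors.shape[1] // M

codebooks, codes = [], []
for m in range(M):
    sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
    km = KMeans(n_clusters=K, n_init=1, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)          # centroids kept for decoding
    codes.append(km.labels_.astype(np.uint8))      # one byte per sub-vector

codes = np.stack(codes, axis=1)                    # (1000, 8) compact codes
print(codes.nbytes, "bytes instead of", vectors.nbytes)
```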

Generative AI systems demand significant compute, memory, and storage resources. “While they have the potential to drive transformative breakthroughs across various industries, their deployment often comes with high costs,” said Kioxia. RAG is a critical phase of AI deployment that refines large language model (LLM) output with data specific to the company or application.

A central component of RAG is a vector database that accumulates and converts specific data into feature vectors for retrieval. RAG also utilizes an ANNS algorithm, which identifies vectors that improve the model based on similarity between the accumulated and target vectors. “For RAG to be effective, it must rapidly retrieve the information most relevant to a query,” said Kioxia.
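
As a rough illustration of that retrieval step (not Kioxia’s code), the sketch below embeds a handful of documents, stores their vectors, and returns the document whose vector is most similar to the query vector. The embed() function here is a hypothetical stand-in for a real embedding model, and a production system would replace the brute-force scan with an ANNS index.

```python
# Minimal RAG retrieval sketch: embed documents, find the nearest vector to a query.
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical embedding: a deterministic pseudo-random vector, demo only.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

documents = ["company travel policy", "Q3 sales figures", "SSD firmware notes"]
index = np.stack([embed(d) for d in documents])   # the "vector database"

query = embed("what are the latest sales numbers?")
scores = index @ query                            # cosine similarity (unit vectors)
best = documents[int(np.argmax(scores))]
print("most relevant document:", best)
```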

Traditionally, ANNS algorithms are deployed in DRAM to achieve the high-speed performance required for these searches. But Kioxia AiSAQ technology provides a “scalable and efficient” ANNS solution for billion-scale datasets with “negligible” memory usage and “fast” index switching capabilities, Kioxia said.

Kioxia AiSAQ slide

AiSAQ’s key benefits include allowing large-scale databases to operate without relying on limited DRAM resources, which enhances the performance of RAG systems. Because index data does not need to be loaded into DRAM, the vector database can launch instantly, which in turn supports switching between user-specific or application-specific databases on the same server for efficient RAG service delivery.
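
To illustrate in principle what serving an index from storage rather than DRAM can look like (a generic sketch, not AiSAQ’s on-disk format), the example below memory-maps a vector file so nothing is loaded up front and pages are fetched from the SSD only when a query touches them. The file name and layout are invented for the demo.

```python
# Serving vector search from a memory-mapped file so the index stays on the SSD.
import numpy as np

dim, n = 128, 100_000
index_path = "vectors.f32"                       # hypothetical on-disk index file

# One-time build step: write vectors straight to disk.
rng = np.random.default_rng(1)
rng.normal(size=(n, dim)).astype(np.float32).tofile(index_path)

# Query time: map the file instead of reading it into memory; pages are
# pulled from the SSD on demand, so startup is effectively instant.
index = np.memmap(index_path, dtype=np.float32, mode="r", shape=(n, dim))
query = rng.normal(size=dim).astype(np.float32)
scores = index @ query                           # brute-force scan for the demo
print("best match id:", int(np.argmax(scores)))
```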

Axel Stoermann, Kioxia

AiSAQ is also optimized for cloud systems, storing indexes in disaggregated storage so they can be shared across multiple servers. This approach lets vector database search performance be tuned for specific users or applications, and facilitates migrating search instances between physical servers.

“Our AiSAQ solution paves the way for almost infinite scaling of RAG applications in generative AI systems based on flash-based SSDs at the core,” said Axel Stoermann, chief technology officer and VP at Kioxia Europe. “By utilizing SSD-based ANNS, we are reducing the reliance on costly DRAM, while matching the performance needs of leading in-memory solutions, enhancing the performance range of large-scale RAG applications significantly.”