Panmnesia speeds up vector search with CXL

Panmnesia claims to have devised CXL-based vector search methods up to 110x faster than current best practices.

The Korean startup has developed CXL-ANNS (CXL Approximate Nearest Neighbor Search), an AI vector search system that loads billion-point vector search datasets into CXL 3.0 shared memory and then ensures the search paths are aligned with local, prefetched parts of the dataset. Vector search is used in search engines, data mining, databases and recommendation engines, with vector embeddings representing an item numerically by scoring it on up to thousands of dimensions.

Myoungsoo Jung, Panmnesia

Panmnesia CEO Myoungsoo Jung said: “CXL-ANNS establishes a new standard in crafting advanced systems specifically designed for representative datacenter applications, wholly harnessing the advantage of CXL technology. We foresee our trailblazing research sparking innovation within the community and stimulating the expansion of the CXL ecosystem.”

Web search has progressed from simple keyword lookup to users asking detailed questions and receiving more precise answers, or uploading images and retrieving similar ones. These searches rely on vector embeddings: an embedding is generated from the input data and then used to find the desired results, namely the items whose embeddings are nearest to that of the input.
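As a rough illustration of the principle (a sketch, not Panmnesia's code), finding the items whose embeddings are nearest to a query's can be done by scoring every item against the query. At billion-vector scale this exhaustive scan is far too slow, which is exactly what approximate methods like ANNS exist to avoid:

```python
# Illustrative sketch only, not Panmnesia's code: brute-force
# nearest-neighbor search over embeddings via cosine similarity.
import numpy as np

def nearest_neighbors(query, embeddings, k=5):
    """Return indices of the k embeddings most similar to the query."""
    q = query / np.linalg.norm(query)                    # unit-length query
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                                         # one similarity per item
    return np.argsort(-sims)[:k]                         # k highest scores

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 128)).astype(np.float32) # toy 128-dim embeddings
query = rng.normal(size=128).astype(np.float32)
print(nearest_neighbors(query, corpus, k=3))
```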

Panmnesia says each embedding vector is composed of hundreds of numerical values. The vectors are cataloged by a proximity graph, built with the vectors as its nodes, to speed up search and return results in milliseconds. But this needs a lot of memory in the host search server. Microsoft has disclosed that its search engine manages over 100 billion embedding vectors, potentially consuming more than 40 terabytes of memory. Capacity is stretched by compressing the vectors, and augmented by storing the vectors and the proximity graph on SSDs, which are slower to access than DRAM.
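A proximity-graph index answers a query with a greedy, best-first walk from an entry node toward the query rather than the exhaustive scan above. The sketch below shows the generic pattern used by graph ANN indexes such as HNSW; it is not the CXL-ANNS code, and the opening comment spells out why capacity becomes the problem at Microsoft's scale:

```python
# Generic greedy proximity-graph search (a sketch, not CXL-ANNS itself).
# Why capacity bites: 100 billion vectors x 100 float32 dims x 4 bytes
# is 4e13 bytes = 40 TB, before counting the graph's adjacency lists.
import heapq
import numpy as np

def greedy_graph_search(query, vectors, neighbors, entry, k=10, beam=32):
    """Best-first traversal: repeatedly expand the closest known node."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))
    visited = {entry}
    candidates = [(dist(entry), entry)]        # min-heap: nodes to expand
    results = [(-dist(entry), entry)]          # max-heap (negated): best so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= beam and d > -results[0][0]:
            break                              # nothing closer is reachable
        for nb in neighbors[node]:             # follow the graph's edges
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(candidates, (dist(nb), nb))
                heapq.heappush(results, (-dist(nb), nb))
                if len(results) > beam:
                    heapq.heappop(results)     # keep only the `beam` best
    return sorted((-nd, i) for nd, i in results)[:k]
```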

CXL-ANNS capitalizes instead on a CXL-based disaggregated memory pool, offering up to 4PB per host CPU root complex, roughly a hundred times the 40TB footprint Microsoft cites. Panmnesia says this effectively provides infinite memory for vector search because its architecture is scalable, linking multiple memory expanders to the CPU via a CXL switch.

That fixes the memory capacity problem but introduces latency of its own, because each request to the CXL memory space requires a data transfer across the CXL interconnect. This adds latency equal to or greater than that of a DRAM access itself. Panmnesia's researchers have devised three ways to limit it.

First, CXL-ANNS places the most frequently accessed nodes of the proximity graph in local DRAM. This minimizes CXL memory accesses during vector indexing, so reducing the additional latency caused by CXL.
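A minimal sketch of the idea, assuming the hottest nodes are those near the graph's entry point (they lie on almost every search path); the class and the `cxl_pool` interface are hypothetical stand-ins, not Panmnesia's API:

```python
# Sketch of tiered node placement: hot graph nodes pinned in local DRAM,
# the long tail served from the CXL memory pool. The cxl_pool object is
# hypothetical; it stands in for loads that cross the CXL interconnect.
class TieredGraphStore:
    def __init__(self, cxl_pool, hot_node_ids):
        self.cxl_pool = cxl_pool
        # Copy the frequently traversed nodes into local DRAM once,
        # at index-build time.
        self.dram_cache = {nid: cxl_pool.read_node(nid) for nid in hot_node_ids}

    def get_node(self, nid):
        node = self.dram_cache.get(nid)
        if node is not None:
            return node                       # DRAM hit: no CXL round trip
        return self.cxl_pool.read_node(nid)   # slower path across the CXL link
```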

Secondly, the CXL-ANNS hardware includes a domain-specific accelerator (DSA) in its memory controller. This analyzes the embedding vector and generates a smaller result indicating how well the vector matches the user's intent. The compact result is sent to the CPU instead of the original vector, reducing data movement overhead.
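Conceptually, the accelerator turns a "fetch the whole vector" operation into a "fetch one number" operation. This sketch mimics that division of labor in software; the function names and layout are illustrative assumptions, not the DSA's interface:

```python
# Sketch of near-memory filtering: instead of moving a ~400-byte vector
# across the CXL link for every candidate, the expander-side accelerator
# computes the distance and ships back a single scalar. Names are illustrative.
import numpy as np

def dsa_distance(expander_vectors, node_id, query):
    """Runs on the CXL expander's accelerator: returns only a distance."""
    vec = expander_vectors[node_id]               # data stays on the expander
    return float(np.linalg.norm(vec - query))     # one scalar crosses the link

def host_rank_candidates(expander_vectors, candidate_ids, query, k):
    """Runs on the host CPU: ranks candidates using the compact results."""
    scored = [(dsa_distance(expander_vectors, nid, query), nid)
              for nid in candidate_ids]
    return sorted(scored)[:k]                     # only IDs and distances moved
```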

Thirdly, search time is cut by parallelizing search operations and by using caching and prefetching to ensure that needed data is local to each search instance.
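One generic way to overlap that work (a sketch under assumptions, not Panmnesia's scheduler) is to score a whole frontier of candidates in parallel while the neighbor lists the next hop will need are fetched in the background, reusing the hypothetical `store` and distance function from the sketches above:

```python
# Sketch of parallel, prefetched traversal: distances for the whole frontier
# are computed concurrently, and neighbor lists of the most promising nodes
# are fetched before the next hop needs them. The node record's neighbor_ids
# attribute is an assumption for illustration.
from concurrent.futures import ThreadPoolExecutor

def expand_frontier(store, frontier, query, distance, width=4):
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Score every frontier node in parallel (one task per candidate).
        dists = list(pool.map(lambda nid: distance(nid, query), frontier))
        ranked = [nid for _, nid in sorted(zip(dists, frontier))]
        # Prefetch neighbor lists for the best nodes while ranking completes.
        fetched = pool.map(store.get_node, ranked[:width])
        next_frontier = [nb for node in fetched for nb in node.neighbor_ids]
    return next_frontier
```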

Panmnesia research team member Miryeong Kwon said: “CXL-ANNS amplifies the speed of query processing by 111.1 times, surpassing current methodologies, including Microsoft’s production service.”

The company says it has a complete system prototype based on actual hardware and software. Its work will be showcased at the USENIX Annual Technical Conference 2023 in Boston, USA, this July. The research findings are documented in a paper titled “CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search.”

CXL software development is moving fast enough that a software ecosystem should be in place to take immediate advantage of CXL 3.0 hardware when it arrives.