Redis has announced the Vector Sets data type and fully managed LangCache semantic caching service for faster and more effective GenAI model development and execution.
Redis (Remote Dictionary Server) is the developer and supplier of the eponymous real-time, in-memory key-value store, usable as a cache, database or message broker, which can also write persistent data to attached storage drives. It supports strings, lists, sets, sorted sets, hashes, bitmaps, and streams, and now the Vector Set data type.

The Vector Set data type complements “Redis’ existing vector similarity search, offering a lower-level way to work with vectors,” while LangCache lets developers integrate Redis-based LLM response caching into applications. This reduces calls to LLMs, storing and reusing prompts and responses to minimize cost, improve prompt accuracy, and accelerate responsiveness.
CEO Rowan Trollope said that the “Vector Set is modelled on a sorted set but it’s hyper-dimensional.” A vector set has string elements, like a sorted set, but they’re associated with a vector instead of a score.
The Vector Set was devised by Redis creator Salvatore Sanfilippo, now one of Trollope’s advisors. He invented an algorithm that works with quantized vectors, which are smaller than standard 32-bit float vectors, enabling more vectors to be held in RAM and so making semantic search more effective.

Trollope says the fundamental goal of vector sets is to make it possible to add items, and later get a subset of the added items that are the most similar to a specified vector. He tells us: “We can do natural language search of this Redis Vector Set.”
In more detail, Vector Sets implement:
- Quantization: vector float embeddings are quantized to 8-bit values by default. This can be changed to no quantization or binary quantization when adding the first element.
- Dimensionality reduction: The number of dimensions in a vector can be reduced by random projection by specifying the option and the number of dimensions.
- Filtering: “Each element of the vector set can be associated with a set of attributes specified as a JSON blob via the VADD or VSETATTR command. This allows the ability to filter for a subset of elements using VSIM that are verified by the expression.”
- Multi-threading: Vector sets speed up vector similarity requests by splitting up the work across threads to provide faster results.
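The features above map onto a small set of commands. As a rough illustration (the key name `vs`, element names, vector values, and attribute JSON below are all hypothetical; consult the Redis Vector Set command reference for exact option syntax), a redis-cli session might look like:

```
VADD vs VALUES 3 0.9 0.1 0.2 article:1                  # add an element with a 3-dimension vector (int8-quantized by default)
VADD vs VALUES 3 0.8 0.2 0.3 article:2 SETATTR '{"year": 2024}'
VSETATTR vs article:1 '{"year": 2023}'                  # attach or replace a JSON attribute blob
VSIM vs VALUES 3 0.85 0.15 0.25 COUNT 2                 # elements most similar to the query vector
VSIM vs VALUES 3 0.85 0.15 0.25 FILTER '.year > 2023'   # similarity search restricted by an attribute filter
```

The `REDUCE` option on VADD (not shown) triggers the random-projection dimensionality reduction described above.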
Redis claims that its quantization enables the use of int8 embeddings to reduce memory usage and cost by 75 percent and improve search speed by 30 percent, while maintaining 99.99 percent of the original search accuracy.

Trollope said the Vector Set is: “a more fundamental representation of vectors than in other vector databases. … We don’t store the original vector [as] we don’t believe the full vector is needed. … We quantize the vector with 1-byte quantization and variants, such as FP32 to binary – a 32x reduction.” Which quantization method is used depends on the use case.
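The arithmetic behind those figures is straightforward: an FP32 component takes 4 bytes, an int8 component 1 byte (a 75 percent saving), and a binary component 1 bit (a 32x reduction). A minimal pure-Python sketch of the two schemes (toy vector and scaling method are illustrative, not Redis’ actual algorithm):

```python
import array

def quantize_int8(vec):
    """Map float components to int8 (1 byte each) -- a 4x (75 percent) memory reduction."""
    scale = max(abs(v) for v in vec) / 127 or 1.0   # largest magnitude maps to 127
    return array.array("b", [round(v / scale) for v in vec]), scale

def quantize_binary(vec):
    """Keep only the sign of each component (1 bit each) -- a 32x reduction vs FP32."""
    bits = 0
    for i, v in enumerate(vec):
        if v > 0:
            bits |= 1 << i
    return bits

vec = [0.12, -0.5, 0.33, -0.07] * 8        # a toy 32-dimension embedding
q8, scale = quantize_int8(vec)

fp32_bytes = len(vec) * 4                  # 4 bytes per FP32 component
int8_bytes = len(vec) * 1                  # 1 byte per int8 component
bin_bytes = len(vec) / 8                   # 1 bit per binary component

print(f"int8 saves {100 * (1 - int8_bytes / fp32_bytes):.0f} percent")   # 75 percent
print(f"binary is a {fp32_bytes / bin_bytes:.0f}x reduction")            # 32x
```

The accuracy claim rests on the quantized vectors preserving relative distances well enough for nearest-neighbor ranking, which the toy scaling above only gestures at.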
Redis now has two complementary search capabilities:
- Redis Query Engine for general search and querying
- Vector Sets for specialized vector similarity search
LangCache
Trollope blogs that LangCache: “provides a hosted semantic cache using an API connection that makes AI apps faster and more accurate.” It is a REST API and “includes advanced optimizations to ensure highly accurate caching performance.”
LangCache uses a custom fine-tuned Redis model and configurable search criteria, including search algorithm and threshold distance. Developers can generate embeddings through their preferred model provider, eliminating the need to separately manage models, API keys, and model-specific variables.
LangCache can manage responses so that apps only return data that’s approved for the current user, eliminating the need for separate security protocols as part of the app.
Redis tools and Redis Cloud update
Redis has introduced more AI developer tools and features.
- A Redis Agent Memory Server is an open source service that provides memory management for AI apps and agents. Users can manage short-term and long-term memory for AI conversations, with features like automatic topic extraction, entity recognition, and context summarization.
- Redis hybrid search combines full-text search with vector similarity search to deliver more relevant results.
- A portfolio of native integrations for LangGraph has been specifically designed for agent architectures and agentic apps. Developers can use Redis to build a LangGraph agent’s short-term memory via checkpointers, and long-term memory via Store, vector database, LLM cache, and rate limiting.
Some Redis Cloud updates provide GenAI app-building facilities:
- Redis Data Integration (RDI) is Redis’ change data capture offering, which automatically syncs data between cache and database to deliver data consistency in milliseconds.
- Redis Flex on Cloud Essentials is Redis rearchitected to natively span across RAM and SSD, “delivering the fastest speeds from the first byte to the largest of dataset sizes. Developers can store up to 5X more data in their app and database for the same price as before.”
- Redis Insight on Cloud: Developers can now view, update, query, and search the data in Redis directly from their browser. Redis Insight gives access to the Redis developer environment, including the Workbench and tutorials, plus new query autocompletion that pulls in and suggests schema, index, and key names from Redis data in real time, letting developers write queries faster and more easily.
The Vector Set will be included in the Redis 8 Community Edition beta, due May 1. RDI is in private preview – sign up here. Redis Flex is in public preview. A Redis blog discusses LangCache, Vector Sets, and the various tools; another blog discusses Vector Sets in more detail.