Qdrant is an open source vector database startup with more than 10 million installs. We spoke to founder and CEO Andre Zayarni to find out how Qdrant differentiates itself from general-purpose databases.
Blocks & Files: As I understand it, an AI training and inference data pipeline starts with data sources. Could you explain the AI data pipeline stages from your point of view?
Andre Zayarni: It’s important to separate training from inference. Training pipelines prepare raw data to fine-tune or pre-train foundation models, while inference pipelines focus on applying those models to real-world tasks. Vector search is central to the inference stage: embeddings are created from relevant data sources and stored for fast retrieval, enabling techniques like RAG – and, increasingly, agentic RAG – to augment model outputs with real-time, context-aware information. This augmentation is critical when models need access to dynamic, proprietary, or task-specific knowledge, like enterprise IP, that wasn’t part of their original training. This is where dedicated vector search comes in, acting as the semantic retrieval layer for high-performance AI applications.
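To make that concrete, here is a minimal sketch of the retrieval step in a RAG loop using Qdrant’s Python client. The collection name is illustrative, and embed() and generate() are hypothetical stand-ins for an embedding model and an LLM call:

```python
# Minimal RAG retrieval step: embed the user's question, pull the nearest
# documents from Qdrant, and hand them to the model as context.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

def answer(question: str) -> str:
    query_vector = embed(question)  # hypothetical embedding-model call
    hits = client.query_points(
        collection_name="docs",     # illustrative collection name
        query=query_vector,
        limit=5,
    ).points
    context = "\n".join(hit.payload["text"] for hit in hits)
    # generate() is a hypothetical LLM call that consumes the augmented prompt
    return generate(f"Context:\n{context}\n\nQuestion: {question}")
```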

Blocks & Files: Does the pipeline include block (structured), file, and object data?
Andre Zayarni: Yes, though their roles differ across the pipeline. AI pipelines increasingly focus on unstructured data – files, documents, images, and code – which form the backbone of both model training and real-time inference tasks like RAG or AI agents. Structured data, such as metadata, is often used to label, filter, or organize this content for better retrieval and control. Since most enterprise knowledge lives in unstructured formats, this is where the real opportunity lies, and feature-rich, native vector search engines are designed to bridge that gap with fast, precise search and native metadata filtering.
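To illustrate that bridging of unstructured and structured data, here is a hedged sketch of a filtered semantic search with Qdrant’s Python client; the collection, payload fields, and embed() call are assumptions for the example:

```python
# Semantic search over unstructured content, constrained by structured
# metadata stored as point payloads. Collection and field names are
# illustrative; embed() is a hypothetical embedding-model call.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

query_vector = embed("termination clauses in vendor contracts")  # hypothetical

hits = client.query_points(
    collection_name="enterprise_docs",
    query=query_vector,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="department", match=models.MatchValue(value="legal")),
            models.FieldCondition(key="year", range=models.Range(gte=2023)),
        ]
    ),
    limit=10,
).points
```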
Blocks & Files: Do you think the initial AI data pipeline stages of input data collection, selection by filtering, cleansing and PII removal are the same for live data, backup data and archive data? Do we need three separate AI data pipeline initial stages to access these different kinds of data sources since the data sets are controlled by different applications?
Andre Zayarni: In vector search, upstream steps like cleansing and PII removal are handled before data is embedded. You don’t need separate pipelines for live, backup, or archived data at the vector database layer, but upstream ingestion logic may differ to account for query latency, retention policies, or data recency requirements. Qdrant, for example, accepts pre-processed vectors with rich metadata and supports advanced filtering, including geo, nested, and semantic filters. It also offers vector-level API key permissions and multi-tenancy, enabling strict access control even at query time, helping organizations maintain end-to-end privacy across teams, data types, and deployments.
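A common way to express that kind of isolation is payload-based multi-tenancy, sketched below with Qdrant’s Python client; collection and tenant names are illustrative, and embed() is hypothetical:

```python
# Payload-based multi-tenancy: every point carries a tenant_id, the field is
# indexed for fast filtering, and every query is scoped to one tenant.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the tenant field so filtered searches stay fast at scale.
client.create_payload_index(
    collection_name="shared_docs",
    field_name="tenant_id",
    field_schema=models.PayloadSchemaType.KEYWORD,
)

hits = client.query_points(
    collection_name="shared_docs",
    query=embed("quarterly revenue summary"),  # hypothetical embedding call
    query_filter=models.Filter(
        must=[models.FieldCondition(key="tenant_id", match=models.MatchValue(value="team-a"))]
    ),
    limit=5,
).points
```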
Blocks & Files: How should data be vectorized and are vectors best stored in a general database/data lake or a specific vector database? If the latter, what are the advantages of a vector-specific database, such as scalability, performance, and features?
Andre Zayarni: Data should be vectorized using embedding models that align with your task and domain – but once transformed, vectors are large, fixed-size, and computationally intensive to search efficiently. General-purpose databases are fundamentally not designed for high-dimensional similarity search; they lack the indexing structures, filtering precision, and low-latency execution paths needed for real-time retrieval at scale. In contrast, native vector databases are purpose-built for this challenge, offering features like one-stage filtering (applying structured filters during search), hybrid search, quantization, and intelligent query planning. These become essential for building AI systems that rely on fast, semantically relevant results across massive, evolving datasets.
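Quantization, one of the features mentioned above, can be sketched as follows; the collection name and vector size are illustrative, not a prescription:

```python
# Creating a collection with scalar quantization: Qdrant keeps compressed
# int8 copies of the vectors to cut memory use and speed up scoring.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="articles",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,  # 8-bit compressed representation
            quantile=0.99,                # clip outliers when computing the value range
            always_ram=True,              # keep the compressed copies in RAM
        )
    ),
)
```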
Blocks & Files: What are the advantages and disadvantages of storing vectors on-premises or in the public cloud? What kind of underlying storage is needed?
Andre Zayarni: Storing vectors on-premises offers more control over data privacy, compliance, and latency, especially for regulated industries or where infrastructure is already in place. Public cloud, on the other hand, provides scalability, ease of setup, and access to managed services that reduce operational overhead. The trade-off often comes down to data sensitivity, performance requirements, and team resources. Vector workloads benefit from fast, memory-efficient storage, ideally with memory mapping, tiered RAM-disk balancing, and I/O optimized for large, fixed-size embeddings.
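The RAM-disk trade-off Zayarni describes maps onto configuration like the following sketch, where the raw vectors (and optionally the index) are memory-mapped from disk; the values are illustrative:

```python
# Trading RAM for disk: memory-map the raw vectors (and optionally the HNSW
# index) instead of holding everything in memory.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="large_corpus",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # memory-mapped storage for the raw embeddings
    ),
    hnsw_config=models.HnswConfigDiff(on_disk=True),  # index can live on disk too
)
```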
Blocks & Files: What data management issues should we keep in mind?
Andre Zayarni: Managing vector data isn’t necessarily difficult, but it does require infrastructure built with vector workloads in mind. Vectors are derived from source data, so if models or content change, re-vectorization and reindexing may be needed to preserve accuracy. Metadata also plays a key role in filtering, access control, and ranking logic, so it needs to be stored and queried efficiently alongside vectors. A well-designed vector database can handle these needs, supporting real-time updates, metadata indexing, and high-performance search without manual workarounds.
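Re-vectorization in practice can be as simple as re-embedding the changed content and upserting it under the same point ID, as in this sketch (embed_v2() is a hypothetical new embedding model):

```python
# Re-vectorization: when the source document changes (or the embedding model
# is swapped), re-embed and upsert under the same point ID so the stale
# vector is replaced in place.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

updated_text = "…revised section of the employee handbook…"

client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(
            id=42,                          # same ID as the outdated point
            vector=embed_v2(updated_text),  # hypothetical re-embedding call
            payload={"source": "handbook.md", "embedding_model": "v2"},
        )
    ],
)
```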
Blocks & Files: How are vectors sent to GPUs? Is there an ETL (Extract, Transform, Load) process or similar for getting vectors out of a vector database and sent to GPUs for inference and/or training?
Andre Zayarni: Vectors aren’t used to train models; they’re the output of embedding models trained on raw data like text or code. Vector databases don’t perform inference themselves; they store and retrieve pre-computed vectors to support downstream tasks like RAG, semantic search, or AI agents. Retrieved vectors are passed to application components like rerankers for relevance tuning, fusion layers for combining multiple retrieval results, or business logic modules that drive decisions – such as choosing the next action in an AI agent or assembling an answer in a chatbot.
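A typical retrieve-then-rerank step might look like the following sketch; rerank_score() and embed() stand in for whatever cross-encoder and embedding model the application uses:

```python
# Retrieve-then-rerank: over-fetch candidates from Qdrant, then let an
# application-side reranker reorder them before they reach the LLM or agent.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

query_text = "How do I rotate API keys?"
candidates = client.query_points(
    collection_name="docs",
    query=embed(query_text),  # hypothetical embedding call
    limit=50,                 # fetch broadly; the reranker narrows it down
).points

top_hits = sorted(
    candidates,
    key=lambda hit: rerank_score(query_text, hit.payload["text"]),  # hypothetical cross-encoder
    reverse=True,
)[:5]
```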
Blocks & Files: Is Nvidia GPUDirect support a necessity for a vector database, or is that a concern of the underlying storage?
Andre Zayarni: Nvidia GPUDirect is not a necessity for a vector database. It’s a low-level hardware feature mainly relevant to high-throughput data transfer between storage and GPU memory. In vector search, performance hinges more on fast indexing and retrieval – tasks that can be GPU-accelerated without relying on GPUDirect. Qdrant, for example, uses the Vulkan API to enable platform-agnostic GPU acceleration for indexing, allowing teams to benefit from faster data ingestion across Nvidia, AMD, or integrated GPUs without being locked into a specific vendor.
Blocks & Files: Are there security and governance issues relevant to a vector database and/or to AI pipelines generally?
Andre Zayarni: Definitely. AI pipelines often involve sensitive or proprietary data, so beyond speed, they need strong access controls and governance. For example: fine-grained API key permissions that enable access restrictions not just by collection, but down to individual vectors or metadata filters; multi-tenancy to isolate teams or projects; and role-based access control (RBAC) in managed cloud environments. More broadly, support for hybrid and private cloud deployments provides the flexibility to enforce security policies without compromising performance or deployment choice.
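For a sense of what that vector-level scoping can look like, below is a rough sketch of a claim for Qdrant’s JWT-based access control; the exact claim schema may vary by version, so treat the field names as an approximation rather than a reference:

```python
# Rough sketch of a scoped claim for Qdrant's JWT-based authorization:
# read-only access to one collection, limited to points whose payload matches
# a tenant value. The exact claim schema may differ across versions – treat
# these field names as an approximation, not a reference.
import jwt  # PyJWT

claims = {
    "access": [
        {
            "collection": "enterprise_docs",
            "access": "r",                       # read-only
            "payload": {"tenant_id": "team-a"},  # only points with this payload
        }
    ]
}

# Tokens are signed with the instance's API key acting as the shared secret.
token = jwt.encode(claims, "my-api-key-secret", algorithm="HS256")
```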
Blocks & Files: When AI agents are involved, what is the role of MCP?
Andre Zayarni: MCP gives agents a standardized way to interact with external memory during reasoning loops by retrieving relevant context, recalling prior knowledge, or grounding new inputs on the fly. Vector databases are often used as this memory layer, where agents query embeddings tied to documents, code, or conversations. An MCP integration that enables real-time, filtered retrieval across local or cloud environments is ideal for agentic tasks like planning, grounding, or code synthesis.
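Stripped of the protocol plumbing, the retrieval an MCP memory tool performs might reduce to something like this sketch; all names, and the embed() call, are illustrative:

```python
# What an MCP memory tool backed by Qdrant might reduce to once the protocol
# plumbing is stripped away: filtered retrieval over an agent's memory store.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

def recall(agent_query: str, kind: str, top_k: int = 3) -> list[str]:
    """Fetch prior knowledge of one kind, e.g. 'code' or 'conversation'."""
    hits = client.query_points(
        collection_name="agent_memory",
        query=embed(agent_query),  # hypothetical embedding call
        query_filter=models.Filter(
            must=[models.FieldCondition(key="kind", match=models.MatchValue(value=kind))]
        ),
        limit=top_k,
    ).points
    return [hit.payload["text"] for hit in hits]
```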
Blocks & Files: Should AI agents be viewed as quasi-human users and the same zero trust data access security principles employed?
Andre Zayarni: AI agents should follow the same zero trust principles as human users, with strict authentication and scoped access. Capabilities like vector-level API key permissions, multi-tenancy, and cloud RBAC ensure secure, compliant agent interactions.