Vector-specific databases and knowledge graph extensions are being promoted as ways to give GenAI large language models (LLMs) access to unstructured and structured data. SingleStore, a database provider, claims you need neither to build great GenAI applications.
SingleStore provides a multi-dimensional database supporting both online transactional and analytic processing – OLTP and OLAP. It also supports external table access to massive unstructured datasets stored in Iceberg format, and it provides a multitude of search methods: full-text search with relevance scoring, phonetic similarity, fuzzy matching, and keyword-proximity-based ranking, as well as vector search.
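To make that combination of search methods concrete, here is a minimal, self-contained sketch of hybrid retrieval that blends a lexical keyword score with vector (cosine) similarity, so a single query can rank results on both signals. The documents, embeddings, and weighting below are invented for illustration; this is not SingleStore's actual SQL or API.

```python
# Hybrid retrieval sketch: combine a keyword score with vector similarity,
# the way a database offering both full-text and vector search can rank
# results in one pass. All data and weights here are hypothetical.
from math import sqrt

docs = [
    {"id": 1, "text": "quarterly revenue report for EMEA", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "customer support transcript, billing issue", "vec": [0.2, 0.8, 0.1]},
    {"id": 3, "text": "EMEA sales pipeline and revenue forecast", "vec": [0.8, 0.2, 0.1]},
]

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the document (a crude lexical score)."""
    terms = query.lower().split()
    return sum(t in text.lower() for t in terms) / len(terms)

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def hybrid_search(query: str, query_vec, alpha: float = 0.5):
    """Blend lexical and semantic relevance; alpha weights the vector side."""
    scored = [
        (alpha * cosine(query_vec, d["vec"]) + (1 - alpha) * keyword_score(query, d["text"]), d)
        for d in docs
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)

for score, doc in hybrid_search("EMEA revenue", [0.85, 0.15, 0.05]):
    print(f"{score:.2f}  {doc['text']}")
```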
The company’s assertion is that its Pro Max database is a real-time data platform designed for all applications, analytics, and AI. It supports high-throughput ingest, ACID transactions, low-latency analytics, and storage of structured, semi-structured (JSON, BSON, text), and unstructured data (vector embeddings of audio, video, images, PDFs, etc.).
Startups like Pinecone, Qdrant, and Zilliz have developed vector databases to store the vector embeddings of text, audio, image, and video data used for GenAI’s semantic search. Proprietary data in such formats is being used in retrieval-augmented generation (RAG) to improve the accuracy and completeness of LLM responses.
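As a rough illustration of the RAG pattern, the sketch below embeds a question, retrieves the closest proprietary chunks from a toy in-memory vector store, and prepends them to the prompt handed to an LLM. The embedding function and corpus are stand-ins for a real embedding model and vector database.

```python
# Bare-bones RAG sketch: embed the question, fetch the nearest stored chunks,
# and build an augmented prompt. Corpus and embeddings are toy stand-ins.
from math import sqrt

corpus = {
    "Our refund window is 30 days from delivery.": [1.0, 0.0, 0.2],
    "Enterprise support is available 24/7 via the portal.": [0.1, 1.0, 0.3],
}

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    return [text.lower().count("refund"), text.lower().count("support"), 0.1]

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))) or 1.0)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k corpus chunks most similar to the question."""
    qv = embed(question)
    ranked = sorted(corpus, key=lambda c: cosine(qv, corpus[c]), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund policy?"))
```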
Such specialized databases are not favored by SingleStore. CEO Raj Verma told us in a briefing this month: “Two and a half years ago vector databases became every database company’s worst nightmare. … Because all the investors started to think that that was just the ultimate way to solve it all, you know, world hunger, whatever.”
Vector storage is a feature, not a product.
“It’s just now that we’ve all seen that the vector layer will belong to the incumbent database. And no one’s going to add a layer of complexity by introducing yet another vector database into the data architecture.
“Yes, I think, you know, when you get off the gate, Pinecone had some advantage over the rest of us, right from within its vector capabilities, for sure. But we all caught up.
“What we’re seeing is, if you were to ask an organization what vector database they’re using, a vast majority – I’m talking about 95 percent plus – are going to say that they are using their incumbent database for the vector capability.”
Verma thinks that the vector-only database companies may not survive. As an illustration of how he sees it: “One of our investors said that there was about $4 billion spent on applications that helped do some form of AI for Adobe Photoshop. There was actually $4 billion worth of investments. So you could probably say $14 billion worth of market cap at which companies got investments at least. And then what happened is about eight months ago, Adobe released its AI suite of products on Photoshop, and all 135 of the startups are either dead, or they don’t know that they are dead yet.”
He thinks that GenAI and other major data access applications work best when they access a single virtual silo of data built for real-time and fast access. It provides a single point of management and support, a complete source of an organization’s data and simpler supplier relationships. And it includes both structured and unstructured data.
Vectors are representations of unstructured data, not the structured data stored in relational and other databases. Structured data cannot be readily vectorized, and much of a record’s context and meaning is encapsulated in row and column metadata. A startup like illumex says the best way to represent this is with knowledge graph (KG) technology. Connector applications are then written to make such information available to GenAI LLMs.
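A toy sketch of that idea: a structured record is decomposed into entity-relationship-value triples whose predicates come from the column metadata, so a connector can hand an LLM explicit facts rather than an opaque row. The table, columns, and naming here are hypothetical, not illumex’s actual implementation.

```python
# Turn one structured row into KG-style triples; the column names supply the
# relationship (predicate) names. Schema and values are invented.
row = {"order_id": 1042, "customer": "Acme Corp", "amount_usd": 1800, "status": "shipped"}

def row_to_triples(table: str, key_col: str, record: dict):
    """Map a row to (subject, predicate, object) triples keyed on its primary key."""
    subject = f"{table}/{record[key_col]}"
    return [(subject, f"has_{col}", val) for col, val in record.items() if col != key_col]

# A connector could serialize these triples into an LLM prompt as plain facts.
for s, p, o in row_to_triples("orders", "order_id", row):
    print(f"{s} --{p}--> {o}")
```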
SingleStore does not support knowledge graph representations of structured data record meaning. Its position is that KG technology is not needed – particularly because, at scale, KG data access rates are slow.
CMO Madhukar Kumar tells us that with structured data: “You need to get deterministic queries answered at an extremely fast rate.”
He explained, “When it comes to knowledge graph, if you boil it down to first principles, it’s really the entity and the relationships. And you can store it actually in different ways. You can store it in a graph database, which is RDF (Resource Description Framework). But then you have ETL (Extract, Transform and Load). You have a whole different team moving data. It’s not really efficient when you’re talking about 10 petabytes of data, and trying to create something like a breadth-first search.
“Sure, it’s more efficient maybe. And it also maybe gives you more accuracy. But at the end of the day, a knowledge graph is an addition to a bunch of other things that you do – which is structured, unstructured, you do vector or semantic search, you do re-ranking, and you do exact keyword match.
“One of our largest customers is LiveRamp, and LiveRamp used to have a graph database – the largest identity graph in the whole world. It’s a marketing analytics company and it’s massive. And they went from a graph database to SingleStore and their workloads that were taking about 18 hours or so came down to like 18 seconds.”
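To ground Kumar’s framing of a knowledge graph as entities, relationships, and breadth-first traversal, here is a minimal breadth-first search over a toy identity-graph-style adjacency list. The data is invented; at the petabyte scale he describes, this traversal plus the ETL needed to build and maintain a separate graph store is the overhead SingleStore is arguing against.

```python
# Minimal BFS over an entity/relationship graph. The adjacency list is a toy
# identity-graph example (users, devices, accounts, households).
from collections import deque

graph = {
    "user:alice": ["device:phone-1", "account:a-77"],
    "account:a-77": ["household:h-9"],
    "device:phone-1": ["household:h-9"],
    "household:h-9": [],
}

def bfs(start: str) -> list[str]:
    """Return every entity reachable from `start`, nearest first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs("user:alice"))
# ['user:alice', 'device:phone-1', 'account:a-77', 'household:h-9']
```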
As with vector databases, SingleStore’s view is that any point advantages are negligible when set against those accruing from having a single, real-time source of database truth for an organization. In Verma’s words: “We have been saying for years and years that the time is now for real-time data. … Truly with AI now, it’s table stakes, because you are mixing and matching real-time context with your vast corpus of data that’s sitting everywhere in various different data types. That’s why we feel it’s really the perfect time for us.”