Pinecone: Long-term memory for AI

February 28, 2023

Startup Pinecone, which provides a vector database that acts as long-term memory for AI applications, has hired former Couchbase CEO and executive chairman Bob Wiederhold as president and COO, after he spent 15 months as an advisor and board member.

A long-term memory for AI apps sounds significant, but is it? Why do such apps need a special storage technology? Pinecone’s software is a vector database, a kind of database used in AI and ML applications such as semantic search and chatbots, product search and recommendations, cybersecurity threat detection, and so forth. After one year of general availability, Pinecone says it has 200 paying customers, thousands of developers, and millions of dollars in annual recurring revenue (ARR).

Edo Liberty, Pinecone founder and CEO, said in a statement: “To maintain and even accelerate our breakneck growth, we need to be just as ambitious and innovative with our business as we are with our technology. Over the past 15 months I’ve come to know Bob as one of the very few people in the world who can help us do that.”

The key Pinecone technology is indexing for a vector database.

A vector database has to be stored and indexed somewhere, and the index must be updated each time the data changes. The index also has to be searchable so that it can retrieve items similar to a query, which is computationally intensive, particularly under real-time constraints. That suggests the database needs to run on a distributed compute system. Finally, the entire system needs to be monitored and maintained.
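To make those requirements concrete, here is a minimal sketch of a toy in-memory index, assuming plain Python lists as vectors and exact (brute-force) cosine-similarity search. The class and method names (`TinyVectorIndex`, `upsert`, `delete`, `query`) are hypothetical illustrations, not Pinecone's API. It shows why each data change must touch the index and why every query gets more expensive as the collection grows.

```python
import heapq
import math


def _normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class TinyVectorIndex:
    """Hypothetical toy index: exact (brute-force) cosine-similarity search."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = {}  # item id -> unit-length vector

    def upsert(self, item_id, vector):
        # Inserting or updating an item refreshes its index entry immediately.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._vectors[item_id] = _normalize(vector)

    def delete(self, item_id):
        # Deleted items must also disappear from the index.
        self._vectors.pop(item_id, None)

    def query(self, vector, top_k=3):
        # Compare the query with every stored vector: O(n * dim) work per query.
        q = _normalize(vector)
        scored = (
            (item_id, sum(a * b for a, b in zip(q, v)))
            for item_id, v in self._vectors.items()
        )
        return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


index = TinyVectorIndex(dim=3)
index.upsert("a", [1.0, 0.0, 0.0])
index.upsert("b", [0.9, 0.1, 0.0])
index.upsert("c", [0.0, 1.0, 0.0])
print(index.query([1.0, 0.0, 0.0], top_k=2))
index.delete("a")
print(index.query([1.0, 0.0, 0.0], top_k=1))
```

Because this scan compares the query against every stored vector, its cost rises linearly with the data. Production vector databases generally turn to approximate nearest-neighbor indexes and spread the work across many machines, which is the scaling pressure the paragraph above describes.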

Liberty wrote: “There are many solutions that do this for columnar, JSON, document, and other kinds of data, but not for the dense, high-dimensional vectors used in ML and especially in Deep Learning.” He founded Pinecone to create such an indexing facility, and the vector index needed to be built in a way that was generally applicable and supported real-time search and retrieval.

When AI/ML apps deal with objects such as words, sentences, multimedia text, images, video, and audio sequences, they represent each one as a set of numeric values that together describe a complex data object, capturing attributes such as color, physical size, surface light characteristics, the audio spectrum at various frequency levels, and so on.

These numeric descriptions are called vector embeddings. They are stored in a vector database and indexed so that similar objects can be found by searching the index. A search is not run directly on user-input data such as keywords, or on metadata classifications of the stored objects. Instead, we understand, the search term is converted into a vector by the same AI/ML system used to create the objects’ vector embeddings. The search can then look for identical and similar objects.
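The key point is that stored items and queries pass through the same embedding function. The sketch below illustrates that flow under a loud assumption: a real system would use a trained ML model, but here a toy feature-hashing scheme over character trigrams stands in for it. The names `embed`, `search`, and `DIM` are hypothetical and belong to no real library.

```python
import math
import zlib

DIM = 256  # hypothetical embedding size


def embed(text):
    """Toy stand-in for an ML embedding model: hashed character trigrams."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def search(query, stored, top_k=1):
    # The query goes through the SAME embed() used for the stored items.
    q = embed(query)
    scores = {key: sum(a * b for a, b in zip(q, v)) for key, v in stored.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


documents = ["red running shoes", "blue denim jacket", "trail running sneakers"]
stored = {doc: embed(doc) for doc in documents}
print(search("running shoe", stored))
```

The query "running shoe" is not an exact keyword match for any stored item. It lands near "red running shoes" because the two share most of their trigram features. A learned embedding would capture meaning rather than spelling, but the store-then-query-through-the-same-model pattern is the same.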

Pinecone was founded in 2019 by Liberty, an ex-AWS director of research and one-time head of its AI Labs, whose work led to the creation of Amazon SageMaker. He spent just over two and a half years at AWS after almost seven years at Yahoo! as a research scientist and senior research director. Pinecone raised $10 million in seed funding in 2021 and $28 million in an A-round in 2022.

In a 2019 blog, Liberty wrote: “Machine Learning (ML) represents everything as vectors, from documents, to videos, to user behaviors. This representation makes it possible to accurately search, retrieve, rank, and classify different items by similarity and relevance. This is useful in many applications such as product recommendations, semantic search, image search, anomaly detection, fraud detection, face recognition, and many more.”
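Liberty's point that similarity makes it possible to "rank, and classify" items can be illustrated with k-nearest-neighbor voting. This is one simple, well-known way to classify by similarity, and nothing here suggests it is Pinecone's method. In this minimal sketch the 2-D vectors and the "legit"/"fraud" labels are hypothetical, invented only to show the mechanics.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def knn_classify(query, labeled, k=3):
    """Label a vector by majority vote among its k most similar labeled vectors."""
    # Rank: sort all labeled examples by similarity to the query.
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    # Classify: the most common label among the top k wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D embeddings of past transactions with known labels.
labeled = [
    ([1.0, 0.1], "legit"),
    ([0.9, 0.2], "legit"),
    ([0.95, 0.05], "legit"),
    ([0.1, 1.0], "fraud"),
    ([0.2, 0.9], "fraud"),
]
print(knn_classify([0.85, 0.15], labeled))
print(knn_classify([0.15, 0.9], labeled))
```

In the second call, the two nearest neighbors are "fraud" and the third is "legit", so the vote comes out "fraud". With a vector database, the sorting step would be replaced by an index query that returns only the top k neighbors.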

Pinecone’s indexing uses a proprietary nearest-neighbor search algorithm that the company claims is faster and more accurate than any open source library. Pinecone says the software’s design delivers high performance at any scale, with dynamic load balancing, replication, namespacing, sharding, and more.
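Pinecone's algorithm is proprietary and is not shown here. The sketch below illustrates only the generic idea behind one listed feature, sharding. Vectors are hash-partitioned across shards, each shard finds its local top-k matches, and the partial results are merged into a global top-k. It assumes exact dot-product scoring, and the names `ShardedIndex` and `shard_for` are hypothetical. For simplicity the shards are dictionaries in one process rather than separate machines.

```python
import heapq
import zlib


def shard_for(item_id, num_shards):
    # Deterministic hash partitioning: each id is stored on exactly one shard.
    return zlib.crc32(item_id.encode("utf-8")) % num_shards


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ShardedIndex:
    """Hypothetical exact-search index split across several in-process shards."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def upsert(self, item_id, vector):
        self.shards[shard_for(item_id, len(self.shards))][item_id] = vector

    def query(self, vector, top_k):
        # Scatter: each shard computes its own local top_k (score, id) pairs.
        partials = [
            heapq.nlargest(top_k, ((dot(vector, v), item_id) for item_id, v in shard.items()))
            for shard in self.shards
        ]
        # Gather: the global top_k scores must appear among the local winners.
        return heapq.nlargest(top_k, (hit for part in partials for hit in part))


index = ShardedIndex(num_shards=4)
for i in range(20):
    index.upsert(f"item-{i}", [float(i), 1.0])
print(index.query([1.0, 0.0], top_k=3))
```

The merge is exact because any item in the global top-k must also rank in its own shard's top-k. In a real deployment the per-shard searches would run in parallel on separate nodes, which is where load balancing and replication come in.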

Bob Wiederhold, Pinecone

Vector databases are attracting a lot of attention. Zilliz raised $60 million for its cloud vector database technology in August last year, and we wrote about Nuclia, the search-as-a-service company, in December. Wiederhold’s move from advisor and board member to a full-time operational COO role suggests he shares that excitement.

He said: “There is incredibly rapid growth across all business metrics, from market awareness to developer adoption to paying customers using Pinecone in mission-critical applications. I am ecstatic to join such an elite company operating in such a critical and growing market.”
