A brief news note yesterday said startup Ocient (pronounced oh-see-ent) enabled SS8 Networks to harness petabytes of data in interactive time. Ocient is focussed on ingesting billions of rows per second, and filtering and computing aggregate results at rates up to trillions of rows per second.
Rows? As in relational databases? What was this about? We all know and understand that unstructured data volumes are rising like an ever-growing tsunami, but structured data? In a relational database with trillions of rows? Is that growing too?
Our curiosity piqued, we took a closer look at this startup — realising as we did that there had been no public discussion of how its technology worked in anything but the most bland and uninformative terms.
Ocient was founded in 2018 by CEO Chris Gladwin, previously the founder of object storage supplier CleverSafe (acquired by IBM for $1.3 billion in October 2015), along with chief product officer Joe Jablonski and chief architect George Kondiles. Several ex-CleverSafe execs are on the management team.
The aim of starting Ocient was to build database and analytics software to enable fast analysis of exabyte-scale datasets. The founders wanted to develop a new relational database and data analytics solution (DAS) using industry-standard interfaces such as SQL, JDBC and ODBC, and commodity hardware. Funding rounds followed the classic tech formula: tech demo, prototype and then deliverable v1.0 product, with self-funding to a $10 million A-round in 2018, a kind of extended A-round for $15 million in 2020 and, this year, a $40 million B-round: $65 million in all.
The developed Ocient DAS can, the company says, hold quadrillions of rows of data, ingest billions of rows per second, and filter and compute trillions of rows per second. Imagine a relational database holding quadrillions of rows of data — that’s 1,000 trillion. How many customers would need to consider buying that big a database?
The DAS software can be deployed on commodity hardware or in the public cloud, uses massively parallel processing on large core-count systems and is claimed to be 1,000 times faster than leading MPP, NoSQL and Hadoop-based databases when querying a large dataset (with the same hardware, queries and data). Ocient says analytics that once took an hour now take 10 seconds or less.
The DAS system benchmarks at five to 1,000 times faster — typically around 50 times faster — than current high-performance alternatives like MPP, NoSQL, Hadoop-based databases and open source Presto. It requires 20 per cent of the data storage footprint of these alternatives. Ocient says its software supports standard ANSI SQL including aggregates, joins and count distinct through its JDBC, ODBC, and Spark connectors. There is also a Tableau connector.
Apart from the claim it is faster than MPP, NoSQL, Hadoop and Presto, there is no solid performance data enabling a comparison against Oracle, Snowflake or any other proprietary named product.
There is no detailed public configuration data, no technology backgrounder or white paper, no consultancy review of the product (such as an ESG validation), and certainly no published benchmark data with test configurations. We know from Ocient’s website that the DAS hardware is industry-standard and utilises NVMe SSDs and 100Gbit/sec networking, but that’s all.
After four years of development Ocient is a black box. So we tried to shine a bit of light into it.
Let’s envisage a 100PB DAS database and imagine how Oracle would configure such a system, with its scale-out Exadata hardware and software.
An Exadata X9M-2 is made up of database machine and storage expansion racks. The database machine part contains database servers (2–19) and storage servers (3–18). There are up to 1,216 CPU cores and 38TB memory per rack for database processing, and up to 576 CPU cores per rack dedicated to SQL processing.
It uses a 100Gbit/sec RoCE network and up to 27TB of persistent memory. An X9M-2 can hold up to 920TB of NAND per rack and up to 3.8PB of disk per rack — 4.7PB in total. A 100PB Exadata X9M-2 system would then need 21.3 standard racks.
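The rack arithmetic above can be checked in a few lines of Python. This is a back-of-the-envelope sketch using the per-rack capacities quoted from Oracle's public figures, treating 1PB as 1,000TB:

```python
# Back-of-the-envelope sizing for a 100PB Exadata X9M-2 system,
# using the per-rack capacities quoted above (1PB taken as 1,000TB).
nand_pb_per_rack = 920 / 1000      # 920TB of NAND flash per rack
disk_pb_per_rack = 3.8             # 3.8PB of disk per rack
pb_per_rack = nand_pb_per_rack + disk_pb_per_rack   # about 4.7PB

target_pb = 100
racks_needed = target_pb / pb_per_rack
print(f"{pb_per_rack:.1f}PB per rack, {racks_needed:.1f} racks for {target_pb}PB")
```

Dividing by the rounded 4.7PB figure gives the 21.3 racks quoted above; the unrounded 4.72PB capacity gives roughly 21.2, so in practice a deployment would need around 22 physical racks either way.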
Based on this we believe the Ocient DAS uses a scale-out node architecture employing tens of racks, scaling up to hundreds. It will use NVMe SSDs and, possibly, Optane drives, for RDB metadata, and nearline disk drives for the actual data. The DAS database and analytics software will be distributed across a sea of nodes — tens or even hundreds of them — and we envisage these as Exadata-type nodes, combining SQL processing and storage servers with souped up software.
We found a US patent, number 10,754,856, attributed to George Kondiles and Jason Arnold of Ocient, and entitled “System and method for optimising large database management systems using bloom filter.”
Its abstract states: “A large highly parallel database management system includes thousands of nodes storing huge volume of data. The database management system includes a query optimiser for optimising data queries. The optimiser estimates the column cardinality of a set of rows based on estimated column cardinalities of disjoint subsets of the set of rows. For a particular column, the actual column cardinality of the set of rows is the sum of the actual column cardinalities of the two subsets of rows. The optimiser creates two respective Bloom filters from the two subsets, and then combines them to create a combined Bloom filter using logical OR operations. The actual column cardinality of the set of rows is estimated using a computation from the combined Bloom filter.”
“Column cardinality” is a measure of the number of unique values in a database table column relative to the number of rows in the table. A Bloom filter is a probabilistic data structure used to test if a data element is in a data set, and it says whether the element is definitely not in the set or possibly in it.
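To make the patent's scheme concrete, here is a minimal, hypothetical sketch in Python. This is not Ocient's code; the filter size, hash scheme and class names are our own assumptions. It builds Bloom filters from two disjoint subsets of a column, combines them with a bitwise OR as the abstract describes, and estimates the distinct-value count from the combined filter using the standard set-bits estimator n ≈ -(m/k)·ln(1 - X/m), where m is the filter size, k the hash count and X the number of set bits:

```python
import hashlib
import math

class BloomFilter:
    """Minimal illustrative Bloom filter: m bits, k hash functions."""
    def __init__(self, m=4096, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, value):
        # Derive k bit positions from a SHA-256 digest of the value.
        digest = hashlib.sha256(str(value).encode()).digest()
        for i in range(self.k):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def union(self, other):
        # Combine two filters with a bitwise OR, as the patent describes.
        combined = BloomFilter(self.m, self.k)
        combined.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return combined

    def estimate_cardinality(self):
        # Standard estimator: n ≈ -(m/k) * ln(1 - X/m), X = number of set bits.
        x = sum(self.bits)
        if x == self.m:
            return float("inf")
        return -(self.m / self.k) * math.log(1 - x / self.m)

# Two disjoint subsets of a column's values.
left, right = BloomFilter(), BloomFilter()
for v in range(100):
    left.add(v)            # 100 distinct values
for v in range(100, 250):
    right.add(v)           # 150 distinct values

combined = left.union(right)
print(round(combined.estimate_cardinality()))  # close to the true 250
```

The appeal for a query optimiser is that the per-subset filters are tiny compared to the data, can be built in parallel across nodes, and can be merged with a cheap OR, so a cardinality estimate for any union of subsets never requires rescanning the rows.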
We note the “thousands of nodes” phrase with interest.
Where we’re at
We think Ocient is developing its software in conjunction with large potential customers. It has 16 people across two advisory boards, a relatively large number in our view, and we think they may be helping to pull in test customers as well as offering advice.
The company has a VP of Global Sales and Marketing, Kumar Abhijeet, and a COO, Bill McCarthy. These two appointments tell us that it is talking to potential customers and taking in money, or very close to being ready to do so.
We believe it has a capable and benchmarked prototype system, and is building a v1.0 product. Once that is built and proven to deliver the claimed goods, it should unlock C-round funding in the 2022/23 timeframe, and a big go-to-market push.
Ocient could be a huge success if: its scalability to exabyte-size datasets is real and unique; its performance advantage over Oracle, Snowflake, etc., is real; and there are sufficient customers who want to analyse exabyte-sized relational databases. Gladwin has a great track record with CleverSafe. Watch this Ocient space to see if he can do it again.