There’s a new data transformation kid on the block in the form of Tobiko Data, which has just raised $21.8 million in funding.
Blocks & Files met Tobiko Data co-founder and CTO Toby Mao this week in Santa Clara, California, to discuss the firm’s position in the market.
Tobiko, named for the flying fish roe popularly used in sushi, develops SQLMesh, an open source data transformation platform. With SQLMesh, data scientists and analysts can build “correct and efficient data pipelines,” promises the firm.
Development environments can take too long to spin up and are costly, says Tobiko, and when things go wrong, it’s “painful” to undo changes. There is also a lack of visibility into data pipeline performance, and managing large datasets brings its own complexities.
Fundamentally, says Tobiko, projects built with DBT (data build tool) “don’t scale up”. DBT automatically generates documentation covering model descriptions, model dependencies, model SQL, sources, and tests. It also creates lineage graphs of the data pipeline, aiming to provide transparency into what the data describes, how it was produced, and how it maps to business logic, but some tools do this better than others.
Tobiko claims it can save customers time and money across the whole process because tables only have to be built once, which reduces warehouse costs and helps teams get more work done.
Tobiko was formed around 18 months ago by founders who helped lead the development of data performance metrics at the likes of Apple, Netflix, Airbnb, and Google. It competes against more established data transformation companies such as DBT Labs.
On this week’s IT Press Tour across Silicon Valley, Mao told press and analysts: “With Snowflake [the cloud data platform], if you use it inefficiently, it can cost you a lot of money. With our SQLMesh, you only have to produce tables once, while other technologies make you do them time and again.
“A single change in a query can affect billions of rows of data, and companies spend millions every year on unnecessary rebuilds of the warehouse when only a small precise change is needed.”
SQLMesh can be used for free, but the firm has just launched SQLMesh Enterprise, a paid-for product that adds a full observability platform. It not only tells users when something has gone wrong with their data, but also why.
Mao says: “We are talking to Snowflake, for instance, about the value of our technology, but the software is not currently integrated into their ecosystem. But it can save users thousands of dollars off their Snowflake bill.”
He added: “Your developers can do things quicker and be more productive. They don’t have to go and get a coffee every time they change something in the data, they just run things with us and carry on with what they’re doing, they don’t have to wait around.
“This is all down to our SQL expertise, including our own SQLGlot framework that supports SQLMesh.” SQLGlot has been made open source too. It is a SQL parser, transpiler, and translator that currently supports 24 different SQL dialects.
Tobiko’s advantage is its fundamental semantic understanding of SQL, which means it executes only the downstream changes that are actually needed instead of rebuilding the whole warehouse. It also holds state, which enables first-class incremental refreshes (because it knows where users left off) and powers virtual data environments. The system understands and remembers every version of every model, avoiding duplicate computation.
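To make the idea of running only the necessary downstream work more concrete, here is a minimal Python sketch. It is not SQLMesh’s implementation or API. The model names and the `fingerprints` and `plan` helpers are hypothetical, and the hashing scheme is an illustrative assumption. Each model is fingerprinted from its SQL text plus its parents’ fingerprints. An edit therefore invalidates only the edited model and whatever sits downstream of it, while any version that has already been built, including one restored by reverting a change, is reused rather than recomputed. Unlike the semantic analysis Tobiko describes, this sketch treats any textual change to a model as significant.

```python
import hashlib


def fingerprints(sql, parents):
    """Fingerprint each model from its SQL text plus its parents' fingerprints.

    Assumes the dependency graph is acyclic (a DAG).
    """
    memo = {}

    def fp(name):
        if name not in memo:
            h = hashlib.sha256(sql[name].encode())
            # Mixing in parent fingerprints means an upstream edit changes
            # the fingerprint of every downstream model too.
            for parent in sorted(parents.get(name, [])):
                h.update(fp(parent).encode())
            memo[name] = h.hexdigest()
        return memo[name]

    for name in sql:
        fp(name)
    return memo


def plan(sql, parents, built):
    """Split models into versions that must be built and versions to reuse.

    `built` is the set of fingerprints whose tables already exist.
    """
    fps = fingerprints(sql, parents)
    reuse = {model for model, f in fps.items() if f in built}
    build = set(fps) - reuse
    return build, reuse, fps


# Hypothetical four-model pipeline: model -> upstream models.
parents = {
    "orders": [],
    "customers": [],
    "revenue": ["orders"],
    "report": ["revenue", "customers"],
}
v1 = {
    "orders": "SELECT * FROM raw.orders",
    "customers": "SELECT * FROM raw.customers",
    "revenue": "SELECT SUM(amount) AS total FROM orders",
    "report": "SELECT * FROM revenue CROSS JOIN customers",
}

# First run: nothing exists yet, so every model is built.
build1, _, fps1 = plan(v1, parents, built=set())
built = set(fps1.values())  # pretend all four tables were materialized

# Edit one model: only it and the models downstream of it are rebuilt.
v2 = dict(v1, revenue="SELECT SUM(amount) AS total FROM orders WHERE amount > 0")
build2, reuse2, fps2 = plan(v2, parents, built)
built |= set(fps2.values())

# Revert the edit: every version already exists, so nothing is rebuilt.
build3, _, _ = plan(v1, parents, built)

print(sorted(build1))  # ['customers', 'orders', 'report', 'revenue']
print(sorted(build2))  # ['report', 'revenue']
print(sorted(reuse2))  # ['customers', 'orders']
print(sorted(build3))  # []
```

Because a fingerprint identifies one specific version of a model’s data, switching back to an earlier version costs no recomputation. A real tool would also have to decide which edits leave downstream results unaffected, which this sketch does not attempt.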
Automated data movement provider Fivetran is a fan and user of Tobiko technology, and its founder George Fraser took part in the latest funding round.
Mao says Tobiko will remain focused on data transformation. “We are a dev tool, we are not a data warehouse, we do metadata and support the tooling to move the data between any platform, whether it be Snowflake, Databricks or anywhere else, without being locked in.”
On the possibility of being acquired by a larger data management player, Mao said: “At the moment we’re having fun, I’m doing software with my friends. We’ll see where we are in the future.”