A startup called The Modern Data Company has appeared with DataOS, a data operating system aimed at unifying data silos, old and new, and giving users a self-service programmable data OS for analytics, AI, ML and applications.
These are bold claims. Do they stack up? TMDC was co-founded by CEO Srujan Akula and CTO Animesh Kumar in 2018. According to his LinkedIn profile, Akula had been a product consultant and advisor to multiple companies since leaving a VP of Product position at Apsalar (now Singular) in 2017. Kumar left a DMP architect position at Apsalar in 2018 and was co-founder and CTO at automated billing company 47Billion until 2018.
An Akula statement said: “DataOS makes your existing legacy infrastructure work like a modern data stack without rip-and-replacing anything. It costs significantly less, gives you complete control of your data, and makes creating new data-driven applications and services simple for developers and business users alike.”
TMDC says it has developed a data operating system called DataOS to remove complexity and future-proof a user's data ecosystem by intelligently unifying data management under one roof. It is a "complete data infrastructure product," and TMDC claims in a video that "DataOS delivers an operationalisation layer on top of an existing legacy or modern data stack."
It has 121 employees and is headquartered in Palo Alto. There are no details available about its funding.
We’re told DataOS enables users to treat their data as software, using declarative primitives, in-place automation, and flexible APIs. These enable users to “easily discover, understand and transform data.” It is, TMDC claims, the world’s first multi-cloud, programmable data operating system, simplifying data access and management by decoupling it from tools, pipelines, and platforms.
TMDC is making a massive claim. DataOS connects, it says, with all systems within the data stack without the need for integrations. Taken literally this means mainframes, IBM Power systems running the i operating system, all flavors of Unix and Linux and Windows, all filesystems and all SAN arrays and all object storage systems.
We have asked TMDC if this is true and what connectors it uses, what data access protocols are available to users, how DataOS is programmable and other questions.
TMDC says DataOS enables access across multiple clouds and on-premises systems in a governed fashion, abstracting away data infrastructure complexity and allowing users to manage and access data across any format and any cloud through a single pane of control. It asserts that DataOS can connect to any system and can see everything that is happening to the data, so customers get a near real-time view.
DataOS allows data developers and users to access the data through a knowledge layer using an open standards approach. Developers can work with tools of their choice with respect to programming languages, query engines, visualization tools, and AI/ML platforms.
Data is delivered in appropriate formats to deliver advanced analytics, power AI/ML, enable rapid experimentation and build data-driven applications. DataOS supports secure data exchange/data sharing with teams, and heterogeneous formats, such as SQL, Excel files, and more. It can extract data and see metadata.
TMDC declares that “DataOS is a modern, open and composable data management platform-as-a-service (PaaS) that provides total data visibility and turns data into insights that drive actionable intelligence.”
Q&A session
Blocks & Files: TMDC says DataOS connects with all systems within the data stack without the need for integrations. Taken literally, I think this means mainframes, IBM Power systems running the i operating system, all flavors of Unix and Linux and Windows, all filesystems and all SAN arrays, and all object storage systems, whether on-premises or in the public clouds. Is this true?
TMDC: DataOS provides a consistent way to access, discover, and govern data that is often managed in a highly fragmented and siloed manner. Through the data depot contract, DataOS delivers this consistency across access, discovery, and governance.
DataOS does not connect to mainframes or IBM Power systems directly but does connect into the Db2s of the world where the data resides. It can also connect to data pipeline tools and data governance, cataloging, and quality tools to build a knowledge layer that is refreshed in near real time.
Blocks & Files: What connectors does DataOS use to link to SAN, NAS and object systems on-premises and in the cloud?
TMDC: See above.
Blocks & Files: Can the DataOS abstraction layer scale, and by how much?
TMDC: DataOS is not a data virtualization play. While DataOS provides business teams with the ability to create logical data models, it does so by intelligently moving data that needs to be moved to meet the SLAs of the use cases that rely on that data. DataOS comes with a storage layer that supports multiple data formats to facilitate intelligent data movement. DataOS was architected to scale both horizontally and vertically, and we have autoscaled up and down to process over a billion data events, from a normal load of 50 to over 100x events per day, without any intervention needed.
Blocks & Files: Can it work across a user's distributed sites, both on-premises and in the public clouds? Which public clouds?
TMDC: DataOS delivers a consistent way to work with data that sits across multiple clouds and data centers.
Blocks & Files: What protocols can users have available to access data?
TMDC: Users and applications can access data with DataOS using our standard JDBC, ODBC, and OData connections. They can also leverage REST APIs and GraphQL interfaces that are available on top of all data products within DataOS. [I, and Reverse ETL.]
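To make the GraphQL claim concrete: a client hitting a GraphQL interface on top of a data product would POST a JSON-encoded query document to an endpoint. The sketch below shows only the generic GraphQL-over-HTTP request shape; the endpoint URL, product name (`customerOrders`), and field names are illustrative assumptions, since TMDC has not published its API.

```python
import json

# Placeholder endpoint -- TMDC does not document a public URL scheme.
GRAPHQL_ENDPOINT = "https://dataos.example.com/graphql"

def build_graphql_request(product: str, fields: list, limit: int = 10) -> bytes:
    """Build a JSON-encoded GraphQL query body for a hypothetical data product."""
    query = f"query {{ {product}(limit: {limit}) {{ {' '.join(fields)} }} }}"
    return json.dumps({"query": query}).encode("utf-8")

# Hypothetical data product and fields, for illustration only.
body = build_graphql_request("customerOrders", ["orderId", "total"], limit=5)
print(body.decode("utf-8"))
# → {"query": "query { customerOrders(limit: 5) { orderId total } }"}
```

The same data product would presumably also be reachable over JDBC or ODBC with an ordinary SQL driver, which is what the "standard connections" answer implies.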
Blocks & Files: In what sense does DataOS make data programmable?
TMDC: DataOS makes data "programmable" because business teams can define domain-level data lenses that can be composed to create higher-order capabilities using object-oriented programming constructs, enabling businesses to take those building blocks (data products and data lenses) and power many types of use cases without needing data engineering support.
The DataOS architecture starts with core primitives that are the building blocks to realizing any data architecture design pattern (e.g., data fabric, data mesh, etc.). The composable nature of the architecture allows our customers to take the building blocks and compose data experiences instead of building/integrating them.
Blocks & Files: How is TMDC funded?
TMDC: We cannot reveal this information at this time.
These answers reveal that at least some of TMDC’s claims require scrutiny, such as its claim to connect all systems within the data stack. We also note that TMDC does not reveal which public clouds it supports or the protocols (connectors) it uses to connect to storage repositories. Finally, it will say nothing about its funding, which we find quite odd.
Absent more detailed information about TMDC’s funding, technology, customer progress and engineering credentials, Cohesity, CTERA, Hammerspace, Komprise, LucidLink, Nasuni, Panzura and others needn’t start worrying just yet.