Interview: Delphix, the data virtualization vendor, now describes itself as the industry leader in “programmable data infrastructure”. So we asked Jedidiah Yueh, founder and CEO, what that term actually means. Here is his reply.
Blocks & Files: What is the overall definition of a programmable data infrastructure?
Jedidiah Yueh: Today, it’s easy to get an automated build of an application environment with code, compute, storage, etc. But how do you get the right data into the environment?
Code, servers, storage, compute, and networks have all been automated and can be managed via APIs. Data is the last automation frontier.
Programmable data infrastructure (PDI) enables data to be automated and managed via APIs.
PDI sources data from all apps across the multi-cloud, including SaaS, private, and public clouds. It ensures data compliance with regulations like GDPR and CCPA by masking personally identifiable or sensitive information.
PDI enables mission critical application use cases, including cloud adoption, CI/CD, and AI/ML model training and deployment.
Blocks & Files: What is the purpose of a data infrastructure?
Jedidiah Yueh: Tech giants are masters of data management and infrastructure. Today, data is a strategic battleground for all businesses.
Data is critical for digital transformation – cloud adoption, accelerating application releases, and training AI/ML models.
Data infrastructure is what enables these objectives, accelerating digital transformation and increasing the ability to compete in the modern world.
With the Delphix API-first data platform, enterprises can adopt cloud 30% faster, release software 50% faster, and access 90% more data for AI/ML, while protecting data privacy and maintaining compliance with GDPR, CCPA, HIPAA, etc.
Blocks & Files: What is the difference between a programmable data infrastructure and data management?
Jedidiah Yueh: You can manage data manually without automated data infrastructure. But it takes a really long time, and the data is often filled with compliance and security risk.
Digital transformation programs are starved for data and environments. When you delay the data you need in application environments, you hemorrhage developer productivity and overrun project timelines and budgets.
Programmable data infrastructure ends the wait for data and ensures data security and compliance.
Blocks & Files: A data infrastructure would seem to consist of (a) data and of elements that (b) store it, (c) move it, (d) deliver it to applications in the right format, and (e) manage the data and this infrastructure. What are a, b, c, d, and e for Delphix? I’m looking for named product entities that customers can buy or subscribe to.
Jedidiah Yueh: A, b, c, d, and e are all part of the Delphix Data Platform, which is software that we sell on a subscription basis.
In addition, we sell Compliance Engines, which operate as part of the platform to profile, mask, de-identify, and tokenize data to comply with regulations including GDPR, CCPA, HIPAA, PCI, etc. You can also buy these separately from the platform, but then you lack the version control and delivery capabilities.
Finally, we sell Data Control Tower (DCT), which provides centralized API access and management for data stored across the multi-cloud (SaaS, on prem, or public cloud).
Blocks & Files: A programmable infrastructure must be programmable. How are the elements of such an infrastructure programmable? Would you argue that setting policies amounts to programming? Are API calls used? API calls by what? API calls to what?
Jedidiah Yueh: Setting policies does not amount to programming. You need to be able to make API calls as part of automated workflows, such as a CI/CD pipeline.
In CI/CD, you might have an automated build process that stands up compute, storage, network, and the latest code, but then you need the data.
For programmable data infrastructure, you would make API calls to a Delphix Data Platform Engine or Data Control Tower.
Those calls need to cover a comprehensive set of requirements to be part of a CI/CD build pipeline, including:
- Provisioning data to a target application environment (build)
- Configuring and starting the database or data store
- Profiling data for compliance or security risk
- Masking data for compliance
- Selecting the specific version of data or time of data (e.g. most recent data)
- Rollback, reset, or cleanup of data or the environment (e.g. after test or batch run)
- Bookmarking for later retrieval and version control
API calls can be made by automation servers, including Jenkins, to enable CI/CD pipelines.
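As an illustration of the workflow Yueh describes, here is a minimal sketch of the data steps of a CI/CD build expressed as API calls. It is not Delphix's API: `FakeDataPlatform`, `Dataset`, `data_stage`, and every method name below are hypothetical, and the platform is an in-memory stand-in so the control flow can run without a network. A real pipeline would replace each method with a call to the vendor's documented endpoints.

```python
"""Hypothetical sketch of a CI/CD data stage. No real vendor API is used."""
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    version: str
    rows: list
    masked: bool = False
    running: bool = False


@dataclass
class FakeDataPlatform:
    # Hypothetical snapshot store: version label -> list of row dicts.
    snapshots: dict
    bookmarks: dict = field(default_factory=dict)

    def provision(self, name, version="latest"):
        # Select a specific version, or the most recent one by label order.
        if version == "latest":
            version = max(self.snapshots)
        rows = [dict(r) for r in self.snapshots[version]]  # copy, not the source
        return Dataset(name, version, rows)

    def start(self, ds):
        ds.running = True

    def profile(self, ds, sensitive=("email",)):
        # Report which columns present in the data are considered sensitive.
        return sorted({k for r in ds.rows for k in r if k in sensitive})

    def mask(self, ds, columns):
        for r in ds.rows:
            for c in columns:
                if c in r:
                    r[c] = "***MASKED***"
        ds.masked = True

    def bookmark(self, ds, label):
        self.bookmarks[label] = [dict(r) for r in ds.rows]

    def rollback(self, ds, label):
        ds.rows = [dict(r) for r in self.bookmarks[label]]


def data_stage(platform, run_tests):
    """Provision, start, profile, mask, bookmark, test, then roll back."""
    ds = platform.provision("ci-db", version="latest")
    platform.start(ds)
    platform.mask(ds, platform.profile(ds))
    platform.bookmark(ds, "pre-test")
    try:
        result = run_tests(ds)
    finally:
        # Reset the data even if the tests fail, so the next run starts clean.
        platform.rollback(ds, "pre-test")
    return result, ds


if __name__ == "__main__":
    platform = FakeDataPlatform(snapshots={
        "2024-01-01": [{"id": 1, "email": "a@example.com"}],
        "2024-02-01": [{"id": 1, "email": "a@example.com"},
                       {"id": 2, "email": "b@example.com"}],
    })
    result, ds = data_stage(platform, lambda d: len(d.rows))
    print(result, ds.version, ds.masked)
```

In an automation server such as Jenkins, a function like `data_stage` would run as one stage of the build, after compute and code are in place and before the test stage. The masking happens before the bookmark, so the state restored on rollback is already compliant.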
Blocks & Files: If a data management product, such as Cohesity or the Google-acquired Actifio, has API interfaces, does it offer a programmable infrastructure? How is Delphix’s offering positioned against data management products (or services) offering API access and hence programmability?
Jedidiah Yueh: No, some backup products may have APIs, but they do not meet several requirements for PDI.
They do not enforce data compliance and masking, which is a foundational requirement in a world of increasing data privacy regulations and security risks.
They do not have comprehensive APIs to enable the application development lifecycle or a CI/CD pipeline (see a subset of our APIs shared above).
They do not meet performance or duty load requirements. For instance, we have customers running millions of CI/CD pipeline runs per month with Delphix. You simply cannot restore data fast enough from backup solutions to satisfy that demand, even via APIs.
Blocks & Files: Does Komprise offer a programmable data infrastructure? If not – why not?
Jedidiah Yueh: Komprise is a tool focused on data tiering and mobility (modern HSM).
It does not meet the data compliance requirement and lacks a comprehensive set of APIs necessary to enable application workflows such as CI/CD, SRE, and AI/ML.
We prefer to leave it to Komprise to determine if they are or are not offering programmable data infrastructure.
Blocks & Files: Does Hammerspace offer a programmable data infrastructure? If not – why not?
Jedidiah Yueh: Hammerspace is a Kubernetes solution focused on data tiering and mobility (Kubernetes HSM and portability).
It lacks support for the world of apps outside of Kubernetes, the compliance requirement, and a comprehensive set of APIs necessary to enable application workflows such as CI/CD, SRE, and AI/ML.
Delphix supports Kubernetes, but also everything from mainframe to cloud-native PaaS apps.
We prefer to leave it to Hammerspace to comment on whether they are or are not offering programmable data infrastructure.