Cloudian CEO Michael Tso thinks that the public cloud vendors will have to run their software on-premises because data has gravity. Sometimes compute must come to the data, and a single hybrid cloud experience needs the software environment to be the same across the on-premises and public cloud worlds. See if you agree with him.
Blocks & Files: How do you view the status of on-premises computing outside IoT edge sites? Will it persist in spite of the attractions of public cloud computing?
Michael Tso: On-prem computing is here to stay. A quarterly survey of enterprise VARs indicates that 40 percent of workloads will remain on-prem, a figure that has held steady for about two years now. Migration to the cloud did accelerate during the early part of the COVID era, but what we are seeing now is customers bringing some of those workloads back on-prem. They are doing this for a number of reasons, the most common of which are cost and data sovereignty.
But we are also seeing that user expectations have evolved. IT managers now know the benefits of public cloud, such as capacity on-demand, simple management, and a high-level API. They now want solutions that will deliver these benefits everywhere – in their own data centers, at their service providers, and at the edge. That’s exactly what Cloudian’s software provides.
Blocks & Files: Do you think it’s true that data gravity discourages data movement as data scales?
Michael Tso: Moving data is complicated, time-consuming, and expensive. In many use cases, data is created near the edge, where the actual devices and people are. Rather than move that data to the cloud, it is often easier to move the compute to the data. The result saves time and cost, and can enhance security. Information technology has always been about delivering a better business outcome in less time. For many workloads, moving the compute to the data accomplishes both.
Blocks & Files: Is it the case that compute will come to the data rather than substantial amounts of data being transferred to compute? What is a substantial amount of data? Won’t networking speeds increase or WAN optimization improve to make data transfer to compute more feasible?
Michael Tso: At the macro level, data and compute performance are both growing exponentially with Moore’s law, while overall WAN bandwidth is growing at a slower rate. In other words, data is growing faster than the pipes. There will therefore be a growing preference for data remaining near the source.
At the application level, the optimal solution will always be use case-driven. There will never be a single solution that is ideal for all. AWS now speaks of a “continuum” of options, ranging from AWS Regions, to Local Zones, to Outposts servers. In fact, we are working with AWS to provide storage for Local Zones and Outposts. We have significant experience here because our customers already employ a distributed cloud. They use Cloudian on-prem, Cloudian systems deployed at service providers, and hybrid solutions that combine Cloudian and public cloud.
The “ideal” solution is driven by application requirements, and latency is just one consideration. Other concerns include data sovereignty, data security, data availability, and of course cost. Data volume is certainly a factor, but it is just one driver. We have customers with as little as 100TB. Ultimately, it’s the overall requirements that will determine the answer. And because Cloudian is built on a cloud-native architecture, we can tie it all together seamlessly, from the edge, to the core, to the cloud.
Blocks & Files: If data is not moved to the public cloud then how will public cloud compute process it? Are you saying that, logically, public cloud compute (i.e. AWS/Azure and GCP compute instances) has to come to the on-premises data? How might this happen?
Michael Tso: Cloud compute technology will be everywhere. I spoke of AWS’s “continuum,” which puts AWS compute wherever you need it using Local Zones and Outposts. Microsoft Azure and Google Cloud also offer distributed compute solutions. And VMware and Kubernetes give you the ability to run their platforms anywhere, whether in the data center or in the cloud.
Information technology has always been fluid, moving wherever the applications need it to be. The Windows OS started on the PC, but came to dominate the data center. Cloud technology started in a shared environment, and we now see it migrating to on-prem. This is inevitable since the majority of new workloads deployed today use cloud-native technology, according to Gartner. These workloads will end up running everywhere and will need cloud-native platforms to run on.
Blocks & Files: AWS instances run on large-scale AWS infrastructure. What problems might it face in scaling this down so that its software can run on smaller on-premises hardware infrastructure?
Michael Tso: Scaling down cloud storage is much harder than scaling down cloud compute. Creating a smaller compute platform can be done readily, as AWS has already shown with Outposts. But making cloud storage run on a small footprint… that’s a harder problem to solve.
When you design a storage platform, you make hundreds of design decisions. How will it scale? What hardware will it run on? How do you retire old hardware? How is the data protected? Does it have to be easy to manage? All of these decisions ultimately get manifested in millions of lines of code. If you change your operating mission, it gets really hard because every fundamental change ripples through all that code.
If you start with a technology that’s designed for a team of experts to run on a massive scale, and then attempt to shrink that to become a 1U box that anyone can run, it changes everything. You’re pretty much starting over. We started Cloudian to build an enterprise-scale, distributed system that can start small, with a few 1U servers, and grow to exabytes. Our mission remains exactly that.
Blocks & Files: Will Cloudian move its software up-stack, in a sense, by adding data services on top of its object storage software? Or will Cloudian add an integration layer so that up-stack applications could use it as if it were, for example, AWS’s S3 service? Data virtualizers like Dremio, Databricks, Snowflake and others could then run on-premises.
Michael Tso: The S3 API has really revolutionized the data warehouse space. Cloudian partners with the analytics solution providers to deliver an S3-compatible enterprise storage platform that integrates seamlessly with their products. We have already announced solutions with Snowflake, Teradata, Microsoft SQL Server, Vertica, Splunk, Greenplum, and Cribl.
With this standardization on the S3 API, enterprises can now pick an analytics solution and run it anywhere, cloud or on-prem, using the same data type. Anything they can do with public cloud storage, they can also do with Cloudian. This gives them complete flexibility to optimize for performance, data sovereignty, and cost.
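To make that concrete, here is a minimal sketch, assuming Python with boto3, of how the same S3 client code can target either AWS S3 or an on-prem S3-compatible system by changing only the endpoint. The on-prem endpoint URL, bucket, and prefix below are hypothetical, and this is not Cloudian’s or any vendor’s actual tooling.

```python
import boto3

def make_s3_client(endpoint_url=None):
    # endpoint_url=None targets AWS S3; otherwise any S3-compatible endpoint.
    return boto3.client("s3", endpoint_url=endpoint_url)

# Public cloud target.
aws_s3 = make_s3_client()

# Hypothetical on-prem target (URL is illustrative only).
onprem_s3 = make_s3_client(endpoint_url="https://s3.storage.example.internal")

# Identical calls work against either backend.
for client in (aws_s3, onprem_s3):
    resp = client.list_objects_v2(Bucket="analytics-data", Prefix="sales/2023/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])
```

Analytics platforms that speak the S3 API generally expose a similar endpoint setting when defining external storage locations, which is what lets the same dataset be queried in the cloud or on-prem.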
Blocks & Files: Backup data protectors generally write to a file system target and then tier off to S3 in the cloud. Could they back up directly to object storage? If they could, how might such a system appear? Would a purpose-built backup appliance be replaced by an S3-compatible object store?
Michael Tso: Data protection is an exciting area that’s evolving quickly. Originally, backup solutions needed a multi-layer storage architecture to meet data recovery time and recovery point objectives. This required an expensive filer for short-term storage, plus an S3-based solution for long-term storage. That’s no longer the case.
Vendors are now starting to offer direct-to-object solutions, such as Veeam’s new v12, that eliminate the need for a filer. This provides several advantages, one of which is cost savings. A Cloudian customer told us they save about $1M per year by going with this approach. Another advantage is enhanced ransomware protection. With direct-to-object storage, data is made immutable earlier in the process, and is therefore better protected from hacker encryption.
Direct-to-object is a simpler approach that provides better security at lower cost. And unlike with a purpose-built backup appliance, the capacity deployed in an object storage system can be used for other applications within a multi-tenant environment. You have a shared storage resource that can be expanded without limit. Just like the public cloud.
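As a rough illustration of how data can be made immutable earlier in the process, here is a minimal sketch using the S3 Object Lock API, assuming Python with boto3 and an S3-compatible target that supports Object Lock. The endpoint, bucket, and file names are hypothetical, and this does not represent any specific backup vendor’s integration.

```python
import base64
import hashlib
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical S3-compatible endpoint; credentials come from the usual
# boto3 credential chain (environment variables, config files, etc.).
s3 = boto3.client("s3", endpoint_url="https://s3.storage.example.internal")

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(Bucket="backup-repo", ObjectLockEnabledForBucket=True)

# Write the backup object with a compliance-mode retention date, so it cannot
# be overwritten or deleted until the retention period expires. This is what
# keeps an attacker from encrypting or deleting the backup copy.
retain_until = datetime.now(timezone.utc) + timedelta(days=30)
with open("daily-backup.bak", "rb") as f:
    body = f.read()

s3.put_object(
    Bucket="backup-repo",
    Key="jobs/daily/daily-backup.bak",
    Body=body,
    # S3 requires an integrity header when Object Lock parameters are supplied.
    ContentMD5=base64.b64encode(hashlib.md5(body).digest()).decode(),
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=retain_until,
)
```

Compliance mode prevents even administrators from shortening the retention period; governance mode is a looser alternative some deployments prefer.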