Pure Storage PortWorx Q&A: Why storage needs a data strategy

Murli Thirumale

Paid Feature Murli Thirumale is VP and general manager of the Cloud Native Business Unit at PortWorx, now part of Pure Storage, and he told Blocks & Files why organisations need to have a data strategy – not just a storage strategy.

Blocks & Files: There are lots of companies providing storage for containers. And it seems that on the storage front there are two approaches. One will hook up an external array with the CSI interface, as Dell EMC has recently done. Another is to actually have cloud-native storage facilities within Kubernetes and storage is just another type of container, a system container. Mixing of virtual machines and containers also seems to be a sensible thing to do, but it sometimes seems like the two things are fundamentally incompatible, and you can make a stab at putting an abstraction layer across the two of them. But really, you need to go all in on one or the other. What is your view on that?

Murli Thirumale: I’d like to take a stab at that question from the customer in, rather than the vendor out. Storage itself is evolving, moving on to data management, and on to a new way of thinking about how enterprises need to win with data.

Let’s step back 20 years or so, and we can see the start of cloudification. The big change was the cloud and moving from capex to opex. But technology always drives changes in the underlying hardware infrastructure. So, not only was the hardware infrastructure being cloudified, it was also being upgraded – you had Cisco Nexus top-of-the-rack switches, you had HCI happening.

The next phase is when SAP and all of those guys came to the fore, and it was the world of apps. The value changed to apps, and Software-as-a-Service. Today the world of apps and data is about automating these apps, and this has led to containerisation.

Now, people are not just being responsive or competing by going fast. They are competing by being smart with their data – this is the “data is the new oil, data is the new currency” argument.

Being smart is not just a question of using your own data – I am not talking about data mining HDFS data lakes, here. This is about conflating your data with the world’s data. To take one example, one of our customers is a COVID vaccine company, and they were able to do fast data science models. They were comparing publicly available information versus their own private tests. So, it’s conflating the two that allows you to gain insight. Now, let’s think about Uber, which is nothing but conflating a lot of publicly available GIS information with its own private information about the driver and about where I want to go.

So in this enterprise journey, the world of storage and the CIO has moved from thinking about cloudifying infrastructure to automating apps. That is where the puck is now. But mining data, both real-time and batch-wise, to gain insight – that’s where the puck is going.

How does the world of storage add value and not just turn into those old storage admins working away in the basement of enterprises and never seeing daylight? The reality is the world has moved on to apps, people and DevOps. So how does storage cope in this world? The answer lies in migrating to a data strategy.

So we can’t just be furniture guys concerned about wardrobes, drawers and boxes?

What does storage do? Storage is storing data, data management automation. And what we would do today is about freeing that data from one place and making it multi-cloud and multi-app and all of that. But in the end, you have to actually mine the data itself for insight.

So what is going to happen in the world of storage? At the bottom is, of course, the infrastructure. And nowadays, there is a software-defined storage overlay that has been overtaken by a Kubernetes storage overlay.

Now companies like Portworx or Robin.io, and there’s a host of other people, whether it’s the Cohesives of the world or other people who are in data management, this is about that automation layer. We have taken Kubernetes and we’ve taken that data, freed it from the array, and made it available across the cloud, across containers, across different apps. But data management, this is the bulk of our business today.

Now when I say data, people think of data as one thing. But in my book, data is actually five things. Data is consumed as a service now, by these applications. So the app tier is at the top, but the first thing is databases, because databases are how data is stored. The second thing is data search – Elasticsearch, in particular. And I’m going to talk about these as services, because that’s really where the world is at. The third thing is analytics, and analytics can be an Excel spreadsheet, but it could be Tableau and come through the old-style analytics. And then there is AI and ML which is unique. Why? Because this requires a different parsing of the data. It’s really GPU-based, TensorFlow, those types of things. Then finally, streaming, in which I include messaging. So I would merge these two boxes – streaming is really about having distributed data right out there, IoT, and stuff like that, or sensors of different kinds.

So these are the five data services. And this is actually the whole array of modern app solutions. It’s MongoDB, it’s Elastic, it’s Cassandra, it’s Kafka and Spark. This is not the old-style siloed world of Oracle and Sybase – which still exists. But this is the new world, where infrastructure is cloudified. Data is now all running on containerised apps. That’s the cloud-native world. But in addition, data is being consumed as a service in these five different sub-segments.

It looks like a stack. And it looks like the traditional place for suppliers like Pure is at the bottom, but that Pure is moving up the stack to provide services there. And the implication I’m drawing from what you’re saying, is that Pure Portworx will move even further up?

Pure is going to be in all these layers. These things are not mutually exclusive. And in fact, Portworx is an example of how we’re actually stitching these together. And in the future, there’s no reason why we couldn’t have a vertical slice that goes all the way, and even ties-in the app as well.

But you would probably do that with partnerships, wouldn’t you? Because of the amount of code you have to write doing that?

Exactly. So this is what I think a CIO needs to do from an industry viewpoint. But we’re not doing this on our own. This is not such a secret anymore, but people think of Kubernetes as being the container orchestrator. And they’re right, that is the primary role of Kubernetes.

But now, I believe there’s the second coming of Kubernetes, and this is really as an infrastructure control plane. It’s a multi-cloud infrastructure control plane. Sometimes Kubernetes is orchestrating infrastructure through the help of CNI. That’s what [Tigera’s Project Calico] does. I’m also using CSI extensions of Kubernetes and orchestrating storage. That’s what PortWorx or StorageOS [now Ondat] or Robin.io do. And then it will also be orchestrating VMs in the future, using KubeVirt, which is a new emerging technology that is gaining some currency. It’s still a technical concept, but I think more and more, you will see compute being orchestrated by Kubernetes.

That’s astounding. I’ve been with you up until now, but the idea of compute being organised by Kubernetes …

Well, there is this CNCF-incubated technology called KubeVirt and it’s basically a way to orchestrate VMs using Kubernetes. You stand up VMs and then you can manage them just like you would containers, but now instead of containers being orchestrated, you’re instantiating and doing things like moving VMs and moving containers within VMs.

This is still in its infancy, but I think it’s going to happen. And this may sound a little bold, but I would say Kubernetes is really going to replace the vision of what OpenStack was intended to do. OpenStack was going to be this abstraction layer that allowed people to manage across any infrastructure, their storage, networking and compute. In storage it was Cinder and Swift, and so on.

My view is, it was so complicated and poorly done that it kind of crumbled. Of course, there are probably 150 companies using OpenStack that still swear by it. But these were mostly people who put a lot of effort into developing the standard. But in reality, the era of OpenStack is over. OpenStack was intended to be the universal way to manage infrastructure, and that’s what is happening now in a multi-cloud way with Kubernetes, with extensions to Kubernetes, called CNI, CSI, and then KubeVirt.

Do you see Kubernetes getting involved in composing datacentre IT resources?

Exactly. The old world is a machine-defined world. That’s how VMware was when the focus was on infrastructure. But now the focus is on as-a-service. Forget infrastructure – people want to consume services. So how do you shorten the path to something we consume as a service? You orchestrate it with containers and Kubernetes.

Look at PortWorx, which is an amazing example of this. Our buyer is not the storage admin, our buyer is a DevOps person. And eventually, with PortWorx Data Services, our buyer is going to be a line of business person.

Because you’re supplying services, not hardware boxes or software, you’re supplying services?

Yes, they’re consuming a service. So Kubernetes was conceived as an app organising framework, and so it is naturally already set up to be app-oriented, but it’s also consuming. So this is data services as code. You had infrastructure as code, software as code, now you have as-a-service as code.

But you don’t have to go there. Pure could remember it is a storage company. What’s in it for Pure to move up the stack to this as-a-service control plane and provide service-level applications up there?

I’m not saying we’ve left our data management world behind. That’s the bulk of our revenue as PortWorx and it’s growing. But this is a brand new thing we launched in September and it’s called PortWorx Data Services. Basically it’s a one-click way to deploy data services. Think of this as a curated set of data services, and over the next year there will be 12 to 14 of those.

Our analysis has revealed that these data services are probably about 75 to 80 per cent of what is being deployed out there in the modern kind of app world. It’s not about siloed infrastructure stovepipes – this is the modern multi-cloud world. And what we will offer is essentially a one-click way to do it.

On day one, we’ll let you deploy them with a single click. We actually have curated operators that allow database sizing, so we’ll do basic database sizing. And it will start with a default that we’ve known over time with our experience. And you can just download it – it’s a containerised version of Couchbase, or a containerised version of Cassandra. And we will have an open source version initially, but in future we might also have partnership licences from the vendors.

You won’t be providing the equivalent of Couchbase or Redis, or Kafka yourself? What you’re providing is the facility for consuming them as a service?

Yes, this is a database-as-a-service platform. If I were to be grandiose, I would say it’s like an app store for databases. When I go on my phone, and I go click on the app store, Apple just provides me a way to get Facebook or to get Google Maps. So, remember the old walled garden phrase? This is kind of a walled garden for data services.

But we’re doing more. We’re not just providing you the ability to provision it. That’s the day one part. But we will now allow you to optimise deploying it on a multi-tenant infrastructure. One of the challenges we found is that people might understand how to run Redis, but they won’t know how to pick the instance size to get the IOPS optimised. And they sure as hell have no idea what to do when a container fails and they have to move to a different cloud or how to migrate it.

And then day three is backing it up and archiving it right through the lifecycle. So what PortWorx Data Services is really doing is using Kubernetes in its new avatar as a service manager. Underneath the covers, a line of business person does not care that it’s Kubernetes. They may not even know – the point here is Kubernetes becomes invisible.

We’re not going to them and saying “Kubernetes, Kubernetes!” We’re just saying to them, you can consume a Postgres endpoint, consume a Redis endpoint, here’s Elasticsearch as a service. So our customer is really still a DevOps person, but one who is now going around offering these five data services to their line-of-business customers as a self-service model.

Sponsored by Pure Storage.