Charlie Giancarlo on dataset management, tactical product decisions, and SW stacks

Pure Storage has come a long way since it presented the world with its FlashArrays back in 2011. It has expanded its all-flash storage portfolio with FlashBlade, added Evergreen Storage for non-disruptive upgrades, developed cloud-native solutions like Portworx and Pure Fusion for hybrid and multi-cloud environments, and introduced Pure1 for AI-driven management. And of course it IPO’d in 2015.

Charles Giancarlo became its CEO in 2017. In the first part of this interview we talked about AIOps and AI data, and arrived at the term dataset management. In this second part we look further at that idea and move on to off-the-shelf SSDs, software stacks and copilots.

B&F: Can you tell me more about the data set management idea?

Charles Giancarlo: Let’s talk about data management versus dataset management. And then eventually, I think we’ll take on more data management. So when data management’s talked about today, it’s talked about in the sense that you talked about before, which is this particular data store for this particular AI or analytics engine. They’re managing that. What they’re not managing is [that] this is a lot to manage, and if you try to manage all of the individual bits of data in this, I think you’d fail right now.

But in the meantime, if you manage the data sets, meaning you don’t necessarily, or we don’t necessarily know every bit of data in every dataset, but if we can keep track of the data sets themselves, not just, and again, not just the ones necessarily that we run, but where are they? What is the dataset lifecycle management? How long should they stay alive? When should they be killed? 

B&F: This is generalised data management. The data could reside on somebody else’s kit.

Charles Giancarlo: Over time. I mean, we won’t get there today. We’re not there today, but over time we could get there. Okay. And what is the problem? Okay, what is the lineage of that dataset? Because it’s a copy of a copy of a copy. It is a combination of two different datasets that have been put together for core analysis, that has a copy, that has a copy, and keeping track of all of this. And I’ll give you a sense why you want dataset lifecycle management.

B&F: Keeping track of all this so that you are not wasting storage space on redundant copies when you don’t need to have them.

Charles Giancarlo: You don’t need that. But there’s also the other thing, what about today? The copies that somebody made and then they left the company and nobody remembers? That’s compliance, the ghost copies, right? It’s a compliance issue. And what happens is, a lot of those eventually get part of ransomware, because what happens is they’re forgotten about, maybe not known about at all. So they’re not subject to ongoing security things such as a key rotation.

B&F: It’s just like a waiting open back door.

Charles Giancarlo: Yes. It’s out in the trash now and someone’s rummaging through the trash and they find it, and that’s a big problem. So you need lifecycle management. If it’s not touched in three months and nobody owns it, get rid of it.
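
The rule Giancarlo states here (untouched for three months, no owner, delete it) is simple enough to sketch in code. What follows is only an illustration of that rule, not anything Pure Storage has described: the `DatasetRecord` type, its fields, the 90-day approximation of "three months" and the sample catalog are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical catalog entry; the field names are illustrative only.
@dataclass
class DatasetRecord:
    name: str
    owner: Optional[str]      # None means nobody currently owns the dataset
    last_accessed: datetime


# "Three months" approximated as 90 days (an assumption).
IDLE_LIMIT = timedelta(days=90)


def is_expired(ds: DatasetRecord, now: datetime) -> bool:
    """True when the dataset has no owner AND has been idle past the limit."""
    return ds.owner is None and (now - ds.last_accessed) > IDLE_LIMIT


def expired_datasets(catalog: List[DatasetRecord], now: datetime) -> List[str]:
    """Names of datasets the rule would flag for removal."""
    return [ds.name for ds in catalog if is_expired(ds, now)]


now = datetime(2025, 6, 1)
catalog = [
    DatasetRecord("sales-2024", "alice", datetime(2025, 1, 5)),    # old but owned
    DatasetRecord("sales-2024-copy", None, datetime(2025, 1, 5)),  # old, unowned
    DatasetRecord("scratch", None, datetime(2025, 5, 20)),         # unowned, recent
]
print(expired_datasets(catalog, now))  # ['sales-2024-copy']
```

Only the forgotten copy is flagged: the owned dataset survives however old it is, and the recent scratch set survives because it has been touched. A real system would presumably quarantine rather than delete outright, but the interview does not go into that.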

B&F: Okay. Back down in the weeds. The FlashArray//ST, FAST, uses off-the-shelf SSDs. I was wondering: let’s take a Pure DFM, let’s reorganise it and make it SLC. Could you do that?

Charles Giancarlo: Yes.

B&F: What would the speed of that be like compared to the SSDs?

Charles Giancarlo: It’d be tremendously fast.

B&F: Which brings me to the question, well, why not do that [instead of using COTS SSDs]?

Charles Giancarlo: Actually, we could use TLC as well. Part of the reason was that what customers were really asking for was just very, very high throughput. And we also have some unique electronics that we built in there to offload a lot of the services that in our regular product get handled by the Intel processor. So that also reduced latency, increased the overall performance. So for us, it was an easier way; tactical. Yes, it was tactical.

B&F: So in the future, will it survive?

Charles Giancarlo: Oh, everything we try to do, or, I should say, we’ve got a pretty good track record so far, is always evergreen. And so when you ask, will it survive, none of our stuff survives. It’s always updated every three years. [But] it will survive, I believe. Maybe it’ll just become one more of the standard, but it’ll survive. Certainly.

B&F: Do you think that Pure could get involved in high bandwidth flash?

Charles Giancarlo: So, the real question, I think in our minds, because this kind of relates to, for example, EXA, is how large and how specialised do we see a particular market? And if it’s large and specialised, then maybe using more off-the-shelf components such as, so for example, in EXA we’re using pretty standard JBODs. 

Part of the reason for that is that it’s a unique market actually, compared to the overall storage market. I know I say this every quarter, but the market hears something different: compared to the overall storage market, it’s not that big a market, alright. It’s a very specialised market. Part of that specialised market is sometimes they want InfiniBand, sometimes they want Ethernet. And there’s a specsmanship aspect to this where you always have to be at the front end of the specs and we could spend more of our time on DFMs doing that, or we can buy SSDs and let them do that.

And so it’s a small enough market where we don’t get enough benefit. You’re trading off, I guess more engineering dollars to get a slight benefit versus faster time to market, and that’s the trade-off we make.

I think the FAST, even though we’re excited about it; we started the design for one particular customer and we think there’s more than one customer that’ll enjoy this. But again, I don’t think it’s going to be mainstream. 

B&F: It’s not generally going to be common across all your customers. 

Charles Giancarlo: That’s correct. And therefore it makes sense to leverage some off-the-shelf technology.

B&F: Pure and Vast are both erecting very comprehensive and capable software stacks on top of their storage. They’re doing different things, but they’re very large and very capable and they’re resting on a storage hardware and storage array operating system base, but go far, far beyond that. I’d characterise NetApp, IBM, HPE and Dell as not doing that. Sure, HPE is doing GreenLake and so on, but this is just a new way of consuming storage. It’s not a new software stack.

Charles Giancarlo: Both Dell and HPE, in my view, are still doing something that I pejoratively call, and they call, full stack. Okay. And why do I say that’s pejorative? Full stack is a vertical architecture. What we’re saying is: okay, virtualization has already flattened and made horizontal compute. It flattened and made horizontal networking. Storage is the only thing that is still subservient to the application environment. So it’s vertical. We’re saying that should also be horizontal.

And so at this point, full stack is a hardware concept that doesn’t make sense. What you want are virtual full stacks that you can create out of software. You don’t want any physical full stacks. Right? But that’s where HPE, because they’re very much in a hardware mindset, because that’s what they do. Both Dell and HPE. So they talk about vertical full stacks as in a hardware full stack. And that’s what companies should not be doing.

This is like 10 years ago. But not now. Right? They should be creating a flexible environment. That means that you create virtualized environments for compute, for your networks, as well as, in our view, for their storage as well.

B&F: Last question. The Copilot. It’s a very Microsoft term.

Charles Giancarlo: Fair enough. Yes. But everybody’s using it for their own purposes. It’s not just us. Everybody’s using copilot now as an AI layer above their, let’s call it management or operations platform. Part of the reason why they call it copilot is most companies say, we don’t yet want to let the genie out of the box. It’s got to be a human in the loop. And that’s why it tends to be called copilot.

B&F: I shouldn’t assume that because you use the term copilot that you’re using Microsoft’s Copilot.

Charles Giancarlo: No, not at all. Not at all. In fact, we reserve the right, and we do, we use different LLMs. In fact, in some cases multiple of them, because they each have their own idiosyncrasies and benefits or detriments.

****

Charles Giancarlo is a terrific CEO to talk to for a semi-tech guy like me – I’m not an engineer, nor am I an analyst. He can handle any type of question – about markets, technologies, business models, SW stacks and so forth – without blinking. He listens to questions and answers them in his own way without marketing-speak. He lets you have a real conversation. It’s enjoyable and you learn a lot.