Interview. What led up to Pure Storage developing a new disaggregated system architecture for its FlashBlade//EXA?
Pure Storage was founded as an all-flash storage array (AFA) vendor, surviving and prospering while similar startups rose, shone briefly, and were then acquired or failed as more established competitors adopted the technology and closed off the market to most new entrants.
In the 2010-2015 era and beyond, the mainstream enterprise incumbents including Dell, Hitachi Vantara, HPE, IBM, NetApp, and Pure ruled the AFA roost with what was essentially a dual-controller and drives architecture with scale-up features. There was some limited scale-out, with examples such as Isilon, but for true scale-out you had to look to the HPC space with parallel file system-based suppliers such as DDN.
Then, in 2016, a disaggregated canary was born in the AFA coal mine: VAST Data emerged with independently scaled-out metadata controllers and scale-out all-flash storage nodes talking to each other across an NVMe RDMA fabric. There was a lot more clever software in VAST’s system, but the fundamental difference was its disaggregated design, aided and abetted by its single storage tier of QLC flash drives.
Moving on, the AI boom started and gave an enormous fillip to VAST Data and parallel file system AFA vendors such as DDN. Only their systems could keep GPU server clusters running GenAI training workloads without data I/O waits. The incumbent enterprise AFA vendors added Nvidia GPUDirect support to speed their data delivery to GPU servers but could not match the data capacities enabled by the disaggregated scale-out and parallel file system vendors. Consequently, VAST Data and DDN prospered in the Nvidia-dominated GPU server mass data training space.
This caused a re-evaluation among other suppliers. HPE adopted a VAST-like architecture with its Alletra storage. Dell assembled Project Lightning to add parallelism to its PowerScale/Isilon storage, while NetApp created the ONTAP for AI project. And Pure? It announced its FlashBlade//EXA a couple of weeks ago.
This has a clever twist: the existing FlashBlade system is used as the metadata controller layer, with separately scaled-out QLC flash storage nodes accessed across an RDMA fabric. We spoke to Pure’s VP for Technology, Chadd Kenney, to find out more.
Blocks & Files: FlashBlade//EXA seems like a radical revision of the FlashBlade architecture, giving it a parallel processing style of operation. Would it be possible to do the same sort of thing with FlashArray, and is that a silly question?
Chadd Kenney: No, it’s an interesting one. So let me give you some background on where EXA came about, which may help you understand the logic behind why we decided to build it. It was a very fun project. We spent a lot of time thinking about what architectures existed in the market today and which ones we wanted to take additional value from, and then what core competencies we had that we could potentially apply to a new solution.

And so this was an interesting one because we had a couple of different options to play with. The first was that we could obviously make bigger nodes and break away from the chassis-style configuration, and the bigger nodes would give us more compute. We’d have the ability to scale them in a different manner. That gave us a pretty sizable performance increase, but it wasn’t big enough.
Customers were starting to ask us for a lot. It’s funny: we would get into conversations with customers and they would ask us for 10 or 20 terabytes per second of performance. So we were scratching our heads, saying, wow, that’s a lot of performance to deliver. And so we started to think about how decoupling the infrastructure could be an interesting possibility, but how would we do it?
Then we went back and forth on what the core problem we were trying to solve for customers was. And the one thing we continued to hear was that metadata performance was incredibly problematic and in many cases very rigid in the way it actually scaled. As an example, there are alternative pNFS or Lustre-like solutions that have tried to solve this before by disaggregating the metadata. The downside was that it couldn’t be scaled independently in a very granular fashion, so that I could say I want a massive amount of metadata and a tiny bit of capacity, or massive capacity and a tiny bit of metadata.
And so that intrigued us a little bit: OK, how could we actually deliver this through a different mechanism?
The second thing we started to think about was what FlashArray started with: a key-value store that was highly optimized for flash in order to access data. We then took that same key-value store into FlashBlade and somewhat scaled it out across these nodes, and we realized, OK, there are a couple of core competencies here.
One is that it’s incredibly fast with multi-client connections. And the second was that we had an object count in the 40 quadrillion range, a ridiculous level for the number of objects. So as we started to think about that, we said, well, what if we kept the metadata engine that was core to FlashBlade intact, and then, with the number of objects that we could scale to, we could infinitely scale data nodes.
We actually don’t even know what quantity of data nodes we could get to. So then we had to test: could these data nodes scale in a linear fashion? And when we started adding data nodes in this decoupled way, we decided to go with pNFS, because that was what customers were initially asking us for.
Although I think an object store with S3 over RDMA will probably be the longer-term approach, pNFS was what every customer was asking us about. And so when we started building this, we realized, oh my God, as we add nodes, we are seeing an exactly linear performance rise. It was between 85 and 100 gigabytes per second. And we got really excited about the fact that we could actually build a system that could take very different client access patterns and linearly scale the overall bandwidth.
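As a rough sanity check on those numbers, here is a back-of-the-envelope sketch, assuming the 85-100 GB/sec figure is the bandwidth each added data node contributes and that scaling stays linear, as Kenney describes. The node counts it prints are illustrative arithmetic, not Pure sizing guidance.

```python
# Back-of-the-envelope sizing: data nodes needed to reach the 10-20 TB/s
# figures customers quoted, assuming each node adds roughly 85-100 GB/s
# and scaling stays linear (both figures taken from the interview).
TARGETS_TB_PER_S = [10, 20]        # customer asks, terabytes per second
GB_PER_NODE = (85, 100)            # assumed per-node bandwidth increment, GB/s

for target_tb in TARGETS_TB_PER_S:
    target_gb = target_tb * 1000                   # TB/s -> GB/s (decimal units)
    fewest = target_gb / GB_PER_NODE[1]            # best case: 100 GB/s per node
    most = target_gb / GB_PER_NODE[0]              # worst case: 85 GB/s per node
    print(f"{target_tb} TB/s needs roughly {fewest:.0f}-{most:.0f} data nodes")

# Prints:
# 10 TB/s needs roughly 100-118 data nodes
# 20 TB/s needs roughly 200-235 data nodes
```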
And so then we all of a sudden got really excited about trying to build a disaggregated product. And the first logical thing we thought about was, well, we’ll build our own data nodes. We’ve talked about doing those for quite some time and went back and forth on it. The hyperscale win kind of gave us some ideas around this as well. And so we then talked to customers – how would you consume this? And what was interesting is that most said, I already have a massive data node infrastructure that I’ve invested in. Could I just use that instead?
And we said, comically enough, we could potentially even use competitive nodes; they just have to meet the minimum requirements, which are pretty low. And so we decided to go to market first with off-the-shelf nodes that meet the requirements, running a Linux distribution, and we lay a small set of packages on top to optimize the workflows, things like our RapidFile Toolkit and a bunch of other stuff.
It got really exciting for us that all of a sudden we had something we could bring to market that would effectively leapfrog the competition in performance and achieve some of these larger GPU cloud performance requirements that were almost unattainable by anybody without very specific configurations.
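For a sense of what the consumption side of that off-the-shelf approach can look like, below is a minimal sketch of a Linux client mounting an NFSv4.1/pNFS export over RDMA. The server name, export path, and mount point are hypothetical, and the options shown are standard Linux NFS-over-RDMA mount options rather than Pure’s documented EXA procedure.

```python
# Illustrative only: mount an NFSv4.1 (pNFS-capable) export over RDMA from a
# generic Linux client. Hostname, export path, and mount point are made up;
# consult vendor documentation for the actual supported procedure.
import subprocess

SERVER = "metadata-head.example.com"   # hypothetical metadata endpoint
EXPORT = "/exa-fs"                     # hypothetical export path
MOUNTPOINT = "/mnt/exa"

subprocess.run(
    [
        "mount", "-t", "nfs",
        # vers=4.1 enables pNFS layouts; proto=rdma with port 20049 is the
        # standard Linux NFS/RDMA transport.
        "-o", "vers=4.1,proto=rdma,port=20049",
        f"{SERVER}:{EXPORT}", MOUNTPOINT,
    ],
    check=True,  # requires root and an RDMA-capable NIC on the client
)
```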
Blocks & Files: I think I’m asking a dumb question by asking if FlashArray could use this same approach, because that’s basically like asking whether the compute engine in FlashArray could use the same approach.
Chadd Kenney: It’s an interesting concept because I think we’re always open to new concepts and ideas and we try to prove these out, especially in the CTO office, where we spend a lot of time conceptualizing whether we could change the way that we construct things. The one thing that changed with FlashBlade was that we went to this open ecosystem of hardware. And so there is a different mode of operation than what we typically build. We typically build these gorgeous, elegant, highly simple systems that anyone can get up and running in near minutes. EXA is obviously a little bit different, but what we realized is that these customers wanted that build-it-themselves thing. They kind of liked that, whereas in the enterprise model, they’re not that into that. They really just want more of a black box: plug it in and it works.
Blocks & Files: An appliance?
Chadd Kenney: Yes. I think with FlashArray, the one thing that’s tough about it is it’s just so good at what it does in its own ecosystem, and we’ve now built pretty much every possible tier. In fact, you’ll see some announcements come out from us around the Accelerate timeframe … But we’re going to take somewhat of an EXA level of performance into FlashArray too. And so we really like the way it’s built, and we haven’t really found a mechanism for disaggregating it in any way that made sense for us. And I think you’ll see there are maybe new media types we’ll play with that we will talk about later. But QLC is still very much our motion for this way of doing multi-tiers. So I think FlashArray is somewhat going to stay where it is, and we just love it the way it is, to be honest. There haven’t really been demands to shift it.
Blocks & Files: Do you see the ability to have existing FlashArrays occupy the same namespace as EXA so the data on them can be included in EXA’s namespace?
Chadd Kenney: Yeah, really good question. I think what you’re going to start seeing from us, and this will be part of the Accelerate announcement as well, is that Fusion is actually going to be taking the APIs across products now.
So for Object as an example, you may or may not see Object show up on the alternate product in the future here. And so the APIs for Object will actually be going through Fusion, and then you can declare where you want it to actually land based upon that. So you’ll start to see more abstraction that sits on top of the systems and dictates things like policy-driven provisioning and management, where consumers very likely won’t even know if it lands on FlashArray or FlashBlade, for certain protocols. Obviously for Block, it’s likely going to land on a FlashArray, but they may not know which FlashArray it lands on.
So some of the cool stuff I’m working on on the platform side of the house is trying to take the telemetry data that we have, build intelligence for recommendations, and then allow users to interact with those recommendations both through policy-driven automation and through natural language processing with some of the Copilot capabilities we’re building.
And then inside the platform we’re starting to build these workflow automations, and then recipes that sit outside of that to connect to the broader ecosystem. So we will see our platform come to life around Accelerate. I won’t steal too much thunder there, but that’s where we’re really trying to go. Instead of trying to bridge namespaces together at the array side, we’re trying to build this all into Fusion and have Fusion dictate what actually lands on which system.
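To make the policy-driven placement idea concrete, here is a deliberately generic sketch of declaring intent and letting a control plane choose the backing system. It is not Fusion’s actual API or schema; the request fields and placement rules are invented purely to illustrate the pattern Kenney describes.

```python
# Generic illustration of policy-driven placement: the consumer declares what
# they need (protocol, performance tier, capacity) and a control plane decides
# which backing system the workload lands on. Names and rules are invented and
# do not represent Fusion's real API.
from dataclasses import dataclass

@dataclass
class PlacementRequest:
    protocol: str       # e.g. "block", "file", "object"
    tier: str           # e.g. "performance" or "capacity"
    capacity_tb: int

def place(request: PlacementRequest) -> str:
    """Pick a backing system class from declared intent (illustrative rules only)."""
    if request.protocol == "block":
        return "FlashArray"                  # block placements land on a FlashArray
    if request.tier == "performance" and request.capacity_tb > 500:
        return "FlashBlade//EXA data nodes"  # large, high-throughput namespaces
    return "FlashBlade"                      # default file/object placement

# The consumer never names a specific array; the policy engine decides.
print(place(PlacementRequest(protocol="object", tier="performance", capacity_tb=800)))
```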
Blocks & Files: Let’s imagine it’s six months down the road and I’m a Hammerspace salesperson. I knock on the door of one of your customers, who’s aware of all this stuff going on, and say: I’ve got this brilliant data orchestration stuff, let me in and I’ll tell you about it. Would your customer then be able to say to the Hammerspace guy: sorry, I don’t need you, I’ve already got that?
Chadd Kenney: Yes. So one thing that we embedded into EXA that we didn’t really talk a lot about is that we use FlexFiles as well. So data nodes can reside anywhere, and those can all be part of the same namespace. And so we didn’t go too big on this initially because, if you think about the core use case, honestly most people are putting it all in the same datacenter right next to the GPUs.
So I think over the longer term you’ll start to see more geo-distribution. With FlashBlade as a whole, that is a plan. And as we build pNFS into the core FlashBlade systems, you’ll start to see FlexFiles and multi-chassis be kind of stitched together as one common namespace. So I think we’ll have similar use cases. Of course, we’re all-flash and they can use other media types out there. So they may get into use cases that maybe we’re not so tied to: ultra-archive and those types of things. But I think as we get to 300 terabyte and then 600 terabyte drives, and the incremental growth of capacity, we’ll start to break into similar use cases to theirs.