Quantum positions ActiveScale as HPC secondary storage

Interview: Quantum, best known for its flash, disk and tape storage, also sells into the data-intensive world of HPC storage. We spoke to Quantum’s Eric Bassier, senior director, Product and Technical Marketing, about tape, cold storage, HPC capacities and more while Quantum was exhibiting at ISC High Performance 2022 in Hamburg, Germany.

Blocks & Files: What are the use cases Quantum sees in the high performance computing market?


Eric Bassier:  We don’t do the primary storage for high performance computing. But we do have a lot of customers, research laboratories and organisations in life sciences, research and other fields, that use Quantum for their secondary storage. Many of those deployments use our StorNext file system with some kind of disk cache in front of tape.

Blocks & Files:  What does an average Quantum secondary storage HPC installation look like, in terms of the disk capacity range and the tape capacity range they might have?

Eric Bassier:  It does depend on the use case. But in general, it might be 10 to 20 percent on disk, and 80 or even 90 percent on tape.
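To put that split in capacity terms, here is a minimal sizing sketch. The 10 PB archive total is a hypothetical figure chosen for illustration; only the 10 to 20 percent disk share comes from the interview.

```python
from typing import Tuple


def split_capacity(total_pb: float, disk_fraction: float) -> Tuple[float, float]:
    """Split a total archive capacity (in PB) into a disk cache tier and a tape tier."""
    if not 0.0 <= disk_fraction <= 1.0:
        raise ValueError("disk_fraction must be between 0 and 1")
    disk_pb = total_pb * disk_fraction
    return disk_pb, total_pb - disk_pb


# Hypothetical 10 PB archive at both ends of the quoted 10-20 percent disk range.
for fraction in (0.10, 0.20):
    disk, tape = split_capacity(10.0, fraction)
    print(f"{fraction:.0%} on disk: {disk:.1f} PB disk, {tape:.1f} PB tape")
```

At the top of the quoted range, a 10 PB archive would need about 2 PB of disk cache in front of 8 PB of tape.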

Blocks & Files:  Could you describe a couple of customers?

Eric Bassier:  One is the Texas Advanced Computing Center (TACC). They’ve built a centralized archive for their research facility based on StorNext and tape. The other public case study, and one that shows why we want to be at a show like ISC, is what we’ve done at Genomics England, where we partnered with Weka.

Blocks & Files:  Why is that?

Eric Bassier:  As a file system, Weka is much better suited to typical HPC workloads. StorNext really excels at streaming data, which is why it’s so good for large video and movie files. Genomics England have 3.6 petabytes of flash storage for their Weka file system.

That’s where they ingest the data from the genomic sequencers. They now also have more than 100 petabytes on our ActiveScale object store, which serves entirely as their secondary storage. In that case, more than 90 percent of their data would be considered secondary storage.
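A quick check of those Genomics England figures, treating the 3.6 PB of Weka flash as primary capacity and the ActiveScale capacity as secondary, using capacity as a rough proxy for data, and taking 100 PB as a lower bound since the interview says "over 100 petabytes":

```latex
\[
\text{secondary share} \;\ge\; \frac{100\ \text{PB}}{100\ \text{PB} + 3.6\ \text{PB}} = \frac{100}{103.6} \approx 0.965
\]
```

That is roughly 96.5 percent, consistent with the "more than 90 percent" figure; the share only rises as the ActiveScale capacity grows.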

ActiveScale diagram

Blocks & Files:  Does that include tape?

Eric Bassier:  Genomics England is not using tape today, but we are talking to them about adding it as part of their ActiveScale system.

Blocks & Files:  I’d imagine that, at the rate they’re accumulating data, they’ll start to find that parts of the ActiveScale disk archive hold colder data, perhaps so much of it that they could offload it to tape.

Eric Bassier:   It really is an ideal use case in many ways. The head of one of our reseller partners in the federal government space, which does a lot of AI and machine learning work, has said that a lot of data is cold, or inactive, but only temporarily.

Their customers … can’t always predict when they’re going to want to bring data back from cold storage. In many of those use cases, they’re perfectly happy if it takes five or 20 minutes to get data back from tape; tape speed isn’t a factor. And they like tape’s low cost, its reliability and its low power consumption: the green aspect.
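Because a tape recall can take minutes, an application reading through an S3-style interface generally has to wait for a restore to finish before it can read the object. On AWS S3, the HEAD response for an archived object carries a `Restore` value such as `ongoing-request="true"` while the restore runs, and `ongoing-request="false", expiry-date="..."` once the temporary copy is readable. The sketch below assumes an on-premises object store reports restore status the same way, which should be checked against the vendor's documentation; `fetch_header` is a hypothetical callable supplied by the caller.

```python
import re
import time
from typing import Callable, Optional, Tuple

# Patterns for an S3-style Restore value, e.g.
# 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
_ONGOING = re.compile(r'ongoing-request="(true|false)"')
_EXPIRY = re.compile(r'expiry-date="([^"]+)"')


def parse_restore_header(value: Optional[str]) -> Tuple[str, Optional[str]]:
    """Classify a Restore value as 'not-requested', 'in-progress' or 'restored'."""
    if not value:
        return "not-requested", None
    ongoing = _ONGOING.search(value)
    if ongoing is None:
        raise ValueError(f"unrecognised Restore value: {value!r}")
    if ongoing.group(1) == "true":
        return "in-progress", None
    expiry = _EXPIRY.search(value)
    return "restored", expiry.group(1) if expiry else None


def wait_for_restore(fetch_header: Callable[[], Optional[str]],
                     poll_seconds: float = 60.0,
                     timeout_seconds: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until the restored copy is readable; return its expiry date if reported."""
    waited = 0.0
    while True:
        state, expiry = parse_restore_header(fetch_header())
        if state == "restored":
            return expiry
        if state == "not-requested":
            raise RuntimeError("no restore has been requested for this object")
        if waited >= timeout_seconds:
            raise TimeoutError(f"restore still running after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3, `fetch_header` could be `lambda: s3.head_object(Bucket=bucket, Key=key).get("Restore")`; a polling interval of a minute or more is reasonable given the recall times quoted here.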

Blocks & Files: With your focus on secondary storage for the HPC market, I guess you need to accept data from primary storage systems quickly and easily, and to ship cold data that has warmed up back to those primary storage systems in the same way. Is there a workflow aspect to this?

Eric Bassier:  Yes. Any type of research is going to have a workflow associated with it. There’s a stage where scientists are actively analysing or working on the data, and then they move it to less expensive storage, to an archive. One area where Quantum has a genuinely unique offering is the way we’ve integrated tape with ActiveScale. I think it’s the first time this has been done without a tape gateway.

In other words, we built an object store where you can have a single namespace across disk and tape. An HPC application uses standard S3 calls to read and write objects on disk. It then uses either the S3 Glacier API set to put objects on tape and restore them from tape, or what are called AWS lifecycle policies, which are part of the standard S3 API set.

There are other solutions out there. There are gateways to put data on tape, but then you’re dealing with different namespaces, different user interfaces and multiple key management points. We think what we’ve done with ActiveScale is unique, because it’s the only object store that spans both disk and tape, so you can take advantage of the economics of tape. I think we’ve abstracted the way an application has to interact with tape better than anyone has in the past.
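As a sketch of the application side, the functions below build request bodies for the two standard S3 calls Bassier mentions: a lifecycle rule that transitions objects to an archive storage class after a set number of days, and a restore request that brings a readable copy back. The dictionary shapes follow the AWS S3 API as exposed by boto3. That ActiveScale maps the `GLACIER` storage class to its tape tier, and the prefix, day counts and retrieval tier used here, are assumptions for illustration.

```python
from typing import Any, Dict


def tape_lifecycle_configuration(prefix: str, days: int,
                                 storage_class: str = "GLACIER") -> Dict[str, Any]:
    """Lifecycle rule moving objects under `prefix` to an archive class after `days` days.

    Assumption: the target object store treats `storage_class` as its tape tier.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def restore_request(days: int, tier: str = "Standard") -> Dict[str, Any]:
    """Body for an S3 RestoreObject call: keep a readable copy for `days` days."""
    if days < 1:
        raise ValueError("a restored copy must be kept for at least one day")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}
```

With boto3 these would be passed as `s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=tape_lifecycle_configuration("sequencer-runs/", 30))` and `s3.restore_object(Bucket=bucket, Key=key, RestoreRequest=restore_request(2))`, with the client created via `boto3.client("s3", endpoint_url=...)` pointing at the on-premises store. The bucket, key and `sequencer-runs/` prefix are hypothetical.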

Blocks & Files: Are there any other advantages to ActiveScale for HPC users?

Eric Bassier:  The second, and really our key innovation, is the way we erasure code objects on tape; that’s where we have patents. That matters because you get much better data durability on tape and much better storage efficiency. Instead of making three copies of a file and tripling the tape capacity you use, we erasure code the object, create the parity data and stripe it across tapes, which is more efficient. The other thing our erasure coding approach unlocks is that, most of the time, we can recover an object from cold storage with just a single tape mount. That has been a really difficult technical challenge for many, many years.
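To illustrate why erasure coding uses less capacity than three full copies, here is a deliberately simplified sketch: it splits an object into k data shards plus a single XOR parity shard, which can rebuild any one lost shard. Quantum's patented tape scheme is not described in the interview and is not reproduced here; the XOR code and the shard counts are illustrative stand-ins.

```python
from functools import reduce
from typing import List, Optional


def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw capacity used per byte of user data for a k+m layout (3 copies = 1+2)."""
    return (data_shards + parity_shards) / data_shards


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal, zero-padded shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (None) by XOR-ing all the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost shard")
    survivors = [s for s in shards if s is not None]
    rebuilt = list(survivors)
    if missing:
        rebuilt.insert(missing[0], reduce(xor_bytes, survivors))
    return rebuilt
```

With, say, 10 data shards and 2 parity shards (hypothetical figures), `storage_overhead(10, 2)` gives 1.2, so an object consumes 1.2 times its size in raw tape capacity, versus 3.0 for three full copies (`storage_overhead(1, 2)`).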

Blocks & Files: I’m thinking that you’ve got something here that is in its early days.

Eric Bassier:  We view this as an area where we are going to grow. We actually think tape is going to be more relevant because of this type of a use case. But it’s important to put that in context. The overall tape storage market is still under $1 billion versus the disk market and the flash market, which are massive – many, many billions. 

What I will say, though, is that the tape business is growing. Our tape revenues are increasing, and the reason is the way the largest hyperscalers are using tape. Effectively, they’re using it behind object stores with software they’ve developed themselves. And that is the premise of our strategy.

We’ve developed our whole portfolio on the belief that HPC organisations have basically the same need, perhaps at a slightly smaller scale and a few years behind. They’re not going to invest four or five years of engineering time to develop their own object store software stack. So we’ve said: we’ve built this for you, we’ve put it in a box. If you’d like AWS Glacier but don’t want to put all your HPC data in the public cloud, we’ve built Glacier in a box for you. We can deploy it at your site, or at multiple sites. And where our roadmap takes us is that we might host part of it for you.

Blocks & Files: So that could be a Quantum cloud? You could, for example, have some Quantum ActiveScale systems in colocation sites or an Equinix centre, and make them available to HPC customers?

Eric Bassier: That is something we are considering. And just to comment on Genomics England: they take advantage of the ability of our ActiveScale object store software to do what we call geo-spreading. They have object store systems deployed at three sites, and the ActiveScale software geo-spreads the objects across all three. So you actually have a single system, a single namespace, spread across three sites. We can do that on either disk or tape, so conceptually you can have disk at three locations and a tape system at three locations, but it’s still a single namespace, a single object store.

But we have many customers who say, ‘well, I might have two sites but not three’, or ‘I’ve only got one site; would Quantum be willing to host the other two?’ I think our customers are going to lead us to the right model there.
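Geo-spreading, as described above, can be pictured as distributing each object's erasure-coded shards across sites so that losing any one site still leaves enough shards to rebuild the object. The sketch below places shards round-robin across three sites and checks that property. The 18-shard, 12-needed layout is hypothetical; the interview does not give ActiveScale's actual parameters or placement rules.

```python
from collections import Counter
from typing import List


def spread_shards(total_shards: int, sites: int) -> List[int]:
    """Assign shard i to site i % sites (round-robin); returns the site of each shard."""
    if sites < 1:
        raise ValueError("need at least one site")
    return [i % sites for i in range(total_shards)]


def survives_any_single_site_loss(total_shards: int, needed: int, sites: int) -> bool:
    """True if, whichever site is lost, the shards left at the others can rebuild the object."""
    per_site = Counter(spread_shards(total_shards, sites))
    return all(total_shards - per_site[site] >= needed for site in range(sites))
```

`survives_any_single_site_loss(18, 12, 3)` returns True, because each site holds six shards and the twelve left after losing a site are enough; requiring 13 shards would make it return False. The same arithmetic explains why customers with only one or two sites ask whether Quantum could host the others.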

Blocks & Files: So you could provide a component of a customer’s private cloud? 

Eric Bassier:  Correct. Private cloud is one of those things that means different things to different people. But that is where a lot of our early customer engagements are coming from; that is how customers are expressing their initiatives. They’re saying, ‘hey, we want to build a private cloud for archival data’. And we say, ‘we can help you build a private cloud for your archival data’. So yes, we’re pretty excited about that.

Blocks & Files thinks Quantum is well positioned, with StorNext and ActiveScale on both disk and tape, to pick up HPC customers as they accumulate more data than they can keep on their primary, possibly flash, storage systems and tier older data off to nearline disk and then to tape. The single namespace and geo-spreading unify tape and disk into, effectively, a single object store, and that could make life easier for HPC admin staff.