Recently, I spoke to Cloudian CEO Michael Tso about the awesome scale and data storage management challenges of exabyte levels of storage. This prompted me to wonder what an exabyte vault should look like – and who would build one.
For starters, an exabyte vault will be based on disk drives as no other affordable medium performs at an acceptable level. The vault will contain tens of thousands of drives, rising to 100,000 or more depending on drive capacity.
For example, as we noted in our earlier article, an exabyte of capacity entails buying, deploying and managing a vast fleet of disk drives (HDDs): in decimal units, 1EB is 1,000,000TB, which works out to 100,000 x 10TB HDDs, about 71,430 x 14TB HDDs or 50,000 x 20TB HDDs. There will also be a continuous need to add capacity, which means disk chassis must be physically delivered, racked, cabled, powered, cooled and brought online.
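The drive-count arithmetic is easy to check. The short Python sketch below assumes decimal units (1EB = 1,000,000TB) and counts raw capacity only, ignoring any overhead for redundancy or spares; the function name `drives_needed` is ours, purely for illustration.

```python
import math

# Decimal units: 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.
TB_PER_EB = 1_000_000


def drives_needed(capacity_eb: float, drive_tb: float) -> int:
    """Minimum number of drives needed to hold capacity_eb of raw capacity."""
    return math.ceil(capacity_eb * TB_PER_EB / drive_tb)


for drive_tb in (10, 14, 20):
    print(f"{drive_tb}TB drives per exabyte: {drives_needed(1, drive_tb):,}")
```

Any protection scheme, such as replication or erasure coding, would push the real drive count higher than these raw figures.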
So, the sheer scale of exabyte-level storage vaults will necessarily entail hard drive management at the intelligent enclosure level – and not, as in most cases today, at the drive level. Quite simply, there are far too many drives to manage individually.
As far as we are aware, this multi-layered management does not exist yet in the public domain. But the customised software stacks built by Backblaze, a cloud service provider, illustrate how this could work in enterprise data centres.
Backblaze CEO Gleb Budman said in an email interview that the company no longer manages just at the drive level; it’s multi-level: “Our entire Backblaze Storage Cloud is designed to scale horizontally: drives to Storage Pods to Vaults to Clusters to Regions. We’re confident in continuing to be able to scale past exabytes and to zettabytes with this approach.”
Backblaze’s experience gives us a glimpse into a future in which an exabyte vault has multi-layered management: drives in enclosures, enclosures in racks or vaults, vaults in clusters, and then clusters in some larger entity.
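To make the layering concrete, here is a minimal Python sketch of such a hierarchy. It is our own illustration, not Backblaze’s software: the class names `Drive`, `Enclosure` and `Group`, the `summary` method and the 60-drive, 14TB example pod are all assumptions. The point is that each layer reports a roll-up to the layer above, so the top-level manager never has to touch individual drives.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    serial: str
    capacity_tb: float
    healthy: bool = True


@dataclass
class Enclosure:
    """Lowest management layer: owns the drives and hides them from above."""
    name: str
    drives: list[Drive] = field(default_factory=list)

    def summary(self) -> dict:
        failed = sum(1 for d in self.drives if not d.healthy)
        usable = sum(d.capacity_tb for d in self.drives if d.healthy)
        return {"drives": len(self.drives), "failed": failed, "capacity_tb": usable}


@dataclass
class Group:
    """A vault, cluster or region: holds enclosures or other groups."""
    name: str
    children: list = field(default_factory=list)

    def summary(self) -> dict:
        # Roll up the children's summaries; individual drives are never visited here.
        total = {"drives": 0, "failed": 0, "capacity_tb": 0.0}
        for child in self.children:
            for key, value in child.summary().items():
                total[key] += value
        return total


# Hypothetical example: one 60-drive pod with a single failed drive.
pod = Enclosure("pod-1", [Drive(f"sn-{i}", 14.0) for i in range(60)])
pod.drives[3].healthy = False
vault = Group("vault-1", [pod])
cluster = Group("cluster-1", [vault])
print(cluster.summary())
```

In a real system each layer would presumably also handle placement, repair and alerting, but the same principle of summarising upwards would apply.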
That prompted us to ask Seagate about the need for drive enclosure-level management as data centre disk stores head towards the exabyte level. According to Seagate, the exabyte storage vault will likely be a distributed system via a collection of data centres.
Ken Claffey, SVP, enterprise data solutions, at Seagate, told us: “We agree with your line of thinking that managing large numbers of drives in a distributed system requires a new approach, as enterprise customers move to 100PBs and even exabyte scale.
“This approach needs more intelligence at the ‘enclosure’ level that can reduce the burden on the higher level SDS (Software-Defined Storage) stacks to effectively manage 1000s or 10,000s of individual drives/storage devices.
“The move to ever greater drive capacity (for example, as enabled through HAMR technology), will further stress the current approach (think about having to rebuild 30, 40, or 50TB drives across the network).
“Therefore, the industry is looking at new multi-level erasure coding approaches and other such innovations to address this challenge. Seagate is at the forefront of this innovation.”
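Claffey’s point about rebuilding very large drives can be put into rough numbers. The Python sketch below is a back-of-envelope estimate only: it assumes a rebuild is limited by a single sustained data rate, and the 250MB/sec figure is our own illustrative assumption, not a Seagate number. The function name `rebuild_hours` is likewise hypothetical.

```python
def rebuild_hours(drive_tb: float, throughput_mb_s: float) -> float:
    """Hours to rewrite a whole drive at a constant sustained throughput."""
    total_bytes = drive_tb * 1e12                 # decimal terabytes
    seconds = total_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


ASSUMED_MB_PER_SEC = 250  # illustrative assumption, not a vendor figure

for drive_tb in (30, 40, 50):
    hours = rebuild_hours(drive_tb, ASSUMED_MB_PER_SEC)
    print(f"{drive_tb}TB drive at {ASSUMED_MB_PER_SEC}MB/sec: about {hours:.0f} hours")
```

Under these assumptions a full rebuild runs well beyond a day per drive, which helps explain the interest in handling as much repair work as possible inside the enclosure rather than across the network.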
So what is Seagate doing on this front? We recently heard that Seagate is developing a 107-drive Pod, a building block for massive storage deployments.
Backblaze’s Budman confirmed what our sources were saying: “We had a Seagate 107-drive chassis to use as a batch processor for a data migration project recently. It’s a JBOD setup though, so not the same as our systems which have onboard compute. It’s an impressive amount of storage in one box, but not quite right for our architecture.”
He added: “We also have a Seagate AP 5U 84-drive chassis in-house. This system does include compute, so it’s a little more interesting for us. We’re regularly testing new systems to see how they’d work in our environment.”
So who would build the exabyte storage vault management software? I think that two HDD suppliers – Seagate and Western Digital – are key to making it work, as they have control at the base drive-level access layer.
They are also building their own intelligent enclosures – see Seagate above, and Western Digital, with OpenFlex.
Possibly CSPs like Backblaze could also do this. Backblaze is already building and deploying its own exabyte-scale vault and manages this using its own software. Now suppose Backblaze, or another bulk cloud storage CSP such as Datto or Wasabi, productised its exabyte-capable disk vault management software. That could appeal to on-premises exabyte vault builders.
That said, the software stacks would have to manage commercially-available HDD enclosures – and not just their own proprietary Pods.
Lastly, it is remotely conceivable that the big three CSPs – AWS, Azure and Google – could do the same. I think this is unlikely because they all appear hell-bent on getting bulk on-premises storage data moved to their clouds rather than helping large-scale on-premises storage survive.