Tape (and DNA?) needed to meet archive demand by 2030

A report has claimed that there could be a huge gap between archive data storage demand and supply by 2030, with tape and possibly DNA storage the only viable technologies to fill it.

Fujifilm (tape) and Twist Bioscience (DNA storage) jointly sponsored the 26-page report – “The Escalating Challenge of Preserving Enterprise Data” – which estimates there could be a 7.87 zettabyte (7.87 million petabytes) total gap between overall storage demand and storage production in 2030 if growth rates are extrapolated forwards.

Emily Leproust PhD, CEO and co-founder of Twist Bioscience, said: “DNA holds the promise of offering the magic three in storage: ultra-high density, reasonable cost, and sustainability. We expect that new media will be needed to address the $7 billion-plus of unmet storage demand projected in the years ahead, and we remain at the forefront of innovation in this market looking toward introducing the first commercial DNA data storage solution in the near future.”

Note that she does not mention access times.

The report

Report author John Monroe writes: “Our estimated maximum production and shipment capabilities of enterprise storage vendors will fall far short of any actual 2022–2030 demand that evolves at >25 percent-per-year growth rates.

“In our most-likely 35 percent demand growth scenario, with new shipments of 4.7 million petabytes in 2025 and 21 million petabytes in 2030, the industry in this revised forecast falls behind potential demand by ~85,988 petabytes in 2022 (this year). By 2030, the zone of potential insufficiency grows to 7.9 million petabytes. Our aggressive (but still feasible) 40 percent and 45 percent growth scenarios project zones of potential insufficiency growing to more than 15 million petabytes.

“It is becoming more and more obvious that more and more tape – as well as other forms of ultra-low-cost, massive-capacity enterprise storage technology, such as DNA data storage or new breeds of optical devices – will be needed.”

This is because data retention times are seemingly infinite: the three enterprise storage execs interviewed by Monroe, who look after 150PB-plus vaults, do not consider data deletion an enterprise strategy.

He suggests: “The preponderance of the historical evidence, coupled with deliberately conservative yet immense revenue forecasts, suggest that storage industry financial and business executives should immediately and generously fund the enhancement of old enterprise technologies and the creation of new enterprise technologies that can be deployed more cost-effectively at massive scale with minimal power consumption.”

The report first looks at disk drive, SSD, and tape shipments from 2010 to 2021 to work out annual change percentages and arrive at a compound annual growth rate: 30.5 percent in total petabytes and 41 percent in the active installed base (assuming a five-year refresh/replacement cycle).
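The two calculations can be sketched as follows. This is a minimal illustration under stated assumptions: `cagr` and `active_installed_base` are hypothetical names, and the installed base model simply assumes each year's shipments stay in service for five years and are then retired. The report's own model is not reproduced in the article. Note that 2010 to 2021 spans 11 compounding periods.

```python
from typing import Sequence


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two values `years` years apart."""
    return (last / first) ** (1 / years) - 1


def active_installed_base(shipments: Sequence[float], refresh_years: int = 5) -> list[float]:
    """Installed base for each year, assuming (hypothetically) that every year's
    shipments stay in service for `refresh_years` years and are then retired."""
    base = []
    for i in range(len(shipments)):
        start = max(0, i - refresh_years + 1)  # oldest shipment year still in service
        base.append(sum(shipments[start : i + 1]))
    return base


# Made-up flat shipments of 1 unit a year, purely to show the five-year window.
print(active_installed_base([1, 1, 1, 1, 1, 1, 1]))  # [1, 2, 3, 4, 5, 5, 5]
```

Because retired capacity drops out of the window, the installed base grows at a different rate from raw shipments, which is why the two growth figures in the report differ.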

Then he extrapolates forwards using 25, 35, and 45 percent growth rates for shipments and the active installed base, and produces a chart showing the result:
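The extrapolation is plain compound growth. As a sanity check on the 35 percent scenario quoted above, compounding 4.7 million petabytes of 2025 shipments for five years lands on roughly the 21 million petabytes forecast for 2030. The `project` helper is a hypothetical name used only for this sketch.

```python
def project(start: float, growth: float, years: int) -> float:
    """Value after compounding `start` at annual rate `growth` for `years` years."""
    return start * (1 + growth) ** years


# 35 percent scenario: 4.7 million PB shipped in 2025, compounded to 2030.
shipments_2030 = project(4.7, 0.35, 2030 - 2025)
print(f"{shipments_2030:.1f} million PB")  # 21.1 million PB
```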

[Chart from the report: projected shipments and active installed base under the growth scenarios]

Monroe then argues “anything greater than a 25 percent annual 2022–2030 expansion rate in potential demand might fall into a zone of potential insufficiency” because projected shipments fall short of the projected active installed base. In other words, supply falls behind demand.
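The threshold argument can be illustrated with a minimal sketch. It rests on simplifying, hypothetical assumptions: demand and maximum supply start at the same level, and maximum supply compounds at about 25 percent a year. Under these assumptions, any faster demand growth opens a gap that widens every year. The `shortfall` function and its parameters are illustrative, not the report's model; in the report itself the industry is already behind in 2022.

```python
def shortfall(start: float, demand_growth: float, capacity_growth: float, years: int) -> list[float]:
    """Yearly gap between demand and maximum supply when both start at `start`
    but compound at different rates; years where supply keeps up count as zero."""
    gaps = []
    for t in range(years + 1):
        demand = start * (1 + demand_growth) ** t
        capacity = start * (1 + capacity_growth) ** t
        gaps.append(max(0.0, demand - capacity))
    return gaps


# 2022 to 2030 is eight compounding periods.
print(shortfall(1.0, 0.25, 0.25, 8))  # all zeros: supply keeps pace
print(shortfall(1.0, 0.35, 0.25, 8))  # gap opens after year 0 and keeps growing
```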

The bulk of the shortfall concerns unstructured data needing to be archived, as Monroe makes clear: “We believe most of this enterprise data will be unstructured, ‘cold,’ infrequently accessed and will have to be maintained at minimal cost.”

Monroe expects the storage pyramid to change to reflect this, with 55 percent of total enterprise petabyte shipments falling into the cold data tier:

[Chart from the report: enterprise storage pyramid, with the cold data tier taking 55 percent of petabyte shipments]

At an average cost of $0.0015/GB in 2030, this could mean a $17.3 billion spend on such storage that year (assuming what he calls a reasonable 35 percent annual growth rate).
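The $17.3 billion figure is consistent with taking the 55 percent cold-tier share of the roughly 21 million petabytes shipped in 2030 under the 35 percent scenario and pricing it at $0.0015/GB. That derivation is an inference from the article's numbers, not a method the report spells out. The function name is hypothetical, and decimal units (1PB = 1 million GB) are assumed.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB


def cold_tier_spend(total_shipments_pb: float, cold_share: float, usd_per_gb: float) -> float:
    """Annual spend on the cold tier: shipped capacity x cold share x price per GB."""
    return total_shipments_pb * cold_share * GB_PER_PB * usd_per_gb


spend = cold_tier_spend(21e6, 0.55, 0.0015)
print(f"${spend / 1e9:.1f} billion")  # $17.3 billion
```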

Monroe identifies four candidate archival storage technologies:

  1. Tape, which is proven;
  2. Optical (Sony/Panasonic AD, Microsoft Project Silica, Folio Photonics), none of which are yet commercially viable, apart from AD, which is a fading format;
  3. Holography – but funding and research efforts have ceased, so it won’t be a market force in the foreseeable future;
  4. DNA storage.

He does not include high cell count NAND, such as PLC (5 bits/cell), as a potential archive medium.

About DNA storage, he writes: “The eventual space and power-saving attributes of DNA data storage are still largely unknown, but there is general agreement that they will be formidable with thousands of times the data density of current technologies packed into ‘time capsules’ half the size of a human finger.”

Such storage density is highly attractive, far beyond what optical or holographic storage offers, but there are serious packaging challenges to overcome.

Also: “Inordinate costs and lengthy times to synthesize (hours) and sequence (days) remain daunting challenges. It’s a virtual certainty that write times will remain comparatively (and frustratingly) slow, and although new nanopore sequencing technologies have been validated that might substantially reduce retrieval times, there are more doubts than hopes that data access times can be reduced to much less than several hours, but most developers believe the desired goal of $0.001 per gigabyte or less end-user cost for error-free, durable, uncontaminated, deep-but accessible archive data will be achieved.”

“The question is, when? Many companies have expressed their willingness to participate in early trials and most companies will consider using DNA data storage in some form. We expect to see a few commercial DNA data storage designs enter an initial phase of production during the next two or three years, but mass-market deployments will not occur until the 2026–2030 time frame.”

Overall, he concludes: “The datacenters of the future will need everything the SSD, HDD, and tape industries can manufacture and deliver, as well as requiring new DNA and optical and perhaps other enterprise storage technologies, to cost-effectively and reliably preserve the priceless artefacts of our personal, corporate, and cultural history.”

“Availability and sustainability challenges, combined with the costs of managing our multi-millionfold-petabyte dataverse over increasingly lengthy time periods, will create new use cases for old storage technologies and demand the creation of new, more cost-effective, and power-efficient storage technologies.”

DNA storage is a great big “maybe” and, until its read and write access speeds and $/GB cost reach reasonable numbers, it will remain a “maybe” technology while tape scoops the archive pool.

The report can be downloaded from Twist Bioscience.