Western Digital, Microsoft and all that DNA storage jazz

Western Digital, Microsoft, Twist Bioscience and Illumina have set up the DNA Data Storage Alliance to develop a commercial DNA archival storage ecosystem.

They will devise an industry roadmap, build use cases for various markets and industries, and promote and educate the larger storage community to advance DNA storage adoption.

The founder members say that 30 per cent of digital businesses will mandate DNA storage trials by 2024, addressing an exponential growth of data that threatens to overwhelm existing storage technology.

Stefan Hellmold, Western Digital VP for corporate initiatives, said there is an “unmet need for a new long-term archival storage medium that keeps up with the rate of digital data growth. We estimate that almost half of the data storage solutions shipped in 2030 will be used to archive data as the overall temperature of data is cooling down. We are committed to providing a full portfolio of storage solutions addressing the demand for hot, warm and cold storage.”

Dr. Emily Leproust, CEO and co-founder of Twist Bioscience, said in a press statement: “DNA is an incredible molecule that, by its very nature, provides ultra-high-density storage for thousands of years. By joining with other technology leaders to develop a common framework for commercial implementation, we drive a shared vision to build this new market solution for digital storage.”

Twist Bioscience provides DNA fragments and data writing capabilities. Illumina has DNA sequencing and genotyping technology. Microsoft has been involved in DNA storage research projects, and Western Digital thinks the field is interesting.

The four alliance founders claim DNA data storage has the potential to deliver a low-cost archival data storage technology. They say 10 full-length digital movies could fit into a volume the size of a single grain of salt.

DNA data storage encodes binary data (base 2 numbering scheme) into a 4-element coding scheme using the four DNA nucleotide bases: adenine (A), guanine (G), cytosine (C) and thymine (T). For example, 00 = A, 01 = C, 10 = G and 11 = T. This transformed data is encoded into short DNA fragments and packed inside some kind of container, such as a glass bead, for preservation. Such fragments are tiny and can theoretically last for an extraordinarily long time. They can be read back in a DNA sequencing operation.
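As a rough illustration of that two-bits-per-base mapping, here is a minimal Python sketch. The function names `encode` and `decode` are hypothetical, and the sketch only covers the example mapping given above; it is not the alliance members' actual encoding, and it leaves out the addressing and error correction a practical scheme would need.

```python
# Mapping taken from the example in the text: 00 = A, 01 = C, 10 = G, 11 = T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base (illustrative only)."""
    # Write every byte as 8 binary digits, then join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Walk the bit string two bits at a time and look up the matching base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse of encode: turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    # Regroup the bits into bytes, 8 bits at a time.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # four bases per byte
    print(decode(strand))  # round-trips to the original bytes
```

Each byte becomes four bases, so the two-byte input above yields an eight-base strand, and decoding that strand returns the original bytes.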

Twist and Microsoft have said that, theoretically, one gram of DNA can store almost a zettabyte of digital data – one trillion gigabytes. Fewer than twenty grams of DNA could store all the digital data in the world. 

That is hugely impressive but a tad misleading. The salt-grain-sized glass beads are in turn stored in cylindrical phials the size of spectacle cases.

The alliance founders also say DNA enables cost-effective and rapid duplication. That’s not rapid on the same timescale as SSD accesses. The Microsoft and University of Washington demo system had a write-to-read latency of approximately 21 hours for its 5-byte data payload. The researchers wrote: “While 5 bytes in 21 hours is not yet commercially viable, there is precedent for many orders of magnitude improvement in data storage.”
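To put the demo figures in perspective, here is a back-of-the-envelope Python sketch. It treats the 21-hour write-to-read latency as if it were a sustained transfer rate, which is a simplification, and the 1 MB/s comparison target is a purely hypothetical yardstick, not a figure from the researchers or the alliance.

```python
import math

SECONDS_PER_HOUR = 3600


def effective_rate_bytes_per_second(payload_bytes: float, latency_hours: float) -> float:
    """Payload size divided by end-to-end time (a simplification of the demo)."""
    return payload_bytes / (latency_hours * SECONDS_PER_HOUR)


def orders_of_magnitude_gap(current_rate: float, target_rate: float) -> float:
    """How many powers of ten separate the current rate from a target rate."""
    return math.log10(target_rate / current_rate)


if __name__ == "__main__":
    # Figures from the demo: 5 bytes, roughly 21 hours from write to read.
    demo_rate = effective_rate_bytes_per_second(5, 21)
    # Hypothetical yardstick only: 1 MB/s, chosen for illustration.
    target_rate = 1_000_000.0
    print(f"Demo rate: {demo_rate:.6f} bytes/s")
    print(f"Gap to target: {orders_of_magnitude_gap(demo_rate, target_rate):.1f} orders of magnitude")
```

On these assumptions the demo works out to roughly 0.000066 bytes per second, about ten orders of magnitude short of the hypothetical 1 MB/s mark.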

Dr. Karin Strauss, senior principal research manager at Microsoft, issued a quote: “We’re encouraged by the potential for more sustainable data storage with DNA and look forward to collaborating with others in the industry to explore early commercialisation of this technology.”

Ten other organisations have joined the alliance: Ansa Biotechnologies, CATALOG, The Claude Nobs Foundation, DNA Script, EPFL, ETH Zurich, imec, Iridia, Molecular Assemblies, and the Molecular Information Systems Lab at the University of Washington.

Claude Nobs? He created the Montreux Jazz Festival in 1967 and the foundation is investigating DNA storage of more than 14,000 tape reels of live jazz recordings.