DNA storage lab-on-a-chip technology from Seagate promises gumstick-sized DNA storage readers and writers that could speed up DNA storage IO by one, two or more orders of magnitude.
The Catalog DNA storage technology encodes binary data into DNA, the double-helix molecule built from four nucleobases. Catalog uses around 200 pre-synthesized DNA sequences, or oligonucleotides, each 30 to 40 base pairs long; these are analogous to the letters of an alphabet. They are joined together to form billions of words, the equivalent of bytes in IT storage terms. The resulting DNA is desiccated and stored in pellets, with an information density of 200PB/gram or more and an endurance in the region of 1,000 years.
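To make the "oligos as letters" analogy concrete, here is a minimal sketch that treats a library of 200 oligos as a 200-symbol alphabet and writes bytes as fixed-width base-200 "words". This is a hypothetical illustration, not Catalog's published encoding; only the alphabet size comes from the article, and the function names are invented for the example.

```python
import math

# Hypothetical: ~200 pre-synthesized oligos used as a 200-symbol alphabet.
ALPHABET_SIZE = 200
BITS_PER_OLIGO = math.log2(ALPHABET_SIZE)  # about 7.64 bits carried per oligo


def digits_needed(num_bytes: int) -> int:
    """Smallest number of base-200 digits (oligos) that can hold any num_bytes-byte value."""
    width = 0
    while ALPHABET_SIZE ** width < 256 ** num_bytes:
        width += 1
    return width


def encode(data: bytes) -> list[int]:
    """Map bytes to a fixed-width list of oligo indices, most significant first."""
    n = int.from_bytes(data, "big")
    indices = []
    for _ in range(digits_needed(len(data))):
        n, r = divmod(n, ALPHABET_SIZE)
        indices.append(r)
    return indices[::-1]


def decode(indices: list[int], num_bytes: int) -> bytes:
    """Rebuild the original bytes from oligo indices and the known byte count."""
    n = 0
    for d in indices:
        n = n * ALPHABET_SIZE + d
    return n.to_bytes(num_bytes, "big")
```

Because 200 is less than 256, a single oligo cannot hold a whole byte: four bytes need five oligos in this scheme, since 200**4 < 256**4 <= 200**5. The fixed width keeps leading zero bytes intact.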
Reading the data means rehydrating some of the DNA powder or flakes from a pellet and sequencing it to find its nucleobase contents, then recovering the binary data from that sequence. This involves a lot of fluid transfer and processing, and the work draws on existing microfluidics research. This envisages femtoliter-sized droplets of fluid, where a femtoliter is 10⁻¹⁵ liters: one quadrillionth (one million-billionth) of a liter.
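To put the femtoliter scale in perspective, the short sketch below converts it to more familiar units. It relies only on standard unit definitions: 1 fL equals one cubic micrometer, a 1 fL spherical droplet is roughly 1.24 µm across, and a microliter holds a billion such droplets. The constant and function names are illustrative.

```python
import math

FEMTOLITER_IN_LITERS = 1e-15
LITER_IN_CUBIC_METERS = 1e-3
MICROMETER_IN_METERS = 1e-6


def femtoliters_to_cubic_micrometers(fl: float) -> float:
    """Convert a volume in femtoliters to cubic micrometers."""
    cubic_meters = fl * FEMTOLITER_IN_LITERS * LITER_IN_CUBIC_METERS
    return cubic_meters / MICROMETER_IN_METERS ** 3


def sphere_diameter_um(volume_um3: float) -> float:
    """Diameter of a sphere with the given volume, from V = pi * d**3 / 6."""
    return (6 * volume_um3 / math.pi) ** (1 / 3)


# How many 1 fL droplets fit in one microliter (1e-6 liters).
droplets_per_microliter = 1e-6 / FEMTOLITER_IN_LITERS
```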
Blocks & Files was briefed by Seagate about its lab-on-a-chip research and its partnership with Catalog to develop DNA storage technology. We wanted to understand why disk drive manufacturer Seagate would be interested in DNA storage, as the technology is so utterly different from disk drives and sits at the far edge of scientific research.
Ed Gage, VP of Research, said Seagate crafts the data sphere that stores the world's digital data. That data sphere is growing to zettabyte levels, and DNA storage promises to store terabytes of data in very small amounts of fluid, providing storage capacity for exabytes to zettabytes of information.
A Microsoft paper published in December stated: “It would take millions of tape cartridges – the current densest commercial storage media – to store 9 zettabytes of information, whereas it would take the footprint of one small refrigerator if stored in DNA.”
That’s over 1 exabyte per cubic inch.
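The arithmetic behind that figure can be checked with a quick sketch. The Microsoft quote does not give a refrigerator volume, so the 50-liter figure below is a hypothetical assumption. With it, 9ZB works out to roughly 3EB per cubic inch, and the "over 1 exabyte" claim holds for any enclosure smaller than about 147 liters.

```python
CUBIC_INCH_IN_CM3 = 16.387064  # exact: (2.54 cm) ** 3
EXABYTE = 10 ** 18
ZETTABYTE = 10 ** 21


def exabytes_per_cubic_inch(total_bytes: float, volume_liters: float) -> float:
    """Average storage density in EB per cubic inch for a given total volume."""
    volume_in3 = volume_liters * 1000 / CUBIC_INCH_IN_CM3
    return total_bytes / volume_in3 / EXABYTE


def max_volume_liters(total_bytes: float, min_eb_per_in3: float) -> float:
    """Largest volume that still reaches the given density."""
    volume_in3 = total_bytes / (min_eb_per_in3 * EXABYTE)
    return volume_in3 * CUBIC_INCH_IN_CM3 / 1000


ASSUMED_FRIDGE_LITERS = 50  # hypothetical "small refrigerator" volume, not from the paper
density = exabytes_per_cubic_inch(9 * ZETTABYTE, ASSUMED_FRIDGE_LITERS)
break_even = max_volume_liters(9 * ZETTABYTE, 1.0)
```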
Gage talked about the initial vision: “I think the starting vision … looks something like a tape drive with very, very small cartridges that are probably desiccated so they’re probably not fluid at all. They’re dried and stored in a library. We have some other storage medium, probably an HDD, sitting in front of it that keeps track of all the metadata.”
The lab-on-a-chip is about the size of an M.2 2280 gumstick NAND drive (22mm wide by 80mm long), with Gage saying: "As we scale it up, it might get larger than that. But that's kind of the size today."
It will be used to create DNA sequences encoded with binary data, and the research is looking at ways to speed up both writing (creating the DNA sequences) and reading (sequencing the DNA).
Increasing write speed
Seagate biochemical engineer Gemma Mendonsa said: “We’re basically looking at ways to leverage chemistry to improve the write speeds, shortening the length of time it takes to put DNA pieces together, because right now it’s very slow … Over the write speeds today, we need several orders of magnitude improvement.”
The Catalog oligonucleotide concept provides scope for doing this. Gage said: “Instead of writing one nucleotide, we have a kind of a geometrical progression writing that we think greatly accelerates the speed.” He referred to a linker library concept that enables this: “Assembly using the linker libraries lets us build geometrically rather than one at a time.”
Mendonsa explained: "In traditional DNA synthesis chemistry … phosphoramidite chemistry, it takes about two to three minutes to add one nucleotide to a strand of DNA … What's going to be a lot faster is if you can fabricate a bunch of nucleotides or oligos, short pieces of DNA, that are all the same sequence and store those different sequences in some kind of library. Then pull them out when you need them, and piece them together in the right order. So if you can do that, you have DNA strands that are like length L, you can assemble 10 of them at once. So now you have a strand that's length 10 L, then you gather 10 of those strings together in a solution. So now you have a string that's 100 L."
If we gather 10 of these together, we have 1,000 L. Gage said this is not enough: "We need to get beyond that. We probably have to have another step or two."
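The geometric build-up Mendonsa and Gage describe can be sketched as follows. Each round joins a fixed number of strands (the fan-in), so length grows as L × fan_in^rounds, and the number of assembly rounds grows only logarithmically with target length. By contrast, nucleotide-at-a-time synthesis needs one coupling step per base. The 30-base starting length comes from the oligo sizes quoted above; the fan-in of 10 matches Mendonsa's example. The function names are illustrative, and the sketch counts steps only, not time per step.

```python
def assembled_length(base_len: int, fan_in: int, rounds: int) -> int:
    """Strand length after `rounds` of joining `fan_in` strands at a time."""
    return base_len * fan_in ** rounds


def rounds_needed(base_len: int, fan_in: int, target_len: int) -> int:
    """Fewest assembly rounds for strands of base_len to reach target_len."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 for the length to grow")
    rounds, length = 0, base_len
    while length < target_len:
        length *= fan_in
        rounds += 1
    return rounds
```

For example, `rounds_needed(30, 10, 30_000)` gives 3 rounds to reach 30,000 bases, where sequential synthesis would need 30,000 individual nucleotide additions.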
A Microsoft DNA storage paper projects that DNA electro-chemical array technology will enable "synthesis throughput to reach megabytes-per-second levels in a single write module." This would be a huge improvement on adding one nucleotide, worth at most two bits, every two to three minutes.
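A rough sketch of the size of that gap, under stated assumptions: one strand grown sequentially, 2 bits per nucleotide (the theoretical maximum, since real encodings carry less), and a target of 1MB/s taken as 10^6 bytes per second. It ignores the parallelism of arrays that grow many strands at once, so it overstates the gap for real synthesizers. Even so, the result of roughly eight to nine orders of magnitude shows why Mendonsa says several orders of magnitude are needed.

```python
import math


def sequential_bits_per_second(minutes_per_nucleotide: float,
                               bits_per_nucleotide: float = 2.0) -> float:
    """Write rate of one strand grown one nucleotide at a time."""
    return bits_per_nucleotide / (minutes_per_nucleotide * 60)


def orders_of_magnitude_gap(target_bytes_per_s: float,
                            minutes_per_nucleotide: float) -> float:
    """log10 of the ratio between a target rate and single-strand synthesis."""
    target_bits_per_s = target_bytes_per_s * 8
    return math.log10(target_bits_per_s
                      / sequential_bits_per_second(minutes_per_nucleotide))


gap_fast = orders_of_magnitude_gap(1e6, 2)  # 2 minutes per nucleotide
gap_slow = orders_of_magnitude_gap(1e6, 3)  # 3 minutes per nucleotide
```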
These DNA strands float around in femtoliter-sized droplets of fluid inside the lab-on-a-chip. Gage said: "We're actually funding work to help us with the droplet traffic problem. As you're routing all these droplets, they can't run into each other unless you want to mix them. It's really a great complicated traffic problem that we're working to address."
Does this mean the lab chip contains pipes? No, said Gage: “The droplets move on a grid of electrodes [and] the droplets are flying around at high speeds. But there still are many challenges to address to get to the speeds we need to be; something that can write a petabyte in a finite amount of time.”
Mendonsa said: “Any kind of liquid handling device can be very complex. It’s just a matter of putting it into a lab-on-a-chip instead of something that’s armed with liquid lines and dispensing nozzles and stuff like that.”
Reading and the tape cartridge analogy
Let's try to understand the reading process in terms of the tape cartridge analogy. A tape has a start point and an end point on a ribbon of tape that is hundreds of meters long, with parallel data tracks on it.
Let's take a few flakes of desiccated DNA from a pellet and rehydrate them. What we have is an amount of liquid with DNA strands floating around in it, which we need to sequence in order to read the data. If we put this back in tape cartridge terms, it's like snipping a tape ribbon hundreds of meters long into millions of pieces and throwing them into a bowl of water. Then we go along to a tape drive and say: "Hey, read this. And figure out, if you can, where the start of it is and the correct sequence of all the little snips." It sounds impossible.
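A common way around the "bowl of snips" problem in DNA storage research is to give every strand its own address: a short index written in bases at the start of the strand. The order can then be recovered after the strands are read in any sequence. The sketch below shows the idea, using a hypothetical 6-base index (4**6 = 4,096 fragments). It ignores sequencing errors, which real systems handle with error-correcting codes.

```python
import random

BASES = "ACGT"
INDEX_LEN = 6  # hypothetical address field: 4**6 = 4096 possible fragments


def index_to_bases(i: int) -> str:
    """Write a fragment number as a fixed-width base-4 string of nucleotides."""
    digits = []
    for _ in range(INDEX_LEN):
        i, r = divmod(i, 4)
        digits.append(BASES[r])
    return "".join(reversed(digits))


def bases_to_index(prefix: str) -> int:
    """Read a fragment number back from its nucleotide prefix."""
    n = 0
    for b in prefix:
        n = n * 4 + BASES.index(b)
    return n


def fragment(seq: str, chunk: int) -> list[str]:
    """Cut a long sequence into indexed strands (the 'snips')."""
    starts = range(0, len(seq), chunk)
    if len(starts) > 4 ** INDEX_LEN:
        raise ValueError("too many fragments for the index field")
    return [index_to_bases(k) + seq[pos:pos + chunk] for k, pos in enumerate(starts)]


def reassemble(strands: list[str]) -> str:
    """Sort strands by their address and strip it to recover the original."""
    ordered = sorted(strands, key=lambda s: bases_to_index(s[:INDEX_LEN]))
    return "".join(s[INDEX_LEN:] for s in ordered)


# Usage: fragment, shuffle (the bowl of water), then put the pieces back in order.
payload = "ACGT" * 50
pool = fragment(payload, 16)
random.Random(42).shuffle(pool)
restored = reassemble(pool)
```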
Not so, Mendonsa said: “There’s a fair amount of literature out there; Microsoft has some papers on this. Other universities do as well, on how to search data that’s stored in DNA and pull out the pieces you need.
“DNA is formed from two strands that hybridize together and they’re complementary to each other. So you can imagine, if you wanted to pull out something specific, like you wanted to search for some file with a specific search term, you could encode that into a piece of DNA. And it would hybridize to anything that had a match to it. Then you can pull it out that way. You don’t necessarily have to sequence the entire thing if you just want a few files. You can leverage the properties of DNA to pull out the files that you want.”
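The search-by-hybridization idea can be modeled with a simple sketch. A probe binds to a strand when the strand contains the probe's reverse complement (A pairs with T, C with G, and the two strands run in opposite directions). Filtering a pool of strands on that condition mimics pulling out one file without sequencing everything. This is an exact-match simplification: real hybridization tolerates some mismatches and depends on temperature and sequence, and the function names here are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The sequence that pairs with `seq` in a double helix."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, strand: str) -> bool:
    """Exact-match simplification: the probe binds if the strand carries its complement."""
    return reverse_complement(probe) in strand


def pull_out(pool: list[str], probe: str) -> list[str]:
    """Keep only the strands the probe would capture."""
    return [s for s in pool if hybridizes(probe, s)]
```

In this model a file tagged with `"ACGTTG"` would be retrieved with the probe `reverse_complement("ACGTTG")`, which is `"CAACGT"`.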
Seagate has a prototype lab-on-a-chip. The overall system architecture is still being developed. For example, should the desiccation be carried out on the lab chip or in a separate process? Then there is the potential for contamination. Once a component process is complete, the fluid involved can leave traces behind.
Mendonsa said: "Contamination is one of the things that we have to work really hard to avoid on the lab-on-a-chip, particularly since we don't want it to be a disposable device. That would be very expensive. We've explored different kinds of purification or cleansing solutions that we can use to prevent contamination, or clean up after we perform our reaction."
Picture a rackable DNA storage device. It would presumably have a warranted working life. That means the lab chips must last that long, and they must be loaded with enough chemicals to do their job over that time.
We need to think in terms of a multiple-year effort here. This is far-edge nanoscale, electrochemical array science, a long, long way from the semiconductor technology used to fabricate NAND and DRAM chips. Getting to grips with electrostatically moving femtoliter-sized droplets of fluid containing oligonucleotides between complex biochemical reactions is not something covered in Stanford University's computer science courses.
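The warranty constraint comes down to a reagent budget: droplets per operation × operations per day × days of warranted life × droplet volume. The sketch below shows that calculation. Every parameter value in the example is hypothetical, chosen only to show the formula; none comes from Seagate or Catalog.

```python
from dataclasses import dataclass


@dataclass
class ReagentPlan:
    droplet_volume_fl: float     # hypothetical volume of one reagent droplet, femtoliters
    droplets_per_operation: int  # hypothetical reagent droplets consumed per operation
    operations_per_day: float    # hypothetical workload
    warranty_years: float        # hypothetical warranted working life

    def total_volume_liters(self) -> float:
        """Reagent volume the chip must carry to last the warranty period."""
        days = self.warranty_years * 365
        droplets = self.droplets_per_operation * self.operations_per_day * days
        return droplets * self.droplet_volume_fl * 1e-15  # 1 fL = 1e-15 L


# Illustrative inputs only: 1 fL droplets, 10 per operation,
# a billion operations a day, for five years.
plan = ReagentPlan(droplet_volume_fl=1.0, droplets_per_operation=10,
                   operations_per_day=1e9, warranty_years=5)
volume = plan.total_volume_liters()
```

With these made-up inputs the chip would need about 18mL of each reagent. The point is the shape of the calculation: on-chip reagent capacity scales linearly with both workload and warranty length.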