The Rise of DNA Data Storage


With DNA data storage soon becoming a reality, will there be a day when we download files and store them as genetic information? What are the challenges and opportunities? Maya Raghunandan, PhD in Biochemistry and Molecular Biology and Kolabtree freelance scientist, provides an overview.

Every day, we use digital technology. We take pictures, chat, download documents, watch videos and so forth. A plethora of information is accessible at our fingertips because data is stored digitally, as arrays of 0s and 1s on numerous hard disk drives, silicon chips and magnetic tapes. The ongoing data explosion is already pushing the limits of storage capacity and is projected to outgrow the available infrastructure. By 2040, the data deluge may require 10-100 times more microchip-grade silicon than is available. This problem has led researchers worldwide to look for a breakthrough in alternative means of data storage.

DNA as a data storage tool: Pros

Fortunately, Mother Nature provides us with a solution in the form of DNA, the blueprint that encodes our genetic information. This idea gained traction in 2012, when researchers at Harvard encoded a 52,000-word book in thousands of DNA strands. Subsequently, many other researchers validated and further advanced the approach used for writing digital data to DNA and retrieving it. Prominent examples include researchers at Columbia University, the University of Illinois at Urbana-Champaign, the University of Washington, the European Bioinformatics Institute and ETH Zurich. In 2017, the US government announced potential interest in DNA storage research as part of the activities of its national security research branch. Many tech giants, including Microsoft, are showing a keen interest in pursuing DNA as the data storage bank of the future. In fact, Microsoft and the University of Washington collaborated to store “35 distinct files (over 200 MB of data), in more than 13 million DNA oligonucleotides”, all of which were recovered error-free. As the data storage crisis looms over us, DNA can offer a biological alternative for many reasons.

  • DNA is very stable and can last for hundreds of years
  • It is very compact and requires very little storage space
  • It is easy to replicate, which makes creating backups simple
  • It won’t become obsolete as a technology
  • It has an information storage density that surpasses any other technology. By one early estimate, 1 gram of DNA can store 700 terabytes of data. A few kilograms of DNA could store most of the world’s data right now.

Converting digital data to biological data

In principle, storing information in DNA is fundamentally similar to storing it in binary code, though the process differs slightly. Instead of the 0s and 1s of traditional methods, information in DNA is recorded as As, Ts, Gs and Cs (adenine, thymine, guanine and cytosine). To make this work, we assign a different two-bit binary code to each nucleotide. For example, 00 = A, 01 = C, 10 = T and 11 = G. A picture is normally coded as a series of 0s and 1s; such a series might start with 0011100100. If we break it into pairs (00 11 10 01 00), it translates to A-G-T-C-A. This sequence gives the order of nucleotides in a DNA strand. Once the sequence is determined, the DNA can be chemically synthesized, dried and stored in tiny vials shielded from light and humidity. Some estimates put the capacity of a single gram of DNA as high as 215 million gigabytes of data.
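The pair-by-pair translation described above can be sketched in a few lines of Python. This is only an illustration of the example mapping (00 = A, 01 = C, 10 = T, 11 = G); the function names `bits_to_dna` and `bytes_to_dna` are made up for this sketch, and real encoding schemes are considerably more elaborate.

```python
# Example mapping from the text: 00 = A, 01 = C, 10 = T, 11 = G.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s into nucleotide letters, two bits at a time."""
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only 0s and 1s")
    if len(bits) % 2 != 0:
        raise ValueError("input must contain an even number of bits")
    pairs = (bits[i:i + 2] for i in range(0, len(bits), 2))
    return "".join(BITS_TO_BASE[pair] for pair in pairs)


def bytes_to_dna(data: bytes) -> str:
    """Hypothetical helper: write each byte as 8 bits, then translate the bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits_to_dna(bits)


# The series from the text, split as 00 11 10 01 00.
print(bits_to_dna("0011100100"))  # AGTCA
```

Because each nucleotide stands for two bits, one byte of digital data becomes four letters under this mapping.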

DNA can be “read back” using DNA sequencing, an approach used routinely in laboratories worldwide. This process yields a file of letter sequences that can then be decoded back into 0s and 1s to recover the original digital data.
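As a rough sketch of that decoding step, the Python below reverses the example mapping used earlier in this article (A = 00, C = 01, T = 10, G = 11). The function names are hypothetical, and the sketch assumes the sequencer returns a clean string of letters, which real sequencing runs do not guarantee.

```python
# Reverse of the example mapping in the text: A = 00, C = 01, T = 10, G = 11.
BASE_TO_BITS = {"A": "00", "C": "01", "T": "10", "G": "11"}


def dna_to_bits(sequence: str) -> str:
    """Translate nucleotide letters back into a string of 0s and 1s."""
    bits = []
    for base in sequence.upper():
        if base not in BASE_TO_BITS:
            raise ValueError(f"unexpected base: {base!r}")
        bits.append(BASE_TO_BITS[base])
    return "".join(bits)


def dna_to_bytes(sequence: str) -> bytes:
    """Hypothetical helper: regroup the decoded bits into 8-bit bytes."""
    bits = dna_to_bits(sequence)
    if len(bits) % 8 != 0:
        raise ValueError("sequence does not decode to a whole number of bytes")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# The strand from the worked example decodes to the original series.
print(dna_to_bits("AGTCA"))  # 0011100100
```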

DNA as a data storage tool: Current hurdles

For DNA data storage to be a practical choice, the entire process of encoding, writing, reading and decoding information must be automated. The technologies currently in use are time-consuming and error-prone. Above all, the costs underlying these techniques need to come down drastically for DNA data storage to be economically viable.

Current DNA synthesis technologies cannot synthesize particularly long stretches of DNA; the limit is about 20 bytes of data per strand. Thus, the data needs to be broken down into chunks, marked to show where the breaks fall, and then converted to DNA format. The bigger hurdles lie in data retrieval. Reading a strand of DNA destroys it permanently, so keeping plenty of backup copies is essential. Additionally, current data retrieval systems require reading all of the data present in a storage vial. Meanwhile, new random-access techniques on the horizon promise to make individual file retrieval a real possibility.
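One simple way to picture the chunking step is sketched below in Python. It assumes the 20-byte-per-strand limit mentioned above and, purely for illustration, reserves the first 2 bytes of each chunk for a position number so that the pieces can be put back in order. That index layout and the function names are assumptions of this sketch, not a description of any published scheme.

```python
import random

STRAND_BYTES = 20  # approximate per-strand limit quoted in the text
INDEX_BYTES = 2    # hypothetical: bytes reserved for the chunk's position


def split_into_chunks(data: bytes) -> list[bytes]:
    """Cut data into strand-sized chunks, each prefixed with its position."""
    payload_size = STRAND_BYTES - INDEX_BYTES
    chunks = []
    for number, start in enumerate(range(0, len(data), payload_size)):
        index = number.to_bytes(INDEX_BYTES, "big")
        chunks.append(index + data[start:start + payload_size])
    return chunks


def reassemble(chunks: list[bytes]) -> bytes:
    """Sort chunks by their position prefix and join the payloads."""
    ordered = sorted(chunks, key=lambda c: int.from_bytes(c[:INDEX_BYTES], "big"))
    return b"".join(chunk[INDEX_BYTES:] for chunk in ordered)


message = b"DNA data storage splits files into short, indexed strands."
chunks = split_into_chunks(message)
random.shuffle(chunks)  # strands in a vial have no inherent order
assert reassemble(chunks) == message
```

The position marker uses up part of every strand, which is one reason short strands are costly: the shorter the strand, the larger the share of it spent on bookkeeping instead of data.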

The usability of DNA as a means of data storage will be dictated by advances in the biotechnology and genetic engineering industries. The rate at which DNA sequencing techniques have advanced suggests that DNA data storage may become a reality soon. As various technologies slowly turn to biomimicry, it is fitting that DNA, the fundamental storage molecule of life, should also serve as our data bank.



About Author

Maya Raghunandan obtained her Ph.D. in Biochemistry and Molecular Biology from the University of Minnesota, Twin Cities, USA. Currently, she is a cancer biology scientist at Université Catholique de Louvain, Brussels, Belgium. In her spare time, she writes about cool science discoveries in her jargon-free blog, because science doesn’t have to sound complicated. Instead, it must be comprehensible to everyone.
