One gram of DNA can hold up to a billion terabytes.
Storing data digitally is becoming more and more indispensable. We are not only getting accustomed to digital media but also creating digital copies of previously handwritten documents.
The amount of data in the world was estimated at 44 zettabytes (i.e., 44 billion terabytes) at the start of this year.
That puts the number of bytes in the digital universe at 40 times the number of stars in the observable universe.
No doubt we are creating data at an unprecedented rate. We created 16 billion terabytes (1 terabyte is 1,000 gigabytes) in 2018. As of May 2019, 500 hours of video were being uploaded to YouTube every minute.
Today, flash drives and hard disks are the norm for storing data. Even the data you upload to the “cloud” ends up in a data warehouse in some isolated corner of the world, and these data warehouses also use traditional hard drives to store information. Facebook alone has about 15 million square feet of data center space.
Each warehouse can store about 1 billion gigabytes (1 million terabytes). Storing the 16 billion terabytes created in 2018 alone would therefore take roughly 16,000 such warehouses.
Surely we want to preserve all the data we create. Yet hard drives usually last about five years before something breaks. Floppy disks were very prevalent in the 1980s, but barely 40 years later we have completely removed floppy disk readers from our devices. So even if a hard drive still works 20 years from now, we don’t know whether we will have anything to read it with.
This is where DNA data storage comes into play.
Stable Storage Method
DNA-based data storage is staggeringly stable. Under the right conditions, data stored in DNA can last almost indefinitely.
This was demonstrated in 2013, when scientists deciphered the genome of a horse that lived about 700,000 years ago.
DNA is without a shadow of a doubt much more stable than hard drives, which typically last a decade at most.
It is worth noting that DNA is made up of four components, namely adenine, thymine, cytosine, and guanine, referred to as A, T, C, and G. This is different from the way we store files on a computer, where everything is turned into binary 0s and 1s before being saved to a hard drive. Saving data in DNA therefore needs a new encoding, because we now have four values instead of two, which, to be fair, is not that difficult. In fact, one can argue that four distinct values are better: each base can carry two bits instead of one.
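To make the four-letter idea concrete, here is a minimal sketch in Python that maps every two bits to one base. The particular mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration, not a standard. Real schemes typically add constraints that this toy version ignores, such as avoiding long runs of the same base.

```python
# Illustrative mapping only: two bits per base (an assumption, not a standard).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Two bytes of text become eight bases, which is where the "two bits per base" figure comes from.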
Read and write errors are bound to happen when storing data on any medium. The DNA in our cells has ways of repairing itself, but we use synthetic DNA to store data. Therefore, we add redundancy and error-correcting codes to tolerate errors and data corruption. These error-correcting codes and encoding schemes are not a new invention limited to DNA storage. We have been using them for decades, for example to keep TV signals clean.
Data stored in DNA can easily last a few centuries at least. It is the oldest and most robust data storage medium.
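The role of redundancy can be illustrated with the simplest possible error-correcting code: write every base three times and take a majority vote when reading. This is only a toy sketch. The repetition scheme and the function names are assumptions for illustration, and practical DNA storage systems rely on far more efficient codes.

```python
from collections import Counter

REPEAT = 3  # each base is written this many times (illustrative choice)


def add_redundancy(strand: str) -> str:
    """Repeat every base REPEAT times, e.g. 'AC' -> 'AAACCC'."""
    return "".join(base * REPEAT for base in strand)


def correct(noisy: str) -> str:
    """Recover the original strand by majority vote over each group."""
    groups = (noisy[i:i + REPEAT] for i in range(0, len(noisy), REPEAT))
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)


stored = add_redundancy("CAGA")  # 'CCCAAAGGGAAA'
damaged = "CCCATAGGGAAA"         # one base misread: A -> T
print(correct(damaged))          # CAGA
```

This toy code only survives substitutions, and only one per group of three. An inserted or deleted base would shift every later group, which is one reason real systems need more sophisticated schemes.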
High Storage Density
Theoretically, we can store a billion terabytes in a single gram of DNA. In practice, however, researchers have so far managed about 215 petabytes of data per gram of DNA.
That is still very impressive: put into perspective, it means we can replace an entire data center with just a few grams of DNA. In fact, we can store all the world’s data in just one room using DNA.
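Here is a quick back-of-the-envelope check of these claims, using only figures quoted in this article: 44 zettabytes of data in the world, about 1 billion gigabytes per data warehouse, and 215 petabytes per gram in practice. The decimal unit definitions and the variable names are assumptions of this sketch.

```python
# Decimal (SI) storage units, expressed in bytes.
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21

PRACTICAL_DENSITY = 215 * PETABYTE  # bytes per gram, as quoted in the article
WORLD_DATA = 44 * ZETTABYTE         # bytes, as quoted in the article
WAREHOUSE = 1 * EXABYTE             # 1 billion gigabytes


def grams_needed(num_bytes: int) -> float:
    """Grams of DNA required at the practical density."""
    return num_bytes / PRACTICAL_DENSITY


print(f"One warehouse: {grams_needed(WAREHOUSE):.1f} g")
print(f"World's data:  {grams_needed(WORLD_DATA) / 1000:.0f} kg")
```

At these figures one warehouse comes to under five grams of DNA and the world's data to roughly 205 kilograms, which is consistent with the "few grams" and "one room" claims.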
Synthetic DNA is an attractive medium for storing data. DNA is so ultracompact that nearly every one of the roughly 37.2 trillion cells in our body contains a copy of our entire DNA. Additionally, DNA can hold gigantic volumes of data per gram: its theoretical density limit is about eight orders of magnitude higher than that of tape storage.
Moreover, recent advancements have made it possible to read billions of DNA sequences easily and in parallel.
Reliable, Future-proof Access
DNA is not a human invention. It is nature’s way of storing data and has been around since the beginning of life. Its properties haven’t changed in all that time, which means DNA readers, or “sequencers,” can read any DNA. The same sequencing machine can read your DNA as well as the DNA of the 700,000-year-old horse I mentioned earlier.
Storage media invented by us tend to become obsolete within a few decades. This is evident with floppy disks and DVDs: it is next to impossible to find a decent laptop that comes with a CD reader these days.
As Yaniv Erlich, a computer scientist at Columbia University, said,
“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete”
This is a crucial factor because the data we create today will become invaluable in the future. Hence it is important that we can still access and read the data we store today a century from now. DNA offers a kind of eternal relevance: as long as there is life, there will be a reason to read and manipulate DNA.
Final thoughts
DNA data storage indeed outshines human-made storage media. Not only can a single gram in theory hold 1 zettabyte of data, which equals roughly 71 million hard drives, but it also provides long-term reliability. DNA has an observed half-life of over 500 years in harsh environments.
The case for DNA-based storage seems undeniable, but it comes with its own set of limitations.
Right now, it costs about $3,500 to store one megabyte of data. This is one of the major drawbacks of using DNA. But we have to acknowledge that this is just the start. The first hard drive, made by IBM back in 1956, cost $2,000 per megabyte.
Another drawback is time. Not only does it cost $2,000 to read 2 megabytes of data, but the whole process of writing and reading can take anywhere from a few hours to days. Besides, because there is no established archiving scheme for DNA yet, storing data is much easier than reading it back.
Despite these shortcomings, DNA-based storage is the future. A growing number of scientists are working on making DNA data storage feasible. Just last year, Microsoft announced the world’s first fully automated DNA data storage system.
It is only a matter of time before DNA-based storage becomes the new norm.