Data Storage: The DNA Option

by Paul Gilster | Jan 28, 2013 | Autonomy and Robotics | 38 comments

One of the benefits of constantly proliferating information is that we’re getting better and better at storing lots of stuff in small spaces. I love the fact that when I travel, I can carry hundreds of books with me on my Kindle. To those who say you can only read one book at a time, I respond that I like having a choice of books always at hand, along with the ability to keep key reference sources in my briefcase. Try lugging Webster’s 3rd New International Dictionary around with you and you’ll see why putting it on a Palm III was so delightful about a decade ago. There is, alas, no Kindle or Nook version.

Did I say information was proliferating? Dave Turek, a designer of supercomputers for IBM (the chess computer Deep Blue, which defeated world champion Garry Kasparov, is among his creations), wrote last May that from the beginning of recorded time until 2003, humans had created five billion gigabytes of information (five exabytes). In 2011, that amount of information was being created every two days. Turek’s article says that by 2013, IBM expects that interval to shrink to every ten minutes, which calls for new computing designs that can handle data density of all but unfathomable proportions.
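
To put Turek’s figures in more familiar terms, here is a quick Python conversion into average data-creation rates. It is only arithmetic on the numbers quoted above, assuming decimal units (1 exabyte = 10^18 bytes) and taking “every two days” and “every ten minutes” literally; the variable names are illustrative.

```python
# Convert Turek's figures into implied average data-creation rates.
# Assumption: decimal SI units, so 1 EB = 10**18 bytes and 1 GB = 10**9 bytes.
EXABYTE = 10**18
GIGABYTE = 10**9

historical_total = 5 * 10**9 * GIGABYTE   # "five billion gigabytes"
assert historical_total == 5 * EXABYTE    # ...which is indeed five exabytes


def bytes_per_second(total_bytes, seconds):
    """Average rate needed to produce total_bytes in the given time."""
    return total_bytes / seconds


rate_2011 = bytes_per_second(historical_total, 2 * 24 * 3600)  # every two days
rate_2013 = bytes_per_second(historical_total, 10 * 60)        # every ten minutes

print(f"2011: {rate_2011 / 10**12:.1f} TB/s")   # terabytes per second
print(f"2013: {rate_2013 / 10**15:.1f} PB/s")   # petabytes per second
print(f"speed-up: {rate_2013 / rate_2011:.0f}x")
```

Run as-is, it reports roughly 28.9 TB/s for 2011, 8.3 PB/s for the projected 2013 rate, and a 288-fold increase, which is simply the ratio of two days to ten minutes.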

A recent post on Smithsonian.com’s Innovations blog captures the essence of what’s happening:

But how is this possible? How did data become such digital kudzu? Put simply, every time your cell phone sends out its GPS location, every time you buy something online, every time you click the Like button on Facebook, you’re putting another digital message in a bottle. And now the oceans are pretty much covered with them.

And that’s only part of the story. Text messages, customer records, ATM transactions, security camera images…the list goes on and on. The buzzword to describe this is “Big Data,” though that hardly does justice to the scale of the monster we’ve created.

The article rightly notes that our ability to process information hasn’t begun to catch up with our ability to capture it, which is why, for example, so much fertile ground for exploration can be found inside the data sets from astronomical surveys and other projects that have been making observations faster than scientists can analyze them. Learning how to work through gigantic databases is the premise of Google’s BigQuery software, which is designed to comb terabytes of information in seconds. Even so, the challenge is immense. Consider that the algorithms used by the Kepler team, sharp as they are, have been usefully supplemented by human volunteers working with the Planet Hunters project, who sometimes see things that computers do not.

But as we work to draw value out of the data influx, we’re also finding ways to translate data into even denser media, a prerequisite for future deep space probes that will, we hope, be gathering information at faster clips than ever before. Consider work at the European Bioinformatics Institute in the UK, where researchers Nick Goldman and Ewan Birney have managed to code Shakespeare’s 154 sonnets into DNA, in which form a single sonnet weighs 0.3 millionths of a millionth of a gram (about 0.3 picograms). You can read about this in “Shakespeare and Martin Luther King demonstrate potential of DNA storage,” an article on their paper in Nature that just ran in The Guardian.

Image: Coding the Bard into DNA makes for intriguing data storage prospects. This portrait, possibly by John Taylor and now on display at the National Portrait Gallery in London, is one of the few images we have of the playwright.

Goldman and Birney are talking about DNA as an alternative to spinning hard disks and newer methods of solid-state storage. Their work is given punch by the calculation that a gram of DNA could hold as much information as more than a million CDs. Here’s how The Guardian describes their method:

The scientists developed a code that used the four molecular letters or “bases” of genetic material – known as G, T, C and A – to store information.

Digital files store data as strings of 1s and 0s. The Cambridge team’s code turns every block of eight numbers in a digital code into five letters of DNA. For example, the eight digit binary code for the letter “T” becomes TAGAT. To store words, the scientists simply run the strands of five DNA letters together. So the first word in “Thou art more lovely and more temperate” from Shakespeare’s sonnet 18, becomes TAGATGTGTACAGACTACGC.
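
The Guardian’s summary simplifies things. As we understand the Nature paper, the data are first written in base 3, and each ternary digit (trit) then picks one of the three bases that differ from the previous base, so no letter ever repeats; runs of a single base are error-prone to synthesize and sequence. The Python sketch below illustrates only that rotating idea. The fixed six trits per byte and the rotation table `NEXT_BASE` are our assumptions, not the paper’s: the paper uses a variable-length Huffman code (hence five letters for “T”), so this sketch will not reproduce TAGAT.

```python
# Sketch of a "rotating" base-3 to DNA code: each trit selects one of the
# three bases that differ from the previous base, so the output never has
# the same letter twice in a row.
# Assumptions (hypothetical, not from the paper): fixed 6 trits per byte and
# this particular rotation table.
NEXT_BASE = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so any byte fits


def byte_to_trits(value):
    """Most-significant-first base-3 digits of one byte, padded to 6 trits."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def encode(text, start="A"):
    """Encode UTF-8 text as a DNA string with no repeated adjacent bases."""
    prev, bases = start, []
    for byte in text.encode("utf-8"):
        for trit in byte_to_trits(byte):
            prev = NEXT_BASE[prev][trit]
            bases.append(prev)
    return "".join(bases)


def decode(dna, start="A"):
    """Invert encode(): recover each trit from the (previous, current) base pair."""
    prev, trits = start, []
    for base in dna:
        trits.append(NEXT_BASE[prev].index(base))
        prev = base
    data = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        data.append(value)
    return data.decode("utf-8")


strand = encode("Thou")
print(strand, decode(strand))
```

Decoding has to know the agreed starting base; because each base depends on the one before it, a single misread base also garbles the trit that follows, which is one reason real schemes add redundancy on top.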

The converted sonnets, along with DNA codings of Martin Luther King’s ‘I Have a Dream’ speech and the famous double helix paper by Francis Crick and James Watson, were sent to Agilent, a US firm that makes physical strands of DNA for researchers. The test tube Goldman and Birney got back held just a speck of DNA, but by running it through a gene sequencing machine, the researchers were able to read the files again. This parallels work by George Church (Harvard University), who last year preserved his own book Regenesis via DNA storage.

The differences between DNA and conventional storage are striking. From the paper in Nature (thanks to Eric Davis for passing along a copy):

The DNA-based storage medium has different properties from traditional tape- or disk-based storage. As DNA is the basis of life on Earth, methods for manipulating, storing and reading it will remain the subject of continual technological innovation. As with any storage system, a large-scale DNA archive would need stable DNA management and physical indexing of depositions. But whereas current digital schemes for archiving require active and continuing maintenance and regular transferring between storage media, the DNA-based storage medium requires no active maintenance other than a cold, dry and dark environment (such as the Global Crop Diversity Trust’s Svalbard Global Seed Vault, which has no permanent on-site staff) yet remains viable for thousands of years even by conservative estimates.

The paper goes on to describe DNA as ‘an excellent medium for the creation of copies of any archive for transportation, sharing or security.’ The problem today is the high cost of DNA production, but the trends are moving in the right direction. Couple this with DNA’s incredible storage possibilities — one of the Harvard researchers working with George Church estimates that the total of the world’s information could one day be stored in about four grams of the stuff — and you have a storage medium that could handle vast data-gathering projects like those that will spring from the next generation of telescope technology both here on Earth and aboard space platforms.

The paper is Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,” Nature, published online 23 January 2013.

38 Comments

Greg on January 28, 2013 at 12:04

DNA still has problems with pyrimidine dimers, nucleotide errors due to radiation damage. In a live organism there are enzymes that constantly repair them. In a data system like this, some method will need to be devised to prevent the data from being lost over time. I’m guessing ultimately the data may be incorporated into live organisms to use their photolyase enzymes to prevent the damage.
Michael on January 28, 2013 at 13:12

The storage density of DNA is quite good, but reading the information back is a real issue.
ljk on January 28, 2013 at 13:25

This online article from 1992 references and briefly describes two papers from 1979 and 1986 about ETI sending messages via DNA through interstellar space (near the end of the piece):

http://www.coseti.org/lemarch1.htm

Joe Davis has been advocating and sending messages and art via DNA for a while now:

http://www.viewingspace.com/genetics_culture/pages_genetics_culture/gc_w03/davis_j_webarchive/davis_profile_sciam/jd.htm

Davis also led the 2009 effort to send the DNA sequence of the RuBisCo protein into the galaxy via the Arecibo radio telescope:

https://centauri-dreams.org/?p=10283

Part 2 here:

https://centauri-dreams.org/?p=10346
nullzero on January 28, 2013 at 13:52

“until 2003, humans had created five billion gigabytes of information (five exabytes). In 2011, that amount of information was being created every two days”

This may be true, but it does not account for the value of the information. The data created by a billion downloads of Gangnam Style far exceeds that of the Rosetta Stone, yet there is no question which one is more important.
Thomas Mazanec on January 28, 2013 at 15:36

K. Eric Drexler, in “Engines of Creation,” described a carbon chain with side atoms that had considerably greater storage density than DNA.
Over two years ago, my Kindle came with the Oxford English Dictionary.
Paul Gilster on January 28, 2013 at 16:07

Thomas, re the Kindle, would that it did come with the Oxford English Dictionary! The dictionary that is standard in the UK is the Oxford Dictionary of English (in the States, it’s the New Oxford American Dictionary). Neither, alas, is the OED, which is a fabulous but huge, multi-volume set, and if I could get it in workable form on a mobile device, I’d be delighted. In its absence, though, the Kindle’s Oxford Dictionary of English does very well.
Mark on January 28, 2013 at 16:31

Data storage in DNA — I knew it would come to this someday yet I never imagined it would happen during my lifetime. I would have bet my money on holographic optical storage being the data storage medium of the future. Think of those transparent optical bars inside the fictional HAL 9000. Maybe they weren’t optical at all, but filled with millions of wiggly DNA strands.
Gerry on January 28, 2013 at 17:11

Sorry for being a sci-fi nerd here, but this reminded me of Frank Herbert’s “Dune” books. A major theme therein is the storage and retrieval of human “ancestral memories” through some kind of not-too-clearly explained hereditary mechanism. Maybe the idea’s not so far-fetched after all…

On a more serious note, with this kind of storage medium, what kind of read-write access times would we be looking at? I’d imagine this would be primarily useful as a long-term storage mechanism, not a quickly and routinely accessible form of memory for frequent use.
Mike Prather on January 28, 2013 at 17:22

Playing devil’s advocate, I have to wonder about the remote but real chance that incorporating DNA-stored data into living organisms for maintenance might eventually produce a sequence that codes for a protein or virus deadly to life on our planet. A few seconds of video from the old Lawrence Welk show could spell disaster!

I’d say that a more important task would be designing systems that compress data in much the same way as our brains do, by editing out unnecessary details and representing them with simpler markers that can be used to reconstruct the original event from similar data. For example, if you pass a pretty red flower on your way to work, you don’t necessarily capture and store a video or even detailed images of that flower, except in maybe a very granular format. You recall it by building up part of your visual memory with other information about the same kind of flower. This might also lead to breakthroughs in artificial intelligence, since I believe that this editing ability is what gives us our particular form of consciousness: we perceive ourselves in the present moment because of that difference in “freshness” between the detail just being recorded and what’s edited and stored in memory. Granted, this does make human memory notoriously unreliable, but I think we have the potential to improve upon that by designing better data editing and reconstruction routines than those that are built into our brains.
Mike Lockmoore on January 28, 2013 at 18:32

Several years ago I imagined that espionage “bugs” could literally be bugs/spiders whose sensory inputs get directly coded into DNA or RNA sequences in real time via some kind of enzyme similar to reverse transcriptase. To avoid generating dangerous strings, non-protein-coding base sequences could be used. In other words, the recordings would be just “junk” DNA relative to the host organism’s biology, but arranged in a way that would be meaningful when reading out and reconstructing the original sensory information.

A “fly on the wall” might literally be useful to record a conversation… a spider could quietly watch and record a visual scene, and even move to a better vantage point as needed… a moth might sample and record scents of trace substances. A host of bio-punk spy thriller ideas came to me. Recovery of the host organism and read-out of the DNA sequence might not be as simple and easy as with today’s electronic bugs, I suppose, and could make for some interesting, if not funny, plot elements in a novel or film.

But I also see that a non-DNA use of organic molecules, as pointed out by Thomas Mazanec above, would be more efficient in terms of space, energy, and accuracy, and perhaps better for information transmission during space exploration or METI (message in a bottle).

Hasn’t some author (Sagan? Adams?) suggested looking for something obviously intelligent like hundreds of digits of PI encoded and embedded in the most ancient parts of our genomes? The creator’s signature, as it were.
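
One crude way to check that “junk” recordings like those Mike Lockmoore describes contain no obvious protein-coding stretch is to scan both strands for open reading frames: a start codon (ATG) followed by a long run of codons before a stop. The Python sketch below assumes the standard genetic code’s stop codons and an arbitrary minimum length (`min_codons` is a hypothetical threshold). It is only an illustration; it ignores alternative start codons, regulatory elements and much else, and is no substitute for a real biosafety screen.

```python
# Crude open-reading-frame (ORF) scan on both strands of a DNA sequence.
# Assumptions: standard-code stop codons; min_codons is a hypothetical cutoff.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start_index, codon_count) for each ORF found.

    start_index is measured along whichever strand was scanned, and ORFs
    that run off the end of the sequence without a stop codon are ignored.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = (i - start) // 3  # codons before the stop
                    if length >= min_codons:
                        hits.append((strand, frame, start, length))
                    start = None
    return hits


candidate = "ATG" + "GCT" * 40 + "TAA"
print(find_orfs(candidate))
```

An encoder could run a check like this on each candidate strand and re-encode (for example with a different starting base) whenever it reports a hit.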
Michael Spencer on January 29, 2013 at 8:32

I’ll never forget the day I brought home my OED. I was in grad school and believe me the money was hard to find, but I bought the two-volume set with the slide out tray holding the hand lens. I’ve used those books constantly in the past thirty years or so. Now, though, I use the digital version, which I note is actually available for Kindle (and at a significantly lower price than I paid).
Paul Gilster on January 29, 2013 at 10:11

Michael, are you sure it’s the OED you’re seeing for the Kindle? I think you’re talking about the Oxford Dictionary of English. If the true OED is available for the Kindle, I’ll get one immediately.
Eniac on January 29, 2013 at 22:39

The ultimate in data storage is not DNA, it is crystals. By alternating different atoms of similar chemistry in a solid crystal, extremely dense information storage is theoretically possible. DNA is a wonder of nature, but like most biological solutions, it is soft, squishy and perishable and can be much improved on. Even today’s flash chips can store the human genome many times over, and are much easier to read and write.
Eniac on January 29, 2013 at 22:40

The OED is a twenty-volume set, so it would not fit Michael Spencer’s description. He must have the ODE…
jkittle on January 30, 2013 at 20:58

The interesting thing about DNA is that it may be possible to encode info into living organisms. It is possible to encode a functional protein and also encode a digital message in the same sequence. This is because each protein is encoded with a redundant code; thus for a protein that is only 100 amino acids long (a very small protein indeed), there are more possible ways to encode THE EXACT SAME amino acid sequence than there are electrons in the universe.
Thus a living, replicating organism could also hide messages in its code (if suitably designed), and a liter of diverse bacterial culture could encode a lot of data. Pure DNA offers even higher information density. As I posted a couple of weeks ago, this may represent a way to send data back from a deep space probe, where the distance or the need to maintain a private datastream may make other forms of communication difficult.
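
The redundancy jkittle describes can be counted: the number of DNA sequences that translate to a given protein is the product of the number of codons available for each residue. The Python sketch below uses the codon counts of the standard genetic code and deliberately ignores real-world constraints such as codon usage bias, which would shrink the usable space; the function names are illustrative.

```python
import math

# Number of codons for each amino acid in the standard genetic code
# (61 sense codons in total; the three stop codons are excluded).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_encodings(protein):
    """Count coding sequences (stop codon excluded) that translate to this protein."""
    count = 1
    for residue in protein:
        count *= DEGENERACY[residue]
    return count


def hidden_bits(protein):
    """Bits a message could carry through codon choice alone, if every choice is free."""
    return sum(math.log2(DEGENERACY[r]) for r in protein)


print(synonymous_encodings("MAL"))               # 1 * 4 * 6 = 24
print(f"{synonymous_encodings('L' * 100):.2e}")  # best case: every residue six-fold
```

Under this simple model a 100-residue protein has at most 6^100, about 6.5 × 10^77, synonymous encodings (the case where every residue is six-fold degenerate). That is below the commonly quoted figure of roughly 10^80 particles in the observable universe, so the comparison in the comment needs somewhat longer proteins, though the broader point about a vast hidden capacity stands.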
Joe Davis on January 30, 2013 at 21:23

Umm… Are you talking perpetual flash drives?
Joe Davis on January 30, 2013 at 21:32

First publication here:

http://www.jstor.org/discover/10.2307/777811?uid=2129&uid=2&uid=70&uid=4&sid=21101732349967
Joe Davis on January 30, 2013 at 22:01

…some of the same problems (mononucleotide repeats and overlaps to account for errors) confronted with DNA Supercode in the late 1990s and DNA Manifolds a few years later. The vision is for biochemically friendly and biologically silent genetic manipulation. An environmentally non-interfering, ecologically benign and permanently stable biological archive is the Holy Grail: something you don’t actually have to have humans around to make flashdrives for.
Eniac on January 30, 2013 at 22:38

If you have a data plan with your tablet, you could always get the on-line access: http://www.oed.com/. Lighter than 20 volumes, and always up to date, too.
Thomas Hackney on January 31, 2013 at 6:21

As a storage model, DNA is a wonderfully capacious thing, but I have to wonder, or worry, about the future implications and potential of such a paradigm. Is research into DNA storage being funded because of this potential? To what use would this line of research (and development) be put? Good use or bad use? If history is any indication, I think one can assume bad use, or at least some pretty creepy use. But does the prospect of this ever concern your basic scientist? “Hey, it’s very interesting stuff, and it pays.” (Answer: No, or rarely.)

Oh, the “good uses” will be touted to no end, no doubt. But then there will be the “bad uses.” Dr. Moreau comes to mind.

So what are these rich funders into, exactly? Eugenics, totalitarian control? Naa.

Basic research is great
Paul Gilster on January 31, 2013 at 11:14

Eniac writes:

If you have a data plan with your tablet, you could always get the on-line access: http://www.oed.com/. Lighter than 20 volumes, and always up to date, too.

Good point, and probably the best solution. I do love the print version but even the miniaturized 2-volume version is cumbersome to consult.
ljk on January 31, 2013 at 15:07

Mike Lockmoore said on January 28, 2013 at 18:32:

“Hasn’t some author (Sagan? Adams?) suggested looking for something obviously intelligent like hundreds of digits of PI encoded and embedded in the most ancient parts of our genomes? The creator’s signature, as it were.”

Carl Sagan had messages from the creators of the Universe embedded in the value of pi in the 1985 SF novel version of Contact (apparently that was considered a bit too much for the 1997 film version).

http://kasmana.people.cofc.edu/MATHFICT/mf55-spoiler.html

http://kasmana.people.cofc.edu/MATHFICT/mfview.php?callnumber=mf55

In The Hitch-Hiker’s Guide to the Galaxy, Douglas Adams said that since humans were part of the matrix that made the supercomputer known as Earth (to figure out the Question to Life, the Universe, and Everything, since the Answer was 42), the code for deriving the Question was in all of us.

Adams then added that every time someone figured out the Answer to Life, the Universe, and Everything, the Question would change. :^)

In the Star Trek: The Next Generation episode titled “The Chase”, we find out why most of the intelligent species in the galaxy are humanoid:

http://en.memory-alpha.org/wiki/The_Chase_(episode)
ljk on January 31, 2013 at 15:09

The ST:TNG episode I mention above was inspired by Contact, though in their case the message from the progenitor species was hidden in DNA scattered all over the galaxy.
ljk on January 31, 2013 at 15:16

http://www.scientificamerican.com/article.cfm?id=data-saved-quartz-glass-might-last-300-million-years

Data Saved In Quartz Glass Might Last 300 Million Years

By Timothy Hornyak

Most cultural institutions and research laboratories still rely on magnetic tape to archive their collections. Hitachi recently announced that it has developed a medium that can outlast not only this old-school format but also CDs, DVDs, hard drives and MP3s.

The electronics giant partnered with Kyoto University’s Kiyotaka Miura to develop “semiperpetual” slivers of quartz glass that Hitachi says can preserve information for hundreds of millions of years with virtually no degradation.

The prototype is made of a square of quartz two centimeters wide and two millimeters thick. It houses four layers of dots that are created with a femtosecond laser, which produces extremely short pulses of light. The dots represent information in binary form, a standard that should be comprehensible even in the distant future and can be read with a basic optical microscope. Because the layers are embedded, surface erosion would not affect them.

The medium has a storage density slightly better than that of a CD. Additional layers could be added, which would increase the density. But the medium is more remarkable for its durability. It is waterproof and resistant to chemicals and weathering, and it was undamaged when exposed to 1,000-degree heat for two hours in a test. The results of that experiment led Hitachi to conclude that the quartz data could last hundreds of eons.

“If both readers and writers can be produced at a reasonable price, this has the potential to greatly change archival storage systems,” says Ethan Miller, director for the Center for Research in Intelligent Storage at the University of California, Santa Cruz. The medium could be ideal for safekeeping a civilization’s most vital information, museum holdings or sacred texts. The question is whether the world as we know it would even last that long. “Pangaea broke up less than several hundred million years ago,” Miller adds. “Many quartz-based rocks from that time are now sand on our beaches—how would this quartz medium fare any differently?”

This article was originally published with the title Super Long-Term Storage.
Eniac on January 31, 2013 at 23:00

Joe Davis:

Umm… Are you talking perpetual flash drives?

No more than there is perpetual DNA.

An environmentally non-interfering, ecologically benign and permanently stable biological archive is the Holy Grail

Holy grail indeed, and even harder to get hold of. If you wanted to embed a truly permanent message of even minimal size, you would have to engineer organisms with an error correction mechanism well beyond that achieved by nature in billions of years of trying. And you would also have to make this organism a super-bug able to successfully compete with existing organisms, in perpetuity. In effect, that means all non-engineered organisms would have to eventually become extinct. That would be very difficult to reconcile with the “biologically benign” part of your grail.
Joe Davis on February 1, 2013 at 16:35

Eniac:

You’re right about the need to develop reliable means to create biologically stable DNA archives. It’s a problem I’ve been working on for many years and I have made some modest progress. But please note that we’re really talking about genes more than about whole organisms. We would write our archives under genes like “rubisco” (ribulose 1,5 bisphosphate carboxylase oxygenase), essential for all life, and certainly not restricted to particular organisms. This one has been around for geologic time despite the coming and going of individual species. Flashdrives don’t possess in situ memory-repair apparatus (like living organisms actually already do), cannot inexpensively reproduce themselves in astronomical numbers of copies, and cannot survive environmental extremes to match the known extent of biological survivability. That said, I think it might be a good idea to create extremophile flash drives. Imagine one that could survive the cores of nuclear reactors, powerful magnetic fields, and temperatures hot enough to melt acrylic plastics. I want one.
Joe Davis on February 1, 2013 at 16:50

ljk:

It so happens that in the late 1980s a Star Trek writer/producer(?) came to visit my studio, then at the MIT Center for Advanced Visual Studies. We talked about the implications of bacterial carriers like “Microvenus” as biological messages for ETI.
Mark on February 1, 2013 at 19:07

Would it be possible to encode a unique identification number into the DNA of each individual before birth, so that the solving of crimes would be made utterly simple? Say investigators find a hair at some future crime scene, the detective drops the hair in a handheld analyzer, and it reads out the perpetrator’s social security number. All they have to do at that point is go round him up. Trials would be speedy, and executions would not need to drag out on appeals.

It’s not certain if I’m describing a utopian or dystopian future here…
Eniac on February 1, 2013 at 23:19

Mark: Of course this is feasible. You do not even need to do any genetic engineering. Our DNA already is a unique identification number. One simple mandatory test at birth (CODIS would be quite sufficient) and a central database where the result is linked to identity is all that’s needed.

Not that I am advocating this, mind you….
Eniac on February 1, 2013 at 23:40

We would write our archives under genes like “rubisco” (ribulose 1,5 bisphosphate carboxylase oxygenase), essential for all life, and certainly not restricted to particular organisms. This one has been around for geologic time despite the coming and going of individual species.

I assume by “write under” you mean genetic changes that do not change the translated sequence of the protein. Alas, such neutral changes are very volatile under mutation, on geological timescales, because they are not constrained by natural selection. RuBisCo is so well conserved because of functional constraints. Any information encoded in it would by definition not be functional, and thus not be conserved.

What you would have to do is come up with a way where the same sequence occurs in many replicates, and when errors occur the organism either fixes them or discontinues the strain by constant consensus comparison. This is well beyond the information processing capabilities of current biological systems. The closest nature has come is with the advent of Eukaryotes, which have duplicate genomes and a proofreading mechanism. This is not enough, however, as mutation rates are still substantial in Eukaryotes. Even if you could do a much better job of it artificially, the change would be counter-adaptive and your organism (or protein) would likely not be able to outcompete its unburdened cousins for any significant amount of time.

That said, a small amount of information could be encoded in stable “conventions”, such as the genetic code. The genetic code is both arbitrary and unique, so by choosing it a particular way you could probably spell out a small word of information. Not much, but good for a few billion years….
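
The “constant consensus comparison” Eniac mentions is, in its simplest form, a majority vote across replicate copies. The Python sketch below is a generic illustration of that idea, not a model of any biological mechanism: it assumes the copies are already aligned and of equal length, and it only flags positions where no strict majority exists.

```python
from collections import Counter


def consensus(replicates):
    """Per-position majority vote across equal-length copies of a sequence.

    Returns the consensus string plus the positions with no strict majority
    (ties or too many errors), which a real system would need to flag.
    """
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("need at least one replicate, all of equal length")
    result, uncertain = [], []
    for pos, column in enumerate(zip(*replicates)):
        base, votes = Counter(column).most_common(1)[0]
        if votes * 2 <= len(replicates):  # no strict majority at this position
            uncertain.append(pos)
        result.append(base)
    return "".join(result), uncertain


copies = ["GATTACA", "GATTACA", "GCTTACA", "GATTAGA", "GATTACA"]
print(consensus(copies))  # ('GATTACA', [])
```

With five copies, any position survives up to two independent errors. The catch Eniac points out is that something has to keep performing this comparison, and keep discarding bad copies, for as long as the archive is meant to last.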
Eniac on February 2, 2013 at 0:24

Mark: Come to think of it, newborns routinely have their foot pricked and blood dripped on a paper card for a number of important health tests. Are we really sure that a punch or two of those cards are not already going to some top secret unit of Homeland Security to be typed, cross-referenced, and archived?
Eniac on February 2, 2013 at 0:45

Joe Davis: microRNA sequences are very conserved, because a microRNA is coded in one part of the genome and its function is dependent on a complementary match in other parts of the genome (the “targets”). In effect, then, microRNA sequences are present in multiple copies and proof-read by selection. I.e., if a base in the microRNA gene changes, it won’t fit the target site and the organism will be compromised. Same if a base in the target sequence changes. Only a simultaneous change of multiple bases, in the gene and each of its targets, will preserve function of a microRNA.

Changing the genetic code is a tall order, but taking an organism and changing its microRNA sequences in synchrony with their targets so that the matches are preserved, but the sequences spell something interesting, is at least conceivable. There being about 1,000 microRNA of ~20 bases each in the mammalian genome, there could be parchment here to write up to ~40 kbits of information. Maybe enough for the Declaration of Independence.
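
The ~40 kbit figure follows from simple counting: each DNA base is one of four letters, so it can carry at most two bits. A quick Python check, taking the comment’s round numbers at face value and assuming every base is free to choose (in practice the target-matching constraints described above would use up some of that freedom):

```python
import math

# Rough capacity of the microRNA "parchment", using the comment's figures.
# Assumption: every base is freely choosable, so each carries log2(4) = 2 bits.
MICRORNA_COUNT = 1_000   # "about 1,000 microRNA"
BASES_EACH = 20          # "~20 bases each"
BITS_PER_BASE = math.log2(4)

total_bits = MICRORNA_COUNT * BASES_EACH * BITS_PER_BASE
print(f"{total_bits:.0f} bits = {total_bits / 8 / 1000:.1f} kB")  # 40000 bits = 5.0 kB
# 8-bit characters that would fit, before any compression
print(int(total_bits // 8), "characters")                          # 5000 characters
```

At one byte per character that is room for about 5,000 characters of plain text, so a document the length of the Declaration of Independence would probably need to be compressed first.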
Ron S on February 2, 2013 at 11:58

Eniac: “Our DNA already is a unique identification number.”
—-
“Yes, officer, that’s my DNA but it wasn’t me that did the crime! I was hacked and temporarily possessed by another person’s mind. So, yes, it was my body that did it, but not *me*.”

There may come a time when identity of the body and identity of the mind might not necessarily have a 1:1 correspondence. Then someone will come up with a “foolproof” way to watermark our minds.
Joe Davis on February 3, 2013 at 17:20

Eniac:

Hmmm… Nice ideas! Perhaps even for access keys or for more redundancy. Sounds like we should collaborate :)
Eniac on February 3, 2013 at 23:58

Joe Davis: I’d be thrilled. Perhaps Paul could do an e-mail introduction?
Lukas on February 12, 2013 at 14:12

Although digital storage technologies tend to come and go, it seems to me that they won’t be replaced completely by DNA. However, what bothers me is the fact that the longer you want to store data, the more attractive DNA becomes.

Thinking in a bigger context, what information really needs to be stored over thousands of years, and who might benefit from this kind of system?
ljk on February 16, 2013 at 20:12

http://philosophyofscienceportal.blogspot.com/2013/02/library-of-congress-and-vintage.html

“Our Most Treasured Recordings Are Dying: Congress Has a Plan to Save Them”

by Matt Peckham | February 15th, 2013 | Time

Some of our most cherished historical recordings are in danger of being lost forever — indeed, some by George Gershwin, Judy Garland and Frank Sinatra are already goners. But we live in the future, where technology can land a crazy-advanced robotics rover on Mars and put exabytes of information at our fingertips. Isn’t someone working to safeguard this stuff?

It turns out they are: The Library of Congress unveiled a plan on Wednesday to preserve the country’s mammoth trove of recorded sound. It’s a huge deal if you care about this sort of thing, the culmination of sweeping research conducted over more than a decade.

The National Recording Preservation Act of 2000 originally called on the Librarian of Congress to establish a National Recording Registry “for the purpose of maintaining and preserving sound recordings that are culturally, historically, or aesthetically significant.” The Library’s just-announced plan is the net result of what remains an ongoing effort, reflecting the cumulative work of both the Library and a range of audio professionals, from composers and musicians to musicologists, archivists and members of the recording industry.

As early as 2002, the National Recording Preservation Board was nominating recordings for preservation annually — the total through 2012 stands at around 350. It’s an eclectic mix of recorded sound, ranging from seminal works by Thomas Edison in the late 1880s, Abbott and Costello’s infamous “Who’s on First?” 1938 radio broadcast and Miles Davis’ groundbreaking jazz album Kind of Blue to Martin Luther King, Jr.’s “I Have a Dream” speech, Glenn Gould’s original 1955 recording of Bach’s Goldberg Variations and Yahi language cylinder recordings by the last surviving member of the Native American Yana tribe (who died in 1916).

What does the Library’s announcement mean in view of what it’s already been doing? It’s not enough to simply preserve these recordings, argues the Library in an elaborate 78-page overview of the plan. We have to think about public access, educating professionals about preservation rolling forward and dealing with “outdated laws that impede both preservation and access.”

One of the key challenges, for instance, involves accommodating all the different types of recorded audio, ranging from wax cylinders and music rolls to vinyl records and magnetic tape; that much anyone with a basic historical understanding of the audio industry might surmise. Here’s something you might not: The problem doesn’t go away or even necessarily get smaller with digital audio. As NPR notes, the fact that we often don’t know how today’s “born digital” recordings were created puts them in as much danger as older recording formats.

The complications extend to software like Avid’s Pro Tools, one of the more popular high-end digital audio workstation tools. As National Recording Preservation Board chairman Sam Brylawski notes in the NPR piece, “You hear stories about things recorded on … things like ProTools editing software 10 years ago, but the new version of the software isn’t compatible with the digital files of 10 and 20 years ago.” Think about the difficulties you’ve encountered working with text documents when moving between older and newer versions of popular word processors, then extrapolate to software that’s much more specialized and niche.

Then there’s storage to consider (do we have enough?), preservation know-how (are people sufficiently trained?), the pace of technological change (think how far we’ve come in the last decade alone) and inconsistent copyright laws with regard to historical recordings (the Library estimates rights owners have to date only released around 14% of historical recordings — the rest are essentially off-limits to preservationists).

To grapple with all that, the Library pulled together six task groups to study the issues and develop a set of formal recommendations. These were then distilled down to a total of 32, organized into four general areas: building the national sound recording preservation infrastructure, creating a blueprint for preservation strategies, promoting public access for educational purposes and outlining longer-term national strategies.

“The publication of this plan is a timely and historic achievement,” Librarian of Congress James H. Billington said of the Library’s plan. “As a nation, we have good reason to be proud of our record of creativity in the sound-recording arts and sciences. However, our collective energy in creating and consuming sound recordings has not been matched by an equal level of interest in preserving them for posterity. Radio broadcasts, music, interviews, historic speeches, field recordings, comedy records, author readings and other recordings have already been forever lost to the American people.”

Indeed, for some of these recordings we’re already too late. According to the Library:

Experts estimate that more than half of the titles recorded on cylinder records—the dominant format used by the U.S. recording industry during its first 23 years—have not survived. The archive of one of radio’s leading networks is lost. A fire at the storage facility of a principal record company ruined an unknown number of master recordings of both owned and leased materials. The whereabouts of a wire recording made by the crew members of the Enola Gay from inside the plane as the atom bomb was dropped on Hiroshima are unknown. Many key recordings made by George Gershwin no longer survive. Recordings by Frank Sinatra, Judy Garland, and other top recording artists have been lost. Personal collections belonging to recording artists were destroyed in Hurricanes Katrina and Sandy.

Among its recommendations, the Library argues for the creation of a “publicly accessible national directory of institutional, corporate and private recorded-sound collections and an authoritative national discography that details the production of recordings and the location of preservation copies in public institutions.” It also recommends developing a national collection standard for all sound recordings, building secure, environmentally controlled storage facilities, establishing academic degree programs in audio archiving and preservation, setting standards for preserving digital audio files, drafting a license agreement for streaming out-of-print recordings, and bringing sound recordings created before Feb. 15, 1972 under federal copyright law (recordings made before that date remain subject to state or common-law copyright until Feb. 15, 2067).

“Saving America’s recorded sound history and culture will require a concerted effort lasting many years,” adds Billington, sounding a pragmatic cautionary note. “Keep in mind while reading the plan that its recommendations require a deliberately long view. The Library published its national plan for preserving the nation’s film heritage in 1994. Great progress has since been made in implementing its recommendations, but the efforts continue, much remains to be done, and similar long-term commitment and collaboration will be necessary to achieve many of the recommendations in the National Recording Preservation Plan.”

It’s hard to tell from a glance through the Library’s report how effective it will be going forward, or whether it should have taken this long just to present a plan in the first place. But maintaining recordings like these in their original form for historical purposes…you can’t really put a price tag on that. So I’ll just join radio legend Bob Edwards, who said of the plan, “I applaud this initiative and wish the Library of Congress success in its mission.”
ljk on April 11, 2013 at 15:17

An Alien Code May Be Hidden Inside Our DNA!

io9

… that the best way for an extraterrestrial civilization to communicate across stellar distances is to send messages embedded within genetic code. It’s an interesting take on the panspermia hypothesis, one the scientists hope will lead to “biological …

http://io9.com/scientists-say-an-alien-code-may-be-hidden-inside-our-d-472157262

Trackbacks/Pingbacks

Digital Data and DNA « Dad2059's Webzine of Science Fiction, Science Fact and Esoterica - [...] Data Storage: The DNA Option [...]
[links] Link salad buries its soul in a scrapbook | jlake.com - [...] Data Storage: The DNA Option — Brings a whole new meaning to the term “thumb drive”. [...]
Archiviazione dati: l’ipotesi DNA (Data storage: the DNA hypothesis) « Il Tredicesimo Cavaliere - [...] original: Data Storage: The DNA Option, written by Paul Gilster and published on Centauri Dreams on [...]
