I’m often asked why the preservation of digital materials is so complicated. After all, isn’t it simply about the storage and migration, or emulation, of digital objects and metadata? Why do you need all of these policies and procedures around a data or digital archive? Why can’t you just store the digital files and leave them?
Recently, Forbes.com opened an email time capsule the company set aside a mere five years ago. The blog post illustrates some of the difficulties of preserving digital materials over time — in this case, only half a decade.
First, here is the purpose of the experiment, in the author’s words:
The experiment, which we called an “E-Mail Time Capsule,” was part of a special report on Communicating. We invited our readers to communicate with their future selves by writing a letter, which we would store for them, and send at a later date.
Over 140,000 people participated, choosing whether they wanted their capsule “opened” in one, three, five, ten or twenty years.
In 2006 and 2008, we successfully sent over 40,000 messages. And now we’ve hit the five year mark, and are in the process of sending 17,000 emails to our users – half a decade after they wrote them.
Simple, right? You set up three geographically dispersed servers, configure the second to send the emails if the first one doesn’t, and the third to send them if the first two fail. However, it wasn’t quite that simple, as the author notes.
We’re excited to see this strange thing is still working, because while it’s pretty simple to preserve a physical time capsule (dig hole, insert non-biodegradable container), the realities of digital preservation are surprisingly complicated.
Here’s the problem: Anyone can send an email scheduled for future delivery. It’s just a matter of writing it and setting a send date in the future. Some e-mail programs will do it for you, and there are a couple of web sites dedicated to the practice, including FutureMe.org (more on them here). But once your message is written and waiting to be sent, all kinds of things can happen to prevent delivery, particularly if you’re going to be waiting for decades.
One hard drive crash, and you could lose the emails. Try to protect them by storing on a physical medium like a CD or magnetic tape, and you could lose that too – or discover, years later, that the format is obsolete and unreadable. And since we’re not just storing messages in a box, and actually want to send them, there’s all kinds of other complications: What if our server gets disconnected from the Internet? What if e-mail protocols change?
So we planned to keep things working using the same strategy that keeps the Internet up and running: redundancy.
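To make the redundancy idea concrete, here is a minimal sketch of that kind of failover in Python. It is my illustration, not Forbes’ actual software: the `Sender` type, the `send_with_failover` function, and the server names are all hypothetical, and a real system would also need scheduling, retries, and a way to avoid sending the same message twice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sender:
    """A hypothetical mail server: a name plus a function that tries to send."""
    name: str
    send: Callable[[str], bool]  # returns True if the message went out


def send_with_failover(message: str, senders: List[Sender]) -> Optional[str]:
    """Try each sender in priority order.

    Returns the name of the first sender that succeeds, or None if all fail.
    """
    for sender in senders:
        try:
            if sender.send(message):
                return sender.name
        except Exception:
            # A crashed or unreachable server counts as a failure; try the next.
            continue
    return None


def unreachable(message: str) -> bool:
    """Stand-in for a server that has dropped off the Internet."""
    raise ConnectionError("server unreachable")


def healthy(message: str) -> bool:
    """Stand-in for a server that is still up and sending."""
    return True


servers = [
    Sender("primary", unreachable),
    Sender("secondary", healthy),
    Sender("tertiary", healthy),
]
used = send_with_failover("Dear future me...", servers)
```

In this toy run the primary is down, so the secondary takes over. Notice what the sketch cannot express: whether anyone still remembers that the servers exist.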
Still sounds relatively simple, right? Um…no. Here are some of the problems Forbes encountered only five years into this experiment, in my words:
- Ch-ch-changes: Email addresses change (even though they did ask participants to keep the same email address).
- Success Relies on One Person/A Single Point of Failure: The partner providing the second backup server is a small company; it appears the company president designed the database and coded the application. What happens if he sells the company or dies? Does he himself have a backup? Does someone else have the skills and knowledge to take over the project at the company if he disappears without warning one day to hike the Appalachian Trail? Does anyone else at his company know about the project? Did he upgrade and migrate the software and databases over these past five years? What if it wasn’t five years, but 10, or 20? 100?
- People Leave Jobs, Are Laid Off, and Divisions Disappear: The partner providing the third backup server is Yahoo!. The author writes:
Anyone who follows tech knows Yahoo has had some tough times in the last five years, but they’re still around, and still a very important company. Nonetheless, just one year into the project, with the technological infrastructure still not completely up and running, everyone we had worked with at Yahoo had left or been laid off. We had a few discussions with their replacements to try to explain the concept, and get them to set up the time capsule software, but frankly, it was more complicated than it was worth: Codefix and Forbes simply coordinated the project on our end, and Garrison sent out the first batch of messages manually.
- Reliance on Human Memory: The author himself forgot about another batch of emails going out next month, because the reminder was on a piece of hardware that he no longer uses, and he never migrated the reminder data to its replacement.
The author of the blog post about Forbes’ email time capsule project concludes:
So it only took a year for our concept of a distributed, automated time capsule mechanism to break at a fundamental level. But at least the emails were still going out as scheduled, and that was the point of the distributed model: We’d keep each other on the job.
Thankfully, it’s still working – if barely. A few weeks ago I emailed Garrison about a totally unrelated matter, and in his reply, he added a small postscript: Aren’t we supposed to send out more emails next month?
Honestly, I’d forgotten. (The reminders I’d left myself half a decade ago were probably scheduled on my Palm Treo, which was replaced not too long after by Apple’s first iPhone, and Google’s Calendar service).
But Garrison remembered, and as I write this, he’s queuing up 16,980 email messages to send to people around the world — a little digital blast from their past.
So five years into this experiment, the Internet giant is out of the picture, and the media company nearly dropped the ball. Will we be able to keep the emails coming? The next time capsule opening is in the year 2015. I can’t wait to see what happens.
Neither can I.
The question is, what if the stored data wasn’t for fun, but was of national importance? What if you need provenance? What if you need an audit trail to prove that none of the emails were altered?
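One common building block for that kind of audit trail is a fixity check: record a cryptographic hash of each item when it is stored, then recompute and compare it later. The sketch below is a minimal illustration in Python, not anything Forbes did. The function names and the log layout are my own assumptions, and each log entry also includes the hash of the previous entry so that quietly rewriting an old entry breaks the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_entry(log: list, message_id: str, body: str) -> None:
    """Append a log entry holding the body's hash, chained to the prior entry."""
    entry = {
        "message_id": message_id,
        "body_hash": _sha256(body),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    # Hash the entry itself so later edits to the log are detectable.
    entry["entry_hash"] = _sha256(json.dumps(entry, sort_keys=True))
    log.append(entry)


def verify(log: list, bodies: dict) -> bool:
    """Return True only if every stored body and every log entry is unchanged."""
    prev = GENESIS
    for entry in log:
        core = {k: entry[k] for k in ("message_id", "body_hash", "prev_hash")}
        if entry["prev_hash"] != prev:
            return False  # chain broken: an entry was removed or reordered
        if _sha256(json.dumps(core, sort_keys=True)) != entry["entry_hash"]:
            return False  # the log entry itself was edited
        body = bodies.get(entry["message_id"])
        if body is None or _sha256(body) != entry["body_hash"]:
            return False  # the stored message is missing or was altered
        prev = entry["entry_hash"]
    return True


log = []
bodies = {"msg-1": "Hello, future me.", "msg-2": "Did we ever move to Oregon?"}
for message_id, body in bodies.items():
    record_entry(log, message_id, body)

intact = verify(log, bodies)
tampered = verify(log, {**bodies, "msg-2": "Did we ever move to Ohio?"})
```

A check like this only makes tampering evident. It proves nothing unless the log, or at least its most recent hash, is kept somewhere the people holding the emails cannot change it, which brings back the same people-and-process questions raised above.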
The problems Forbes encountered are only the tip of the iceberg, but they illustrate the challenges digital preservationists encounter as they try to preserve cultural heritage material, research data, the scholarly record, and the national record of their respective countries.