Keeping your digital pictures safe.
"Time" is one of my original works of art (C) 2010. It represents many things. For the most part I leave the interpretation to the viewer. However, in the context of this article, "time" is ideal to illustrate the possible fragility of digital data.
Head on over to spOOks-art to see more graphite portraits if you like. A graphite portrait makes a very long lasting gift.
Sit tight and I'll explain...
Archival properties of various media.
Papyrus, paper, stone.
These have been proven to last thousands of years. That's an amazing quality. Even an oil painting can last centuries, and a black and white photograph can survive more than one hundred years. My pencil drawing is done on acid-free paper and will last hundreds of years, especially because graphite is very stable.
When the compact disc (CD) was first marketed, it was said to have almost indestructible qualities. I recall adverts showing how you could spill things on it, wash it, even scratch it, and it would still play as good as new.
However, in recent times, the CD has proven to be rather unreliable. In particular, it is affected by chemicals and by light and temperature.
I have several CDs that have failed in the span of 5 to 10 years, with the most susceptible ones being those that you write yourself. A great many CDs were ruined by the very labels used to identify them.
Although they use error correcting codes, these are only sufficient for partial surface damage. On the label side of any CD is a reflective coating. If that gets damaged, then the data in that area is unreadable.
A quick word about bits and bytes.
Bits and bytes simply refer to a coding scheme. In the case of a picture, sound clip, video clip or text, a series of patterns from a library of only two symbols is sufficient to encode them all. This is where we get ones and zeros, and 'binary digits'. See "What's all this binary and octal about anyway?" for more information on that topic.
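To make the two-symbol idea concrete, here is a small sketch in Python. It assumes plain ASCII text, where every character takes eight bits; the function names to_bits and from_bits are my own and not part of any standard. It turns the word "Time" into ones and zeros and back again.

```python
def to_bits(text: str) -> str:
    """Return the ASCII bytes of `text` as groups of eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn groups of eight binary digits back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = to_bits("Time")
print(encoded)             # 01010100 01101001 01101101 01100101
print(from_bits(encoded))  # Time
```

Pictures, sound and video work the same way in principle. Only the rules for turning the patterns back into something meaningful are more involved.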
There is a key difference between a real photo and one which has been digitally archived. The real photograph or work of art is an analogue recording. As it degrades due to wear and tear it does so gracefully. This means that there will be a recognizable image of some use all through its life. In contrast, a bit belonging to a digital recording is either perfectly correct or exactly wrong, so when it degrades, it does so spectacularly.
To compensate for this, a clever digital recording includes extra bits which can be used to detect and recalculate damaged bits. This is known as bit-error detection and correction.
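As an illustration only, here is a sketch of one classic textbook scheme, the Hamming(7,4) code, written in Python. It is far simpler than the codes used on a real CD and is not meant to represent them. It stores 4 data bits plus 3 extra check bits, and can repair any single damaged bit in the group of 7. The function names are my own.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit covering positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # check bit covering positions 4, 5, 6, 7
    # Layout by position 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Repair up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1         # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
stored = hamming74_encode(data)

one_error = list(stored)
one_error[4] ^= 1                          # damage a single bit
print(hamming74_decode(one_error) == data)   # True: the damage is repaired

two_errors = list(stored)
two_errors[0] ^= 1
two_errors[1] ^= 1                         # damage two bits
print(hamming74_decode(two_errors) == data)  # False: too much damage
```

The second case is the important one for archiving. Once the damage exceeds what the extra bits can handle, the decoder does not fail gracefully. It confidently returns the wrong answer.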
Implementing the digital encoding, along with bit-error detection and correction, demands a mathematical algorithm. In many cases this algorithm will be owned by a particular company, and effectively it becomes part of your digitally archived work of art.
You have to ask yourself, "Is this a good thing?"
Is this a good thing?
On the positive side, we know how to squeeze a tremendous number of bits into a very tiny space, even when the extra bit-error corrections are included. This allows us to store trillions of bits of data in a small space.
The downside is that a small damaged area equals a large loss of data if it damages more than the error-codes can recover. But there is a more worrying problem:
The algorithms and encoding format and schemes could be owned by a corporation. They may be a trade secret, and may simply fall into disuse over the years.
Hence, it has happened and will continue to happen that valuable data is stored on archaic digital media using a legacy algorithm that is no longer available. Even if the media is not damaged, the data is lost because no one has either the equipment or the knowledge to retrieve it.
If you think this is a little far-fetched, then consider the case of some lunar-landing tapes, held by NASA in Western Australia. These were recorded in 1969, and today there are no tape readers available, so the tapes cannot be read until new equipment is manufactured. This is a close call for the longevity of this data. I don't know if it is digital or analogue, but this case certainly illustrates the problem of obsolescence.
There are several ways to lose your digital data:
- equipment obsolescence
- a proprietary algorithm is lost
- the media is damaged
- the media is lost
- the media gets old and degrades
Old techniques of analogue data storage have the following problems:
- the media is badly damaged
- the media is lost
- the media gets really old and eventually degrades completely
You can see why modern digital media is more fragile than the old methods. To compound the problem, modern digital techniques have such a high recording density that a relatively small amount of damage can cause massive data loss. This particular problem will get worse because the packing density of data has been exponentially increasing for years. The commercially-active lifetime of any given media technology is also becoming shorter.
Tape formats lasted longer than vinyl records, which lasted longer than CDs, which have a longer history than DVDs, which are being superseded by Blu-ray, which may in turn be superseded by flash technology, digital drives or some other kind of non-volatile solid state storage. The potential for format and technology changes keeps increasing on an ever-decreasing timescale.
The global holocaust of 2234
Imagine a future archaeologist who finds some old media.
Imagine an archaeologist, far in the future, who finds two kinds of media. The first is a microfilm, and the second is what we know as an archival-quality audio CD. Both are hermetically sealed in an old vault and in very good condition. The archaeologist has never seen such odd artifacts, but quickly concludes that they were lost before the global holocaust of 2234.
She is very keen to find out what these objects are.
The microfilm presents no problem because, when it is magnified, she can see pictures and symbols. Over time, the pictures help her and her team decode some of the symbols, and for that the team wins an industry award.
The other object is confusing. It is shiny like jewelry, and has a hole in the center. There is an abstract picture on one side, and some symbols, none of which make sense, even with the clues afforded by the microfilm. After several years, they conclude that it is indeed jewelry and bizarrely used to insert into a tribal leader's lower lip as a status symbol. This theory is apparently supported by some pictures found on the microfilm.
Not everyone believes the theory, and the object remains under study for many years. Finally, someone proposes that it contains information, and they verify this by finding a regular pattern on the shiny side of the disk. They can even see some pits in regular patterns but the theory never advances beyond this.
What does this imaginary story tell us?
The microfilm is an analogue recording. For all time, it remains an easily recognisable and relatively easily decoded body of work.
The CD is useless and mysterious without the knowledge of exactly what it is for and, most importantly, how the information is physically read and decoded.
Reading a CD is very complex. How could a future archaeologist possibly re-invent the laser reading device, the encoding algorithm AND the bit-error detection and correction technique?
Kryder's law is a parallel to Moore's law for computer chips. It is named after Mark Kryder, the chief technical officer of Seagate, which manufactures hard disks.
I've been known in the last couple of years to say things like, "Sure - data storage is free these days anyway". What I am half joking about is Kryder's law. When I bought my first "HUGE" disk drive in 1988, it was a "massive" 300MB. Today, a 300MB disk drive, if you can find one, heads straight to the recycle bin. Today, for 1/3 of the cost of that 300MB drive, you can purchase 1 Terabyte or more.
Let's put some numbers into focus. A large book, like the English translation of "War and Peace" contains about 1.3 million characters.
Without using any form of compression, an ASCII encoded copy of this book would use about 0.4 percent of this 300MB drive - i.e. you could store more than 200 similar sized books on a drive that size. Modern compression techniques could more than double this to say, 500 books.
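For anyone who wants to check these sums, here is a quick sketch in Python. It takes the 1.3 million character figure quoted above at face value, and assumes one byte per ASCII character and decimal megabytes (1 MB = 1,000,000 bytes). The names are my own.

```python
CHARS_PER_BOOK = 1_300_000     # figure quoted above for a large book
BYTES_PER_CHAR = 1             # assumption: plain ASCII, no compression
DRIVE_BYTES = 300 * 1_000_000  # assumption: 300 MB in decimal megabytes


def books_per_drive(drive_bytes, chars_per_book=CHARS_PER_BOOK):
    """Whole uncompressed books that fit on a drive of the given size."""
    return drive_bytes // (chars_per_book * BYTES_PER_CHAR)


def share_of_drive(drive_bytes, chars_per_book=CHARS_PER_BOOK):
    """Percentage of the drive taken up by a single book."""
    return 100 * chars_per_book * BYTES_PER_CHAR / drive_bytes


print(books_per_drive(DRIVE_BYTES))           # 230
print(round(share_of_drive(DRIVE_BYTES), 2))  # 0.43
```

So one book takes roughly 0.4 percent of the drive, and a little over 200 of them fit before compression is even considered.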
A 1 TB drive contains 1000 GB, which is 1,000,000 MB, more than 3 thousand times larger than my old 300MB drive, so with compression I could store over 1.6 million books of similar size. Each of those books would typically contain 1,400 pages, so 1 terabyte represents about 2.3 billion pages. If every 150 pages adds 1cm of thickness, a single pile of all 2.3 billion pages would be about 15 million cm, or roughly 153 km, high.
I think it was 1986 or 1987 when an April 1st joke was published telling people how to convert their new external Commodore storage into a 1 terabyte drive. The joke is not so funny today.
It will only be a few years before you can buy a 1 petabyte external hard drive on the domestic market. Our pile of books would now be about 153,000 km high.
The average distance from the Earth to the Moon is about 384,000 km, so a single petabyte drive gets our pile about 40 percent of the way there. The average distance from Earth to the Sun is about 150 million km, and it would take roughly a thousand of those drives, about an exabyte, for the pile to reach the Sun. That's a lot of data. The cost for a petabyte of this storage will soon be about $100. Now it should be clear why I joke that "storage is free". It also explains why you can get a lot of storage on-line on the Internet for nothing more than the (dis)-pleasure of receiving a few advertisements.
Yet another observation and potential problem is the disparity between advancement of CPU speed and storage capacity.
Both Moore's law and Kryder's law describe exponential trends, which plot as straight lines on a logarithmic scale, but Kryder's law represents a doubling in storage capacity faster than Moore's law represents a doubling in CPU speed. In the early days of development this was insignificant, but the underlying trend has remained consistent. Today, we find that the gap between disk capacity and CPU speed is itself growing exponentially.
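Here is a rough sketch in Python of why the gap keeps widening when two things double at different rates. The doubling times below are purely hypothetical numbers chosen to keep the arithmetic easy. They are not measurements of real disks or CPUs, and the function names are my own.

```python
STORAGE_DOUBLING_YEARS = 1.0  # hypothetical: storage doubles every year
CPU_DOUBLING_YEARS = 2.0      # hypothetical: CPU speed doubles every 2 years


def growth(years, doubling_years):
    """How many times larger something is after `years` of steady doubling."""
    return 2 ** (years / doubling_years)


def gap(years):
    """Storage growth divided by CPU growth after the same number of years."""
    return growth(years, STORAGE_DOUBLING_YEARS) / growth(years, CPU_DOUBLING_YEARS)


for years in (0, 10, 20):
    print(years, gap(years))  # 0 1.0, then 10 32.0, then 20 1024.0
```

With these made-up rates, storage is 32 times further ahead after ten years and 1,024 times further ahead after twenty. The gap is itself a doubling curve, only a slower one.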
The worrying question here is whether it will take progressively longer to move data from one storage device to another. Will CPU speed and communication channels keep up with the task? This is known as the CPU storage gap.
How do you back up an array of 10 Petabytes, 100 Petabytes, an Exabyte?
Another problem is the inherent difference in access times for data stored in various ways. For example, data held in a cache on the CPU is available within about a nanosecond, while data in volatile RAM takes tens of nanoseconds. Data on a solid state drive is microseconds away and data on a normal magnetic disk is milliseconds away. When you consider this, then perhaps the CPU storage gap is not the main issue. It could be that unless access times fall significantly as storage capacity grows, it may become impractical or impossible to copy all the data from one location to another in a reasonable time.
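To put a rough number on that worry, here is a small sketch in Python that estimates how long a straight copy would take. The sustained rate of 1 GB per second is a hypothetical figure picked for illustration, and the sizes use decimal units. Real backups would be affected by many other things.

```python
SECONDS_PER_DAY = 86_400
PB = 10 ** 15                # one petabyte in bytes (decimal)
EB = 10 ** 18                # one exabyte in bytes (decimal)
THROUGHPUT = 10 ** 9         # hypothetical: 1 GB copied every second


def copy_days(size_bytes, bytes_per_second):
    """Days needed to copy `size_bytes` at a steady transfer rate."""
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


for label, size in (("10 PB", 10 * PB), ("100 PB", 100 * PB), ("1 EB", EB)):
    print(f"{label}: {copy_days(size, THROUGHPUT):,.0f} days")
# 10 PB: 116 days
# 100 PB: 1,157 days
# 1 EB: 11,574 days
```

At that assumed rate, a single pass over an exabyte takes more than thirty years, which is longer than most storage formats stay on the market.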
What does this mean for me at home?
If you digitally archive your artwork, photographs, home videos and other valuable treasures, then you must take extra precautions.
Print them out! At least the important ones. Use the services of a specialized fine art photographer for your paintings and drawings. Make copies and post them to relatives for safe keeping. With so many photographs only in digital form today, there is a danger that far-future generations will not be able to read them for some reason.
It is very risky to keep this data in one place and on one type of media. You need to keep making copies as new technology is invented, so that your kids' voices on cassette tape, your home videos on 8mm cellulose film, or your thousands of digital photographs don't disappear into a personal digital dark age.
Never keep only one copy, or multiple copies in the same place! If you have a fire or flood, then all the copies could get destroyed. You should send the most important copies to a relative, or take advantage of free storage provided by internet-based providers. This is a sort of personal disaster-recovery plan for your digital data.
If you keep moving it from technology to technology, then it will remain available, and you also know that it is free of error. If you archived a photo to a floppy disk and left that only copy in a drawer somewhere, then even today (2010) it could be quite difficult to restore. Not many people are using floppy disks these days, and the floppy disk could have suffered a 'silent' error or two.
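One way to catch a 'silent' error when you move files to new media is to compare a checksum of the original with a checksum of the copy. Here is a minimal sketch in Python using the standard hashlib module. The function names and the file names in the demonstration are made up for illustration, and the "photograph" is just a few bytes of pretend data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True when both files have exactly the same contents."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration with pretend files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "photo.jpg"
    good_copy = Path(folder) / "photo_copy.jpg"
    bad_copy = Path(folder) / "photo_damaged.jpg"
    original.write_bytes(b"pretend this is a photograph")
    good_copy.write_bytes(b"pretend this is a photograph")
    bad_copy.write_bytes(b"pretend this is a photograpi")  # one byte differs
    print(copies_match(original, good_copy))  # True
    print(copies_match(original, bad_copy))   # False
```

A checksum only tells you that something changed, not how to repair it, which is one more reason to keep several copies in different places.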
Take advantage of services to transfer old media to new media, like VHS tapes to Blu-ray or vinyl records to digital WAV files. There is no need to destroy the old media - try and keep it, but don't hold out hopes that your great great grandchildren will be able to use it. Hopefully, they will be able to download a copy from their newer/faster/bigger version of the internet.
Links to things that are relevant
- Hitachi Says Data Lives Forever in Quartz Glass
The AFP news agency reports that Hitachi has discovered a way to store digital information on slivers of quartz glass. This data can seemingly exist forever, enduring extreme temperatures and hostile conditions without degrading... at least until...
- Photobooks by albumworks. Simply better photo books, printed photo albums, posters and calendars, Au
Photobooks by albumworks - simply better photobooks, delivered within Australia. Calendars, posters, photo books and more. This is a great idea. You can create your own book using your own words and pictures.