Data rot is the scourge of modern-day computing. When I think of the ~2 million files on my iMac, from photos and videos to school reports dating back to the 90s, it’s disturbing to think that those virtual memories could become corrupted at any time, without any warning.
What’s a moderately concerned and bored person to do? But not so concerned or bored as to take on the challenge of grafting a third-party filesystem, and its unknown bugs, onto OS X? They hack together a poor man’s version, of course! Thus, I present to you my hack: Silent Corruption Detector. This is a simple little Ruby script that generates a hash of each of your files and stores the hashes in a SQLite database. Check out the screenshot above for an example of what the output looks like and how to run the script.
The basic principle is that I run this script about once a week, which is well within my backup rotation schedule. That gives me time to recover any silently corrupted data from a good backup before it’s rotated out.
The odds of a bit flipping are hard to believe. They’re like lottery odds: so long that playing hardly seems worth it. But unlike a lottery, where a losing ticket is no great expense, a single bit flip can mean the permanent loss of data.
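To make the odds a bit more concrete: consumer hard drives are commonly specified with an unrecoverable read error rate on the order of 1 in 10^14 bits. A quick back-of-the-envelope calculation under that assumption (the rate and data volume here are illustrative, not measurements of any particular drive):

```ruby
# Expected number of unrecoverable bit errors when reading a data set once,
# assuming the commonly quoted consumer-drive spec of 1 error per 1e14 bits.
# Illustrative only; real drives and real failure modes vary widely.
BITS_PER_TB = 8.0 * 10**12

def expected_bit_errors(terabytes_read, error_rate = 1.0 / 1e14)
  terabytes_read * BITS_PER_TB * error_rate
end

# Reading a 1 TB library end to end once yields an expectation of about
# 0.08 errors, i.e. roughly an 8% chance of hitting at least one.
```

Individually unlikely, but scan a large library every week for years and those small probabilities add up.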
What’s worse is that the error may not be found until several months, or more likely years, have passed. By then, it will be too late to recover. What about Time Machine, you ask? If your setup is like mine, Time Machine is only able to keep a few months of history before it starts rotating. The window of opportunity to go back and retrieve an uncorrupted version is slim at best. The only saving grace would be an eternal archive system such as bup, combined with parity archives, where every change back to the beginning of time is tracked. But these are only band-aids for the symptoms of the problem.
That problem is that we don’t have perfect hardware yet. Our hard drives, Blu-rays, DVDs, and such are not infallible and everlasting. So, as engineers, we do the best that we can to work around those limitations. The ideal place to do this is at the filesystem layer, where it is an integral feature of the library housing our data. I say ideal because the filesystem provides an extra layer of security on top of the checksumming that a hard drive controller should already be doing, and provides that protection without a complex RAID configuration with error correction.
What’s interesting is that there seems to be a general feeling of “eh, the odds are still pretty low,” and so there’s not a lot of push from the majority to move to next-generation filesystems.