13/07/2018
Backup vs. archive: Why it’s important to know the difference
> Restore vs. retrieval
Even if the purpose of an archive is to save space on primary storage, it must be able to perform a retrieval, not just a restore, if it is to be called an archive. Backup systems restore; archive systems retrieve.
When you restore something, it is typically a single file, server or database. When you retrieve something, it’s usually a collection of related data that may or may not have been stored on the same server, or even in the same format. A restore is also done to a single point in time, such as restoring a database to the way it looked yesterday. A retrieval covers a range of time, such as all emails from the last three years.
Restores require you to know a lot about where the file or data was when it was backed up; otherwise, you can’t find it. You need to know the name of the server it was on, the database or directory it was in, the name(s) of the file or table you want back and the date when it was last seen. Retrievals need none of that information; all you know is that you need every file or record that matches a set of parameters. For example: give me all files or emails that were created in the last three years and that contain a particular phrase or were authored by a particular person.
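To make the contrast concrete, here is a minimal Python sketch of the two request shapes. It is only an illustration: the `Record` type, the `restore` and `retrieve` functions, the in-memory catalog and the sample data are all hypothetical and do not correspond to any particular backup or archive product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical stored item; the fields are assumptions for illustration."""
    server: str
    path: str
    author: str
    created: date
    text: str


def restore(backups, server, path, as_of):
    """Restore: you must name the exact location and a single point in time."""
    return backups.get((server, path, as_of))


def retrieve(archive, start, end, phrase=None, author=None):
    """Retrieval: everything matching the parameters, on any server, over a date range."""
    return [
        r for r in archive
        if start <= r.created <= end
        and (phrase is None or phrase.lower() in r.text.lower())
        and (author is None or r.author == author)
    ]


# Sample data (invented for this sketch).
records = [
    Record("mail01", "/inbox/1", "alice", date(2016, 3, 1), "Project Falcon budget"),
    Record("files02", "/docs/plan.txt", "bob", date(2017, 6, 15), "falcon rollout plan"),
    Record("mail01", "/inbox/2", "alice", date(2012, 1, 5), "Project Falcon kickoff"),
]

# A backup catalog is keyed by where the data lived and when it was backed up.
backups = {("files02", "/docs/plan.txt", date(2018, 7, 12)): records[1]}

# Restore: works only if you know the server, the path and the date.
restored = restore(backups, "files02", "/docs/plan.txt", date(2018, 7, 12))

# Retrieval: no server or path needed, just a date range and search terms.
matches = retrieve(records, date(2015, 7, 13), date(2018, 7, 13), phrase="falcon")
```

Note that `restore` has nothing to offer if any part of the key is unknown, whereas `retrieve` never asks where the data lived. Answering the second kind of question from a pile of backups means restoring everything first and searching afterwards.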
Why the difference matters
Many people try to use their backup system as an archive system, meaning they keep their backups for many years, or even forever. The first time you get a real retrieval request, you’ll find out how difficult it is to perform a retrieval from something that is meant to do restores. The retrieval will take much, much longer (potentially months instead of minutes) and cost much, much more (millions instead of a few dollars).
If the retrieval is for an electronic discovery request from a lawsuit, and you are unable to satisfy it in a timely manner, you run the risk of the judge issuing an adverse inference instruction. You’ve taken six months to satisfy what the court knows to be a simple request, and you’re nowhere near finished. The judge infers you’re trying to hide something, and tells the jury so. You just lost the case. The most infamous example of this was the Morgan Stanley lawsuit where they lost billions in this exact scenario.
Don’t use your backups as archives. If you have a long-term storage need, investigate an actual archive system. There will be an upfront cost, but it will be worth it in the long run.