One of my storage vendors recently phoned and suggested that I upgrade our SANs (storage area networks) to “get ready for big data”. I hung up on him. Big data is already here. We’re awash in data that we never meant to collect and don’t know what to do with. Storage has gotten so cheap that we’ve lost our discipline.
The idea that “if we have the free space, then why not save everything and keep it for ever?” is terrible because it rests on the theory that merely having data lets you find what you need when you need it. CIOs who remember the 1980s will disagree. When hard drives were small and expensive, and when DOS file names were limited to eight characters plus a three-character extension, trying to find a specific record was maddening.
We had to search methodically, directory by directory and file by file. In those days, records management was akin to finding a needle hidden in a haystack. The comparison was apt, in that you could always burn the haystack down and sift the ashes.
Modern distributed and cloud storage solutions make that analogy obsolete. Your critical records aren’t needles hidden within a single, finite haystack; they are more like sparrows hiding within a forest where each tree represents a different volume. When there are thousands of trees to search, where do you start looking?
Also, critical documents, like sparrows, move. Virtual storage can leap from one physical data centre to another in seconds. Managing data on a global scale is more difficult now, since data won’t sit still.
The key to success in the big data era is to create and enforce a comprehensive information management strategy. Identify what constitutes your critical information. Define where and how it’s used, where it must reside, what format it must exist in, and how it’s to be marked in the system so that you can find it when you need it.
For individual records this can entail mandating that computer-searchable metadata be added to the document properties. For large data sources, it can entail flagging critical shares and databases so that their mobility within the cloud or between data centres is both restricted and reported to management.
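To make that concrete, here is a minimal sketch in Python of what such tagging might look like. Everything in it is an assumption for illustration: the `RecordTag` fields, the three grades and the `review_move` rule are hypothetical names, not part of any particular storage product. The idea is simply that each record carries searchable metadata, and that any proposed relocation of a critical record is checked against an approved list of sites and reported.

```python
from dataclasses import dataclass, field
from enum import Enum


class Grade(Enum):
    """Hypothetical importance grades for a piece of information."""
    CRITICAL = "critical"
    ROUTINE = "routine"
    TRIVIAL = "trivial"


@dataclass
class RecordTag:
    """Searchable metadata attached to one document, share or database."""
    name: str
    grade: Grade
    owner: str
    file_format: str
    # Data centres where this record is permitted to reside (assumed names).
    allowed_sites: set[str] = field(default_factory=set)


def review_move(tag: RecordTag, destination: str) -> tuple[bool, bool]:
    """Return (allowed, report_to_management) for a proposed relocation."""
    if tag.grade is not Grade.CRITICAL:
        # Non-critical data may move freely and needs no report.
        return True, False
    # Critical data may only move to an approved site, and every
    # attempted move is reported whether or not it is allowed.
    return destination in tag.allowed_sites, True


def find_critical(tags: list[RecordTag]) -> list[str]:
    """List the names of all records graded as critical."""
    return [t.name for t in tags if t.grade is Grade.CRITICAL]
```

A tag such as `RecordTag("ledger.db", Grade.CRITICAL, "finance", "sqlite", {"london", "frankfurt"})` could then be moved to `"london"` but not to `"singapore"`, and either attempt would be flagged for reporting.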
The other end of the spectrum is just as important. Not all of your data is critical, important or even necessary. Tonnes of data gets collected every hour that no one actually needs. When you define what’s most important to the business, also define what’s least important and put procedures in place to annihilate the dross as quickly as possible.
For everything between the grades of critical and trivial, establish rules on retiring data as soon as it ceases to be meaningful. Keep only what’s necessary according to laws and regulations, and what might provide you with strategic value down the road. Keep the forest small enough to track your elusive sparrows.
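A retirement rule of this kind can be very small. The sketch below, again in Python, assumes a hypothetical table of retention periods per grade; the grades and the numbers are placeholders for illustration only, and the real values would come from your own legal and regulatory obligations.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; None means "keep indefinitely".
# These figures are placeholders, not legal or regulatory guidance.
RETENTION_DAYS = {
    "critical": None,
    "regulated": 7 * 365,
    "routine": 365,
    "trivial": 0,
}


def should_delete(grade: str, last_used: date, today: date) -> bool:
    """True when a record has outlived the retention period for its grade."""
    limit = RETENTION_DAYS[grade]
    if limit is None:
        # Indefinite retention: never delete automatically.
        return False
    return today - last_used >= timedelta(days=limit)
```

Under these assumed periods, trivial data becomes eligible for deletion immediately, routine data after a year without use, and critical data never without a human decision.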