There are many ways to approach data archiving, and each approach stores data differently. W. Curtis Preston, a backup, storage and recovery expert, outlines three main approaches to archiving data in an informative Network World article: traditional batch archive, real-time archive and hierarchical storage management (HSM) archive.
Traditional batch archive
“With a traditional batch archive, data serves its purpose for a certain period before being tucked away in a safe repository, awaiting the possibility of being of some use in the future. The main idea behind this type of archive is to preserve data over an extended timeframe, while keeping costs at a minimum and ensuring that retrieval remains a breeze even years down the line. In this kind of archive system, each collection of data selected for archiving is given one or more identities, stored as metadata alongside the archived data. This metadata plays a pivotal role in locating and retrieving the archived information, with details such as project names, tools used to create the data, the creator’s name, and the creation timeframe all forming part of this digital fingerprint.”
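To make the role of that metadata concrete, here is a minimal Python sketch of a batch archive that stores each collection alongside its identity fields and finds it again by those fields. The class and field names (`BatchArchive`, `ArchiveMetadata`, `project`, `tool`, `creator`, `created`) are hypothetical, and an in-memory dictionary stands in for real, long-term archive storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveMetadata:
    """Hypothetical identity fields stored alongside an archived collection."""
    project: str
    tool: str      # tool used to create the data
    creator: str
    created: date


class BatchArchive:
    """In-memory stand-in for a batch archive repository (illustrative only)."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[ArchiveMetadata, bytes]] = {}

    def archive(self, collection_id: str, data: bytes, meta: ArchiveMetadata) -> None:
        # Store the data together with its metadata "fingerprint".
        self._items[collection_id] = (meta, data)

    def find(self, **criteria: object) -> list[str]:
        # IDs of collections whose metadata matches every given field exactly.
        return [
            cid
            for cid, (meta, _) in self._items.items()
            if all(getattr(meta, key) == value for key, value in criteria.items())
        ]

    def find_created_between(self, start: date, end: date) -> list[str]:
        # Timeframe search on the creation date (inclusive on both ends).
        return [cid for cid, (meta, _) in self._items.items() if start <= meta.created <= end]

    def retrieve(self, collection_id: str) -> bytes:
        return self._items[collection_id][1]


archive = BatchArchive()
archive.archive("run-001", b"simulation results",
                ArchiveMetadata("Apollo", "CAD", "alice", date(2019, 5, 1)))
archive.archive("run-002", b"draft report",
                ArchiveMetadata("Apollo", "Word", "bob", date(2020, 2, 3)))

print(archive.find(project="Apollo", creator="bob"))                       # ['run-002']
print(archive.find_created_between(date(2019, 1, 1), date(2019, 12, 31)))  # ['run-001']
```

Because retrieval goes through the metadata rather than a file path, data can be found years later even if nobody remembers where it was originally stored.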
Real-time archive
“In this type of archive, data created or stored in the production environment is instantaneously duplicated and sent to a secondary location for archiving purposes. Compliance and auditing are the primary use cases for real-time archives. Take, for instance, the classic example of journal email accounts in the era when on-premises email systems reigned supreme. As an email entered the mail system, an identical copy found its way into the journal mailbox, while the original landed in the recipient’s inbox. This journal mailbox served as a reservoir accessible to auditors and managers seeking information for legal matters or fulfilling Freedom of Information Act (FOIA) requests. Access to real-time archives typically occurs through specialized portals equipped with granular search capabilities. It’s important to note that (unlike traditional archives) real-time archives don’t alleviate the pressure on production storage systems – unless, of course, they incorporate the features discussed later in this article regarding hierarchical storage management (HSM).”
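The journaling pattern described above can be sketched in a few lines of Python. This is a toy model, not any mail server’s API: `MailSystem`, `deliver`, `delete_from_inbox` and `audit_search` are hypothetical names, and plain lists stand in for mailboxes. It shows the two properties the quote describes: a copy is captured the moment a message enters the system, and that copy survives even if the recipient deletes the original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    recipient: str
    subject: str
    body: str


@dataclass
class MailSystem:
    """Toy mail system with journaling; class and method names are hypothetical."""
    inboxes: dict[str, list[Email]] = field(default_factory=dict)
    journal: list[Email] = field(default_factory=list)

    def deliver(self, msg: Email) -> None:
        # Capture an independent copy as the message enters the system...
        self.journal.append(copy.deepcopy(msg))
        # ...while the original goes to the recipient's inbox as usual.
        self.inboxes.setdefault(msg.recipient, []).append(msg)

    def delete_from_inbox(self, recipient: str, subject: str) -> None:
        # Users can delete their own mail; the journal copy is unaffected.
        self.inboxes[recipient] = [
            m for m in self.inboxes.get(recipient, []) if m.subject != subject
        ]

    def audit_search(self, keyword: str) -> list[Email]:
        # Stand-in for the search portal auditors would use (case-insensitive).
        kw = keyword.lower()
        return [m for m in self.journal if kw in m.subject.lower() or kw in m.body.lower()]


mail = MailSystem()
mail.deliver(Email("carol@example.com", "dave@example.com", "Q3 contract", "Draft terms attached"))
mail.delete_from_inbox("dave@example.com", "Q3 contract")
print(len(mail.inboxes["dave@example.com"]))  # 0
print(len(mail.audit_search("contract")))     # 1
```

Every delivered message now exists twice, once in the inbox and once in the journal, which is why a real-time archive on its own adds to, rather than relieves, the load on production storage.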
HSM-style archive
“Among the diverse archive systems, the ‘HSM-style’ archive is a standout. It leverages hierarchical storage management (HSM) to govern data storage – a term that has somewhat gone by the wayside, even though the concept remains. When users no longer require daily access to data, or when data becomes dated but must be retained for compliance, organizations start exploring alternatives like storing this data on scalable object storage systems or dedicated cloud-based cold storage. Additionally, some solutions allow archive data migration to tape for off-site and offline storage, with the notion that tape provides enhanced security by being virtually inaccessible unless explicitly needed. Moreover, tape often offers a lower cost per gigabyte compared to most other storage systems. Tape also excels at long-term data retention. One common implementation of this concept applied HSM to real-time email archives, a prevalent practice in the early 2000s. As user mailboxes swelled with HTML-formatted emails and hefty attachments, organizations were faced with burgeoning storage requirements. HSM-style archives typically relocate data based on age or the last access timestamp. As data migrates from the filesystem to the archive, it often leaves behind pointers or stubs in the source system, facilitating automated retrieval when required.”
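The age-based migration and stub mechanism can be illustrated with a short Python sketch that works on ordinary files. It assumes a simple, hypothetical policy (migrate anything not accessed within `max_idle_days`) and leaves behind a plain-text pointer marked with a made-up `HSM-STUB:` prefix; commercial HSM systems implement stubs at the filesystem level so that recall is transparent to applications. Last-access times also depend on how the filesystem is mounted (for example, `noatime` disables updates), so a real policy might key on modification time instead.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

# Hypothetical stub marker. Real HSM products use filesystem-level stubs or
# reparse points that the OS resolves transparently; a text pointer is a stand-in.
STUB_PREFIX = "HSM-STUB:"
SECONDS_PER_DAY = 86_400


def is_stub(path: Path) -> bool:
    """Return True if the file is a pointer left behind by migration."""
    with path.open("rb") as fh:
        return fh.read(len(STUB_PREFIX)) == STUB_PREFIX.encode()


def migrate_cold_files(source: Path, archive: Path, max_idle_days: float) -> list[Path]:
    """Move files not accessed for max_idle_days into archive, leaving stubs behind."""
    cutoff = time.time() - max_idle_days * SECONDS_PER_DAY
    migrated = []
    for path in sorted(source.rglob("*")):  # snapshot the listing before modifying it
        if not path.is_file():
            continue
        # Check the last-access time before reading the file (reading may update it).
        if path.stat().st_atime >= cutoff or is_stub(path):
            continue
        target = archive / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(path, target)
        path.write_text(STUB_PREFIX + str(target))  # pointer in the original location
        migrated.append(path)
    return migrated


def recall(path: Path) -> None:
    """Automated retrieval: replace a stub with the archived file it points to."""
    if is_stub(path):
        target = Path(path.read_text()[len(STUB_PREFIX):])
        path.unlink()
        shutil.move(target, path)


# Demo with a temporary "production" directory and a separate "archive" directory.
with tempfile.TemporaryDirectory() as tmp:
    fs, arc = Path(tmp, "fs"), Path(tmp, "archive")
    fs.mkdir()
    old, new = fs / "report-2019.txt", fs / "notes.txt"
    old.write_text("quarterly figures")
    new.write_text("today's notes")
    long_ago = time.time() - 400 * SECONDS_PER_DAY
    os.utime(old, (long_ago, long_ago))  # simulate a file untouched for 400 days

    moved = [p.name for p in migrate_cold_files(fs, arc, max_idle_days=365)]
    old_is_stub, new_is_stub = is_stub(old), is_stub(new)
    recall(old)
    restored = old.read_text()

print(moved, old_is_stub, new_is_stub, restored)
# ['report-2019.txt'] True False quarterly figures
```

The stub keeps the file visible at its original path while its contents live in cheaper storage, which is what lets an HSM-style archive relieve pressure on production systems.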
Your specific needs for accessing your historical data will determine which of the three categories your archive solution should fall into. Whether it is traditional batch, real-time or HSM-style, an archive system will store your historical data better than general-purpose storage or databases.