Back in January 2010, the news that the National Historical Publications and Records Commission would fund the MFAH’s two-year Electronic Records Archives (ERA) planning grant was met with elation by the project team, which originally consisted of the chief technology officer, the director of information technology, the records manager, and myself, the Archives director. (A third I.T. staffer was added last spring to assist in the painstaking process of paring down the National Archives and Records Administration’s ambitious ERA Concept of Operations into one tailored to the more modest needs of the MFAH.) After a few weeks of blissfully issuing press releases, and even speaking to an interviewer or two, it was time to roll up our sleeves.
Dutifully, we turned to the Open Archival Information System (OAIS) Reference Model and began to contemplate Submission Information Packages (SIPs). The question that arose almost immediately was a deceptively simple one: “Who is going to create the SIPs?” The MFAH is a private non-profit institution, not a government agency with statutory mandates and record classification systems. There is a comprehensive records management program that has operated successfully since 1994, but how well would it translate to the virtual environment? How much unclassified data lived on the servers anyway? The CTO estimated sixty terabytes, mostly unstructured. (Unstructured data resides outside of a database and, as the name implies, does not conform to uniform attributes as the rows of a data table do. Semi-structured data, such as that found in e-mail and SharePoint, makes up a third category alongside structured and unstructured data.) Sixty terabytes? The estimate had tripled since the prior year, when we proposed the grant. While most of the increase could be attributed to an institutional mandate to create publication-quality images of the permanent collection, the rate of growth and the sheer magnitude of the data gave us pause.
Since that time, the project team has moved forward on two fronts. The first has been to apply existing retention schedules to the museum’s e-records. Toward that end, the departmental retention schedule model has been mapped to a functional one, which makes it easier to standardize similar record schedules across departments. The second has been to explore the emerging technology of automated classification for the preliminary appraisal of large sets of e-records. This fascinating technology is designed to analyze the content of unstructured data; several MFAH departments participated in a testbed that will be used to judge the accuracy of the software before implementation. Until an ideal solution is found, the chosen path is to pilot the ERA with selected records. Measured steps for the long ascent.