archivematica-architectural-decisions

Archivematica original design principles

Context and problem statement

There is no open-source software system capable of implementing the functional requirements of the OAIS reference model (ISO 14721:2003). Digital preservation specialists must use multiple tools, which can be difficult to install and use, to perform discrete preservation actions. Those tools produce metadata that do not conform to digital preservation and exchange standards and schemas, and do not provide a way to automatically generate standardized, system-independent, self-documenting Archival Information Packages (AIPs) that package content and Preservation Description Information (PDI) as described by OAIS. Repository applications such as Fedora are capable of performing some but not all OAIS preservation actions, and tend to be complex to develop and maintain, posing a risk to future retrieval and readability of the AIPs. In fact, any middleware repository or database software that is required to access and read AIPs is inherently a risk to their long-term usability.

Decision drivers

Artefactual designed an open-source, web-based archival description and access system called ICA-AtoM (Access To Memory) that has a broad user base around the world. ICA-AtoM does not provide digital preservation functionality as described by OAIS. ICA-AtoM users would benefit from integration with a back-end system that preserves the digital objects linked to access copies in ICA-AtoM. That back-end system should also be usable on its own or in conjunction with other access tools.

Considered options

  1. Develop a repository interface to a digital object store (e.g. Fedora or DSpace); or
  2. Develop a database-backed application to manage AIPs (e.g. Preservica); or
  3. Use a micro-services approach in which loosely coupled tools provide digital curation services built around file system storage (e.g. University of California’s Merritt System).

Decision outcome

Design principles:

  1. The application will perform a set of configurable preservation actions on ingested digital objects, using the file system as the focal point of those operations. Making the file system the focal point of micro-service operations is noteworthy as a long-term preservation strategy because it gives users the option of direct, unmediated access to archival storage. Such access might be necessary one day because the layers and generations of components in a digital preservation system are just as susceptible to technology obsolescence and incompatibility as the digital objects they are meant to preserve.
  2. The information packages ingested by the application will be moved from one micro-service to the next using the Unix pipeline pattern.
  3. Micro-service functionality will be provided by one or more of the open-source software utilities and applications bundled into the application. Where necessary, these will be supplemented by integration code written as Python scripts.
  4. The application will provide a graphical user interface so that the end user can determine and control the status of digital objects moving through the pipeline.
  5. The application will generate AIPs that are system-independent, self-documenting and self-contained. The AIPs will contain PDI that conforms to recognized standards and schemas, such as PREMIS, METS and Dublin Core.
  6. The file-based AIP is the canonical source of the preserved digital objects and metadata. The preservation application may be supported by databases and indexes that are derived and updated from this source, but these are not essential to long-term preservation of the AIP.
  7. The application will be designed to integrate with diverse external systems capable of serving as sources of digital objects to be ingested.
  8. The application will be designed to integrate with diverse storage systems and protocols for deposit and long-term preservation of AIPs.
  9. The application will be designed to integrate with diverse external systems capable of providing search, browse and display capabilities for Dissemination Information Packages (DIPs) generated by the application.
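
Principles 1, 2 and 6 can be illustrated with a minimal sketch. It is not the application's actual code: the function names, the objects/ and metadata/ directory layout, and the manifest and inventory file names are all assumptions made for illustration. Each micro-service is a plain function that works on a package directory and hands that directory to the next one, in the spirit of a Unix pipeline, and everything it produces is written into the package as ordinary files.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Assumption: a micro-service is any callable that takes a package directory
# and returns the directory the next micro-service should work on.
MicroService = Callable[[Path], Path]


def compute_checksums(package: Path) -> Path:
    """Write a SHA-256 manifest for every file under objects/ (hypothetical layout)."""
    lines = []
    for path in sorted((package / "objects").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(package).as_posix()}")
    (package / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    return package


def record_inventory(package: Path) -> Path:
    """Store a plain JSON inventory inside the package so that it documents itself."""
    files = [
        {"path": p.relative_to(package).as_posix(), "size": p.stat().st_size}
        for p in sorted((package / "objects").rglob("*"))
        if p.is_file()
    ]
    metadata_dir = package / "metadata"
    metadata_dir.mkdir(exist_ok=True)
    (metadata_dir / "inventory.json").write_text(json.dumps(files, indent=2))
    return package


def run_pipeline(package: Path, services: Iterable[MicroService]) -> Path:
    """Pass the package from one micro-service to the next, like a Unix pipe."""
    for service in services:
        package = service(package)
    return package


def demo() -> Path:
    """Build a tiny example package in a temporary directory and process it."""
    package = Path(tempfile.mkdtemp()) / "example-transfer"
    (package / "objects").mkdir(parents=True)
    (package / "objects" / "hello.txt").write_bytes(b"hello\n")
    return run_pipeline(package, [compute_checksums, record_inventory])
```

Calling demo() leaves behind a directory that can be inspected with nothing more than a file browser or a text editor. Because every result lives in the package itself, any database or index built on top of it could in principle be regenerated by walking the directory again, which is the point of treating the file-based package as the canonical source.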