archivematica-architectural-decisions

New API endpoint for CSV validation

Context and problem statement

Archivematica users rely on spreadsheets created in a specific way to perform tasks within or after Archivematica. Documentations and examples can be found here but there is no clear validation as can be performed by a machine. As one example, the metadata.csv and rights.csv files are “special” and are utilized by Archivematica to add metadata or rights metadata into the AIP’s METS file. Another example is the Avalon Media System having a specific Manifest.csv file that is used to recreate hierarchical information and additional metadata, which is used after a DIP is created from a stored AIP. It would be beneficial if this manifest could be validated prior to going through the preservation process. Both of these examples would benefit from a validation service that a user (or automated system) could access prior to ingest into Archivematica.

Decision drivers

Considered options

  1. API endpoint for pre-ingest CSV validation
  2. Create validation microservice
  3. Using Automation-tools to validate specific CSVs
  4. Upload UI for validating CSV files

Decision outcome

Chosen option: “1. API endpoint for pre-ingest CSV validation”, because it is flexible, it lays the groundwork for future work around CSV validation as a step to be taken by Archivematica. It doesn’t have the complications of the long-term maintenance and testing of a GUI component. It can be more easily automated. The solution allows for custom or institutionally-specific CSV to be used/added.

Technically, this would live in the Archivematica codebase and be a new endpoint in the Archivematica API

Proposed endpoint below:

URL: /api/transfer/validate_csv
Verb: POST
Validates local CSV with validator service Python script
Parameters: CSV
    input: Path to the CSV
    validator: Name of service CSV should be checked against, i.e. "avalon"
                or "rights"
Response: JSON
    message: Approval or non-approval, depending on service output

Positive consequences

Negative consequences

Pros and cons of the options

Options are discussed with pros/cons outlined in the linked issue and pull request. Some are implicit in the above decision outcome and positive/negative consequences sections.