
The Index File — mzpeak_index.json

An mzPeak archive is made up of multiple named files. To leave room for future files and avoid complicated file-name resolution, an index file identifies the contents of each file and broadly defines the kind of schema it carries. The file MUST be serialised as UTF-8.

{
  "files": [
    {
      "name": "spectra_data.parquet",
      "entity_type": "spectrum",
      "data_kind": "data arrays"
    },
    {
      "name": "spectra_metadata.parquet",
      "entity_type": "spectrum",
      "data_kind": "metadata"
    },
    {
      "name": "chromatograms_data.parquet",
      "entity_type": "chromatogram",
      "data_kind": "data arrays"
    },
    {
      "name": "chromatograms_metadata.parquet",
      "entity_type": "chromatogram",
      "data_kind": "metadata"
    }
  ],
  "metadata": {
    "version": "0.9.0",
    "cv_list": [
      {
        "id": "MS",
        "full_name": "Proteomics Standards Initiative Mass Spectrometry Ontology",
        "uri": "http://purl.obolibrary.org/obo/ms/4.1.248/ms.obo",
        "version": "4.1.248"
      },
      {
        "id": "UO",
        "full_name": "Units of measurement ontology",
        "uri": "http://purl.obolibrary.org/obo/uo/releases/2026-01-16/uo.obo",
        "version": "2026-01-16"
      }
    ],
    "file_description": { }
  }
}

The metadata object carries the archive version, the cv_list declaring every controlled vocabulary used (with source URI and version, so CURIEs resolve reproducibly), and the file-level metadata objects.
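As an illustration of how cv_list supports reproducible CURIE resolution, here is a minimal Python sketch. It assumes the metadata object has already been parsed into a dict with the layout shown above; `cv_for_curie` is a hypothetical helper name, not part of any mzPeak library, and the accession `MS:1000511` is used only as a sample input.

```python
def cv_for_curie(metadata: dict, curie: str) -> dict:
    """Return the cv_list entry whose id matches the CURIE's prefix.

    Hypothetical helper: assumes CURIEs have the form PREFIX:ACCESSION.
    """
    prefix, sep, _accession = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    for cv in metadata.get("cv_list", []):
        if cv.get("id") == prefix:
            return cv
    raise KeyError(f"no controlled vocabulary declared for prefix {prefix!r}")


# Sample metadata, copied from the example index above.
metadata = {
    "version": "0.9.0",
    "cv_list": [
        {
            "id": "MS",
            "full_name": "Proteomics Standards Initiative Mass Spectrometry Ontology",
            "uri": "http://purl.obolibrary.org/obo/ms/4.1.248/ms.obo",
            "version": "4.1.248",
        }
    ],
}

cv = cv_for_curie(metadata, "MS:1000511")
print(cv["uri"], cv["version"])
```

Because the lookup returns the declared URI and version rather than a hard-coded ontology location, a reader resolves terms against the same release the writer used.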

The index file is governed by the JSON Schema schema/mzpeak_index.json.

Each entry in files pairs a data_kind with an entity_type. Both are loose enumerations expected to grow over time; resolving files by these controlled terms is more robust than matching file names.
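Resolving by controlled terms can be sketched in Python as follows. This assumes the index has already been parsed into a dict; `find_files` is a hypothetical helper name. It returns a list because nothing here states that an (entity_type, data_kind) pair is unique, and it skips entries with unfamiliar terms since both enumerations are expected to grow.

```python
import json


def find_files(index: dict, entity_type: str, data_kind: str) -> list[str]:
    """Return the names of all files matching both controlled terms.

    Hypothetical helper: entries with unknown or missing terms are skipped,
    so newer archives with additional file kinds remain readable.
    """
    return [
        entry["name"]
        for entry in index.get("files", [])
        if entry.get("entity_type") == entity_type
        and entry.get("data_kind") == data_kind
    ]


# A trimmed copy of the example index shown earlier.
index = json.loads("""
{
  "files": [
    {"name": "spectra_data.parquet", "entity_type": "spectrum", "data_kind": "data arrays"},
    {"name": "spectra_metadata.parquet", "entity_type": "spectrum", "data_kind": "metadata"}
  ]
}
""")

print(find_files(index, "spectrum", "metadata"))  # ['spectra_metadata.parquet']
```

A reader written this way never depends on the literal file name, so a writer may rename or add files without breaking it.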

File-level metadata

File-level metadata SHOULD be stored in mzpeak_index.metadata and in the metadata Parquet files' key–value pairs, as JSON encoded according to the schemas below:

Open item — cleartext vs. encryptable metadata

Anything in mzpeak_index.json is necessarily cleartext to all readers unless ZIP encryption is used — and ZIP encryption is known to be flawed and inconsistent. Anything in a Parquet footer's key–value pairs is encryptable. The index is JSON for convenience and ease of access from scripting languages; whether some fields should move to encryptable Parquet metadata is unresolved.