The Index File — mzpeak_index.json
An mzPeak archive is made up of multiple named files. To leave room for future files and avoid complicated file-name resolution, an index file identifies the contents of each file and broadly defines the kind of schema it carries. The file MUST be serialised as UTF-8.
{
  "files": [
    {
      "name": "spectra_data.parquet",
      "entity_type": "spectrum",
      "data_kind": "data arrays"
    },
    {
      "name": "spectra_metadata.parquet",
      "entity_type": "spectrum",
      "data_kind": "metadata"
    },
    {
      "name": "chromatograms_data.parquet",
      "entity_type": "chromatogram",
      "data_kind": "data arrays"
    },
    {
      "name": "chromatograms_metadata.parquet",
      "entity_type": "chromatogram",
      "data_kind": "metadata"
    }
  ],
  "metadata": {
    "version": "0.9.0",
    "cv_list": [
      {
        "id": "MS",
        "full_name": "Proteomics Standards Initiative Mass Spectrometry Ontology",
        "uri": "http://purl.obolibrary.org/obo/ms/4.1.248/ms.obo",
        "version": "4.1.248"
      },
      {
        "id": "UO",
        "full_name": "Units of measurement ontology",
        "uri": "http://purl.obolibrary.org/obo/uo/releases/2026-01-16/uo.obo",
        "version": "2026-01-16"
      }
    ],
    "file_description": { }
  }
}
The metadata object carries the archive version, the cv_list declaring every
controlled vocabulary used (with source URI and version, so CURIEs resolve
reproducibly), and the file-level metadata objects.
The index file is governed by the JSON Schema schema/mzpeak_index.json.
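As an illustration of how cv_list supports reproducible CURIE resolution, the following minimal Python sketch looks up the vocabulary entry for a CURIE by its prefix. The function name cv_for_curie is hypothetical and not part of the specification, and the sketch assumes a CURIE has the form PREFIX:ACCESSION with the prefix matching a cv_list id.

```python
# Minimal sketch: map a CURIE such as "MS:1000511" to the cv_list entry that
# declares its vocabulary. `cv_for_curie` is a hypothetical helper name.

cv_list = [
    {
        "id": "MS",
        "full_name": "Proteomics Standards Initiative Mass Spectrometry Ontology",
        "uri": "http://purl.obolibrary.org/obo/ms/4.1.248/ms.obo",
        "version": "4.1.248",
    },
    {
        "id": "UO",
        "full_name": "Units of measurement ontology",
        "uri": "http://purl.obolibrary.org/obo/uo/releases/2026-01-16/uo.obo",
        "version": "2026-01-16",
    },
]


def cv_for_curie(cv_list: list, curie: str) -> dict:
    """Return the cv_list entry whose id equals the CURIE prefix."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not accession:
        raise ValueError(f"not a CURIE: {curie!r}")
    by_id = {cv["id"]: cv for cv in cv_list}
    if prefix not in by_id:
        raise KeyError(f"vocabulary {prefix!r} is not declared in cv_list")
    return by_id[prefix]


cv = cv_for_curie(cv_list, "MS:1000511")
print(cv["uri"], cv["version"])
```

Because each entry pins both a URI and a version, a reader can fetch exactly the ontology release the writer used instead of whatever release is current.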
Each entry in files pairs a data_kind with an entity_type. Both are loose
enumerations expected to grow over time; resolving files by these controlled
terms is more robust than matching file names.
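Resolving files by controlled terms can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: find_files is a hypothetical helper name, and the index text is an abbreviated copy of the example above. It returns a list because this section does not state whether an (entity_type, data_kind) pair is unique within an archive.

```python
import json

# Abbreviated copy of the example index; a real reader would take this text
# from mzpeak_index.json inside the archive, decoded as UTF-8.
INDEX_TEXT = """
{
  "files": [
    {"name": "spectra_data.parquet", "entity_type": "spectrum", "data_kind": "data arrays"},
    {"name": "spectra_metadata.parquet", "entity_type": "spectrum", "data_kind": "metadata"},
    {"name": "chromatograms_data.parquet", "entity_type": "chromatogram", "data_kind": "data arrays"},
    {"name": "chromatograms_metadata.parquet", "entity_type": "chromatogram", "data_kind": "metadata"}
  ],
  "metadata": {"version": "0.9.0"}
}
"""


def find_files(index: dict, entity_type: str, data_kind: str) -> list:
    """Return the names of all files matching both controlled terms."""
    return [
        entry["name"]
        for entry in index["files"]
        if entry["entity_type"] == entity_type and entry["data_kind"] == data_kind
    ]


index = json.loads(INDEX_TEXT)
print(find_files(index, "spectrum", "metadata"))
```

An unknown term simply yields an empty list, so a reader written this way tolerates archives that carry entity types or data kinds added in later versions.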
File-level metadata
File-level metadata SHOULD be stored in mzpeak_index.metadata and in the
metadata Parquet files' key–value pairs, as JSON encoded according to the schemas
below:
cv_list
file_description
instrument_configuration_list
data_processing_method_list
software_list
sample_list
scan_settings_list
run
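The sketch below shows one way a reader might decode file-level metadata from a Parquet file's key–value pairs. It rests on two assumptions that this section does not confirm: that each key is named exactly like the schema it carries, and that each value is UTF-8 encoded JSON. The helper name decode_file_metadata is hypothetical. The input is a bytes-to-bytes mapping, which is the shape returned by, for example, pyarrow.parquet.read_metadata(path).metadata; the sample mapping here is made up for illustration.

```python
import json
from typing import Mapping

# Assumption: Parquet key-value keys match these schema names exactly.
FILE_LEVEL_KEYS = (
    "cv_list",
    "file_description",
    "instrument_configuration_list",
    "data_processing_method_list",
    "software_list",
    "sample_list",
    "scan_settings_list",
    "run",
)


def decode_file_metadata(kv: Mapping[bytes, bytes]) -> dict:
    """Decode the file-level metadata entries present in a key-value mapping."""
    decoded = {}
    for key in FILE_LEVEL_KEYS:
        raw = kv.get(key.encode("utf-8"))
        if raw is not None:
            # Assumption: values are JSON documents serialised as UTF-8.
            decoded[key] = json.loads(raw.decode("utf-8"))
    return decoded


# Made-up sample standing in for a Parquet footer's key-value pairs.
sample_kv = {
    b"software_list": b'[{"id": "sw1"}]',
    b"file_description": b"{}",
    b"unrelated_key": b"ignored",
}
print(decode_file_metadata(sample_kv))
```

Keys outside the list are skipped rather than rejected, so unrelated entries written by Parquet tooling do not interfere.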
Open item — cleartext vs. encryptable metadata
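Anything in mzpeak_index.json is necessarily cleartext to all readers
unless ZIP encryption is used, and ZIP encryption is unreliable: the traditional
ZipCrypto cipher is vulnerable to known-plaintext attacks, and AES-based
encryption is not supported consistently across ZIP tools. Anything in a
Parquet footer's key–value pairs is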
encryptable. The index is JSON for convenience and ease of access from
scripting languages; whether some fields should move to encryptable Parquet
metadata is unresolved.