What is mzPeak?

mzPeak is a next-generation open file format for mass spectrometry data — designed as the successor to mzML. It keeps everything mzML can describe, but stores it in a layout built for today's data volumes and cloud workflows.

The idea in one line

An mzPeak file is a ZIP archive of Apache Parquet tables plus a small JSON index — columnar, compressed, randomly addressable, and self-describing.
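As a rough illustration of what that layout means for a reader, the sketch below opens a ZIP container with Python's standard library, lists its members, and parses the JSON index. The index name `mzpeak_index.json` is the one given in the member table on this page. Everything else is an assumption made for the demo: `read_index` is a hypothetical helper, `spectra_data.parquet` is a made-up member name, and the index contents are a placeholder, not the real mzPeak schema.

```python
import io
import json
import zipfile


def read_index(archive, index_name="mzpeak_index.json"):
    """Return (member names, parsed JSON index) for a ZIP-based archive."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        with zf.open(index_name) as fh:
            index = json.load(fh)
    return names, index


# Build a toy archive in memory so the sketch runs without a real file.
# The index contents are placeholders, not the mzPeak schema.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mzpeak_index.json", json.dumps({"files": ["spectra_data.parquet"]}))
    zf.writestr("spectra_data.parquet", b"PAR1")  # stand-in bytes, not real Parquet

names, index = read_index(buf)
print(names)
print(index)
```

With a real file, `read_index` can be given a path instead of the in-memory buffer. Reading the Parquet members themselves would need a Parquet library, which is left out here.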

An mzPeak archive: Parquet data columns, metadata, and an index, inside one ZIP container.

Why a new format?

  • Size. Spectra are columnar and compressed, so a file is typically a fraction of the equivalent mzML — without losing a single data point.
  • Speed & cloud-native access. Parquet's column + row-group layout means a reader can fetch just the bytes it needs over an HTTP range request. Open one spectrum from a multi-gigabyte run in a browser, with no download.
  • Interoperable by design. The structure is language-independent (ZIP + Parquet are everywhere), and the semantics are anchored in the PSI-MS controlled vocabulary — the same vocabulary the community already uses. A versioned validator checks conformance.
  • Extensible without forking. New kinds of data attach through documented entity-type and data-kind mechanisms, so the format grows by extension rather than by incompatible variants.
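The range-request access mentioned in the list above can be sketched with the standard library alone. The example builds an HTTP `Range` header (byte ranges are inclusive at both ends) and decodes the 8-byte trailer that ends every Parquet file: a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The function names and any URL passed to `fetch_range` are hypothetical. The sketch also assumes the byte offsets of a Parquet member inside the ZIP are already known; a real reader would first have to locate them through the ZIP central directory, which is not shown.

```python
import struct
import urllib.request


def range_header(start, length):
    """HTTP Range header for `length` bytes starting at `start` (end is inclusive)."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return {"Range": f"bytes={start}-{start + length - 1}"}


def fetch_range(url, start, length):
    """Fetch one byte range; the server must honour range requests."""
    req = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def parquet_footer_length(tail):
    """Given the last 8 bytes of a Parquet file, return the footer metadata length."""
    if len(tail) != 8 or tail[4:] != b"PAR1":
        raise ValueError("not a Parquet trailer")
    return struct.unpack("<I", tail[:4])[0]


print(range_header(1024, 8))  # {'Range': 'bytes=1024-1031'}
print(parquet_footer_length(struct.pack("<I", 345) + b"PAR1"))  # 345
```

A reader following this pattern would fetch the trailer, then the footer it points to, and only then the column chunks it actually needs, rather than downloading the whole run.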

Smaller, on real data

Across three families of public datasets, mzPeak comes out at roughly 35–50% of the size of the original vendor raw file, and a fraction of the equivalent mzML. Browse the example data, or see the per-instrument numbers on the home page.

  • General MS data: mzPeak about 50% of the vendor raw size
  • Imaging MS (MSI): mzPeak about 35% of the vendor raw size
  • Study-design embedding: mzPeak about 45% of the vendor raw size

What's inside an archive

Member | What it holds
*_data.parquet | the signal: m/z + intensity (and ion-mobility, etc.) as sorted columns
metadata | instrument, software, samples, run description, controlled-vocabulary declarations
mzpeak_index.json | the manifest: what members exist and how to find them
Other members | optional embedded artifacts: optical images, SDRF/ISA sample metadata, provenance
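To make the table above concrete, here is a small sketch that sorts archive member names into those four roles. It is only an illustration: `classify_member` is a hypothetical helper, the matching rules are guessed from the names in the table (the exact naming of the metadata members is an assumption), and the example member names are invented.

```python
def classify_member(name):
    """Map an archive member name to one of the roles listed in the table."""
    base = name.rsplit("/", 1)[-1]  # ignore any directory prefix
    if base == "mzpeak_index.json":
        return "index"
    if base.endswith("_data.parquet"):
        return "signal"
    if "metadata" in base:  # assumed naming; the real rule may differ
        return "metadata"
    return "other"


# Invented member names, for illustration only.
members = [
    "mzpeak_index.json",
    "spectra_data.parquet",
    "spectra_metadata.parquet",
    "optical_image.png",
]
roles = {name: classify_member(name) for name in members}
for name, role in roles.items():
    print(f"{role:9s} {name}")
```

In practice the manifest in `mzpeak_index.json` is described as the place to learn which members exist, so a real reader would consult it rather than rely on file-name patterns.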

Governance

mzPeak is developed as an open community effort under HUPO-PSI (the Human Proteome Organization's Proteomics Standards Initiative). The canonical specification and the reference implementation are public; see Tools.

An open HUPO-PSI community format.