Point Layout

The point layout stores the values of each data array unchanged, one value per row, in parallel columns alongside an index column that repeats the owning entity's index on every row. The top-level schema node is named point and is a group with an arbitrary number of columns. The entity index column MUST be the first column under point.

Point-layout schema: a single top-level point group holding a repeated spectrum_index column alongside parallel mz and intensity columns, with one table row per data point.

point
spectrum_index  mz     intensity
1               213.2  1002
1               506.9  500
1               758    405
2               329.1  50
2               516.5  5002
2               783.8  302
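
As a rough illustration of how per-spectrum arrays map onto this layout, the sketch below flattens a few spectra into parallel columns in plain Python. The function name to_point_layout, the input shape, and the sample values are all hypothetical and chosen for illustration only; a real writer would pass the resulting columns to a Parquet library rather than keep them in a dictionary.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input shape: (spectrum_index, mz_array, intensity_array)
Spectrum = Tuple[int, Sequence[float], Sequence[float]]


def to_point_layout(spectra: Sequence[Spectrum]) -> Dict[str, List]:
    """Flatten per-spectrum arrays into one row per data point."""
    # The index column is created first so it stays the first column.
    columns: Dict[str, List] = {"spectrum_index": [], "mz": [], "intensity": []}
    for spectrum_index, mz_array, intensity_array in spectra:
        if len(mz_array) != len(intensity_array):
            raise ValueError("mz and intensity arrays must have equal length")
        for mz, intensity in zip(mz_array, intensity_array):
            # The index is repeated on every row; values are stored as-is.
            columns["spectrum_index"].append(spectrum_index)
            columns["mz"].append(mz)
            columns["intensity"].append(intensity)
    return columns


# Made-up sample data, not taken from any real file.
example = to_point_layout([
    (1, [100.0, 200.0], [10.0, 20.0]),
    (2, [150.0], [5.0]),
])
```

Each key of the returned dictionary corresponds to one column under point, and all columns have the same length because every row describes exactly one data point.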

This layout is simple, but carries several advantages:

  • Predicate filtering. Because every column holds scalar values, a reader can use the page-level range index to skip pages whose value range cannot satisfy a predicate, which makes multi-dimensional queries easy to write and optimise.
  • Transparent compression. Arrays are encoded and compressed by Parquet, so the data is still stored compactly.
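
The predicate-filtering advantage can be sketched with a toy model of a page-level range index. This is not Parquet's API: build_pages and range_query are hypothetical helpers, and the fixed page size is an assumption made only to keep the example small. The idea shown is that each page records the minimum and maximum of its values, so a range query only scans pages whose range overlaps the query.

```python
from typing import List, Tuple

# A toy page: (minimum value, maximum value, the values themselves)
Page = Tuple[float, float, List[float]]


def build_pages(values: List[float], page_size: int) -> List[Page]:
    """Split a column into fixed-size pages and record each page's range."""
    pages: List[Page] = []
    for start in range(0, len(values), page_size):
        chunk = values[start:start + page_size]
        pages.append((min(chunk), max(chunk), chunk))
    return pages


def range_query(pages: List[Page], low: float, high: float) -> Tuple[List[float], int]:
    """Return matching values and the number of pages actually scanned."""
    hits: List[float] = []
    scanned = 0
    for page_min, page_max, chunk in pages:
        # Skip pages whose recorded range cannot overlap [low, high].
        if page_max < low or page_min > high:
            continue
        scanned += 1
        hits.extend(v for v in chunk if low <= v <= high)
    return hits, scanned


# Made-up m/z values for illustration.
mz_pages = build_pages([100.0, 150.0, 200.0, 250.0, 300.0, 350.0], page_size=2)
hits, scanned = range_query(mz_pages, 190.0, 260.0)
```

With these sample values only the middle page overlaps the query range, so one of the three pages is scanned. The pruning works only because the recorded ranges describe the real stored values.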

The trade-off: data MUST be stored as-is to keep the page index meaningful, so no additional obscuring transformations (delta encoding, Numpress, etc.) may be used. The zero-run stripping and null-marking methods remain available, because they only remove non-meaningful points from the array rather than transforming the values that remain.
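
A small sketch may help show why this restriction exists. Below, delta_encode is a generic delta encoding, and strip_zero_points is a deliberately simplified stand-in for zero-run stripping that drops every zero-intensity point; the format's actual stripping rule may differ. Both function names and the sample values are hypothetical.

```python
from typing import List, Tuple


def delta_encode(values: List[float]) -> List[float]:
    """Store the first value, then each successive difference."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def strip_zero_points(
    mz: List[float], intensity: List[float]
) -> Tuple[List[float], List[float]]:
    """Simplified assumption: drop points whose intensity is zero."""
    kept = [(m, i) for m, i in zip(mz, intensity) if i != 0.0]
    return [m for m, _ in kept], [i for _, i in kept]


def value_range(values: List[float]) -> Tuple[float, float]:
    """The (min, max) pair a range index would record for these values."""
    return (min(values), max(values))


# Made-up sample data for illustration.
mz = [500.0, 500.5, 501.0, 501.5]
intensity = [0.0, 10.0, 0.0, 7.0]

# After delta encoding, the stored values mix one absolute m/z with small
# differences, so their range no longer bounds the true m/z values.
encoded_range = value_range(delta_encode(mz))

# After stripping, every remaining value is an original m/z value, so the
# recorded range still describes real data.
stripped_mz, stripped_intensity = strip_zero_points(mz, intensity)
stripped_range = value_range(stripped_mz)
```

Here the delta-encoded column would record a maximum of 500.0, so a query for m/z values of 501.0 and above would wrongly skip the page even though it holds matching points. The stripped column records a range made of untouched original values, which is why removal-only methods stay compatible with the index.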