9. Point Layout¶

The point layout stores data arrays as-is in parallel columns alongside a repeated index column. The top-level schema node is named point and is a group with an arbitrary number of columns. The entity index column MUST be the first column under point.

Point-layout schema: a single top-level point group holding a repeated spectrum_index column alongside parallel mz and intensity columns, with one table row per data point.

point
spectrum_index	mz	intensity
1	213.2	1002
1	506.9	500
1	758	405
…	…	…
2	329.1	50
2	516.5	5002
2	783.8	302

This layout is simple, but carries several advantages:

Predicate filtering. Scalar columns are easily filtered along the page-level range index, which makes multi-dimensional queries easy to write and optimise.
Transparent compression. Arrays are encoded and compressed by Parquet, so the data is still stored compactly.

The trade-off: data MUST be stored as-is to keep the page index meaningful, so no additional obscuring transformations (delta encoding, Numpress, etc.) may be used. The zero-run stripping and null-marking methods remain available, because they only remove non-meaningful points from the array rather than transforming the values that remain.