Point Layout¶
The point layout stores data arrays as-is in parallel columns alongside a
repeated index column. The top-level schema node is named point and is a group
with an arbitrary number of columns. The entity index column MUST be the
first column under point.

| point | ||
|---|---|---|
| spectrum_index | mz | intensity |
| 1 | 213.2 | 1002 |
| 1 | 506.9 | 500 |
| 1 | 758 | 405 |
| … | … | … |
| 2 | 329.1 | 50 |
| 2 | 516.5 | 5002 |
| 2 | 783.8 | 302 |
This layout is simple, but carries several advantages:
- Predicate filtering. Scalar columns are easily filtered along the page-level range index, which makes multi-dimensional queries easy to write and optimise.
- Transparent compression. Arrays are encoded and compressed by Parquet, so the data is still stored compactly.
The trade-off: data MUST be stored as-is to keep the page index meaningful, so no additional obscuring transformations (delta encoding, Numpress, etc.) may be used. The zero-run stripping and null-marking methods remain available, because they only remove non-meaningful points from the array rather than transforming the values that remain.