Auxiliary Data Arrays

When an array is present in an entry but has no corresponding column in the schema, it must be stored as an auxiliary array. This typically happens when a single collection mixes different kinds of detectors, and especially with diagnostic traces, where each array may have a different length along a shared time axis or be sub-sampled.

Auxiliary data arrays have a schema similar to mzML's binaryDataArray, encoded in Parquet. They are governed by the JSON Schema schema/auxiliary_array.json.

optional group auxiliary_arrays (List) {
  repeated group list {
    optional group item {
      optional group data (List) {
        repeated group list {
          required int32 item (Int(bitWidth=8, isSigned=false));
        }
      }
      optional group name {
        optional group value {
          optional int64   integer;
          optional double  float;
          optional binary  string (String);
          optional boolean boolean;
        }
        optional binary accession (String);
        optional binary name (String);
        optional binary unit (String);
      }
      optional binary data_type (String);
      optional binary compression (String);
      optional binary unit (String);
      optional group parameters (List) {
        repeated group list {
          optional group item {
            optional group value {
              optional int64   integer;
              optional double  float;
              optional binary  string (String);
              optional boolean boolean;
            }
            optional binary accession (String);
            optional binary name (String);
            optional binary unit (String);
          }
        }
      }
      optional binary data_processing_ref (String);
    }
  }
}
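
As a rough illustration of how a reader might turn one item of auxiliary_arrays back into numbers, the sketch below decodes a single entry that has already been loaded as a Python dict with the data, data_type and compression fields from the schema above. It rests on several assumptions that the schema itself does not state: that data_type and compression hold PSI-MS controlled vocabulary CURIEs (the same terms mzML uses), that values are little-endian as in mzML, and that only the uncompressed and zlib cases need handling. The function name decode_auxiliary_array and both lookup tables are hypothetical.

```python
import struct
import zlib

# ASSUMPTION: data_type is stored as a PSI-MS CV CURIE. The accessions are
# the standard mzML binary data type terms; the struct codes are ours.
DTYPE_FORMATS = {
    "MS:1000519": "i",  # 32-bit integer
    "MS:1000521": "f",  # 32-bit float
    "MS:1000522": "q",  # 64-bit integer
    "MS:1000523": "d",  # 64-bit float
}

# ASSUMPTION: compression is also a PSI-MS CV CURIE; only two cases covered.
DECOMPRESSORS = {
    "MS:1000576": lambda buf: buf,  # no compression
    "MS:1000574": zlib.decompress,  # zlib compression
}


def decode_auxiliary_array(entry):
    """Decode one auxiliary array entry (a dict) into a list of numbers."""
    # `data` is a list of unsigned 8-bit integers in the schema; bytes()
    # accepts such a list, and also accepts an existing bytes object.
    raw = bytes(entry["data"])
    raw = DECOMPRESSORS[entry["compression"]](raw)

    fmt = DTYPE_FORMATS[entry["data_type"]]
    # "<" selects little-endian with standard sizes (ASSUMPTION, as in mzML).
    width = struct.calcsize("<" + fmt)
    if len(raw) % width != 0:
        raise ValueError("buffer length is not a multiple of the item size")

    count = len(raw) // width
    return list(struct.unpack(f"<{count}{fmt}", raw))
```

Note that the whole buffer is decompressed and unpacked in one step; there is no way to read only part of the array.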

Auxiliary arrays cannot be sliced

Because an auxiliary array is stored as an opaque encoded buffer rather than a first-class column, it cannot be searched or sliced without decoding the whole array, exactly as in mzML. The associated metadata row records how many auxiliary arrays the entry has in number_of_auxiliary_arrays, so a reader can cheaply decide whether the more expensive decoding step is worthwhile before attempting it.
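
The check described above might look like the following minimal sketch. It assumes the metadata rows have already been read into a list of dicts, one per entry, each carrying the number_of_auxiliary_arrays field; the function name rows_with_auxiliary_arrays is hypothetical, and treating a missing or null count as zero is an assumption made here, not something the format is stated to guarantee.

```python
def rows_with_auxiliary_arrays(metadata_rows):
    """Return the positions of metadata rows reporting at least one auxiliary array."""
    selected = []
    for position, row in enumerate(metadata_rows):
        # ASSUMPTION: an absent or null count means "no auxiliary arrays".
        count = row.get("number_of_auxiliary_arrays") or 0
        if count > 0:
            selected.append(position)
    return selected
```

Only the entries at the returned positions would then need their auxiliary_arrays column read and decoded; all others can be skipped without touching the encoded buffers.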