Description
A Parquet field with "Repeated" repetition and no "LIST" annotation is read as a primitive instead of as a list.
To reproduce: create a file with a top-level field schema like:
REPEATED BYTE_ARRAY vals (UTF8);
and write lists of strings (i.e. with repetition levels of 0 and 1).
This should be read as a list of strings, as specified in https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types, which states:
"This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field."
Instead, the field is read as a column of single string values, where the strings that make up one logical list are read as distinct rows.
pyarrow reads the same file correctly.
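To make the expected behaviour concrete, here is a minimal sketch in plain Python of how a top-level repeated field could be reassembled from its repetition levels. It is not taken from any reader's implementation; the function name `assemble_repeated` and the sample data are hypothetical, and it assumes a maximum repetition level and definition level of 1 (a repeated primitive with no enclosing group).

```python
from typing import List, Optional, Sequence


def assemble_repeated(
    values: Sequence[str],
    rep_levels: Sequence[int],
    def_levels: Optional[Sequence[int]] = None,
) -> List[List[str]]:
    """Group column values into one list per row (hypothetical helper).

    Assumes a top-level repeated primitive field:
    - repetition level 0 starts a new row, 1 continues the current row
    - definition level 1 means a value is present, 0 means an empty list
    """
    if def_levels is None:
        # Assumption: every level entry carries a value (no empty lists).
        def_levels = [1] * len(rep_levels)
    if len(def_levels) != len(rep_levels):
        raise ValueError("rep_levels and def_levels must have the same length")

    rows: List[List[str]] = []
    value_iter = iter(values)
    for rep, definition in zip(rep_levels, def_levels):
        if rep == 0:
            rows.append([])  # a new row (a new logical list) begins
        elif not rows:
            raise ValueError("first repetition level must be 0")
        if definition == 1:
            rows[-1].append(next(value_iter))
    return rows


# Two logical rows: ["a", "b"] and ["c"].
expected = assemble_repeated(["a", "b", "c"], [0, 1, 0])

# The behaviour described in this issue corresponds to ignoring the
# repetition levels, so each value becomes its own row.
observed = ["a", "b", "c"]
```

With this sample data, `expected` has two rows while `observed` has three, which is the mismatch reported above.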