Support Repeated fields in Record APIs #2394

@zeevm

A Parquet field with "Repeated" repetition and no "LIST" annotation is read as a primitive instead of as a list.

To reproduce, create a file with a top-level field whose schema is:

REPEATED BYTE_ARRAY vals (UTF8);

Then write lists of strings to it (i.e. values with repetition levels of '0' and '1').
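To make the repetition levels concrete, here is a minimal sketch of how lists of strings flatten into column values plus levels for a top-level repeated field. It uses only the Rust standard library, not the parquet crate's writer API; the function name `shred_lists` is hypothetical, and the sketch assumes a maximum definition level and repetition level of 1 for `vals`.

```rust
/// Flatten rows of string lists into (values, definition levels, repetition levels).
/// Hypothetical helper for illustration; assumes a top-level repeated field,
/// so both levels are either 0 or 1.
fn shred_lists(rows: &[Vec<&str>]) -> (Vec<String>, Vec<i16>, Vec<i16>) {
    let mut values = Vec::new();
    let mut def_levels = Vec::new();
    let mut rep_levels = Vec::new();
    for row in rows {
        if row.is_empty() {
            // An empty list stores no value: definition level 0, new row.
            def_levels.push(0);
            rep_levels.push(0);
            continue;
        }
        for (i, v) in row.iter().enumerate() {
            values.push(v.to_string());
            def_levels.push(1);
            // 0 starts a new row; 1 continues the list in the current row.
            rep_levels.push(if i == 0 { 0 } else { 1 });
        }
    }
    (values, def_levels, rep_levels)
}

fn main() {
    let rows = vec![vec!["a", "b"], vec![], vec!["c"]];
    let (values, def_levels, rep_levels) = shred_lists(&rows);
    println!("values:     {:?}", values);
    println!("def levels: {:?}", def_levels);
    println!("rep levels: {:?}", rep_levels);
}
```

A repetition level of '1' is what marks "b" as belonging to the same list as "a" rather than to a new row.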

This should be read as a list of strings, as specified in https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types, which states:

"This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a LIST- or MAP-annotated group nor annotated by LIST or MAP should be interpreted as a required list of required elements where the element type is the type of the field."

Instead, it is read as a field of single string values: the strings that make up one logical list are read as distinct rows.
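The difference between the two readings can be sketched as follows. This is an illustration using only the Rust standard library, not the parquet crate's Record API; `assemble_lists` is a hypothetical helper, and it assumes well-formed levels for a top-level repeated field (both levels are 0 or 1, and the first repetition level is 0).

```rust
/// Rebuild one list per row from flattened values and their levels.
/// Hypothetical helper for illustration only.
fn assemble_lists(values: &[&str], def_levels: &[i16], rep_levels: &[i16]) -> Vec<Vec<String>> {
    let mut rows: Vec<Vec<String>> = Vec::new();
    let mut next_value = values.iter();
    for (&def, &rep) in def_levels.iter().zip(rep_levels) {
        if rep == 0 {
            // Repetition level 0 starts a new row (a new list).
            rows.push(Vec::new());
        }
        if def == 1 {
            // Definition level 1 means a value is present for this slot.
            if let (Some(row), Some(v)) = (rows.last_mut(), next_value.next()) {
                row.push(v.to_string());
            }
        }
    }
    rows
}

fn main() {
    let values = ["a", "b", "c"];
    let def_levels = [1, 1, 0, 1];
    let rep_levels = [0, 1, 0, 0];

    // Reading described by the specification: one list per row.
    let lists = assemble_lists(&values, &def_levels, &rep_levels);
    println!("{:?}", lists); // prints [["a", "b"], [], ["c"]]

    // Simplified view of the reported behavior: the levels are ignored,
    // so every value surfaces as its own row.
    let flat: Vec<&str> = values.to_vec();
    println!("{:?}", flat); // prints ["a", "b", "c"]
}
```

In the flat view, "a" and "b" can no longer be told apart from two separate single-value rows, which is the loss of structure described above.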

The same file is read correctly by pyarrow.

Labels: enhancement, help wanted, parquet