
Nested nullable fields do not get treated as nullable in data_gen #5712

@alexwilcoxson-rel

Describe the bug
When a schema has a nullable field nested under a non-nullable field, the random array and batch functions in util::data_gen do not generate null values for the nested field, and in some cases they fail outright (see Additional context).

To Reproduce
See test cases on PR #5713

Expected behavior
When I have a nested nullable field and provide a null_density > 0.0, I expect some null values to be present in the generated array, and I expect generation not to error out.

Additional context
A nullable primitive nested under a not-null list does not fail, but null_density = 0 is passed into create_random_array for the primitive, so no nulls are generated.

A nullable struct with nullable fields nested under a not-null list fails during list creation. The struct infers the nullability of its fields from the null_count of its child arrays, which never contain nulls because null_density was zeroed out. That changes the struct's datatype so it no longer matches what the list expects, and list creation fails.
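
Both cases above seem to come down to the same decision: the null density handed to a child is derived from the parent's nullability instead of the child's own. Here is a minimal std-only sketch of that decision. `FieldSpec` and the two `child_density_*` helpers are hypothetical names made up for illustration; they are not arrow types and not the actual code in util::data_gen.

```rust
/// Hypothetical stand-in for an Arrow field: a name and a nullability flag.
#[derive(Debug, Clone)]
struct FieldSpec {
    name: &'static str,
    nullable: bool,
}

/// Behaviour as described in this issue (an assumption about the cause):
/// the child's density is zeroed whenever the parent is not nullable.
fn child_density_from_parent(parent: &FieldSpec, _child: &FieldSpec, null_density: f32) -> f32 {
    if parent.nullable { null_density } else { 0.0 }
}

/// Expected behaviour: each field decides from its own nullability.
fn child_density_from_child(_parent: &FieldSpec, child: &FieldSpec, null_density: f32) -> f32 {
    if child.nullable { null_density } else { 0.0 }
}

fn main() {
    // A nullable item nested under a not-null list, as in the first case above.
    let list = FieldSpec { name: "list", nullable: false };
    let item = FieldSpec { name: "item", nullable: true };

    let from_parent = child_density_from_parent(&list, &item, 0.2);
    let from_child = child_density_from_child(&list, &item, 0.2);
    println!(
        "{} under {}: from parent = {from_parent}, from child = {from_child}",
        item.name, list.name
    );
}
```

With the parent-based rule the nullable item never sees a density above zero, which would also explain why a struct that infers nullability from null_count ends up with a different datatype.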

Also, when generating struct arrays for a nullable struct field, null_density is not used: the struct array is built with TryFrom<Vec<(&str, ArrayRef)>>, which does not allow any of the struct array entries to be marked null.
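
To illustrate the struct case, here is a rough std-only model of a struct column that carries its own validity mask, separate from the nulls in its children. `StructColumn` and `struct_validity` are hypothetical names, and the mask is a deterministic, evenly spaced stand-in for random sampling so the result is easy to check. In arrow itself I would expect the equivalent to be a struct array built with an explicit null buffer rather than via TryFrom, but that is an assumption and not something this sketch verifies.

```rust
/// Hypothetical, simplified struct column: child values plus optional
/// per-row validity. `None` means every struct entry is valid.
struct StructColumn {
    children: Vec<Vec<Option<i32>>>,
    validity: Option<Vec<bool>>,
}

impl StructColumn {
    /// Number of struct entries that are null (independent of child nulls).
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |v| v.iter().filter(|ok| !**ok).count())
    }
}

/// Builds a validity mask for a struct field. Returns `None` when the field
/// is not nullable or the density is zero. Otherwise marks a row null each
/// time the running total `i * null_density` crosses an integer boundary,
/// which yields floor(len * null_density) evenly spaced nulls.
fn struct_validity(len: usize, nullable: bool, null_density: f64) -> Option<Vec<bool>> {
    if !nullable || null_density <= 0.0 {
        return None;
    }
    let mask: Vec<bool> = (0..len)
        .map(|i| {
            let before = (i as f64 * null_density).floor();
            let after = ((i + 1) as f64 * null_density).floor();
            after == before // row stays valid unless a boundary was crossed
        })
        .collect();
    Some(mask)
}

fn main() {
    let child = vec![Some(1), None, Some(3), Some(4)];

    // No mask: struct entries cannot be null, whatever null_density says.
    let without_mask = StructColumn { children: vec![child.clone()], validity: None };

    // Explicit mask: null_density applies to the struct entries themselves.
    let with_mask = StructColumn {
        children: vec![child],
        validity: struct_validity(4, true, 0.5),
    };

    println!(
        "children: {}, null structs without mask: {}, with mask: {}",
        with_mask.children.len(),
        without_mask.null_count(),
        with_mask.null_count()
    );
}
```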

Labels: arrow (changes to the arrow crate), enhancement (any new improvement worthy of an entry in the changelog)
