Description
Describe the bug
When a schema has a nullable field nested under a non-nullable field, the random array/batch functions in util::data_gen do not generate null values for the nested field, and in some other scenarios they fail outright.
To Reproduce
See test cases on PR #5713
Expected behavior
When I have a nested nullable field and provide a null_density > 0.0, I expect some null values to be present in the generated array and generation not to error out.
Additional context
A nullable primitive nested under a not-null list would not fail, but null_density = 0 would be passed into create_random_array for the primitive, resulting in no nulls.
A nullable struct with nullable fields nested under a not-null list would fail during list creation: the struct infers nullability from the null_count of its child arrays, which never contain nulls because null_density was zeroed out. The inferred struct datatype therefore differs from the declared one, and list creation fails.
Also, when generating struct arrays, null_density is not used even if the struct field is nullable, because struct array generation uses TryFrom<Vec<(&str, ArrayRef)>>, which does not allow you to mark any of the struct array entries as null.
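The propagation problem described above can be illustrated with a small standalone model. This is only a sketch using the standard library, not the arrow-rs code: `Field`, `Kind`, `densities`, and `walk` are hypothetical names invented for illustration, and the "inherit" flag stands in for the reported behaviour where a child receives the density its non-nullable parent was generated with (zero) instead of the density the caller requested.

```rust
// Toy model of null_density propagation. Every type and function here is
// hypothetical and exists only to illustrate the issue; none of it is arrow-rs API.
#[derive(Debug)]
enum Kind {
    Primitive,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug)]
struct Field {
    name: &'static str,
    nullable: bool,
    kind: Kind,
}

/// Returns the null density each field would be generated with, in
/// depth-first order. With `inherit = true` a child sees the density its
/// parent actually used (the reported behaviour); with `inherit = false`
/// every field sees the density the caller originally requested.
fn densities(field: &Field, requested: f32, inherit: bool) -> Vec<(&'static str, f32)> {
    let mut out = Vec::new();
    walk(field, requested, requested, inherit, &mut out);
    out
}

fn walk(
    field: &Field,
    requested: f32,
    incoming: f32,
    inherit: bool,
    out: &mut Vec<(&'static str, f32)>,
) {
    let base = if inherit { incoming } else { requested };
    // A non-nullable field must never contain nulls, so its own density is zero.
    let own = if field.nullable { base } else { 0.0 };
    out.push((field.name, own));
    match &field.kind {
        Kind::Primitive => {}
        Kind::List(child) => walk(child, requested, own, inherit, out),
        Kind::Struct(children) => {
            for child in children {
                walk(child, requested, own, inherit, out);
            }
        }
    }
}

fn main() {
    // A not-null list whose items are a nullable primitive.
    let schema = Field {
        name: "list",
        nullable: false,
        kind: Kind::List(Box::new(Field {
            name: "item",
            nullable: true,
            kind: Kind::Primitive,
        })),
    };
    println!("inherited: {:?}", densities(&schema, 0.5, true));
    println!("requested: {:?}", densities(&schema, 0.5, false));
}
```

In this model the nullable `item` field ends up with density 0 when the zeroed parent density is inherited, and with the requested density when each field is evaluated against the caller's original value. Whether the actual fix takes this shape is not stated here; see the PR referenced above for the real change.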