Investigate usages of ArrayData::new (WIP)#813
Investigate usages of ArrayData::new (WIP)#813jhorstmann wants to merge 1 commit intoapache:masterfrom
Conversation
|
@alamb I also started looking into all the |
|
This is a very cool idea @jhorstmann -- thank you. Do you have any thoughts on when it would be most appropriate to apply the two levels of "validate"?
Specifically, I am wondering if you think |
|
Validating primitive or boolean arrays is probably cheap, so most kernels (arithmetic or comparisons for example) could always use the safe version. The main usecase for the unsafe variant is probably the Interestingly |
Amusingly (?) the cast kernels is the one place some of these checks have found a bug already by applying the validity checks, resulting #815 :) |
|
Changes here were merged via #822 |
Which issue does this PR close?
Related/complementary to the validation in #810, this PR investigates our usages of
ArrayData::newand whether they are actually safe. Creating primitive or boolean arrays can be done safely with minimal validation. For other data types this introduces an unsafenew_uncheckedmethod for usages in performane critical kernels.Todo:
Safetyannotations to all usages ofnew_uncheckedor decide whether to use a safe and validating alternativenew_uncheckedArrayDataBuilder::buildneeds to be either unsafe or do validationnull_countthat differs from the actualnull_bitmapviolates any invariants"Closes" #806, #704, #705, #706 and possibly #777 since it would be no longer possible to create invalid ArrayData without unsafe code. We shouldn't close those issues without having a safe alternative.
Rationale for this change
The above mentioned security issues all start by creating
ArrayDataobjects that violate the invariants of those arrays. Marking this creation unsafe thus satisfies the rust guidelines.What changes are included in this PR?
Are there any user-facing changes?
The
ArrayData::newmethod is removed in this PR but should be reintroduced with full validation of all parameters.