Validate arguments to ArrayData::try_new()

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
This ticket lists a high level plan to address one of the main sources of security issues in arrow-rs
such as https://github.com/apache/arrow-rs/issues/772 and likely several others on https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity

As demonstrated in https://github.com/jorgecarleitao/arrow2#why, and almost all of the examples in https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity, creating `ArrayData::new` with invalid arguments can lead to undefined behavior.

See also the discussion with @jhorstmann  and others on https://lists.apache.org/thread.html/r3f12f3352ca36264622d4103fcb6c7c71544dcaf0f0a7e842f00c3a0%40%3Cdev.arrow.apache.org%3E

**Describe the solution you'd like**
I propose to follow the C++ implementation (kudos to @pitrou) in https://github.com/apache/arrow/blob/b73af9a1607caa4a04e1a11896aed6669847a4d4/cpp/src/arrow/array/validate.cc#L388-L392

Add two new functions:
1. `ArrayData::validate()` -- checks offsets / buffer sizes, relatively inexpensive
2. `ArrayData::validate_full` --  which calls`validate()` **AND** checks all variable length data structures for consistency (e.g. ensures the offsets of a StringArray are within the size of the base array

Then, change `ArrayData` to have two constructors:
1. `unsafe ArrayData::new_unchecked()` - Behaves like `ArrayData::new()` does today -- namely has no validation 
2. `ArrayData::try_new()` will be safe in the Rust sense -- can not cause undefined behavior and thus will call `ArrayData::validate_full`

This design will follow the Rust philosophy of "safe by default" but offer an alternative (`unsafe`) mechanism to bypass checking for known good inputs. This `unsafe` mechanism has been prototyped by @jhorstmann in  https://github.com/apache/arrow-rs/pull/813

**Describe alternatives you've considered**
Could wait for `arrow2` convergence, if that happens, but since the timeline on that ETA is still unknown, safety for the `arrow-rs` implementation seems to justify spending time here

** Progress **
- [x] Change ArrayData::API - https://github.com/apache/arrow-rs/pull/822
- [x] Add ArrayData::validate() for cheap bounds checks - https://github.com/apache/arrow-rs/pull/810
- [x] Add ArrayData::validate_full() for content checks (valid UTF8, valid offsets, etc) - https://github.com/apache/arrow-rs/pull/921

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate arguments to ArrayData::try_new() #817

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Validate arguments to ArrayData::try_new() #817

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions