-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This ticket lists a high level plan to address one of the main sources of security issues in arrow-rs
such as #772 and likely several others on https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity
As demonstrated in https://github.com/jorgecarleitao/arrow2#why, and almost all of the examples in https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity, creating ArrayData::new with invalid arguments can lead to undefined behavior.
See also the discussion with @jhorstmann and others on https://lists.apache.org/thread.html/r3f12f3352ca36264622d4103fcb6c7c71544dcaf0f0a7e842f00c3a0%40%3Cdev.arrow.apache.org%3E
Describe the solution you'd like
I propose to follow the C++ implementation (kudos to @pitrou) in https://github.com/apache/arrow/blob/b73af9a1607caa4a04e1a11896aed6669847a4d4/cpp/src/arrow/array/validate.cc#L388-L392
Add two new functions:
ArrayData::validate()-- checks offsets / buffer sizes, relatively inexpensiveArrayData::validate_full-- which callsvalidate()AND checks all variable length data structures for consistency (e.g. ensures the offsets of a StringArray are within the size of the base array
Then, change ArrayData to have two constructors:
unsafe ArrayData::new_unchecked()- Behaves likeArrayData::new()does today -- namely has no validationArrayData::try_new()will be safe in the Rust sense -- can not cause undefined behavior and thus will callArrayData::validate_full
This design will follow the Rust philosophy of "safe by default" but offer an alternative (unsafe) mechanism to bypass checking for known good inputs. This unsafe mechanism has been prototyped by @jhorstmann in #813
Describe alternatives you've considered
Could wait for arrow2 convergence, if that happens, but since the timeline on that ETA is still unknown, safety for the arrow-rs implementation seems to justify spending time here
** Progress **
- Change ArrayData::API - Replace
ArrayData::new()withArrayData::try_new()andunsafe ArrayData::new_unchecked#822 - Add ArrayData::validate() for cheap bounds checks - Validate arguments to ArrayData::new and null bit buffer and buffers #810
- Add ArrayData::validate_full() for content checks (valid UTF8, valid offsets, etc) - Add full data validation for ArrayData::try_new() #921