Skip to content

Theoretical integer overflow in StringArrayBuilder / LargeStringArrayBuilder #13796

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

Similarly to #13759, we found another potential code vulnerability from a security audit performed by InfluxData

The public method with_capacity of StringArrayBuilder accepts a parameter item_capacity: usize. It is used to create a MutableBuffer with a capacity calculated as (item_capacity + 1)*size_of::<i32>(), which can overflow undetected, leaving the buffer
too small. The subsequent unsafe call to push a value into the buffer can lead to an out-of-bounds memory access.

In datafusion/functions/src/strings.rs, lines 125ff:

impl StringArrayBuilder {
    pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
        let mut offsets_buffer = MutableBuffer::with_capacity((item_capacity + 1) * size_of::<i32>());
        // SAFETY: the first offset value is definitely not going to exceed the bounds.
        unsafe { offsets_buffer.push_unchecked(0_i32) };
        Self {
            offsets_buffer,
            value_buffer:
            MutableBuffer::with_capacity(data_capacity),
        }
    }

I analyzed the potential risk, and I agree there is a risk of memory unsafety but I do not think it is exploitable via DataFusion APIs. Specifically, the only callsites are:

let mut builder = StringArrayBuilder::with_capacity(len, data_size);

The argument is taken from

// Array
let len = array_len.unwrap();

So to trigger this code you would have to provide an input record batch with more thanu32::MAX rows

Reproducer showing segfault:

Here is a test case I wrote:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/functions/src/strings.rs b/datafusion/functions/src/strings.rs
index d2fb5d585..8e7c05fa1 100644
--- a/datafusion/functions/src/strings.rs
+++ b/datafusion/functions/src/strings.rs
@@ -422,3 +422,13 @@ impl ColumnarValueRef<'_> {
         }
     }
 }
+
+
+#[cfg(test)]
+mod test {
+    use super::*;
+    #[test]
+    fn test_overflow() {
+        let builder = StringArrayBuilder::with_capacity(usize::MAX, usize::MAX);
+    }
+}

It must be run in release mode

cargo test --release  -p datafusion-functions --lib -- overflow
error: test failed, to rerun pass `-p datafusion-functions --lib`

Caused by:
  process didn't exit successfully: `/Users/andrewlamb/Software/datafusion/target/release/deps/datafusion_functions-e3de98ad3d4dc27c overflow` (signal: 11, SIGSEGV: invalid memory reference)
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$

Describe the solution you'd like

Recommendations: Check for integer overflow when calculating the capacity.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions