[C++][FlightRPC] Flight generates misaligned buffers #32276

@asfimport

Description

Protobuf's wire format, combined with our zero-copy serializer/deserializer, means that buffers received over Flight can end up misaligned. On some Arrow versions, this causes segfaults in kernels that assume alignment, and more generally it violates the expectations of code consuming the data.
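To illustrate why the wire format works against alignment, here is a minimal sketch of how a length-delimited Protobuf field is framed: a varint tag, then a varint length, then the raw bytes. The helper names are hypothetical. Field number 1000 is meant to mirror `FlightData.data_body` but should be treated as an illustrative value. The sketch only models the header size relative to the start of the field, not how gRPC lays out the surrounding memory.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited_header(field_number: int, payload_len: int) -> bytes:
    """Return the bytes that precede a length-delimited payload."""
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(payload_len)


# The header size varies with the payload length, so the payload's
# offset within the message is not a fixed multiple of 8.
for payload_len in (8, 200, 20000):
    header = length_delimited_header(1000, payload_len)
    print(payload_len, len(header), len(header) % 8)
```

Because the deserializer hands out slices of the received message without copying, whatever offset the payload starts at carries over to the resulting Arrow buffers.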

We should:

  • Possibly include buffer alignment in array validation

  • See if we can adjust the serializer to somehow pad things properly

  • See if we can do anything about this in the deserializer

    Example:

    import pyarrow as pa
    import pyarrow.flight as flight
    
    class TestServer(flight.FlightServerBase):
        def do_get(self, context, ticket):
            schema = pa.schema(
                [
                    ("index", pa.int64()),
                    ("int8", pa.float64()),
                    ("int16", pa.float64()),
                    ("int32", pa.float64()),
                ]
            )
            return flight.RecordBatchStream(pa.table([
                [0, 1, 2, 3],
                [0, 1, None, 3],
                [0, 1, 2, None],
                [0, None, 2, 3],
            ], schema=schema))
    
    
    with TestServer() as server:
        client = flight.connect(f"grpc://localhost:{server.port}")
        table = client.do_get(flight.Ticket(b"")).read_all()
        for col in table:
            print(col.type)
            for chunk in col.chunks:
                for buf in chunk.buffers():
                    if not buf: continue
                    print("buffer is 8-byte aligned?", buf.address % 8)
                chunk.cast(pa.float32())

    On Arrow 8 (the printed value is `buf.address % 8`, so anything nonzero means the buffer is misaligned):

    
    int64
    buffer is 8-byte aligned? 1
    double
    buffer is 8-byte aligned? 1
    buffer is 8-byte aligned? 1
    double
    buffer is 8-byte aligned? 1
    buffer is 8-byte aligned? 1
    double
    buffer is 8-byte aligned? 1
    buffer is 8-byte aligned? 1
    

    On Arrow 7 (the cast segfaults):

    
    int64
    buffer is 8-byte aligned? 4
    double
    buffer is 8-byte aligned? 4
    buffer is 8-byte aligned? 4
    fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address boundary error)
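
The idea of including alignment in array validation could start from a check like the one below. This is a minimal sketch, not an existing Arrow API: `misaligned_buffers` is a hypothetical helper, and it assumes only that each buffer exposes an integer `address` attribute, as `pyarrow.Buffer` does. Stand-in objects are used so the sketch runs without a Flight server.

```python
from types import SimpleNamespace


def misaligned_buffers(buffers, alignment=8):
    """Return (index, address % alignment) for each misaligned buffer.

    Hypothetical helper: `buffers` is any iterable of objects exposing an
    integer `address` attribute. None entries, such as an absent validity
    bitmap, are skipped.
    """
    problems = []
    for index, buf in enumerate(buffers):
        if buf is None:
            continue
        remainder = buf.address % alignment
        if remainder:
            problems.append((index, remainder))
    return problems


# Stand-ins for real buffers: one absent, one aligned, one off by 4 bytes.
fake = [None, SimpleNamespace(address=0x1000), SimpleNamespace(address=0x1004)]
print(misaligned_buffers(fake))  # [(2, 4)]
```

In the reproducer, calling `misaligned_buffers(chunk.buffers())` on each chunk should report the same offsets that the loop prints.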
    

Reporter: David Li / @lidavidm

Note: This issue was originally created as ARROW-16958. Please see the migration documentation for further details.
