-
Notifications
You must be signed in to change notification settings - Fork 4k
Closed
Description
Protobuf's wire format design + our zero-copy serializer/deserializer mean that buffers can end up misaligned. On some Arrow versions, this can cause segfaults in kernels assuming alignment (and generally violates expectations).
We should:
-
Possibly include buffer alignment in array validation
-
See if we can adjust the serializer to somehow pad things properly
-
See if we can do anything about this in the deserializer
Example:
import pyarrow as pa import pyarrow.flight as flight class TestServer(flight.FlightServerBase): def do_get(self, context, ticket): schema = pa.schema( [ ("index", pa.int64()), ("int8", pa.float64()), ("int16", pa.float64()), ("int32", pa.float64()), ] ) return flight.RecordBatchStream(pa.table([ [0, 1, 2, 3], [0, 1, None, 3], [0, 1, 2, None], [0, None, 2, 3], ], schema=schema)) with TestServer() as server: client = flight.connect(f"grpc://localhost:{server.port}") table = client.do_get(flight.Ticket(b"")).read_all() for col in table: print(col.type) for chunk in col.chunks: for buf in chunk.buffers(): if not buf: continue print("buffer is 8-byte aligned?", buf.address % 8) chunk.cast(pa.float32())
On Arrow 8
int64 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1 double buffer is 8-byte aligned? 1 buffer is 8-byte aligned? 1On Arrow 7
int64 buffer is 8-byte aligned? 4 double buffer is 8-byte aligned? 4 buffer is 8-byte aligned? 4 fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address boundary error)
Reporter: David Li / @lidavidm
Note: This issue was originally created as ARROW-16958. Please see the migration documentation for further details.
thomasaarholtalexander-beedie