ARROW-18152: [Python] DataFrame Interchange Protocol for pyarrow Table #14613
AlenkaF wants to merge 29 commits into apache:master from …
python/pyarrow/interchange/column.py
Example: requesting 5 chunks for an array of length 12. With chunk_size=2 we would get 6 chunks; with chunk_size=3 we get 4 chunks. So we end up producing 4 chunks of size 3 plus an empty chunk.
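The arithmetic in this comment can be sketched in plain Python. `chunk_bounds` is a hypothetical helper, not the code in `column.py`: it assumes the chunk size is the ceiling of `length / n_chunks` and that any shortfall is padded with empty slices, so exactly `n_chunks` (start, stop) pairs come back.

```python
from typing import List, Tuple


def chunk_bounds(length: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split range(length) into exactly n_chunks slices.

    Uses a ceiling-divided chunk size, then pads with empty slices when the
    ceiling leaves fewer than n_chunks non-empty pieces.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    # Ceiling division; max(1, ...) keeps range() valid for an empty array
    chunk_size = max(1, -(-length // n_chunks))
    bounds = [(start, min(start + chunk_size, length))
              for start in range(0, length, chunk_size)]
    # Pad with empty chunks so the caller gets the number it asked for
    bounds.extend([(length, length)] * (n_chunks - len(bounds)))
    return bounds


print(chunk_bounds(12, 5))
# [(0, 3), (3, 6), (6, 9), (9, 12), (12, 12)]
```

For 12 elements and 5 requested chunks this gives four slices of size 3 followed by the empty slice `(12, 12)`, the same outcome described in the comment.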
FWIW we have a compliance suite for interchange protocol adopters over at data-apis/dataframe-interchange-tests. It's a bit awkward to use as you'd have to write a compatibility layer in …
Thanks for the info @honno; the compliance suite is definitely something we have to use!
…le.__dataframe__ and do some minor corrections
…t, float with missing values
…d necessary definitions to separate implementation files
…andas timestamp in the tests
Will close this PR, as I moved the work into another branch: #14804
Produce a `__dataframe__` object
- `DataFrame`, `Column` and `Buffers` classes
- `pa.Table` -> `pd.DataFrame`

What should be looked into after the initial test:
Update: Columns without missing values are defined as non-nullable for now.
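A minimal sketch of how a producer might report nullability through the protocol's `describe_null`. The `ColumnNullType` values follow the interchange protocol specification; the helper `describe_null(null_count)` is hypothetical and assumes Arrow's validity-bitmap convention (bit set = valid), so a column with missing values is described as `(USE_BITMASK, 0)`, while a column without them is reported as non-nullable, as in the update above.

```python
from enum import IntEnum
from typing import Optional, Tuple


class ColumnNullType(IntEnum):
    # Enum values as listed in the interchange protocol specification
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2
    USE_BITMASK = 3
    USE_BYTEMASK = 4


def describe_null(null_count: int) -> Tuple[ColumnNullType, Optional[int]]:
    """Hypothetical describe_null for an Arrow-backed column.

    The second element is the mask value that marks a missing entry. Arrow
    validity bitmaps set a bit to 1 for valid slots, so nulls have bit value 0.
    """
    if null_count == 0:
        # No missing values: report the column as non-nullable
        return ColumnNullType.NON_NULLABLE, None
    return ColumnNullType.USE_BITMASK, 0


print(describe_null(0))
print(describe_null(2))
```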
Update: casting the boolean column/array to `uint8` solves this issue (boolean arrays are bit-packed, which is not supported by the protocol).
Update: the bit width for the offset buffer dtype must be set to 32 instead of 64.
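The two updates above can be illustrated with plain bytes. This is a sketch, not pyarrow code: `pack_bools`, `unpack_to_uint8` and `string_offsets` are hypothetical helpers. Arrow stores booleans one bit per value (least-significant bit first), so expanding them to one byte each is what a cast to `uint8` provides. For strings, Arrow's `string` type uses int32 offsets (`large_string` uses int64), so the bit width declared for the offset buffer has to match that 32-bit layout.

```python
import array
from typing import Sequence, Tuple


def pack_bools(values: Sequence[bool]) -> bytes:
    """Pack booleans one bit per value, least-significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_to_uint8(packed: bytes, length: int) -> bytes:
    """Expand a bit-packed buffer to one byte (0 or 1) per value."""
    return bytes((packed[i // 8] >> (i % 8)) & 1 for i in range(length))


def string_offsets(strings: Sequence[str]) -> Tuple[bytes, array.array]:
    """Build a UTF-8 data buffer plus an offsets buffer of len(strings) + 1 entries.

    String i occupies data[offsets[i]:offsets[i + 1]]. Typecode "i" is a C int,
    which is 32 bits on mainstream platforms, matching Arrow's int32 offsets.
    """
    encoded = [s.encode("utf-8") for s in strings]
    offsets = array.array("i", [0])
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    return b"".join(encoded), offsets


bits = pack_bools([True, False, True, True])
print(bits, unpack_to_uint8(bits, 4))
data, offsets = string_offsets(["a", "bc", ""])
print(data, offsets.tolist(), offsets.itemsize * 8)
```

For example, `[True, False, True, True]` packs into the single byte `0x0d` and unpacks to `b"\x01\x00\x01\x01"`, while `["a", "bc", ""]` produces the data `b"abc"` with offsets `[0, 1, 3, 3]`.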
Update: the pandas implementation seems to expect the column of categories to be an instance of `PandasColumn` instead of a general `__dataframe__` column object.
Update: the pandas implementation doesn't yet support bitmasks:
The code in this PR, tested with the pandas implementation as the consumer, currently works with integers, floats, booleans, strings and timestamps without missing values:
Consume a `__dataframe__` object
- `from_dataframe` method
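To show the consumer side, here is a toy, self-contained sketch of what a `from_dataframe`-style function does with an object exposing `__dataframe__`. Everything here is hypothetical and heavily simplified: the real protocol hands out `Buffer` objects with raw pointers and also covers chunks, nulls, strings and categoricals, whereas `ToyTable` and `ToyColumn` hand out byte `memoryview`s of 64-bit integers and floats only. The method names (`column_names`, `get_column_by_name`, `get_buffers`, `dtype`) and the `(kind, bit_width, format, endianness)` dtype tuple are modeled on the specification.

```python
import array
from enum import IntEnum
from typing import Any, Dict, List, Tuple


class DtypeKind(IntEnum):
    # Subset of the protocol's DtypeKind enum
    INT = 0
    FLOAT = 2


class ToyColumn:
    """Hypothetical column backed by an array.array of int64 ('q') or float64 ('d')."""

    def __init__(self, values: array.array) -> None:
        self._values = values

    def size(self) -> int:
        return len(self._values)

    @property
    def dtype(self) -> Tuple[DtypeKind, int, str, str]:
        # (kind, bit width, Arrow C format string, endianness)
        if self._values.typecode == "q":
            return (DtypeKind.INT, 64, "l", "=")
        return (DtypeKind.FLOAT, 64, "g", "=")

    def get_buffers(self) -> Dict[str, Any]:
        # Simplification: a raw byte view instead of a Buffer with a pointer
        data = memoryview(self._values).cast("B")
        return {"data": (data, self.dtype), "validity": None, "offsets": None}


class ToyInterchangeFrame:
    """Hypothetical object returned by ToyTable.__dataframe__()."""

    def __init__(self, columns: Dict[str, ToyColumn]) -> None:
        self._columns = columns

    def num_columns(self) -> int:
        return len(self._columns)

    def column_names(self) -> List[str]:
        return list(self._columns)

    def get_column_by_name(self, name: str) -> ToyColumn:
        return self._columns[name]


class ToyTable:
    """Hypothetical producer exposing the __dataframe__ entry point."""

    def __init__(self, **columns: array.array) -> None:
        self._columns = columns

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        return ToyInterchangeFrame({n: ToyColumn(v) for n, v in self._columns.items()})


def from_dataframe_sketch(obj: Any, allow_copy: bool = True) -> Dict[str, list]:
    """Consume any object with __dataframe__ into a dict of Python lists."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not implement __dataframe__")
    df = obj.__dataframe__(allow_copy=allow_copy)
    result: Dict[str, list] = {}
    for name in df.column_names():
        col = df.get_column_by_name(name)
        data, (kind, bit_width, _fmt, _endian) = col.get_buffers()["data"]
        if bit_width != 64 or kind not in (DtypeKind.INT, DtypeKind.FLOAT):
            raise NotImplementedError(f"unsupported dtype for column {name!r}")
        # Reinterpret the raw bytes according to the declared dtype
        typecode = "q" if kind == DtypeKind.INT else "d"
        result[name] = data.cast(typecode).tolist()
    return result


table = ToyTable(a=array.array("q", [1, 2, 3]), b=array.array("d", [0.5, 1.5, 2.5]))
print(from_dataframe_sketch(table))
# {'a': [1, 2, 3], 'b': [0.5, 1.5, 2.5]}
```

The consumer never looks at `ToyTable` itself, only at what `__dataframe__()` returns, which is the property that lets a protocol consumer accept any producer.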