PR: Add allow_copy flag to interchange protocol#44
PR: Add allow_copy flag to interchange protocol#44steff456 wants to merge 3 commits intodf-interchange-protocol-stagingfrom
allow_copy flag to interchange protocol#44Conversation
|
To summarize the options for naming this keyword that were suggested yesterday:
I like the positive Regarding the default value (now
I believe that there will be enough cases that will always require a copy (object-strings, columns backed by non-contiguous arrays, non-columnar data storage like for Koalas and Ibis) that using |
allow_copy flag to interchange protocol
Adding the following to the requirements docs: |
|
|
||
| ``allow_copy`` is a keyword that defines if the given implementation | ||
| is going to support striding buffers. It is optional, and the libraries | ||
| do not need to implement it. |
There was a problem hiding this comment.
Rephrasing this a bit:
``allow_copy`` is a keyword that defines whether or not the library is
allowed to make a copy of the data. This can for example happen if a
library supports strided buffers (those would need a copy, because this
protocol specifies contiguous buffers).
|
|
||
| ``allow_copy`` is a keyword that defines if the given implementation | ||
| is going to support striding buffers. It is optional, and the libraries | ||
| do not need to implement it. Currently, if the flag is set to ``True`` it |
| # complexity for libraries that don't support it (e.g. Arrow), | ||
| # but would help with numpy-based libraries like Pandas. | ||
| raise RuntimeError("Design needs fixing - non-contiguous buffer") | ||
| if not allow_copy: |
There was a problem hiding this comment.
This removes the actual check for striding (not x.strides == (x.dtype.itemsize,):), that needs to be put back.
There was a problem hiding this comment.
Logic:
if not x.strides == (x.dtype.itemsize,):
# The protocol does not support strided buffers, so a copy is
# necessary. If that's not allowed, we need to raise an exception.
if allow_copy:
x = x.copy()
else:
raise RuntimeError("Exports cannot be zero-copy in the case "
"of a non-contiguous buffer")There was a problem hiding this comment.
And then adding a test case shows there's a problem - we don't hold on to the memory.
This PR adds a flag to throw an exception if the export cannot be zero-copy. (e.g. for pandas, possible due to block manager where rows are contiguous and columns are not) .
allow_zero_copyflag to the DataFrame class.RuntimeErrorwhen it is truetest_noncontiguous_columns