Skip to content

BUG: interchange categorical_column_to_series() should not accept only PandasColumn#52763

Merged
mroeschke merged 9 commits intopandas-dev:mainfrom
MarcoGorelli:categorical-xch
Apr 19, 2023
Merged

BUG: interchange categorical_column_to_series() should not accept only PandasColumn#52763
mroeschke merged 9 commits intopandas-dev:mainfrom
MarcoGorelli:categorical-xch

Conversation

@MarcoGorelli
Copy link
Copy Markdown
Member

@MarcoGorelli MarcoGorelli commented Apr 18, 2023

cc @AlenkaF


I've added this to the 2.0.1 whatsnew, and it's a trivial 1-line change (which was only there to begin with to help type checkers, it doesn't have any runtime purpose) and it unlocks a fair bit

@MarcoGorelli MarcoGorelli added the Interchange Dataframe Interchange Protocol label Apr 18, 2023
Comment thread pandas/tests/interchange/test_impl.py Outdated
tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))


@td.skip_if_no("pyarrow")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pytest.importorskip instead?

@mroeschke mroeschke added this to the 2.0.1 milestone Apr 18, 2023
@AlenkaF
Copy link
Copy Markdown

AlenkaF commented Apr 19, 2023

Thank you @MarcoGorelli! LGTM +1

Comment thread pandas/core/interchange/from_dataframe.py
@mroeschke
Copy link
Copy Markdown
Member

Just a merge conflict

@mroeschke mroeschke merged commit 2750ee2 into pandas-dev:main Apr 19, 2023
@mroeschke
Copy link
Copy Markdown
Member

Thanks @MarcoGorelli

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Apr 19, 2023
…series() should not accept only PandasColumn
mroeschke pushed a commit that referenced this pull request Apr 20, 2023
…mn_to_series() should not accept only PandasColumn) (#52793)

Backport PR #52763: BUG: interchange categorical_column_to_series() should not accept only PandasColumn

Co-authored-by: Marco Edward Gorelli <[email protected]>
# for mypy/pyright
assert isinstance(cat_column, PandasColumn), "categories must be a PandasColumn"
categories = np.array(cat_column._col)
if hasattr(cat_column, "_col"):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for using this attribute? (that also seems like relying on and exposing some internal detail)
Haven't looked in detail, but my understanding is that this object is just a protocol Column object, and we already have all the code to convert that to a pandas object?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know it was already there, but IMO the "proper" fix for the reported issue is to stop relying on this detail.
Because the the test you added only "happens" to work because pyarrow's Column object also uses _col to store the original array; if we would rename that internal property (and I would say it's actually a confusing name in hindsight, since it's an array ;)), the test would start failing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, agreed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Interchange Dataframe Interchange Protocol

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: interchange categorical_column_to_series() should not accept only PandasColumn

5 participants