Skip to content

Conversation

@phofl
Copy link
Member

@phofl phofl commented Feb 16, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jorisvandenbossche

Follow-up on #49450

The equals check causes failures in Dask. I think we are better off moving to reference checking anyway. If you are ok with the general logic here, I'd refactor this check down to the Manager level.

@rhshadrach
Copy link
Member

This would make df.groupy(df['a']) behave differently from df.groupby(df['a'].copy()), is that right? In general I'd like to move away from "in-axis" vs "out-axis" groupers behaving differently (ref: #49519), so I agree this is the right direction, but it might cause some odd edge cases to behave differently until we resolve those issues.

Still - I'm +0 here; checking equality gives some odd behaviors as well.

@phofl
Copy link
Member Author

phofl commented Feb 16, 2023

Yes those 2 behave different, but this pr establishes the same behavior as in the non-cow mode for this case. So that's actually a plus imo

return False
if isinstance(gpr, Series) and isinstance(obj_gpr_column, Series):
ref = weakref.ref(obj_gpr_column._mgr.blocks[0])
return ref in gpr._mgr.blocks[0].refs.referenced_blocks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you create multiple times a weakref to the same object, this is always the same (i.e. identical) weakref object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's how it is documented (https://docs.python.org/3/library/weakref.html#weakref.ref). Did some basic checks with out blocks and it works as I would have expected

@jorisvandenbossche
Copy link
Member

Yes, I think this is a good solution for now (essentially the same as what was suggested in #50730). And it indeed aligns with the behaviour we have for non-CoW.

Longer term, I agree with @rhshadrach that the current semantics are not really great. We should probably at some point move to not doing such check at all, at which point this discussion about identity vs equality also just doesn't matter anymore?

@phofl
Copy link
Member Author

phofl commented Feb 17, 2023

Yep this sounds sensible, but I think this needs a deprecation cycle?

I’ll move the check to the BlockManager then

@phofl
Copy link
Member Author

phofl commented Feb 17, 2023

Implemented it on the manager level

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two small comments to clean-up

if not hasattr(gpr, "name"):
return False
if using_copy_on_write():
# For the CoW case, we need an equality check as the identity check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

try:
return gpr.equals(obj[gpr.name])
obj_gpr_column = obj[gpr.name]
except (AttributeError, KeyError, IndexError, InvalidIndexError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the AttributeError now, I think (.name is already checked above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, thx

@jorisvandenbossche jorisvandenbossche merged commit 8275e88 into pandas-dev:main Feb 20, 2023
@jorisvandenbossche
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants