@phofl
Member

@phofl phofl commented Nov 19, 2023

xref #56019

@phofl phofl added the Warnings (warnings that appear or should be added to pandas) and Copy / view semantics labels Nov 19, 2023
Comment on lines 7780 to 7781
if isinstance(self, ABCSeries) and hasattr(self, "_cacher"):
    ref_count += 1
Member

The problem might be that in warning mode this is not the case? (So we might need to add a not warn_copy_on_write() to this if block that updates the ref_count.)

Although the tests are not failing ..

Member Author

I don't understand your comment, why would this not be the case?

I double checked and it seems to work in warning mode

Member

Related to #55838 (comment): I thought I had disabled the item cache for the warning mode, which I would expect to affect the count.

Member Author

I double checked this locally, the cache is still populated

Member

OK, will take a closer look

Member

@jorisvandenbossche jorisvandenbossche Nov 20, 2023

Ah, no, the _cacher points to the DataFrame (in a case like s = df["col"]), so it doesn't increase the ref count for s (which is what is relevant for chained setitem detection). And also it uses a weakref, so shouldn't actually increase the ref count?

The hasattr(self, "_cacher") is kind of a check whether the Series is derived from a DataFrame?
So I think that the reason we need to increase the ref count is still because of _item_cache, not actually _cacher. The presence of _cacher just turns out to be an equivalent check for checking that we are a Series in non-CoW mode (only in that mode _item_cache would be populated). So maybe a more "correct" check (although they will give the same result) would be:

Suggested change
- if isinstance(self, ABCSeries) and hasattr(self, "_cacher"):
-     ref_count += 1
+ if isinstance(self, ABCSeries) and not (using_copy_on_write() or warn_copy_on_write()):
+     ref_count += 1
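
The weakref point above can be checked with the standard library alone. This is a minimal sketch, assuming a hypothetical `Parent` class in place of any pandas object: taking a weak reference leaves `sys.getrefcount` unchanged, while storing a strong reference raises it by one.

```python
import sys
import weakref


class Parent:
    """Hypothetical stand-in for the DataFrame that `_cacher` points back to."""


parent = Parent()
baseline = sys.getrefcount(parent)

# A weak reference does not own the object, so the count stays the same.
weak = weakref.ref(parent)
after_weak = sys.getrefcount(parent)

# A strong reference (here: storing the object in a dict) adds exactly one.
holder = {"parent": parent}
after_strong = sys.getrefcount(parent)

weak_delta = after_weak - baseline        # 0
strong_delta = after_strong - after_weak  # 1
```

Only the differences are compared here, because the absolute value returned by `sys.getrefcount` depends on the interpreter and on how the call is made.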

Member

Hmm, that suggestion gives a bunch of failures (false positives) in pandas/tests/series/methods/test_replace.py

Member

@jorisvandenbossche jorisvandenbossche Nov 20, 2023

Which is logical, because that's the whole point with _item_cache: you don't know when it is populated or when not in the non-CoW case. So you can't use a single fixed REF_COUNT value to check, this actually depends on the circumstances.

And so what you have (checking _cacher) is a way to check if the Series object is derived from a DataFrame with simple indexing (and has populated _item_cache). So that's probably fine? Or are there ways to get a Series that does not populate the cacher?
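
To make the "depends on the circumstances" point concrete, here is a toy model of an item cache. It is only a sketch: `ToyFrame`, `ToySeries`, `use_item_cache` and `refcount_of_column` are hypothetical names, not pandas internals. It shows that the object returned by simple indexing carries one extra reference exactly when the parent has stored it in its cache.

```python
import sys
import weakref


class ToySeries:
    """Hypothetical stand-in for a Series; not the pandas class."""

    def __init__(self, values):
        self.values = values


class ToyFrame:
    """Hypothetical stand-in for a DataFrame with an optional item cache."""

    def __init__(self, data, use_item_cache):
        self._data = data
        self._use_item_cache = use_item_cache
        self._item_cache = {}

    def __getitem__(self, key):
        if self._use_item_cache and key in self._item_cache:
            return self._item_cache[key]
        ser = ToySeries(self._data[key])
        if self._use_item_cache:
            # Strong reference parent -> child: this is what shifts the count.
            self._item_cache[key] = ser
            # Weak back-reference child -> parent, mirroring the `_cacher` idea.
            ser._cacher = (key, weakref.ref(self))
        return ser


def refcount_of_column(frame, key):
    # Reference count of the object produced by frame[key], measured through
    # one fixed call pattern; only differences between calls are meaningful.
    return sys.getrefcount(frame[key])


cached = ToyFrame({"a": [1, 2, 3]}, use_item_cache=True)
uncached = ToyFrame({"a": [1, 2, 3]}, use_item_cache=False)

# One extra reference, held by the item cache of `cached`.
extra_refs = refcount_of_column(cached, "a") - refcount_of_column(uncached, "a")
```

In this model a single fixed threshold cannot serve both frames, which is why a per-object signal such as the presence of `_cacher` is useful.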

Member

> Or are there ways to get a Series that does not populate the cacher?

Getting a single column with ["col"], .loc[:, "col"], .iloc[:, 0], .get("col"), .xs("col", axis=1) all populate the cacher.

In theory you can get a Series as the result of a calculation that doesn't do this (df.mean().replace(..)), but of course that never could have done anything useful, so we don't need to care about that. We only need to care about getting a Series that is a view.

So to summarize: your fix is probably fully correct, I only didn't understand why ;) You might want to add a comment about it.
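
The summary above can be sketched as a threshold selection. This is an illustration under stated assumptions, not the pandas implementation: `ToyFrame`, `ToySeries` and `expected_ref_count` are hypothetical, and the value of `REF_COUNT` is chosen only for the example. A series obtained by simple indexing carries `_cacher` and gets the raised threshold; a computed result does not.

```python
import weakref

REF_COUNT = 2  # hypothetical baseline threshold, chosen only for this sketch


class ToySeries:
    """Hypothetical stand-in for a Series; not the pandas class."""

    def __init__(self, values):
        self.values = values

    def expected_ref_count(self):
        ref_count = REF_COUNT
        # Only a series handed out by ToyFrame.__getitem__ carries `_cacher`,
        # so only then is one extra (item-cache) reference expected.
        if hasattr(self, "_cacher"):
            ref_count += 1
        return ref_count


class ToyFrame:
    """Hypothetical stand-in for a DataFrame that always caches columns."""

    def __init__(self, data):
        self._data = data
        self._item_cache = {}

    def __getitem__(self, key):
        if key not in self._item_cache:
            ser = ToySeries(self._data[key])
            ser._cacher = (key, weakref.ref(self))
            self._item_cache[key] = ser
        return self._item_cache[key]

    def mean(self):
        # A computed result: a fresh object, never stored in the item cache.
        return ToySeries([sum(v) / len(v) for v in self._data.values()])


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
derived_threshold = df["a"].expected_ref_count()     # REF_COUNT + 1
computed_threshold = df.mean().expected_ref_count()  # REF_COUNT
```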

Member Author

Yes, your conclusion is correct as far as I can tell. That was the reason why I added the hasattr check. Sorry for omitting this information.

Added a comment

    tm.assert_frame_equal(df, df_orig)
else:
    with tm.assert_produces_warning(FutureWarning, match="inplace method"):
        with option_context("mode.chained_assignment", None):
Member

Does it otherwise also raise a SettingWithCopyWarning?

Member Author

Yes, [["a"]] currently copies.

elif not PYPY and not using_copy_on_write():
    ctr = sys.getrefcount(self)
    ref_count = REF_COUNT
    if isinstance(self, ABCSeries) and hasattr(self, "_cacher"):
Member

Suggested change
- if isinstance(self, ABCSeries) and hasattr(self, "_cacher"):
+ # in non-CoW mode, chained Series access will populate the `_item_cache` which results in an increased ref count not below the threshold, while we still need to warn. We detect this case of a Series derived from a DataFrame through the presence of `_cacher`
+ if isinstance(self, ABCSeries) and hasattr(self, "_cacher"):

(maybe a bit long to put in every place that has this check (after the other PRs), but a comment like this would have explained it to me)

Member Author

I am fine with adding this here. Maybe add a link to this comment in the other PRs?

@phofl
Member Author

phofl commented Nov 20, 2023

Are you ok with merging this so that I can adjust the others?

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 20, 2023 via email

@phofl phofl added this to the 2.2 milestone Nov 20, 2023
@phofl phofl merged commit 5625236 into pandas-dev:main Nov 20, 2023
@phofl phofl deleted the warn_cow_mode_replace branch November 20, 2023 22:23
phofl added a commit to phofl/pandas that referenced this pull request Nov 21, 2023
2 participants