Reset the interactive namespace warningregistry before executing code #6680

njsmith · 2014-10-11T03:13:27Z

Idea:

Right now, people often don't see important warnings when running
code in IPython, because (to a first approximation) any given warning
will only issue once per session. Blink and you'll miss it! This is a
very common contributor to confused emails to numpy-discussion. E.g.:

In [5]: 1 / my_array_with_random_contents
/home/njs/.user-python2.7-64bit-3/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide
  #!/home/njs/.user-python2.7-64bit-3/bin/python
Out[5]:
array([ 1.77073316, -2.29765021, -2.01800811, ...,  1.13871243,
       -1.08302964, -8.6185091 ])

Oo, right, guess I gotta be careful of those zeros -- thanks, numpy,
for giving me that warning!

A few days later:

In [592]: 1 / some_other_array
Out[592]:
array([ 3.07735763,  0.50769289,  0.83984078, ..., -0.67563917,
       -0.85736257, -1.36511271])

Oops, it turns out that this array had a zero in it too, and that's
going to bite me later. But no warning this time!

The effect of this commit is to make it so that warnings triggered by
the code in cell 5 do not suppress warnings triggered by the code in
cell 592. Note that this only applies to warnings triggered directly
by code entered interactively -- if somepkg.foo() calls
anotherpkg.bad_func() which issues a warning, then this warning will
still only be displayed once, even if multiple cells call
somepkg.foo(). But if cell 5 and cell 592 both call
anotherpkg.bad_func() directly, then both will get warnings.

(Important exception: if foo() is defined interactively, and calls
anotherpkg.bad_func(), then every cell that calls foo() will display
the warning again. This is unavoidable without fixes to CPython
upstream.)

Explanation:

Python's warning system has some weird quirks. By default, it tries to
suppress duplicate warnings, where "duplicate" means the same warning
message triggered twice by the same line of code. This requires
determining which line of code is responsible for triggering a
warning, and this is controlled by the stacklevel= argument to
warnings.warn. Basically, though, the idea is that if foo() calls
bar() which calls baz() which calls some_deprecated_api(), then baz()
will get counted as being "responsible", and the warning system will
make a note that the usage of some_deprecated_api() inside baz() has
already been warned about and doesn't need to be warned about
again. So far so good.

To accomplish this, obviously, there has to be a record of somewhere
which line this was. You might think that this would be done by
recording the filename:linenumber pair in a dict inside the warnings
module, or something like that. You would be wrong.

What actually happens is that the warnings module will use stack
introspection to reach into baz()'s execution environment, create a
global (module-level) variable there named __warningregistry__, and
then, inside this dictionary, record just the line number. Basically,
it assumes that any given module contains only one line 1, only one
line 2, etc., so storing the filename is irrelevant. Obviously for
interactive code this is totally wrong -- all cells share the same
execution environment and global namespace, and they all contain a new
line 1. Currently the warnings module treats these as if they were all
the same line.

In fact they are not the same line; once we have executed a given
chunk of code, we will never see those particular lines again. As soon
as a given chunk of code finishes executing, its line number labels
become meaningless, and the corresponding warning registry entries
become meaningless as well. Therefore, with this patch we delete the
__warningregistry__ each time we execute a new block of code.

damianavila · 2014-10-11T11:42:18Z

Despite the fact that the fix is simple, I love to see this kind of explanation in the PR first message.
Thanks for spending time to do it in this way.

👍 for me.

njsmith · 2014-10-11T18:53:20Z

@damianavila: Eh, it saves time in the end ;-)

minrk · 2014-10-11T19:26:28Z

After the discussion in #6611, this makes sense to me. I'd like to hear from @takluyver on the implementation.

takluyver · 2014-10-11T22:47:16Z

Let's not push it into run_code - that's the very innermost core of IPython. Is it practical to do it using a post_execute callback?

damianavila · 2014-10-12T09:49:18Z

Is it practical to do it using a post_execute callback?

I don't see a problem... probably safer than the current one.
Let's see what @njsmith think about that...

minrk · 2014-10-12T19:05:11Z

Is it practical to do it using a post_execute callback?

That should be fine. Just register it as a hook during initialization.

njsmith · 2014-10-13T01:35:19Z

There are three separate issues here: (a) to use a hook or just to hard-code it in the file, (b) whether to clear the registry before or after the code runs, (c) whether to clear the registry in run_cell (*_execute hooks) or run_code (some sort of legacy hook system?).

Re: (a): I don't see much point in decoupling for the sake of decoupling -- there's a tradeoff between the benefits of flexibility and the costs of reduce locality making it harder to read and understand the code. In this case, the __warningregistry__ thing isn't something that needs to be configured -- it should happen unconditionally every time we execute some code, so I just put it in the code. This might seem less flexible, but IME it's easier to edit a hardcoded bit source than to disable a hook -- it takes less grepping to find the source itself than to find where a hook is registered :-). But, you know, this is basically a taste issue, and it's your codebase, so!

Re: (b): IMO it's better to clear the __warningregistry__ before executing code rather than after. If our clearing code gets called every single time we execute code, then it doesn't matter whether we do the clearing pre_ or post_ execution. But, it's also possible that some code will be run without our knowledge and pollute the __warningregistry__ with spurious entries. (This is esp. true if we move the logic into run_cell. I don't know under what circumstances people use run_cell vs run_ast_nodes vs run_code, but anyone using the latter two will skip the logic in run_cell.) So if we use a hook we should use a pre_* hook, etc.

Re: (c): My reasoning for pushing the registry clearing down into run_code instead of just run_cell was that every call to exec creates a new line number scope and invalidates the entries in __warningregistry__, so we should clear the registry every time we call exec. And run_code is where we call exec. Pushing it further out creates more situations where code could be run with spurious entries in the __warningregistry__.

So, that's why I wrote the patch the way I did :-). If on reflection you'd prefer something else let me know.

takluyver · 2014-10-13T17:10:27Z

Thanks for explaining your reasoning:

I would still like to do this with a hook, just for the sake of decoupling. Clearing the warnings registry is not fundamental to running code, and that part of the codebase is already more than complex enough without adding extra concerns to it.
In terms of when people use the different methods: I promote only run_cell as something that third parties should use. There might be third parties using run_ast_nodes or run_code, but I would consider those corner cases. All our own uses, as far as I recall, go through run_cell.
Pre/post: Whichever you prefer. I don't think we've ever heard about code polluting __warningregistry__, and it seems hard to do accidentally, so I'm not especially concerned about that.

njsmith · 2014-10-21T19:01:16Z

So I made an attempt to get this working as a hook (see the current version of this PR), and noticed two issues:

AFAICT the only place to define and register the hook is still directly inside InteractiveShell code, and the hook code itself is a method on InteractiveShell. So the coupling is not very reduced.
The tests are failing because there's a test which calls ip.events.reset_all(), which of course deletes my hook before my test gets a chance to run. IMO if I have a pristine InteractiveShell, then calling ip.events.reset_all() should be a no-op. But there's really no way to make this work with a default hook, unless we start poking at the definition of EventManager itself which ugh.

So..... how do you want to do this?

takluyver · 2014-10-21T22:51:12Z

Registering hooks doesn't have to be done inside interactiveshell code - anything that gets a reference to the shell object (like extensions) can use shell.events.register(). In this case, since you want it on all shell objects, doing it inside interactiveshell makes sense.

Maybe the reset methods should be renamed something like clear (events is a new API, so I think we can get away with this before release). The test shouldn't really be doing reset_all(), whatever the name is - it should be taking out only the callbacks it installed. Using reset_all() there was just laziness on my part.

Going one step further, maybe reset() and reset_all() shouldn't even exist. The design of using events is precisely that you shouldn't think about what other callbacks might be triggered by an event, so killing all the callbacks on an event is generally the wrong thing to do. It's especially bad when most events aren't handled by default, so it's easy to fall into using it just to clear a callback you know about. That should probably be a separate PR, though.

minrk · 2014-10-30T21:27:03Z

@takluyver it sounds like cleaning up event reset should be a separate PR. What should we do here so this can be merged, and the other stuff investigated separately?

The whole idea of the EventManager is that you can register hooks without worrying about what hooks other pieces of code might be registering. The reset methods violate this separation of concerns, since they will blow away everyone else's hooks too. (See ipythongh-6680 for an example of this breaking things.) Since there is never any safe way to use them, we simply remove them entirely.

…code Fixes ipython#6611. Idea: Right now, people often don't see important warnings when running code in IPython, because (to a first approximation) any given warning will only issue once per session. Blink and you'll miss it! This is a very common contributor to confused emails to numpy-discussion. E.g.: In [5]: 1 / my_array_with_random_contents /home/njs/.user-python2.7-64bit-3/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide #!/home/njs/.user-python2.7-64bit-3/bin/python Out[5]: array([ 1.77073316, -2.29765021, -2.01800811, ..., 1.13871243, -1.08302964, -8.6185091 ]) Oo, right, guess I gotta be careful of those zeros -- thanks, numpy, for giving me that warning! A few days later: In [592]: 1 / some_other_array Out[592]: array([ 3.07735763, 0.50769289, 0.83984078, ..., -0.67563917, -0.85736257, -1.36511271]) Oops, it turns out that this array had a zero in it too, and that's going to bite me later. But no warning this time! The effect of this commit is to make it so that warnings triggered by the code in cell 5 do *not* suppress warnings triggered by the code in cell 592. Note that this only applies to warnings triggered *directly* by code entered interactively -- if somepkg.foo() calls anotherpkg.bad_func() which issues a warning, then this warning will still only be displayed once, even if multiple cells call somepkg.foo(). But if cell 5 and cell 592 both call anotherpkg.bad_func() directly, then both will get warnings. (Important exception: if foo() is defined *interactively*, and calls anotherpkg.bad_func(), then every cell that calls foo() will display the warning again. This is unavoidable without fixes to CPython upstream.) Explanation: Python's warning system has some weird quirks. By default, it tries to suppress duplicate warnings, where "duplicate" means the same warning message triggered twice by the same line of code. This requires determining which line of code is responsible for triggering a warning, and this is controlled by the stacklevel= argument to warnings.warn. Basically, though, the idea is that if foo() calls bar() which calls baz() which calls some_deprecated_api(), then baz() will get counted as being "responsible", and the warning system will make a note that the usage of some_deprecated_api() inside baz() has already been warned about and doesn't need to be warned about again. So far so good. To accomplish this, obviously, there has to be a record of somewhere which line this was. You might think that this would be done by recording the filename:linenumber pair in a dict inside the warnings module, or something like that. You would be wrong. What actually happens is that the warnings module will use stack introspection to reach into baz()'s execution environment, create a global (module-level) variable there named __warningregistry__, and then, inside this dictionary, record just the line number. Basically, it assumes that any given module contains only one line 1, only one line 2, etc., so storing the filename is irrelevant. Obviously for interactive code this is totally wrong -- all cells share the same execution environment and global namespace, and they all contain a new line 1. Currently the warnings module treats these as if they were all the same line. In fact they are not the same line; once we have executed a given chunk of code, we will never see those particular lines again. As soon as a given chunk of code finishes executing, its line number labels become meaningless, and the corresponding warning registry entries become meaningless as well. Therefore, with this patch we delete the __warningregistry__ each time we execute a new block of code.

njsmith · 2014-10-31T02:19:05Z

As a start, I went ahead and made a commit that just removes the reset APIs. This PR currently contains that commit plus the __warningregistry__ commit (click the "commits" tab if you want to review them separately). We could make them into two PRs I guess if that would make everyone happier, though if they look good then it might be simpler just to merge them both together here.

minrk · 2014-10-31T16:45:38Z

I'm fine with this as-is. @takluyver?

takluyver · 2014-10-31T16:57:34Z

Yep. Thanks for your patience, @njsmith .

Reset the interactive namespace __warningregistry__ before executing code

The whole idea of the EventManager is that you can register hooks without worrying about what hooks other pieces of code might be registering. The reset methods violate this separation of concerns, since they will blow away everyone else's hooks too. (See ipythongh-6680 for an example of this breaking things.) Since there is never any safe way to use them, we simply remove them entirely.

Reset the interactive namespace __warningregistry__ before executing code

minrk added this to the 3.0 milestone Oct 11, 2014

jdfreder added the -Needs review label Oct 14, 2014

takluyver added -Waiting for author and removed -Needs review labels Oct 14, 2014

minrk force-pushed the master branch from 7cc431b to 5416279 Compare October 16, 2014 17:11

njsmith force-pushed the reset-warning-registry branch from e09673f to 48f4d74 Compare October 21, 2014 17:48

njsmith added 2 commits October 31, 2014 01:54

njsmith force-pushed the reset-warning-registry branch from 48f4d74 to 61431d7 Compare October 31, 2014 02:17

minrk added -Needs review and removed -Waiting for author labels Oct 31, 2014

takluyver added a commit that referenced this pull request Oct 31, 2014

Merge pull request #6680 from njsmith/reset-warning-registry

f3be3b1

Reset the interactive namespace __warningregistry__ before executing code

takluyver merged commit f3be3b1 into ipython:master Oct 31, 2014

ivanov removed the -Needs review label Oct 31, 2014

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#6680 from njsmith/reset-warning-registry

5bc76be

Reset the interactive namespace __warningregistry__ before executing code

Carreau mentioned this pull request May 27, 2015

register default deprecation warning filter. #8480

Merged

Carreau mentioned this pull request Jun 28, 2018

Python 3: warnings displayed multiple times despite default 'once' filter #11207

Open

gsakkis mentioned this pull request Sep 11, 2021

Print warning for ipykernel < 6 when using scope_ctx or default_ctx TileDB-Inc/TileDB-Py#665

Merged

Uh oh!

Reset the interactive namespace __warningregistry__ before executing code #6680

Reset the interactive namespace __warningregistry__ before executing code #6680

Uh oh!

Conversation

njsmith commented Oct 11, 2014

Uh oh!

damianavila commented Oct 11, 2014

Uh oh!

njsmith commented Oct 11, 2014

Uh oh!

minrk commented Oct 11, 2014

Uh oh!

takluyver commented Oct 11, 2014

Uh oh!

damianavila commented Oct 12, 2014

Uh oh!

minrk commented Oct 12, 2014

Uh oh!

njsmith commented Oct 13, 2014

Uh oh!

takluyver commented Oct 13, 2014

Uh oh!

njsmith commented Oct 21, 2014

Uh oh!

takluyver commented Oct 21, 2014

Uh oh!

minrk commented Oct 30, 2014

Uh oh!

njsmith commented Oct 31, 2014

Uh oh!

minrk commented Oct 31, 2014

Uh oh!

takluyver commented Oct 31, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Reset the interactive namespace warningregistry before executing code #6680

Reset the interactive namespace warningregistry before executing code #6680