Skip to content

Conversation

@njsmith
Copy link
Contributor

@njsmith njsmith commented Oct 11, 2014

Fixes #6611.

Idea:

Right now, people often don't see important warnings when running
code in IPython, because (to a first approximation) any given warning
will only issue once per session. Blink and you'll miss it! This is a
very common contributor to confused emails to numpy-discussion. E.g.:

In [5]: 1 / my_array_with_random_contents
/home/njs/.user-python2.7-64bit-3/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide
  #!/home/njs/.user-python2.7-64bit-3/bin/python
Out[5]:
array([ 1.77073316, -2.29765021, -2.01800811, ...,  1.13871243,
       -1.08302964, -8.6185091 ])

Oo, right, guess I gotta be careful of those zeros -- thanks, numpy,
for giving me that warning!

A few days later:

In [592]: 1 / some_other_array
Out[592]:
array([ 3.07735763,  0.50769289,  0.83984078, ..., -0.67563917,
       -0.85736257, -1.36511271])

Oops, it turns out that this array had a zero in it too, and that's
going to bite me later. But no warning this time!

The effect of this commit is to make it so that warnings triggered by
the code in cell 5 do not suppress warnings triggered by the code in
cell 592. Note that this only applies to warnings triggered directly
by code entered interactively -- if somepkg.foo() calls
anotherpkg.bad_func() which issues a warning, then this warning will
still only be displayed once, even if multiple cells call
somepkg.foo(). But if cell 5 and cell 592 both call
anotherpkg.bad_func() directly, then both will get warnings.

(Important exception: if foo() is defined interactively, and calls
anotherpkg.bad_func(), then every cell that calls foo() will display
the warning again. This is unavoidable without fixes to CPython
upstream.)

Explanation:

Python's warning system has some weird quirks. By default, it tries to
suppress duplicate warnings, where "duplicate" means the same warning
message triggered twice by the same line of code. This requires
determining which line of code is responsible for triggering a
warning, and this is controlled by the stacklevel= argument to
warnings.warn. Basically, though, the idea is that if foo() calls
bar() which calls baz() which calls some_deprecated_api(), then baz()
will get counted as being "responsible", and the warning system will
make a note that the usage of some_deprecated_api() inside baz() has
already been warned about and doesn't need to be warned about
again. So far so good.

To accomplish this, obviously, there has to be a record of somewhere
which line this was. You might think that this would be done by
recording the filename:linenumber pair in a dict inside the warnings
module, or something like that. You would be wrong.

What actually happens is that the warnings module will use stack
introspection to reach into baz()'s execution environment, create a
global (module-level) variable there named __warningregistry__, and
then, inside this dictionary, record just the line number. Basically,
it assumes that any given module contains only one line 1, only one
line 2, etc., so storing the filename is irrelevant. Obviously for
interactive code this is totally wrong -- all cells share the same
execution environment and global namespace, and they all contain a new
line 1. Currently the warnings module treats these as if they were all
the same line.

In fact they are not the same line; once we have executed a given
chunk of code, we will never see those particular lines again. As soon
as a given chunk of code finishes executing, its line number labels
become meaningless, and the corresponding warning registry entries
become meaningless as well. Therefore, with this patch we delete the
__warningregistry__ each time we execute a new block of code.

@minrk minrk added this to the 3.0 milestone Oct 11, 2014
@damianavila
Copy link
Member

Despite the fact that the fix is simple, I love to see this kind of explanation in the PR first message.
Thanks for spending time to do it in this way.

👍 for me.

@njsmith
Copy link
Contributor Author

njsmith commented Oct 11, 2014

@damianavila: Eh, it saves time in the end ;-)

@minrk
Copy link
Member

minrk commented Oct 11, 2014

After the discussion in #6611, this makes sense to me. I'd like to hear from @takluyver on the implementation.

@takluyver
Copy link
Member

Let's not push it into run_code - that's the very innermost core of IPython. Is it practical to do it using a post_execute callback?

@damianavila
Copy link
Member

Is it practical to do it using a post_execute callback?

I don't see a problem... probably safer than the current one.
Let's see what @njsmith think about that...

@minrk
Copy link
Member

minrk commented Oct 12, 2014

Is it practical to do it using a post_execute callback?

That should be fine. Just register it as a hook during initialization.

@njsmith
Copy link
Contributor Author

njsmith commented Oct 13, 2014

There are three separate issues here: (a) to use a hook or just to hard-code it in the file, (b) whether to clear the registry before or after the code runs, (c) whether to clear the registry in run_cell (*_execute hooks) or run_code (some sort of legacy hook system?).

Re: (a): I don't see much point in decoupling for the sake of decoupling -- there's a tradeoff between the benefits of flexibility and the costs of reduce locality making it harder to read and understand the code. In this case, the __warningregistry__ thing isn't something that needs to be configured -- it should happen unconditionally every time we execute some code, so I just put it in the code. This might seem less flexible, but IME it's easier to edit a hardcoded bit source than to disable a hook -- it takes less grepping to find the source itself than to find where a hook is registered :-). But, you know, this is basically a taste issue, and it's your codebase, so!

Re: (b): IMO it's better to clear the __warningregistry__ before executing code rather than after. If our clearing code gets called every single time we execute code, then it doesn't matter whether we do the clearing pre_ or post_ execution. But, it's also possible that some code will be run without our knowledge and pollute the __warningregistry__ with spurious entries. (This is esp. true if we move the logic into run_cell. I don't know under what circumstances people use run_cell vs run_ast_nodes vs run_code, but anyone using the latter two will skip the logic in run_cell.) So if we use a hook we should use a pre_* hook, etc.

Re: (c): My reasoning for pushing the registry clearing down into run_code instead of just run_cell was that every call to exec creates a new line number scope and invalidates the entries in __warningregistry__, so we should clear the registry every time we call exec. And run_code is where we call exec. Pushing it further out creates more situations where code could be run with spurious entries in the __warningregistry__.

So, that's why I wrote the patch the way I did :-). If on reflection you'd prefer something else let me know.

@takluyver
Copy link
Member

Thanks for explaining your reasoning:

  • I would still like to do this with a hook, just for the sake of decoupling. Clearing the warnings registry is not fundamental to running code, and that part of the codebase is already more than complex enough without adding extra concerns to it.
  • In terms of when people use the different methods: I promote only run_cell as something that third parties should use. There might be third parties using run_ast_nodes or run_code, but I would consider those corner cases. All our own uses, as far as I recall, go through run_cell.
  • Pre/post: Whichever you prefer. I don't think we've ever heard about code polluting __warningregistry__, and it seems hard to do accidentally, so I'm not especially concerned about that.

@njsmith
Copy link
Contributor Author

njsmith commented Oct 21, 2014

So I made an attempt to get this working as a hook (see the current version of this PR), and noticed two issues:

  • AFAICT the only place to define and register the hook is still directly inside InteractiveShell code, and the hook code itself is a method on InteractiveShell. So the coupling is not very reduced.
  • The tests are failing because there's a test which calls ip.events.reset_all(), which of course deletes my hook before my test gets a chance to run. IMO if I have a pristine InteractiveShell, then calling ip.events.reset_all() should be a no-op. But there's really no way to make this work with a default hook, unless we start poking at the definition of EventManager itself which ugh.

So..... how do you want to do this?

@takluyver
Copy link
Member

Registering hooks doesn't have to be done inside interactiveshell code - anything that gets a reference to the shell object (like extensions) can use shell.events.register(). In this case, since you want it on all shell objects, doing it inside interactiveshell makes sense.

Maybe the reset methods should be renamed something like clear (events is a new API, so I think we can get away with this before release). The test shouldn't really be doing reset_all(), whatever the name is - it should be taking out only the callbacks it installed. Using reset_all() there was just laziness on my part.

Going one step further, maybe reset() and reset_all() shouldn't even exist. The design of using events is precisely that you shouldn't think about what other callbacks might be triggered by an event, so killing all the callbacks on an event is generally the wrong thing to do. It's especially bad when most events aren't handled by default, so it's easy to fall into using it just to clear a callback you know about. That should probably be a separate PR, though.

@minrk
Copy link
Member

minrk commented Oct 30, 2014

@takluyver it sounds like cleaning up event reset should be a separate PR. What should we do here so this can be merged, and the other stuff investigated separately?

The whole idea of the EventManager is that you can register hooks
without worrying about what hooks other pieces of code might be
registering. The reset methods violate this separation of concerns,
since they will blow away everyone else's hooks too. (See ipythongh-6680 for
an example of this breaking things.) Since there is never any safe way
to use them, we simply remove them entirely.
…code

Fixes ipython#6611.

Idea:

Right now, people often don't see important warnings when running
code in IPython, because (to a first approximation) any given warning
will only issue once per session. Blink and you'll miss it! This is a
very common contributor to confused emails to numpy-discussion. E.g.:

In [5]: 1 / my_array_with_random_contents
/home/njs/.user-python2.7-64bit-3/bin/ipython:1: RuntimeWarning:
divide by zero encountered in divide
  #!/home/njs/.user-python2.7-64bit-3/bin/python
Out[5]:
array([ 1.77073316, -2.29765021, -2.01800811, ...,  1.13871243,
       -1.08302964, -8.6185091 ])

Oo, right, guess I gotta be careful of those zeros -- thanks, numpy,
for giving me that warning!

A few days later:

In [592]: 1 / some_other_array
Out[592]:
array([ 3.07735763,  0.50769289,  0.83984078, ..., -0.67563917,
       -0.85736257, -1.36511271])

Oops, it turns out that this array had a zero in it too, and that's
going to bite me later. But no warning this time!

The effect of this commit is to make it so that warnings triggered by
the code in cell 5 do *not* suppress warnings triggered by the code in
cell 592. Note that this only applies to warnings triggered *directly*
by code entered interactively -- if somepkg.foo() calls
anotherpkg.bad_func() which issues a warning, then this warning will
still only be displayed once, even if multiple cells call
somepkg.foo(). But if cell 5 and cell 592 both call
anotherpkg.bad_func() directly, then both will get warnings.

(Important exception: if foo() is defined *interactively*, and calls
anotherpkg.bad_func(), then every cell that calls foo() will display
the warning again. This is unavoidable without fixes to CPython
upstream.)

Explanation:

Python's warning system has some weird quirks. By default, it tries to
suppress duplicate warnings, where "duplicate" means the same warning
message triggered twice by the same line of code. This requires
determining which line of code is responsible for triggering a
warning, and this is controlled by the stacklevel= argument to
warnings.warn. Basically, though, the idea is that if foo() calls
bar() which calls baz() which calls some_deprecated_api(), then baz()
will get counted as being "responsible", and the warning system will
make a note that the usage of some_deprecated_api() inside baz() has
already been warned about and doesn't need to be warned about
again. So far so good.

To accomplish this, obviously, there has to be a record of somewhere
which line this was. You might think that this would be done by
recording the filename:linenumber pair in a dict inside the warnings
module, or something like that. You would be wrong.

What actually happens is that the warnings module will use stack
introspection to reach into baz()'s execution environment, create a
global (module-level) variable there named __warningregistry__, and
then, inside this dictionary, record just the line number. Basically,
it assumes that any given module contains only one line 1, only one
line 2, etc., so storing the filename is irrelevant. Obviously for
interactive code this is totally wrong -- all cells share the same
execution environment and global namespace, and they all contain a new
line 1. Currently the warnings module treats these as if they were all
the same line.

In fact they are not the same line; once we have executed a given
chunk of code, we will never see those particular lines again. As soon
as a given chunk of code finishes executing, its line number labels
become meaningless, and the corresponding warning registry entries
become meaningless as well. Therefore, with this patch we delete the
__warningregistry__ each time we execute a new block of code.
@njsmith njsmith force-pushed the reset-warning-registry branch from 48f4d74 to 61431d7 Compare October 31, 2014 02:17
@njsmith
Copy link
Contributor Author

njsmith commented Oct 31, 2014

As a start, I went ahead and made a commit that just removes the reset APIs. This PR currently contains that commit plus the __warningregistry__ commit (click the "commits" tab if you want to review them separately). We could make them into two PRs I guess if that would make everyone happier, though if they look good then it might be simpler just to merge them both together here.

@minrk
Copy link
Member

minrk commented Oct 31, 2014

I'm fine with this as-is. @takluyver?

@takluyver
Copy link
Member

Yep. Thanks for your patience, @njsmith .

takluyver added a commit that referenced this pull request Oct 31, 2014
Reset the interactive namespace __warningregistry__ before executing code
@takluyver takluyver merged commit f3be3b1 into ipython:master Oct 31, 2014
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
The whole idea of the EventManager is that you can register hooks
without worrying about what hooks other pieces of code might be
registering. The reset methods violate this separation of concerns,
since they will blow away everyone else's hooks too. (See ipythongh-6680 for
an example of this breaking things.) Since there is never any safe way
to use them, we simply remove them entirely.
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Reset the interactive namespace __warningregistry__ before executing code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

We should clear interactive namespace's __warningregistry__ between executing each cell

6 participants