Skip to content

Comments

MNT: remove redundant normalization from set#30979

Closed
rcomer wants to merge 1 commit intomatplotlib:mainfrom
rcomer:set-normalize
Closed

MNT: remove redundant normalization from set#30979
rcomer wants to merge 1 commit intomatplotlib:mainfrom
rcomer:set-normalize

Conversation

@rcomer
Copy link
Member

@rcomer rcomer commented Jan 16, 2026

PR summary

Inspired by #30957

The normalization is not needed here because the alias forms of the set_* methods can be called. So we can save ourselves microseconds 🚀

Before:

In [5]: %timeit line.set(lw=3, color='r', ls='--')
12.3 μs ± 116 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

After:

In [8]: %timeit line.set(lw=3, color='r', ls='--')
10.9 μs ± 76.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

QuLogic
QuLogic previously approved these changes Jan 17, 2026
@timhoffm
Copy link
Member

From a quick look, I haven't seen a test for this. Please add one. I think Artist itself does not have aliases, so either you would have to create a sublcass with aliases for the test or just use one of the subclasses like

line = Line2D([0, 1], [0, 1])
line.set(lw=5)
assert line.get_linewidth() == 5

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the _api.define_aliases decorator automatically creates the alias getters and setters. So we are safe that all the alias setters exist and normalization is indeed not necessary.

Btw. do we test set() at all? If not, you may just add the long form here as well

line.set(linewidth=6)
assert line.get_linewidth() == 6

@rcomer
Copy link
Member Author

rcomer commented Jan 17, 2026

Btw. do we test set() at all?

Not that I can see, though it is of course tested indirectly many times.

@timhoffm timhoffm mentioned this pull request Jan 17, 2026
5 tasks
@anntzer
Copy link
Contributor

anntzer commented Jan 17, 2026

A difference is that if going through normalize_kwargs, passing {"linewidth":, 1, "lw": 2} will raise an error because it sees that two aliases of the same property are passed (and it explicitly checks for that), whereas directly calling the setters will just let the second value win. I don't have a strong opinion right now as to whether this actually matters, or whether it's even worth worrying about this.

@rcomer rcomer marked this pull request as draft January 17, 2026 19:20
@rcomer
Copy link
Member Author

rcomer commented Jan 17, 2026

Thanks @anntzer I think it should error for that case. I will wait to see if anyone thinks differently and, if, not, reinstate normalize_kwargs here and add a check for the error to the new test.

@timhoffm
Copy link
Member

timhoffm commented Jan 18, 2026

Note that we are not consistent with duplicating through aliases

In [33]: plt.plot([0, 1], lw=2, linewidth=3)

TypeError: Got both 'lw' and 'linewidth', which are aliases of one another

but

In [29]: line = Line2D([], [], lw=2, linewidth=3)

In [30]: line.get_linewidth()
Out[30]: 2.0

In [31]: line = Line2D([], [], linewidth=2, lw=3)

In [32]: line.get_linewidth()
Out[32]: 3.0

So, plot() does a normalization and warns, but Line2D.__init__ does not warn. Surprisingly, lw always wins - That's an implementation detail as linewidth is an explicit parameter, whereas lw comes in via kwargs. __init__ first sets all explicit parameters and then does self._internal_update(kwargs), with which lw overwrites the previously set linewidth. I'm quite sure this is accidental behavior and not intentionally designed.

I would have said that it is acceptable for set() to not warn on the basis that set(a=A, b=B) is just set_a(A); set_b(B) and if these parameters interact (e.g. because they overwrite each other, but could also be other interactions), that is your responsibility. Warning may be nice, but I'm not sure we need to jump extra hoops for that.

But this all shows that we should have a consistent concept for aliases:

Should they exist as dedicated methods? i.e. do we want set_lw()?

I don't think we necessarily need them as a public interface. If somebody bothers to resort to set_* methods, they could write out the full name. The abbreviations are primarily useful for kwargs.

The benefit of the alias functions is that we can pass aliases all the way through to their function. This feels cheaper because no remapping is needed. But cost would really need a dedicated investigation. If we just do a remapping (no duplicacy check), that's basically a dict-lookup, which is extremely fast. It's likely even faster than calling the alias setter, because that in turn has to lookup and call the actual property setter. But there could be a caveat that if we have to do normalization at every step of a call hierarchy, that could still become expensive.

The downside of not normalizing and falling through to the respective setter is that unnormalized state can have surprising effects. Which parameter eventually wins can depend on implementation details like in the Line2D.__init__.

When do we want to do normalization
Should this eventually happen for every API call? If so, we must normalize on the most inner level, i.e. in the public API of the Artists. We could but don't have to normalize in outer API layers like plot(). Some of these still need/use normalization to be able to conflate or sanity check parameters, e.g. mashing up c, color, facecolor, fc state in scatter().

Do we want duplicacy checks?
i.e. warn when both lw and linewidth are supplied. That's an additional cost and more costly than normalization, because you have to do comparison with all parameters. If we want to check for duplicacy, we obviously need to do this before each normalization. Alternative to duplicacy checking would be putting the inputs in the responsibility of the user. One can argue, as above, that we just apply all parameters - either with defined priorities (e.g. original/alias name wins, first/last name wins) - or stating that priority is undefined.

Overall, I'm not clear what is best. I think we should defer this to 3.12. It's nice to clean this mess up, but it has been as is for decades, so no need to hurry.

@rcomer
Copy link
Member Author

rcomer commented Jan 18, 2026

😱

I should have learned by now that nothing is as simple as it first appears!

Should they exist as dedicated methods? i.e. do we want set_lw()?

I don't think we necessarily need them as a public interface. If somebody bothers to resort to set_* methods, they could write out the full name. The abbreviations are primarily useful for kwargs.

I agree. I actually hadn’t realised that they exist until this PR. I don’t think I would have otherwise gone looking for them. These days most people use tab-completion so they won’t save many keystrokes.

@anntzer
Copy link
Contributor

anntzer commented Jan 18, 2026

That's an implementation detail as linewidth is an explicit parameter, whereas lw comes in via kwargs. init first sets all explicit parameters and then does self._internal_update(kwargs)

I believe that the technically correct solution is to not include any property in the actual constructor signature (the documented and displayed signature can and should likely be different, but that can be done via the usual signature-rewriting tricks that we already use for .set()); we should just capture all kwargs in a kwargs dict and forward them to set().

@rcomer
Copy link
Member Author

rcomer commented Jan 18, 2026

Do we want duplicacy checks?
i.e. warn when both lw and linewidth are supplied. That's an additional cost and more costly than normalization, because you have to do comparison with all parameters. If we want to check for duplicacy, we obviously need to do this before each normalization.

We can skip that check if the normalized dictionary is the same size as the original dictionary

--- a/lib/matplotlib/cbook.py
+++ b/lib/matplotlib/cbook.py
@@ -1862,18 +1862,19 @@ def normalize_kwargs(kw, alias_mapping=None):
     to_canonical = {alias: canonical
                     for canonical, alias_list in alias_mapping.items()
                     for alias in alias_list}
-    canonical_to_seen = {}
-    ret = {}  # output dictionary
 
+    ret = {to_canonical.get(k, k): v for k, v in kw.items()}  # output dictionary
+    if len(ret) == len(kw):
+        return ret
+
+    # look for duplicates
+    canonical_to_seen = {}
     for k, v in kw.items():
         canonical = to_canonical.get(k, k)
         if canonical in canonical_to_seen:
             raise TypeError(f"Got both {canonical_to_seen[canonical]!r} and "
                             f"{k!r}, which are aliases of one another")
         canonical_to_seen[canonical] = k
-        ret[canonical] = v
-
-    return ret

However, with kwargs_good = {'lw': 3, 'color': 'r', 'linestyle': 'dashed'}, that only saved me 0.1 of a microsecond

main

In [8]: %timeit normalize_kwargs(kwargs_good, Line2D)
1.51 μs ± 8.46 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

branch

In [4]: %timeit normalize_kwargs(kwargs_good, Line2D)
1.39 μs ± 9.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@anntzer
Copy link
Contributor

anntzer commented Jan 18, 2026

If we care about performance then the biggest bottleneck actually appears to be the regeneration of to_canonical again and again, ie. caching that gives me a >2x speedup on @rcomer's example:

diff --git i/lib/matplotlib/cbook.py w/lib/matplotlib/cbook.py
index 2e416486ba..34bc774aa7 100644
--- i/lib/matplotlib/cbook.py
+++ w/lib/matplotlib/cbook.py
@@ -1820,6 +1820,13 @@ def _resize_sequence(seq, N):
         return list(itertools.islice(itertools.cycle(seq), N))
 
 
+@functools.cache
+def _fix_alias_map(cls):
+    return {alias: canonical
+            for canonical, alias_list in getattr(cls, "_alias_map", {}).items()
+            for alias in alias_list}
+
+
 def normalize_kwargs(kw, alias_mapping=None):
     """
     Helper function to normalize kwarg inputs.
@@ -1852,28 +1859,32 @@ def normalize_kwargs(kw, alias_mapping=None):
     if kw is None:
         return {}
 
-    # deal with default value of alias_mapping
+    # deal with default value of alias_mapping -- could be deprecated away.
     if alias_mapping is None:
-        alias_mapping = {}
-    elif (isinstance(alias_mapping, type) and issubclass(alias_mapping, Artist)
-          or isinstance(alias_mapping, Artist)):
-        alias_mapping = getattr(alias_mapping, "_alias_map", {})
-
-    to_canonical = {alias: canonical
-                    for canonical, alias_list in alias_mapping.items()
-                    for alias in alias_list}
+        return kw
+
+    if isinstance(alias_mapping, Artist):
+        alias_mapping = type(alias_mapping)
+
+    if isinstance(alias_mapping, type) and issubclass(alias_mapping, Artist):
+        to_canonical = _fix_alias_map(alias_mapping)
+    else:
+        to_canonical = {alias: canonical
+                        for canonical, alias_list in alias_mapping.items()
+                        for alias in alias_list}
+
+    ret = {to_canonical.get(k, k): v for k, v in kw.items()}  # output dictionary
+    if len(ret) == len(kw):
+        return ret
+
+    # look for duplicates
     canonical_to_seen = {}
-    ret = {}  # output dictionary
-
     for k, v in kw.items():
         canonical = to_canonical.get(k, k)
         if canonical in canonical_to_seen:
             raise TypeError(f"Got both {canonical_to_seen[canonical]!r} and "
                             f"{k!r}, which are aliases of one another")
         canonical_to_seen[canonical] = k
-        ret[canonical] = v
-
-    return ret
 
 
 @contextlib.contextmanager

The patch above is just a quick try but I suspect the "correct" approach is to change the format of Artist._alias_map (as set up by define_aliases) to directly be in the "efficient" format ({alias: canonical, alias: canonical, ...}) (with whatever deprecation dance is necessary, although most of the relevant stuff is private).

@timhoffm
Copy link
Member

I believe that the technically correct solution is to not include any property in the actual constructor signature (the documented and displayed signature can and should likely be different, but that can be done via the usual signature-rewriting tricks that we already use for .set()); we should just capture all kwargs in a kwargs dict and forward them to set().

I fundamentally agree. Two additional thoughts to consider:

  • How does completion work nowadays? At what exactly are they looking? It would be nice to support that. OTOH constructors are rarely called directy. But possibly the same argument will hold for our plotting functions?
  • How costly is signature rewriting? It's an import-time cost and a few won't matter, but if we do this large-scale on the library we should at least have a guess whether that's significant.

@rcomer
Copy link
Member Author

rcomer commented Jan 18, 2026

  • How does completion work nowadays? At what exactly are they looking? It would be nice to support that. OTOH constructors are rarely called directy. But possibly the same argument will hold for our plotting functions?

A quick check with ipython and vscode show they give me no help for passing linewidth to plot, but for Line2D tab-completion on linewidth works nicely. This is probably the wrong way around!

I am not too experienced with decorators, but I wonder if it's possible to make one that injects the canonical properties of an artist into a call sequence, and passes them as **kwargs to the inner function.

@timhoffm
Copy link
Member

timhoffm commented Jan 18, 2026

A quick chat with AI reveals the following - to be checked, but plausible behavior:

Aspect IPython VS Code
Execution model Runtime Static
Parameter source inspect.signature , __text_signature__ AST + .pyi
Works without running code No Yes

So for static analyzers, we likely have to tune our .pyi files while for runtime inspection, we need to adapt the signature via decorators. Both approaches should be compatible with each other and with still actually using **kwargs in the atcual code logic.

@scottshambaugh
Copy link
Contributor

scottshambaugh commented Jan 20, 2026

I'm seeing big performance gains in #31000 from skipping the heaviest use of this code path. So if the discussion here is borderline on keeping/changing the behavior then it's worth flagging that the performance hit from keeping it is going to be pretty minimal after that merges. It sounds mostly philosophical though. :)

@timhoffm
Copy link
Member

My primary intention here is to come up with a concept: How do we get from short and long kwargs (in Artist.init, set, plotting functions) to having the right state set. Aspects to that are

  • performance: This is a quite common operation (though possibly often with few or even 0 parameters).
  • clarity: We should have a clear mental model how we handle this, ideally with not too much code
  • usability: In particular support of completion.

Performance is still a design criteria. If possible, the architecture should be efficient. So that we are not too dependent on special shortcuts. #31000 brings benefits on _set_cm, which is great and good to know, but doesn't alleviate the need for a fundamental concept.

@QuLogic QuLogic dismissed their stale review January 23, 2026 00:51

Seems to need more discussion.

@timhoffm
Copy link
Member

timhoffm commented Jan 23, 2026

The patch above is just a quick try but I suspect the "correct" approach is to change the format of Artist._alias_map (as set up by define_aliases) to directly be in the "efficient" format ({alias: canonical, alias: canonical, ...}) (with whatever deprecation dance is necessary, although most of the relevant stuff is private).

If _alias_map is still needed somewhere, it's also not the end of the world to additionally store a _reverse_alias_map. That can be generated right away in define_aliases. It's a little bit additional import-time effort, but we can easily affort a dict comprehension compared to all the other stuff we do.

@timhoffm
Copy link
Member

timhoffm commented Jan 23, 2026

Overall the strategy here should be:

Short-term:
make normalize_kwargs faster through a pre-built reverse_alias_map #30979 (comment) #30979 (comment)
Done in #31023.

Mid-term: (larger project)
Revisit the architecture and handling of artist properties, including

@timhoffm
Copy link
Member

timhoffm commented Feb 2, 2026

@rcomer I propose to close this. #31023 has made the normalization faster and any further changes are likely better addessed as part of a systematic reevaluation of the property handling, which we'll track in #31055.

@rcomer rcomer closed this Feb 2, 2026
@rcomer rcomer deleted the set-normalize branch February 2, 2026 09:43
@rcomer rcomer mentioned this pull request Feb 2, 2026
5 tasks
@QuLogic QuLogic removed this from the v3.12.0 milestone Feb 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants