MNT: remove redundant normalization from set#30979
MNT: remove redundant normalization from set#30979rcomer wants to merge 1 commit intomatplotlib:mainfrom
Conversation
|
From a quick look, I haven't seen a test for this. Please add one. I think Artist itself does not have aliases, so either you would have to create a sublcass with aliases for the test or just use one of the subclasses like |
There was a problem hiding this comment.
For context, the _api.define_aliases decorator automatically creates the alias getters and setters. So we are safe that all the alias setters exist and normalization is indeed not necessary.
Btw. do we test set() at all? If not, you may just add the long form here as well
line.set(linewidth=6)
assert line.get_linewidth() == 6
Not that I can see, though it is of course tested indirectly many times. |
|
A difference is that if going through normalize_kwargs, passing {"linewidth":, 1, "lw": 2} will raise an error because it sees that two aliases of the same property are passed (and it explicitly checks for that), whereas directly calling the setters will just let the second value win. I don't have a strong opinion right now as to whether this actually matters, or whether it's even worth worrying about this. |
|
Thanks @anntzer I think it should error for that case. I will wait to see if anyone thinks differently and, if, not, reinstate |
|
Note that we are not consistent with duplicating through aliases In [33]: plt.plot([0, 1], lw=2, linewidth=3)
TypeError: Got both 'lw' and 'linewidth', which are aliases of one anotherbut In [29]: line = Line2D([], [], lw=2, linewidth=3)
In [30]: line.get_linewidth()
Out[30]: 2.0
In [31]: line = Line2D([], [], linewidth=2, lw=3)
In [32]: line.get_linewidth()
Out[32]: 3.0So, I would have said that it is acceptable for But this all shows that we should have a consistent concept for aliases: Should they exist as dedicated methods? i.e. do we want I don't think we necessarily need them as a public interface. If somebody bothers to resort to The benefit of the alias functions is that we can pass aliases all the way through to their function. This feels cheaper because no remapping is needed. But cost would really need a dedicated investigation. If we just do a remapping (no duplicacy check), that's basically a dict-lookup, which is extremely fast. It's likely even faster than calling the alias setter, because that in turn has to lookup and call the actual property setter. But there could be a caveat that if we have to do normalization at every step of a call hierarchy, that could still become expensive. The downside of not normalizing and falling through to the respective setter is that unnormalized state can have surprising effects. Which parameter eventually wins can depend on implementation details like in the When do we want to do normalization Do we want duplicacy checks? Overall, I'm not clear what is best. I think we should defer this to 3.12. It's nice to clean this mess up, but it has been as is for decades, so no need to hurry. |
|
😱 I should have learned by now that nothing is as simple as it first appears!
I agree. I actually hadn’t realised that they exist until this PR. I don’t think I would have otherwise gone looking for them. These days most people use tab-completion so they won’t save many keystrokes. |
I believe that the technically correct solution is to not include any property in the actual constructor signature (the documented and displayed signature can and should likely be different, but that can be done via the usual signature-rewriting tricks that we already use for |
We can skip that check if the normalized dictionary is the same size as the original dictionary --- a/lib/matplotlib/cbook.py
+++ b/lib/matplotlib/cbook.py
@@ -1862,18 +1862,19 @@ def normalize_kwargs(kw, alias_mapping=None):
to_canonical = {alias: canonical
for canonical, alias_list in alias_mapping.items()
for alias in alias_list}
- canonical_to_seen = {}
- ret = {} # output dictionary
+ ret = {to_canonical.get(k, k): v for k, v in kw.items()} # output dictionary
+ if len(ret) == len(kw):
+ return ret
+
+ # look for duplicates
+ canonical_to_seen = {}
for k, v in kw.items():
canonical = to_canonical.get(k, k)
if canonical in canonical_to_seen:
raise TypeError(f"Got both {canonical_to_seen[canonical]!r} and "
f"{k!r}, which are aliases of one another")
canonical_to_seen[canonical] = k
- ret[canonical] = v
-
- return retHowever, with
In [8]: %timeit normalize_kwargs(kwargs_good, Line2D)
1.51 μs ± 8.46 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [4]: %timeit normalize_kwargs(kwargs_good, Line2D)
1.39 μs ± 9.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) |
|
If we care about performance then the biggest bottleneck actually appears to be the regeneration of to_canonical again and again, ie. caching that gives me a >2x speedup on @rcomer's example: diff --git i/lib/matplotlib/cbook.py w/lib/matplotlib/cbook.py
index 2e416486ba..34bc774aa7 100644
--- i/lib/matplotlib/cbook.py
+++ w/lib/matplotlib/cbook.py
@@ -1820,6 +1820,13 @@ def _resize_sequence(seq, N):
return list(itertools.islice(itertools.cycle(seq), N))
+@functools.cache
+def _fix_alias_map(cls):
+ return {alias: canonical
+ for canonical, alias_list in getattr(cls, "_alias_map", {}).items()
+ for alias in alias_list}
+
+
def normalize_kwargs(kw, alias_mapping=None):
"""
Helper function to normalize kwarg inputs.
@@ -1852,28 +1859,32 @@ def normalize_kwargs(kw, alias_mapping=None):
if kw is None:
return {}
- # deal with default value of alias_mapping
+ # deal with default value of alias_mapping -- could be deprecated away.
if alias_mapping is None:
- alias_mapping = {}
- elif (isinstance(alias_mapping, type) and issubclass(alias_mapping, Artist)
- or isinstance(alias_mapping, Artist)):
- alias_mapping = getattr(alias_mapping, "_alias_map", {})
-
- to_canonical = {alias: canonical
- for canonical, alias_list in alias_mapping.items()
- for alias in alias_list}
+ return kw
+
+ if isinstance(alias_mapping, Artist):
+ alias_mapping = type(alias_mapping)
+
+ if isinstance(alias_mapping, type) and issubclass(alias_mapping, Artist):
+ to_canonical = _fix_alias_map(alias_mapping)
+ else:
+ to_canonical = {alias: canonical
+ for canonical, alias_list in alias_mapping.items()
+ for alias in alias_list}
+
+ ret = {to_canonical.get(k, k): v for k, v in kw.items()} # output dictionary
+ if len(ret) == len(kw):
+ return ret
+
+ # look for duplicates
canonical_to_seen = {}
- ret = {} # output dictionary
-
for k, v in kw.items():
canonical = to_canonical.get(k, k)
if canonical in canonical_to_seen:
raise TypeError(f"Got both {canonical_to_seen[canonical]!r} and "
f"{k!r}, which are aliases of one another")
canonical_to_seen[canonical] = k
- ret[canonical] = v
-
- return ret
@contextlib.contextmanagerThe patch above is just a quick try but I suspect the "correct" approach is to change the format of Artist._alias_map (as set up by |
I fundamentally agree. Two additional thoughts to consider:
|
A quick check with ipython and vscode show they give me no help for passing linewidth to I am not too experienced with decorators, but I wonder if it's possible to make one that injects the canonical properties of an artist into a call sequence, and passes them as **kwargs to the inner function. |
|
A quick chat with AI reveals the following - to be checked, but plausible behavior:
So for static analyzers, we likely have to tune our .pyi files while for runtime inspection, we need to adapt the signature via decorators. Both approaches should be compatible with each other and with still actually using |
|
I'm seeing big performance gains in #31000 from skipping the heaviest use of this code path. So if the discussion here is borderline on keeping/changing the behavior then it's worth flagging that the performance hit from keeping it is going to be pretty minimal after that merges. It sounds mostly philosophical though. :) |
|
My primary intention here is to come up with a concept: How do we get from short and long kwargs (in Artist.init, set, plotting functions) to having the right state set. Aspects to that are
Performance is still a design criteria. If possible, the architecture should be efficient. So that we are not too dependent on special shortcuts. #31000 brings benefits on _set_cm, which is great and good to know, but doesn't alleviate the need for a fundamental concept. |
If |
|
Overall the strategy here should be: Short-term: Mid-term: (larger project)
|
PR summary
Inspired by #30957
The normalization is not needed here because the alias forms of the
set_*methods can be called. So we can save ourselves microseconds 🚀Before:
After: