PERF: Text handling speedups#31001
Conversation
267a86f to
6a7f558
Compare
fe70db4 to
f35c71d
Compare
|
Ready for review. @anntzer FYI - you're probably the most familiar with the text sections here |
81dfd7e to
810db09
Compare
|
Do you want to also include the simplification(/speedup) mentioned at #31000 (review) (don't bother with cm_set() in Text.draw)? I can also make a separate PR for that if you prefer. |
c0941a2 to
4695392
Compare
|
Removed the wrapped text context manager |
Fix tests
More robust BBox creation
c40ef34 to
55e11b4
Compare
24eb2c1 to
78a46cd
Compare
| else: | ||
| y = self._y | ||
|
|
||
| self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float) |
There was a problem hiding this comment.
There's a fair number of uses of column_stack in the codebase which are always converting two 1D arrays into a (n, 2) 2D array (sometimes also relying on broadcasting) that could probably benefit from a similar treatment; perhaps factor this pattern to a helper function and use it throughout?
There was a problem hiding this comment.
I looked into it, and found that vstack(np.broadcast_arrays(x, y)).T is actually the same speed, and a little less verbose. The reason that column_stack() is generally slower than vstack().T, is since the former has to interleave elements in memory whereas the second does contiguous memory copies and returns a view.
10,000 elements: 10 runs x 10,000 iterations
With broadcast:
- `np.column_stack(np.broadcast_arrays(x, y))`: 36.47 us
- `np.vstack(np.broadcast_arrays(x, y)).T`: 27.67 us
- `np.empty + assign`: 30.09 us
Without broadcast:
- `np.column_stack([x, y])`: 20.63 us
- `np.vstack((x, y)).T`: 13.18 us
I went and updated this call, but I think a broader conversion is best suited for another PR.
There was a problem hiding this comment.
Given the discussion at #31130 let's just revert this for now and postpone the discussion? I'll approve the PR without this change.
Revert "Prefer np.vstack().T to np.column_stack() for speed" This reverts commit 2e32436. Simplify column stack
2e32436 to
193a794
Compare
| halign = self._ha_for_angle(angle) | ||
| halign = self._ha_for_angle(rotation) |
There was a problem hiding this comment.
I find this renaming unfortunate. Since we deal a lot with transformations rotation is quite ambiguous. I see that the change comes from roation = self.get_rotation(), on the lower level here rotation is too imprecise. I suggest either to stay with angle or be very explcit and switch to rotation_angle.
PR summary
I've been having a lot of fun profiling the past two days. This PR is the result of optimizing slow bits of the text rendering code paths that are called downstream of
axis3d._draw_ticks(). None of these changes are 3D specific, so they should speed up 2D draw times as well. The non-agg-rendering code in this part of the stack is sped up by a cumulative 2.2x, which is an 8% reduction in the total draw time for my test script of an empty 3D plot.The commits are all self-contained, so I can break them apart if that's easier to review.
The font property cache is the change where I'm least confident in my understanding of the original design decisions, but it's simpler and 2x faster.The new__copy__method does help partially speed things up if we want to keep the original structure insteadSummary of the changes:
text.py:Rework the font property cache to use a plain dict instead oflru_cache@lru_cachefor rotation transforms via a_rotate(theta)helper function (common case is only a few angles)font_manager.py:__copy__method onFontPropertiesthat bypasses__init__validation__hash__more robust to new attrslines.pybroadcast_arrays.Tunpacking with column slicing for viewspath.pyInline shape validation instead of calling_api.check_shapetransforms.pyBbox.from_extentsBefore:

After (less time on things that aren't

draw_text):Test script:
PR checklist