PERF: Text handling speedups by scottshambaugh · Pull Request #31001 · matplotlib/matplotlib

scottshambaugh · 2026-01-20T02:21:42Z

PR summary

I've been having a lot of fun profiling the past two days. This PR is the result of optimizing slow bits of the text rendering code paths that are called downstream of axis3d._draw_ticks(). None of these changes are 3D specific, so they should speed up 2D draw times as well. The non-agg-rendering code in this part of the stack is sped up by a cumulative 2.2x, which is an 8% reduction in the total draw time for my test script of an empty 3D plot.

The commits are all self-contained, so I can break them apart if that's easier to review.

~~The font property cache is the change where I'm least confident in my understanding of the original design decisions, but it's simpler and 2x faster.~~ The new __copy__ method partially speeds things up if we want to keep the original structure instead.

Summary of the changes:

text.py:

~~Rework the font property cache to use a plain dict instead of lru_cache~~
Add @lru_cache for rotation transforms via a _rotate(theta) helper function (common case is only a few angles)
Add fast path to skip rotation transform operations when rotation=0 (the most common case)
Use direct indexing instead of numpy array operations for several small lists

font_manager.py:

Implement __copy__ method on FontProperties that bypasses __init__ validation
Make __hash__ more robust to new attrs
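
The two font_manager.py items can be sketched on a stand-in class. `FontPropsSketch` is hypothetical and is not matplotlib's FontProperties; the attribute names and the choice to hash every instance attribute are assumptions made for illustration.

```python
import copy


class FontPropsSketch:
    """Hypothetical stand-in for a validated property bag."""

    init_calls = 0  # counts how often validation in __init__ has run

    def __init__(self, family="sans-serif", size=10.0):
        type(self).init_calls += 1
        if float(size) <= 0:
            raise ValueError("size must be positive")
        self._family = str(family)
        self._size = float(size)

    def __copy__(self):
        # Build the instance without running __init__, then copy the
        # already-validated state across in one step.
        new = object.__new__(type(self))
        new.__dict__.update(self.__dict__)
        return new

    def __eq__(self, other):
        return type(other) is type(self) and vars(other) == vars(self)

    def __hash__(self):
        # Hash every instance attribute, so an attribute added later is
        # picked up without editing this method (values must be hashable).
        return hash(tuple(sorted(vars(self).items())))


original = FontPropsSketch("serif", 12)
clone = copy.copy(original)  # dispatches to __copy__; __init__ is not re-run
```

The copy is safe to skip validation only because the source object was already validated when it was constructed.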

lines.py:

Add fast path for same-shape x/y arrays using direct assignment instead of broadcast_arrays
Replace .T unpacking with column slicing for views
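
As a rough sketch of the lines.py fast path (assuming NumPy; `stack_xy` is a hypothetical name, and the exact condition used in the PR may differ): when x and y are 1D arrays of the same shape, no broadcasting is needed, so the (n, 2) array can be filled directly.

```python
import numpy as np


def stack_xy(x, y):
    """Combine x and y into an (n, 2) float array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1 and x.shape == y.shape:
        # Fast path: preallocate and assign each column directly.
        xy = np.empty((len(x), 2), dtype=float)
        xy[:, 0] = x
        xy[:, 1] = y
        return xy
    # General path: let NumPy broadcast mismatched shapes first.
    return np.column_stack(np.broadcast_arrays(x, y)).astype(float)


xy = stack_xy([1, 2, 3], [4, 5, 6])
x_view = xy[:, 0]  # column slicing returns a view, not a copy
```

Because `x_view` shares memory with `xy`, writing to one is visible through the other.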

path.py:

~~Inline shape validation instead of calling _api.check_shape~~

transforms.py:

Type the array construction in Bbox.from_extents

Before:

After (less time on things that aren't draw_text):

Test script:

```python
import time
import matplotlib.pyplot as plt

print("Starting...")

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

print("Timing...")

start_time = time.perf_counter()
for i in range(250):
    ax.view_init(elev=i, azim=i)
    fig.canvas.draw()
end_time = time.perf_counter()

plt.close()

print(f"Time taken: {end_time - start_time:.4f} seconds")
```

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

scottshambaugh · 2026-01-20T03:20:18Z

Ready for review. @anntzer FYI - you're probably the most familiar with the text sections here

anntzer · 2026-01-29T16:51:37Z

Do you want to also include the simplification(/speedup) mentioned at #31000 (review) (don't bother with cm_set() in Text.draw)? I can also make a separate PR for that if you prefer.

scottshambaugh · 2026-02-02T19:51:50Z

Removed the wrapped text context manager

lib/matplotlib/path.py

lib/matplotlib/text.py

Fix tests

More robust BBox creation

lib/matplotlib/text.py

anntzer · 2026-02-07T17:48:24Z

lib/matplotlib/lines.py

        else:
            y = self._y

-        self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float)


There's a fair number of uses of column_stack in the codebase which are always converting two 1D arrays into a (n, 2) 2D array (sometimes also relying on broadcasting) that could probably benefit from a similar treatment; perhaps factor this pattern to a helper function and use it throughout?

I looked into it and found that vstack(np.broadcast_arrays(x, y)).T is actually the same speed and a little less verbose. column_stack() is generally slower than vstack().T because the former has to interleave elements in memory, whereas the latter does contiguous memory copies and returns a view.

10,000 elements: 10 runs x 10,000 iterations

With broadcast:
- `np.column_stack(np.broadcast_arrays(x, y))`: 36.47 us
- `np.vstack(np.broadcast_arrays(x, y)).T`: 27.67 us
- `np.empty + assign`: 30.09 us

Without broadcast:
- `np.column_stack([x, y])`: 20.63 us
- `np.vstack((x, y)).T`: 13.18 us
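
The layout difference between the two approaches can be checked with a small sketch (assuming NumPy; the array size is arbitrary and this is not the benchmark script behind the numbers in this thread). Both calls produce the same values, but the results are laid out differently in memory.

```python
import numpy as np

x = np.arange(4.0)
y = x * 2

a = np.column_stack([x, y])  # new (n, 2) array, x and y interleaved row by row
b = np.vstack((x, y)).T      # transposed view of a (2, n) array

same_values = np.array_equal(a, b)
a_is_c_contiguous = a.flags.c_contiguous  # rows are contiguous
b_is_c_contiguous = b.flags.c_contiguous  # not row-contiguous
b_is_f_contiguous = b.flags.f_contiguous  # columns are contiguous instead
b_is_view = b.base is not None            # .T did not copy the data
```

Code that later needs row-contiguous data may end up copying `b`, so which form is faster overall can depend on how the array is consumed downstream.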

I went and updated this call, but I think a broader conversion is best suited for another PR.

Issue here: #31130

Given the discussion at #31130 let's just revert this for now and postpone the discussion? I'll approve the PR without this change.

Revert "Prefer np.vstack().T to np.column_stack() for speed"

This reverts commit 2e32436.

Simplify column stack

timhoffm · 2026-02-11T10:55:20Z

lib/matplotlib/text.py

-                halign = self._ha_for_angle(angle)
+                halign = self._ha_for_angle(rotation)


I find this renaming unfortunate. Since we deal a lot with transformations, rotation is quite ambiguous. I see that the change comes from rotation = self.get_rotation(), but on the lower level here rotation is too imprecise. I suggest either staying with angle or being very explicit and switching to rotation_angle.

angle seems fine to me.

github-actions bot added topic: text topic: path handling topic: transforms and scales labels Jan 20, 2026

scottshambaugh force-pushed the text_speedups branch from 267a86f to 6a7f558 Compare January 20, 2026 02:24

scottshambaugh added the Performance label Jan 20, 2026

scottshambaugh force-pushed the text_speedups branch 2 times, most recently from fe70db4 to f35c71d Compare January 20, 2026 03:08

github-actions bot added the topic: images label Jan 20, 2026

scottshambaugh marked this pull request as ready for review January 20, 2026 03:19