PERF: Text handling speedups #31001

Open
scottshambaugh wants to merge 15 commits into matplotlib:main from scottshambaugh:text_speedups

Conversation

@scottshambaugh scottshambaugh commented Jan 20, 2026

PR summary

I've been having a lot of fun profiling the past two days. This PR is the result of optimizing slow bits of the text rendering code paths that are called downstream of axis3d._draw_ticks(). None of these changes are 3D specific, so they should speed up 2D draw times as well. The non-agg-rendering code in this part of the stack is sped up by a cumulative 2.2x, which is an 8% reduction in the total draw time for my test script of an empty 3D plot.

The commits are all self-contained, so I can break them apart if that's easier to review.

The font property cache is the change where I'm least confident in my understanding of the original design decisions, but the new version is simpler and 2x faster. If we want to keep the original structure instead, the new __copy__ method still provides part of the speedup.

Summary of the changes:

text.py:

  • Rework the font property cache to use a plain dict instead of lru_cache
  • Add @lru_cache for rotation transforms via a _rotate(theta) helper function (common case is only a few angles)
  • Add fast path to skip rotation transform operations when rotation=0 (the most common case)
  • Use direct indexing instead of numpy array operations for several small lists

font_manager.py:

  • Implement __copy__ method on FontProperties that bypasses __init__ validation
  • Make __hash__ more robust to new attrs
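
The two font_manager.py items can be sketched on a toy class. `Props` below is a hypothetical stand-in, not `FontProperties` itself, and its attributes and validation are invented for the example; the point is only the pattern of a `__copy__` that skips `__init__` and a `__hash__` that covers whatever attributes the instance has.

```python
import copy


class Props:
    """Hypothetical properties object whose __init__ does costly validation."""

    def __init__(self, family="sans-serif", size=10.0):
        # Stand-in for expensive parsing and validation.
        if size <= 0:
            raise ValueError("size must be positive")
        self._family = str(family)
        self._size = float(size)

    def __copy__(self):
        # Allocate without running __init__, then copy the already-valid state.
        new = object.__new__(type(self))
        new.__dict__.update(self.__dict__)
        return new

    def __eq__(self, other):
        return isinstance(other, Props) and vars(self) == vars(other)

    def __hash__(self):
        # Hash all instance attributes, so attributes added later are
        # included without editing this method (values must be hashable).
        return hash(tuple(sorted(vars(self).items())))


p = Props("serif", 12)
q = copy.copy(p)  # dispatches to Props.__copy__
```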

lines.py:

  • Add fast path for same-shape x/y arrays using direct assignment instead of broadcast_arrays
  • Replace .T unpacking with column slicing for views
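
A rough sketch of the same-shape fast path and the column-slicing idea, assuming 1D inputs. `stack_xy` is a hypothetical helper name used only for this example; the actual change lives inside the line's caching code.

```python
import numpy as np


def stack_xy(x, y):
    """Build an (n, 2) float array from 1D x and y (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape == y.shape:
        # Fast path: shapes already match, so no broadcasting is needed.
        xy = np.empty((len(x), 2), dtype=float)
        xy[:, 0] = x
        xy[:, 1] = y
        return xy
    # General path: let NumPy broadcast (e.g. a length-1 y against x).
    return np.column_stack(np.broadcast_arrays(x, y)).astype(float)


xy = stack_xy([0, 1, 2], [3, 4, 5])
# Column slices are views into xy; no data is copied.
xs = xy[:, 0]
ys = xy[:, 1]
```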

path.py:

  • Inline shape validation instead of calling _api.check_shape
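
As an illustration of what inlining the shape check might look like, here is a small sketch. The function name and error message are assumptions for the example and are not copied from path.py.

```python
import numpy as np


def check_vertices(vertices):
    """Validate an (N, 2) vertex array with an inline check (hypothetical)."""
    vertices = np.asarray(vertices, dtype=float)
    # Inline check: two cheap comparisons instead of a call into a
    # general-purpose shape-checking helper.
    if vertices.ndim != 2 or vertices.shape[1] != 2:
        raise ValueError(
            f"'vertices' must be 2D with shape (N, 2), "
            f"but has shape {vertices.shape}")
    return vertices
```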

transforms.py:

  • Type the array construction in Bbox.from_extents

Before:
image

After (less time on things that aren't draw_text):
image

Test script:

import time
import matplotlib.pyplot as plt

print("Starting...")

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

print("Timing...")

start_time = time.perf_counter()
for i in range(250):
    ax.view_init(elev=i, azim=i)
    fig.canvas.draw()
end_time = time.perf_counter()

plt.close()

print(f"Time taken: {end_time - start_time:.4f} seconds")


scottshambaugh commented Jan 20, 2026

Ready for review. @anntzer FYI - you're probably the most familiar with the text sections here

@scottshambaugh scottshambaugh force-pushed the text_speedups branch 3 times, most recently from 81dfd7e to 810db09 Compare January 20, 2026 19:43

anntzer commented Jan 29, 2026

Do you want to also include the simplification(/speedup) mentioned at #31000 (review) (don't bother with cm_set() in Text.draw)? I can also make a separate PR for that if you prefer.

@scottshambaugh scottshambaugh force-pushed the text_speedups branch 2 times, most recently from c0941a2 to 4695392 Compare February 2, 2026 19:51
@scottshambaugh

Removed the wrapped text context manager

else:
    y = self._y

self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float)
Contributor

There's a fair number of uses of column_stack in the codebase which are always converting two 1D arrays into a (n, 2) 2D array (sometimes also relying on broadcasting) that could probably benefit from a similar treatment; perhaps factor this pattern to a helper function and use it throughout?

scottshambaugh commented Feb 10, 2026

I looked into it and found that np.vstack(np.broadcast_arrays(x, y)).T is about as fast as the np.empty + assign approach (timings below), and a little less verbose. column_stack() is generally slower than vstack().T because the former has to interleave elements in memory, whereas the latter does contiguous memory copies and returns a transposed view.

10,000 elements: 10 runs x 10,000 iterations

With broadcast:
- `np.column_stack(np.broadcast_arrays(x, y))`: 36.47 us
- `np.vstack(np.broadcast_arrays(x, y)).T`: 27.67 us
- `np.empty + assign`: 30.09 us

Without broadcast:
- `np.column_stack([x, y])`: 20.63 us
- `np.vstack((x, y)).T`: 13.18 us

I went and updated this call, but I think a broader conversion is best suited for another PR.
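
For anyone who wants to reproduce the comparison, a minimal sketch follows. The array contents and iteration count are arbitrary choices for the example, and absolute timings will vary by machine and NumPy version. It also shows one practical difference between the two results: the values are identical, but the memory layout is not.

```python
import timeit

import numpy as np

x = np.arange(10_000, dtype=float)
y = x[::-1].copy()

a = np.column_stack([x, y])  # (n, 2), C-contiguous: x and y interleaved
b = np.vstack((x, y)).T      # (n, 2) view of a (2, n) array: F-contiguous

same_values = np.array_equal(a, b)

n_iter = 1000
for label, stmt in [
    ("column_stack", lambda: np.column_stack([x, y])),
    ("vstack().T", lambda: np.vstack((x, y)).T),
]:
    total = timeit.timeit(stmt, number=n_iter)
    print(f"{label}: {total / n_iter * 1e6:.2f} us per call")
```

Code that assumes a C-contiguous (n, 2) array, for instance before handing a buffer to compiled code, may behave differently with the transposed view, so the layout is worth checking before swapping one call for the other.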

Contributor Author

Issue here: #31130

Contributor

Given the discussion at #31130 let's just revert this for now and postpone the discussion? I'll approve the PR without this change.

Revert "Prefer np.vstack().T to np.column_stack() for speed"

This reverts commit 2e32436.

Simplify column stack
Comment on lines -533 to +551
- halign = self._ha_for_angle(angle)
+ halign = self._ha_for_angle(rotation)
Member

I find this renaming unfortunate. Since we deal a lot with transformations, rotation is quite ambiguous. I see that the change comes from rotation = self.get_rotation(), but at this lower level rotation is too imprecise. I suggest either staying with angle or being very explicit and switching to rotation_angle.

Contributor

angle seems fine to me.
