Skip to content

Conversation

@blyxyas
Copy link
Contributor

@blyxyas blyxyas commented Apr 24, 2025

Being that this information is already parsed into ItemBody and the user might want to behave differently depending on the style (either backslash or double spaced) of the hardbreak

I think it's the best option to expose it as a new enum.

This issue came up in Clippy, we has to do very weird things to get the hardbreak style, and results in performance issues. Sorry if this is controversial enough to need an issue.

Being that this information is already parsed into ItemBody
and the user might want to behave differently depending on
the style (either backslash or double spaced) of the hardbreak

I think it's the best option to expose it as a new enum.

This issue came up in Clippy, we has to do very weird things
to get the hardbreak style, and results in performance issues.
Sorry if this is controversial enough to need an issue.
Copy link
Collaborator

@ollpu ollpu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this makes sense, and is in line with some other enums we already expose. Check fmt!

Those other enums tend to be named ..Kind, so maybe this should be the same for the sake of uniformity, HardBreakKind?

Strictly speaking, the double-space style/kind is "two or more spaces", so I wonder if there's a more accurate name. Spaces?

By the way, do you have a pointer to the poorly performing post-hoc implementation in Clippy? I'm curious because I imagine it to be a relatively simple check if you know the event spans (with OffsetIter)

@blyxyas blyxyas force-pushed the expose-hardbreakstyle branch from 623d18e to cc77981 Compare April 24, 2025 17:01
@blyxyas
Copy link
Contributor Author

blyxyas commented Apr 24, 2025

This line A calls Fragments::span(), which calls source_span_for_markdown_range, which is a fairly expensive function. That first case calls it just to check if we're on a line that terminates in \, and it's called about 700 times while analyzing tokio taking up about 10% of the Clippy runtime in that crate (which is unacceptable!).

@blyxyas
Copy link
Contributor Author

blyxyas commented Apr 25, 2025

Oh I didn't see the failling CI :/ GitHub seems to be funky with the CI interface in this PR.

blyxyas added a commit to blyxyas/rust-clippy that referenced this pull request Apr 25, 2025
Turns out that `doc_markdown` uses a non-cheap rustdoc function
to convert from markdown ranges into source spans. And it was using it
a lot (about once every 18 lines of documentation on `tokio`, which ends
up being about 1800 times).

This ended up being about 18% of the total Clippy runtime as discovered by
lintcheck --perf in docs-heavy crates. This PR optimizes one of the cases
in which Clippy calls the function, and a future PR once
pulldown-cmark/pulldown-cmark#1034 is merged will be open.

Note that not all crates were affected by this crate equally, those with
more docs are affected far more than those light ones.
@ollpu
Copy link
Collaborator

ollpu commented Apr 25, 2025

This line A calls Fragments::span(), which calls source_span_for_markdown_range, which is a fairly expensive function. That first case calls it just to check if we're on a line that terminates in \, and it's called about 700 times while analyzing tokio taking up about 10% of the Clippy runtime in that crate (which is unacceptable!).

I see, but I suppose you don't have to go through that machinery to get the snippet and perform the check?

-                if let Some(span) = fragments.span(cx, range.clone())
-                    && !span.from_expansion()
-                    && let Some(snippet) = snippet_opt(cx, span)
-                    && !snippet.trim().starts_with('\\')
-                    && event == HardBreak {
-                    collected_breaks.push(span);
-                }
+                if event == HardBreak
+                    && !doc[range.clone()].trim().starts_with('\\')
+                    && let Some(span) = fragments.span(cx, range.clone())
+                    && !span.from_expansion() {
+                    collected_breaks.push(span);
+                }

(Edit: Also do the HardBreak check first to avoid string slicing, since softs are much more common)

CI now failing from missing derive(serde)

@blyxyas
Copy link
Contributor Author

blyxyas commented Apr 29, 2025

Honestly, I'm not sure how to proceed 😅 This PR is then a little bit redundant in our specific case. If the project would still appreciate it, I can do the necessary fixes for CI (which is proving tricky, I'll manually check .github/workflows).

If you think that it does not add any value, I have no problems closing it and using the patch you provided (with the appropriate authoring information for the commit)

@ollpu
Copy link
Collaborator

ollpu commented May 2, 2025

I'm not sure either. Clearly there is a use for distinguishing the kinds of hard break, but it can be recovered fairly easily with offsets. The downside to this change is it makes it more cumbersome to match on Event, or to generate synthetic ones, like in the README example.

From a princpled perspective, if we add this, then in some sense we're also welcoming other AST enrichments like:

  • ATX vs setext heading (# heading vs heading\n===)
  • Bullet list marker (-, + or *)
  • Ordered list delimiter (1. or 1))
  • Emphasis marker (_ or *)

Also related to the discussion in #892

Opinions? @Martin1887 @notriddle

Either way, you're welcome to incorporate my patch in Clippy :)

@Martin1887
Copy link
Collaborator

I totally agree with your conclusions, we have to think about maintaining the current code or exposing more details, with a more cumbersome interface and breaking changes.

Thanks!

@blyxyas
Copy link
Contributor Author

blyxyas commented May 5, 2025

In that case, I'll close this PR. In case that in a distant future this feature is more needed in the project feel free to implement it without attribution hassle (it's a 20 line change)

Thanks for the feedback and the reviews, and have a good week!

@blyxyas blyxyas closed this May 5, 2025
github-merge-queue bot pushed a commit to rust-lang/rust-clippy that referenced this pull request May 21, 2025
)

Turns out that `doc_markdown` uses a non-cheap rustdoc function to
convert from markdown ranges into source spans. And it was using it a
lot (about once every 17 lines of documentation on `tokio`, which ends
up being about 2000 times).

This ended up being about 18% of the total Clippy runtime as discovered
by lintcheck --perf in docs-heavy crates. This PR optimizes one of the
cases in which Clippy calls the function, and a future PR once
pulldown-cmark/pulldown-cmark#1034 is merged will be opened. This PR
lands the use of the function into the single-digit zone.

Note that not all crates were affected by this crate equally, those with
more docs are affected far more than those light ones.

changelog:[`clippy::doc_markdown`] has been optimized by 50%
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants