Reduce memory usage of `Docstring` struct by AlexWaygood · Pull Request #16183 · astral-sh/ruff

AlexWaygood · 2025-02-16T13:43:53Z

Summary

I noticed that our Docstring struct, that we use for querying information on docstrings in various linter rules, eagerly stores a lot of information in the struct that could probably be lazily computed in methods on the struct. This PR reworks the API to store less information on the struct itself, reducing memory usage.

In addition to this, it changes the struct to store an ast::StringLiteral node rather than an ast::ExprStringLiteral node. This:

Allows us to enforce in the type system the existing invariant that we never consider an implicitly concatenated string to be a valid Docstring definition
Allows us to use the ast::StringLiteral::flags attribute to query information directly from the underlying AST node, removing the need for quite a few .unwrap() calls (accompanied by // SAFETY comments) across our docstring-related rules

Test Plan

cargo test
There should be 0 ecosystem hits, since this is a pure refactor

github-actions · 2025-02-16T13:52:54Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser

Nice, it helps clean up the code a lot. I only have a few nit comments.

MichaReiser · 2025-02-16T14:03:06Z

crates/ruff_linter/src/checkers/ast/analyze/definitions.rs

-            ));
-
-            if string_literal.value.is_implicit_concatenated() {
+            let [sole_string_part] = string_literal.value.as_slice() else {


I found the old explicit implicit concatenated string helped with readability. I wonder if we should add a method for non-implicit concatenated strings to string_literal.

Coming up with a good name is hard: let Some(sole_string_part) = string_literal.as_non_implicit()

Yeah, good point about readability. I will defer this to another PR, though, as I think almost every current usage of the StringLiteralValue::as_slice() method would probably switch to using this new API -- and I agree that what we would call it isn't necessarily obvious. It seems worthy of considering as its own thing.

For now I'll just add a comment here.

crates/ruff_linter/src/docstrings/mod.rs

MichaReiser · 2025-02-16T14:05:41Z

crates/ruff_linter/src/docstrings/mod.rs

+        TextRange::new(
+            self.docstring.start() + self.docstring.flags().opener_len(),
+            self.docstring.end() - self.docstring.flags().closer_len(),
+        )


Nit: It would be nice if we could avoid copying this computation. Consider adding a content_range to StringLiteral similar to

ruff/crates/ruff_python_ast/src/expression.rs

Lines 221 to 228 in e178148

/// Returns the range of the string's content in the source (minus prefix and quotes).

pub fn content_range(self) -> TextRange {

let kind = self.flags();

TextRange::new(

self.start() + kind.opener_len(),

self.end() - kind.closer_len(),

)

}

Hehe. I was planning this for a followup, since there are a number of other places where this method would be useful ;) But I'll add it in this PR, and switch other places in the codebase to this new method as the followup!

crates/ruff_linter/src/rules/pydocstyle/rules/blank_before_after_class.rs

crates/ruff_linter/src/rules/pydocstyle/rules/indent.rs

@MichaReiser

## Summary This PR makes the following changes: - It adjusts various callsites to use the new `ast::StringLiteral::contents_range()` method that was introduced in #16183. This is less verbose and more type-safe than using the `ast::str::raw_contents()` helper function. - It adds a new `ast::ExprStringLiteral::as_unconcatenated_literal()` helper method, and adjusts various callsites to use it. This addresses @MichaReiser's review comment at #16183 (comment). There is no functional change here, but it helps readability to make it clearer that we're differentiating between implicitly concatenated strings and unconcatenated strings at various points. - It renames the `StringLiteralValue::flags()` method to `StringLiteralFlags::first_literal_flags()`. If you're dealing with an implicitly concatenated string `string_node`, `string_node.value.flags().closer_len()` could give an incorrect result; this renaming makes it clearer that the `StringLiteralFlags` instance returned by the method is only guaranteed to give accurate information for the first `StringLiteral` contained in the `ExprStringLiteral` node. - It deletes the unused `BytesLiteralValue::flags()` method. This seems prone to misuse in the same way as `StringLiteralValue::flags()`: if it's an implicitly concatenated bytestring, the `BytesLiteralFlags` instance returned by the method would only give accurate information for the first `BytesLiteral` in the bytestring. ## Test Plan `cargo test`

Reduce memory usage of Docstring struct

ba7f58c

AlexWaygood added internal An internal refactor or improvement docstring Related to docstring linting or formatting labels Feb 16, 2025

AlexWaygood marked this pull request as ready for review February 16, 2025 13:52

MichaReiser approved these changes Feb 16, 2025

View reviewed changes

AlexWaygood force-pushed the alex/smaller-docstring branch 5 times, most recently from 459dbd3 to ef91991 Compare February 16, 2025 15:15

address most review comments

fe17ea6

AlexWaygood force-pushed the alex/smaller-docstring branch from ef91991 to fe17ea6 Compare February 16, 2025 15:19

AlexWaygood enabled auto-merge (squash) February 16, 2025 15:20

AlexWaygood merged commit 61fef0a into astral-sh:main Feb 16, 2025
20 checks passed

AlexWaygood deleted the alex/smaller-docstring branch February 16, 2025 15:51

AlexWaygood mentioned this pull request Feb 16, 2025

Improve API exposed on ExprStringLiteral nodes #16192

Merged

AlexWaygood mentioned this pull request Feb 19, 2025

Add text_len() methods to more *Prefix enums in ruff_python_ast #16254

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Reduce memory usage of `Docstring` struct#16183

Reduce memory usage of `Docstring` struct#16183
AlexWaygood merged 2 commits intoastral-sh:mainfrom
AlexWaygood:alex/smaller-docstring

AlexWaygood commented Feb 16, 2025

Uh oh!

github-actions bot commented Feb 16, 2025 •

edited

Loading

Uh oh!

MichaReiser left a comment

Uh oh!

MichaReiser Feb 16, 2025

Uh oh!

AlexWaygood Feb 16, 2025 •

edited

Loading

Uh oh!

Uh oh!

MichaReiser Feb 16, 2025

Uh oh!

AlexWaygood Feb 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	/// Returns the range of the string's content in the source (minus prefix and quotes).
	pub fn content_range(self) -> TextRange {
	let kind = self.flags();
	TextRange::new(
	self.start() + kind.opener_len(),
	self.end() - kind.closer_len(),
	)
	}

Comments

Conversation

AlexWaygood commented Feb 16, 2025

Summary

Test Plan

Uh oh!

github-actions bot commented Feb 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

MichaReiser Feb 16, 2025

Choose a reason for hiding this comment

Uh oh!

AlexWaygood Feb 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichaReiser Feb 16, 2025

Choose a reason for hiding this comment

Uh oh!

AlexWaygood Feb 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Feb 16, 2025 •

edited

Loading

`ruff-ecosystem` results

AlexWaygood Feb 16, 2025 •

edited

Loading