Skip to content

Comments

Reduce memory usage of Docstring struct#16183

Merged
AlexWaygood merged 2 commits intoastral-sh:mainfrom
AlexWaygood:alex/smaller-docstring
Feb 16, 2025
Merged

Reduce memory usage of Docstring struct#16183
AlexWaygood merged 2 commits intoastral-sh:mainfrom
AlexWaygood:alex/smaller-docstring

Conversation

@AlexWaygood
Copy link
Member

Summary

I noticed that our Docstring struct, that we use for querying information on docstrings in various linter rules, eagerly stores a lot of information in the struct that could probably be lazily computed in methods on the struct. This PR reworks the API to store less information on the struct itself, reducing memory usage.

In addition to this, it changes the struct to store an ast::StringLiteral node rather than an ast::ExprStringLiteral node. This:

  • Allows us to enforce in the type system the existing invariant that we never consider an implicitly concatenated string to be a valid Docstring definition
  • Allows us to use the ast::StringLiteral::flags attribute to query information directly from the underlying AST node, removing the need for quite a few .unwrap() calls (accompanied by // SAFETY comments) across our docstring-related rules

Test Plan

  • cargo test
  • There should be 0 ecosystem hits, since this is a pure refactor

@AlexWaygood AlexWaygood added internal An internal refactor or improvement docstring Related to docstring linting or formatting labels Feb 16, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Feb 16, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood AlexWaygood marked this pull request as ready for review February 16, 2025 13:52
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, it helps clean up the code a lot. I only have a few nit comments.

));

if string_literal.value.is_implicit_concatenated() {
let [sole_string_part] = string_literal.value.as_slice() else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the old explicit implicit concatenated string helped with readability. I wonder if we should add a method for non-implicit concatenated strings to string_literal.

Coming up with a good name is hard: let Some(sole_string_part) = string_literal.as_non_implicit()

Copy link
Member Author

@AlexWaygood AlexWaygood Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point about readability. I will defer this to another PR, though, as I think almost every current usage of the StringLiteralValue::as_slice() method would probably switch to using this new API -- and I agree that what we would call it isn't necessarily obvious. It seems worthy of considering as its own thing.

For now I'll just add a comment here.

Comment on lines 96 to 99
TextRange::new(
self.docstring.start() + self.docstring.flags().opener_len(),
self.docstring.end() - self.docstring.flags().closer_len(),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It would be nice if we could avoid copying this computation. Consider adding a content_range to StringLiteral similar to

/// Returns the range of the string's content in the source (minus prefix and quotes).
pub fn content_range(self) -> TextRange {
let kind = self.flags();
TextRange::new(
self.start() + kind.opener_len(),
self.end() - kind.closer_len(),
)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe. I was planning this for a followup, since there are a number of other places where this method would be useful ;) But I'll add it in this PR, and switch other places in the codebase to this new method as the followup!

@AlexWaygood AlexWaygood force-pushed the alex/smaller-docstring branch 5 times, most recently from 459dbd3 to ef91991 Compare February 16, 2025 15:15
@AlexWaygood AlexWaygood force-pushed the alex/smaller-docstring branch from ef91991 to fe17ea6 Compare February 16, 2025 15:19
@AlexWaygood AlexWaygood enabled auto-merge (squash) February 16, 2025 15:20
@AlexWaygood AlexWaygood merged commit 61fef0a into astral-sh:main Feb 16, 2025
20 checks passed
@AlexWaygood AlexWaygood deleted the alex/smaller-docstring branch February 16, 2025 15:51
AlexWaygood added a commit that referenced this pull request Feb 17, 2025
## Summary

This PR makes the following changes:
- It adjusts various callsites to use the new
`ast::StringLiteral::contents_range()` method that was introduced in
#16183. This is less verbose and
more type-safe than using the `ast::str::raw_contents()` helper
function.
- It adds a new `ast::ExprStringLiteral::as_unconcatenated_literal()`
helper method, and adjusts various callsites to use it. This addresses
@MichaReiser's review comment at
#16183 (comment).
There is no functional change here, but it helps readability to make it
clearer that we're differentiating between implicitly concatenated
strings and unconcatenated strings at various points.
- It renames the `StringLiteralValue::flags()` method to
`StringLiteralFlags::first_literal_flags()`. If you're dealing with an
implicitly concatenated string `string_node`,
`string_node.value.flags().closer_len()` could give an incorrect result;
this renaming makes it clearer that the `StringLiteralFlags` instance
returned by the method is only guaranteed to give accurate information
for the first `StringLiteral` contained in the `ExprStringLiteral` node.
- It deletes the unused `BytesLiteralValue::flags()` method. This seems
prone to misuse in the same way as `StringLiteralValue::flags()`: if
it's an implicitly concatenated bytestring, the `BytesLiteralFlags`
instance returned by the method would only give accurate information for
the first `BytesLiteral` in the bytestring.

## Test Plan

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docstring Related to docstring linting or formatting internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants