Skip to content

Comments

Auto-generate AST boilerplate#15544

Merged
dcreager merged 32 commits intomainfrom
dcreager/generate-ast
Jan 17, 2025
Merged

Auto-generate AST boilerplate#15544
dcreager merged 32 commits intomainfrom
dcreager/generate-ast

Conversation

@dcreager
Copy link
Member

This PR replaces most of the hard-coded AST definitions with a generation script, similar to what happens in rust_python_formatter. I've replaced every "rote" definition that I could find, where the content is entirely boilerplate and only depends on what syntax nodes there are and which groups they belong to.

This is a pretty massive diff, but it's entirely a refactoring. It should make absolutely no changes to the API or implementation. In particular, this required adding some configuration knobs that let us override default auto-generated names where they don't line up with types that we created previously by hand.

Test plan

There should be no changes outside of the rust_python_ast crate, which verifies that there were no API changes as a result of the auto-generation. Aggressive cargo clippy and uvx pre-commit runs after each commit in the branch.

@dcreager dcreager added the internal An internal refactor or improvement label Jan 17, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Jan 17, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to read examples/Assistants_API_overview_python.ipynb: Expected a Jupyter Notebook, which must be internally stored as JSON, but this file isn't valid JSON: expected `,` or `]` at line 197 column 8

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for doing this!

Unrelated to this PR, Biome does something similar (https://github.com/biomejs/biome/tree/main/xtask/codegen).

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is great.

I think my long term preference would be to use something like ungrammar that would also allow us to e.g. generate the source order visitor but I think this is a great start.

Comment on lines 24 to 28
root = check_output(["git", "rev-parse", "--show-toplevel"], text=True).strip()
ast_path = Path(root).joinpath("crates", "ruff_python_ast", "ast.toml")
out_path = Path(root).joinpath("crates", "ruff_python_ast", "src", "generated.rs")
with ast_path.open("rb") as ast_file:
ast = tomllib.load(ast_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the code a bit hard to read because it mixes top level statement with class and declarations. I'd prefer to instead have a main function that's called in if __name__ == "__main__".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 74 to 77
out = ["""
// This is a generated file. Don't modify it by hand!
// Run `crates/ruff_python_ast/generate.py` to re-generate the file.
"""] # fmt: skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use one of:

out = [
    textwrap.dedent("""
        // This is a generated file. Don't modify it by hand!
        // Run `crates/ruff_python_ast/generate.py` to re-generate the file.
    """)
]


out = [
    textwrap.dedent(
        """
        // This is a generated file. Don't modify it by hand!
        // Run `crates/ruff_python_ast/generate.py` to re-generate the file.
        """
    )
]

Or use the auto formatted code. IMO, it's roughly equally esthetic as the manually formatted array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot as part of reorganizing into functions.

Comment on lines 79 to 83
# ------------------------------------------------------------------------------
# Owned enum
#
# For each group, we create an enum that contains an owned copy of a syntax
# node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think each of those blocks could be functions instead where the inline comment becomes a docstring.

It would also be nice if the docstring could contain. code snippet of what it generates:

Generates the owned enum for each group, e.g Stmt, Expr etc. For each, generate:

  • The enum
  • Implement From<Variant>
  • Implement Ranged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// See also [stmt](https://docs.python.org/3/library/ast.html#ast.stmt)
#[derive(Clone, Debug, PartialEq, is_macro::Is)]
pub enum Stmt {
#[is(name = "function_def_stmt")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but I think it would be nice if we generated the is implementations as well. The is macro doesn't always work well with ide's and macros are also more expensive to compile.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some Python nitpicks ;)

Comment on lines 84 to 85
--extend-select,
I,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to put this config in the tool.ruff section of our pyproject.toml file at the directory root?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it would! Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth considering moving the ruff config in the scripts dir into the root dir pyproject.toml. Isort and some other common rules are already enabled there, so it may be useful to enable them for the whole project if scripts are being added outwith the scripts dir.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I agree with @calumy's comment at #15544 (comment), but that could be done as a followup!

* main:
  [red-knot] Inline `SubclassOfType::as_instance_type_of_metaclass()` (#15556)
  [`flake8-comprehensions`] strip parentheses around generators in `unnecessary-generator-set` (`C401`) (#15553)
  [`pylint`] Implement `redefined-slots-in-subclass` (`W0244`) (#9640)
  [`flake8-bugbear`] Do not raise error if keyword argument is present and target-python version is less or equals than 3.9 (`B903`) (#15549)
  [red-knot] `type[T]` is disjoint from `type[S]` if the metaclass of `T` is disjoint from the metaclass of `S` (#15547)
  [red-knot] Pure instance variables declared in class body (#15515)
  Update snapshots of #15507 with new annotated snipetts rendering (#15546)
  [`pylint`] Do not report methods with only one `EM101`-compatible `raise` (`PLR6301`) (#15507)
  Fix unstable f-string formatting for expressions containing a trailing comma (#15545)
  Support `knot.toml` files in project discovery (#15505)
  Add support for configuring knot in `pyproject.toml` files (#15493)
  Fix bracket spacing for single-element tuples in f-string expressions (#15537)
  [`flake8-simplify`] Do not emit diagnostics for expressions inside string type annotations (`SIM222`, `SIM223`) (#15405)
  [`flake8-pytest-style`] Do not emit diagnostics for empty `for` loops (`PT012`, `PT031`) (#15542)
@dcreager dcreager merged commit 8e3633f into main Jan 17, 2025
39 checks passed
@dcreager dcreager deleted the dcreager/generate-ast branch January 17, 2025 19:23
@calumy calumy mentioned this pull request Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants