Skip to content

refactor[parser]: remove ASTTokens#4364

Merged
charles-cooper merged 48 commits intovyperlang:masterfrom
charles-cooper:refactor/kill-asttokens
Jan 12, 2025
Merged

refactor[parser]: remove ASTTokens#4364
charles-cooper merged 48 commits intovyperlang:masterfrom
charles-cooper:refactor/kill-asttokens

Conversation

@charles-cooper
Copy link
Copy Markdown
Member

@charles-cooper charles-cooper commented Nov 18, 2024

What I did

remove asttokens from the parse machinery, since the method is buggy (see below bugs) and slow (#4272). brings down parse time (time spent in ast generation) between 40-70%.

fixes multiple tokenizer-related bugs, including

How I did it

another way to adjust tokens

How to verify it

Commit message

this commit removes `asttokens` from the parse machinery, since the
method is buggy (see below bugs) and slow. this commit brings down
parse time (time spent in ast generation) between 40-70%.

the `mark_tokens()` machinery is replaced with a modified version of
`python.ast`'s `fix_missing_locations()` function, which recurses
through the AST and adds missing line info based on the parent node.

it also changes to a more consistent method for updating source
offsets that are modified by the `pre_parse` step, which fixes several
outstanding bugs with source location reporting.

there were some exceptions to the line info fixup working, the issues
and corresponding workarounds are described as follows:

- some python AST nodes returned by `ast.parse()` are singletons, which
  we work around by deepcopying the AST before operating on it.

- notably, there is an interaction between our AST annotation and
  `coverage.py` in the case of `USub`. in this commit we paper over the
  issue by simply always overriding line info for `USub` nodes. in the
  future, we should refactor `VyperNode` generation by bypassing the
  python AST annotation step entirely, which is a more proper fix to the
  problems encountered in this PR.

the `asttokens` package is not removed entirely since it still has a
limited usage inside of the natspec parser. we could remove it in a
future PR; for now it is out-of-scope.

referenced bugs:
- https://github.com/vyperlang/vyper/issues/2258
- https://github.com/vyperlang/vyper/issues/3059
- https://github.com/vyperlang/vyper/issues/3430
- https://github.com/vyperlang/vyper/issues/4139

Description for the changelog

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

@charles-cooper charles-cooper added this to the v0.4.1 milestone Jan 2, 2025
@charles-cooper charles-cooper added the release - must release blocker label Jan 2, 2025
@cyberthirst
Copy link
Copy Markdown
Collaborator

i find the name modification_offset confusing, it blends with adjustments and imo doesn't capture the essence. can't we name smth like original_types?

@cyberthirst
Copy link
Copy Markdown
Collaborator

can we generalize this code? it's pretty much the same thing 3 times

if string in VYPER_CLASS_TYPES and start[1] == 0:
new_keyword = "class"
toks = [TokenInfo(NAME, new_keyword, start, end, line)]
adjustment = len(string) - len(new_keyword)
# adjustments for following tokens
lineno, col = start
_col_adjustments[lineno] += adjustment
modification_offsets[newstart] = VYPER_CLASS_TYPES[string]
elif string in CUSTOM_STATEMENT_TYPES:
new_keyword = "yield"
adjustment = len(string) - len(new_keyword)
# adjustments for following tokens
lineno, col = start
_col_adjustments[lineno] += adjustment
toks = [TokenInfo(NAME, new_keyword, start, end, line)]
modification_offsets[newstart] = CUSTOM_STATEMENT_TYPES[string]
elif string in CUSTOM_EXPRESSION_TYPES:
# a bit cursed technique to get untokenize to put
# the new tokens in the right place so that modification_offsets
# will work correctly.
# (recommend comparing the result of parse with the
# source code side by side to visualize the whitespace)
new_keyword = "await"
vyper_type = CUSTOM_EXPRESSION_TYPES[string]
adjustment = len(string) - len(new_keyword)
# adjustments for following tokens
lineno, col = start
_col_adjustments[lineno] += adjustment
# fixup for when `extcall/staticcall` follows `log`
modification_offsets[newstart] = vyper_type

# A mapping of class names to their original class types.
modification_offsets: dict[tuple[int, int], str]

# Magic adjustments
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make a better comment hehe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release - must release blocker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants