Skip to content

[ty] Increase worker-thread stack size#17869

Merged
sharkdp merged 1 commit intomainfrom
david/increase-default-stack-size
May 5, 2025
Merged

[ty] Increase worker-thread stack size#17869
sharkdp merged 1 commit intomainfrom
david/increase-default-stack-size

Conversation

@sharkdp
Copy link
Contributor

@sharkdp sharkdp commented May 5, 2025

Summary

closes #17472

This is obviously just a band-aid solution to this problem (in that you can always make your pathological inputs bigger and it will still crash), but I think this is not an unreasonable change — even if we add more sophisticated solutions later. I tried using stacker as suggested by @MichaReiser, and it works. But it's unclear where exactly would be the right place to put it, and even for the sympy problem, we would need to add it both in the semantic index builder AST traversal and in type inference. Increasing the default stack size for worker threads, as proposed here, doesn't solve the underlying problem (that there is a hard limit), but it is more universal in the sense that it is not specific to large binary-operator expression chains.

To determine a reasonable stack size, I created files that look like

right associative:

from typing import reveal_type
total = (1 + (1 + (1 + (1 + (… + 1)))))
reveal_type(total)

left associative

from typing import reveal_type
total = 1 + 1 + 1 + 1 ++ 1
reveal_type(total)

with a variable amount of operands (N). I then chose the stack size large enough to still be able to handle cases that existing type checkers can not:

right

  N = 20: mypy takes ~ 1min
  N = 350: pyright crashes with a stack overflow (mypy fails with "too many nested parentheses")
  N = 800: ty(main) infers Literal[800] instantly
  N = 1000: ty(main) crashes with "thread '<unknown>' has overflowed its stack"

  N = 7000: ty(this branch) infers Literal[7000] instantly
  N = 8000+: ty(this branch) crashes


left

  N = 300: pyright emits "Maximum parse depth exceeded; break expression into smaller sub-expressions"
           total is inferred as Unknown
  N = 5500: mypy crashes with "INTERNAL ERROR"
  N = 2500: ty(main) infers Literal[2500] instantly
  N = 3000: ty(main) crashes with "thread '<unknown>' has overflowed its stack"

  N = 22000: ty(this branch) infers Literal[22000] instantly
  N = 23000+: ty(this branch) crashes

Test Plan

New regression test.

@sharkdp sharkdp added the ty Multi-file analysis & type inference label May 5, 2025
@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2025

mypy_primer results

No ecosystem changes detected ✅

@AlexWaygood AlexWaygood removed their request for review May 5, 2025 19:20
// binary operators (x + x + … + x) both during the AST walk in semantic index building as
// well as during type checking. Using this stack size, we can handle handle expressions
// that are several times larger than the corresponding limits in existing type checkers.
.stack_size(16 * 1024 * 1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth noting that this does not generally increase memory usage by 16 MiB × num_threads. It's just the max. available stack space that a single thread can use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And 16MiB/thread is not even detectable, relative to the memory we need for source texts and AST, for any project large enough to care about memory usage.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you

// binary operators (x + x + … + x) both during the AST walk in semantic index building as
// well as during type checking. Using this stack size, we can handle handle expressions
// that are several times larger than the corresponding limits in existing type checkers.
.stack_size(16 * 1024 * 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And 16MiB/thread is not even detectable, relative to the memory we need for source texts and AST, for any project large enough to care about memory usage.

total = 1{plus_one_repeated}
reveal_type(total)
",
plus_one_repeated = " + 1".repeat(2000 - 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose a much smaller limit than what we can handle (but still bigger than what we can handle on main) for two reasons:

  • Debug builds take up more stack space and crash at smaller input sizes
  • We don't want minor increases of stack frame sizes to lead to a test failure here.

@sharkdp sharkdp merged commit de78da5 into main May 5, 2025
35 checks passed
@sharkdp sharkdp deleted the david/increase-default-stack-size branch May 5, 2025 19:31
sharkdp added a commit that referenced this pull request May 5, 2025
## Summary

Following #17869, we can now run `ty` on `sympy`.
@MichaReiser
Copy link
Member

I think the long term solution here would be to implement binary expression traversal in a loop rather than with recursion. But this is fine for now

Glyphack pushed a commit to Glyphack/ruff that referenced this pull request May 6, 2025
## Summary

closes astral-sh#17472 

This is obviously just a band-aid solution to this problem (in that you
can always make your [pathological
inputs](https://github.com/sympy/sympy/blob/28994edd82a18c22a98a52245c173f3c836dcad3/sympy/polys/numberfields/resolvent_lookup.py)
bigger and it will still crash), but I think this is not an unreasonable
change — even if we add more sophisticated solutions later. I tried
using `stacker` as suggested by @MichaReiser, and it works. But it's
unclear where exactly would be the right place to put it, and even for
the `sympy` problem, we would need to add it both in the semantic index
builder AST traversal and in type inference. Increasing the default
stack size for worker threads, as proposed here, doesn't solve the
underlying problem (that there is a hard limit), but it is more
universal in the sense that it is not specific to large binary-operator
expression chains.

To determine a reasonable stack size, I created files that look like

*right associative*:
```py
from typing import reveal_type
total = (1 + (1 + (1 + (1 + (… + 1)))))
reveal_type(total)
```

*left associative*
```py
from typing import reveal_type
total = 1 + 1 + 1 + 1 + … + 1
reveal_type(total)
```

with a variable amount of operands (`N`). I then chose the stack size
large enough to still be able to handle cases that existing type
checkers can not:

```
right

  N = 20: mypy takes ~ 1min
  N = 350: pyright crashes with a stack overflow (mypy fails with "too many nested parentheses")
  N = 800: ty(main) infers Literal[800] instantly
  N = 1000: ty(main) crashes with "thread '<unknown>' has overflowed its stack"

  N = 7000: ty(this branch) infers Literal[7000] instantly
  N = 8000+: ty(this branch) crashes


left

  N = 300: pyright emits "Maximum parse depth exceeded; break expression into smaller sub-expressions"
           total is inferred as Unknown
  N = 5500: mypy crashes with "INTERNAL ERROR"
  N = 2500: ty(main) infers Literal[2500] instantly
  N = 3000: ty(main) crashes with "thread '<unknown>' has overflowed its stack"

  N = 22000: ty(this branch) infers Literal[22000] instantly
  N = 23000+: ty(this branch) crashes
```

## Test Plan

New regression test.
Glyphack pushed a commit to Glyphack/ruff that referenced this pull request May 6, 2025
## Summary

Following astral-sh#17869, we can now run `ty` on `sympy`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ty] stack overflow checking 'sympy'

3 participants