Skip to content

Conversation

@pianpwk
Copy link
Contributor

@pianpwk pianpwk commented Nov 8, 2024

Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling str(expr), and deserialize by calling sympy.sympify(expr_str)). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797

This starts to deprecate the use of expr_str and stores expressions in AST format instead. For BC purposes, expr_str deserialization is still supported, but we will always serialize to expr_ast. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Differential Revision: D65638757

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140084

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 3 New Failures

As of commit 5087a21 with merge base 0443398 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

# This is for storing the symbolic expressions behind symints/symfloats/symbools
# For example, we can get something like
# SymExpr(expr_str="s0 + s1", hint=SymExprHint(as_int=4)
# if we also have the hint that s0 and s1 are both 2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the comment?


if val.expr_str in self.symbol_name_to_symbol:
sym = self.symbol_name_to_symbol[val.expr_str]
if val.expr_str != "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to check like, if hasattr(val, "expr_ast")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah the field isn't optional, and it's BC breaking if we change that. I could switch to None instead.

)
return expr

def __deprecated_do_not_use_deserialize_symint_expr_str(self, s: str, hint: Optional[int] = None) -> sympy.Expr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think theres a way to implement this like an upgrader? If we see something using expr_str can we somehow change it to become an expr_ast and then use the expr_ast implementation?

We eventually want to implement an upgrader/downgrader so if we just do that here then it could save us the work later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeahh this is a temporary fix while we figure out the upgrader design

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 8, 2024
facebook-github-bot pushed a commit that referenced this pull request Nov 8, 2024
Summary:

for roundtrippability

Test Plan: test_export

Reviewed By: angelayi

Differential Revision: D65638757
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

return SymExprNode(target=None, base=base, args=[])
else:
return SymExprNode(
target=type(expr).__module__ + "." + type(expr).__qualname__,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
assert s.target is not None
*_module, _qualname = s.target.split(".")
node_cls = getattr(sys.modules[".".join(_module)], _qualname)
Copy link
Contributor Author

@pianpwk pianpwk Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhxchen17 @avikchaudhuri any thoughts on this? storing the node target as the python class name (similar to nn_module_stack).

An alternative is to keep a mapping of known functions <-> names, something like

{
    "mul": sympy.Mul,
    "add": sympy.Add,
    ...
}

But that requires constant updating if we run into new functions. It potentially is safer though, and easier to catch if some day sympy changes their function names/locations in library (we'd break running this file & upgrading sympy pin, instead of when we try to deserialize the particular program).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I doubt the locations will change much otherwise any user of sympy will also break? Do the class names here resolve to public paths or private paths (since sympy has many aliases and deep directory structure)?

facebook-github-bot pushed a commit that referenced this pull request Nov 18, 2024
Summary:

The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797

This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Reviewed By: angelayi

Differential Revision: D65638757
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

facebook-github-bot pushed a commit that referenced this pull request Nov 19, 2024
Summary:

The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797

This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Reviewed By: angelayi

Differential Revision: D65638757
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

facebook-github-bot pushed a commit that referenced this pull request Nov 19, 2024
Summary:

The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797

This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Reviewed By: angelayi

Differential Revision: D65638757
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

Summary:

The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797

This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Reviewed By: angelayi

Differential Revision: D65638757
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65638757

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: Lint / lintrunner-noclang / linux-job

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@izaitsevfb
Copy link
Contributor

@pytorchbot revert -m "reverted internally in D66253238" -c ghfirst

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

@pianpwk your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Nov 21, 2024
facebook-github-bot pushed a commit that referenced this pull request Nov 22, 2024
Summary:

Latest attempt after [136802](#136802) and [140084](#140084) got shelved. 

This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g.
```
((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))"
```

This is nice because:
- we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place.
- this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format.
- more memory efficient than storing ASTs; the [AST attempt](#140084) increased artifact size by 20% on some toy programs.
- doesn't even require a schema version bump.

Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow.

Test Plan: test_serdes, test_serialize, internal tests broken by AST PR

Reviewed By: zhxchen17

Differential Revision: D66283208
pytorchmergebot pushed a commit that referenced this pull request Nov 22, 2024
Summary:
Latest attempt after [136802](#136802) and [140084](#140084) got shelved.

This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g.
```
((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))"
```

This is nice because:
- we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place.
- this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format.
- more memory efficient than storing ASTs; the [AST attempt](#140084) increased artifact size by 20% on some toy programs.
- doesn't even require a schema version bump.

Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow.

Test Plan: test_serdes, test_serialize, internal tests broken by AST PR

Differential Revision: D66283208

Pull Request resolved: #141284
Approved by: https://github.com/zhxchen17
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…0084)

Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: pytorch#136797

This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented.

Test Plan: test_export

Differential Revision: D65638757

Pull Request resolved: pytorch#140084
Approved by: https://github.com/angelayi
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…141284)

Summary:
Latest attempt after [136802](pytorch#136802) and [140084](pytorch#140084) got shelved.

This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g.
```
((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))"
```

This is nice because:
- we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place.
- this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format.
- more memory efficient than storing ASTs; the [AST attempt](pytorch#140084) increased artifact size by 20% on some toy programs.
- doesn't even require a schema version bump.

Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow.

Test Plan: test_serdes, test_serialize, internal tests broken by AST PR

Differential Revision: D66283208

Pull Request resolved: pytorch#141284
Approved by: https://github.com/zhxchen17
@ezyang
Copy link
Contributor

ezyang commented Dec 11, 2024

What's the status on this?

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Feb 9, 2025
@github-actions github-actions bot closed this Mar 11, 2025
@github-actions github-actions bot deleted the export-D65638757 branch April 11, 2025 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants