-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[export] serialize sympy.Exprs as ASTs instead of strings #140084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140084
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 3 New FailuresAs of commit 5087a21 with merge base 0443398 ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
torch/_export/serde/schema.py
Outdated
| # This is for storing the symbolic expressions behind symints/symfloats/symbools | ||
| # For example, we can get something like | ||
| # SymExpr(expr_str="s0 + s1", hint=SymExprHint(as_int=4) | ||
| # if we also have the hint that s0 and s1 are both 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comment?
|
|
||
| if val.expr_str in self.symbol_name_to_symbol: | ||
| sym = self.symbol_name_to_symbol[val.expr_str] | ||
| if val.expr_str != "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to check like, if hasattr(val, "expr_ast")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the field isn't optional, and it's BC breaking if we change that. I could switch to None instead.
| ) | ||
| return expr | ||
|
|
||
| def __deprecated_do_not_use_deserialize_symint_expr_str(self, s: str, hint: Optional[int] = None) -> sympy.Expr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think theres a way to implement this like an upgrader? If we see something using expr_str can we somehow change it to become an expr_ast and then use the expr_ast implementation?
We eventually want to implement an upgrader/downgrader so if we just do that here then it could save us the work later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeahh this is a temporary fix while we figure out the upgrader design
Summary: for roundtrippability Test Plan: test_export Reviewed By: angelayi Differential Revision: D65638757
00ef9a2 to
6ea6823
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
| return SymExprNode(target=None, base=base, args=[]) | ||
| else: | ||
| return SymExprNode( | ||
| target=type(expr).__module__ + "." + type(expr).__qualname__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
torch/_export/serde/serialize.py
Outdated
| else: | ||
| assert s.target is not None | ||
| *_module, _qualname = s.target.split(".") | ||
| node_cls = getattr(sys.modules[".".join(_module)], _qualname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhxchen17 @avikchaudhuri any thoughts on this? storing the node target as the python class name (similar to nn_module_stack).
An alternative is to keep a mapping of known functions <-> names, something like
{
"mul": sympy.Mul,
"add": sympy.Add,
...
}
But that requires constant updating if we run into new functions. It potentially is safer though, and easier to catch if some day sympy changes their function names/locations in library (we'd break running this file & upgrading sympy pin, instead of when we try to deserialize the particular program).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I doubt the locations will change much otherwise any user of sympy will also break? Do the class names here resolve to public paths or private paths (since sympy has many aliases and deep directory structure)?
Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797 This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Reviewed By: angelayi Differential Revision: D65638757
6ea6823 to
9242544
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797 This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Reviewed By: angelayi Differential Revision: D65638757
9242544 to
d9bf2dc
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797 This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Reviewed By: angelayi Differential Revision: D65638757
d9bf2dc to
18dab5c
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797 This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Reviewed By: angelayi Differential Revision: D65638757
18dab5c to
5087a21
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65638757 |
Merge startedYour change will be merged while ignoring the following 1 checks: Lint / lintrunner-noclang / linux-job Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@pytorchbot revert -m "reverted internally in D66253238" -c ghfirst |
|
@pytorchbot successfully started a revert job. Check the current status here. |
…40084)" This reverts commit d869344. Reverted #140084 on behalf of https://github.com/izaitsevfb due to reverted internally in D66253238 ([comment](#140084 (comment)))
|
@pianpwk your PR has been successfully reverted. |
Summary: Latest attempt after [136802](#136802) and [140084](#140084) got shelved. This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g. ``` ((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))" ``` This is nice because: - we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place. - this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format. - more memory efficient than storing ASTs; the [AST attempt](#140084) increased artifact size by 20% on some toy programs. - doesn't even require a schema version bump. Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow. Test Plan: test_serdes, test_serialize, internal tests broken by AST PR Reviewed By: zhxchen17 Differential Revision: D66283208
Summary: Latest attempt after [136802](#136802) and [140084](#140084) got shelved. This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g. ``` ((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))" ``` This is nice because: - we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place. - this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format. - more memory efficient than storing ASTs; the [AST attempt](#140084) increased artifact size by 20% on some toy programs. - doesn't even require a schema version bump. Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow. Test Plan: test_serdes, test_serialize, internal tests broken by AST PR Differential Revision: D66283208 Pull Request resolved: #141284 Approved by: https://github.com/zhxchen17
…0084) Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling `str(expr)`, and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: pytorch#136797 This starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll kill this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Differential Revision: D65638757 Pull Request resolved: pytorch#140084 Approved by: https://github.com/angelayi
…torch#140084)" This reverts commit d869344. Reverted pytorch#140084 on behalf of https://github.com/izaitsevfb due to reverted internally in D66253238 ([comment](pytorch#140084 (comment)))
…141284) Summary: Latest attempt after [136802](pytorch#136802) and [140084](pytorch#140084) got shelved. This keeps the string format for `expr_str`, but calls `sympy.printing.repr.srepr(s)` instead of `str(s)`, which prints expressions more explicitly, e.g. ``` ((2*x)//(3*y + 4)) -> "FloorDiv(Mul(Integer(2), Symbol('x')), Add(Mul(Integer(3), Symbol('y')), Integer(4)))" ``` This is nice because: - we have better roundtrippability for deserialization, robust to pretty printing changes like [this](https://github.com/pytorch/pytorch/blob/6c9bfd52b6a76ddff053bcff4d23ea7f4c280e9a/torch/utils/_sympy/functions.py#L208) that caused the issue in the first place. - this preserves the BC surface for both 1) sigmoid thrift serialization, by keeping the string format, and 2) deserialization for old IRs, since `sympy.sympify(...)` still handles the old `str(s)` format. - more memory efficient than storing ASTs; the [AST attempt](pytorch#140084) increased artifact size by 20% on some toy programs. - doesn't even require a schema version bump. Additionally to push some test cases over the line, this redoes expression processing (handling ranges, symbol caching) by doing bottom-up processing instead of the current hacky-ish workflow. Test Plan: test_serdes, test_serialize, internal tests broken by AST PR Differential Revision: D66283208 Pull Request resolved: pytorch#141284 Approved by: https://github.com/zhxchen17
|
What's the status on this? |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (serialize by calling
str(expr), and deserialize by callingsympy.sympify(expr_str)). This has led to expressions being mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: #136797This starts to deprecate the use of
expr_strand stores expressions in AST format instead. For BC purposes,expr_strdeserialization is still supported, but we will always serialize toexpr_ast. We'll kill this once the serialization upgrader design is finalized and implemented.Test Plan: test_export
Differential Revision: D65638757