Clean up identify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor#3436
Clean up identify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor#3436mrmundt merged 23 commits intoPyomo:mainfrom
identify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor#3436Conversation
identify_variables / identify_mutable_parameters; deprecate SImpleExpressionVisitoridentify_variables / identify_mutable_parameters; deprecate SimpleExpressionVisitor
Co-authored-by: Miranda Mundt <[email protected]>
Robbybp
left a comment
There was a problem hiding this comment.
Performance comparison on my motivating example for the previous identify_variables rewrite:
Main
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.905 2.905 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.484 0.484 16.7
solve-scc 1 2.421 2.421 83.3
--------------------------------------------------------
igraph 1 0.359 0.359 14.8
scc-subsolver 546 1.610 0.003 66.5
vars-from-components 546 0.187 0.000 7.7
other n/a 0.264 n/a 10.9
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================This branch
Identifier ncalls cumtime percall %
----------------------------------------------------------------------------------
root 1 2.960 2.960 100.0
-----------------------------------------------------------------------------
full model post-solve 1 0.482 0.482 16.3
solve-scc 1 2.478 2.478 83.7
--------------------------------------------------------
igraph 1 0.358 0.358 14.4
scc-subsolver 546 1.536 0.003 62.0
vars-from-components 546 0.326 0.001 13.1
other n/a 0.259 n/a 10.4
========================================================
other n/a 0.000 n/a 0.0
=============================================================================
==================================================================================Less than a factor of 2 overhead (see the vars-from-components category) and still much better than the 2s this was taking before exploiting named expressions. Thanks for fixing this.
| # The following attributes will be added by initializeWalker: | ||
| # self._objs: the list of found objects | ||
| # self._seen: set(self._objs) | ||
| # self._exprs: list of (e, e.expr) for any (nested) named expressions |
There was a problem hiding this comment.
Is there any reason we need to store e.expr in addition to e?
There was a problem hiding this comment.
I see that we need to store the immutable expression object in case e changes what expression it contains.
| if self._cache is None: | ||
| return True, None |
There was a problem hiding this comment.
To check my understanding: This means that if named_expression_cache was not provided in __init__, we won't exploit repeated named expressions within this expression?
There was a problem hiding this comment.
Correct. My assumption is that the same named expression rarely appears twice in a single expression. So, if you don't provide a cache, then there isn't a big win for defining a cache -- there probably won't be a cache hit (and there is the overhead of creating the cache and throwing it away).
There was a problem hiding this comment.
That's probably a good assumption in general, but I know that for some IDAES models (at least the autothermal reformer), there is a significant benefit to exploiting named expressions within a single constraint. (Of course the user can always do this by explicitly passing the named expression cache.)
| v = identify_variables.visitor | ||
| save = v._include_fixed, v._cache | ||
| try: | ||
| v._include_fixed = include_fixed | ||
| v._cache = named_expression_cache | ||
| yield from v.walk_expression(expr) | ||
| finally: | ||
| v._include_fixed, v._cache = save | ||
|
|
||
|
|
||
| identify_variables.visitor = IdentifyVariableVisitor() |
There was a problem hiding this comment.
Why use a global visitor instead of a new one every time identify_variables is called? Is there a significant overhead to __init__?
There was a problem hiding this comment.
There is a bit of overhead in object creation and disposal. In other cases, I have seen performance benefits by keeping the visitor around between calls. I did not profile the impact here, so maybe this is a red herring?
There was a problem hiding this comment.
I have seen this in AMPLRepnVisitor, so I'm not too surprised, but it hasn't shown up in any of my profiles.
|
@Robbybp: I am surprised that the overhead went up. can you share your test with me (off-line is fine)? I wonder if we are measuring something else (like the GC)? |
|
@jsiirola Run this script: https://github.com/Robbybp/surrogate-vs-implicit/blob/main/svi/auto_thermal_reformer/fullspace_flowsheet.py I insert the timing calls into |
|
We are waiting to merge this until we double check the performance degradation @Robbybp noted |
|
FWIW, the evidence I presented for a performance degradation is pretty flimsy (~0.15s diff), so I wouldn't be opposed to just merging this as-is, given the bug fixes. |
|
@Robbybp I finally got a chance to look at this, and there was an issue with the implementation. The original approach to validating the named expression cache involved storing a list of all expressions seen in the current expression or any sub-expression. Because of how interconnected named expressions are in IDAES models, this would occasionally get to be a very long list (>1k!). The implementation has been updated to store this as a dict, which prunes redundant instances of the same subexpression (so the list stays under 20 long). This PR's implementation is now somewhere between equivalent to the old implementation to possibly slightly faster (the noise in timing in Python is now more than the difference between the two implementations) for your case. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3436 +/- ##
==========================================
+ Coverage 88.61% 88.62% +0.01%
==========================================
Files 880 880
Lines 100639 100646 +7
==========================================
+ Hits 89181 89198 +17
+ Misses 11458 11448 -10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Robbybp
left a comment
There was a problem hiding this comment.
The recent commits look good to me, thanks for looking into this.
blnicho
left a comment
There was a problem hiding this comment.
I found a minor issue in the docs that needs to be fixed but otherwise this looks fine
Fixes # .
Summary/Motivation:
This cleans up the implementation of
identify_variablesto simplify the implementation, improve efficiency, and improve robustness of the named expression cache:In addition:
identify_mutables_parametersis moved to build onidentify_variables(by deriving from the IdentifyVariablesVisitor`)identify_componentsto use theStreamBasedExpressionVisitorThis allows us to (finally) deprecate the
SimpleExpressionVisitorand remove it from the documentationChanges proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: