Improve Set initialization performance#3302
Conversation
(which attempts to hash all known exception types)
emma58
left a comment
There was a problem hiding this comment.
I think this looks good. A few questions mostly out of curiosity!
| # Note that we reset _ordered_values within the loop because | ||
| # of an old example where the initializer rule makes | ||
| # reference to values previously inserted into the Set | ||
| # (which triggered the creation of the _ordered_values) | ||
| self._ordered_values = None |
There was a problem hiding this comment.
Wait, I'm still confused why you do this for each val though? What would be the difference with doing this above the loop?
There was a problem hiding this comment.
The offending example is:
def U_init(model, z):
if z == 6:
return Set.End
if z == 1:
return 1
else:
return model.U[z - 1] * z
model.U = Set(ordered=True, initialize=U_init)The issue is that when z == 2 the __getitem__ causes the _ordered_values list to be created with a single entry. If we don't clear _ordered_values, then it will stay a 1-element list, and when z == 3, looking for model.U[2] looks up _ordered_values[1]; running off the end of the list.
There was a problem hiding this comment.
Ewwww, this is horrifying! But okay, makes sense.
|
(Also I am very excited about not warning for duplicate entries! :)) |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3302 +/- ##
=======================================
Coverage 88.46% 88.46%
=======================================
Files 867 867
Lines 98218 98250 +32
=======================================
+ Hits 86886 86919 +33
+ Misses 11332 11331 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
blnicho
left a comment
There was a problem hiding this comment.
I have a couple minor questions but otherwise this looks great!
pyomo/core/base/set.py
Outdated
| # It is important that the iterator is an actual iterator | ||
| val_iter = iter(val_iter) |
There was a problem hiding this comment.
Is this needed? Seems like val_iter is guaranteed to be an iterator from line 1399 unless you expect this method to be called from elsewhere.
There was a problem hiding this comment.
Great catch! Removed this, plus updated some other bugs / inconsistencies / documentation around flattening & initialization.
Fixes # .
Summary/Motivation:
When working on some very large models (100M+ variables), we identified some performance issues in how we were initializing
Setobjects. This PR resolves those performance bottlenecks:is reduced from 91.3s to 13.2s, and a real model with a more complex set initializer (for 200M elements) is reduced from 305s to 69s.
One side effect is to remove the warning message for adding duplicate items to a Set (which brings the Pyomo
Setbehavior closer in line with the Pythonsetbehavior).Changes proposed in this PR:
Setinitialization to make initializing large data sets more efficientLegal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: