Skip to content

Improve Set initialization performance#3302

Merged
blnicho merged 15 commits intoPyomo:mainfrom
jsiirola:set-init-performance
Jul 31, 2024
Merged

Improve Set initialization performance#3302
blnicho merged 15 commits intoPyomo:mainfrom
jsiirola:set-init-performance

Conversation

@jsiirola
Copy link
Copy Markdown
Member

@jsiirola jsiirola commented Jun 25, 2024

Fixes # .

Summary/Motivation:

When working on some very large models (100M+ variables), we identified some performance issues in how we were initializing Set objects. This PR resolves those performance bottlenecks:

import time
from pyomo.environ import *
start = time.time()
m = ConcreteModel()
m.I = Set(initialize=range(100000000))
print(time.time() - start)

is reduced from 91.3s to 13.2s, and a real model with a more complex set initializer (for 200M elements) is reduced from 305s to 69s.

One side effect is to remove the warning message for adding duplicate items to a Set (which brings the Pyomo Set behavior closer in line with the Python set behavior).

Changes proposed in this PR:

  • Rework Set initialization to make initializing large data sets more efficient
  • Remove the warning for adding duplicate elements to a Set

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the BSD license.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

Copy link
Copy Markdown
Contributor

@emma58 emma58 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. A few questions mostly out of curiosity!

Comment on lines +1732 to +1736
# Note that we reset _ordered_values within the loop because
# of an old example where the initializer rule makes
# reference to values previously inserted into the Set
# (which triggered the creation of the _ordered_values)
self._ordered_values = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm still confused why you do this for each val though? What would be the difference with doing this above the loop?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offending example is:

def U_init(model, z):
    if z == 6:
        return Set.End
    if z == 1:
        return 1
    else:
        return model.U[z - 1] * z

model.U = Set(ordered=True, initialize=U_init)

The issue is that when z == 2 the __getitem__ causes the _ordered_values list to be created with a single entry. If we don't clear _ordered_values, then it will stay a 1-element list, and when z == 3, looking for model.U[2] looks up _ordered_values[1]; running off the end of the list.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ewwww, this is horrifying! But okay, makes sense.

@emma58
Copy link
Copy Markdown
Contributor

emma58 commented Jul 9, 2024

(Also I am very excited about not warning for duplicate entries! :))

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 98.13665% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.46%. Comparing base (17f8aed) to head (ba9706d).
Report is 98 commits behind head on main.

Files Patch % Lines
pyomo/core/base/set.py 98.13% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3302   +/-   ##
=======================================
  Coverage   88.46%   88.46%           
=======================================
  Files         867      867           
  Lines       98218    98250   +32     
=======================================
+ Hits        86886    86919   +33     
+ Misses      11332    11331    -1     
Flag Coverage Δ
linux 86.28% <98.13%> (+<0.01%) ⬆️
osx 75.57% <98.13%> (+<0.01%) ⬆️
other 86.47% <98.13%> (+<0.01%) ⬆️
win 83.77% <98.13%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Member

@blnicho blnicho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple minor questions but otherwise this looks great!

Comment on lines +1475 to +1476
# It is important that the iterator is an actual iterator
val_iter = iter(val_iter)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Seems like val_iter is guaranteed to be an iterator from line 1399 unless you expect this method to be called from elsewhere.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Removed this, plus updated some other bugs / inconsistencies / documentation around flattening & initialization.

@blnicho blnicho merged commit 5c85b7c into Pyomo:main Jul 31, 2024
@jsiirola jsiirola deleted the set-init-performance branch August 2, 2024 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects

Development

Successfully merging this pull request may close these issues.

4 participants