Resolve intermittent capture_output deadlock #3601
Conversation
**Codecov Report**

Attention: Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #3601    +/-   ##
========================================
  Coverage   88.76%   88.76%
========================================
  Files         891      891
  Lines      102297   102320    +23
========================================
+ Hits        90802    90826    +24
+ Misses      11495    11494     -1
```
```python
sys.stdout, sys.stderr = old
logging.getLogger('pyomo.common.tee').handlers.clear()


def test_atomic_deadlock(self):
```
This is a nifty test, I like it.
```diff
- with TeeStream(*ostreams) as t, capture_output(t.STDOUT, capture_fd=True):
+ with capture_output(TeeStream(*ostreams), capture_fd=True):
```
What is the difference between doing this and just doing `with TeeStream(*ostreams) as OUT:`? Do we have a preference? (I ask because we don't seem to have decided on a "standard", and we probably should.)
`capture_output` wraps the streams it is passed into a `TeeStream` in order to get handles it can give to stdout / stderr. So the former implementation actually had 2 nested `TeeStream` objects. If you just pass the `TeeStream` object to `capture_output`, it won't create a new `TeeStream`, so the new implementation only has a single `TeeStream`.
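The single- vs. double-wrapping distinction can be illustrated with a small stdlib-only analogue. Note that `SimpleTee` and `capture` below are hypothetical stand-ins for Pyomo's `TeeStream` and `capture_output`, not the real implementations:

```python
import io
import sys
from contextlib import contextmanager


class SimpleTee(io.TextIOBase):
    """Toy stand-in for TeeStream: fan each write out to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, data):
        for s in self.streams:
            s.write(data)
        return len(data)


@contextmanager
def capture(stream):
    """Toy stand-in for capture_output: redirect stdout to `stream`.

    If `stream` is not already a SimpleTee, wrap it in one -- this implicit
    wrapping is the source of the "2 nested TeeStream objects" in the
    former implementation.
    """
    if not isinstance(stream, SimpleTee):
        stream = SimpleTee(stream)  # implicit second wrapper layer
    old, sys.stdout = sys.stdout, stream
    try:
        yield stream
    finally:
        sys.stdout = old


buf = io.StringIO()
# Passing the tee object directly avoids creating a second wrapper:
with capture(SimpleTee(buf)):
    print("hello")
assert buf.getvalue() == "hello\n"
```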
There was a problem hiding this comment.
Sorry, let me rephrase. In ipopt currently, we do this:

```python
with TeeStream(*ostreams) as t:
    timer.start('subprocess')
    process = subprocess.run(
        cmd,
        timeout=timeout,
        env=env,
        universal_newlines=True,
        stdout=t.STDOUT,
        stderr=t.STDERR,
        check=False,
    )
    timer.stop('subprocess')
    # This is the stuff we need to parse to get the iterations
    # and time
    (iters, ipopt_time_nofunc, ipopt_time_func, ipopt_total_time) = (
        self._parse_ipopt_output(ostreams[0])
    )
```
In GUROBI, we do:

```python
with capture_output(TeeStream(*ostreams), capture_fd=False):
    gurobi_model = gurobipy.Model()
    timer.start('transfer_model')
    x = gurobi_model.addMVar(
        len(repn.columns),
        lb=lb,
        ub=ub,
        obj=repn.c.todense()[0] if repn.c.shape[0] else 0,
        vtype=vtype,
    )
```
Which one is preferable?
They are both correct for their respective use cases:
`subprocess.run` needs to be passed 2 TextIO objects, so it is appropriate (and necessary) to pass the raw TextIO objects from `TeeStream`. `capture_output` needs a single stream to send all output to (we have not added support for separate out/err targets), so internally it creates a `TeeStream` so that it can get raw file descriptors if it needs them (for `capture_fd=True`). Passing a `TeeStream` directly avoids the need to create a second one that wraps the stream from the first.
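The `subprocess.run` side of this can be sketched with stdlib pieces only. The sketch uses an ordinary temporary file in place of Pyomo's `TeeStream.STDOUT` handle, which is an assumption of this illustration:

```python
import subprocess
import sys
import tempfile

# subprocess.run hands the child process real file descriptors, so it
# must be given file-like objects backed by an OS-level fd -- a pure
# Python stream wrapper will not do.  This is why the raw TextIO
# handles (t.STDOUT / t.STDERR) are passed, rather than the TeeStream.
with tempfile.TemporaryFile(mode='w+') as out:
    subprocess.run(
        [sys.executable, '-c', 'print("solver output")'],
        stdout=out,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    out.seek(0)
    captured = out.read()
```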
Per @blnicho - we could merge this today if no one has time to do another review. Ergo, risking it for the biscuit and merging.
Fixes # .
Summary/Motivation:
We are continuing to see intermittent deadlocks or attempts to write to closed file descriptors in the HiGHS tests on main. I think I have (finally) tracked down the root cause to the use of `capture_output()` in the Gurobi interface `__del__` method. It looks like the GC will (randomly) interrupt a HiGHS test to run the collector, which, if it is collecting a Gurobi interface, will call `capture_output`. If this happens during another test's use of `capture_output.__enter__` or `.__exit__`, then the Gurobi interface's use of capture could happen when the underlying output streams are in a transitory state (and could be closed). This PR resolves that by:

- removing the use of `capture_output` in `__del__`.
- ensuring `capture_output` will not interrupt another call to `__enter__` / `__exit__`.

Changes proposed in this PR:
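The second fix amounts to making the enter/exit stream swap atomic with respect to other captures. Below is a minimal stdlib-only illustration of that guard pattern; the `_redirect_lock` and `safe_capture` names are hypothetical and not Pyomo's actual implementation. A lock like this protects against other threads; it cannot protect against same-thread re-entrancy from `__del__` run by the GC, which is exactly why the first bullet (not calling capture from `__del__` at all) is also needed:

```python
import io
import sys
import threading
from contextlib import contextmanager

# A process-wide lock so that swapping sys.stdout in and out is atomic:
# a concurrent capture cannot observe a half-redirected (and possibly
# already closed) stream.
_redirect_lock = threading.Lock()


@contextmanager
def safe_capture(target):
    with _redirect_lock:  # make __enter__'s stream swap atomic
        old, sys.stdout = sys.stdout, target
    try:
        yield target
    finally:
        with _redirect_lock:  # make __exit__'s restore atomic
            sys.stdout = old


buf = io.StringIO()
with safe_capture(buf):
    print("captured")
assert buf.getvalue() == "captured\n"
```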
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: