Conversation
Hi @souro26, thanks for the PR. I ran the issue repro benchmark locally (…). One question on trigger accounting: this issue often cancels via direct …. Would you be open to tracking compaction pressure based on what …?
`mesa/time/events.py` (outdated)

```python
# Adaptive compaction check
if (
    self._canceled_count > 0
```
is this first check even necessary?
Yes, that check is redundant; I'll remove it and simplify the code.
`mesa/time/events.py` (outdated)

```python
def _compact(self) -> None:
    """Remove canceled events from the heap when they dominate."""
    self._events = [e for e in self._events if not e.CANCELED]
    heapify(self._events)
    self._canceled_count = 0
```
The stress test you ran is really extreme. Can you do an analysis of how the cost of heapify/compact and popping canceled events affect each other?
I ran a sweep across cancellation ratios from 10% to 90% (N=200k, 3 runs each) to understand the interaction between skipping canceled entries and compaction cost.

Approximate pop times, no compaction vs compaction + pop:

| cancellation | no compact (s) | compact + pop (s) |
|---|---|---|
| 10% | 0.70 | 0.69 |
| 30% | 0.66 | 0.51 |
| 50% | 0.67 | 0.39 |
| 70% | 0.67 | 0.21 |
| 90% | 0.68 | 0.07 |

Without compaction, pop time is fairly flat, between 0.66 and 0.70 seconds. With compaction, cost decreases as cancellation increases because the rebuilt heap shrinks. Compaction becomes clearly beneficial at around 20–30% cancellation. The 50% threshold is deliberately on the conservative side: it avoids rebuilds in moderate workloads while still preventing degradation when cancellations dominate. I can lower the trigger to around 30%, since that is where the difference becomes significant.
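For reference, a rough version of this sweep can be reproduced with a small stand-alone script. Everything here (the `Event` class, `build_heap`, the timing loop) is illustrative scaffolding, not mesa's `EventList`, and the heap size is smaller than the 200k used above so the demo stays quick:

```python
import heapq
import random
import time

class Event:
    """Illustrative event: orderable by time, with a lazy-cancel flag."""
    __slots__ = ("time", "canceled")

    def __init__(self, t):
        self.time = t
        self.canceled = False

    def __lt__(self, other):
        return self.time < other.time

def build_heap(n, cancel_ratio, seed=42):
    """Build a heap of n events and lazily cancel a given fraction of them."""
    rng = random.Random(seed)
    events = [Event(rng.random()) for _ in range(n)]
    heapq.heapify(events)
    for e in rng.sample(events, int(n * cancel_ratio)):
        e.canceled = True
    return events

def pop_all(heap):
    """Pop events in order, skipping canceled entries (pure lazy deletion)."""
    active = 0
    while heap:
        if not heapq.heappop(heap).canceled:
            active += 1
    return active

def pop_all_compacted(heap):
    """Compact first (drop canceled entries, re-heapify), then pop."""
    heap = [e for e in heap if not e.canceled]
    heapq.heapify(heap)
    return pop_all(heap)

if __name__ == "__main__":
    n = 20_000  # smaller than the 200k sweep above, to keep the demo quick
    for ratio in (0.1, 0.3, 0.5, 0.7, 0.9):
        t0 = time.perf_counter()
        pop_all(build_heap(n, ratio))
        t1 = time.perf_counter()
        pop_all_compacted(build_heap(n, ratio))
        t2 = time.perf_counter()
        print(f"{ratio:.0%}: no compact {t1 - t0:.3f}s vs compact+pop {t2 - t1:.3f}s")
```

Absolute times will differ by machine, but the shape of the curve (flat without compaction, shrinking with it) should match the table.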
But what is the cost of the compaction itself?
The time complexity of compaction is linear. It consists of:

- filtering the heap list, which is O(N) in the current heap size
- heapify on the remaining active elements, which is O(A)

So the total rebuild cost is O(N + A), linear in the heap size. In contrast, skipping canceled entries costs O(log N) per discarded element, so with C canceled entries the skip cost is roughly C * log N.
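Under the assumption that comparisons dominate, the crossover point can be sketched with a back-of-the-envelope cost model. The functions below are illustrative and ignore constants such as allocation and cache effects:

```python
import math

N = 200_000  # heap size used in the sweep above

def skip_cost(n, cancel_ratio):
    """Approximate work to lazily pop past C = n * ratio canceled entries."""
    c = n * cancel_ratio
    return c * math.log2(n)      # O(log N) pop per discarded entry

def compact_cost(n, cancel_ratio):
    """Approximate work for one rebuild: O(N) filter plus O(A) heapify."""
    a = n * (1 - cancel_ratio)   # active events surviving the rebuild
    return n + a

for ratio in (0.1, 0.3, 0.5, 0.7, 0.9):
    cheaper = "compact" if compact_cost(N, ratio) < skip_cost(N, ratio) else "skip"
    print(f"{ratio:.0%}: {cheaper} is cheaper")
```

For N = 200k this toy model flips from "skip is cheaper" to "compact is cheaper" between 10% and 30% cancellation, which is consistent with the measured crossover around 20–30%.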
I have been thinking about this PR for the last few days. In my experience with event scheduling, cancellations are infrequent, so I am not convinced the added complexity of this PR is worth it.
In the normal case with few cancellations, this doesn't change behavior or performance in any measurable way: it adds just a counter and one conditional in `pop_event()`, and compaction never triggers. The goal was to prevent severe degradation when cancellations accumulate. Since lazy deletion keeps canceled entries in the heap, pop performance can degrade significantly once they dominate; the adaptive check prevents that situation without changing semantics. I chose the 50% threshold to avoid frequent rebuilds, but I can adjust it to 30%, where the benefit becomes noticeable. I'm also open to simplifying the logic further: I could remove the persistent counter and instead trigger compaction based only on how many canceled entries are skipped during popping, which would reduce internal state.
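The skip-based simplification floated above could look roughly like the following sketch. `SkipTriggeredEventList` and the `SKIP_LIMIT` constant are hypothetical names for illustration, not mesa code:

```python
import heapq

class SkipTriggeredEventList:
    """Lazy-deletion heap that compacts when a single pop skips too much."""

    SKIP_LIMIT = 32   # assumed tuning constant: max skips tolerated per pop

    def __init__(self):
        self._events = []        # min-heap of (time, event) tuples
        self._canceled = set()   # lazy deletion marker

    def add(self, time, event):
        heapq.heappush(self._events, (time, event))

    def remove(self, event):
        self._canceled.add(event)

    def pop_event(self):
        skipped = 0
        while self._events:
            time, event = heapq.heappop(self._events)
            if event in self._canceled:
                self._canceled.discard(event)
                skipped += 1
                if skipped >= self.SKIP_LIMIT:
                    self._compact()   # this pop did too much skipping: rebuild
                    skipped = 0
                continue
            return time, event
        raise IndexError("pop from an empty event list")

    def _compact(self):
        """Drop all remaining canceled entries and re-heapify."""
        self._events = [(t, e) for t, e in self._events
                        if e not in self._canceled]
        heapq.heapify(self._events)
        self._canceled.clear()
```

This keeps no state between pops beyond the canceled set itself; the trade-off is that the trigger reacts to observed skip bursts rather than the global canceled fraction.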
I tried a few different approaches and ended up going with a simple …. If a model cancels a large number of events, ….
…n under heavy cancellation (#3359)
Summary
This PR improves `EventList` performance under heavy event cancellation by introducing adaptive heap compaction. When a large proportion of scheduled events are canceled, the internal heap can become dominated by canceled entries, significantly degrading `pop_event()` performance. The adaptive compaction mechanism rebuilds the heap when canceled events exceed a threshold.

Fixes #3358
Motivation
`EventList` currently uses lazy deletion for cancellations. Under heavy cancellation (e.g. 90%), the heap can grow large while most entries are canceled. This leads to repeated skipping during `pop_event()`, causing significant performance degradation.

Benchmark (N=200,000, 90% canceled):
Implementation
- Added a `_canceled_count` counter to track canceled events.
- Incremented in `remove()` when an event is canceled.
- Decremented in `pop_event()` when canceled events are skipped.
- `_compact()` rebuilds the heap and resets `_canceled_count`.

Compaction is triggered adaptively inside `pop_event()` when canceled entries exceed the threshold. This keeps the normal case of low cancellation unchanged while preventing severe degradation when cancellations dominate. I considered manual compaction, but chose adaptive compaction to preserve lazy deletion semantics and avoid requiring explicit API calls from users.
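Putting the pieces of the description together, a minimal self-contained sketch of the adaptive scheme might look like this. The 50% trigger mirrors the discussion, but the code is illustrative, not mesa's actual `EventList`:

```python
import heapq

class AdaptiveEventList:
    """Min-heap event list with lazy deletion and adaptive compaction."""

    def __init__(self):
        self._events = []          # heap of (time, event) tuples
        self._canceled = set()     # canceled events still sitting in the heap
        self._canceled_count = 0

    def add(self, time, event):
        heapq.heappush(self._events, (time, event))

    def remove(self, event):
        """Cancel lazily: the heap entry is only dropped on pop or compaction."""
        if event not in self._canceled:
            self._canceled.add(event)
            self._canceled_count += 1

    def pop_event(self):
        # Adaptive trigger: rebuild once canceled entries exceed 50% of the heap.
        if self._canceled_count * 2 > len(self._events):
            self._compact()
        while self._events:
            time, event = heapq.heappop(self._events)
            if event in self._canceled:
                self._canceled.discard(event)
                self._canceled_count -= 1
                continue
            return time, event
        raise IndexError("pop from an empty event list")

    def _compact(self):
        """Remove canceled events from the heap and re-heapify."""
        self._events = [(t, e) for t, e in self._events
                        if e not in self._canceled]
        heapq.heapify(self._events)
        self._canceled.clear()
        self._canceled_count = 0
```

With few cancellations the trigger never fires and behavior is plain lazy deletion; once canceled entries dominate, one O(N + A) rebuild replaces many O(log N) skips.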
Additional Notes