Skip to content

Add compaction method to Eventlist to reduce performance degradation under heavy cancellation#3359

Merged
quaquel merged 14 commits intomesa:mainfrom
souro26:eventlist-adaptive-compaction
Mar 2, 2026
Merged

Add compaction method to Eventlist to reduce performance degradation under heavy cancellation#3359
quaquel merged 14 commits intomesa:mainfrom
souro26:eventlist-adaptive-compaction

Conversation

@souro26
Copy link
Copy Markdown
Contributor

@souro26 souro26 commented Feb 21, 2026

Summary

This PR improves EventList performance under heavy event cancellation by introducing adaptive heap compaction. When a large proportion of scheduled events are canceled, the internal heap can become dominated by canceled entries, significantly degrading pop_event() performance. The adaptive compaction mechanism rebuilds the heap when canceled events exceed a threshold.

Fixes #3358

Motive

EventList currently uses lazy deletion for cancellations. Under heavy cancellation like 90%, the heap can grow large while most entries are canceled. This leads to repeated skipping during pop_event(), causing significant performance degradation.

Benchmark (N=200,000, 90% canceled):

Without compaction: ~0.50–0.98s pop time
With compaction:    ~0.047–0.07s pop time

Implementation

  • Added an internal _canceled_count counter to track canceled events.
  • Incremented the counter in remove() when an event is canceled.
  • Decremented it during pop_event() when canceled events are skipped.
  • Introduced a private _compact() method that filters out canceled events, puts the remaining events in a heap and then resets _canceled_count

Compaction is triggered adaptively inside pop_event() when

_canceled_count > len(_events) // 2

This keeps the normal case of low cancellation unchanged while preventing severe degradation when cancellations dominate. I considered manual compaction , but chose adaptive compaction to preserve lazy deletion semantics and avoid requiring explicit API calls from users.

Additional Notes

  • Existing lazy deletion semantics are kept the same.
  • Added a behavioral test covering heavy cancellation to ensure correctness under stress.

@github-actions
Copy link
Copy Markdown

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 +1.7% [+1.1%, +2.3%] 🔵 +0.4% [+0.0%, +0.8%]
BoltzmannWealth large 🔵 +0.8% [+0.2%, +1.5%] 🔵 +0.6% [-2.1%, +3.4%]
Schelling small 🔵 -0.7% [-1.0%, -0.4%] 🔵 +0.0% [-0.2%, +0.2%]
Schelling large 🔵 +0.1% [-0.5%, +0.8%] 🔵 -2.1% [-3.9%, -0.3%]
WolfSheep small 🔵 -0.3% [-0.8%, +0.2%] 🔵 -0.2% [-0.6%, +0.1%]
WolfSheep large 🔵 -3.0% [-4.1%, -2.0%] 🔵 -4.4% [-6.2%, -2.7%]
BoidFlockers small 🔵 -1.8% [-2.1%, -1.5%] 🔵 -0.9% [-1.1%, -0.7%]
BoidFlockers large 🔵 -2.0% [-2.4%, -1.5%] 🔵 -2.8% [-3.1%, -2.4%]

@souro26 souro26 force-pushed the eventlist-adaptive-compaction branch from 7a6405a to bb11e3b Compare February 21, 2026 15:12
@SamipSGz
Copy link
Copy Markdown

Hi @souro26, thanks for the PR.

I ran the issue repro benchmark locally (N=200000, cancel_ratio=0.9, runs=3) and saw a clear improvement from ~0.619s avg (main) to ~0.330s avg (~46.6% faster, ~1.87x) with adaptive compaction.

One question on trigger accounting: this issue often cancels via direct event.cancel() (not only EventList.remove(...)). If _canceled_count is only incremented in remove(), compaction might not trigger reliably for that path.

Would you be open to tracking compaction pressure based on what pop_event() actually discards (or otherwise accounting for direct event.cancel() calls)?


# Adaptive compaction check
if (
self._canceled_count > 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this first check even necessary?

Copy link
Copy Markdown
Contributor Author

@souro26 souro26 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that check is redundant, i'll remove it and simplify the code.

Comment on lines +382 to +386
def _compact(self) -> None:
"""Remove canceled events from the heap when they dominate."""
self._events = [e for e in self._events if not e.CANCELED]
heapify(self._events)
self._canceled_count = 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stress test you ran is really extreme. Can you do an analysis of how the cost of heapify/compact and popping canceled events affect each other?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i ran a sweep across cancellation ratios from 10%–90%, n=200k, and 3 runs each to understand the interaction between skipping canceled entries and compaction cost.

approximate results (no compact vs compact + pop) :
10%: 0.70 vs 0.69
30%: 0.66 vs 0.51
50%: 0.67 vs 0.39
70%: 0.67 vs 0.21
90%: 0.68 vs 0.07

without compaction, pop time is fairly flat, between 0.66 and 0.70 seconds. heapify cost decreases as cancellation increases and the heap shrinks. compaction becomes clearly beneficial at around 20–30% cancellation. the 50% threshold is a bit on the higher side since it avoids rebuilds in moderate workloads while preventing degradation when cancellations dominate. i can adjust the trigger to be at around 30% since that's where a more significant difference can be seen.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but what is the cost of the compacting itself?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the time complexity of compaction is linear. it consists of,

  • filtering the heap list which is O(N)
  • heapify on the remaining active elements which is O(A)

so total rebuild cost is O(N + A), which is linear in the heap size. in contrast, skipping canceled entries costs O(log N) per discarded element, so with C canceled entries the skip cost is roughly C * log N.

@souro26 souro26 requested a review from quaquel February 26, 2026 08:19
@quaquel
Copy link
Copy Markdown
Member

quaquel commented Feb 26, 2026

I have been thinking about this PR for the last few days. In my experience with event scheduling, cancelations are infrequent. So, I am not convinced the added complexity of this PR is worth it.

@souro26
Copy link
Copy Markdown
Contributor Author

souro26 commented Feb 26, 2026

In the normal case with few cancellations, this doesn’t change behavior or performance in any measurable way. It’s just a counter and one conditional in pop_event(), and compaction never triggers.

The goal was to prevent severe degradation when cancellations accumulate. Since lazy deletion keeps canceled entries in the heap, pop performance can degrade significantly once they dominate. The adaptive check just prevents that situation without changing semantics. I chose the 50% threshold to avoid frequent rebuilds but i can adjust it to 30% where the benefit becomes noticeable.

I’m open to simplifying the logic further. I can remove the persistent counter and instead trigger compaction based only on how many canceled entries are skipped during popping, which would reduce internal state further.

@souro26
Copy link
Copy Markdown
Contributor Author

souro26 commented Feb 28, 2026

I tried a few different approaches and ended up going with a simple compact() method instead. pop_event() works the same as before and nothing happens automatically.

If a model cancels a large number of events, compact() can be called to rebuild the heap and remove canceled entries. This keeps the core logic simple while still providing an option for heavy cancellation cases. I updated the tests and everything passes locally. Let me know if this approach works @quaquel.

@quaquel quaquel merged commit cfd8ccd into mesa:main Mar 2, 2026
11 of 14 checks passed
@quaquel quaquel added the enhancement Release notes label label Mar 2, 2026
@quaquel quaquel changed the title Add adaptive compaction to Eventlist to reduce performance degradation under heavy cancellation Add compaction method to Eventlist to reduce performance degradation under heavy cancellation Mar 2, 2026
@EwoutH EwoutH mentioned this pull request Mar 13, 2026
42 tasks
EwoutH pushed a commit that referenced this pull request Mar 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Release notes label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EventList performance degradation under heavy cancellation due to lazy deletion

3 participants