
parallelize(): fix occasional crashes on high load #904

Merged
alltilla merged 2 commits into axoflow:main from
MrAnno:fix-refcache-ack-and-ref-and-rock-and-roll
Jan 15, 2026

Conversation

@MrAnno
Contributor

@MrAnno MrAnno commented Jan 14, 2026

Calling log_msg_update_ack_and_ref_and_abort_and_suspended() and then doing free/ack operations based on the returned value is only atomic as long as the values passed to this function are non-zero.

The new acks_changed/refs_changed checks make sure that a real "non-zero to zero" transition has happened before any ack/free operation is performed.

The double-ack crash can be reproduced with parallelize() using a large number of workers: workers(nproc) batch_size(100).
In this specific case, a given message passes through 3 ref-cached threads: 1 producer and 2 consumers (LogScheduler, LogWriter).
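
The "non-zero to zero" guard described above can be illustrated with a small, self-contained sketch. This is not the actual implementation from the PR: the `DemoMsg` type, the `demo_*` helpers, and the 16/16-bit packing are all hypothetical, and only the `acks_changed`/`refs_changed` names are taken from the description. It assumes a counter update that returns the old packed value, and shows that ack/free is triggered only when the caller's own non-zero delta brought the counter to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: refs in the low 16 bits, acks in the high 16 bits. */
#define DEMO_MASK 0x0000FFFFu
#define DEMO_ACK_SHIFT 16

typedef struct
{
  _Atomic uint32_t ack_and_ref;
  int acks_sent;  /* how many times the (simulated) ack ran */
  int frees_done; /* how many times the (simulated) free ran */
} DemoMsg;

static void
demo_init(DemoMsg *m, uint32_t refs, uint32_t acks)
{
  assert(refs <= DEMO_MASK && acks <= DEMO_MASK);
  atomic_init(&m->ack_and_ref, (acks << DEMO_ACK_SHIFT) | refs);
  m->acks_sent = 0;
  m->frees_done = 0;
}

/* Atomically apply both deltas with a CAS loop; returns the OLD packed value. */
static uint32_t
demo_update_ack_and_ref(DemoMsg *m, int add_ref, int add_ack)
{
  uint32_t old_value = atomic_load(&m->ack_and_ref);
  uint32_t new_value;

  do
    {
      uint32_t refs = ((old_value & DEMO_MASK) + (uint32_t) add_ref) & DEMO_MASK;
      uint32_t acks = ((old_value >> DEMO_ACK_SHIFT) + (uint32_t) add_ack) & DEMO_MASK;
      new_value = (acks << DEMO_ACK_SHIFT) | refs;
    }
  while (!atomic_compare_exchange_weak(&m->ack_and_ref, &old_value, new_value));

  return old_value;
}

/* Act only on a real "non-zero to zero" transition caused by this call. */
static void
demo_apply_deltas(DemoMsg *m, int add_ref, int add_ack)
{
  uint32_t old_value = demo_update_ack_and_ref(m, add_ref, add_ack);
  int old_refs = (int) (old_value & DEMO_MASK);
  int old_acks = (int) (old_value >> DEMO_ACK_SHIFT);

  bool acks_changed = add_ack != 0;
  bool refs_changed = add_ref != 0;

  /* Without acks_changed, a zero delta on an already-zero counter
   * would look like a transition and ack a second time. */
  if (acks_changed && old_acks + add_ack == 0)
    m->acks_sent++;

  if (refs_changed && old_refs + add_ref == 0)
    m->frees_done++;
}
```

In this sketch, a call such as `demo_apply_deltas(&m, -1, 0)` on a message whose ack counter has already reached zero leaves `acks_sent` untouched, because the zero ack delta cannot have caused the transition. The `acks_sent` and `frees_done` fields are plain integers for demonstration only and are not themselves thread-safe.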

Signed-off-by: László Várady <[email protected]>
@MrAnno MrAnno force-pushed the fix-refcache-ack-and-ref-and-rock-and-roll branch from f5171eb to fb90b56 Compare January 14, 2026 13:38
Member

@bazsi bazsi left a comment

Great catch, and good solution. As a next step in calmer times, we should drop the whole of refcache.

@alltilla
Member

Nice one!

@alltilla alltilla merged commit ff879f1 into axoflow:main Jan 15, 2026
21 of 22 checks passed