
Scalable disk queue #857

Merged

alltilla merged 19 commits into axoflow:main from sodomelle:scalable-disk-queue on Dec 2, 2025

Conversation

@sodomelle
Contributor

@bazsi made a number of changes. The most important one in our case is that he switched the in-memory queues in LogQueueNonReliableDisk from GList to ivykis lists.

@bshifter and I added a new output queue that takes batches of messages instead of a single message, to avoid frequent locking and unlocking.

@MrAnno
Contributor

MrAnno commented Nov 26, 2025

I'd like to review this too.

Member

@alltilla alltilla left a comment


I am done with one round of review; I will do one or two more rounds.

@alltilla
Member

As I've heard, the code relies on front_cache_size > 0, but nothing enforces this; the user can set it to 0. Shouldn't we have an override for == 0?

@alltilla
Member

Done with another round of review; I won't do another one until we process my comments so far.

@sodomelle sodomelle force-pushed the scalable-disk-queue branch 2 times, most recently from 6369a34 to 27b5e31 on November 27, 2025 14:38
@sodomelle sodomelle requested a review from alltilla November 27, 2025 15:10
@sodomelle
Contributor Author

The current implementation may require more memory, because front_cache and front_cache_output both have the full front_cache_size capacity. If we decide this is too much, or not intended, we can assign half the size to front_cache and the other half to front_cache_output.

Member

@alltilla alltilla left a comment


As we discussed IRL, I had some coding nitpicks and came to the conclusion that it would be best if I "fixed" them and opened a PR for you. I don't insist on merging the code organization changes, but please consider them; I think they make the code a bit easier to read and maintain.

sodomelle#1

@alltilla
Member

Thank you for the fixes!

bazsi and others added 8 commits December 1, 2025 08:35
This shifts the diskq implementation closer to LogQueueFifo, as a preparation for taking over similar scalability fixes from LogQueueFifo.

Signed-off-by: Balazs Scheidler <[email protected]>
Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
To avoid frequent locking when popping messages from the queues, we introduce a new queue that takes messages in batches. This new queue requires no locking afterwards. The new queue has the same size as the front_cache, and requires front_cache to be configured in order to work.

Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
In order to find the first element, we check the front_cache_output queue first. If it doesn't contain any messages, we peek the lock-protected queues.

Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
sodomelle and others added 9 commits December 1, 2025 08:35
In case of a rewind, messages from the backlog should go to front_cache_output instead of front_cache. The backlog and front_cache_output queues don't require locking.

Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
Signed-off-by: shifter <[email protected]>
Signed-off-by: Tamás Kosztyu <[email protected]>
This makes the important functions, like pop_head, a bit easier for me to read.

Signed-off-by: Attila Szakacs <[email protected]>
* Eliminates one lock() call
* Flattens the if statements
* Clarifies the intention behind the goto statements

Signed-off-by: Attila Szakacs <[email protected]>
@sodomelle sodomelle force-pushed the scalable-disk-queue branch from c4aaf39 to e5feb09 on December 1, 2025 07:46
@sodomelle sodomelle requested review from alltilla and bazsi December 1, 2025 07:49
@sodomelle sodomelle force-pushed the scalable-disk-queue branch from e5feb09 to 0269492 on December 1, 2025 08:10
Contributor

@MrAnno MrAnno left a comment


I can't do a thorough review within one day. Even with two more review rounds, I would still suggest writing new unit tests and Light tests as well to validate the different scenarios (especially when the queues get full and empty, plus their persistence when shutting down).

I think we can merge the PR today if we agree to wait a week with the release and we write more tests.


 // Diskbuffer is the part of disk-queue that is used only when front cache is full
-Test(diskq_truncate, test_diskq_truncate_with_diskbuffer_used)
+Test(diskq_truncate, test_diskq_truncate_with_diskbuffer_used, .disabled=true)
Contributor

@MrAnno MrAnno Dec 1, 2025


We identified this test as incorrect, because it didn't even truncate the buffer (it was incorrect originally, long before this PR). We should rewrite the test so that it does what it was meant to do.

Member


Shouldn't we fix it then?

Member


Anyway, simply disabling it in an "update tests" commit without any note is not great.

@alltilla alltilla merged commit 0503e63 into axoflow:main Dec 2, 2025
22 checks passed