LogScheduler: fix IO job starvation #906
Merged
MrAnno merged 4 commits into axoflow:main on Jan 15, 2026
Conversation
alltilla added a commit to alltilla/axosyslog that referenced this pull request on Jan 14, 2026
Signed-off-by: Attila Szakacs <[email protected]>
Force-pushed from fe4959e to 51624d9
alltilla added a commit to alltilla/axosyslog that referenced this pull request on Jan 14, 2026
Force-pushed from 51624d9 to 5051e33
syslogng_parallelized_assigned_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="0"} 614700
syslogng_parallelized_assigned_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="1"} 614661
syslogng_parallelized_assigned_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="2"} 614600
syslogng_parallelized_assigned_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="3"} 614600
syslogng_parallelized_processed_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="0"} 614513
syslogng_parallelized_processed_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="1"} 614500
syslogng_parallelized_processed_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="2"} 614500
syslogng_parallelized_processed_events_total{id="/home/alltilla/repos/axosyslog/build/install/etc/syslog-ng.conf:6:3",partition_index="3"} 614500
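For reference, per-partition counters like the ones above come from a configuration along these lines. This is a hypothetical minimal sketch: the source and destination names are placeholders, and `partitions(4)` is chosen only to match the `partition_index` labels 0..3 shown above.

```
# hypothetical minimal syslog-ng.conf sketch; s_local and d_dest are placeholder names
options {
  stats(level(4));              # the per-partition parallelize() metrics are level-4 stats
};

log {
  source(s_local);
  parallelize(partitions(4));   # four partitions -> partition_index 0..3 in the metrics
  destination(d_dest);
};
```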
alltilla added a commit to alltilla/axosyslog that referenced this pull request on Jan 15, 2026
Force-pushed from 5051e33 to b5a6e2e
We can have more partitions (IO jobs), coming from one or multiple parallelize() calls, than the number of cores/IO workers available. The LogScheduler correctly spreads the messages between partitions, but we can only run a limited number of IO jobs at the same time.

The work() method is implemented so that new batches can be added to its workload while it is running, and it does not return as long as even one batch is available for processing. Under a heavy load of messages, IO jobs therefore run indefinitely, or at least until new batches stop arriving in their queues. If some IO jobs are scheduled but not running, new batches are still added to those non-running jobs, yet we only process batches from the running ones. This continues until log-iw-size() messages accumulate in the non-running jobs, which stops the running jobs from receiving any more logs, so they eventually return and the previously non-running jobs finally get to run. The only thing saving us from complete starvation is log-iw-size() and backpressure, which is not the intended behavior.

We need to place a hard limit on the number of logs processed by one job run, so ivykis can work its magic and cycle between the scheduled IO jobs. We now count the number of logs processed in one work() run, and when it reaches a configurable limit, we finish the batch and return from work(). The new log-fetch-limit() option of parallelize() can change this limit from its default value of 1000.
Force-pushed from 84d8eed to 04850eb
MrAnno approved these changes on Jan 15, 2026
bazsi approved these changes on Jan 15, 2026
I have also added some level 4 metrics that can help debug parallelize() errors.