logthrdest: support autoscaling partitions by MrAnno · Pull Request #855 · axoflow/axosyslog

MrAnno · 2025-11-24T14:31:43Z

When worker-partition-key() is used to categorize messages into different batches,
the messages are hashed to workers by default, which prevents them from being distributed
across workers efficiently based on load.
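
To illustrate the default behavior, here is a minimal sketch of static key hashing. It is not the actual logthrdest code: `hash_key()` and `worker_for_key()` are hypothetical names, and FNV-1a is an arbitrary hash picked for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit FNV-1a string hash, chosen only for illustration. */
static uint32_t
hash_key(const char *key)
{
  uint32_t h = 2166136261u;

  for (const unsigned char *p = (const unsigned char *) key; *p; p++)
    {
      h ^= *p;
      h *= 16777619u;
    }
  return h;
}

/* Static mapping: the same partition key always selects the same worker,
 * no matter how much traffic that partition carries. */
static size_t
worker_for_key(const char *key, size_t num_workers)
{
  assert(num_workers > 0);
  return hash_key(key) % num_workers;
}
```

With such a mapping, a key like `"nginx"` lands on the same worker index on every call, so a single high-traffic partition cannot use more than one worker.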

The new worker-partition-autoscaling(yes) option uses a 1-minute statistic to help distribute
high-traffic partitions among multiple workers, allowing each worker to maximize its batch size.

When using this autoscaling option, it is recommended to oversize the number of workers: set it higher than the
expected number of partitions.

Open question: should there be an upper limit on the partitions table, with a fallback to hashing when misconfigured?

github-actions · 2025-11-24T14:36:42Z

This Pull Request introduces config grammar changes

axoflow/65c732858e01325a29dec2576cba7c5a408421ee -> MrAnno/partition-stats

Details

--- a/destination
+++ b/destination

 axosyslog-otlp(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 bigquery(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 clickhouse(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 google-pubsub-grpc(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 http(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 kafka-c(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 loki(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 mongodb(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 opentelemetry(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 redis(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

 syslog-ng-otlp(
+    worker-partition-autoscaling(<yesno>)
+    worker-partition-autoscaling-wfo(<positive-integer>)
 )

Signed-off-by: László Várady <[email protected]>

MrAnno · 2025-11-26T23:10:26Z

http(
      url("http://localhost:8080")
      method("POST")
      batch-lines(10000)
      batch-timeout(2000)
      workers(10)
      worker-partition-key("$PROGRAM")
      flush-on-worker-key-change(yes)
      worker-partition-autoscaling(yes)
);

syslogng_output_batch_size_events_bucket{le="2"} 32
syslogng_output_batch_size_events_bucket{le="8"} 3
syslogng_output_batch_size_events_bucket{le="16"} 3
syslogng_output_batch_size_events_bucket{le="32"} 1
syslogng_output_batch_size_events_bucket{le="256"} 3
syslogng_output_batch_size_events_bucket{le="512"} 2
syslogng_output_batch_size_events_bucket{le="2048"} 1
syslogng_output_batch_size_events_bucket{le="4096"} 35
syslogng_output_batch_size_events_bucket{le="8192"} 494

syslogng_memory_queue_processed_events_total{worker="7"} 52375
syslogng_memory_queue_processed_events_total{worker="8"} 52384
syslogng_memory_queue_processed_events_total{worker="9"} 52385
syslogng_memory_queue_processed_events_total{worker="0"} 52390
syslogng_memory_queue_processed_events_total{worker="1"} 52378
syslogng_memory_queue_processed_events_total{worker="2"} 52507
syslogng_memory_queue_processed_events_total{worker="3"} 1197297
syslogng_memory_queue_processed_events_total{worker="4"} 1254968
syslogng_memory_queue_processed_events_total{worker="5"} 1207284
syslogng_memory_queue_processed_events_total{worker="6"} 1159332


MrAnno · 2025-11-27T09:26:08Z

My valgrind is broken, I'll ask someone to help me check memory stuff together.

lib/cfg-parser.c


alltilla

Only optional comments, we can merge this as is if you want.

alltilla · 2025-12-02T14:28:30Z

lib/logthrdest/logthrdestdrv.c

+
+/* partition_stats_lock must be held when calling this method */
+static inline gboolean
+_remove_if_partition_expired(Partition **p, GHashTableIter *iter, const struct timespec *now)


[nitpick]

I think p can be a non-output variable?

alltilla · 2025-12-03T08:33:14Z

lib/logthrdest/logthrdestdrv.c

+      part->worker_idx = current_worker_idx;
+      gdouble partition_ratio = _get_partition_rate_ratio(part->rate, total_rate);
+      gdouble partition_workers = partition_ratio * free_workers;
+      part->num_of_workers = (gint) (0 + floor(partition_workers));


[optional]

This floor() call can introduce a rounding error of nearly a whole worker. I know that we accumulate the error and give it to one partition, but I think there are other options that spread it more evenly across the remaining partitions. I have opened a PR with my suggested algorithm; please consider it.

MrAnno#3
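
The algorithm proposed in MrAnno#3 is not shown in this thread. As one possible illustration of spreading the rounding error, here is a largest-remainder sketch: every partition first gets the integer part of its proportional share, then the workers lost to truncation are handed out one by one to the partitions with the largest fractional parts. `allocate_workers()` and `MAX_PARTITIONS` are hypothetical names, not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTITIONS 64 /* hypothetical limit for this sketch */

/* Distribute free_workers among n partitions in proportion to their rates.
 * out[i] receives the number of workers assigned to partition i. */
static void
allocate_workers(const double *rates, size_t n, int free_workers, int *out)
{
  double remainder[MAX_PARTITIONS];
  double total_rate = 0.0;
  int assigned = 0;

  assert(n > 0 && n <= MAX_PARTITIONS);
  assert(free_workers >= 0);

  for (size_t i = 0; i < n; i++)
    total_rate += rates[i];
  assert(total_rate > 0.0);

  for (size_t i = 0; i < n; i++)
    {
      double exact = rates[i] / total_rate * free_workers;

      out[i] = (int) exact; /* truncation equals floor() for non-negative values */
      remainder[i] = exact - out[i];
      assigned += out[i];
    }

  /* Hand out the workers lost to truncation, largest remainder first. */
  for (int left = free_workers - assigned; left > 0; left--)
    {
      size_t best = 0;

      for (size_t i = 1; i < n; i++)
        if (remainder[i] > remainder[best])
          best = i;

      out[best]++;
      remainder[best] = -1.0; /* a partition gets at most one extra worker */
    }
}
```

For rates of 50, 30 and 20 with 7 free workers, the exact shares are 3.5, 2.1 and 1.4; truncation assigns 3, 2 and 1, and the one remaining worker goes to the first partition, giving 4, 2 and 1.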

MrAnno marked this pull request as draft November 24, 2025 14:32

MrAnno force-pushed the partition-stats branch from dabfef8 to de9f5f9 Compare November 25, 2025 09:12

alltilla self-requested a review November 25, 2025 12:15

MrAnno force-pushed the partition-stats branch 2 times, most recently from c2b6677 to bb9258b Compare November 26, 2025 11:49

MrAnno marked this pull request as ready for review November 26, 2025 23:04

MrAnno force-pushed the partition-stats branch from bb9258b to be4f23c Compare November 26, 2025 23:05

MrAnno added a commit to MrAnno/axosyslog that referenced this pull request Nov 26, 2025

news: add feature axoflow#855

be4f23c

Signed-off-by: László Várady <[email protected]>

MrAnno added a commit to MrAnno/axosyslog that referenced this pull request Nov 26, 2025

news: add feature axoflow#855

19a06d6

Signed-off-by: László Várady <[email protected]>

MrAnno force-pushed the partition-stats branch from be4f23c to 19a06d6 Compare November 26, 2025 23:23

MrAnno added a commit to MrAnno/axosyslog that referenced this pull request Nov 26, 2025

news: add feature axoflow#855

a923eed

Signed-off-by: László Várady <[email protected]>

MrAnno force-pushed the partition-stats branch from 19a06d6 to a923eed Compare November 26, 2025 23:39

MrAnno requested a review from sodomelle November 26, 2025 23:40

mitzkia reviewed Nov 28, 2025

lib/cfg-parser.c (outdated)

MrAnno added 4 commits November 28, 2025 22:49

timeutils: fix const parameters of timespec_diff_nsec()

8a7ae43

Signed-off-by: László Várady <[email protected]>

template: extract log_template_format_tmpbuf()

3308b1c

Signed-off-by: László Várady <[email protected]>

logthrdest: support autoscaling partitions

a48085f

Signed-off-by: László Várady <[email protected]>

news: add feature axoflow#855

f889f39

Signed-off-by: László Várady <[email protected]>

MrAnno force-pushed the partition-stats branch from 9f2cc9b to f889f39 Compare November 28, 2025 21:49

logthrdest: add worker-partition-autoscaling-wfo() tuning option

bc9d960

Signed-off-by: László Várady <[email protected]>

MrAnno mentioned this pull request Dec 2, 2025

logthrdest: add active_worker_partitions and workers metrics #866

Merged

alltilla approved these changes Dec 3, 2025

alltilla merged commit c5d7dcd into axoflow:main Dec 3, 2025
25 of 26 checks passed

MrAnno mentioned this pull request Dec 3, 2025

logthrdestdrv: autoscaling review fixes #868

Merged

alltilla merged 5 commits into axoflow:main from MrAnno:partition-stats
