Skip to content

logthrdest: add syslogng_output_event_retries_total metric #4807

Merged
MrAnno merged 2 commits intosyslog-ng:masterfrom
alltilla:thrdest-retries-metric
Jan 31, 2024
Merged

logthrdest: add syslogng_output_event_retries_total metric #4807
MrAnno merged 2 commits intosyslog-ng:masterfrom
alltilla:thrdest-retries-metric

Conversation

@alltilla
Copy link
Collaborator

Example metrics:

syslogng_output_event_retries_total{driver="http",url="http://localhost:8888/${path}",id="#anon-destination0#0"} 5

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Jan 29, 2024
Signed-off-by: Attila Szakacs <[email protected]>
@alltilla alltilla force-pushed the thrdest-retries-metric branch from 9ec5ed6 to f72f485 Compare January 29, 2024 14:21
Copy link
Collaborator

@MrAnno MrAnno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reload is a problem here.
With event_retries, I think we want to indicate when something went wrong with event delivery.

Multiple connect calls are normal within syslog-ng. I think even multiple disconnect() calls are normal.
Why don't we increase this counter where actual errors happen during send/flush?

(Around _rewind_batch(), or on the error paths of _process_result().)

@alltilla
Copy link
Collaborator Author

Good point. I did not put the counter increase in the error handling functions, because it is only retried after the next connect, which happens after time-reopen, which is 1 minute by default. So we increase the counter 1 minute earlier, but maybe we can live with that. What do you think?

@MrAnno
Copy link
Collaborator

MrAnno commented Jan 30, 2024

I think we can live with that :)

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Jan 30, 2024
Signed-off-by: Attila Szakacs <[email protected]>
@alltilla alltilla force-pushed the thrdest-retries-metric branch from f72f485 to 551e947 Compare January 30, 2024 14:17
@alltilla
Copy link
Collaborator Author

Moved it to the result handling functions and it happened to end up around rewind_batch(), which is a good indication that it is in the correct place. :) Thanks for the suggestion.

@MrAnno
Copy link
Collaborator

MrAnno commented Jan 30, 2024

Thank you. A last question: can't we actually put this into rewind_batch()? I see that every call of that function precedes the metric update. rewind_batch() is the most direct way of saying "we will retry this" in my opinion, so probably the metric belongs there and it would remain correct even if we introduced more error paths.

@alltilla
Copy link
Collaborator Author

To me it felt a bit side-effecty. If we put it there, I feel like we should rename the function. I will try to come up with something, and if I cannot find, I will just put the stats increase there and not rename it.

@alltilla alltilla force-pushed the thrdest-retries-metric branch from 551e947 to c44425c Compare January 31, 2024 08:31
@alltilla
Copy link
Collaborator Author

@MrAnno Could not come up with a better name :)

@MrAnno
Copy link
Collaborator

MrAnno commented Jan 31, 2024

I don't think we need a better name, recording metrics are normal side-effects.

@MrAnno MrAnno merged commit 3cf7026 into syslog-ng:master Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants