Description
Describe the bug
I have an environment running Fluent Bit > Fluentd > Elasticsearch.
For a while my logs are flushed as they should be, but after some time the logs stop being flushed and the buffer grows until the port gets blocked because of overflow_action block.
What I have seen:
fluentd_output_status_num_errors is high for certain matches, though most matches have at least some errors.
Q: How do I view these errors? I don't see them in the logs, even though I have tried with debug logging.
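To try to surface these errors, I am thinking of enabling the bundled monitor agent input so the per-plugin error and retry counters can be inspected over HTTP (a minimal sketch; the bind address and port are my own choices):

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

If I read the docs correctly, curl http://localhost:24220/api/plugins.json should then show retry_count and the buffer stats per output plugin.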
fluentd_output_status_buffer_queue_length is very high for two specific matches. These two matches have retry_max_interval 30, yet after a few hours their queues are still growing. These two matches are the only ones where I do an "include" only.
Q: How do I stop this from happening? Is there a way to see what is holding up the queue? I have checked the buffer path, and the first log that came in looks like a normal log that has been processed before.
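One idea I want to try here is lowering slow_flush_log_threshold on these two outputs, so that any flush that takes too long gets logged as a warning (a sketch; the 10-second threshold is a guess on my part, and note that the warning is emitted at warn level, so @log_level error would hide it):

<match authorization.keycloak.**>
  @type elasticsearch
  @log_level warn
  slow_flush_log_threshold 10.0
  # ... rest of the output and buffer settings unchanged ...
</match>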
To Reproduce
Expected behavior
The logs should constantly flow through; if there are any errors, they should be printed in the logs.
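In case some records fail at the emit stage (for example in the rewrite_tag_filter rules) rather than at flush time, I could also route them to a file via the built-in @ERROR label so nothing disappears silently (a sketch; the output path is my own choice):

<label @ERROR>
  <match **>
    @type file
    path /var/log/fluentd/error-records
  </match>
</label>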
Your Environment
Running in a container with:
repository: gcr.io/google-containers/fluentd-elasticsearch
tag: v2.4.0
Your Configuration
Here are my match blocks:
<match auth.keycloak.app>
  @type rewrite_tag_filter
  <rule>
    key log
    pattern /org.keycloak.events/
    tag keycloak.auth
  </rule>
</match>

<match keycloak.auth>
  @type rewrite_tag_filter
  <rule>
    key log
    pattern /type=(?<type>[^ ]+)(?<!,)/
    tag authorization.keycloak.$1
  </rule>
</match>

<match authorization.keycloak.**>
  @id elasticsearch-keycloak-authorization
  @type elasticsearch
  @log_level error
  log_es_400_reason true
  include_tag_key true
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  scheme "#{ENV['OUTPUT_SCHEME']}"
  ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
  user client
  password "#{ENV['OUTPUT_PASSWORD']}"
  ssl_verify false
  logstash_format true
  logstash_prefix auth-keycloak
  <buffer>
    @type file
    flush_mode immediate
    path /var/log/fluentd-buffers/authorization-keycloak.buffer
    retry_type exponential_backoff
    flush_thread_count 2
    retry_limit 20
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>
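As a possible mitigation, I am also considering adding a secondary output to this match, so that chunks which exhaust their retries are dumped to disk instead of sitting in the queue (a sketch; the directory is my own choice and I have not tested this against the image above):

<match authorization.keycloak.**>
  @type elasticsearch
  # ... same settings and <buffer> section as above ...
  <secondary>
    @type secondary_file
    directory /var/log/fluentd-failed/authorization-keycloak
  </secondary>
</match>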
Any help is much appreciated!
Your Error Log
There is no error log! I see the problem through Prometheus: the buffer just keeps growing and does not drop until the service has been redeployed or restarted.
Additional context