Logs not being flushed after x amount of time #2969

@Xyrion

Description

Describe the bug
I have an environment running Fluent Bit > Fluentd > Elasticsearch.
For a while my logs are flushed as they should be, but after a while the logs stop being flushed and the buffer grows until the port gets blocked because of (overflow_action block).

What I have seen:
fluentd_output_status_num_errors is high for certain matches, though most matches have at least some errors.
Q: How do I view these errors? I don't see them in the logs, even though I have tried debug logging.

fluentd_output_status_buffer_queue_length is very high for two specific matches. These two matches have retry_max_interval 30, yet after a few hours their queues are still growing. They are also the only matches where I do an "include" only.
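(For reference, the fluentd_output_status_* metrics above come from fluent-plugin-prometheus. A minimal sketch of the sources that expose them, assuming that plugin is installed; 24231 is its default port:)

```
# Sketch only -- assumes fluent-plugin-prometheus is installed.
<source>
  @type prometheus                  # serves /metrics on :24231 by default
  bind 0.0.0.0
  port 24231
</source>
<source>
  @type prometheus_output_monitor   # emits fluentd_output_status_* metrics
  interval 10
</source>
```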

Q: How do I stop this from happening? Is there a way to see what's holding up the queue? I have checked the buffer path, and the log that came in first looks like a normal log that has been processed before.
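(A sketch of something I could try on the stuck output, not verified against this setup: raising the output's log level and lowering slow_flush_log_threshold so slow or stuck flushes get logged explicitly:)

```
<match authorization.keycloak.**>
  @type elasticsearch
  @log_level debug               # was error; debug shows retry/flush details
  slow_flush_log_threshold 10.0  # warn when a single flush takes longer than 10s
  # ... rest of the output and buffer config unchanged
</match>
```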

To Reproduce

Expected behavior
The logs should flow through constantly; if there are any errors, they should be printed in the logs.

Your Environment
Running in a container with:
repository: gcr.io/google-containers/fluentd-elasticsearch
tag: v2.4.0

Your Configuration

Here are my match blocks:
<match auth.keycloak.app>
  @type rewrite_tag_filter
  <rule>
    key log
    pattern /org.keycloak.events/
    tag keycloak.auth
  </rule>
</match>

<match keycloak.auth>
  @type rewrite_tag_filter
  <rule>
    key log
    pattern /type=(?<type>[^ ]+)(?<!,)/
    tag authorization.keycloak.$1
  </rule>
</match>
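(As a sanity check, the type pattern from the rule above can be exercised in plain Ruby against a made-up Keycloak event line; the sample line is hypothetical:)

```ruby
# Hypothetical sample line; real Keycloak events may differ.
line    = 'org.keycloak.events type=LOGIN, realmId=master, userId=42'
pattern = /type=(?<type>[^ ]+)(?<!,)/   # same pattern as in the rule

m = pattern.match(line)
# The trailing lookbehind (?<!,) forces backtracking past the comma,
# so the capture is "LOGIN", not "LOGIN," -- the rewritten tag would
# be authorization.keycloak.LOGIN.
puts m[:type]
```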

<match authorization.keycloak.**>
  @id elasticsearch-keycloak-authorization
  @type elasticsearch
  @log_level error
  log_es_400_reason true
  include_tag_key true
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  scheme "#{ENV['OUTPUT_SCHEME']}"
  ssl_version "#{ENV['OUTPUT_SSL_VERSION']}"
  user client
  password "#{ENV['OUTPUT_PASSWORD']}"
  ssl_verify false
  logstash_format true
  logstash_prefix auth-keycloak
  <buffer>
    @type file
    flush_mode immediate
    path /var/log/fluentd-buffers/authorization-keycloak.buffer
    retry_type exponential_backoff
    flush_thread_count 2
    retry_limit 20
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>
Any help is much appreciated!

Your Error Log
There is no error log! I see the problem through Prometheus; the buffer just keeps growing and does not drop until the service has been redeployed or restarted.

Additional context
