Skip to content

Conversation

@Watson1978
Copy link
Contributor

@Watson1978 Watson1978 commented Mar 23, 2025

Which issue(s) this PR fixes:
Fixes #4873

What this PR does / why we need it:
Before invoke @num_errors_metrics.inc, the does retries correctly when #try_write fails test will be failed if thread will be switched at

def update_retry_state(chunk_id, using_secondary, error = nil)
@retry_mutex.synchronize do
@num_errors_metrics.inc

It is easy to reproduce by adding sleep just before @num_errors_metrics.inc, like:

diff --git a/lib/fluent/plugin/output.rb b/lib/fluent/plugin/output.rb
index a987937b..d97d8ec6 100644
--- a/lib/fluent/plugin/output.rb
+++ b/lib/fluent/plugin/output.rb
@@ -1312,6 +1312,7 @@ def check_slow_flush(start)
 
       def update_retry_state(chunk_id, using_secondary, error = nil)
         @retry_mutex.synchronize do
+          sleep 0.1
           @num_errors_metrics.inc
           chunk_id_hex = dump_unique_id_hex(chunk_id)
 

This patch will wait until @num_errors_metrics.inc is invoked.

Docs Changes:

Release Note:

@Watson1978 Watson1978 marked this pull request as ready for review March 23, 2025 13:40
@Watson1978 Watson1978 requested a review from daipom March 23, 2025 13:40
@Watson1978 Watson1978 force-pushed the test_output_as_buffered_retries branch from 1880b84 to 90a0025 Compare March 23, 2025 13:41
@Watson1978 Watson1978 added the CI Test/CI issues label Mar 23, 2025
@ashie ashie merged commit a10f0b0 into fluent:master Mar 24, 2025
10 checks passed
@Watson1978
Copy link
Contributor Author

Thanks!

@Watson1978 Watson1978 deleted the test_output_as_buffered_retries branch March 24, 2025 02:03
@Watson1978 Watson1978 added the backport to v1.16 We will backport this fix to the LTS branch label Mar 24, 2025
@daipom daipom added this to the v1.19.0 milestone Apr 23, 2025
daipom pushed a commit that referenced this pull request Apr 23, 2025
**Which issue(s) this PR fixes**:
Fixes #4873

**What this PR does / why we need it**:
Before invoke `@num_errors_metrics.inc`, the `does retries correctly
when #try_write fails` test will be failed if thread will be switched at
https://github.com/fluent/fluentd/blob/f34a2531fcfc4e8958f665301d48558de200aa3a/lib/fluent/plugin/output.rb#L1313-L1315

It is easy to reproduce by adding `sleep` just before
`@num_errors_metrics.inc`, like:

```diff
diff --git a/lib/fluent/plugin/output.rb b/lib/fluent/plugin/output.rb
index a987937..d97d8ec6 100644
--- a/lib/fluent/plugin/output.rb
+++ b/lib/fluent/plugin/output.rb
@@ -1312,6 +1312,7 @@ def check_slow_flush(start)

       def update_retry_state(chunk_id, using_secondary, error = nil)
         @retry_mutex.synchronize do
+          sleep 0.1
           @num_errors_metrics.inc
           chunk_id_hex = dump_unique_id_hex(chunk_id)

```

This patch will wait until `@num_errors_metrics.inc` is invoked.

**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <[email protected]>
Signed-off-by: Daijiro Fukuda <[email protected]>
@daipom daipom added the backported "backport to LTS" is done label Apr 23, 2025
daipom added a commit that referenced this pull request Apr 23, 2025
… (#4915)

**Which issue(s) this PR fixes**:
Backport #4877
Fixes #4873

**What this PR does / why we need it**:
Before invoke `@num_errors_metrics.inc`, the `does retries correctly
when #try_write fails` test will be failed if thread will be switched at
https://github.com/fluent/fluentd/blob/f34a2531fcfc4e8958f665301d48558de200aa3a/lib/fluent/plugin/output.rb#L1313-L1315

It is easy to reproduce by adding `sleep` just before
`@num_errors_metrics.inc`, like:

```diff
diff --git a/lib/fluent/plugin/output.rb b/lib/fluent/plugin/output.rb
index a987937..d97d8ec6 100644
--- a/lib/fluent/plugin/output.rb
+++ b/lib/fluent/plugin/output.rb
@@ -1312,6 +1312,7 @@ def check_slow_flush(start)

       def update_retry_state(chunk_id, using_secondary, error = nil)
         @retry_mutex.synchronize do
+          sleep 0.1
           @num_errors_metrics.inc
           chunk_id_hex = dump_unique_id_hex(chunk_id)

```

This patch will wait until `@num_errors_metrics.inc` is invoked.

**Docs Changes**:

**Release Note**:

<!--
Thank you for contributing to Fluentd!
Your commits need to follow DCO: https://probot.github.io/apps/dco/
And please provide the following information to help us make the most of
your pull request:
-->

**Which issue(s) this PR fixes**: 
Fixes #

**What this PR does / why we need it**: 

**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <[email protected]>
Signed-off-by: Daijiro Fukuda <[email protected]>
Co-authored-by: Shizuo Fujita <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport to v1.16 We will backport this fix to the LTS branch backported "backport to LTS" is done CI Test/CI issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Flaky 'does retries correctly when #try_write fails' test

3 participants