Retry exporting unsuccessfully exported batches #1493
Conversation
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1493      +/-   ##
==========================================
+ Coverage   63.51%   64.60%   +1.09%
==========================================
  Files         159      159
  Lines        3152     3176      +24
  Branches      326      326
==========================================
+ Hits         2002     2052      +50
+ Misses       1050     1021      -29
- Partials      100      103       +3
```
```kotlin
    }
    return atLeastOneWorked
}

fun exportAllSignalsFromDisk(): Boolean = exportSpansFromDisk() && exportMetricsFromDisk() && exportLogsFromDisk()
```
One downside to this approach is that a failure in one telemetry receiver can prevent the others from even attempting to export. That is, if a client is configured to use separate endpoints for metrics, traces, and logs, an outage at the spans endpoint will cause the metrics and logs exports to not even be attempted!

It's probably a minor point, because I think most users will have the same endpoint for all telemetry, but this is not necessarily the case. Because the signals are stored separately, we could attempt to export the other types even when one type fails. I'm not sure it's worth it, though, and doing so might lead to overengineering.
Good point, I overlooked this possibility. It doesn't cost much to implement it the right way; I will make sure that all three methods are called regardless of failures.
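For context: Kotlin's `&&` short-circuits, so the one-liner above never calls `exportMetricsFromDisk()` or `exportLogsFromDisk()` once `exportSpansFromDisk()` returns false. A minimal sketch of a non-short-circuiting variant, assuming only the three methods from the diff above:

```kotlin
// Sketch only: evaluate all three exports eagerly so that a failure in one
// signal type does not prevent the others from being attempted.
fun exportAllSignalsFromDisk(): Boolean {
    val spansOk = exportSpansFromDisk()
    val metricsOk = exportMetricsFromDisk()
    val logsOk = exportLogsFromDisk()
    // Still report failure if any signal type failed, so the run is retried.
    return spansOk && metricsOk && logsOk
}
```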
```
 * Exports all persisted spans/metrics/logs from the disk and deletes the successfully
 * consumed batches from the disk. It aborts the exporting on the first failure.
```
I don't love that the javadoc on these addresses the deletion. As pointed out in the other repo/issue/PR, the deletion is a side effect of iteration. I know we're going to be addressing this, but because this class doesn't actually do the deletion, I don't think it should describe behavior performed by a collaborator, unless that is explicitly called out.
The deletion is implicitly there due to the current tricky API (next() with a side effect). Since the deleting/retrying isn't spelled out in the current version, I'd argue that it's even more important to describe when data is removed from persistent storage and whether a failure is retried. The android-sdk makes an opinionated choice here (there are at least two ways of consuming the disk-buffering lib: the old way, ignoring failures, and the proposed way, retrying failures), so that choice is what I aimed to document here.

I can try to improve the documentation of the opinionated part, e.g. by putting an emphasis on the retry mechanism, like:
```diff
- * Exports all persisted spans/metrics/logs from the disk and deletes the successfully
- * consumed batches from the disk. It aborts the exporting on the first failure.
+ * Exports all persisted spans/metrics/logs from the disk.
+ * If an export fails, the method aborts and retries exporting the
+ * affected signals on the next invocation.
```
Would that work for you, @breedx-splk?
I mean, I guess that's fine... but I still think it's incorrect for a javadoc to describe the internal workings of one of its collaborators. If (when!) the collaborator changes, we are left with incorrect javadoc.

I think we can just roll with this for now and address it in a follow-up PR.
Current behavior
Since #1326, when the SDK exports batches from disk it essentially ignores export failures and deletes the unsuccessfully exported batches anyway. Consequently, valuable telemetry can be dropped silently whenever the OTel exporter throws an exception (e.g. an IOException because the device is offline or the backend is experiencing a temporary outage).
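For illustration, here is a hypothetical sketch of that lossy pattern. This is not the actual SDK or disk-buffering code; `BatchStorage`, `readAndRemoveNextBatch`, and `BatchExporter` are made-up names standing in for the real iteration API:

```kotlin
interface BatchStorage {
    /** Made-up API: returns the next batch and deletes it from disk, or null when empty. */
    fun readAndRemoveNextBatch(): ByteArray?
}

interface BatchExporter {
    /** Returns true if the batch was delivered successfully. */
    fun export(batch: ByteArray): Boolean
}

// Hypothetical sketch of the current (lossy) pattern.
fun exportSpansFromDisk(storage: BatchStorage, exporter: BatchExporter): Boolean {
    var atLeastOneWorked = false
    while (true) {
        // The batch is gone from disk as soon as it is read.
        val batch = storage.readAndRemoveNextBatch() ?: break
        if (exporter.export(batch)) {
            atLeastOneWorked = true
        }
        // If export() returned false, the batch was already deleted: silent data loss.
    }
    return atLeastOneWorked
}
```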
Proposed behavior
The export scheduler now stops exporting on the first failure, preventing the deletion of unconsumed signals. On the next periodic run, exporting is retried from the point where the previous attempt left off.
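Under the same caveat (the storage API below is made up, and `BatchExporter` is reused from the previous sketch), the proposed behavior could look like this: a batch is deleted only after a successful export, and the method aborts on the first failure so the remaining batches stay on disk for the next periodic run.

```kotlin
interface TransactionalBatchStorage {
    /** Made-up API: returns the next batch without deleting it, or null when empty. */
    fun peekNextBatch(): ByteArray?

    /** Made-up API: deletes the batch returned by the last peekNextBatch() call. */
    fun commitNextBatch()
}

fun exportSpansFromDisk(storage: TransactionalBatchStorage, exporter: BatchExporter): Boolean {
    while (true) {
        val batch = storage.peekNextBatch() ?: return true // nothing left to export
        if (!exporter.export(batch)) {
            // Abort without deleting: this batch and the ones after it
            // are retried on the next periodic run.
            return false
        }
        storage.commitNextBatch() // delete only after a successful export
    }
}
```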
Note
I think the root cause of the problem is the incorrect example in the disk-buffering module's README, see open-telemetry/opentelemetry-java-contrib#2539.