Autoconfigure experimental OTLP retry#3791
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3791 +/- ##
============================================
+ Coverage 89.28% 89.30% +0.01%
- Complexity 3956 4017 +61
============================================
Files 473 476 +3
Lines 12307 12458 +151
Branches 1207 1217 +10
============================================
+ Hits 10988 11125 +137
- Misses 908 920 +12
- Partials 411 413 +2
Continue to review full report at Codecov.
|
anuraaga
left a comment
There was a problem hiding this comment.
The approach of avoiding public API with reflection seems quite reasonable to me
| api(project(":sdk:logs")) | ||
|
|
||
| implementation(project(":exporters:otlp:common")) | ||
| api(project(":exporters:otlp:common")) |
There was a problem hiding this comment.
Is this change needed? common only has internal code, so we'd expect an implementation dependency to always suffice
There was a problem hiding this comment.
I think it is.
Code in :sdk-extensions:autoconfigure needs to access internal code of :exporters:otlp:common (i.e. the RetryPolicy class). The :sdk-extensions:autoconfigure module has only a compileOnly dependency on :exporters:otlp:all (trace) and :exporters:otlp:metrics, since it requires users to add the runtime dependency. But with a runtime dependency on :exporters:otlp:all and :exporters:otlp:metrics, they also now need a runtime dependency on :exporters:otlp:common in order to have access to RetryPolicy. Making :exporters:otlp:common an api dependency makes it work without the additional dependency.
If this eventually matures, I think :exporters:otlp:common would be the natural home for RetryPolicy, so at that point a api gradle dependency on :exporters:otlp:common would make sense.
|
@jkwatson Want to review this? Otherwise we can probably go ahead and merge |
Should be fine from my perspective. I can give it a thorough review tomorrow if you think it needs it |
| * @throws IllegalArgumentException if the instance does not contain a field called "delegate" of | ||
| * type {@link DefaultGrpcExporterBuilder} | ||
| */ | ||
| public static <T> DefaultGrpcExporterBuilder<?> getDelegateBuilder(Class<T> type, T instance) { |
There was a problem hiding this comment.
should we scope T to at least have some bounds? Do we have a common interface we could enforce here to make it a bit clearer what this is for/about? This is a weird API to have be public at the moment.
There was a problem hiding this comment.
should we scope T to at least have some bounds? Do we have a common interface we could enforce here to make it a bit clearer what this is for/about?
Unfortunately not.
This is a weird API to have be public at the moment.
Yeah its not ideal, but at least its an internal package. There's tons of public stuff in the internal packages that is questionable if you view it as truly public.
The alternative to this was to have this functionality package private in the autoconfigure module. I like this approach though because it allows someone to programmatically configure a retry policy that is different from the default one that is used when you enable it via env vars.
There was a problem hiding this comment.
Somehow I missed this was internal. Carry on!
|
@jack-berg is this still "experimental"? Should we consider making it official at this point? |
I'm taking another try at making progress on OTLP retry! 😬
The PR #3636 draft PR demonstrated some strategies for implementing the OTLP retry required as a "MUST" in the spec. There were some good conversations and the resolution was to go to try to improve the wording of the spec so that the java implementation could be confident in its alignment.
I've attempted to improve the spec language, but that has stalled due to what I would describe as a catch 22. The spec says SDKs "MUST" implement retry, but lacks specificity needed by SDKs to implement with confidence that they'll be aligned with future language. At the same time the spec is reluctant to add more specific language without supporting data.
This PR proposes a strategy for adding experimental support for OTLP retry configuration. I hope it can help progress the issue by providing additional data to influence spec language. Here's what I propose:
DefaultGrpcExporterBuilder#addRetryPolicy(RetryPolicy). This method is on a class in an internal package. Neither the gRPC over okhttp or http/protobuf exporters have support for retry in this PR.DefaultGrpcExporterBuilderdelegate of the exporter builders. I.e. you can access theOtlpGrpcSpanExporterBuilder#delegatefield reflectively, and calladdRetryPolicy()on it.otel.experimental.exporter.otlp.retry.enabled. When enabled, this calls sets up the retry policy via the internal accessors.The net affect is the ability to add a retry policy without any additional public API surface area. The compromise needed to accomplish this was an accessor that uses reflection. Maybe there's another way, but I couldn't think of one.
Why should we care about retry?
This isn't a theoretical problem - I run otel for applications and I regularly see failed exports. Some applications may be able to tolerate intermittent failures in exports, but some will not. Lack of tolerance for network faults is a gap for production readiness.