Implement retry for OkHttpGrpcExporter #3936
Conversation
Codecov Report

```
@@             Coverage Diff              @@
##               main    #3936      +/-   ##
============================================
+ Coverage     89.71%   89.79%    +0.08%
- Complexity     4170     4182       +12
============================================
  Files           498      499        +1
  Lines         12661    12702       +41
  Branches       1223     1227        +4
============================================
+ Hits          11359    11406       +47
+ Misses          908      904        -4
+ Partials        394      392        -2
```

Continue to review the full report at Codecov.
force-pushed from 3f7dd41 to 9032b6a
force-pushed from 9032b6a to fc93256
FYI, I've been working on something similar for the okhttp http/protobuf otlp exporters.
```java
  sleeper.sleep(backoffNanos);
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
  return response;
```
Do we want to return here or raise a runtime exception?
From what I understand, calling interrupt and returning is the correct way of handling an InterruptedException, though I've never really seen it differ much from throwing in practice.
Yeah, I think in practice the thread will only be interrupted at shutdown, so it shouldn't matter much that we return.
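For context, here is a minimal sketch of what a retry loop around this interrupt handling can look like. It is not the PR's actual code: `sendRequest` is a hypothetical stand-in, and the `sleeper`, `randomLong`, and `retryPolicy` names are assumed from the diff excerpts in this review.

```java
// Hypothetical sketch of a retrying sender. sendRequest is an illustrative
// stand-in; sleeper, randomLong, and retryPolicy are assumed from the diff.
Response sendWithRetry(Request request) throws IOException {
  long nextBackoffNanos = retryPolicy.getInitialBackoff().toNanos();
  Response response = sendRequest(request);
  // maxAttempts includes the original request, so at most maxAttempts - 1 retries.
  for (int attempt = 1; attempt < retryPolicy.getMaxAttempts(); attempt++) {
    if (!isRetryable(response)) {
      return response;
    }
    long upperBoundNanos = Math.min(nextBackoffNanos, retryPolicy.getMaxBackoff().toNanos());
    long backoffNanos = randomLong.get(upperBoundNanos);
    nextBackoffNanos = (long) (nextBackoffNanos * retryPolicy.getBackoffMultiplier());
    try {
      sleeper.sleep(backoffNanos);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and bail out with the last response; as
      // noted above, in practice this only happens at shutdown.
      Thread.currentThread().interrupt();
      return response;
    }
    response = sendRequest(request);
  }
  return response;
}
```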
force-pushed from 491bc52 to 58b3419
```java
}

static boolean isRetryable(Response response) {
  // Only retry on gRPC codes which will always come with an HTTP success
```
@jack-berg BTW, this is unlikely to be the best behavior (e.g., 503 seems like an obvious one to retry on) - gRPC retry lives in a bubble that misses standard HTTP load balancers. But since the spec doesn't mention HTTP codes for gRPC retry, I went with this for now.
Yeah that's a good point. There could be a load balancer in front of the gRPC server that gets in the way of the gRPC bubble.
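For illustration, here is a sketch of deciding retryability on the gRPC code rather than the HTTP code, which is why a 503 from a load balancer slips past this check. The set of retryable codes below follows the OTLP spec's list but is an assumption about this PR, and `GrpcRetryUtil` is a hypothetical name.

```java
import java.util.Set;
import okhttp3.Response;

final class GrpcRetryUtil {
  // Illustrative set of retryable gRPC status codes per the OTLP spec:
  // CANCELLED (1), DEADLINE_EXCEEDED (4), RESOURCE_EXHAUSTED (8), ABORTED (10),
  // OUT_OF_RANGE (11), UNAVAILABLE (14), DATA_LOSS (15). The PR's actual list
  // may differ.
  private static final Set<String> RETRYABLE_GRPC_STATUS_CODES =
      Set.of("1", "4", "8", "10", "11", "14", "15");

  static boolean isRetryable(Response response) {
    // gRPC reports its status in the grpc-status header/trailer, usually
    // alongside HTTP 200, so the decision keys off the gRPC code and an HTTP
    // error from an intermediary is never considered retryable here.
    String grpcStatus = response.header("grpc-status");
    return grpcStatus != null && RETRYABLE_GRPC_STATUS_CODES.contains(grpcStatus);
  }

  private GrpcRetryUtil() {}
}
```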
```java
// Careful - the document says "retry attempt" but our maxAttempts includes the original
// request. The numbers are tricky but the unit test should be more clear.
long upperBoundNanos = Math.min(nextBackoffNanos, retryPolicy.getMaxBackoff().toNanos());
long backoffNanos = randomLong.get(upperBoundNanos);
```
This can return any number from 0 to upperBoundNanos, so in the worst case all of the retries could fire within a nanosecond or two. Per the doc linked in the comment, the jitter should be 0.2; this code doesn't operate that way.
Yeah, it looks like the link was updated since this code was written; probably good to match the code up with it.
@ayushtkn can you open an issue for this, so your suggestion isn't lost? Thanks!
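To make the suggestion concrete, here is a sketch of a fixed 0.2 jitter: the capped exponential backoff is scaled by a random factor in [0.8, 1.2], instead of drawing uniformly from [0, upperBoundNanos). The class and method names are hypothetical, not the eventual fix.

```java
import java.util.Random;

// Hypothetical sketch of the suggested jitter: scale the capped exponential
// backoff by a random factor in [1 - JITTER, 1 + JITTER] rather than drawing
// uniformly from [0, upperBoundNanos), so retries stay near the intended delay.
final class JitterSketch {
  private static final double JITTER = 0.2;

  static long backoffNanos(long nextBackoffNanos, long maxBackoffNanos, Random random) {
    long capped = Math.min(nextBackoffNanos, maxBackoffNanos);
    double factor = 1 - JITTER + random.nextDouble() * 2 * JITTER;
    return (long) (capped * factor);
  }

  private JitterSketch() {}
}
```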