Add network timing attributes to okhttp3 library#15664
Add network timing attributes to okhttp3 library#15664surbhiia wants to merge 6 commits intoopen-telemetry:mainfrom
Conversation
Signed-off-by: Surbhi Agarwal <[email protected]>
|
|
||
| dependencies { | ||
| library("com.squareup.okhttp3:okhttp:3.0.0") | ||
| library("com.squareup.okhttp3:okhttp:3.11.0") |
There was a problem hiding this comment.
A bit of a tangent to this PR, but I'd be surprised if folks are still using versions this old given it was released back in 2018, and would imagine most folks are using either 4 or 5.
There was a problem hiding this comment.
Should we create an issue to discuss and evaluate this and address it as needed in a separate PR?
There was a problem hiding this comment.
Should we create an issue to discuss and evaluate this and address it as needed in a separate PR?
In my opinion creating an issue won't help as this PR won't be merged before there is a decision on whether to increase the minimum supported version or not. Generally we are open to bumping the minimum supported version for library instrumetnations, but prefer to keep supporting old versions in the agent instrumentation. Here I suspect that bumping the version isn't really required since the instrumentation actually works with the old version, only the new network listener doesn't. You could avoid the discussion by replacing library("com.squareup.okhttp3:okhttp:3.0.0") with compileOnly("com.squareup.okhttp3:okhttp:3.11.0") and testLibrary("com.squareup.okhttp3:okhttp:3.0.0") and placing the network listener test into separate suite that tests with 3.11.0.
There was a problem hiding this comment.
I tried creating a separate test suite as per the review comment but still the other tests that are not using the new EventListener are requiring higher version of okhttp3 library as the shared code is loaded which contains references to the new EventListener like the OkHttpTelemetry class. Let me know if there's a way to resolve this or other approaches I can try to prevent increasing the minimum version of okhttp3 library required by this instrumentation. OkHttp3Test fails with following error in current PR code:

There was a problem hiding this comment.
So, I learnt that as java agent depends on the library and we cannot change the base version for java agent, we cannot upgrade the version here directly.
Doing reflection might not be a good idea.
We can instead create a new folder here for okhttp 3.11? Then the question is should it be 4 instead? But then upgrading to 4 does require kotlin upgrade so what about other users who still want the log generated in this PR but not the kotlin upgrade?
|
|
||
| /** Returns the current timestamp in nanoseconds. */ | ||
| private static long currentTimestampNanos() { | ||
| return System.nanoTime(); |
There was a problem hiding this comment.
I suspect that this doesn't do what you want. System.nanoTime is good only for measuring durations since it doesn't start counting from a fixed point like currentTimeMillis does. Note that the log records also have timestamps wouldn't it make more sense to use these than add another attribute?
There was a problem hiding this comment.
We need multiple timing attributes in one log record to store timing for all the different http call phases. That's why we don't use the log record timestamp field here.
As these timing attributes are used for calculating durations in the backend and we do not need the absolute point in time for these various http call phases, System.nanoTime works as it provides us nanosecond precision and all these callbacks happen in the same process/JVM for a given Http request.
Another option we can evaluate is - Use System.currentTimeMillis() (same as Instant.now().toEpochMilli()) which returns the current time in milliseconds since the Unix epoch. Here for duration we will get millisecond level precision but we will also get absolute point in time for these various events (as additional data, not sure if there's need for it).
There was a problem hiding this comment.
Sorry didn't realize that there was only 1 log record with a bunch of attributes. Nevertheless using System.nanoTime as the value is imo strange. You could subtract start time from all the values to make them easier to interpret.
|
|
||
| /** Returns the current timestamp in nanoseconds. */ | ||
| private static long currentTimestampNanos() { | ||
| return System.nanoTime(); |
There was a problem hiding this comment.
To ensure these timestamps line up with the associated span and log record, you should use the same clock instance, which you can set on the logger and tracer via SdkLoggerProviderBuilder and SdkTracerProviderBuilder, respectively. Ideally you'd just get the clock instance from them, but I don't think you can do that?
There was a problem hiding this comment.
See this issue and its related fixes.
@fractalwrench Is there a way for instrumentation to leverage the shared clock instance just through the OTel API?
There was a problem hiding this comment.
opentelemetry-java doesn't expose a convenient API surface that I'm aware of. That probably deserves a separate issue.
|
|
||
| @Override | ||
| public void responseBodyEnd(Call call, long byteCount) { | ||
| addAttribute(call, RESPONSE_BODY_END, currentTimestampNanos()); |
There was a problem hiding this comment.
I'm pretty sure this is closer to where the span used to end - after the body has been transmitted. This will definitely be called on all successful requests, as opposed to callEnd, which may not be invoked right away.
There was a problem hiding this comment.
So I compared the existing implementation of where the span starts and ends, and it doesn't quite match up with the network timing listener in that both the callStart and callEnd events happen outside the duration of the span. As such, I'd suggest you emit the log at the last useful timestamp contained within the span's lifetime, which is responseBodyEnd.
|
hi @surbhiia! I'm going to convert this PR to draft (just housekeeping), once you get CI passing or have any questions for the reviewers, please mark it ready for review, thanks! |
As discussed in the Add HTTP span events for network phases breakdown issue in semantic-convention repo, this PR resolves this issue in this repo.
Changes made in this PR :
OkHttpTelemetry.newCallFactoryWithNetworkTiming) was added that allows customers to get an okhttp3 call factory withNetworkTimingEventListenerattached, the existing API and functionality remains unaffected.Upcoming Work :
OkHttpTelemetry.newCallFactoryWithNetworkTimingAPI will take abooleanargumentcopyHttpSpanAttributesand based on this original HTTP span attributes will be copied to the log record.callEnd/callFailedare not called:callEndorcallFailedwill be called, it's only some of these rare and improper OkHttp client setup scenarios where these might not be called. So, we do not have a strong need for the periodic thread right now or we need to evaluate it more.