Skip to content

Add network timing attributes to okhttp3 library#15664

Draft
surbhiia wants to merge 6 commits intoopen-telemetry:mainfrom
surbhiia:develop/add-http-phases-breakdown-attributes
Draft

Add network timing attributes to okhttp3 library#15664
surbhiia wants to merge 6 commits intoopen-telemetry:mainfrom
surbhiia:develop/add-http-phases-breakdown-attributes

Conversation

@surbhiia
Copy link
Copy Markdown

@surbhiia surbhiia commented Dec 16, 2025

As discussed in the Add HTTP span events for network phases breakdown issue in semantic-convention repo, this PR resolves this issue in this repo.

Changes made in this PR :

  • Emit a standalone log record that contains all the network timing attributes and context of the original HTTP span
  • A new API (OkHttpTelemetry.newCallFactoryWithNetworkTiming) was added that allows customers to get an okhttp3 call factory with NetworkTimingEventListener attached, the existing API and functionality remains unaffected.

Upcoming Work :

  • Replicate client span attributes
    • To help backends which can't correlate two different signals to generate metrics** (if metrics generation happens at ingestion time).
    • OkHttpTelemetry.newCallFactoryWithNetworkTiming API will take a boolean argument copyHttpSpanAttributes and based on this original HTTP span attributes will be copied to the log record.
  • Other edge cases where log is not emitted (only if required and there is demand for it)
    • Scenarios where callEnd / callFailed are not called:
      • App crashes/termination
      • No timeouts configured (indefinite hangs)
      • Response body never closed (connection leaks)
      • Thread interruption etc.
    • For these we can introduce a periodic thread with configurable timeout that emits any idle logs. Something similar to the executor introduced in HttpURLConnection auto instrumentation here.
    • This is not done in this PR as generally okhttp guarantees either callEnd or callFailed will be called, it's only some of these rare and improper OkHttp client setup scenarios where these might not be called. So, we do not have a strong need for the periodic thread right now or we need to evaluate it more.

@surbhiia surbhiia requested a review from a team as a code owner December 16, 2025 12:28

dependencies {
library("com.squareup.okhttp3:okhttp:3.0.0")
library("com.squareup.okhttp3:okhttp:3.11.0")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a tangent to this PR, but I'd be surprised if folks are still using versions this old given it was released back in 2018, and would imagine most folks are using either 4 or 5.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create an issue to discuss and evaluate this and address it as needed in a separate PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create an issue to discuss and evaluate this and address it as needed in a separate PR?

In my opinion creating an issue won't help as this PR won't be merged before there is a decision on whether to increase the minimum supported version or not. Generally we are open to bumping the minimum supported version for library instrumetnations, but prefer to keep supporting old versions in the agent instrumentation. Here I suspect that bumping the version isn't really required since the instrumentation actually works with the old version, only the new network listener doesn't. You could avoid the discussion by replacing library("com.squareup.okhttp3:okhttp:3.0.0") with compileOnly("com.squareup.okhttp3:okhttp:3.11.0") and testLibrary("com.squareup.okhttp3:okhttp:3.0.0") and placing the network listener test into separate suite that tests with 3.11.0.

Copy link
Copy Markdown
Author

@surbhiia surbhiia Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried creating a separate test suite as per the review comment but still the other tests that are not using the new EventListener are requiring higher version of okhttp3 library as the shared code is loaded which contains references to the new EventListener like the OkHttpTelemetry class. Let me know if there's a way to resolve this or other approaches I can try to prevent increasing the minimum version of okhttp3 library required by this instrumentation. OkHttp3Test fails with following error in current PR code:
OkHttp3Test Error

Copy link
Copy Markdown
Author

@surbhiia surbhiia Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I learnt that as java agent depends on the library and we cannot change the base version for java agent, we cannot upgrade the version here directly.

Doing reflection might not be a good idea.

We can instead create a new folder here for okhttp 3.11? Then the question is should it be 4 instead? But then upgrading to 4 does require kotlin upgrade so what about other users who still want the log generated in this PR but not the kotlin upgrade?

@surbhiia surbhiia marked this pull request as draft December 17, 2025 15:21
@surbhiia surbhiia marked this pull request as ready for review January 21, 2026 23:06

/** Returns the current timestamp in nanoseconds. */
private static long currentTimestampNanos() {
return System.nanoTime();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this doesn't do what you want. System.nanoTime is good only for measuring durations since it doesn't start counting from a fixed point like currentTimeMillis does. Note that the log records also have timestamps wouldn't it make more sense to use these than add another attribute?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need multiple timing attributes in one log record to store timing for all the different http call phases. That's why we don't use the log record timestamp field here.

As these timing attributes are used for calculating durations in the backend and we do not need the absolute point in time for these various http call phases, System.nanoTime works as it provides us nanosecond precision and all these callbacks happen in the same process/JVM for a given Http request.

Another option we can evaluate is - Use System.currentTimeMillis() (same as Instant.now().toEpochMilli()) which returns the current time in milliseconds since the Unix epoch. Here for duration we will get millisecond level precision but we will also get absolute point in time for these various events (as additional data, not sure if there's need for it).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry didn't realize that there was only 1 log record with a bunch of attributes. Nevertheless using System.nanoTime as the value is imo strange. You could subtract start time from all the values to make them easier to interpret.


/** Returns the current timestamp in nanoseconds. */
private static long currentTimestampNanos() {
return System.nanoTime();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure these timestamps line up with the associated span and log record, you should use the same clock instance, which you can set on the logger and tracer via SdkLoggerProviderBuilder and SdkTracerProviderBuilder, respectively. Ideally you'd just get the clock instance from them, but I don't think you can do that?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this issue and its related fixes.

@fractalwrench Is there a way for instrumentation to leverage the shared clock instance just through the OTel API?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opentelemetry-java doesn't expose a convenient API surface that I'm aware of. That probably deserves a separate issue.


@Override
public void responseBodyEnd(Call call, long byteCount) {
addAttribute(call, RESPONSE_BODY_END, currentTimestampNanos());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is closer to where the span used to end - after the body has been transmitted. This will definitely be called on all successful requests, as opposed to callEnd, which may not be invoked right away.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I compared the existing implementation of where the span starts and ends, and it doesn't quite match up with the network timing listener in that both the callStart and callEnd events happen outside the duration of the span. As such, I'd suggest you emit the log at the last useful timestamp contained within the span's lifetime, which is responseBodyEnd.

@trask
Copy link
Copy Markdown
Member

trask commented Apr 25, 2026

hi @surbhiia! I'm going to convert this PR to draft (just housekeeping), once you get CI passing or have any questions for the reviewers, please mark it ready for review, thanks!

@trask trask marked this pull request as draft April 25, 2026 21:06
@trask trask mentioned this pull request May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OkHttp3 library enhancement for gathering network phases timing via HTTP Span Events

7 participants