Area(s)
area:gen-ai
Propose new conventions
The current `gen_ai.server.time_to_first_token` metric is useful for tracking server-side latency and LLM "spin-up", but it is much less informative for client-side optimizations.
My thought is that when instrumenting an application built on an agentic framework, it would be helpful for the framework to emit telemetry that can answer the following questions:
- How long was my request in transit to and from the LLM provider before I began seeing a response?
  - I propose `gen_ai.client.operation.time_to_first_chunk` as a client-side counterpart to the `gen_ai.server.time_to_first_token` (time to first token, TTFT) metric.
  - This lets operators measure (and ultimately optimize) overall lag/latency from the LLM provider's APIs (provisioning, message queues, etc.).
- How many tokens per second were generated during generation itself (excluding resourcing, queuing, and provisioning by the server)?
  - I propose `gen_ai.client.operation.time_per_output_chunk` as a client-side counterpart to the `gen_ai.server.time_per_output_token` metric.
  - This lets operators measure (and ultimately optimize) LLM providers based on their speed/cost.
- How long did my request take to complete in total?
  - The existing `gen_ai.client.operation.duration` metric already covers this.
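To make the two proposed measurements concrete, here is a minimal, framework-agnostic sketch of the client-side measurement logic. The metric names come from the proposal above; the `measure_stream` helper and its signature are hypothetical, and a real agentic framework would record these values to OpenTelemetry histograms rather than return them.

```python
import time


def measure_stream(stream, clock=time.monotonic):
    """Consume a streaming LLM response and compute the two proposed
    client-side metrics.

    Returns (chunks, time_to_first_chunk, time_per_output_chunk);
    the latter two are None when no chunks (or only one chunk) arrive.
    """
    start = clock()          # request send time
    first_chunk_at = None
    chunks = []
    for chunk in stream:
        now = clock()
        if first_chunk_at is None:
            # gen_ai.client.operation.time_to_first_chunk: delay before
            # the first streamed chunk, as observed by the client.
            first_chunk_at = now
        chunks.append(chunk)
    end = clock()

    ttfc = None if first_chunk_at is None else first_chunk_at - start
    tpoc = None
    if first_chunk_at is not None and len(chunks) > 1:
        # gen_ai.client.operation.time_per_output_chunk: time spent
        # generating after the first chunk, averaged per subsequent
        # chunk -- excludes server spin-up and queueing, analogous to
        # time_per_output_token but at chunk granularity.
        tpoc = (end - first_chunk_at) / (len(chunks) - 1)
    return chunks, ttfc, tpoc
```

In practice the framework would call `Histogram.record()` on instruments named `gen_ai.client.operation.time_to_first_chunk` and `gen_ai.client.operation.time_per_output_chunk` instead of returning the values, attaching the usual `gen_ai.*` attributes (provider, model, operation name).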
For additional context: many builders are not running inference locally and likely don't have access to the server's token- and chunk-emission telemetry to measure this directly. Given that client-side gap, these metrics are valuable from an LLM ops optimization standpoint.
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.