Area(s)
area:gen-ai
Propose new conventions
The current `gen_ai.server.time_to_first_token` metric is useful for tracking server-side latency and LLM "spin-up", but it is much less informative for client-side optimizations.
My thought is that when instrumenting an application built on an agentic framework, it would be helpful for the framework to emit telemetry that can answer the following questions:
- How long was my request in transit to and from the LLM provider before I began seeing a response?
  - I propose `gen_ai.client.operation.time_to_first_chunk` as a client-side counterpart to the `gen_ai.server.time_to_first_token` (time to first token, TTFT) metric.
  - This lets operators measure (and ultimately optimize) overall lag/latency from the LLM provider's APIs (provisioning, message queues, etc.).
- How many tokens per second were generated during generation itself (excluding resourcing, queuing, and provisioning by the server)?
  - I propose `gen_ai.client.operation.time_per_output_chunk` as a client-side counterpart to the `gen_ai.server.time_per_output_token` metric.
  - This lets operators measure (and ultimately optimize) LLM providers based on their speed/cost.
- How long did my request take to complete in total?
  - The existing `gen_ai.client.operation.duration` metric already covers this.
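To make the two proposed measurements concrete, here is a minimal, framework-agnostic sketch of the client-side measurement logic. The metric names come from the proposal above; the `measure_stream` helper and its signature are hypothetical, and a real agentic framework would record these values to OpenTelemetry histograms rather than return them.

```python
import time


def measure_stream(stream, clock=time.monotonic):
    """Consume a streaming LLM response and compute the two proposed
    client-side metrics.

    Returns (chunks, time_to_first_chunk, time_per_output_chunk);
    the latter two are None when no chunks (or only one chunk) arrive.
    """
    start = clock()          # request send time
    first_chunk_at = None
    chunks = []
    for chunk in stream:
        now = clock()
        if first_chunk_at is None:
            # gen_ai.client.operation.time_to_first_chunk: delay before
            # the first streamed chunk, as observed by the client.
            first_chunk_at = now
        chunks.append(chunk)
    end = clock()

    ttfc = None if first_chunk_at is None else first_chunk_at - start
    tpoc = None
    if first_chunk_at is not None and len(chunks) > 1:
        # gen_ai.client.operation.time_per_output_chunk: time spent
        # generating after the first chunk, averaged per subsequent
        # chunk -- excludes server spin-up and queueing, analogous to
        # time_per_output_token but at chunk granularity.
        tpoc = (end - first_chunk_at) / (len(chunks) - 1)
    return chunks, ttfc, tpoc
```

In practice the framework would call `Histogram.record()` on instruments named `gen_ai.client.operation.time_to_first_chunk` and `gen_ai.client.operation.time_per_output_chunk` instead of returning the values, attaching the usual `gen_ai.*` attributes (provider, model, operation name).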
For additional context: many builders are not running inference locally and likely don't have access to the server's token- and chunk-emission telemetry to measure this directly. Given that client-side gap, these metrics are valuable from an LLM ops optimization standpoint.
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.