Conversation
| """ | ||
| Creates an LLM endpoint for the current user. | ||
| """ | ||
| external_interfaces.monitoring_metrics_gateway.emit_route_call_metric( |
There was a problem hiding this comment.
Should we emit a metric here? Or add tags to the current trace? cc @song-william
There was a problem hiding this comment.
I think adding tags to the current trace would "kill 2 birds with one stone" so we can search for traces based on user_id as well.
Metrics generated by Traces are not sampled out: https://docs.datadoghq.com/tracing/metrics/metrics_namespace/
There was a problem hiding this comment.
@saiatmakuri If we add a tag to the existing trace, then add this logic into the auth dependency. This would then make it so we don't need to copy-pasta this call to each route.
There was a problem hiding this comment.
Ahh, it looks like we can't customize the tags for trace-based metrics.
Trace metrics tags, possible tags are: env, service, version, resource, sublayer_type, sublayer_service, http.status_code, http.status_class, Datadog Agent tags (including the host and second primary tag). Note: Tags set on spans do not count and will not be available as tags for your traces metrics.
https://docs.datadoghq.com/tracing/metrics/metrics_namespace/ (edited)
| def emit_database_cache_miss_metric(self): | ||
| self.database_cache_miss += 1 | ||
|
|
||
| def emit_route_call_metric(self, route: str, _metadata: MetricMetadata): |
There was a problem hiding this comment.
Maybe outside of the scope of this PR, but might be good to adopt Datadog-esque terminology, where increment means += 1. emit to me sounds more like a gauge or count.
There was a problem hiding this comment.
Do you have a link to the "Datadog-esque terminology" you are referring to? What is currently here looks like it is just pattern matching the emit_* pattern.
There was a problem hiding this comment.
Yeah was thinking of https://docs.datadoghq.com/metrics/custom_metrics/dogstatsd_metrics_submission/

But yeah acknowledge that this is just following the existing naming pattern (which is what I think could be tweaked).
song-william
left a comment
There was a problem hiding this comment.
Approving to unblock
song-william
left a comment
There was a problem hiding this comment.
Thanks for addressing the comments!
|
|
||
| llm_router_v1 = APIRouter(prefix="/v1/llm") | ||
|
|
||
| async def record_route_call( |
There was a problem hiding this comment.
We could consider adding this at the highest level so all model-engine routes get it (e.g bundles/endpoits).
| def emit_database_cache_miss_metric(self): | ||
| self.database_cache_miss += 1 | ||
|
|
||
| def emit_route_call_metric(self, route: str, _metadata: MetricMetadata): |
There was a problem hiding this comment.
Do you have a link to the "Datadog-esque terminology" you are referring to? What is currently here looks like it is just pattern matching the emit_* pattern.
Emit metrics on each route calls for the LLM endpoints.
Tested locally with:
curl -H "Authorization: <AUTH>" http://localhost:5001/v1/llm/model-endpoints