fix(otelx): memory leak in OpenTelemetry tracing middleware#689
fix(otelx): memory leak in OpenTelemetry tracing middleware#689alnr merged 2 commits intoory:masterfrom
Conversation
Hydra serves a large number of requests in our setup. As stated in the PR description, the memory consumption of our pods exceeded 20 GB over the weekend, indicating a clear problem. I set up a local environment with otel-collector and used a script to send a large number of requests (always for the same client/subject - no need for complex request generation). I dumped the memory profile. Unfortunately, I no longer have the file, but it showed a lot of allocations related to the global structures the OTEL stacks write to when creating a new handler. My profile excluded the startup phase, so It didn't include the heavy but unrelated/unproblematic allocations from it. Deploying the fix to our environment confirmed that it the large memleak was gone. After implementing the fix, I found what I've linked in the description. Seems like this mistake occured in other projects as well. PS: To me it looks the CI errors are unrelated to fix? Is there something I can do? |
|
Interesting - did you use the standard Go memory profiling? We tried identifying the memory leak using the same methodologies but due to a secondary context leak caused by this were not able to identify the culprit. I reran the CI - let’s see if it passes |
|
You my friend are amazing! |
* Allocate otelhttp.Handler only once (cherry picked from commit 61a4c85)
This PR addresses a memory leak issue in the OpenTelemetry tracing middleware.
The existing middleware invokes NewHandler for each incoming request, which results in the allocation of a new oteltracehttp.Handler.
In our setup, this led to an increase of over 300MB per day per Hydra instance in allocated memory. Similar observations have been reported in open-telemetry/opentelemetry-go-contrib#2413.
This PR modifies the middleware to utilize a SpanNameFormatter function instead of passing the path to NewHandler. This preserves the existing functionality while eliminating the memory leak.
Related Issue or Design Document
The issue has been reported in projects using the library (e.g., ory/hydra#3495).
Checklist
If this pull request addresses a security vulnerability,
I confirm that I got approval (please contact [email protected]) from the maintainers to push the changes.
Further comments