Skip to content

fix(otelx): memory leak in OpenTelemetry tracing middleware#689

Merged
alnr merged 2 commits intoory:masterfrom
jgraeger:jg/otel-memleak
May 9, 2023
Merged

fix(otelx): memory leak in OpenTelemetry tracing middleware#689
alnr merged 2 commits intoory:masterfrom
jgraeger:jg/otel-memleak

Conversation

@jgraeger
Copy link
Copy Markdown
Contributor

@jgraeger jgraeger commented May 4, 2023

This PR addresses a memory leak issue in the OpenTelemetry tracing middleware.

The existing middleware invokes NewHandler for each incoming request, which results in the allocation of a new oteltracehttp.Handler.

func TraceHandler(h http.Handler, opts ...otelhttp.Option) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		NewHandler(h, r.URL.Path, opts...).ServeHTTP(w, r)
	})
}

In our setup, this led to an increase of over 300MB per day per Hydra instance in allocated memory. Similar observations have been reported in open-telemetry/opentelemetry-go-contrib#2413.

This PR modifies the middleware to utilize a SpanNameFormatter function instead of passing the path to NewHandler. This preserves the existing functionality while eliminating the memory leak.

Related Issue or Design Document

The issue has been reported in projects using the library (e.g., ory/hydra#3495).

Checklist

  • I have read the contributing guidelines and signed the CLA.
  • I have referenced an issue containing the design document if my change introduces a new feature.
  • I have read the security policy.
  • I confirm that this pull request does not address a security vulnerability.
    If this pull request addresses a security vulnerability,
    I confirm that I got approval (please contact [email protected]) from the maintainers to push the changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added the necessary documentation within the code base (if appropriate).

Further comments

  • The otelhttptrace applies the passed options in sequential order. We prepend the SpanNameFormatter to allow users to override the SpanNameHandler if needed.

@jgraeger jgraeger changed the title fix(otelx): Allocate otelhttp.Handler only once per middleware fix(otelx): memory leak in OpenTelemetry tracing middleware May 4, 2023
@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented May 4, 2023

You’re the boss, amazing find! How did you identify this?

@zepatrik @alnr

@jgraeger
Copy link
Copy Markdown
Contributor Author

jgraeger commented May 8, 2023

How did you identify this?

Hydra serves a large number of requests in our setup. As stated in the PR description, the memory consumption of our pods exceeded 20 GB over the weekend, indicating a clear problem.

I set up a local environment with otel-collector and used a script to send a large number of requests (always for the same client/subject - no need for complex request generation). I dumped the memory profile. Unfortunately, I no longer have the file, but it showed a lot of allocations related to the global structures the OTEL stacks write to when creating a new handler. My profile excluded the startup phase, so It didn't include the heavy but unrelated/unproblematic allocations from it. Deploying the fix to our environment confirmed that it the large memleak was gone.

After implementing the fix, I found what I've linked in the description. Seems like this mistake occured in other projects as well.

PS: To me it looks the CI errors are unrelated to fix? Is there something I can do?

@aeneasr
Copy link
Copy Markdown
Member

aeneasr commented May 8, 2023

Interesting - did you use the standard Go memory profiling? We tried identifying the memory leak using the same methodologies but due to a secondary context leak caused by this were not able to identify the culprit.

I reran the CI - let’s see if it passes

@alnr alnr merged commit 61a4c85 into ory:master May 9, 2023
@alnr
Copy link
Copy Markdown
Contributor

alnr commented May 9, 2023

You my friend are amazing!

@jgraeger jgraeger deleted the jg/otel-memleak branch May 9, 2023 16:36
@alnr alnr mentioned this pull request May 16, 2023
splaunov pushed a commit to monta-app/x that referenced this pull request Sep 5, 2023
* Allocate otelhttp.Handler only once

(cherry picked from commit 61a4c85)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants