Skip to content

Conversation

@brendenehlers
Copy link
Contributor

Fix for: #6827

Added try/catch around call to OpenTelemetrySdk.close() to catch NoClassDefFoundException. If the code is running in a maven environment (determined by presence of maven.home system property), then the exception is logged out as a warning. Otherwise, the exception is re-thrown.

@brendenehlers brendenehlers requested a review from a team as a code owner May 9, 2025 03:04
@codecov
Copy link

codecov bot commented May 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.12%. Comparing base (1e763b2) to head (85718a0).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #7336   +/-   ##
=========================================
  Coverage     90.12%   90.12%           
- Complexity     7187     7188    +1     
=========================================
  Files           814      814           
  Lines         21700    21705    +5     
  Branches       2123     2123           
=========================================
+ Hits          19557    19562    +5     
  Misses         1477     1477           
  Partials        666      666           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

have you tested this with the repro in #6827 (comment)?

@brendenehlers
Copy link
Contributor Author

@trask originally built it there, but did a sanity check with the final product and realized that the logging dependencies might not be on the classpath when that hook is called resulting in the same error originally seen but with a different class.

while it might not be the best solution, i opted for System.out since it's guaranteed to be there when the code is run, so at least the end user will get the feedback even tho it won't be as tightly tied into the logging ecosystem. i figured it's at the end of the lifecycle so it should be fine.

@brendenehlers
Copy link
Contributor Author

@trask idk what changed with the test but the codecov check isn't picking up the coverage for the provided test. the code it's saying isn't covered must be hit otherwise the test would fail, so i'm wondering if i've misconfigured something from the original successful pipeline?

// https://github.com/open-telemetry/opentelemetry-java/issues/6827
if (IS_MAVEN) {
// logging deps might not be on the classpath at this point
System.out.printf("%s Flush failed during shutdown: %s%n", Level.WARNING, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the goal here is not to (overly) alarm people, but also let them know the flush on shutdown did not occur

@cyrille-leclerc wdyt?

Suggested change
System.out.printf("%s Flush failed during shutdown: %s%n", Level.WARNING, e);
System.out.println("%s Flush failed during shutdown, which is a known issue when running OpenTelemetry inside maven (for details see https://github.com/open-telemetry/opentelemetry-java/issues/6827)");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I forgot why this problem could occur in Maven builds.

The Maven OTel extension disables the OTel SDK shutdown hook in favor of explicitly shutting down the OTel SDK as part of the Maven plugin lifecycle. Is it cause by some tests like JUnit tests that run in process of the Maven build and launch an OTel SDK? Or is it happening when the Maven process is started with the OTel auto instrumentation agent?

As @trask highlighted, if there is a NoClassDefFoundError then some spans including the root spns are likely to not be exported which is a critical problem.

By the way, if this problem happens with Maven, shouldn't it also happen with the OTel Gradle Plugin?
FYI The Gradle OTel plugin also implements an explicit shutdown of the otel SDK here and there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyrille-leclerc it seems to have something to do with maven's class loader, you can find a simple repro here that doesn't even instantiate OpenTelemetry SDK: #6827 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess would be that this isn't maven specific. Probably by the time the shutdown hook runs the class loader that contains the sdk classes has been shut down and won't be able to load any more classes. For example the URLClassLoader provides a close method that could be used to trigger this. Perhaps the easiest way to repro would be to run on jdk8 (app loader is URLClassLoader) and manually call close on the app loader. Tomcat class loader also has similar behavior. With maven exec:java the class loader is shut down in https://github.com/mojohaus/exec-maven-plugin/blob/master/src/main/java/org/codehaus/mojo/exec/ExecJavaMojo.java#L334 and the class loader is a URLClassLoader https://github.com/mojohaus/exec-maven-plugin/blob/42bc369e89e51b474b25f03475b7c40af314a4a2/src/main/java/org/codehaus/mojo/exec/URLClassLoaderBuilder.java#L119
I probably would drop all the maven specific checks and just catch the NoClassDefFoundError. There should be a comment explaining under what circumstances the NoClassDefFoundError occurs. Sometimes issues like this can be fixed by ensuring that all classes needed for shutdown are loaded before the shutdown hook is run. Here I don't think this is really viable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to check my understanding, instead of handling it specifically for Maven I should change it to just catch the NoClassDefFoundError and log out the above message that @trask suggested?

@brendenehlers
Copy link
Contributor Author

woah time flies... removed the maven-specific code from the class so it handles the NoClassDefFoundError with a sysout.

@brendenehlers brendenehlers changed the title change exception logging when running in maven fix: change exception logging when running in maven Aug 22, 2025
@brendenehlers brendenehlers requested a review from trask August 22, 2025 03:25
Comment on lines 637 to 640
} catch (NoClassDefFoundError e) {
// https://github.com/open-telemetry/opentelemetry-java/issues/6827
// logging deps might not be on the classpath at this point
System.out.printf("%s Flush failed during shutdown: %s%n", Level.WARNING, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, although this might all be true in one very narrow case, it seems very non-obvious that a) flush is what would have failed (since close() was called, and it might do more than that), and are we sure that NoClassDefFoundError is the only thing we want to catch here? This feels like a bandaid, and if we want this band-aid, we should make it more comprehensive, and not assume the one narrow case we've run into so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you have in mind in this case? Happy to change it whatever it needs to be!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just catch Exception or Throwable to make sure we capture any possible failures?

@jkwatson jkwatson added the needs author feedback Waiting for additional feedback from the author label Sep 11, 2025
@github-actions
Copy link
Contributor

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@github-actions github-actions bot added Stale and removed Stale labels Sep 22, 2025
@brendenehlers
Copy link
Contributor Author

Updated the catch to Throwable, so it will catch anything thrown as part of the shutdown process. Figured Throwable was better since that'll also catch the Error instances, not just Exception instances.

@github-actions github-actions bot removed the needs author feedback Waiting for additional feedback from the author label Sep 24, 2025
Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jkwatson jkwatson merged commit 7ea2dd9 into open-telemetry:main Sep 25, 2025
29 checks passed
@otelbot
Copy link
Contributor

otelbot bot commented Sep 25, 2025

Thank you for your contribution @brendenehlers! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

the-clam pushed a commit to the-clam/opentelemetry-java that referenced this pull request Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants