
Conversation

@fandreuz
Contributor

Fixes #7051

@fandreuz fandreuz requested a review from a team as a code owner May 12, 2025 22:27
@fandreuz
Contributor Author

I still need to add a bunch more tests; I just wanted to check in to see whether the approach is sound. The idea is to have a sort of aggregatorHolder for all storage types, which can be swapped with a DropAggregator according to the current state of the SdkMeter. The complexity lies in doing this without partial updates, which could result in unexpected recordings.
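
For illustration, a minimal, self-contained sketch of that idea (the names below are hypothetical stand-ins, not the SDK's actual internals): each storage keeps its aggregator behind a holder that is swapped atomically between the real aggregator and a drop aggregator whenever the meter's enabled state changes.

import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified aggregator contract used only for this sketch.
interface SimpleAggregator {
  void record(long value);
}

// Aggregator that discards everything; active while the meter is disabled.
final class DropAggregatorSketch implements SimpleAggregator {
  @Override
  public void record(long value) {
    // intentionally a no-op
  }
}

// Holds the currently active aggregator for a storage and swaps it in a single
// atomic step, so recording threads never observe a partially applied update.
final class AggregatorHolderSketch {
  private final SimpleAggregator realAggregator;
  private final SimpleAggregator dropAggregator = new DropAggregatorSketch();
  private final AtomicReference<SimpleAggregator> active;

  AggregatorHolderSketch(SimpleAggregator realAggregator) {
    this.realAggregator = realAggregator;
    this.active = new AtomicReference<>(realAggregator);
  }

  // Called when the owning meter is enabled or disabled.
  void onMeterEnabledChanged(boolean meterEnabled) {
    active.set(meterEnabled ? realAggregator : dropAggregator);
  }

  void record(long value) {
    active.get().record(value);
  }
}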

@codecov

codecov bot commented May 16, 2025

Codecov Report

❌ Patch coverage is 93.87755% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.04%. Comparing base (1c5dd50) to head (fbfc03c).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...try/sdk/metrics/internal/SdkMeterProviderUtil.java    75.00%    2 Missing ⚠️
...sdk/metrics/internal/state/EmptyMetricStorage.java     0.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7346      +/-   ##
============================================
- Coverage     90.06%   90.04%   -0.02%     
- Complexity     7104     7114      +10     
============================================
  Files           805      805              
  Lines         21484    21517      +33     
  Branches       2093     2098       +5     
============================================
+ Hits          19349    19376      +27     
- Misses         1472     1476       +4     
- Partials        663      665       +2     

@fandreuz
Contributor Author

Hi @jack-berg, any comments on this? I think it's ready for a first round of review.

@jack-berg
Member

Hey @fandreuz - sorry for the delay. I'm swamped with internal stuff right now. Trying to get in a spot to be able to spend more time doing code reviews for otel next week.

private final Map<RegisteredReader, MetricStorageRegistry> readerStorageRegistries;
private final boolean meterEnabled;

private boolean meterEnabled;
Contributor

I suspect this is going to need some sort of memory barrier around it. Either make it an AtomicBoolean, or provide some kind of synchronization around the update to it.

Contributor

@jack-berg should this field be marked volatile?

Member

In other similar situations we've punted on the volatile keyword, with the argument that even the small performance impact of reading a volatile field vs a non-volatile field is too high a price to pay for the dynamism we get from this feature, which will only be leveraged by a few users. I know that, in theory, not marking it volatile may mean that the changes are never seen by readers, but I'd like to see complaints about that in practice before paying the performance penalty.

In this particular instance, adding volatile isn't terribly consequential because the field is only read on collection. The real work happens down in DefaultSynchronousMetricStorage, which has an enabled field that is also non-volatile and is read each and every time a measurement is recorded. So we could update SdkMeter#meterEnabled to be volatile, but I don't think we should make DefaultSynchronousMetricStorage#enabled volatile. So the question is, should we mark SdkMeter#meterEnabled because we can, even though it's not really consequential? Or leave it be for consistency with SdkTracer, SdkLogger, and DefaultSynchronousMetricStorage#enabled?
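
For illustration only, a stripped-down, hypothetical sketch of the trade-off described above (not the actual SdkMeter / DefaultSynchronousMetricStorage code): the meter-level flag is read only at collection time, while the storage-level flag sits on the measurement hot path, which is why the thread leans toward leaving both non-volatile.

// Simplified, hypothetical sketch of the two flags discussed above.
class MeterSketch {
  // Read once per collection cycle; a volatile read here would cost little.
  private boolean meterEnabled = true;

  void setEnabled(boolean enabled) {
    this.meterEnabled = enabled;
  }

  boolean isEnabled() {
    return meterEnabled;
  }
}

class StorageSketch {
  // Read on every recorded measurement (the hot path); marking it volatile
  // would add a memory-barrier cost to each recording.
  private boolean enabled = true;

  void setEnabled(boolean enabled) {
    this.enabled = enabled;
  }

  void recordLong(long value) {
    if (!enabled) {
      return; // drop the measurement while disabled
    }
    // ...aggregate the measurement...
  }
}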

Contributor

Fair enough. I'm ok with it if you're ok with it.

private final AggregationTemporality aggregationTemporality;
private final Aggregator<T, U> aggregator;
private volatile AggregatorHolder<T, U> aggregatorHolder = new AggregatorHolder<>();
private final AtomicReference<AggregatorHolder<T, U>> aggregatorHolder;
Contributor

Does this change affect performance at all? I wouldn't mix changes like this into this PR unless there's a very specific reason for it. If we change internals like this, we should do it in an isolated PR and provide before/after benchmarks to show it is not harmful, or (hopefully) helpful.
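
For what it's worth, a before/after comparison of the field-read cost could be done with a small JMH micro-benchmark along these lines (a hypothetical sketch, not part of this PR), comparing a plain volatile read against an AtomicReference#get:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical micro-benchmark comparing the two field flavors discussed above.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class HolderReadBenchmark {

  private volatile Object volatileHolder = new Object();
  private final AtomicReference<Object> atomicHolder = new AtomicReference<>(new Object());

  @Benchmark
  public Object readVolatileField() {
    return volatileHolder;
  }

  @Benchmark
  public Object readAtomicReference() {
    return atomicHolder.get();
  }
}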

Member

@jack-berg jack-berg left a comment

Sorry for the delay in review.

Been thinking about this for a while, and I don't think it's completely obvious how the SDK should behave in a couple of cases surrounding changes to a meter being enabled:

Consider instrument A, which has cumulative temporality and is part of meter Foo. Foo is initially enabled, then disabled, then later enabled again, all while receiving a steady stream of measurements. It shouldn't be part of exported metrics while it's disabled. Its measurements shouldn't be considered while it's disabled. But when it is re-enabled, should it include measurements recorded prior to disabling? What if it's disabled and re-enabled quickly, before it's omitted from any export? What should the start timestamp of the metric be in each export?

If this same thing occurs when temporality is delta, it's simpler to reason about. Since delta state is reset after each collection, there is no need to answer the question about measurements prior to disabling.

Now consider the same instrument, but the meter is initially disabled then later enabled. Same questions as above about what the start timestamp of each metric export should be.

Given that there is no specification language on this, I propose we do the simplest reasonable thing for the time being. The goal is to avoid cluttering the metrics SDK implementation or making it even more complex, while getting something out there to solicit feedback. If the simplest thing turns out to be insufficient, we can iterate and improve.

Here's a commit where I tinker with your branch a bit, with dirt-simple mechanics: cd130c8

Idea (sketched in code below, after the two lists):

  • Whether or not a meter is enabled, always instantiate the instrument and storage normally
  • When a meter is disabled, don't record measurements to synchronous instruments
  • When a meter is disabled, don't call asynchronous instrument callbacks
  • Whether or not a meter is enabled, always call collect on all the storage for the meter. Update storage implementations to be aware of whether their meter is enabled / disabled, and if disabled, return an empty metric after performing all state management.

This has the following properties:

  • Minimally invasive
  • Lets metric storage manage state as is
  • Avoids the perf hit of recording measurements when a meter is disabled
  • When temporality is delta and a meter is updated to be disabled, state is naturally reset after the next collection, and subsequent collects are cheap
  • When temporality is cumulative and a meter is updated to be disabled, state is retained and may be incremented if the meter is later reenabled. Exported cumulative metrics always carry a start time corresponding to the start time of the metrics SDK.
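
To make the proposed mechanics concrete, here is a minimal, self-contained sketch (hypothetical, heavily simplified names; the actual change in cd130c8 may differ): storage is always created and always collected, recording is skipped while the meter is disabled, and collect still performs its state management but returns an empty result so the metric is omitted from the export.

import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the meter-level enabled flag.
class MeterFlagSketch {
  private boolean enabled = true;

  void setEnabled(boolean enabled) {
    this.enabled = enabled;
  }

  boolean isEnabled() {
    return enabled;
  }
}

// Hypothetical stand-in for a synchronous metric storage.
class SynchronousStorageSketch {
  private final MeterFlagSketch meter;
  private long sum; // stands in for the real aggregated state

  SynchronousStorageSketch(MeterFlagSketch meter) {
    this.meter = meter;
  }

  void recordLong(long value) {
    if (!meter.isEnabled()) {
      return; // meter disabled: drop the measurement, no state change
    }
    sum += value;
  }

  // Called on every collection, whether or not the meter is enabled, so the
  // storage keeps managing its own state (e.g. resetting delta accumulations).
  List<Long> collect(boolean deltaTemporality) {
    long collected = sum;
    if (deltaTemporality) {
      sum = 0; // delta state resets naturally on each collection
    }
    if (!meter.isEnabled()) {
      return new ArrayList<>(); // empty result: omitted from the export
    }
    List<Long> points = new ArrayList<>();
    points.add(collected);
    return points;
  }
}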

@fandreuz
Contributor Author

fandreuz commented Jun 17, 2025

Hi @jack-berg, thanks for getting back on this. I agree that my solution is a bit invasive and some choices I took are arbitrary. Yours seems to be a good compromise. Let me know what I should do with this PR.

I see many tests commented out; are they all failing or not compiling?

@jack-berg
Member

Let me know what I should do with this PR. I see many tests commented out; are they all failing or not compiling?

I only commented those out to quickly see if the idea was feasible. I'm happy to uncomment / fix those tests and push them to this PR branch, or let you merge in my commit and fix up the tests. No preference here!

@fandreuz
Contributor Author

fandreuz commented Jun 20, 2025

I ran the tests and some of them are failing; I'll try to fix them.

@jkwatson
Contributor

I ran the tests and some of them are failing; I'll try to fix them.

What's the status on this?

@fandreuz
Contributor Author

Hi @jkwatson, I'm a bit swamped. Somebody else can take it; otherwise, I'll get back to this when I have some more time.

@jack-berg
Member

I'll pick this up and push commits to this PR if you don't mind, @fandreuz.

@jkwatson jkwatson merged commit 8ea6895 into open-telemetry:main Sep 19, 2025
29 checks passed
@otelbot
Contributor

otelbot bot commented Sep 19, 2025

Thank you for your contribution @fandreuz! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

Linked issue: Add set{Signal}Configurator support to LoggerProvider, MeterProvider