add jvm.file_descriptor.limit metric #16174
Conversation
…nstrumentation into jvm-file-descriptor-limit
```java
metric ->
    metric
        .hasName("jvm.file_descriptor.count")
        .hasInstrumentationScope(EXPECTED_SCOPE)
        .hasDescription("Number of open file descriptors as reported by the JVM.")
        .hasUnit("{file_descriptor}")
        .hasLongSumSatisfying(sum -> sum.hasPointsSatisfying(point -> point.hasValue(0))),
metric ->
    metric
        .hasName("jvm.file_descriptor.limit")
        .hasInstrumentationScope(EXPECTED_SCOPE)
        .hasDescription("Measure of max open file descriptors as reported by the JVM.")
        .hasUnit("{file_descriptor}")
        .hasLongSumSatisfying(sum -> sum.hasPointsSatisfying(point -> point.hasValue(0))));
```
[for reviewer] I've extended the existing tests as-is, but I think what we really want is for no metric/data point to be produced when the source value is negative, instead of a zero. Here, when no limit is set (and the source value is thus negative), a limit of zero is produced, which is incorrect and will have to be interpreted or filtered out by the consumer. Do we have a way to express that with the metrics API at registration time?
I think we have limited options:
- Close the instrument when we see a negative observation
- Check before registering the instrument and don't register it in the first place
- Send `-1`

Do you know if this can happen in practice, or is it just theoretical?
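The second option (probe before registering) can be sketched in plain Java. This is a hypothetical stand-in, not the actual instrumentation code: `limitSupplier` plays the role of the JMX bean's `getMaxFileDescriptorCount()`, and a simple list plays the role of the meter's registered instruments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: skip registering the observable instrument entirely when
// the source reports a negative value at registration time (second option above).
public class RegistrationGuard {
  // Stand-in for the instruments that would be registered with the meter.
  static final List<String> registeredInstruments = new ArrayList<>();

  static void registerFileDescriptorLimit(LongSupplier limitSupplier) {
    // Probe once before registering: a negative value means "unavailable".
    if (limitSupplier.getAsLong() < 0) {
      return; // never registered, so no bogus zero data point can be produced
    }
    registeredInstruments.add("jvm.file_descriptor.limit");
  }

  public static void main(String[] args) {
    registerFileDescriptorLimit(() -> -1L);   // unavailable: not registered
    registerFileDescriptorLimit(() -> 1024L); // available: registered
    System.out.println(registeredInstruments);
  }
}
```

The trade-off is that the value is only probed once: if the source becomes available later, the instrument is never registered.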
For the case at hand, I think this is mostly theoretical: the JVM implementation should always return a non-negative value, and `Integer.MAX_VALUE` in the worst case (which could also be something we'd like to filter out).
I don't know why we haven't seen this while implementing #13589 to filter negative metric values in jmx-metrics; if I understand correctly, we should have had metrics of type updowncounter reporting zeroes instead of filtering out negative values. I will probably need to investigate this further.
Among all the suggested options, closing the instrument seems the "cleanest" way to solve this. Is there any particular downside apart from the extra complexity?
> Among all the suggested options, closing the instrument seems the "cleanest" way to solve this. Is there any particular downside apart from the extra complexity?

It seems good to me also.
cc @jack-berg
I don't understand why a 0 value is being reported here. Given that the registered callback has `if (value >= 0)`, no metric should be exported.
Is it asserting against data from a previous collection?
@trask I think @SylvainJuge already summed it up. If the value is negative, it is not recorded. If a value has never been recorded, the instrument won't be reported. Once a value has been recorded, the instrument is reported, which makes the tests flaky: `registerObservers_NegativeValue` will now see different values depending on whether it runs first or after `registerObservers`.
The underlying resource limit can't be modified at runtime for the duration of the JVM execution, and there is no transition from "with limit" to "no limit": it's either set at startup or not set.
`prlimit` should let one change the limits of a running process: https://man7.org/linux/man-pages/man1/prlimit.1.html
Note that, according to an AI, `getMaxFileDescriptorCount` does not return a negative value when no limit is set, but rather `Long.MAX_VALUE`; `-1` is returned when the limit can't be retrieved.
I think the current implementation is fine. The only thing that needs to change is that the test methods must be made to run in a predictable order with `@TestMethodOrder(MethodOrderer.OrderAnnotation.class)` and `org.junit.jupiter.api.Order` annotations; a comment explaining why this is needed should also be added.
Test order sounds like it would fix it, but I think this also indicates an underlying issue in the test harness's ability to isolate tests from each other.
Here's a minimal (passing) test that demonstrates the behavior I'm talking about: open-telemetry/opentelemetry-java@main...jack-berg:opentelemetry-java:async-demo
Thanks @jack-berg, your example proves that there is in fact no metric sample produced when there is no call to `.record(...)`.
I think the overall problem we have here is that the test checks for the presence of a given metric and we have no convenient way to test the absence of one.
So, in order to make the test reliable and still relevant, we can test both negative and non-negative values in the same method by making the mock return multiple values. I've implemented this in 0311845; removing the condition in the implementation makes the test fail.
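The shape of that test can be sketched in plain Java, under assumptions: an iterator of stub values stands in for the mocked bean returning multiple values across collection cycles, and a list stands in for the recorded measurements. The stub yields a negative value first, then a valid one, and only the non-negative observation must survive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of testing negative and non-negative source values in one
// pass: the stub yields -1 first (limit unavailable), then 1024 (limit set).
public class NegativeThenPositive {
  static final List<Long> recorded = new ArrayList<>();

  // Stand-in for the instrument callback: record only non-negative observations.
  static void observe(long value) {
    if (value >= 0) {
      recorded.add(value);
    }
  }

  public static void main(String[] args) {
    Iterator<Long> stubValues = List.of(-1L, 1024L).iterator();
    while (stubValues.hasNext()) {
      observe(stubValues.next()); // simulate two consecutive collection cycles
    }
    System.out.println(recorded); // only the non-negative value was recorded
  }
}
```

Dropping the `value >= 0` guard would make the negative value land in `recorded`, which is what makes such a test fail when the filtering condition is removed from the implementation.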
Regarding the fact that the implementation might return `Long.MAX_VALUE`, this is also covered by this test, but it works without explicitly ignoring that value, so maybe by happy accident.
Also, this should be quite rare to encounter, because I expect most OSes to use some default value rather than "no limit".
> we have no convenient way to test the absence of one

I think we typically use the `Thread.sleep(100)` → `clearData()` → `Thread.sleep(100)` → assert-empty pattern, e.g.
- Connection pool tests: HikariCP, C3P0, Apache DBCP, Vibur DBCP, Alibaba Druid
- Dropwizard metrics: DropwizardMetricsTest
- Runtime metrics: RuntimeMetricsTest
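That pattern can be sketched in plain Java. This is a hypothetical stand-in for those tests: an in-memory list plays the role of the testing exporter, and `collectionCycle` simulates a periodic export that produces nothing when the instrument was never recorded.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the "sleep -> clearData -> sleep -> assert empty"
// pattern for testing the *absence* of a metric.
public class AssertAbsence {
  // Stand-in for the in-memory testing exporter.
  static final List<String> exported = new CopyOnWriteArrayList<>();

  // Stand-in for one collection cycle: exports nothing when the instrument was
  // never recorded (e.g. because the source value is negative).
  static void collectionCycle(boolean instrumentHasData) {
    if (instrumentHasData) {
      exported.add("jvm.file_descriptor.limit");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    collectionCycle(false);  // a cycle runs while nothing is recorded
    Thread.sleep(100);       // let any in-flight export land
    exported.clear();        // clearData(): drop everything seen so far
    Thread.sleep(100);       // let at least one more full cycle elapse
    collectionCycle(false);
    // If nothing showed up after the clear, the metric is genuinely absent.
    System.out.println(exported.isEmpty());
  }
}
```

The two sleeps matter: the first lets any already-scheduled export finish before the clear, and the second guarantees at least one full collection interval passes before asserting emptiness.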
Thanks @trask, I'll try to reuse that when needed. Would you prefer that I rework the test here? I think it should be OK to be merged as-is.
Semconv 1.40 has been published, thus we can mark this as ready, as markdown links are now working.
Add support for the `jvm.file_descriptor.limit` metric.

Opened as a draft because:
- the large refactor of `runtime-telemetry` is not merged yet (Unify runtime-telemetry modules #16087), thus waiting until it's done to prevent headaches when trying to add this metric to `runtime-telemetry`.

Update: adding the implementation in `runtime-metrics` with e1e2cb4, as #16087 might take a bit of time to be merged and dealing with conflicts should be minimal work; hence making this ready for review for the implementation. The not-yet-released semconv link still makes a test fail.