Failsafe 3.0 instrumentation introduced#14057
Conversation
| | [Elasticsearch API Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html) | 7.16 - 7.17.19,<br>8.0 - 8.9.+ [4] | N/A | [Elasticsearch Client Spans] | | ||
| | [Elasticsearch REST Client](https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics] [6] | | ||
| | [Elasticsearch Transport Client](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html) | 5.0+ | N/A | [Database Client Spans], [Database Client Metrics] [6] | | ||
| | [Failsafe](https://failsafe.dev/) | 3.0.1+ | [opentelemetry-failsafe-3.0](../instrumentation/failsafe-3.0/library) | none | |
There was a problem hiding this comment.
@trask I think this might be the first library only instrumentation we have. Do we need to point this out somehow here? Set the Auto-instrumented versions to N/A? Any suggestions?
There was a problem hiding this comment.
N/A in the auto-instrumented versions column sounds good to me 👍
There was a problem hiding this comment.
Hey, this is fixed. However, fyi, after I'm done with other Failsafe policies'(RetryPolicy, RateLimiter, Bulkhead) library instrumentation, I'm planning to work on the auto insturmentation.
8fd8249 to
8b580c2
Compare
| CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
| LongCounter successCounter = | ||
| meter | ||
| .counterBuilder("failsafe.circuit_breaker.success.count") |
There was a problem hiding this comment.
@trask could you suggest units for these metrics
| public final class FailsafeTelemetry { | ||
| private static final String INSTRUMENTATION_NAME = "io.opentelemetry.failsafe-3.0"; | ||
|
|
||
| private static final AttributeKey<String> CIRCUIT_BREAKER_NAME = AttributeKey.stringKey("name"); |
There was a problem hiding this comment.
@trask is using name here fine or should it be circuit_breaker.name?
There was a problem hiding this comment.
I'd suggest going even further: failsafe.circuit_breaker.name
prefixing it with failsafe.* makes it clear it's an instrumentation specific attribute and we're not attempting to create a general circuit breaker semantic convention (which requires a lot more work)
| CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
| LongCounter failureCounter = | ||
| meter | ||
| .counterBuilder("failsafe.circuit_breaker.failure.count") |
There was a problem hiding this comment.
maybe @trask can weigh in on this too because I'm not sure if it's directly applicable, but the docs around "recording errors on metrics" recommends using a single metric with an attribute that differentiates between success and failure
It’s RECOMMENDED to report one metric that includes successes and failures as opposed to reporting two (or more) metrics depending on the operation status.
| CircuitBreakerConfig<R> userConfig, Meter meter, Attributes attributes) { | ||
| LongCounter openCircuitBreakerCounter = | ||
| meter | ||
| .counterBuilder("failsafe.circuit_breaker.open.count") |
There was a problem hiding this comment.
maybe failsafe.circuit_breaker.state_changes
with attribute failsafe.circuit_breaker.state = open / half_open / closed?
| LongCounter failureCounter = | ||
| meter | ||
| .counterBuilder("failsafe.circuit_breaker.failure.count") | ||
| .setDescription("Count of failed circuit breaker executions.") |
There was a problem hiding this comment.
I'm not quite sure what this is measuring, is it just the number of times the circuit breaker has allowed an execution, or how many times the execution it has allowed has failed?
There was a problem hiding this comment.
This represents the number of executions failed by the configured circuit breaker, for example:
var circuitBreaker = CircuitBreaker
.builder()
.handleResultIf(r -> true) // Each execution is failure
.build();
Failsafe.with(circuitBreaker).get(() -> {
// Execution happens here...
return null;
});Failure count will always be incremented for this circuit breaker until the circuit breaker is opened. Once the circuit breaker is open, CircuitBreakerOpenException is thrown until it's half-open again so it is not counted as failure since the circuit breaker didn't let the execution happen.
After a second thought, I'm not sure if this metric is truly valuable, and the same applies to the number of successes. What do you think? Should we keep them?
041830f to
af6013f
Compare
|
Thank you for your contribution @onurkybsi! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. |
Co-authored-by: Lauri Tulmin <[email protected]>
As requested in #10856, Failsafe instrumentation introduced. Intrumentation of other Failsafe policies such as
RetryPolicy,RateLimiter,Bulkheadwill be added with upcoming PRs.