Collect some of the trace_api.* metrics in the trace flusher#418
Collect some of the trace_api.* metrics in the trace flusher#418
trace_api.* metrics in the trace flusher#418Conversation
e0090f8 to
f442b4d
Compare
f442b4d to
2d2ab5e
Compare
19a7ade to
6fda63a
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #418 +/- ##
==========================================
- Coverage 66.26% 65.92% -0.34%
==========================================
Files 189 191 +2
Lines 23650 23767 +117
==========================================
- Hits 15672 15669 -3
- Misses 7978 8098 +120
|
|
|
||
| #[derive(Debug, Clone)] | ||
| pub struct SendData { | ||
| pub tracer_payloads: Vec<pb::TracerPayload>, |
There was a problem hiding this comment.
Do we really need to expose tracer_payloads publicly? From what I can tell, it's only for the tests in the trace-mini-agent, which is not a good pattern. We shouldn't be exposing internal implementation details just for unit tests in this way.
If we are only exposing tracer_payloads for tests, could we refactor the tests to use httpmock and properly test the outputs? If that is something that we want to explore, I think we could do that in a follow-up PR since we want to get this merged in soon. Maybe we can just add a TODO comment then?
There was a problem hiding this comment.
This is a good point. We shouldn't expose internals unnecessarily in this way, but I think that going all the way to sending to the httpmock isn't necessary. Just adding a test only method on the SendData implementation that hands out payload number X, is fine by me.
There was a problem hiding this comment.
I struggled with this because coalesce_send_data needs to mutate the tracer_payloads fields so I thought about implementing a getter to return a mutable reference but in the end is kind of the same as making the the field public. Also I thought about restructuring the SendData and TraceFlusher to improve the encapsulation but I realized it'd overcomplicate this PR. Anyway if you guys think the getter is a better solution we can go that way or if you can think of a better solution please let me know.
There was a problem hiding this comment.
That can be public to the crate, right?
| } | ||
| } | ||
|
|
||
| async fn update( |
There was a problem hiding this comment.
@hoolioh How do you envision this working with retry logic? Would we call update for each attempt of a request payload? Or should update only be called for a payload once, regardless of how many retries it took?
There was a problem hiding this comment.
Since the retry logic tries to minimize errors I'd choose the latter because it'd be confusing to send errors or dropped statistics for traces that will end up being sent.
There was a problem hiding this comment.
@hoolioh Ah, that's a great point. Maybe retry metrics should be it's own thing at some point? It may be useful information.
There was a problem hiding this comment.
I think that the retries can be deduced from the number of API requests, the number of failures of different types, as well as the number of dropped traces.
| } | ||
|
|
||
| impl SendDataResult { | ||
| fn new() -> SendDataResult { |
There was a problem hiding this comment.
Since you're not taking in any parameters, should this just be a Default impl?
| use serde::{Deserialize, Serialize}; | ||
| use std::collections::HashMap; | ||
|
|
||
| pub use crate::send_data::SendData; |
There was a problem hiding this comment.
nitpick: Should we move trace_utils to use a mod.rs file like we do in the sidecar to handle exporting like this? Feels a bit awkward / indirect like this. (If we want to do this, we can do in a separate PR).
| futures.push(self.send( | ||
| self.trace_api_responses, | ||
| *count as f64, | ||
| vec![Tag::new("status_code", status_code.to_string().as_str()).unwrap()], |
There was a problem hiding this comment.
The most minor of nitpicks, feel free to ignore: &status_code.to_string() also works and is a bit less verbose.
| } | ||
| } | ||
|
|
||
| pub fn collect_metrics(&self) -> TraceFlusherMetrics { |
There was a problem hiding this comment.
I know it's technically not mutable, but I'm not sure the name collect_metrics conveys the actual mutability of this function on TraceFlusherMetrics? Also, I don't think it's actually collecting metrics at this point (the caller is doing the collecting)?
What about calling the function remove(), similar to the remove function in HashMap? Or something like fetch_and_clear?
bantonsson
left a comment
There was a problem hiding this comment.
As mentioned internally, it’s fine to merge this as is and make changes in the follow up PR.
6fda63a to
ef3c62e
Compare
What does this PR do?
This PR splits the
tracer_utilsinto smaller parts and collects metrics for:These metrics are then sent by the "self telemetry" worker of the sidecar.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.