Add measures that are updated for each message #206
semistrict wants to merge 1 commit into census-instrumentation:master
Conversation
| View | Measure | Aggregation | Tags |
|---|---|---|---|
| grpc.io/client/roundtrip_latency | grpc.io/client/roundtrip_latency | distribution | grpc_client_method |
| grpc.io/client/completed_rpcs | grpc.io/client/roundtrip_latency | count | grpc_client_method, grpc_client_status |
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
You'd probably want the names to be consistent with the existing metrics such as "sent_bytes_per_rpc". So this would be "sent_bytes" instead.
I wanted it to be clear that these are recorded for each message rather than for the whole RPC.
"sent_message_bytes" doesn't make that more clear. To a random reader, the message here may be (mis)taken as describing the source of the bytes (message rather than metadata) instead of the granularity at which they are counted.
With the same idea as "sent_bytes_per_rpc", I would suggest "sent_bytes_per_message"
FYI the names are updated.
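For concreteness, here is a minimal sketch of how the renamed per-message measures might be declared with the OpenCensus Java API. The class name, descriptions, and units are illustrative assumptions, not taken from the spec; only the "*_bytes_per_message" naming comes from the thread above.

```java
import io.opencensus.stats.Measure.MeasureLong;

// Illustrative declarations following the "*_bytes_per_message" naming
// agreed above; "By" is the UCUM code for bytes.
public final class PerMessageMeasures {
  public static final MeasureLong GRPC_CLIENT_SENT_BYTES_PER_MESSAGE =
      MeasureLong.create(
          "grpc.io/client/sent_bytes_per_message",
          "Wire size of each message sent by the client",
          "By");

  public static final MeasureLong GRPC_CLIENT_RECEIVED_BYTES_PER_MESSAGE =
      MeasureLong.create(
          "grpc.io/client/received_bytes_per_message",
          "Wire size of each message received by the client",
          "By");

  private PerMessageMeasures() {}
}
```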
| View | Measure | Aggregation | Tags |
|---|---|---|---|
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
| grpc.io/client/received_message_bytes | grpc.io/client/received_message_bytes | distribution | grpc_client_method |
| grpc.io/client/sent_message_count | grpc.io/client/sent_message_bytes | count | grpc_client_method |
Should be referred to as "count", not "bytes".
This has the same rationale as https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#why-are-completed_rpcs-views-defined-over-latency-measures. We can calculate the message count from how many times "sent_message_bytes" is recorded.
Yes, I will remove the _count views, since the count will be available in the distribution of the _bytes views.
Please keep the _count views. They will be useful when mapping to internal things.
Sorry, please ignore my former comment; just checked, we only need the measures (not the views) internally.
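To make the redundancy argument above concrete: in OpenCensus Java, a Count aggregation increments once per record() call, i.e. once per message, so a view over the bytes measure already yields the message count. A hedged sketch, reusing the PerMessageMeasures sketch above; the view name and tag key are illustrative:

```java
import java.util.Collections;

import io.opencensus.stats.Aggregation;
import io.opencensus.stats.View;
import io.opencensus.tags.TagKey;

public final class PerMessageViews {
  // Counts messages by counting how many times the per-message bytes
  // measure is recorded; no separate "sent_message_count" measure is
  // required at the view layer.
  public static final View SENT_MESSAGES_VIEW =
      View.create(
          View.Name.create("grpc.io/client/sent_messages"), // illustrative name
          "Number of messages sent, derived from the per-message bytes measure",
          PerMessageMeasures.GRPC_CLIENT_SENT_BYTES_PER_MESSAGE,
          Aggregation.Count.create(),
          Collections.singletonList(TagKey.create("grpc_client_method")));

  private PerMessageViews() {}
}
```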
Friendly ping @Ramonza
Force-pushed from 5d6e049 to 18cd308.
Friendly ping @zhangkun83 and @bogdandrutu. Need this PR to unblock census-instrumentation/opencensus-java#1563.
Force-pushed from 18cd308 to 7e590a5.
For streaming RPCs, especially long-lived ones, we want to have stats for each message as it is sent/received, rather than waiting until the end of the RPC to record stats.
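A minimal sketch, assuming the OpenCensus Java stats API, of what per-message recording could look like. The onMessageSent hook is hypothetical and stands in for whatever transport-level instrumentation fires per message; the measure is the one sketched earlier.

```java
import io.opencensus.stats.Stats;
import io.opencensus.stats.StatsRecorder;

public final class PerMessageRecorder {
  private static final StatsRecorder RECORDER = Stats.getStatsRecorder();

  // Hypothetical per-message hook: records immediately, so a long-lived
  // stream contributes stats continuously instead of only when the RPC
  // finally completes.
  static void onMessageSent(long wireSizeBytes) {
    RECORDER
        .newMeasureMap()
        .put(PerMessageMeasures.GRPC_CLIENT_SENT_BYTES_PER_MESSAGE, wireSizeBytes)
        .record();
  }

  private PerMessageRecorder() {}
}
```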
Force-pushed from 7e590a5 to 418d06f.
I just realized there is a caveat for implementing this with GRPC. GRPC can't always associate wire bytes with individual messages. For example, when stream-level compression is used, GRPC can see bytes coming in and messages being parsed, but it can't reliably tell how many bytes are attributable to each message. If these new metrics are intended by OpenCensus itself, and not just by the internal streamz requirement, the GRPC library can try to make a best estimate, where message counts will be accurate but the size distribution won't be 100% accurate. Otherwise, a plain byte counter (not a distribution) and a separate message counter would be more feasible.
@zhangkun83 do you mean that at any given point you have two separate numbers, a count of messages and a count of bytes, that cannot be correlated with each other? Is this the case?
I get a notification for each message, but the size of the message is not always available. Separately, I get a notification for each chunk of outbound or inbound bytes, and a chunk is not limited to a single message (it could be a partial message, or it could span multiple messages). I do know when the RPC starts and finishes. For the existing metrics, I get the message count per RPC from the first kind of notification, and the byte count per RPC from the second. Since there is no timing guarantee between the two types of notifications, I can't associate them based on the order in which they are called.
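For readers without gRPC Java context, the two notification types described here roughly correspond to io.grpc.StreamTracer callbacks. The sketch below is illustrative only (signatures abridged); by convention -1 means a size is unknown.

```java
import io.grpc.StreamTracer;

// The two independent notification streams described above: per-message
// events and per-chunk wire-byte events, with no ordering guarantee
// between them.
class IllustrativeTracer extends StreamTracer {
  @Override
  public void outboundMessageSent(
      int seqNo, long optionalWireSize, long optionalUncompressedSize) {
    // Fires once per message, but optionalWireSize may be -1 when the
    // transport cannot attribute wire bytes to this message (e.g. under
    // stream-level compression).
  }

  @Override
  public void outboundWireSize(long bytes) {
    // Fires per chunk of outbound bytes; a chunk may cover part of one
    // message or span several, so it cannot be matched to a particular
    // outboundMessageSent call.
  }
}
```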
I think this makes more sense. We can mention in the description of these metrics that when compression is used, the bytes histogram distribution may not always be correct. WDYT?
We can't guarantee the distribution is accurate even without stream compression, because there is no timing guarantee between the two types of notifications (message vs. byte blobs) in any case.
I see; in this case the current metrics for the sent/received bytes only represent the distribution of bytes by each record, not by each message. Given that, I would recommend that while keeping the current "per-record" distribution of bytes (IMO they're still useful), we also add *_MESSAGE_COUNT measures to get an accurate count.
It depends on how you define the "per-record" call. If it's recorded per arbitrary chunk of data (not necessarily per message), having the distribution doesn't seem useful. If you still define it as "per-message", GRPC would still call […]
Chatted offline: with stream compression it will be very difficult to get exact real-time per-message bytes. IMO it will be confusing to export a distribution over arbitrary chunks of bytes (it looks like we had a similar issue in Tracing before, and we made the bytes optional). For now I'm leaning towards just having two accumulations, of bytes and of counts, since the cumulative stats are the only "exact" stats we can get. WDYT? @bogdandrutu
Is this true in all languages, or is it just a Java problem?
With stream-based compression it's impossible to accurately associate wire bytes with messages, and that's by design for all languages.
Closing this in favor of #212 |