Add measures that are updated for each message #206
semistrict wants to merge 1 commit into census-instrumentation:master
Conversation
| View | Measure | Aggregation | Tags |
|---|---|---|---|
| grpc.io/client/roundtrip_latency | grpc.io/client/roundtrip_latency | distribution | grpc_client_method |
| grpc.io/client/completed_rpcs | grpc.io/client/roundtrip_latency | count | grpc_client_method, grpc_client_status |
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
You'd probably want the names to be consistent with the existing metrics such as "sent_bytes_per_rpc". So this would be "sent_bytes" instead.
I wanted it to be clear that these are recorded for each message rather than for the whole RPC.
"sent_message_bytes" doesn't make that more clear. To a random reader, the message here may be (mis)taken as describing the source of the bytes (message rather than metadata) instead of the granularity at which they are counted.
With the same idea as "sent_bytes_per_rpc", I would suggest "sent_bytes_per_message"
FYI the names are updated.
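For concreteness, here is a minimal sketch of how the renamed per-message measures might be declared with the OpenCensus Java API. The class name, descriptions, and units are illustrative assumptions, not taken from the spec; only the "*_bytes_per_message" naming comes from the thread above.

```java
import io.opencensus.stats.Measure.MeasureLong;

// Illustrative declarations following the "*_bytes_per_message" naming
// agreed above; "By" is the UCUM code for bytes.
public final class PerMessageMeasures {
  public static final MeasureLong GRPC_CLIENT_SENT_BYTES_PER_MESSAGE =
      MeasureLong.create(
          "grpc.io/client/sent_bytes_per_message",
          "Wire size of each message sent by the client",
          "By");

  public static final MeasureLong GRPC_CLIENT_RECEIVED_BYTES_PER_MESSAGE =
      MeasureLong.create(
          "grpc.io/client/received_bytes_per_message",
          "Wire size of each message received by the client",
          "By");

  private PerMessageMeasures() {}
}
```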
| View | Measure | Aggregation | Tags |
|---|---|---|---|
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
| grpc.io/client/received_message_bytes | grpc.io/client/received_message_bytes | distribution | grpc_client_method |
| grpc.io/client/sent_message_count | grpc.io/client/sent_message_bytes | count | grpc_client_method |
Should be referred to as "count", not "bytes".
This has the same rationale as https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#why-are-completed_rpcs-views-defined-over-latency-measures. We can calculate the message count from how many times "sent_message_bytes" is recorded.
Yes, I will remove the _count views, since the count will be available in the distribution of the _bytes views.
Please keep the _count views. They will be useful when mapping to internal things.
Sorry, please ignore my former comment; just checked, we only need the measures (not the views) internally.
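To make the redundancy argument above concrete: in OpenCensus Java, a Count aggregation increments once per record() call, i.e. once per message, so a view over the bytes measure already yields the message count. A hedged sketch, reusing the PerMessageMeasures sketch above; the view name and tag key are illustrative:

```java
import java.util.Collections;

import io.opencensus.stats.Aggregation;
import io.opencensus.stats.View;
import io.opencensus.tags.TagKey;

public final class PerMessageViews {
  // Counts messages by counting how many times the per-message bytes
  // measure is recorded; no separate "sent_message_count" measure is
  // required at the view layer.
  public static final View SENT_MESSAGES_VIEW =
      View.create(
          View.Name.create("grpc.io/client/sent_messages"), // illustrative name
          "Number of messages sent, derived from the per-message bytes measure",
          PerMessageMeasures.GRPC_CLIENT_SENT_BYTES_PER_MESSAGE,
          Aggregation.Count.create(),
          Collections.singletonList(TagKey.create("grpc_client_method")));

  private PerMessageViews() {}
}
```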
Friendly ping @Ramonza
Force-pushed from 5d6e049 to 18cd308.
Friendly ping @zhangkun83 and @bogdandrutu. Need this PR to unblock census-instrumentation/opencensus-java#1563.
Force-pushed from 18cd308 to 7e590a5.
For streaming RPCs, especially long-lived ones, we want to have stats for each message as it is sent/received, rather than waiting until the end of the RPC to record stats.
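A minimal sketch, assuming the OpenCensus Java stats API, of what per-message recording could look like. The onMessageSent hook is hypothetical and stands in for whatever transport-level instrumentation fires per message; the measure is the one sketched earlier.

```java
import io.opencensus.stats.Stats;
import io.opencensus.stats.StatsRecorder;

public final class PerMessageRecorder {
  private static final StatsRecorder RECORDER = Stats.getStatsRecorder();

  // Hypothetical per-message hook: records immediately, so a long-lived
  // stream contributes stats continuously instead of only when the RPC
  // finally completes.
  static void onMessageSent(long wireSizeBytes) {
    RECORDER
        .newMeasureMap()
        .put(PerMessageMeasures.GRPC_CLIENT_SENT_BYTES_PER_MESSAGE, wireSizeBytes)
        .record();
  }

  private PerMessageRecorder() {}
}
```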
Force-pushed from 7e590a5 to 418d06f.
I just realized there is a caveat for implementing this with GRPC. GRPC can't always associate wire bytes with individual messages. For example, when stream-level compression is used, GRPC can see bytes coming in and messages being parsed, but it can't reliably tell how many bytes are attributable to each message. If these new metrics are intended by OpenCensus itself, and not just by the internal streamz requirement, the GRPC library can try to make a best estimate, where message counts will be accurate but the size distribution won't be 100% accurate. Otherwise, a plain byte counter (not a distribution) and a separate message counter would be more feasible.
@zhangkun83 do you mean that at any given point you have two separate numbers, a count of messages and a count of bytes, that cannot be correlated with each other? Is this the case?
I get a notification for each message, but the size of the message is not always available. Separately, I get a notification for each chunk of outbound or inbound bytes, and a chunk is not limited to a single message (it could be a partial message, or it could span multiple messages). I do know when the RPC starts and finishes. For the existing metrics, I get the message count per RPC from the first kind of notification, and the byte count per RPC from the second. Since there is no timing guarantee between the two types of notifications, I can't associate them based on the order in which they are called.
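For readers without gRPC Java context, the two notification types described here roughly correspond to io.grpc.StreamTracer callbacks. The sketch below is illustrative only (signatures abridged); by convention -1 means a size is unknown.

```java
import io.grpc.StreamTracer;

// The two independent notification streams described above: per-message
// events and per-chunk wire-byte events, with no ordering guarantee
// between them.
class IllustrativeTracer extends StreamTracer {
  @Override
  public void outboundMessageSent(
      int seqNo, long optionalWireSize, long optionalUncompressedSize) {
    // Fires once per message, but optionalWireSize may be -1 when the
    // transport cannot attribute wire bytes to this message (e.g. under
    // stream-level compression).
  }

  @Override
  public void outboundWireSize(long bytes) {
    // Fires per chunk of outbound bytes; a chunk may cover part of one
    // message or span several, so it cannot be matched to a particular
    // outboundMessageSent call.
  }
}
```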
I think this makes more sense. We can mention in the description of these metrics that when compression is used, the bytes histogram distribution may not always be correct. WDYT?
We can't guarantee the distribution is accurate even without stream compression, because there is no timing guarantee between the two types of notifications (message vs. byte blobs) in any case.
I see; in this case the current metrics for the sent/received bytes only represent the distribution of bytes by each record, not by each message. Given that, I would recommend that while keeping the current "per-record" distribution of bytes (IMO they're still useful), we also add *_MESSAGE_COUNT measures to get an accurate count.
It depends on how you define the "per-record" call. If it's recorded per arbitrary chunk of data (not necessarily per message), having the distribution doesn't seem useful. If you still define it as "per-message", GRPC would still call […]
Chatted offline: with stream compression it will be very difficult to get exact real-time per-message bytes. IMO it will be confusing to export a distribution over arbitrary chunks of bytes (it looks like we had a similar issue in Tracing before, and we made the bytes optional). For now I'm leaning towards just having two accumulations, of bytes and of counts, since the cumulative stats are the only "exact" stats we can get. WDYT? @bogdandrutu
Is this true in all languages, or is it just a Java problem?
With stream-based compression it's impossible to accurately associate wire bytes with messages, and that's by design for all languages.
Closing this in favor of #212 |