Skip to content
This repository was archived by the owner on Oct 3, 2023. It is now read-only.

Add measures that are updated for each message#206

Closed
semistrict wants to merge 1 commit intocensus-instrumentation:masterfrom
semistrict:grpc-streaming-stats
Closed

Add measures that are updated for each message#206
semistrict wants to merge 1 commit intocensus-instrumentation:masterfrom
semistrict:grpc-streaming-stats

Conversation

@semistrict
Copy link
Copy Markdown
Contributor

For streaming RPCs, especially long-lived ones, we want to have
stats for each message as it is sent/received, rather than waiting
until the end of the RPC to record stats.

Comment thread stats/gRPC.md Outdated
| grpc.io/client/roundtrip_latency | grpc.io/client/roundtrip_latency | distribution | grpc_client_method |
| grpc.io/client/completed_rpcs | grpc.io/client/roundtrip_latency | count | grpc_client_method, grpc_client_status |
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd probably want the names to be consistent with the existing metrics such as "sent_bytes_per_rpc". So this would be "sent_bytes" instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted it to be clear that these are recorded for each message rather than for the whole RPC.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sent_message_bytes" doesn't make that more clear. To a random reader, the message here may be (mis)taken as describing the source of the bytes (message rather than metadata) instead of the granularity at which they are counted.

With the same idea as "sent_bytes_per_rpc", I would suggest "sent_bytes_per_message"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the names are updated.

Comment thread stats/gRPC.md Outdated
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
| grpc.io/client/sent_message_bytes | grpc.io/client/sent_message_bytes | distribution | grpc_client_method |
| grpc.io/client/received_message_bytes | grpc.io/client/received_message_bytes | distribution | grpc_client_method |
| grpc.io/client/sent_message_count | grpc.io/client/sent_message_bytes | count | grpc_client_method |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be referred as "count", not "bytes".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will remove the _count views, since the count will be in the distribution of the _bytes views

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the _count views. They will be useful when mapping to internal things.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry please ignore the former comments - just checked, we only need the measures (not views) internally.

@songy23
Copy link
Copy Markdown
Contributor

songy23 commented Nov 9, 2018

Friendly ping @Ramonza

@semistrict semistrict force-pushed the grpc-streaming-stats branch from 5d6e049 to 18cd308 Compare November 9, 2018 17:49
@songy23
Copy link
Copy Markdown
Contributor

songy23 commented Nov 12, 2018

Friendly ping @zhangkun83 and @bogdandrutu.

Need this PR to unblock census-instrumentation/opencensus-java#1563.

For streaming RPCs, especially long-lived ones, we want to have
stats for each message as it is sent/received, rather than waiting
until the end of the RPC to record stats.
Copy link
Copy Markdown

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhangkun83
Copy link
Copy Markdown

I just realized there is a caveat for implementing this with GRPC.

GRPC can't always associate wire bytes with individual messages. For example, when stream-level compression is used, while GRPC can see bytes coming in, and messages being parsed, but can't reliably see how many bytes are accounted for each message.

If these new metrics are intended by OpenCensus, not just because of the internal streamz requirement, GRPC library can try to get a best estimation, where message counts will be accurate, but the size distribution won't be 100% accurate. Otherwise, a plain byte counter (not distribution) and a separate message counter will be more feasible.

@bogdandrutu
Copy link
Copy Markdown
Contributor

@zhangkun83 do you mean that at one point you will get two numbers:

  • num_messages -> not always 1
  • size_for_all_these_messages

Is this the case?

@zhangkun83
Copy link
Copy Markdown

zhangkun83 commented Nov 14, 2018

I get notification about each message, while the size of the message is not always available. On the other hand, I get notification about a chunk of outbound or inbound bytes, which are not limited to a single message (could be partial message and could be across multiple messages). I do know when the RPC starts and finishes.

For the existing metrics, I get the message count per RPC from the first notification, and the byte count per RPC from the second notification. Since there is no timing guarantee between the two types of notifications, I can't associate them based on the order they are called.

@songy23
Copy link
Copy Markdown
Contributor

songy23 commented Nov 15, 2018

GRPC library can try to get a best estimation, where message counts will be accurate, but the size distribution won't be 100% accurate.

I think this makes more sense. We can mention in the description of these metrics that when compression is used, the bytes histogram distribution may not be always correct. WDYT?

@zhangkun83
Copy link
Copy Markdown

We can't guarantee the distribution is accurate even without stream compression, because there is no timing guarantee between the two types of notifications (message vs. byte blobs) in any case.

@songy23
Copy link
Copy Markdown
Contributor

songy23 commented Nov 15, 2018

I see, in this case the current metrics for the sent/received bytes only represent the distribution of bytes by each record call, and that's not correlated with the "per-message" distribution.

In this case I would recommend while keeping the current "per-record" distribution of bytes (IMO they're still useful), we also add *_MESSAGE_COUNT measures to get the accurate count.

@zhangkun83
Copy link
Copy Markdown

It depends on how you define the "per-record" call.

If it's recorded per arbitrary chunk of data (not necessarily per message), having the distribution doesn't seem useful.

If you still define it as "per-message", GRPC would still call record() on a per-message basis, while the size of each message may not be accurate (their sum would be), and you wouldn't need the *_MESSAGE_COUNT measures because the number of the distribution data points still aligns with the number of messages.

@songy23
Copy link
Copy Markdown
Contributor

songy23 commented Nov 16, 2018

If it's recorded per arbitrary chunk of data (not necessarily per message), having the distribution doesn't seem useful.

If you still define it as "per-message", GRPC would still call record() on a per-message basis, while the size of each message may not be accurate (their sum would be),

Chatted offline, with the stream compression it will be very difficult to get the exact real-time per-message bytes. IMO it will be confusing to export a distribution of random chunks of bytes (looks like we had similar issue in Tracing before and we made the bytes optional). For now I'm leaning towards just having two accumulations of bytes and counts, since the cumulative stats are the only "exact" stats we can get.

WDYT? @bogdandrutu

@bogdandrutu
Copy link
Copy Markdown
Contributor

Is this true in all the languages or just a Java problem?

@zhangkun83
Copy link
Copy Markdown

With stream-based compression it's impossible to accurately associate wire bytes with messages, and that's by design for all languages.

@semistrict
Copy link
Copy Markdown
Contributor Author

Closing this in favor of #212

@semistrict semistrict closed this Nov 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants