Skip to content
This repository was archived by the owner on Jul 31, 2023. It is now read-only.

Update gRPC measures and views from spec#689

Merged
semistrict merged 5 commits intocensus-instrumentation:masterfrom
semistrict:update-grpc-stats
Apr 20, 2018
Merged

Update gRPC measures and views from spec#689
semistrict merged 5 commits intocensus-instrumentation:masterfrom
semistrict:update-grpc-stats

Conversation

@semistrict
Copy link
Copy Markdown
Contributor

@semistrict semistrict commented Apr 8, 2018

Fixes: #687

@semistrict semistrict force-pushed the update-grpc-stats branch 4 times, most recently from c531a42 to d07bc02 Compare April 9, 2018 08:40
@semistrict semistrict requested review from odeke-em and rakyll and removed request for rakyll April 9, 2018 08:51
Comment thread plugin/ocgrpc/client_metrics.go Outdated
ClientRequestCount = stats.Int64("grpc.io/client/request_count", "Number of client RPC request messages", stats.UnitNone)
ClientResponseCount = stats.Int64("grpc.io/client/response_count", "Number of client RPC response messages", stats.UnitNone)
ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds)
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess I hadn't seen the definition of "stats.UnitNone" which is a dimensionless unit, but it was confusing initially, "UnitNone" to me would be "" instead of "1" whereas perhaps we could use "stats.UnitDimensionless"

Comment thread plugin/ocgrpc/client_metrics.go Outdated
ClientRoundTripLatency = stats.Float64("grpc.io/client/roundtrip_latency", "RPC roundtrip latency in msecs", stats.UnitMilliseconds)
ClientSentMessagesPerRPC = stats.Int64("grpc.io/client/sent_messages_per_rpc", "Number of messages sent in the RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
ClientSentBytesPerRPC = stats.Int64("grpc.io/client/sent_bytes_per_rpc", "Total bytes sent across all request messages per RPC.", stats.UnitBytes)
ClientReceivedMessagesPerRPC = stats.Int64("grpc.io/client/received_messages_per_rpc", "Number of response messages received per RPC (always 1 for non-streaming RPCs).", stats.UnitNone)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about using "1" as the spec requires.

if got, want := m.Description(), defn.desc; got != want {
t.Errorf("%q: Description = %q; want %q", defn.name, got, want)
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spec enforcement test!

t.Fatalf("len(gotMeasures) = %d; want %d", got, want)
}

for i, v := range views {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, me likey!

case codes.OK:
return "OK"
case codes.Canceled:
return "CANCELLED"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baby Jesus, perhaps let's follow what grpc-go is doing? Thanks for posting that @rakyll.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go codes look like just the names of the vars, highly unlikely to be consistent across languages. The examples in the OpenCensus spec are consistent with the gRPC doc (all upper case, CANCELLED with two "l", UPPER_SNAKE_CASE) which is the most official source I could find. Also explicitly added to the spec to avoid future confusion: census-instrumentation/opencensus-specs#97

Comment thread stats/record.go Outdated
var debug bool

func init() {
debug = os.Getenv("OC_STATS_DEBUG") != ""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document this perhaps?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it should be a documented feature of the library so I removed it instead.

Comment thread zpages/rpcz.go
}
for _, tag := range row.Tags {
if tag.Key == ocgrpc.KeyClientStatus && tag.Value != "OK" {
s.ErrorsTotal += int(count)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we removed the measure and view for viewing the total number of gRPC errors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so this is why I had to compute it from the CompletedRPCs measures

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a naive question: how are we then going to view and track the number of errors?

@odeke-em
Copy link
Copy Markdown
Member

odeke-em commented Apr 9, 2018

Thank you for this PR @Ramonza, it's great work, I like the rigor with the spec adherence. It is quite a large PR though, so I'll need to do a couple of takes on reviewing it.

view.RegisterExporter(&exporter.PrintExporter{})

// Register the view to collect client request count.
if err := view.Register(ocgrpc.ClientErrorCountView); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a naive question, how are now going to be able to track the number of errors in either the client or server?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now break down the requests by status, so you can get the total errors by adding up all the non-success statuses like we do in rpcz.

Ramon Nogueira added 3 commits April 10, 2018 10:19
TestCumulativenessFromHistograms failed on AppVeyor. Looks likely
to be timing related. Without rewriting the test to avoid timing
issues, I am just increasing the wait times in the hopes that this
will resolve the issue.
@semistrict
Copy link
Copy Markdown
Contributor Author

@odeke-em @rakyll PTAL


func statusCodeToString(s *status.Status) string {
// see https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
switch c := s.Code(); c {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually switch statements are very slow. Probably better to define a static map<Code, String>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what the intuition behind that is? Did a benchmark and turns out that the switch is actually significantly faster, at least on my laptop:

name                                  time/op
StatusCodeToString_OK-8               2.66ns ± 0%
StatusCodeToString_Unauthenticated-8  2.88ns ± 0%
MapAlternativeImpl_OK-8               7.56ns ± 1%

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then keep the switch statement. the map search seems to me very slow, maybe is a Go thing with maps. The way how the switch statements are now implemented is using a 2-level jump table (https://www.codeproject.com/Articles/100473/Something-You-May-Not-Know-About-the-Switch-Statem) so it should be one extra jump, but probably the way how the map is implemented in Go is also 2-level jump (does not store the values directly in the map, instead it uses pointers).

@@ -0,0 +1,52 @@
package ocgrpc
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header.

Comment thread plugin/ocgrpc/benchmark_test.go Outdated
func BenchmarkMapAlternativeImpl_OK(b *testing.B) {
st := status.New(codes.OK, "OK")
for i := 0; i < b.N; i++ {
s := codeToStringMap[st.Code()]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeToStringMap[st.Code()]

@semistrict semistrict merged commit 85464ad into census-instrumentation:master Apr 20, 2018
gopherbot pushed a commit to googleapis/google-cloud-go that referenced this pull request Apr 24, 2018
OpenCensus gRPC views are being updated to reflect the latest spec.
Remove a reference to an old view that will not longer exist after
the update, and instead subscribe to the default set of client-side
views.

OpenCensus PR: census-instrumentation/opencensus-go#689

Change-Id: I3308ec22b71c8548e3c7fd39cfad597cab65d241
Reviewed-on: https://code-review.googlesource.com/26370
Reviewed-by: Jonathan Amsterdam <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants