
Conversation

@davidsh
Contributor

@davidsh davidsh commented Jun 8, 2020

This is the first of several PRs to add telemetry-focused events and counters.

A few things to note:

* Added a new dedicated namespace for telemetry. Current feedback from partners is that our
existing events are "confusing and a mess" due to their focus on low-level diagnostics and verbose
debugging. See discussion in dotnet#37428 regarding reasons for a new namespace.

* Please comment on the naming patterns for the events, counters,
display names, etc. I looked at ASP.NET events/counters and @anurse's older guidance in
https://gist.github.com/anurse/af1859663ac91c6cf69c820cebe92303. But there are inconsistencies
across all of them, so I'd like to get agreement on naming patterns for events/counters
(i.e. noun-verb vs. verb-noun, present tense vs. past tense, etc.).

* No automated tests were added at this time. The current discussion with others (the reverse proxy
team) is that automated tests are brittle in CI, but manual tests will be run.

Contributes to dotnet#37428
@davidsh davidsh added this to the 5.0 milestone Jun 8, 2020
@davidsh davidsh self-assigned this Jun 8, 2020
@ghost

ghost commented Jun 8, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@halter73
Member

halter73 commented Jun 9, 2020

No automated tests were added at this time. The current discussion with others (the reverse proxy team) is that automated tests are brittle in CI, but manual tests will be run.

What makes testing this brittle?

I don't think Kestrel's EventSourceTests have been flaky in the past year and a half. The last time I remember flakiness in those tests, we determined it was caused by thread pool starvation, which in turn was caused by sync-over-async when disposing Kestrel's test server. dotnet/aspnetcore#6127

@davidsh
Contributor Author

davidsh commented Jun 10, 2020

What makes testing this brittle?

I asked this question about tests to @davidfowl in a recent YARP team meeting since PR dotnet/aspnetcore#21649 didn't have any tests. He remarked that it was tedious to write automated tests due to async operations etc. The consensus at the team meeting was to consider "cost benefit" to an automated test. At the very least, I suggested we still run manual tests of the telemetry.

* PR feedback
* Added RequestAbort as separate event method with dimensions (host, port)
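An event with per-request dimensions like the one named above is typically written as an EventSource method whose parameters become the payload fields. A minimal sketch (the source name, event id, and signature here are illustrative, not the merged code):

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source; the actual System.Net.Http source differs.
[EventSource(Name = "Demo-Http")]
sealed class DemoHttpEventSource : EventSource
{
    public static readonly DemoHttpEventSource Log = new DemoHttpEventSource();

    // Each parameter becomes a named payload field ("dimension") that
    // listeners receive alongside the event.
    [Event(1, Level = EventLevel.Error)]
    public void RequestAbort(string host, int port) => WriteEvent(1, host, port);
}
```

Unlike counters, plain events like this are dispatched to enabled listeners inline on the writing thread.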
@davidsh davidsh changed the title [WIP] Add telemetry to System.Net.Http Add telemetry to System.Net.Http Jun 10, 2020
@davidfowl
Member

The unit tests for event counters are trickier than the usual event source tests because you're waiting for increments and decrements asynchronously (event listeners don't get triggered inline). It means you have this weird logic for waiting for the counter event to come back with data after changing the value.
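The asynchronous hand-off described above can be seen in a small sketch: the counter value changes immediately, but a listener only observes it at the next flush interval. The source name and counter are hypothetical, not the actual System.Net.Http telemetry:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical counter source for illustration.
[EventSource(Name = "Demo-Counters")]
sealed class DemoCounterSource : EventSource
{
    public static readonly DemoCounterSource Log = new DemoCounterSource();
    private IncrementingEventCounter _requests;

    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command == EventCommand.Enable)
            _requests ??= new IncrementingEventCounter("requests", this);
    }

    public void Request() => _requests?.Increment();
}

sealed class CounterWaiter : EventListener
{
    // Field initializers run before the base EventListener constructor,
    // so this field is safe to use from OnEventSourceCreated.
    private readonly ManualResetEventSlim _gotPayload = new ManualResetEventSlim();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Demo-Counters")
        {
            // EventCounterIntervalSec asks the runtime to flush counter
            // payloads once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        // Counter values arrive later as "EventCounters" payloads, never
        // inline with Increment() -- hence the waiting logic in tests.
        if (e.EventName == "EventCounters")
            _gotPayload.Set();
    }

    public bool WaitForPayload(TimeSpan timeout) => _gotPayload.Wait(timeout);
}
```

A test then calls `DemoCounterSource.Log.Request()` and blocks on `WaitForPayload` until the next flush delivers the counter data, which is exactly the "weird waiting logic" in question.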

@JamesNK
Member

JamesNK commented Jun 11, 2020

An example of unit testing counters from grpc-dotnet: https://github.com/grpc/grpc-dotnet/blob/486446086540381be32d40dd675d47da6f989dff/test/FunctionalTests/Client/EventSourceTests.cs

They were a pain in the butt to get right, but they're not flaky.

@davidsh
Contributor Author

davidsh commented Jun 11, 2020

They were a pain in the butt to get right, but they're not flaky.

Thanks for the pointer to those tests. Some networking event sources (Microsoft-System-Net diagnostic tracing) have some unit test coverage as well.

I agree that, with effort, automated tests can be written. The topic we discussed at an earlier team meeting was "is it worth the dev cost right now". I'll add a tracking issue for adding automated test coverage for events/counters where we have gaps.

Member

@noahfalk noahfalk left a comment


Sorry for the delay @davidsh, it looked pretty good to me and hopefully the comments are useful for you.

* RequestStart now includes host, port, scheme, path and version
* RequestStart and RequestStop now bracket the lifetime of the request, including reading the
response body stream. If the response body is not completely read, then disposing the stream
will trigger RequestStop as well.
@davidsh
Contributor Author

davidsh commented Jun 18, 2020

I pushed a new commit that redid how RequestStart/RequestStop work. I discovered my first attempt at placing RequestStart into SocketsHttpHandler didn't work properly with redirected requests: we ended up getting extra RequestStop events, since more requests are generated internally when AllowAutoRedirect=true.

This new commit seems to work well in initial manual testing with HTTP/1.1 and HTTP/2.0 requests, including redirection. I don't plan to make HTTP/3 work at this time.
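The "stop fires on full read or on early dispose, but only once" behavior described in the commit above can be sketched with a stream wrapper. The type name and callback here are hypothetical, not the actual SocketsHttpHandler plumbing:

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical wrapper showing how a telemetry stop event can fire exactly
// once, whether the response body is fully drained or disposed early.
sealed class TelemetryStream : Stream
{
    private readonly Stream _inner;
    private readonly Action _onStop;
    private int _stopped; // 0 = not fired, 1 = fired

    public TelemetryStream(Stream inner, Action onStop)
    {
        _inner = inner;
        _onStop = onStop;
    }

    private void FireStopOnce()
    {
        // Interlocked guard: the stop callback runs once even if the
        // stream is both drained and later disposed.
        if (Interlocked.Exchange(ref _stopped, 1) == 0)
            _onStop();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read == 0) FireStopOnce(); // end of body reached
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            FireStopOnce(); // disposing early also ends the request
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

The once-only guard is what avoids the duplicate RequestStop problem when both the drain path and the dispose path run.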

@davidsh
Contributor Author

davidsh commented Jun 21, 2020

@dotnet/ncl - Any additional feedback? If not, can someone please sign off?

Contributor

@scalablecory scalablecory left a comment


lgtm after these comments.

@davidsh
Contributor Author

davidsh commented Jun 23, 2020

/azp run runtime-libraries outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidsh
Contributor Author

davidsh commented Jun 23, 2020

/azp run runtime-libraries outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@davidsh
Contributor Author

davidsh commented Jun 24, 2020

CI failures in Outerloop caused by known issue #38205

@davidsh davidsh merged commit e5274e4 into dotnet:master Jun 24, 2020
@davidsh davidsh deleted the telemetry_http branch June 24, 2020 00:34
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020