canonicalize: Only log errors at the WARN level when falling back #174

Merged
hawkw merged 1 commit into master from eliza/less-scary-canonicalize on Jan 19, 2019

hawkw commented Jan 17, 2019

Previously, whenever the canonicalize::Task future encountered an
error, it logged that error at the error level. However, in many cases,
these errors are transient, and we are able to successfully fall back to
a previously canonicalized name.

This branch changes canonicalize::Task to only log at the error level
when there's no previously successful result to fall back to, and log at
the warning level otherwise. In addition, the log message on fallbacks
now indicates that we fell back to a previous canonicalization.
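
As a rough illustration of the behavior described above (a sketch only, not the diff from this branch): the level is chosen based on whether a previous successful result exists. The names `Level`, `Task`, `last_good`, `on_success`, and `on_error` are hypothetical, and the function returns the level and message instead of calling a logger so that the example stays self-contained.

```rust
/// Severity chosen for a failed canonicalization attempt.
#[derive(Debug, PartialEq, Eq)]
enum Level {
    Warn,
    Error,
}

/// Hypothetical stand-in for the state the task keeps between attempts.
struct Task {
    /// The most recent successfully canonicalized name, if any.
    last_good: Option<String>,
}

impl Task {
    /// Records a successful canonicalization so later failures can fall back to it.
    fn on_success(&mut self, name: &str) {
        self.last_good = Some(name.to_string());
    }

    /// Decides how a failed attempt should be reported.
    ///
    /// With a previous result available, the failure is only a warning and the
    /// message says which name we fell back to. Without one, it is an error.
    fn on_error(&self, err: &str) -> (Level, String) {
        match &self.last_good {
            Some(name) => (
                Level::Warn,
                format!("canonicalization failed: {}; falling back to {}", err, name),
            ),
            None => (Level::Error, format!("canonicalization failed: {}", err)),
        }
    }
}
```

With this shape, a task that has never resolved a name reports failures at the error level, and the same failure after one successful resolution is reported as a warning that names the fallback.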

Hopefully, this should make transient errors, such as a slow DNS server,
a little less scary.

Fixes linkerd/linkerd2#2094

Signed-off-by: Eliza Weisman [email protected]

hawkw self-assigned this Jan 17, 2019
hawkw requested a review from olix0r January 17, 2019 21:28

hawkw commented Jan 17, 2019

I also considered changing this code so that it does not log the same error multiple times when it occurs repeatedly, to try to reduce the log spam seen in linkerd/linkerd2#2069. However, this implied adding enough additional state to the canonicalize task that I wasn't sure it was worth it.
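
For a sense of the extra state that would involve, here is a minimal sketch under assumed names (`ErrorDedup`, `should_log`, and `reset` are hypothetical and not part of the proxy). It remembers the last error message and suppresses consecutive repeats of it:

```rust
/// Hypothetical extra state for suppressing repeated identical errors.
#[derive(Default)]
struct ErrorDedup {
    /// The message of the most recently logged error, if any.
    last_error: Option<String>,
    /// How many consecutive repeats of `last_error` have been suppressed.
    repeats: u64,
}

impl ErrorDedup {
    /// Returns `true` if this error differs from the last one and should be logged.
    fn should_log(&mut self, err: &str) -> bool {
        if self.last_error.as_deref() == Some(err) {
            self.repeats += 1;
            false
        } else {
            self.last_error = Some(err.to_string());
            self.repeats = 0;
            true
        }
    }

    /// Clears the state (for example after a success) and returns the number
    /// of repeats that were suppressed since the last logged error.
    fn reset(&mut self) -> u64 {
        self.last_error = None;
        std::mem::take(&mut self.repeats)
    }
}
```

Even this small version adds two fields and a reset path that has to be called on every success, which is the kind of bookkeeping the comment above is weighing.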

hawkw merged commit 49c29cd into master Jan 19, 2019
hawkw deleted the eliza/less-scary-canonicalize branch January 19, 2019 19:04
hawkw added a commit to linkerd/linkerd2 that referenced this pull request Jan 24, 2019
proxy: update pinned version to 5b507a9

This picks up the following proxy commits:

* eaabc48 Update tower-grpc
* e9561de Update h2 to 0.1.16
* 28fd5e7 Add Route timeouts (linkerd/linkerd2-proxy#165)
* 5637372 Re-flag tcp_duration tests as flaky
* 20cbd18 Revise several log levels and messages (linkerd/linkerd2-proxy#177)
* ae16978 Remove flakiness from 'profiles' tests
* 49c29cd canonicalize: Only log errors at the WARN level when falling back (linkerd/linkerd2-proxy#174)
* 486dd13 Make outbound router honor `l5d-dst-override` header (linkerd/linkerd2-proxy#173)
* 7adc50d Make timeouts for canonicalization DNS queries tuneable (linkerd/linkerd2-proxy#175)
* 3188179 Try reducing CI flakiness by reducing RUST_TEST_THREADS to 1

Some of these changes will probably need changelog entries:

* Improve logging when rejecting malformed HTTP/2 pseudo-headers
  (hyperium/h2#347)
* Improve logging for gRPC errors (tower-rs/tower-grpc#111)
* Add Route timeouts (linkerd/linkerd2-proxy#165)
* Downgrade several of the noisiest log messages to TRACE
  (linkerd/linkerd2-proxy#177)
* Add an environment variable for configuring the DNS canonicalization
  timeout (linkerd/linkerd2-proxy#175)
* Make outbound router honor `l5d-dst-override` header
  (linkerd/linkerd2-proxy#173)

Perhaps all the logging related changes can be grouped into one
changelog entry, though...

Signed-off-by: Eliza Weisman <[email protected]>
sprt pushed a commit to sprt/linkerd2-proxy that referenced this pull request Aug 30, 2019
…#174)

We added basic Prometheus instrumentation, but this only encapsulated basic Go metrics and request counts. This adds latency and response size metrics exporting as well, to the public-api server, the web server and the telemetry server.

Since the util function in grpc.go was basically used to wrap the server creation in a prometheus handler, I added the other prometheus constants in there and renamed the file to prometheus.go. 

- Add request duration and response size instrumentation to web and public api
- Also add latency monitoring to telemetry service requests
- Rename util/grpc.go to util/prometheus.go