Skip to content

Log cancelled and failed requests#919

Merged
mgyucht merged 2 commits intomainfrom
log-on-context-timeout
May 14, 2024
Merged

Log cancelled and failed requests#919
mgyucht merged 2 commits intomainfrom
log-on-context-timeout

Conversation

@mgyucht
Copy link
Copy Markdown
Contributor

@mgyucht mgyucht commented May 13, 2024

Changes

Currently, we do not log requests that failed to complete or that failed when the SDK started processing the response. This means that requests that timeout due to a client-side timeout or canceled context are not logged, meaning that there is no visibility in the log of these kinds of requests.

This PR changes the request processing and error handling to ensure that all requests are logged, even those that are cancelled or fail due to other errors. The general approach is to process as much of the response as possible until encountering an error, log what we've retrieved, and then do any post-processing.

Some changes are needed in the roundtrip stringer to handle the case of a nil response.

Tests

Unit tests verify that canceled requests or nil responses are logged.

  • make test passing
  • make fmt applied
  • relevant integration tests applied

@mgyucht mgyucht requested a review from pietern May 13, 2024 11:28
@mgyucht mgyucht changed the title Log cancelled requests Log cancelled and failed requests May 13, 2024
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 13, 2024

Codecov Report

Attention: Patch coverage is 86.66667% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 7.09%. Comparing base (27d08a6) to head (4deafb4).

Files Patch % Lines
apierr/errors.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #919      +/-   ##
========================================
+ Coverage   7.08%   7.09%   +0.01%     
========================================
  Files        276     276              
  Lines      64376   64381       +5     
========================================
+ Hits        4563    4570       +7     
  Misses     59502   59502              
+ Partials     311     309       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

assert.Equal(t, `[DEBUG] non-retriable error: Get "/a": request timed out after 30s of inactivity
[DEBUG] GET /a
< Error: Get "/a": request timed out after 30s of inactivity
`, bufLogger.String())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could eliminate the first line to avoid duplication, perhaps?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep it since there are two distinct things being logged: the raw request/response pair, and then the decision to not retry the error. I've reordered them so that there is a logical flow in the log output.

// client failed to read the response body.
// Otherwise the response body will include details about the error.
if len(r.ResponseBody) == 0 && r.Err != nil {
sb.WriteString(fmt.Sprintf(" (Error: %s)", r.Err))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a question, when length of response body is > 0 (line 86), is it always true that the Err would be nil?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. We call the ErrorMapper function that parses the response into one of the SDK errors depending on the result. If the API responds with any of our standard error codes, r.Err will be set to that.

This is just refactored from above, so there is no behavior change here.

@mgyucht mgyucht added this pull request to the merge queue May 14, 2024
Merged via the queue into main with commit 63343c8 May 14, 2024
@mgyucht mgyucht deleted the log-on-context-timeout branch May 14, 2024 08:15
mgyucht added a commit that referenced this pull request May 14, 2024
* Fixed codecov for repository ([#909](#909)).
* Add traceparent header to enable distributed tracing. ([#914](#914)).
* Log cancelled and failed requests ([#919](#919)).

Dependency updates:

 * Bump golang.org/x/net from 0.22.0 to 0.24.0 ([#884](#884)).
 * Bump golang.org/x/net from 0.17.0 to 0.23.0 in /examples/zerolog ([#896](#896)).
 * Bump golang.org/x/net from 0.21.0 to 0.23.0 in /examples/slog ([#897](#897)).
@mgyucht mgyucht mentioned this pull request May 14, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 14, 2024
* Fixed codecov for repository
([#909](#909)).
* Add traceparent header to enable distributed tracing.
([#914](#914)).
* Log cancelled and failed requests
([#919](#919)).

Dependency updates:

* Bump golang.org/x/net from 0.22.0 to 0.24.0
([#884](#884)).
* Bump golang.org/x/net from 0.17.0 to 0.23.0 in /examples/zerolog
([#896](#896)).
* Bump golang.org/x/net from 0.21.0 to 0.23.0 in /examples/slog
([#897](#897)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants