Skip to content

Always retry on HTTP 429 and 503, even if Retry-After header is not available#396

Closed
nfx wants to merge 6 commits intomainfrom
retries/on429
Closed

Always retry on HTTP 429 and 503, even if Retry-After header is not available#396
nfx wants to merge 6 commits intomainfrom
retries/on429

Conversation

@nfx
Copy link
Copy Markdown
Contributor

@nfx nfx commented Oct 11, 2023

Changes

This PR improves retry logic using more type-safe code to determine sleep periods between retries or falling back to the default sleep interval. Relies on concrete exception types introduced by #376

Tests

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

nfx added 6 commits October 3, 2023 14:03
…ists`, `BadRequest`, `PermissionDenied`, `InternalError`, and others

Improve the ergonomics of SDK, where instead of `except DatabricksError as err: if err.error_code != 'NOT_FOUND': raise err else: do_stuff()` we could do `except NotFound: do_stuff()`. Additionally, it'll make it easier to read stack traces, as they will contain specific exception class name.

# First principles
- do not override `builtins.NotImplemented` for `NOT_IMPLEMENTED` error code
- assume that platform error_code/HTTP status code mapping is not perfect and in the state of transition
- we do represent reasonable subset of error codes as specific exceptions

## Open questions

### HTTP Status Codes vs Error Codes

1. Mix between status codes and error codes (preferred)
2. Rely only on HTTP status codes
3. Rely only on `error_code`'s

One example is `BAD_REQUEST` error code that maps onto HTTP 400 and to `except BadRequest as err` catch clause. But `MALFORMED_REQUEST`, `INVALID_STATE`, and `UNPARSEABLE_HTTP_ERROR` do also map to HTTP 400. So the proposal is to remap the MALFORMED_REQUEST to `BadRequest` exception.

Another corner-case is UC:
- 'METASTORE_DOES_NOT_EXIST': NotFound,
- 'DAC_DOES_NOT_EXIST': NotFound,
- 'CATALOG_DOES_NOT_EXIST': NotFound,
- 'SCHEMA_DOES_NOT_EXIST': NotFound,
- 'TABLE_DOES_NOT_EXIST': NotFound,
- 'SHARE_DOES_NOT_EXIST': NotFound,
- 'RECIPIENT_DOES_NOT_EXIST': NotFound,
- 'STORAGE_CREDENTIAL_DOES_NOT_EXIST': NotFound,
- 'EXTERNAL_LOCATION_DOES_NOT_EXIST': NotFound,
- 'PRINCIPAL_DOES_NOT_EXIST': NotFound,
- 'PROVIDER_DOES_NOT_EXIST': NotFound,

### Naming conflict resolution

We have three sources of naming:
- `error_code`
- HTTP Status
- Python builtin exceptions

We still have to define which name takes the precedence.
… available

This PR improves retry logic by using more type-safe code to determine sleep time periods between retries or falling back to the default sleep interval.
@nfx nfx added the chore anything that makes the code better label Oct 11, 2023
@nfx nfx requested a review from pietern October 11, 2023 10:16
github-merge-queue bot pushed a commit that referenced this pull request Oct 13, 2023
## Changes
Change HTTP client to always retry on 429 and 503, regardless of whether
Retry-After header is present. If absent, default to 1 second, as we do
today.

Subsumes #391 and #396 while we're still discussing error architecture
in the SDK.

## Tests
Unit test to ensure that 429 and 503 work with and without Retry-After
header.
Base automatically changed from ergonomics/errors to main November 13, 2023 13:45
@renaudhartert-db
Copy link
Copy Markdown
Contributor

Thanks for this contribution. We're closing this PR as part of regular housekeeping since it has been inactive for over a year. This is not a reflection on the quality of the work -- if the changes are still relevant, feel free to re-open the PR or open a new one and we'll be happy to review it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore anything that makes the code better

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants