🚨🚨 🐛 Convex source fix skipped records#27226
🚨🚨 🐛 Convex source fix skipped records#27226Marcos Marx (marcosmarxm) merged 17 commits intoairbytehq:masterfrom
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
|
/test connector=connectors/source-convex
Build FailedTest summary info: |
…in race conditions
|
Marcos Marx (@marcosmarxm) |
* update cursor in next_page_token, add request_headers * . * fix unit tests * also fix SyncMode.incremental so it doesn't return duplicate records in race conditions * fix client version in json_schemas * fix version number * fix sync_mode * cache json_schemas to pass typecheck * never mind * fix sync_mode * update metadata and doc * backward compatiblity bypass --------- Co-authored-by: Marcos Marx <[email protected]> Co-authored-by: marcosmarxm <[email protected]>
What
There is a bug in the Convex source connector due pagination interacting with the default
HttpAvailabilityStrategy.Convex is a fairly simple source that has one pagination cursor per stream. In the existing code we store the cursor on the stream, and update the cursor in
parse_response. This is the same cursor we use in thestateproperty for checkpointing, and it's the same cursor used to constructrequest_params.The bug comes from the usage of
HttpAvailabilityStrategywhich callsread_recordson the stream to see if there are any records to read. We have conflicting constraints:read_recordsmust bump the cursor sostream.statecan be used as a checkpoint. Butread_recordsmust not bump the cursor because then the availability check skips records.See community slack thread: https://airbytehq.slack.com/archives/C027KKE4BCZ/p1686329017267509
How
This PR solves the issue by bumping the cursor in
next_page_tokeninstead ofparse_response, becausenext_page_tokenis only called after the first result is yielded fromread_records. Therefore the availability check -- which discards the iterator after getting the first result -- does not callnext_page_tokenand the records are no longer dropped.Additionally, we add a new request header with the client version, so the server can detect whether the Airbyte connector is running with this fix. If the server detects a version prior to this PR, it will throw a 400 error and block syncing entirely.
Recommended reading order
source_convex/source.py🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user?
I believe this change is a breaking change because the current connector skips records and is disabled server-side, so this applies:
For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.
If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Actions
Expand the relevant checklist and delete the others.
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.