
🐛 Source Convex: full_refresh stops after one page #33431

Merged
Serhii Lazebnyi (lazebnyi) merged 5 commits into airbytehq:master from ldanilek:lee/convex-snapshot-sync
Dec 18, 2023

Conversation

@ldanilek
Contributor

What

Describe what the change is solving

The Convex source connector has a bug where SyncMode == full_refresh causes it to stop the sync after a single page of 128 results. This PR fixes the bug and adds tests.

It helps to add screenshots if it affects the frontend.

Before, note that only 128 documents are synced:
Screenshot 2023-12-12 at 9 12 43 PM

After, note that 1773 records are synced, which is the total number of records in the source:
Screenshot 2023-12-13 at 11 15 40 AM

How

Describe the solution

The Convex source connector does two sets of pagination, in sequence:

  1. the "snapshot" pagination syncs a snapshot of the data
  2. the "delta" pagination syncs subsequent changes after the snapshot

Currently we stop the sync when the "delta" pagination is done. That is a problem for full-refresh sync, which does not run the "delta" pagination at all: the "delta" pagination is marked as done before the "snapshot" pagination has completed, so the sync stops after the first snapshot page.

The solution is to return a non-None next_page_token when either the "snapshot" or the "delta" pagination has more data.
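A minimal sketch of that condition, for illustration only. The function and parameter names (`buggy_next_page_token`, `fixed_next_page_token`, `snapshot_has_more`, `delta_has_more`, `state`) are hypothetical and are not taken from the connector's source; only the either/or rule comes from the description above.

```python
from typing import Any, Dict, Optional


def buggy_next_page_token(
    snapshot_has_more: bool,
    delta_has_more: bool,
    state: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    """Old behaviour as described: only the delta phase decides whether to continue."""
    return state if delta_has_more else None


def fixed_next_page_token(
    snapshot_has_more: bool,
    delta_has_more: bool,
    state: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    """New behaviour as described: continue while EITHER phase has more data."""
    has_more = snapshot_has_more or delta_has_more
    return state if has_more else None


# Hypothetical full_refresh situation after the first snapshot page:
# the snapshot still has pages left, but the delta phase reports no more data.
state = {"snapshot_cursor": "page-1"}
print(buggy_next_page_token(True, False, state))  # None, so the sync stops early
print(fixed_next_page_token(True, False, state))  # the state dict, so the sync continues
```

Returning None is what ends pagination, so the old rule cut the snapshot short whenever the delta phase was already marked as done.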

Also add unit tests that would have caught this case.
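As a rough idea of the kind of test that catches this case, the sketch below simulates a paginated snapshot and counts how many documents a full-refresh loop drains. `FakePaginator`, `full_refresh`, and the `fixed` flag are hypothetical stand-ins, not the connector's classes or the tests added in this PR; the page size of 128 and the total of 1773 come from the description above.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 128  # page size mentioned in the PR description


class FakePaginator:
    """Hypothetical stand-in for the stream's pagination state."""

    def __init__(self, total_docs: int, fixed: bool) -> None:
        self.total_docs = total_docs
        self.fixed = fixed
        self.offset = 0
        self.snapshot_has_more = True
        # Assumption: in full_refresh mode the delta phase never runs,
        # so it never reports more data.
        self.delta_has_more = False

    def read_page(self) -> List[int]:
        """Return the next snapshot page and update the snapshot flag."""
        end = min(self.offset + PAGE_SIZE, self.total_docs)
        page = list(range(self.offset, end))
        self.offset = end
        self.snapshot_has_more = self.offset < self.total_docs
        return page

    def next_page_token(self) -> Optional[Dict[str, int]]:
        if self.fixed:
            has_more = self.snapshot_has_more or self.delta_has_more
        else:
            has_more = self.delta_has_more  # old rule: delta phase only
        return {"offset": self.offset} if has_more else None


def full_refresh(paginator: FakePaginator) -> int:
    """Read pages until next_page_token() returns None; return the document count."""
    synced = 0
    while True:
        synced += len(paginator.read_page())
        if paginator.next_page_token() is None:
            return synced


assert full_refresh(FakePaginator(1773, fixed=False)) == 128   # stops after one page
assert full_refresh(FakePaginator(1773, fixed=True)) == 1773   # drains the snapshot
```

The key property is that the test uses more documents than fit in one page; a fixture of 128 documents or fewer would pass under both rules.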

Recommended reading order

  1. source_convex/source.py
  2. source-convex/unit_tests/test_streams.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user?

The end result is that syncs that would previously have terminated early will now fully sync all data.
This only applies to users who use the full_refresh sync mode.

Therefore it is strongly recommended that users run a full sync once this change is released, to make sure all of their data is synced.
Up to reviewer whether this counts as a breaking change.

For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added
Screenshot 2023-12-13 at 12 41 56 PM

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.


@github-actions
Contributor

github-actions bot commented Dec 13, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

Contributor


Nice test. Good find.
Oof. With this bugfix, initial snapshot syncs of more than one page will work.

@octavia-squidington-iii Octavia Squidington III (octavia-squidington-iii) added the area/documentation Improvements or additions to documentation label Dec 13, 2023
@ldanilek Lee Danilek (ldanilek) changed the title [Convex source] fix bug where full_refresh stops after one page 🐛 [Convex source] fix bug where full_refresh stops after one page Dec 14, 2023
@ldanilek Lee Danilek (ldanilek) changed the title 🐛 [Convex source] fix bug where full_refresh stops after one page 🐛 Source Convex: full_refresh stops after one page Dec 14, 2023
@lazebnyi
Contributor

Serhii Lazebnyi (lazebnyi) commented Dec 18, 2023

I am currently engaged in testing of this PR.

Contributor


lgtm

@lazebnyi Serhii Lazebnyi (lazebnyi) merged commit 63e96fb into airbytehq:master Dec 18, 2023
Tim Roes (timroes) pushed a commit that referenced this pull request Dec 19, 2023
Jatin Yadav (jatinyadav-cc) pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Jatin Yadav (jatinyadav-cc) pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024

Labels

  • area/connectors (Connector related issues)
  • area/documentation (Improvements or additions to documentation)
  • community
  • connectors/source/convex


4 participants