
Port partition split recursion fix from hotfix to main #44770

Merged
simorenoh merged 4 commits into main from copilot/port-changes-to-main-branch
Jan 22, 2026

Conversation

Contributor

Copilot AI commented Jan 20, 2026

Port Cosmos DB Partition Split Recursion Fix from PR #44649

This task ports the critical bug fix from PR #44649, which was merged to the hotfix branch. The fix addresses infinite recursion in the handling of partition split (410 Gone) errors.

Plan Checklist:

  • Review and understand all changes from PR #44649 (fix: partition split recursion issue)
  • Update CHANGELOG.md with bug fix entry (do not bump version - keep current version)
  • Apply fixes to sync base_execution_context.py (add logging, _internal_pk_range_fetch flag, state reset)
  • Apply fixes to async base_execution_context.py (same changes as sync)
  • Apply fixes to sync routing_map_provider.py (add logging, _internal_pk_range_fetch flag)
  • Apply fixes to async routing_map_provider.py (same changes as sync)
  • Update _synchronized_request.py to pop internal flag
  • Update aio/_asynchronous_request.py to pop internal flag
  • Update test_routing_map_provider.py to accept **kwargs
  • Add test_partition_split_retry_unit.py (new sync tests)
  • Add test_partition_split_retry_unit_async.py (new async tests)
  • Verify syntax of all modified files (all pass)
  • Run code review (completed; minor suggestions only, keeping the approach from PR #44649)
  • Add warning logs before unreachable return statements (per PR review feedback)
  • Review and finalize changes
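The "pop internal flag" steps in the checklist can be pictured with a minimal sketch. It assumes the request handler receives options as keyword arguments and forwards the rest to a transport that rejects unknown options. `send_http` and `request` are hypothetical stand-ins, not the functions in `_synchronized_request.py`; only the flag name comes from this PR.

```python
# Hypothetical stand-ins for illustration; not the azure-cosmos implementation.
INTERNAL_FLAG = "_internal_pk_range_fetch"  # flag name taken from the PR description


def send_http(url, **transport_kwargs):
    """Pretend HTTP layer that rejects options it does not understand."""
    allowed = {"timeout", "headers"}
    unknown = set(transport_kwargs) - allowed
    if unknown:
        raise TypeError(f"unexpected transport options: {sorted(unknown)}")
    return {"url": url, "options": transport_kwargs}


def request(url, **kwargs):
    """Strip the SDK-internal flag, then forward the remaining options."""
    # pop() removes the key so it never reaches the transport layer.
    is_internal = kwargs.pop(INTERNAL_FLAG, False)
    response = send_http(url, **kwargs)
    response["internal"] = is_internal
    return response
```

Popping with a default of `False` keeps ordinary callers unaffected, since they never pass the flag.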

Summary

Successfully ported all changes from PR #44649 to the main branch and addressed PR review feedback:

Key Changes:

  1. _internal_pk_range_fetch flag: Prevents infinite recursion by marking partition key range queries as internal
  2. State reset logic: Resets _has_started and _continuation to ensure retry logic works correctly
  3. Comprehensive logging: Added debug, warning, and error logging for partition split scenarios (including warnings for unreachable code paths)
  4. Request handlers updated: Both sync and async request handlers now pop internal flags
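As a rough illustration of the first key change, the sketch below shows how an "internal" marker can stop a retry wrapper from recursing. It assumes the recursion arose because refreshing the partition key ranges went through the same 410 retry path as the original query. `GoneError`, `run_with_split_retry`, and `max_retries` are hypothetical names, not the SDK's.

```python
class GoneError(Exception):
    """Stand-in for an HTTP 410 (Gone) response during a partition split."""


def run_with_split_retry(operation, refresh=None, internal=False, max_retries=3):
    """Run `operation`, refreshing routing data and retrying on 410.

    Calls marked `internal` (for example the partition key range fetch made
    by `refresh` itself) skip the retry logic, so a 410 raised during a
    refresh cannot trigger another refresh.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except GoneError:
            if internal or attempt >= max_retries:
                raise  # no retry for internal calls or exhausted budgets
            attempt += 1
            refresh()
```

In this sketch the refresh callback would itself call `run_with_split_retry(fetch_ranges, internal=True)`, which is the point where the cycle is broken.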

Files Modified (10):

  • CHANGELOG.md (bug fix entry added)
  • 2 execution context files (sync + async) - now with additional warning logs
  • 2 routing provider files (sync + async)
  • 2 request handler files (sync + async)
  • 1 existing test file (kwargs support)
  • 2 new comprehensive test files (sync + async unit tests)

All files are syntactically correct and ready for testing with Cosmos DB emulator.

Original prompt

This section details on the original issue you should resolve

<issue_title>[Cosmos] port changes over to main branch</issue_title>
<issue_description>Some changes were made to a feature branch that were merged in and released separately from our main branch. The changes are present here: #44649

The task at hand is to port over all of the changes in that pull request so that the feature branch and main branches both have those changes.</issue_description>

<agent_instructions>Follow the instructions in the issue. If you can't see the changes in the referenced pull request, make sure to let me know.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Port changes from feature branch to main branch" to "Port partition split recursion fix from hotfix to main" on Jan 20, 2026
Copilot AI requested a review from simorenoh January 20, 2026 21:11
@simorenoh simorenoh marked this pull request as ready for review January 20, 2026 23:12
@simorenoh simorenoh requested a review from a team as a code owner January 20, 2026 23:12
Copilot AI review requested due to automatic review settings January 20, 2026 23:12
Contributor

Copilot AI left a comment


Pull request overview

This PR ports a critical bug fix for infinite recursion during partition split (410 Gone) errors from the hotfix branch to the main branch. The fix prevents a timeout issue caused by recursive 410 retry logic when the SDK attempts to refresh partition key ranges during a partition split.

Changes:

  • Added _internal_pk_range_fetch flag to mark internal partition key range queries and skip 410 retry logic, breaking the recursion cycle
  • Implemented state reset (_has_started=False, _continuation=None) before 410 retries to ensure fetch loops execute correctly
  • Added structured logging for 410 handling with debug/warning/error levels
  • Comprehensive unit tests (645 lines) covering state reset, retry logic, recursion prevention, and memory bounds
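To see why the state reset matters, consider a minimal sketch of a paged fetch loop. It assumes the loop condition depends on `_has_started` and `_continuation`, as the description above implies. `PagedContext`, `fetch_page`, and `GoneError` are hypothetical names, not the SDK's classes.

```python
class GoneError(Exception):
    """Stand-in for an HTTP 410 (Gone) response during a partition split."""


class PagedContext:
    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable: token -> (items, next_token)
        self._has_started = False
        self._continuation = None

    def _fetch_once(self):
        items = []
        # Runs while a continuation token exists or nothing was fetched yet.
        while self._continuation or not self._has_started:
            self._has_started = True
            page, self._continuation = self._fetch_page(self._continuation)
            items.extend(page)
        return items

    def fetch_with_retry(self, max_retries=3):
        for attempt in range(max_retries + 1):
            try:
                return self._fetch_once()
            except GoneError:
                if attempt == max_retries:
                    raise
                # Without this reset the loop above would be skipped on the
                # retry, and the caller would silently get an empty result.
                self._has_started = False
                self._continuation = None
```

If the first page request raises a 410 after `_has_started` has been set, a retry without the reset never re-enters the loop. That is the failure mode the reset guards against in this sketch.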

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| CHANGELOG.md | Added bug fix entry under 4.15.0b3 |
| azure/cosmos/_execution_context/base_execution_context.py | Added recursion prevention, state reset, and logging for 410 retry logic (sync) |
| azure/cosmos/_execution_context/aio/base_execution_context.py | Added recursion prevention, state reset, and logging for 410 retry logic (async) |
| azure/cosmos/_routing/routing_map_provider.py | Propagated _internal_pk_range_fetch flag to PK range queries and added logging (sync) |
| azure/cosmos/_routing/aio/routing_map_provider.py | Propagated _internal_pk_range_fetch flag to PK range queries and added logging (async) |
| azure/cosmos/_synchronized_request.py | Removed internal flag before HTTP layer (sync) |
| azure/cosmos/aio/_asynchronous_request.py | Removed internal flag before HTTP layer (async) |
| tests/test_partition_split_retry_unit.py | Comprehensive unit tests for state reset, retry behavior, and memory bounds (sync) |
| tests/test_partition_split_retry_unit_async.py | Comprehensive unit tests for state reset, retry behavior, and memory bounds (async) |
| tests/routing/test_routing_map_provider.py | Updated test signature to accept **kwargs |
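The last entry, accepting `**kwargs` in a test signature, is needed whenever production code starts passing an extra keyword argument to a mocked method. A minimal sketch follows, assuming the routing provider now forwards the internal flag to the client. `FakeClient`, `read_pk_ranges`, and `load_ranges` are hypothetical names, not those used in `test_routing_map_provider.py`.

```python
class FakeClient:
    """Hypothetical test double for the client used by a routing provider."""

    def __init__(self, ranges):
        self._ranges = ranges
        self.seen_kwargs = []

    def read_pk_ranges(self, collection_link, **kwargs):
        # Accepting **kwargs keeps the fake compatible when callers add
        # new keyword arguments such as the internal flag.
        self.seen_kwargs.append(dict(kwargs))
        return list(self._ranges)


def load_ranges(client, collection_link):
    """Hypothetical caller that marks the fetch as internal."""
    return client.read_pk_ranges(collection_link, _internal_pk_range_fetch=True)
```

A fake without `**kwargs` would raise `TypeError` on the new keyword, which is the kind of breakage this signature change avoids.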

@simorenoh
Member

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@tvaron3 tvaron3 left a comment


LGTM

Contributor

@allenkim0129 allenkim0129 left a comment


LGTM.

Has some minor comments

Contributor

@allenkim0129 allenkim0129 left a comment


LGTM

Contributor Author

Copilot AI commented Jan 21, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • scanning-api.github.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@simorenoh simorenoh enabled auto-merge (squash) January 21, 2026 23:00
Member

@bambriz bambriz left a comment


Thanks @simorenoh Looks good to me.

@simorenoh
Member

/check-enforcer override

Member

@bambriz bambriz left a comment


Reapproving

@simorenoh simorenoh merged commit 9a8a19a into main Jan 22, 2026
22 checks passed
@simorenoh simorenoh deleted the copilot/port-changes-to-main-branch branch January 22, 2026 03:19

Development

Successfully merging this pull request may close these issues.

[Cosmos] port changes over to main branch

6 participants