Fix reindex scalability by jestradaMS · Pull Request #5324 · microsoft/fhir-server

jestradaMS · 2026-01-12T18:38:27Z

Description

This pull request introduces several improvements to the reindexing job orchestration and SQL Server search logic to enhance scalability and reliability, especially for large datasets. The key changes include batching surrogate ID range queries to prevent timeouts, streaming job creation for better pipelining, and simplifying SQL logic for surrogate ID queries. Additionally, new configuration options are added to control batching behavior.

Reindexing Scalability and Pipelining Improvements:

The reindex orchestrator now fetches surrogate ID ranges in batches (controlled by the new NumberOfParallelRecordRanges setting in ReindexJobConfiguration, renamed to NumberOfRecordRanges in a later commit of this PR) and streams job creation, so workers can begin processing before all ranges are fetched. This reduces timeouts and speeds up large reindex operations.
Refactored job creation logic into a new CreateAndEnqueueJobDefinitionsAsync method to support immediate, batched job enqueuing and cleaner code organization.
Improved logging for job creation and enqueuing, with more granular information about batches and resource types.
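
The batching and streaming described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in ReindexOrchestratorJob: GetRangesAsync, EnqueueInBatchesAsync, and the in-memory range arithmetic are hypothetical stand-ins, and numberOfRecordRanges only mirrors the role of the configuration setting named in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the store call: returns at most `maxRanges`
// (Start, End) surrogate ID ranges, beginning at `startId`.
Task<IReadOnlyList<(long Start, long End)>> GetRangesAsync(
    long startId, long endId, int rangeSize, int maxRanges)
{
    var ranges = new List<(long Start, long End)>();
    for (long s = startId; s <= endId && ranges.Count < maxRanges; s += rangeSize)
    {
        ranges.Add((s, Math.Min(s + rangeSize - 1, endId)));
    }

    return Task.FromResult<IReadOnlyList<(long Start, long End)>>(ranges);
}

// Fetches ranges one batch at a time and hands each batch to `enqueueAsync`
// immediately, so workers could start before the remaining ranges are fetched.
async Task<int> EnqueueInBatchesAsync(
    long startId, long endId, int rangeSize, int numberOfRecordRanges,
    Func<IReadOnlyList<(long Start, long End)>, Task> enqueueAsync,
    CancellationToken cancellationToken)
{
    int batches = 0;
    long nextStart = startId;
    IReadOnlyList<(long Start, long End)> ranges;
    do
    {
        // Check for cancellation between batches.
        cancellationToken.ThrowIfCancellationRequested();

        ranges = await GetRangesAsync(nextStart, endId, rangeSize, numberOfRecordRanges);
        if (ranges.Count > 0)
        {
            await enqueueAsync(ranges);
            // Resume after the last range of this batch.
            nextStart = ranges[ranges.Count - 1].End + 1;
            batches++;
        }
    }
    while (ranges.Count > 0);

    return batches;
}

// Demo: IDs 1..1000 split into ranges of 100, fetched 3 ranges per batch.
var enqueued = new List<(long Start, long End)>();
int batchCount = await EnqueueInBatchesAsync(
    startId: 1, endId: 1000, rangeSize: 100, numberOfRecordRanges: 3,
    enqueueAsync: batch => { enqueued.AddRange(batch); return Task.CompletedTask; },
    cancellationToken: CancellationToken.None);
Console.WriteLine($"{enqueued.Count} ranges enqueued in {batchCount} batches");
```

In this sketch the loop ends when a fetch returns no ranges, which is the same termination condition the test mocks in this PR had to be updated to honor.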

SQL Server Search Query Simplification and Logging:

Consolidated the logic for querying surrogate IDs with and without searchParamHash into a single method, removing the redundant SearchForReindexSurrogateIdsWithoutSearchParamHashAsync and simplifying SQL command construction.
Added detailed logging to SQL Server search operations to aid in debugging and monitoring batch sizes and query parameters.

Related issues

Addresses AB#180432.

Testing

Describe how this change was tested.

FHIR Team Checklist

Update the title of the PR to be succinct and less than 65 characters
Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
CI is green before merge
Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

- Consolidate SearchForReindexSurrogateIdsBySearchParamHashAsync to handle both with and without searchParamHash cases using batched queries
- Remove unbatched SearchForReindexSurrogateIdsWithoutSearchParamHashAsync which caused full table scans on large Resource tables (154M+ rows)
- Add NumberOfParallelRecordRanges config property (default 100) to ReindexJobConfiguration for controlling batch size
- Refactor ReindexOrchestratorJob.EnqueueQueryProcessingJobsAsync to fetch surrogate ID ranges in batches, following the Export job pattern
- Stream and enqueue job definitions immediately after each batch to allow workers to start processing sooner
- Add cancellation checks between batches for graceful shutdown


src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs

+                        do
+                        {
+                            // Check for cancellation between batches
+                            if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested)


General fix approach: Remove the constant &&/|| component from the condition so that only dynamic, meaningful checks remain. Here, we retain cancellationToken.IsCancellationRequested and remove _jobInfo.CancelRequested from the two conditions that CodeQL has identified as involving a constant false value.

Concrete best fix for this file:

At line 416, simplify:

From if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested)

To if (cancellationToken.IsCancellationRequested)

At line 499, simplify:

From if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested)

To if (cancellationToken.IsCancellationRequested)

This keeps the standard cancellation token behavior intact and matches what the analyzer already believes is the effective logic (since _jobInfo.CancelRequested is always false in this context). No new imports, methods, or definitions are required; we are only simplifying existing conditions within EnqueueQueryProcessingJobsAsync.

…eIdRanges pattern
- Add SetupGetSurrogateIdRangesMock helper method that uses callback to check startId
- Mocks now return empty list when startId > rangeEnd, simulating batched behavior
- Fixes infinite loop/timeout in tests caused by do-while (ranges.Any()) loop
- Converts all 12 static .Returns() mocks to callback-based returns
- All 23 ReindexOrchestratorJobTests now pass
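
Why a static mock return hangs a batched do-while loop, and why a callback that checks startId fixes it, can be shown with a small sketch. It deliberately avoids any mocking library: StaticStub, CallbackStub, and CountCalls are hypothetical names for illustration only, and the maxCalls cap exists just so the demo cannot hang.

```csharp
using System;
using System.Collections.Generic;

const long rangeEnd = 300;

// Static-style stub: returns the same non-empty list on every call, so a
// loop that continues while ranges are returned never ends by itself.
IReadOnlyList<(long Start, long End)> StaticStub(long startId) =>
    new (long Start, long End)[] { (1L, rangeEnd) };

// Callback-style stub: inspects startId and returns an empty list once
// the caller has moved past rangeEnd.
IReadOnlyList<(long Start, long End)> CallbackStub(long startId) =>
    startId > rangeEnd
        ? Array.Empty<(long Start, long End)>()
        : new (long Start, long End)[] { (startId, rangeEnd) };

// Drives a stub the way a batched do-while loop would, with an
// iteration cap (maxCalls) so the demo always terminates.
int CountCalls(Func<long, IReadOnlyList<(long Start, long End)>> getRanges, int maxCalls)
{
    int calls = 0;
    long startId = 1;
    IReadOnlyList<(long Start, long End)> ranges;
    do
    {
        ranges = getRanges(startId);
        calls++;
        if (ranges.Count > 0)
        {
            startId = ranges[ranges.Count - 1].End + 1;
        }
    }
    while (ranges.Count > 0 && calls < maxCalls);

    return calls;
}

Console.WriteLine(CountCalls(CallbackStub, maxCalls: 50)); // 2: one batch, then an empty result
Console.WriteLine(CountCalls(StaticStub, maxCalls: 50));   // 50: stopped only by the cap
```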


jestradaMS · 2026-01-12T23:02:54Z

/azp run

azure-pipelines · 2026-01-12T23:03:12Z

Azure Pipelines successfully started running 1 pipeline(s).

* Fix SQL execution timeout in reindex operations
* Enhance reindex job creation logic to support resuming from existing jobs and prevent duplicates
* fix: Update ReindexOrchestratorJobTests mocks for batched GetSurrogateIdRanges pattern
* fix: Extend timeout for processing jobs in ReindexOrchestratorJobTests to ensure all resource types are accounted for
* fix: Simplify reindex job creation logic by removing existing job checks and enhancing logging
* fix: Rename NumberOfParallelRecordRanges to NumberOfRecordRanges for clarity in reindex configuration per Sergey's feedback
* fix: Remove redundant job cancellation check in EnqueueQueryProcessingJobsAsync method

jestradaMS added this to the CY25Q3/2Wk13 milestone Jan 12, 2026

jestradaMS requested a review from a team as a code owner January 12, 2026 18:38

jestradaMS added the Bug label Jan 12, 2026

Enhance reindex job creation logic to support resuming from existing jobs and prevent duplicates (43e1c70)

jestradaMS added the Azure Healthcare APIs and No-PaaS-breaking-change labels Jan 12, 2026

github-advanced-security bot found potential problems Jan 12, 2026


jestradaMS added 2 commits January 12, 2026 14:24

fix: Extend timeout for processing jobs in ReindexOrchestratorJobTests to ensure all resource types are accounted for (079537e)

SergeyGaluzo reviewed Jan 12, 2026

src/Microsoft.Health.Fhir.Core/Configs/ReindexJobConfiguration.cs (outdated)

SergeyGaluzo reviewed Jan 12, 2026

src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs (outdated)

SergeyGaluzo reviewed Jan 12, 2026

src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs (outdated)

SergeyGaluzo reviewed Jan 12, 2026

src/Microsoft.Health.Fhir.Core/Configs/ReindexJobConfiguration.cs (outdated)

jestradaMS added 3 commits January 12, 2026 15:58

fix: Simplify reindex job creation logic by removing existing job checks and enhancing logging (1637815)

fix: Rename NumberOfParallelRecordRanges to NumberOfRecordRanges for clarity in reindex configuration per Sergey's feedback (8b8f76a)

fix: Remove redundant job cancellation check in EnqueueQueryProcessingJobsAsync method (4fb8d76)

jestradaMS enabled auto-merge (squash) January 12, 2026 22:20

SergeyGaluzo reviewed Jan 12, 2026

src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs

SergeyGaluzo approved these changes Jan 12, 2026

fhibf approved these changes Jan 13, 2026

jestradaMS merged commit 9365e39 into main Jan 13, 2026
60 of 61 checks passed

jestradaMS deleted the users/jestrada/reindex-fixes-20250112 branch January 13, 2026 00:43

jestradaMS merged 7 commits into main from users/jestrada/reindex-fixes-20250112


3 participants

@@ -413,7 +413,7 @@
                     private async Task<IReadOnlyList<long>> EnqueueQueryProcessingJobsAsync(IList<JobInfo> existingProcessingJobs, CancellationToken cancellationToken)
                     {
-                        if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested)
+                        if (cancellationToken.IsCancellationRequested)
                         {
                             throw new OperationCanceledException("Reindex operation cancelled by customer.");
                         }
@@ -496,7 +496,7 @@
                                     do
                                     {
                                         // Check for cancellation between batches
-                                        if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested)
+                                        if (cancellationToken.IsCancellationRequested)
                                         {
                                             throw new OperationCanceledException("Reindex operation cancelled by customer.");
                                         }
