Conversation
- Consolidate SearchForReindexSurrogateIdsBySearchParamHashAsync to handle both with and without searchParamHash cases using batched queries - Remove unbatched SearchForReindexSurrogateIdsWithoutSearchParamHashAsync which caused full table scans on large Resource tables (154M+ rows) - Add NumberOfParallelRecordRanges config property (default 100) to ReindexJobConfiguration for controlling batch size - Refactor ReindexOrchestratorJob.EnqueueQueryProcessingJobsAsync to fetch surrogate ID ranges in batches, following the Export job pattern - Stream and enqueue job definitions immediately after each batch to allow workers to start processing sooner - Add cancellation checks between batches for graceful shutdown
…jobs and prevent duplicates
| do | ||
| { | ||
| // Check for cancellation between batches | ||
| if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested) |
Check warning
Code scanning / CodeQL
Constant condition
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
General fix approach: Remove the constant &&/|| component from the condition so that only dynamic, meaningful checks remain. Here, we retain cancellationToken.IsCancellationRequested and remove _jobInfo.CancelRequested from the two conditions that CodeQL has identified as involving a constant false value.
Concrete best fix for this file:
- At line 416, simplify:
- From
if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested) - To
if (cancellationToken.IsCancellationRequested)
- From
- At line 499, simplify:
- From
if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested) - To
if (cancellationToken.IsCancellationRequested)
- From
This keeps the standard cancellation token behavior intact and matches what the analyzer already believes is the effective logic (since _jobInfo.CancelRequested is always false in this context). No new imports, methods, or definitions are required; we are only simplifying existing conditions within EnqueueQueryProcessingJobsAsync.
| @@ -413,7 +413,7 @@ | ||
|
|
||
| private async Task<IReadOnlyList<long>> EnqueueQueryProcessingJobsAsync(IList<JobInfo> existingProcessingJobs, CancellationToken cancellationToken) | ||
| { | ||
| if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested) | ||
| if (cancellationToken.IsCancellationRequested) | ||
| { | ||
| throw new OperationCanceledException("Reindex operation cancelled by customer."); | ||
| } | ||
| @@ -496,7 +496,7 @@ | ||
| do | ||
| { | ||
| // Check for cancellation between batches | ||
| if (cancellationToken.IsCancellationRequested || _jobInfo.CancelRequested) | ||
| if (cancellationToken.IsCancellationRequested) | ||
| { | ||
| throw new OperationCanceledException("Reindex operation cancelled by customer."); | ||
| } |
…eIdRanges pattern - Add SetupGetSurrogateIdRangesMock helper method that uses callback to check startId - Mocks now return empty list when startId > rangeEnd, simulating batched behavior - Fixes infinite loop/timeout in tests caused by do-while (ranges.Any()) loop - Converts all 12 static .Returns() mocks to callback-based returns - All 23 ReindexOrchestratorJobTests now pass
…s to ensure all resource types are accounted for
src/Microsoft.Health.Fhir.Core/Configs/ReindexJobConfiguration.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Core/Configs/ReindexJobConfiguration.cs
Outdated
Show resolved
Hide resolved
…cks and enhancing logging
…clarity in reindex configuration per Sergey's feedback
…gJobsAsync method
src/Microsoft.Health.Fhir.Core/Features/Operations/Reindex/ReindexOrchestratorJob.cs
Show resolved
Hide resolved
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
* Fix SQL execution timeout in reindex operations * Enhance reindex job creation logic to support resuming from existing jobs and prevent duplicates * fix: Update ReindexOrchestratorJobTests mocks for batched GetSurrogateIdRanges pattern * fix: Extend timeout for processing jobs in ReindexOrchestratorJobTests to ensure all resource types are accounted for * fix: Simplify reindex job creation logic by removing existing job checks and enhancing logging * fix: Rename NumberOfParallelRecordRanges to NumberOfRecordRanges for clarity in reindex configuration per Sergey's feedback * fix: Remove redundant job cancellation check in EnqueueQueryProcessingJobsAsync method
Description
This pull request introduces several improvements to the reindexing job orchestration and SQL Server search logic to enhance scalability and reliability, especially for large datasets. The key changes include batching surrogate ID range queries to prevent timeouts, streaming job creation for better pipelining, and simplifying SQL logic for surrogate ID queries. Additionally, new configuration options are added to control batching behavior.
Reindexing Scalability and Pipelining Improvements:
The reindex orchestrator now fetches surrogate ID ranges in batches (controlled by the new
NumberOfParallelRecordRangessetting inReindexJobConfiguration) and streams job creation, allowing workers to begin processing before all ranges are fetched. This reduces timeouts and speeds up large reindex operations. [1] [2] [3]Refactored job creation logic into a new
CreateAndEnqueueJobDefinitionsAsyncmethod to support immediate, batched job enqueuing and cleaner code organization. [1] [2]Improved logging for job creation and enqueuing, providing more granular information about batches and resource types. [1] [2]
SQL Server Search Query Simplification and Logging:
Consolidated the logic for querying surrogate IDs with and without
searchParamHashinto a single method, removing the redundantSearchForReindexSurrogateIdsWithoutSearchParamHashAsyncand simplifying SQL command construction. [1] [2] [3] [4]Added detailed logging to SQL Server search operations to aid in debugging and monitoring batch sizes and query parameters.
Related issues
Addresses AB#180432.
Testing
Describe how this change was tested.
FHIR Team Checklist
Semver Change (docs)
Patch|Skip|Feature|Breaking (reason)