Skip to content

Capability statement refreshes when profiles are updated#5330

Merged
fhibf merged 13 commits intomainfrom
user/fernfe/capabilityStatementProfileCache
Jan 17, 2026
Merged

Capability statement refreshes when profiles are updated#5330
fhibf merged 13 commits intomainfrom
user/fernfe/capabilityStatementProfileCache

Conversation

@fhibf
Copy link
Contributor

@fhibf fhibf commented Jan 14, 2026

Description

In this PR there are a few updates on the way how SystemConformanceProvider updates its profiles and on how ServerProvideProfileValidation works.

With these changes, Profile changes will not be reflected at the Capability Statement just once every 4hs, these updates will happen in less than 2 minutes. And all replicas of the services will be updated as well, minimizing the risk of having different replicas out of sync.

Changes:

  • SystemConformanceProvider
    • The background logic refreshing the capability statement has a new try/catch block that avoids the looping to crash. If an error happens, it'll log the exception and continue its execution. That will ensure the looping to continue updating capability statements every 4hs.
  • ServerProvideProfileValidation
    • ServerProvideProfileValidation has a new background looping running once every minute checking if there is any recent change on "ValueSet", "StructureDefinition" or "CodeSystem". If there are changes, then it'll notify SystemConformanceProvider that is has to refresh to load the new changes.
image

Related issues

Addresses AB#179963.

Testing

  • Running the changes locally I was able to add new profiles and see the changes reflected on /metadata endpoint in less than 2 minutes.

FHIR Team Checklist

  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge Build Status
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

@fhibf fhibf requested a review from a team as a code owner January 14, 2026 00:19
@fhibf fhibf added Azure API for FHIR Label denotes that the issue or PR is relevant to the Azure API for FHIR Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs Bug Bug bug bug. Enhancement Enhancement on existing functionality. Schema Version backward compatible labels Jan 14, 2026
@fhibf fhibf added this to the CY25Q3/2Wk14 milestone Jan 14, 2026
Writing tests to validate the logic of the background loop.
@fhibf fhibf changed the title Forcing capability statement to refresh when profiles are updated Capability statement refreshes when profiles are updated Jan 15, 2026
@fhibf fhibf added Enhancement Enhancement on existing functionality. No-ADR ADR not needed and removed Enhancement Enhancement on existing functionality. labels Jan 15, 2026
@fhibf
Copy link
Contributor Author

fhibf commented Jan 15, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Renamed sync method for profile sync.
LTA-Thinking
LTA-Thinking previously approved these changes Jan 16, 2026
@fhibf
Copy link
Contributor Author

fhibf commented Jan 16, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@brendankowitz
Copy link
Member

🔍 AI Code Review Summary

PR: "Capability statement refreshes when profiles are updated"
Reviewed by: Claude Opus 4.5, Gemini 3 Pro


🔴 CRITICAL ISSUES

1. Race Condition in Sync Signaling Mechanism

The _isExternalDependentSyncRequired flag has a race condition where profile updates can be permanently missed:

Scenario:

  1. BackgroundLoop detects change (Hash A → B), sets flag = true
  2. SystemConformanceProvider starts syncing version B
  3. While syncing, DB updates to Hash C, BackgroundLoop sets flag = true (no-op, already true)
  4. SystemConformanceProvider finishes with B, calls MarkSyncCompleted() → flag = false
  5. BackgroundLoop sees DB matches cached Hash C, doesn't trigger again

Result: Server stuck on version B, missed version C until a NEW update (D) occurs.

Recommendation: Use a version token/counter instead of boolean flag, or make MarkSyncCompleted() conditional on the hash that was synced.


2. Thread Safety of Shared Fields

_isExternalDependentSyncRequired and _mostRecentProfileHash are accessed from multiple threads without synchronization:

  • Written by background task in BackgroundLoop()
  • Read by IsSyncRequested() from SystemConformanceProvider thread
  • Written by MarkSyncCompleted() from another thread

Recommendation: Use volatile keyword or Interlocked operations for thread-safe access.


3. Dispose Doesn't Await Background Task

Dispose() cancels the CancellationTokenSource but doesn't await _backgroundTask completion. This can cause:

  • Background task still accessing _searchServiceFactory after disposal
  • ObjectDisposedException if the loop tries to use disposed resources

Recommendation: Implement IAsyncDisposable and await _backgroundTask completion, similar to the pattern in SystemConformanceProvider.DisposeAsync().


🟡 MEDIUM ISSUES

Issue Details Recommendation
Error logging level OperationCanceledException logged as LogError during normal shutdown (line 333-337) Change to LogInformation - cancellation during shutdown is expected behavior
Initial hash always triggers sync On first run, sets _isExternalDependentSyncRequired = true even when no profiles exist Skip sync trigger if hash indicates empty database
No config validation User could set BackgroundProfileStatusCheckIntervalInSeconds to 1 second = 6 queries/second Add minimum threshold validation (e.g., 60 seconds)

🟢 NO ISSUES FOUND

  • .github/workflows/validate-prs.yml - Syntax fix ✓
  • ValidateOperationConfiguration.cs - Config additions ✓
  • CosmosResponseProcessor.cs - Defensive try/catch is appropriate ✓
  • ProfileValidator.cs - Logging addition is benign ✓
  • Test coverage appears adequate ✓

📋 SUMMARY OF RECOMMENDATIONS

  1. Fix the race condition by using a sync token/version instead of boolean flag, or pass the synced hash to MarkSyncCompleted()
  2. Add thread safety with volatile or Interlocked operations
  3. Implement IAsyncDisposable and await background task before disposing resources
  4. Change cancellation log level from Error to Information
  5. Add config validation for minimum interval to prevent performance issues
  6. Consider event-driven approach instead of polling (long-term improvement)

⚡ Performance Note

With default settings (5 minute interval):

  • 6 DB queries per check × N replicas × every 5 minutes = acceptable load

However, without config validation, a misconfigured low interval could cause significant database load.


This review was generated by AI assistants analyzing the PR diff and source files.

@fhibf
Copy link
Contributor Author

fhibf commented Jan 16, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@fhibf fhibf merged commit bed5a44 into main Jan 17, 2026
60 of 61 checks passed
@fhibf fhibf deleted the user/fernfe/capabilityStatementProfileCache branch January 17, 2026 05:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Azure API for FHIR Label denotes that the issue or PR is relevant to the Azure API for FHIR Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs Bug Bug bug bug. Enhancement Enhancement on existing functionality. No-ADR ADR not needed Schema Version backward compatible

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants