-
Notifications
You must be signed in to change notification settings - Fork 713
feat: use primary region if specified when caching the enrichment tables on node start #8878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR optimizes enrichment table caching during node startup in multi-region super cluster setups by adding a boolean parameter to control whether to query only the primary region.
Key Changes
- Added
apply_primary_region_if_specifiedboolean parameter throughout the enrichment table data fetching flow - When true and a primary region is configured, the system queries only that region instead of all regions
- Node startup (
cache_enrichment_tables) usestrueto optimize initialization - Runtime watch operations use
falseto maintain existing behavior
Impact
- Performance: Faster node startup by avoiding unnecessary cross-region queries
- Reliability: Prevents startup failures caused by errors from non-primary regions
- Flexibility: Runtime operations can still query all regions when needed
Issues Found
- Minor style issue: parameter named with underscore prefix but is actually used (line 52 in enrichment_table.rs)
Confidence Score: 4/5
- This PR is safe to merge with minimal risk after addressing the minor style issue
- The implementation is straightforward and well-documented. The logic correctly threads the new parameter through the call chain and conditionally applies primary region filtering based on enterprise configuration. Only one minor style issue was found regarding underscore-prefixed parameter naming. The change addresses a real performance/reliability concern in multi-region deployments.
- src/service/db/enrichment_table.rs needs a minor style fix to remove underscore prefix from the parameter name
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/service/db/enrichment_table.rs | 4/5 | Added primary region parameter to control data fetching scope. Logic is sound but parameter naming has minor style issue. |
| src/service/db/schema.rs | 5/5 | Updated cache initialization to use primary region for enrichment tables with clear explanatory comment. |
| src/service/enrichment/mod.rs | 5/5 | Threaded new parameter through enrichment table retrieval functions with good documentation. |
Sequence Diagram
sequenceDiagram
participant Node as Node Startup
participant Schema as schema.rs::cache_enrichment_tables()
participant Enrich as enrichment/mod.rs
participant DB as enrichment_table.rs
participant Search as search_cluster::search_inner()
participant Config as Enterprise Config
Note over Node,Config: Node Startup Flow (apply_primary_region_if_specified=true)
Node->>Schema: cache_enrichment_tables()
Schema->>Enrich: get_enrichment_table(org_id, name, true)
Enrich->>Enrich: get_enrichment_table_inner(org_id, name, true)
Enrich->>DB: get_enrichment_table_data(org_id, name, true)
DB->>Config: Check super_cluster.enabled & enrichment_table_get_region
Config-->>DB: Returns primary region if specified
alt Primary region specified
DB->>Search: search_inner(..., regions=[primary_region], ...)
Search-->>DB: Query only primary region
else No primary region
DB->>Search: search_inner(..., regions=[], ...)
Search-->>DB: Query all regions
end
DB-->>Enrich: Return enrichment data
Enrich-->>Schema: Return enrichment table
Schema->>Schema: Cache enrichment table
Note over Node,Config: Runtime Watch Flow (apply_primary_region_if_specified=false)
DB->>DB: watch() detects change
DB->>Enrich: get_enrichment_table(org_id, name, false)
Enrich->>DB: get_enrichment_table_data(org_id, name, false)
DB->>Search: search_inner(..., regions=[], ...)
Search-->>DB: Query all regions
DB-->>DB: Update cache
3 files reviewed, 1 comment
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 365 | 345 | 0 | 19 | 1 | 95% | 4m 39s |
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 365 | 342 | 0 | 19 | 4 | 94% | 7m 5s |
…les on node start (#8816) In a multi-region super cluster setup, when a node starts up and caches enrichment tables, it generally queries all regions for enrichment table data. However, if user is sure that enrichment table is stored in one primary region, then querying all regions is inefficient and might even fail if other region queriers return error for some reason. Which can eventually cause node start failure. This commit adds a new `apply_primary_region_if_specified` boolean parameter throughout the enrichment table data fetching flow to control whether to use only the primary region when fetching data. 1. **New Parameter Added** (`enrichment_table.rs:40`, `mod.rs:190`): - Added `apply_primary_region_if_specified` boolean parameter to: - `get_enrichment_table_data()` - `get_enrichment_table()` - `get_enrichment_table_json()` 2. **Primary Region Logic** (`enrichment_table.rs:52-68`): - When `apply_primary_region_if_specified` is true and enterprise features are enabled: - Checks if super cluster is enabled - Reads the `enrichment_table_get_region` configuration - If a primary region is specified, uses only that region for the search request - Otherwise, uses empty regions array (queries all regions) 3. **Cache Initialization** (`schema.rs:588-591`): - When caching enrichment tables during node startup (`cache_enrichment_tables()`), passes `true` for the parameter - This ensures node startup only queries the primary region for enrichment data 4. **Runtime Watch Operations** (`enrichment_table.rs:310-328`): - Watch operations pass `false` for the parameter - This maintains existing behavior for runtime updates 5. **Code Cleanup**: - Removed unused `get()` function (lines 105-146 deleted) - Simplified logging by removing redundant empty/non-empty checks - Added better debug logging showing number of records fetched - **Performance**: Node startup is faster as it only queries the designated primary region for enrichment table data - **Correctness**: Ensures enrichment tables are fetched from the authoritative source (primary region) during initialization - **Flexibility**: Runtime operations can still query all regions if needed This feature requires enterprise edition with super cluster enabled and the `enrichment_table_get_region` configuration set to specify which region should be considered the primary source for enrichment table data.
1c86a23 to
01219cd
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 365 | 339 | 0 | 19 | 7 | 93% | 4m 39s |
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 365 | 343 | 0 | 19 | 3 | 94% | 4m 40s |
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 365 | 345 | 0 | 19 | 1 | 95% | 4m 38s |
User description
In a multi-region super cluster setup, when a node starts up and caches
enrichment tables, it generally queries all regions for enrichment table
data. However, if user is sure that enrichment table is stored in one
primary region, then querying all regions is inefficient and might even
fail if other region queriers return error for some reason. Which can
eventually cause node start failure.
Solution
This commit adds a new
apply_primary_region_if_specifiedbooleanparameter throughout the enrichment table data fetching flow to control
whether to use only the primary region when fetching data.
Key Changes
New Parameter Added (
enrichment_table.rs:40,mod.rs:190):apply_primary_region_if_specifiedboolean parameter to:get_enrichment_table_data()get_enrichment_table()get_enrichment_table_json()Primary Region Logic (
enrichment_table.rs:52-68):apply_primary_region_if_specifiedis true and enterprisefeatures are enabled:
enrichment_table_get_regionconfigurationrequest
schema.rs:588-591):(
cache_enrichment_tables()), passestruefor the parameterenrichment data
Runtime Watch Operations (
enrichment_table.rs:310-328):falsefor the parameterCode Cleanup:
get()function (lines 105-146 deleted)Impact
designated primary region for enrichment table data
authoritative source (primary region) during initialization
needed
Configuration Required
This feature requires enterprise edition with super cluster enabled and
the
enrichment_table_get_regionconfiguration set to specify whichregion should be considered the primary source for enrichment table
data.
PR Type
Enhancement
Description
Add primary-region-aware enrichment fetch
Thread control flag across fetch APIs
Use primary region on node startup
Preserve all-regions for watch updates
Diagram Walkthrough
File Walkthrough
enrichment_table.rs
Region-aware enrichment data retrieval and watch updatessrc/service/db/enrichment_table.rs
_apply_primary_region_if_specifiedtoget_enrichment_table_dataregionsbased on enterprise config and flagsearch_innerwith computed regions and enable consistency flagfalseschema.rs
Startup caching uses primary region when specifiedsrc/service/db/schema.rs
get_enrichment_table(..., true)mod.rs
Plumb primary-region flag through enrichment APIssrc/service/enrichment/mod.rs
apply_primary_region_if_specifiedthrough public APIs