feat: use primary region if specified when caching the enrichment tables on node start #8816
Merged
Greptile Overview
Summary
Added conditional primary region support for enrichment table caching during node startup to improve reliability and performance in multi-region super cluster setups.
Key changes:
- Added `apply_primary_region_if_specified` boolean parameter to `get_enrichment_table_data()`, `get_enrichment_table()`, and `get_enrichment_table_json()`
- When enabled (enterprise mode with super cluster), queries only the configured primary region instead of all regions
- Node startup (`cache_enrichment_tables()`) passes `true` to use primary region only
- Runtime watch operations pass `false` to maintain existing multi-region query behavior
- Removed unused `get()` function (lines 105-146)
- Improved logging to show record counts consistently
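The region selection described in these changes can be sketched roughly as below. This is a minimal illustration, not the code from the PR: `SuperClusterConfig`, `select_regions`, and the explicit `enterprise_enabled` argument are hypothetical names, and treating an empty string as "no primary region configured" is an assumption.

```rust
/// Hypothetical stand-in for the enterprise super cluster settings.
#[derive(Debug, Clone, Default)]
pub struct SuperClusterConfig {
    /// Whether super cluster mode is enabled.
    pub enabled: bool,
    /// Mirrors the `enrichment_table_get_region` setting.
    /// Assumption: an empty string means no primary region is configured.
    pub enrichment_table_get_region: String,
}

/// Returns the regions to put into the search request.
/// An empty list means "query all regions".
pub fn select_regions(
    enterprise_enabled: bool,
    cfg: &SuperClusterConfig,
    apply_primary_region_if_specified: bool,
) -> Vec<String> {
    // Without the flag, enterprise mode, or super cluster, keep the
    // existing behavior and query every region.
    if !apply_primary_region_if_specified || !enterprise_enabled || !cfg.enabled {
        return Vec::new();
    }
    let region = cfg.enrichment_table_get_region.trim();
    if region.is_empty() {
        // No primary region configured: fall back to all regions.
        Vec::new()
    } else {
        vec![region.to_string()]
    }
}
```

With this shape, a caller that passes `false` always gets an empty list, so the runtime path stays unchanged regardless of configuration.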
Benefits:
- Faster node startup by avoiding unnecessary cross-region queries
- Reduced failure risk when other regions return errors
- Maintains flexibility for runtime operations to query all regions
Confidence Score: 4/5
- Safe to merge with one minor style improvement
- The implementation is well-designed with clear separation between startup and runtime behavior. The logic is protected by enterprise feature flags and proper configuration checks. Only minor style issue with underscore-prefixed parameter name.
- No files require special attention - all changes are straightforward and well-documented
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/service/db/enrichment_table.rs | 4/5 | Added primary region logic and removed unused get() function. Parameter naming could be improved. |
| src/service/db/schema.rs | 5/5 | Updated to pass true for primary region during cache initialization with clear comment explaining the intent. |
| src/service/enrichment/mod.rs | 5/5 | Added parameter threading with helpful documentation. Watch operations correctly pass false to maintain existing behavior. |
Sequence Diagram
sequenceDiagram
participant Node as Node Startup
participant Schema as cache_enrichment_tables()
participant Enrichment as get_enrichment_table()
participant DB as get_enrichment_table_data()
participant Config as Enterprise Config
participant Search as Search Service
Note over Node,Search: Node Initialization Flow
Node->>Schema: Start caching enrichment tables
Schema->>Enrichment: get_enrichment_table(org_id, name, true)
Enrichment->>DB: get_enrichment_table_data(org_id, name, true)
alt Enterprise enabled & primary region specified
DB->>Config: Check super_cluster.enabled
DB->>Config: Get enrichment_table_get_region
Config-->>DB: Return primary region
DB->>Search: Search with regions=[primary_region]
else No primary region or non-enterprise
DB->>Search: Search with regions=[]
end
Search-->>DB: Return enrichment data
DB-->>Enrichment: Return records
Enrichment-->>Schema: Return VRL values
Schema->>Schema: Cache in ENRICHMENT_TABLES
Note over Node,Search: Runtime Watch Flow
Node->>DB: watch() detects update
DB->>Enrichment: get_enrichment_table(org_id, name, false)
Enrichment->>DB: get_enrichment_table_data(org_id, name, false)
DB->>Search: Search with regions=[] (all regions)
Search-->>DB: Return enrichment data
DB->>DB: Update cache
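The two flows in the diagram differ only in the boolean they pass down. A simplified sketch of that threading follows, assuming a hypothetical `Search` trait and a `RecordingSearch` test double; the function names echo the ones in the diagram, but their signatures here are invented for illustration and are not the real ones.

```rust
/// Hypothetical search backend that receives the regions of a request.
pub trait Search {
    fn search(&mut self, regions: &[String]) -> Vec<String>;
}

/// Test double that records the regions of each request.
#[derive(Default)]
pub struct RecordingSearch {
    pub calls: Vec<Vec<String>>,
}

impl Search for RecordingSearch {
    fn search(&mut self, regions: &[String]) -> Vec<String> {
        self.calls.push(regions.to_vec());
        vec!["record".to_string()]
    }
}

/// Simplified stand-in for the data fetch: restricts the request to the
/// primary region only when the flag is set and a region is configured.
pub fn get_enrichment_table_data(
    search: &mut dyn Search,
    primary_region: Option<&str>,
    apply_primary_region_if_specified: bool,
) -> Vec<String> {
    let regions: Vec<String> = match primary_region {
        Some(region) if apply_primary_region_if_specified => vec![region.to_string()],
        _ => Vec::new(), // empty list: all regions
    };
    search.search(&regions)
}

/// Startup path: passes `true`, so only the primary region is queried.
pub fn cache_enrichment_tables(search: &mut dyn Search, primary_region: Option<&str>) -> usize {
    get_enrichment_table_data(search, primary_region, true).len()
}

/// Watch path: passes `false`, keeping the multi-region behavior.
pub fn on_watch_update(search: &mut dyn Search, primary_region: Option<&str>) -> usize {
    get_enrichment_table_data(search, primary_region, false).len()
}
```

Calling `cache_enrichment_tables` and then `on_watch_update` against the same `RecordingSearch` records a single-region request followed by an empty (all regions) request.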
3 files reviewed, 1 comment
hengfeiyang
approved these changes
Oct 15, 2025
Loaki07
approved these changes
Oct 15, 2025
…les on node start
Subhra264
added a commit
that referenced
this pull request
Oct 24, 2025
…les on node start (#8816)

In a multi-region super cluster setup, when a node starts up and caches enrichment tables, it generally queries all regions for enrichment table data. However, if the user is sure that the enrichment table is stored in one primary region, querying all regions is inefficient and might even fail if a querier in another region returns an error, which can eventually cause node startup to fail.

This commit adds a new `apply_primary_region_if_specified` boolean parameter throughout the enrichment table data fetching flow to control whether to use only the primary region when fetching data.

1. **New Parameter Added** (`enrichment_table.rs:40`, `mod.rs:190`):
   - Added `apply_primary_region_if_specified` boolean parameter to:
     - `get_enrichment_table_data()`
     - `get_enrichment_table()`
     - `get_enrichment_table_json()`
2. **Primary Region Logic** (`enrichment_table.rs:52-68`):
   - When `apply_primary_region_if_specified` is true and enterprise features are enabled:
     - Checks if super cluster is enabled
     - Reads the `enrichment_table_get_region` configuration
     - If a primary region is specified, uses only that region for the search request
     - Otherwise, uses an empty regions array (queries all regions)
3. **Cache Initialization** (`schema.rs:588-591`):
   - When caching enrichment tables during node startup (`cache_enrichment_tables()`), passes `true` for the parameter
   - This ensures node startup only queries the primary region for enrichment data
4. **Runtime Watch Operations** (`enrichment_table.rs:310-328`):
   - Watch operations pass `false` for the parameter
   - This maintains existing behavior for runtime updates
5. **Code Cleanup**:
   - Removed unused `get()` function (lines 105-146 deleted)
   - Simplified logging by removing redundant empty/non-empty checks
   - Added better debug logging showing number of records fetched

- **Performance**: Node startup is faster as it only queries the designated primary region for enrichment table data
- **Correctness**: Ensures enrichment tables are fetched from the authoritative source (primary region) during initialization
- **Flexibility**: Runtime operations can still query all regions if needed

This feature requires enterprise edition with super cluster enabled and the `enrichment_table_get_region` configuration set to specify which region should be considered the primary source for enrichment table data.
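Problem
In a multi-region super cluster setup, when a node starts up and caches enrichment tables, it generally queries all regions for enrichment table data. However, if the user is sure that the enrichment table is stored in one primary region, querying all regions is inefficient and might even fail if a querier in another region returns an error, which can eventually cause node startup to fail.

Solution
This commit adds a new `apply_primary_region_if_specified` boolean parameter throughout the enrichment table data fetching flow to control whether to use only the primary region when fetching data.

Key Changes
1. New Parameter Added (`enrichment_table.rs:40`, `mod.rs:190`):
   - Added `apply_primary_region_if_specified` boolean parameter to:
     - `get_enrichment_table_data()`
     - `get_enrichment_table()`
     - `get_enrichment_table_json()`
2. Primary Region Logic (`enrichment_table.rs:52-68`):
   - When `apply_primary_region_if_specified` is true and enterprise features are enabled:
     - Checks if super cluster is enabled
     - Reads the `enrichment_table_get_region` configuration
     - If a primary region is specified, uses only that region for the search request
     - Otherwise, uses an empty regions array (queries all regions)
3. Cache Initialization (`schema.rs:588-591`):
   - When caching enrichment tables during node startup (`cache_enrichment_tables()`), passes `true` for the parameter
   - This ensures node startup only queries the primary region for enrichment data
4. Runtime Watch Operations (`enrichment_table.rs:310-328`):
   - Watch operations pass `false` for the parameter
   - This maintains existing behavior for runtime updates
5. Code Cleanup:
   - Removed unused `get()` function (lines 105-146 deleted)
   - Simplified logging by removing redundant empty/non-empty checks
   - Added better debug logging showing number of records fetched

Impact
- Performance: Node startup is faster as it only queries the designated primary region for enrichment table data
- Correctness: Ensures enrichment tables are fetched from the authoritative source (primary region) during initialization
- Flexibility: Runtime operations can still query all regions if needed

Configuration Required
This feature requires enterprise edition with super cluster enabled and the `enrichment_table_get_region` configuration set to specify which region should be considered the primary source for enrichment table data.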
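
To illustrate why this configuration matters for startup reliability, here is a toy model of the failure mode from the Problem section. It assumes an all-or-nothing fan-out in which one failing region fails the whole request; the real search service may behave differently. `RegionResult`, `fetch`, and the region names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-region outcome: records on success, a message on failure.
pub type RegionResult = Result<Vec<String>, String>;

/// Queries the requested regions and fails as soon as any of them fails.
/// An empty `regions` slice stands for "all regions".
pub fn fetch(
    cluster: &HashMap<String, RegionResult>,
    regions: &[String],
) -> Result<Vec<String>, String> {
    let mut targets: Vec<&String> = if regions.is_empty() {
        cluster.keys().collect()
    } else {
        regions.iter().collect()
    };
    // Sort so the iteration order does not depend on the hash map.
    targets.sort();

    let mut records = Vec::new();
    for region in targets {
        match cluster.get(region) {
            Some(Ok(rows)) => records.extend(rows.iter().cloned()),
            Some(Err(e)) => return Err(format!("{region}: {e}")),
            None => return Err(format!("{region}: unknown region")),
        }
    }
    Ok(records)
}
```

In this model, a cluster whose secondary region returns an error makes `fetch(&cluster, &[])` fail, while restricting the request to the healthy primary region succeeds. That is the situation the startup path avoids by passing `true`.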