@Subhra264
Contributor

Problem

In a multi-region super cluster setup, a node that starts up and caches enrichment tables queries all regions for enrichment table data by default. However, if the user knows that the enrichment table is stored in a single primary region, querying all regions is inefficient and can fail if a querier in another region returns an error, which can eventually cause node startup to fail.

Solution

This commit adds a new apply_primary_region_if_specified boolean parameter throughout the enrichment table data fetching flow to control whether to use only the primary region when fetching data.

Key Changes

  1. New Parameter Added (enrichment_table.rs:40, mod.rs:190):

    • Added apply_primary_region_if_specified boolean parameter to:
      • get_enrichment_table_data()
      • get_enrichment_table()
      • get_enrichment_table_json()
  2. Primary Region Logic (enrichment_table.rs:52-68):

    • When apply_primary_region_if_specified is true and enterprise features are enabled:
      • Checks if super cluster is enabled
      • Reads the enrichment_table_get_region configuration
      • If a primary region is specified, uses only that region for the search request
    • Otherwise, uses empty regions array (queries all regions)
  3. Cache Initialization (schema.rs:588-591):

    • When caching enrichment tables during node startup (cache_enrichment_tables()), passes true for the parameter
    • This ensures node startup only queries the primary region for enrichment data
  4. Runtime Watch Operations (enrichment_table.rs:310-328):

    • Watch operations pass false for the parameter
    • This maintains existing behavior for runtime updates
  5. Code Cleanup:

    • Removed unused get() function (lines 105-146 deleted)
    • Simplified logging by removing redundant empty/non-empty checks
    • Added better debug logging showing number of records fetched
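The region selection described in item 2 can be sketched roughly as follows. This is a minimal illustration and not the code from `enrichment_table.rs`: the `SuperClusterConfig` struct, the `enterprise_enabled` argument, the function name `regions_for_enrichment_fetch`, and the convention that an empty string means "no primary region specified" are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the enterprise super cluster settings.
#[derive(Debug, Clone, Default)]
pub struct SuperClusterConfig {
    /// Whether super cluster mode is enabled.
    pub enabled: bool,
    /// Mirrors `enrichment_table_get_region`; empty is assumed to mean "not specified".
    pub enrichment_table_get_region: String,
}

/// Returns the regions to put in the search request.
/// An empty vector is assumed to mean "query all regions".
pub fn regions_for_enrichment_fetch(
    apply_primary_region_if_specified: bool,
    enterprise_enabled: bool,
    cfg: &SuperClusterConfig,
) -> Vec<String> {
    // The primary region only applies when the caller opts in, enterprise
    // features are on, and super cluster mode is enabled.
    if !(apply_primary_region_if_specified && enterprise_enabled && cfg.enabled) {
        return Vec::new();
    }
    let region = cfg.enrichment_table_get_region.trim();
    if region.is_empty() {
        // No primary region configured: fall back to all regions.
        Vec::new()
    } else {
        vec![region.to_string()]
    }
}
```

Under these assumptions, only the combination "flag is true, enterprise enabled, super cluster enabled, region set" yields a single-region request; every other combination keeps the existing all-regions behavior.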

Impact

  • Performance: Node startup is faster as it only queries the designated primary region for enrichment table data
  • Correctness: Ensures enrichment tables are fetched from the authoritative source (primary region) during initialization
  • Flexibility: Runtime operations can still query all regions if needed

Configuration Required

This feature requires enterprise edition with super cluster enabled and the enrichment_table_get_region configuration set to specify which region should be considered the primary source for enrichment table data.

Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Summary

Added conditional primary region support for enrichment table caching during node startup to improve reliability and performance in multi-region super cluster setups.

Key changes:

  • Added apply_primary_region_if_specified boolean parameter to get_enrichment_table_data(), get_enrichment_table(), and get_enrichment_table_json()
  • When enabled (enterprise mode with super cluster), queries only the configured primary region instead of all regions
  • Node startup (cache_enrichment_tables()) passes true to use primary region only
  • Runtime watch operations pass false to maintain existing multi-region query behavior
  • Removed unused get() function (lines 105-146)
  • Improved logging to show record counts consistently

Benefits:

  • Faster node startup by avoiding unnecessary cross-region queries
  • Reduced failure risk when other regions return errors
  • Maintains flexibility for runtime operations to query all regions
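To make the failure-risk point concrete, here is a small simulation of a fan-out search. It is only a sketch: the `search_regions` function, the `RegionResult` alias, and the rule that one failing region fails the whole request are assumptions for illustration, not the behavior of the actual search service.

```rust
use std::collections::HashMap;

/// Hypothetical per-region outcome: Ok(record count) or Err(message).
pub type RegionResult = Result<usize, String>;

/// Simulated search. An empty `regions` slice is assumed to mean
/// "query every region". Any failing region fails the whole request.
pub fn search_regions(
    cluster: &HashMap<String, RegionResult>,
    regions: &[String],
) -> Result<usize, String> {
    // Decide which regions to contact.
    let targets: Vec<&String> = if regions.is_empty() {
        cluster.keys().collect()
    } else {
        regions.iter().collect()
    };

    // Sum the record counts, stopping at the first region that errors.
    let mut total = 0;
    for name in targets {
        match cluster.get(name) {
            Some(Ok(n)) => total += n,
            Some(Err(e)) => return Err(format!("region {} failed: {}", name, e)),
            None => return Err(format!("unknown region {}", name)),
        }
    }
    Ok(total)
}
```

In this model, a cluster with a healthy primary region and an unavailable secondary region returns an error for an all-regions request but succeeds when the request names only the primary region, which is the startup scenario the change targets.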

Confidence Score: 4/5

  • Safe to merge with one minor style improvement
  • The implementation is well-designed with clear separation between startup and runtime behavior. The logic is protected by enterprise feature flags and proper configuration checks. Only minor style issue with underscore-prefixed parameter name.
  • No files require special attention - all changes are straightforward and well-documented

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| src/service/db/enrichment_table.rs | 4/5 | Added primary region logic and removed unused get() function. Parameter naming could be improved. |
| src/service/db/schema.rs | 5/5 | Updated to pass true for primary region during cache initialization with clear comment explaining the intent. |
| src/service/enrichment/mod.rs | 5/5 | Added parameter threading with helpful documentation. Watch operations correctly pass false to maintain existing behavior. |

Sequence Diagram

sequenceDiagram
    participant Node as Node Startup
    participant Schema as cache_enrichment_tables()
    participant Enrichment as get_enrichment_table()
    participant DB as get_enrichment_table_data()
    participant Config as Enterprise Config
    participant Search as Search Service
    
    Note over Node,Search: Node Initialization Flow
    Node->>Schema: Start caching enrichment tables
    Schema->>Enrichment: get_enrichment_table(org_id, name, true)
    Enrichment->>DB: get_enrichment_table_data(org_id, name, true)
    
    alt Enterprise enabled & primary region specified
        DB->>Config: Check super_cluster.enabled
        DB->>Config: Get enrichment_table_get_region
        Config-->>DB: Return primary region
        DB->>Search: Search with regions=[primary_region]
    else No primary region or non-enterprise
        DB->>Search: Search with regions=[]
    end
    
    Search-->>DB: Return enrichment data
    DB-->>Enrichment: Return records
    Enrichment-->>Schema: Return VRL values
    Schema->>Schema: Cache in ENRICHMENT_TABLES
    
    Note over Node,Search: Runtime Watch Flow
    Node->>DB: watch() detects update
    DB->>Enrichment: get_enrichment_table(org_id, name, false)
    Enrichment->>DB: get_enrichment_table_data(org_id, name, false)
    DB->>Search: Search with regions=[] (all regions)
    Search-->>DB: Return enrichment data
    DB->>DB: Update cache
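The two flows in the diagram differ only in the boolean they pass down the call chain. The sketch below shows that threading with stubbed functions. The function names follow the PR, but the signatures are simplified assumptions: the extra `primary_region` argument replaces the real configuration lookup, `on_watch_update` is a hypothetical name for the watch handler, and each function returns the regions that would be queried instead of actual records.

```rust
/// Stub for the data-fetching layer: reports which regions would be queried.
/// An empty vector is assumed to mean "all regions".
pub fn get_enrichment_table_data(
    org_id: &str,
    name: &str,
    apply_primary_region_if_specified: bool,
    primary_region: Option<&str>, // hypothetical: stands in for the config lookup
) -> Vec<String> {
    let _ = (org_id, name); // unused in this stub
    match (apply_primary_region_if_specified, primary_region) {
        (true, Some(region)) => vec![region.to_string()],
        _ => Vec::new(),
    }
}

/// Stub for the enrichment layer: simply threads the flag through.
pub fn get_enrichment_table(
    org_id: &str,
    name: &str,
    apply_primary_region_if_specified: bool,
    primary_region: Option<&str>,
) -> Vec<String> {
    get_enrichment_table_data(org_id, name, apply_primary_region_if_specified, primary_region)
}

/// Node startup path: passes `true`, so the primary region is used if set.
pub fn cache_enrichment_tables(
    org_id: &str,
    name: &str,
    primary_region: Option<&str>,
) -> Vec<String> {
    get_enrichment_table(org_id, name, true, primary_region)
}

/// Runtime watch path (hypothetical name): passes `false`, keeping the
/// existing all-regions behavior.
pub fn on_watch_update(
    org_id: &str,
    name: &str,
    primary_region: Option<&str>,
) -> Vec<String> {
    get_enrichment_table(org_id, name, false, primary_region)
}
```

Keeping the decision in a single boolean argument means the call sites state their intent explicitly, and the watch path stays unchanged even when a primary region is configured.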

3 files reviewed, 1 comment


@Subhra264 Subhra264 merged commit 89e9433 into branch-v0.14.6-rc9 Oct 15, 2025
9 of 14 checks passed
@Subhra264 Subhra264 deleted the enrich_env branch October 15, 2025 12:38
Subhra264 added a commit that referenced this pull request Oct 24, 2025
…les on node start (#8816)

In a multi-region super cluster setup, a node that starts up and caches
enrichment tables queries all regions for enrichment table data by
default. However, if the user knows that the enrichment table is stored
in a single primary region, querying all regions is inefficient and can
fail if a querier in another region returns an error, which can
eventually cause node startup to fail.

This commit adds a new `apply_primary_region_if_specified` boolean
parameter throughout the enrichment table data fetching flow to control
whether to use only the primary region when fetching data.

1. **New Parameter Added** (`enrichment_table.rs:40`, `mod.rs:190`):
   - Added `apply_primary_region_if_specified` boolean parameter to:
     - `get_enrichment_table_data()`
     - `get_enrichment_table()`
     - `get_enrichment_table_json()`

2. **Primary Region Logic** (`enrichment_table.rs:52-68`):
   - When `apply_primary_region_if_specified` is true and enterprise
     features are enabled:
     - Checks if super cluster is enabled
     - Reads the `enrichment_table_get_region` configuration
     - If a primary region is specified, uses only that region for the
       search request
   - Otherwise, uses empty regions array (queries all regions)

3. **Cache Initialization** (`schema.rs:588-591`):
   - When caching enrichment tables during node startup
     (`cache_enrichment_tables()`), passes `true` for the parameter
   - This ensures node startup only queries the primary region for
     enrichment data

4. **Runtime Watch Operations** (`enrichment_table.rs:310-328`):
   - Watch operations pass `false` for the parameter
   - This maintains existing behavior for runtime updates

5. **Code Cleanup**:
   - Removed unused `get()` function (lines 105-146 deleted)
   - Simplified logging by removing redundant empty/non-empty checks
   - Added better debug logging showing number of records fetched

- **Performance**: Node startup is faster as it only queries the
designated primary region for enrichment table data
- **Correctness**: Ensures enrichment tables are fetched from the
authoritative source (primary region) during initialization
- **Flexibility**: Runtime operations can still query all regions if
needed

This feature requires enterprise edition with super cluster enabled and
the `enrichment_table_get_region` configuration set to specify which
region should be considered the primary source for enrichment table
data.
Subhra264 added a commit that referenced this pull request Oct 24, 2025
…les on node start (#8816)

3 participants