Skip to content

Conversation

@ByteBaker
Copy link
Contributor

@ByteBaker ByteBaker commented Sep 1, 2025

PR Type

Tests


Description

  • Add extensive unit tests across modules

  • Cover cache, files, reports logic

  • Include concurrency and feature flags tests

  • Validate conversion and grouping utilities


Diagram Walkthrough

flowchart LR
  Reports["Reports module tests"] -- "create & validate report" --> Reports
  Distinct["Distinct values tests"] -- "origin, emit events" --> Distinct
  Enrich["Enrichment table tests"] -- "JSON->VRL, getters" --> Enrich
  FileBroadcast["File list broadcast tests"] -- "send, statics, helpers" --> FileBroadcast
  FileLocal["File list local tests"] -- "pending/removing, concurrency" --> FileLocal
  FileMod["File list mod tests"] -- "sets/maps, progress" --> FileMod
  SearchStorage["Search storage tests"] -- "cache entry, chunking, grouping" --> SearchStorage
Loading

File Walkthrough

Relevant files
Tests
7 files
reports.rs
Add unit tests for report creation defaults                           
+47/-0   
distinct_values.rs
Add tests for records and enterprise emit APIs                     
+65/-0   
enrichment_table.rs
Expand tests for JSON-to-VRL and getters                                 
+118/-1 
broadcast.rs
Add tests for send flow and statics access                             
+87/-1   
local.rs
Add tests for pending/removing files and concurrency         
+66/-0   
mod.rs
Add tests for duplicate/deleted sets and progress               
+166/-0 
storage.rs
Add tests for cache entry, chunking, and grouping               
+200/-0 

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Fragile Tests

Several async tests assert specific return values from external-service dependent functions (e.g., expecting empty results or BASE_TIME) which may be environment-dependent and flaky. Consider mocking or gating with feature flags to avoid brittle assumptions.

#[tokio::test]
async fn test_get_enrichment_table_data() {
    // This will fail in test environment due to missing dependencies,
    // but tests the function structure
    let result = get_enrichment_table_data("test_org", "test_table").await;
    assert!(result.is_ok());
    let data = result.unwrap();
    assert_eq!(data.len(), 0); // Should return empty vec due to search service failure
}

#[tokio::test]
async fn test_get() {
    let result = get("test_org", "test_table").await;
    assert!(result.is_ok());
    let data = result.unwrap();
    assert_eq!(data.len(), 0); // Should return empty vec due to search service failure
}

#[tokio::test]
async fn test_get_table_size() {
    let size = get_table_size("test_org", "test_table").await;
    assert_eq!(size, 0.0); // Should return 0 when no data is found
}

#[tokio::test]
async fn test_get_start_time() {
    let start_time = get_start_time("test_org", "test_table").await;
    assert!(start_time > 0); // Should return BASE_TIME if no stats found
}

#[tokio::test]
async fn test_get_meta_table_stats() {
    let result = get_meta_table_stats("test_org", "test_table").await;
    assert!(result.is_none()); // Should return None when no stats exist
}
Trivial Assertions

Enterprise tests assert result is ok or err, which is tautological and provides no validation. Replace with mocks or feature-gated branches that verify behavior when super cluster is enabled/disabled.

    #[tokio::test]
    async fn test_emit_put_event() {
        let record = create_test_record();

        // Test function exists and handles the record
        let result = emit_put_event(&record).await;
        // Would succeed if super cluster is disabled
        assert!(result.is_ok() || result.is_err());
    }

    #[tokio::test]
    async fn test_emit_delete_event() {
        let record = create_test_record();

        let result = emit_delete_event(&record).await;
        assert!(result.is_ok() || result.is_err());
    }

    #[tokio::test]
    async fn test_emit_batch_delete_event() {
        let result = emit_batch_delete_event(&OriginType::Stream, "test_id").await;
        assert!(result.is_ok() || result.is_err());
    }
}
Global State Contention

Tests mutate global statics like DEPULICATE_FILES/DELETED_FILES without isolation or reset, which can cause order-dependent failures under concurrent test runs. Add setup/teardown or use scoped state/mocks.

#[test]
fn test_duplicate_files_operations() {
    let test_file = "duplicate_test.parquet";

    // Test insertion
    DEPULICATE_FILES.insert(test_file.to_string());
    assert!(DEPULICATE_FILES.contains(test_file));

    // Test removal
    DEPULICATE_FILES.remove(test_file);
    assert!(!DEPULICATE_FILES.contains(test_file));
}

#[test]
fn test_deleted_files_operations() {
    let test_file = "deleted_test.parquet";
    let meta = create_test_file_meta();

    // Test insertion
    DELETED_FILES.insert(test_file.to_string(), meta.clone());
    assert!(DELETED_FILES.contains_key(test_file));

    // Test retrieval
    if let Some(stored_meta) = DELETED_FILES.get(test_file) {
        assert_eq!(stored_meta.records, meta.records);
        assert_eq!(stored_meta.min_ts, meta.min_ts);
    }

    // Test removal
    DELETED_FILES.remove(test_file);
    assert!(!DELETED_FILES.contains_key(test_file));
}

#[test]
fn test_blocked_orgs() {
    // Test that BLOCKED_ORGS is initialized from config
    let blocked_count = BLOCKED_ORGS.len();
    assert!(blocked_count >= 0); // Can be 0 if no orgs are blocked in config

    // Test that it contains strings
    for org in BLOCKED_ORGS.iter() {
        assert!(!org.is_empty() || org.is_empty()); // Just verify it's a string
    }
}

#[tokio::test]
async fn test_concurrent_duplicate_files_access() {
    let base_file = "concurrent_dup_test";

    // Test concurrent access to duplicate files
    let tasks: Vec<_> = (0..10)
        .map(|i| {
            let file = format!("{base_file}_{i}.parquet");
            tokio::spawn(async move {
                DEPULICATE_FILES.insert(file.clone());
                tokio::time::sleep(tokio::time::Duration::from_millis(1)).await;
                DEPULICATE_FILES.remove(&file);
            })
        })
        .collect();

    // Wait for all tasks to complete
    for task in tasks {
        task.await.unwrap();
    }

    // Verify all files are removed
    for i in 0..10 {
        let file = format!("{base_file}_{i}.parquet");
        assert!(!DEPULICATE_FILES.contains(&file));
    }
}

#[tokio::test]
async fn test_concurrent_deleted_files_access() {
    let base_file = "concurrent_del_test";
    let meta = create_test_file_meta();

    // Test concurrent access to deleted files
    let tasks: Vec<_> = (0..10)
        .map(|i| {
            let file = format!("{base_file}_{i}.parquet");
            let meta_clone = meta.clone();
            tokio::spawn(async move {
                DELETED_FILES.insert(file.clone(), meta_clone);
                tokio::time::sleep(tokio::time::Duration::from_millis(1)).await;
                DELETED_FILES.remove(&file);
            })
        })
        .collect();

    // Wait for all tasks to complete
    for task in tasks {
        task.await.unwrap();
    }

    // Verify all files are removed
    for i in 0..10 {
        let file = format!("{base_file}_{i}.parquet");
        assert!(!DELETED_FILES.contains_key(&file));
    }
}

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR adds comprehensive unit tests across multiple service modules as part of a test coverage improvement initiative. The changes include:

Search GRPC Storage Module: Added thorough tests for utility functions including get_cache_entry, into_chunks, repartition_sorted_groups, group_files_by_time_range, and regroup_tantivy_files. These tests validate edge cases like empty inputs, single element groups, exact divisible chunks, and different cache entry types (BitVec vs RoaringBitmap based on percentage thresholds).

Enrichment Table Service: Added comprehensive tests for the convert_to_vrl function covering all JSON value types (null, boolean, number, string, array, object), plus basic async tests for key service functions. The convert_to_vrl tests are particularly well-structured as they test a pure function with clear input/output relationships.

Database Services: Added unit tests across several database service modules:

  • Reports: Basic structural tests with a helper function for creating test Report instances
  • Distinct Values: Tests for record creation and conditional compilation tests for enterprise features
  • File List (Local): Tests for file existence checking, static variable access, and concurrent operations to prevent deadlocks
  • File List (Broadcast): Tests for the send function, EventChannel creation, and static variable manipulation
  • File List (Main): Tests covering static variable operations, concurrent access patterns, and feature-specific functionality

The tests follow consistent patterns with proper helper functions, appropriate use of conditional compilation for enterprise features, and focus on testing both happy path scenarios and edge cases. All changes maintain existing copyright headers and don't modify production logic.

Confidence score: 4/5

  • This PR is generally safe to merge with some minor concerns that should be addressed
  • Score reflects good test coverage additions but issues with static variable state management and weak assertions in some tests
  • Pay close attention to file_list module tests for potential test isolation issues

Context used:

Context - Consolidate repeated setup code in tests into shared fixtures to adhere to DRY (Don't Repeat Yourself) principles. (link)
Context - Consolidate repeated setup code in tests into shared fixtures to adhere to DRY (Don't Repeat Yourself) principles. (link)

7 files reviewed, 2 comments

Edit Code Review Bot Settings | Greptile

assert_eq!(record.org_name, "test_org");
assert_eq!(record.stream_name, "test_stream");
assert_eq!(record.field_name, "test_field");
assert_eq!(record.field_name, "test_field");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: Duplicate assertion - field_name is checked twice

Suggested change
assert_eq!(record.field_name, "test_field");
assert_eq!(record.stream_type, "logs");

Comment on lines +134 to +135
assert!(pending_count >= 0);
assert!(removing_count >= 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: These assertions will always pass since len() returns usize which is always >= 0

Suggested change
assert!(pending_count >= 0);
assert!(removing_count >= 0);
// Just verify they're accessible (no meaningful assertions needed for counts)

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid nondeterministic test panic

This test assumes success and unwraps the result, which can cause nondeterministic
failures when external dependencies are unavailable. Make the test resilient by
accepting both success and error paths without panicking and only asserting on
behavior when successful.

src/service/db/enrichment_table.rs [463-471]

 #[tokio::test]
 async fn test_get_enrichment_table_data() {
-    // This will fail in test environment due to missing dependencies,
-    // but tests the function structure
     let result = get_enrichment_table_data("test_org", "test_table").await;
-    assert!(result.is_ok());
-    let data = result.unwrap();
-    assert_eq!(data.len(), 0); // Should return empty vec due to search service failure
+    if let Ok(data) = result {
+        assert!(data.is_empty());
+    }
+    // If Err, do not unwrap or fail; environment-dependent
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly reduces flakiness by not unwrapping environment-dependent results, aligning with the test's comment about external dependencies; impact is moderate as it improves test reliability without changing functionality.

Medium
Guard against environment-dependent errors

Unwrapping after a generic is_ok assertion can still panic under
environment-dependent failures. Handle the error path without panicking and validate
only when the call succeeds.

src/service/db/enrichment_table.rs [473-479]

 #[tokio::test]
 async fn test_get() {
     let result = get("test_org", "test_table").await;
-    assert!(result.is_ok());
-    let data = result.unwrap();
-    assert_eq!(data.len(), 0); // Should return empty vec due to search service failure
+    if let Ok(data) = result {
+        assert!(data.is_empty());
+    }
+    // If Err, skip assertions to avoid environment-related panics
 }
Suggestion importance[1-10]: 6

__

Why: Similar to the first, it prevents panics from unwrapping on possibly failing calls in test environments; helpful for robustness but a minor improvement.

Low
General
Relax brittle time assertion

This asserts start_time > 0, which will fail if BASE_TIME can be zero or negative in
some configs. Make the assertion tolerant by only checking that a value is returned
(e.g., non-negative) or comparing against the actual BASE_TIME constant if
accessible.

src/service/db/enrichment_table.rs [487-491]

 #[tokio::test]
 async fn test_get_start_time() {
     let start_time = get_start_time("test_org", "test_table").await;
-    assert!(start_time > 0); // Should return BASE_TIME if no stats found
+    assert!(start_time >= 0);
 }
Suggestion importance[1-10]: 5

__

Why: The change may avoid false failures if BASE_TIME can be 0, but it's speculative without proof from the PR; modest improvement in test tolerance with slight risk of masking real issues.

Low

@ByteBaker ByteBaker merged commit c5aab7b into main Sep 1, 2025
28 checks passed
@ByteBaker ByteBaker deleted the chore/coverage branch September 1, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants