Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

fixed #9419

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Dec 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 2, 2025

Greptile Overview

Greptile Summary

This PR rolls back the use of the scc (Scalable Concurrent Containers) library and replaces it with parking_lot::RwLock and tokio::sync::RwLock combined with hashbrown::HashMap/HashSet. The change addresses issue #9419 where searches were blocking after upgrading to v0.20.1.

Key changes:

  • Replaced all scc::HashMap and scc::HashSet instances with lock-protected hashbrown collections
  • Converted async lock operations to synchronous ones using parking_lot::RwLock or tokio::sync::RwLock
  • Removed unnecessary tokio::spawn wrapper in flight handler
  • Improved lock ordering in file_list_dump.rs with explicit lock drops to prevent deadlocks
  • Updated local_lock API to use a two-step pattern (get LockHolder, then call .lock())

The rollback simplifies the concurrency model by using well-established locking primitives instead of the lock-free scc library, which may have had subtle issues causing the search blocking behavior.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk, as it reverts to proven concurrency primitives
  • The rollback from scc to standard locking primitives is well-executed with consistent patterns across all files. The change addresses a critical P0 blocker where searches were getting stuck. While the implementation is solid, there's a minor concern about the synchronization model change from lock-free to lock-based potentially affecting performance under high concurrency, though this is acceptable given it fixes the blocking issue.
  • Pay close attention to src/common/infra/wal.rs and src/job/file_list_dump.rs for lock ordering and deadlock prevention

Important Files Changed

File Analysis

Filename Score Overview
src/common/infra/wal.rs 4/5 Replaces scc async HashMap with parking_lot RwLock + hashbrown HashMap for file locking; removes async from all functions
src/infra/src/local_lock.rs 4/5 Replaces scc HashMap with tokio RwLock + hashbrown HashMap; changes lock API to return LockHolder with Drop trait
src/job/file_list_dump.rs 4/5 Replaces scc HashSet with tokio RwLock + hashbrown HashSet; adds critical lock ordering fix to prevent deadlock
src/service/schema.rs 4/5 Updates local_lock API usage to new two-step pattern (get LockHolder, then lock)
src/service/search/datafusion/storage/file_list.rs 4/5 Replaces scc HashMap with parking_lot RwLock + hashbrown HashMap; improves clear logic with explicit key collection

Sequence Diagram

sequenceDiagram
    participant Client as Search Client
    participant Flight as Flight Handler
    participant WAL as WAL Module
    participant Files as File System
    participant Locker as SearchingFileLocker

    Client->>Flight: Search Request (trace_id)
    Flight->>WAL: get_file_list()
    WAL->>Files: List WAL files
    Files-->>WAL: file_keys[]
    
    WAL->>Locker: lock_files(file_keys)
    Note over Locker: Acquires write lock<br/>Increments counter for each file
    Locker-->>WAL: Files locked
    
    WAL->>WAL: lock_request(trace_id, file_keys)
    Note over WAL: Stores trace_id -> files mapping<br/>for cleanup on request completion
    
    WAL-->>Flight: locked_files[]
    Flight->>Flight: Process search query
    
    alt Search Completes Successfully
        Flight->>WAL: release_request(trace_id)
        WAL->>WAL: Remove trace_id mapping
        WAL->>Locker: release_files(file_keys)
        Note over Locker: Acquires write lock<br/>Decrements counter for each file<br/>Removes entry if counter reaches 0
    else Search Fails or Filtered Out
        Flight->>WAL: release_files(specific_file)
        WAL->>Locker: release_files([file])
        Note over Locker: Decrements counter<br/>Removes if counter = 0
    end
    
    Note over Locker: Files with counter > 0<br/>are protected from deletion<br/>by ingester cleanup jobs
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@hengfeiyang hengfeiyang merged commit 4a745cf into branch-v0.20.0 Dec 2, 2025
31 of 34 checks passed
@hengfeiyang hengfeiyang deleted the branch-v0.20.0-hotfix branch December 2, 2025 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants