Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

fixed #9419

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Dec 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 2, 2025

Greptile Overview

Greptile Summary

This PR rolls back usage of the scc (Scalable Concurrent Collections) crate across multiple modules, replacing it with standard parking_lot::RwLock or tokio::sync::RwLock wrapped HashMap/HashSet collections. This change addresses issue #9419 where searches were getting stuck after upgrading to version 0.20.1.

Key Changes:

  • WAL file locking (src/common/infra/wal.rs): Replaced scc::HashMap<String, AtomicIsize> with parking_lot::RwLock<SearchingFileLocker>. Functions are now synchronous instead of async, with explicit reference counting for file locks.
  • Local lock (src/infra/src/local_lock.rs): Changed from scc::HashMap with OccupiedEntry to RwLock<HashMap> with a new LockHolder struct. The API now requires a two-step lock acquisition pattern.
  • File list storage (src/service/search/datafusion/storage/file_list.rs): Replaced scc::HashMap with parking_lot::RwLock<HashMap> for file and segment caching.
  • Various job/service modules: Consistently replaced scc::HashSet with RwLock<HashSet> for tracking processing files and ongoing jobs.

Memory management improvements:

  • Added shrink_to_fit() calls after removing elements to prevent memory accumulation
  • Explicit lock dropping patterns to prevent potential deadlocks

The rollback from scc to standard locking primitives suggests that the lock-free concurrent collections were causing issues with the search request lifecycle, potentially related to how entries were being acquired and released in the flight search context.

Confidence Score: 4/5

  • This PR addresses a critical P0 bug (search blocking) by rolling back to simpler, well-tested locking primitives.
  • The changes are consistent across all files, replacing scc collections with standard RwLock patterns. The logic is straightforward and follows established Rust concurrency patterns. Tests are updated to match the new API. Minor concern about the local_lock API change requiring callers to use two-step locking, but the only caller (schema.rs) is updated correctly.
  • The src/infra/src/local_lock.rs change removes the lock_multi function - verify no other callers depend on it.

Important Files Changed

File Analysis

Filename Score Overview
src/common/infra/wal.rs 5/5 Major refactor: replaced scc::HashMap with parking_lot::RwLock for WAL file locking, converting async functions to sync. Added SearchingFileLocker struct with reference counting and shrink_to_fit for memory management.
src/handler/grpc/flight/mod.rs 5/5 Simplified clear_session_data by removing tokio::spawn wrapper since release_request is now synchronous.
src/infra/src/local_lock.rs 4/5 Replaced scc::HashMap with RwLock+LockHolder pattern. New design returns a LockHolder that must be explicitly locked, creating a two-step locking process. Removes lock_multi function.
src/service/schema.rs 5/5 Updated local_lock usage to two-step pattern: first obtain LockHolder via lock(), then acquire guard via locker.lock().await.
src/service/search/datafusion/storage/file_list.rs 5/5 Replaced scc::HashMap with parking_lot::RwLock for FILES and SEGMENTS. Changed clear() to explicit key collection and removal with shrink_to_fit.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FlightHandler as Flight Handler
    participant WAL as WAL Module
    participant FileList as File List Storage
    participant SearchFiles as SEARCHING_FILES

    Client->>FlightHandler: Search Request
    FlightHandler->>WAL: lock_files(files)
    WAL->>SearchFiles: write().lock(file) for each file
    Note over SearchFiles: Reference count incremented

    FlightHandler->>WAL: lock_request(trace_id, files)
    WAL->>WAL: Store files in SEARCHING_REQUESTS

    FlightHandler->>FileList: set(trace_id, schema_key, files)
    FileList->>FileList: Store file metadata

    FlightHandler-->>Client: Stream Results

    Client->>FlightHandler: Request Complete/Error
    FlightHandler->>FlightHandler: clear_session_data(trace_id)
    FlightHandler->>FileList: clear(trace_id)
    FileList->>FileList: Remove entries + shrink_to_fit()

    FlightHandler->>WAL: release_request(trace_id)
    WAL->>WAL: Remove from SEARCHING_REQUESTS
    WAL->>SearchFiles: write().release(file) for each file
    Note over SearchFiles: Reference count decremented, removed if 0
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@hengfeiyang hengfeiyang merged commit 9c249f2 into main Dec 3, 2025
39 of 48 checks passed
@hengfeiyang hengfeiyang deleted the fix/rollback-scc branch December 3, 2025 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

search block

4 participants