Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

Refactor get_opts function that improve 5x for vortex search.

@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 27, 2025

Greptile Overview

Greptile Summary

Refactored get_opts function across the caching layer to return GetResult instead of Bytes, improving performance for vortex search by avoiding unnecessary data copying and materialization. The change moves the result construction closer to the data source (disk/memory cache), allowing more efficient streaming.

  • Disk cache now returns GetResultPayload::File directly with file handle
  • Memory cache returns GetResultPayload::Stream wrapping cached bytes
  • Storage layer simplified by delegating GetResult construction to cache modules
  • Added e_tag generation for better cache validation

Confidence Score: 4/5

  • This PR is safe to merge after addressing one potential panic issue in the disk cache range handling.
  • The refactoring is well-structured and follows existing patterns. One issue: disk.rs:524 uses .unwrap() on as_range() which could panic on invalid ranges, while memory.rs properly handles this with map_err.
  • src/infra/src/cache/file_data/disk.rs - potential panic on line 524

Important Files Changed

File Analysis

Filename Score Overview
src/infra/src/cache/file_data/disk.rs 3/5 Added new get_opts function for disk cache with GetOptions support. Contains potential panic due to .unwrap() on as_range() at line 524.
src/infra/src/cache/file_data/memory.rs 5/5 Added new get_opts function for memory cache with GetOptions support. Properly handles range errors with map_err.
src/infra/src/cache/file_data/mod.rs 5/5 Refactored get_opts to return GetResult instead of Bytes, delegating to disk and memory cache get_opts implementations.
src/infra/src/cache/storage.rs 5/5 Simplified by delegating to file_data::get_opts which now returns GetResult directly. Added e_tag to head method. Removed redundant range validation.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant CacheFS
    participant file_data
    participant memory
    participant disk
    participant storage

    Caller->>CacheFS: get_opts(account, location, options)
    CacheFS->>file_data: get_opts(account, path, options, false)
    
    alt Memory cache enabled
        file_data->>memory: get_opts(file, options)
        memory-->>file_data: GetResult (Stream payload)
        file_data-->>CacheFS: Ok(GetResult)
    else Disk cache enabled
        file_data->>disk: get_opts(file, options)
        disk-->>file_data: GetResult (File payload)
        file_data-->>CacheFS: Ok(GetResult)
    else Cache miss & remote=true
        file_data->>storage: get_opts(account, file, options)
        storage-->>file_data: GetResult
        file_data-->>CacheFS: Ok(GetResult)
    end
    
    CacheFS-->>Caller: GetResult
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the get_opts function in the cache layer to optimize performance for vortex search operations, targeting a 5x improvement. The changes centralize GetResult construction logic by moving it from storage.rs into the individual cache implementations (memory.rs and disk.rs), eliminating redundant code and improving maintainability.

Key Changes

  • Refactored file_data::get_opts to return GetResult instead of Bytes, with cache implementations now handling GetOptions directly including range requests and preconditions
  • Simplified CacheFS methods (get, get_opts, get_range) to delegate GetResult construction to cache implementations
  • Added new get_opts functions in memory.rs and disk.rs with proper etag generation and range handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/infra/src/cache/storage.rs Simplified cache facade by removing manual GetResult construction, now delegates to cache implementations; updated imports and etag format
src/infra/src/cache/file_data/mod.rs Changed get_opts signature to return GetResult instead of Bytes; updated get function to call .bytes() on result
src/infra/src/cache/file_data/memory.rs Added get_opts function that constructs GetResult with proper metadata, etag, range handling, and precondition checks
src/infra/src/cache/file_data/disk.rs Added get_opts function with file-based payload; added helper functions last_modified and get_etag for consistent metadata generation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@hengfeiyang hengfeiyang merged commit 467a42d into main Nov 28, 2025
61 of 66 checks passed
@hengfeiyang hengfeiyang deleted the perf/object_store branch November 28, 2025 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants