Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyang hengfeiyang commented Dec 4, 2025

  • fix: result cache metadata for histogram
  • fix: partition generate for histogram
  • fix: cache filter for histogram
  • feat: no longer delete agg cache file, the solution is only use agg cache file which interval not greater than search interval

@hengfeiyang hengfeiyang marked this pull request as draft December 4, 2025 13:34
@github-actions github-actions bot added the ☢️ Bug Something isn't working label Dec 4, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 4, 2025

Greptile Overview

Greptile Summary

This PR fixes result caching for histogram queries by addressing several interconnected issues:

  • Histogram interval handling in cache calculations: Added histogram_interval parameter to calculate_deltas to properly handle time boundaries for histogram buckets. When histogram interval is set, delta start time now uses result_meta.end_time directly instead of adding 1 microsecond, preventing overlap between cached and fetched data.

  • Cache hit filtering fix: In get_cached_results, the hit filtering logic now accounts for histogram interval width (hit_ts + histogram_interval < hits_allowed_end_time) ensuring histogram buckets that span beyond the query end time are properly excluded.

  • Cache time range adjustment for histograms: Added need_adjust_end_time flag in write_results to track when end time was aligned to histogram boundaries, then restore the full interval at cache write time to ensure complete buckets are stored.

  • Partition generation threshold change: Relaxed the partition alignment condition from duration > self.min_step * 3 to duration > self.min_step, allowing alignment for smaller time ranges.

  • Removed cache file deletion: Removed the mismatched-interval cache file deletion logic from do_partitioned_search. Per PR description, the new approach uses only aggregation cache files with intervals not greater than the search interval rather than deleting old files.

  • Added PartialOrd to Interval enum: Enables interval comparison for the new cache selection strategy.

Confidence Score: 4/5

  • This PR is safe to merge with thorough testing of histogram caching edge cases
  • The changes are well-targeted fixes for histogram result caching issues. The logic changes are sound - accounting for histogram bucket widths in cache hit filtering and delta calculations. The removal of cache file deletion in favor of interval comparison is a cleaner approach. Minor concern around the relaxed partition alignment threshold which may need verification.
  • Pay attention to src/service/search/cache/mod.rs and src/service/search/partition.rs - verify the time range adjustment and partition threshold changes work correctly for edge cases with small durations.

Important Files Changed

File Analysis

Filename Score Overview
src/config/src/meta/search.rs 5/5 Added PartialOrd derive to Interval enum to enable interval comparison for cache selection logic.
src/service/search/cache/cacher.rs 4/5 Fixed histogram result cache by adding histogram_interval parameter to calculate_deltas, adjusting hit filtering to include interval width, and conditionally pushing cached responses only when data exists.
src/service/search/cache/mod.rs 4/5 Improved cache write logic with need_adjust_end_time flag, renamed variables for clarity, added logging, and adjusted time range handling for histogram aggregates.
src/service/search/mod.rs 5/5 Added cache_interval parameter to discover_cache_for_query call and improved logging format.
src/service/search/partition.rs 4/5 Changed partition alignment condition from duration > self.min_step * 3 to duration > self.min_step for smaller durations, moved mini partition calculation inside conditional block.
src/service/search/streaming/cache.rs 5/5 Enhanced logging with timing information and trace_id formatting consistency.
src/service/search/streaming/execution.rs 4/5 Removed mismatched-interval cache file deletion logic and enhanced partition logging with actual partition values.
src/service/search/streaming/mod.rs 5/5 Fixed remaining_query_range calculation using + 1 instead of std::cmp::max(1, ...), changed debug log to info level, added timing log.

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchService
    participant CacheChecker as Cache Checker
    participant CacheWriter as Cache Writer
    participant PartitionGen as Partition Generator

    Client->>SearchService: Search Request (histogram)
    SearchService->>CacheChecker: check_cache(query, histogram_interval)
    CacheChecker->>CacheChecker: get_cached_results()
    
    alt Cache Hit
        CacheChecker->>CacheChecker: Filter hits by interval-adjusted boundaries
        CacheChecker->>CacheChecker: calculate_deltas(histogram_interval)
        CacheChecker-->>SearchService: CachedQueryResponse + deltas
    else Cache Miss
        CacheChecker-->>SearchService: Empty response
    end

    SearchService->>PartitionGen: generate_partitions_aligned_with_histogram_interval()
    PartitionGen->>PartitionGen: Align start/end to min_step boundaries
    PartitionGen-->>SearchService: Partitions

    SearchService->>SearchService: Execute search for delta partitions
    
    SearchService->>CacheWriter: write_results()
    CacheWriter->>CacheWriter: Align time range to histogram interval
    CacheWriter->>CacheWriter: Filter hits within accept range
    CacheWriter->>CacheWriter: Adjust end_time if need_adjust_end_time
    CacheWriter->>CacheWriter: Cache to disk with aligned timestamps

    SearchService-->>Client: Merged results (cached + fresh)
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@hengfeiyang hengfeiyang added the Needs-Testing Needs-Testing label Dec 4, 2025
@hengfeiyang hengfeiyang marked this pull request as ready for review December 4, 2025 14:44
@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

Failed to generate code suggestions for PR

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes result caching issues for histogram queries by improving cache management and time range alignment. The main changes include:

  • Removing the deletion of mismatched-interval aggregation cache files (now filtering is done during cache discovery instead)
  • Fixing cache time range calculations for histogram queries to properly handle interval boundaries
  • Adjusting partition generation logic to create alignment partitions for smaller durations
  • Adding the cache_interval parameter to discover_cache_for_query to enable interval-based cache filtering
  • Improving logging consistency and adding performance timing metrics

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/service/search/streaming/mod.rs Enhanced logging (debug→info), fixed remaining_query_range calculation, added search completion timing
src/service/search/streaming/execution.rs Removed mismatched-interval cache deletion logic, improved partition logging
src/service/search/streaming/cache.rs Added timing metrics and improved log formatting
src/service/search/partition.rs Changed histogram partition alignment threshold from 3*min_step to min_step, moved mini_partition_size calculation
src/service/search/mod.rs Updated log message formatting, added cache_interval parameter to discover_cache_for_query call
src/service/search/cache/mod.rs Fixed cache time range adjustment for histograms, improved variable naming (cached_responses vs search_responses), added timing logs
src/service/search/cache/cacher.rs Enhanced histogram interval handling, fixed delta calculation for histogram queries, improved conditional caching logic
src/config/src/meta/search.rs Added PartialOrd trait to Interval enum for interval comparisons

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@hengfeiyang hengfeiyang merged commit f4edc78 into main Dec 5, 2025
52 of 57 checks passed
@hengfeiyang hengfeiyang deleted the fix/result-cache branch December 5, 2025 00:39
hengfeiyang added a commit that referenced this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working Needs-Testing Needs-Testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants