Skip to content

Conversation

@Loaki07
Copy link
Contributor

@Loaki07 Loaki07 commented Oct 7, 2025

ref: #8731

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

This PR implements caching support for histogram queries with non-timestamp column ordering (e.g., `ORDER BY count DESC`). Previously, the cache system only supported queries ordered by timestamp columns, which excluded many useful histogram visualizations.

The implementation introduces a new boolean field is_histogram_non_ts_order to the CacheQueryRequest struct that identifies when a histogram query is ordered by non-timestamp columns. When this flag is true, the cache system switches from the optimized first/last record timestamp extraction to scanning all hits to determine accurate time ranges, since results may not be chronologically ordered.

The changes span multiple layers of the caching architecture:

  • Query Planning: Logic to detect histogram queries (histogram_interval > 0) with non-timestamp ORDER BY clauses
  • Cache Request Propagation: The new flag is forwarded through cluster caching, local caching, and websocket event handling
  • Timestamp Extraction: Enhanced extract_timestamp_range() function that chooses between fast (first/last) and comprehensive (scan all) approaches based on ordering
  • Utility Functions: New helpers for quote-aware timestamp field detection and non-timestamp ordering identification

This enables proper caching of histogram queries commonly used in dashboards and analytics, such as SELECT histogram(_timestamp), count(*) ... ORDER BY count DESC, which are inherently time-based but sorted by aggregated values.

Important Files Changed

Changed Files
Filename Score Overview
src/handler/grpc/request/query_cache.rs 5/5 Adds is_histogram_non_ts_order field hardcoded to false in gRPC handlers with proper comments
src/service/search/cluster/cacher.rs 4/5 Forwards the new histogram flag from original request to local cache lookup
src/common/meta/search.rs 5/5 Adds well-documented is_histogram_non_ts_order boolean field to CacheQueryRequest struct
src/service/search/cache/multi.rs 3/5 Implements time-ordered vs non-time-ordered timestamp extraction logic for histogram queries
src/service/search/cache/cacher.rs 4/5 Adds histogram query detection and exception logic to bypass timestamp ordering requirements
src/service/search/cache/result_utils.rs 4/5 Introduces utility functions for timestamp field detection and range extraction with unwrap calls
src/service/search/cache/mod.rs 1/5 Critical issue: references undefined write_res variable instead of res
src/service/websocket_events/search.rs 4/5 Adds histogram query detection for websocket search result caching
src/service/search/cluster/cache_multi.rs 1/5 Missing is_histogram_non_ts_order field in gRPC protobuf message breaks cluster functionality

Confidence score: 1/5

  • This PR has critical implementation issues that will cause immediate failures in production
  • Score lowered due to undefined variable reference and incomplete gRPC protocol definition
  • Pay close attention to src/service/search/cache/mod.rs and src/service/search/cluster/cache_multi.rs which contain blocking issues

Sequence Diagram

sequenceDiagram
    participant User
    participant SearchHandler
    participant CacheChecker
    participant SQLParser
    participant CacheStore
    participant DataFetcher
    participant ResultWriter

    User->>SearchHandler: "Submit search query with histogram and ORDER BY count DESC"
    SearchHandler->>SQLParser: "Parse SQL query"
    SQLParser->>SQLParser: "Detect histogram query with non-timestamp ORDER BY"
    SQLParser-->>SearchHandler: "Return parsed SQL with is_histogram_non_ts_order=true"
    
    SearchHandler->>CacheChecker: "check_cache()"
    CacheChecker->>CacheChecker: "Determine if histogram query with non-ts ordering"
    CacheChecker->>CacheStore: "Search for existing cache entries"
    
    alt Cache Hit Found
        CacheStore-->>CacheChecker: "Return cached results"
        CacheChecker->>CacheChecker: "Filter cached results by time range"
        CacheChecker->>CacheChecker: "Sort results using stored ORDER BY metadata"
        CacheChecker-->>SearchHandler: "Return cached response with deltas"
    else No Cache Hit
        CacheChecker-->>SearchHandler: "Return empty cache response"
    end
    
    alt Need to fetch new data (deltas exist)
        SearchHandler->>DataFetcher: "Execute search for missing time ranges"
        DataFetcher-->>SearchHandler: "Return search results"
        SearchHandler->>SearchHandler: "Merge cached and new results"
    end
    
    SearchHandler->>ResultWriter: "write_results_v2() with is_histogram_non_ts_order flag"
    ResultWriter->>ResultWriter: "Extract timestamp range (scan all hits for non-time-ordered)"
    ResultWriter->>ResultWriter: "Apply boundary deduplication for aggregates"
    ResultWriter->>CacheStore: "Store results to disk with metadata"
    CacheStore-->>ResultWriter: "Confirm cache storage"
    
    SearchHandler-->>User: "Return final search results"
Loading

Additional Comments (3)

  1. src/service/search/cluster/cacher.rs, line 76-86 (link)

    logic: Missing is_histogram_non_ts_order field in gRPC request - should be included for consistency with local cache call

  2. src/service/search/cache/multi.rs, line 254 (link)

    logic: Logic error: function returns early when remaining_metas is not empty, preventing recursive processing

  3. src/service/search/cluster/cache_multi.rs, line 81-91 (link)

    logic: Missing is_histogram_non_ts_order field in gRPC request. The protobuf QueryCacheRequest doesn't include this field, causing remote nodes to always receive false for this flag.

9 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 204 to 206
if !found_ts_order && is_histogram_query {
is_descending = false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: This fallback logic inside the loop will execute on every iteration. Move outside the loop to avoid redundant execution.

Suggested change
if !found_ts_order && is_histogram_query {
is_descending = false;
}
// For histogram queries, if ORDER BY is not on the histogram/timestamp column,
// we should still cache based on the histogram's inherent time ordering
// Check if any order_by field matches the result_ts_col
let mut found_ts_order = false;
for (field, order) in &order_by {
if is_timestamp_field(field, &result_ts_col) {
is_descending = order == &OrderBy::Desc;
found_ts_order = true;
break;
}
}
// For histogram queries ordered by non-timestamp columns (e.g., ORDER BY count),
// use ascending as default
if !found_ts_order && is_histogram_query {
is_descending = false;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/service/search/cache/cacher.rs
Line: 204:206

Comment:
**logic:** This fallback logic inside the loop will execute on every iteration. Move outside the loop to avoid redundant execution.

```suggestion
            // For histogram queries, if ORDER BY is not on the histogram/timestamp column,
        // we should still cache based on the histogram's inherent time ordering
        // Check if any order_by field matches the result_ts_col
        let mut found_ts_order = false;
        for (field, order) in &order_by {
            if is_timestamp_field(field, &result_ts_col) {
                is_descending = order == &OrderBy::Desc;
                found_ts_order = true;
                break;
            }
        }

        // For histogram queries ordered by non-timestamp columns (e.g., ORDER BY count),
        // use ascending as default
        if !found_ts_order && is_histogram_query {
            is_descending = false;
        }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +90 to +91
let first_ts = get_ts_value(ts_column, hits.first().unwrap());
let last_ts = get_ts_value(ts_column, hits.last().unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Calling unwrap() on hits.first() and hits.last() without checking if the slice is empty could panic. Consider handling the empty case explicitly.

Suggested change
let first_ts = get_ts_value(ts_column, hits.first().unwrap());
let last_ts = get_ts_value(ts_column, hits.last().unwrap());
if hits.is_empty() {
return (0, 0);
}
let first_ts = get_ts_value(ts_column, hits.first().unwrap());
let last_ts = get_ts_value(ts_column, hits.last().unwrap());

Context Used: Context from dashboard - Avoid using expect with potentially failing operations; instead, handle the None case to prevent... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/service/search/cache/result_utils.rs
Line: 90:91

Comment:
**logic:** Calling `unwrap()` on `hits.first()` and `hits.last()` without checking if the slice is empty could panic. Consider handling the empty case explicitly.

```suggestion
        if hits.is_empty() {
            return (0, 0);
        }
        let first_ts = get_ts_value(ts_column, hits.first().unwrap());
        let last_ts = get_ts_value(ts_column, hits.last().unwrap());
```

**Context Used:** Context from `dashboard` - Avoid using `expect` with potentially failing operations; instead, handle the `None` case to prevent... ([source](https://app.greptile.com/review/custom-context?memory=b18dc58c-d445-4b6e-9438-bbf3b82397ca))

How can I resolve this? If you propose a fix, please make it concise.

Signed-off-by: loaki07 <[email protected]>
Signed-off-by: loaki07 <[email protected]>
@Loaki07 Loaki07 merged commit cc72425 into branch-v0.14.6-rc7 Oct 7, 2025
46 of 50 checks passed
@Loaki07 Loaki07 deleted the feat/cache_hist_with_non_ts_col_order_by_rc7 branch October 7, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants