Skip to content

Conversation

@Loaki07
Copy link
Contributor

@Loaki07 Loaki07 commented Oct 17, 2025

User description

ref: #8731


PR Type

Enhancement, Bug fix


Description

  • Cache support for histogram with non-TS ORDER BY

  • Scan all hits to compute time range

  • Preserve ORDER BY metadata in cache writes

  • Refactor utils for ORDER BY timestamp checks


Diagram Walkthrough

flowchart LR
  SQL["SQL parse (histogram, ORDER BY)"]
  Check["check_cache: detect non-TS histogram order"]
  Req["CacheQueryRequest + is_histogram_non_ts_order"]
  Utils["result_utils: is_timestamp_field, has_non_timestamp_ordering, extract_timestamp_range"]
  Write["write_results: scan hits for min/max when needed"]
  WS["streaming cache: propagate flag"]
  Mod["search mod: cache write integration"]

  SQL -- "order_by, ts_column" --> Check
  Check -- "set flag" --> Req
  Check -- "use timestamp detection" --> Utils
  Req -- "persist flag" --> Write
  Mod -- "compute is_histogram_non_ts_order" --> Write
  WS -- "compute and pass flag" --> Write
  Write -- "min/max via Utils" --> Utils
Loading

File Walkthrough

Relevant files
Enhancement
search.rs
Add non-TS histogram order flag to request                             

src/common/meta/search.rs

  • Add is_histogram_non_ts_order to CacheQueryRequest
  • Document behavior requiring full-hit scan
+5/-0     
cacher.rs
Detect and flag non-TS ordered histograms                               

src/service/search/cache/cacher.rs

  • Import new utils for ORDER BY checks
  • Allow histogram caching without TS ORDER BY
  • Determine is_histogram_non_ts_order
  • Propagate flag in cache write paths
+33/-3   
mod.rs
Compute and pass histogram order flag to writer                   

src/service/search/cache/mod.rs

  • Import extract_timestamp_range
  • Compute is_histogram_non_ts_order on write
  • Pass flag to write_results
+18/-5   
result_utils.rs
Utilities for timestamp/order checks and range extraction

src/service/search/cache/result_utils.rs

  • Add is_timestamp_field helper
  • Add has_non_timestamp_ordering
  • Add extract_timestamp_range with scan fallback
+59/-0   
cache.rs
Streaming cache supports non-TS histogram ordering             

src/service/search/streaming/cache.rs

  • Detect non-TS histogram order in streaming path
  • Pass flag to cache write_results
+10/-0   
Miscellaneous
mod.rs
Hardcode partition timing and count                                           

src/service/search/mod.rs

  • Force total_secs = 4 (debug change)
  • Force part_num = 10 (debug change)
+2/-0     

@github-actions github-actions bot changed the title feat: cache hist query with non ts col order by feat: cache hist query with non ts col order by Oct 17, 2025
@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

8731 - Partially compliant

Compliant requirements:

  • Add an is_histogram_non_ts_order flag to CacheQueryRequest
  • Compute is_histogram_non_ts_order in check_cache() when histogram + non-timestamp ORDER BY
  • Propagate is_histogram_non_ts_order through all cache write paths (main, websocket)
  • When writing cache files for histogram non-ts ORDER BY, scan all hits to get min/max timestamps
  • Preserve ORDER BY metadata in cached responses to ensure correct sorting
  • Refactor naming from is_non_ts_histogram to is_histogram_non_ts_order

Non-compliant requirements:

  • Propagate is_histogram_non_ts_order through cluster cache path (not visible in diff)
  • Fix get_allowed_up_to() to scan all hits for max timestamp (discard_ts calculation) (not shown)
  • Fix response_start_time/response_end_time computation in cache retrieval to scan all hits (not shown)
  • Update retain filter boundary conditions to be inclusive where needed (not evident)
  • Provide tests/verification for various histogram + non-ts ORDER BY cases and regressions (no tests in diff)

Requires further human verification:

  • Validate cluster cache path propagation if applicable
  • Manually verify cache hit/miss behavior and ordering across queries listed in the ticket
  • Performance check: scanning all hits does not cause significant overhead on large datasets
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Hardcoded overrides for total_secs and part_num (set to 4 and 10) will alter partitioning behavior globally and may be unintended debug code; verify and remove or guard by config/feature flag.

total_secs = 4;

// If total secs is <= aggs_min_num_partition_secs (default 3 seconds), then disable
// partitioning even if streaming aggs is true. This optimization avoids partition overhead
// for fast queries.
#[cfg(feature = "enterprise")]
if is_streaming_aggregate && total_secs <= cfg.limit.aggs_min_num_partition_secs {
    log::info!(
        "[trace_id {trace_id}] Disabling streaming aggregation: total_secs ({}) <= aggs_min_num_partition_secs ({}), returning single partition",
        total_secs,
        cfg.limit.aggs_min_num_partition_secs
    );
    resp.partitions = vec![[req.start_time, req.end_time]];
    resp.streaming_aggs = false;
    resp.streaming_id = None;
    return Ok(resp);
}

let mut part_num = max(1, total_secs / cfg.limit.query_partition_by_secs);
if part_num * cfg.limit.query_partition_by_secs < total_secs {
    part_num += 1;
}
// if the partition number is too large, we limit it to 1000
if part_num > 1000 {
    part_num = 1000;
}
part_num = 10;
Logic Correctness

Determination of is_histogram_non_ts_order relies on has_non_timestamp_ordering over entire ORDER BY; ensure mixed ORDER BY with timestamp first is handled as time-ordered per ticket expectations.

// Determine if this is a histogram query with non-timestamp ORDER BY.
// These queries need special handling because results may not be time-ordered,
// requiring us to scan all hits to find the actual time range.
let is_histogram_non_ts_order = is_histogram_query
    && !order_by.is_empty()
    && has_non_timestamp_ordering(&order_by, &result_ts_col);
Robustness

extract_timestamp_range uses unwrap() on first/last when is_time_ordered; callers should ensure non-empty hits, but add defensive checks or document strict preconditions.

    if is_time_ordered {
        // Fast path: results are sorted by time, just check first and last
        let first_ts = get_ts_value(ts_column, hits.first().unwrap());
        let last_ts = get_ts_value(ts_column, hits.last().unwrap());
        (first_ts.min(last_ts), first_ts.max(last_ts))
    } else {
        // Slow path: results are not time-ordered, must scan all hits
        hits.iter()
            .map(|hit| get_ts_value(ts_column, hit))
            .fold((i64::MAX, i64::MIN), |(min_ts, max_ts), ts| {
                (min_ts.min(ts), max_ts.max(ts))
            })
    }
}

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove hardcoded partition seconds

Avoid overriding total_secs with a hardcoded value, which disables dynamic partition
duration and can severely impact performance or correctness for large/small queries.
Use the calculated value to preserve adaptive behavior.

src/service/search/mod.rs [889]

-total_secs = 4;
+// keep dynamically computed total_secs; remove hardcoded override
+// total_secs remains as computed above
Suggestion importance[1-10]: 9

__

Why: The hardcoded total_secs = 4; at line 889 overrides computed logic and likely degrades correctness/performance; removing it restores adaptive behavior and aligns with surrounding code.

High
Remove hardcoded partition count

Do not force part_num to a fixed value; this bypasses safeguards (like the 1000 cap)
and can underutilize resources or overload them depending on data size. Keep the
computed part_num to maintain scalability and throughput.

src/service/search/mod.rs [915]

-part_num = 10;
+// keep dynamically computed part_num; remove hardcoded override
+// part_num remains as computed above (with existing caps)
Suggestion importance[1-10]: 9

__

Why: Forcing part_num = 10; at line 915 ignores calculated partitioning and caps, risking under/over-utilization; reverting preserves scalability and intended safeguards.

High
Guard against invalid timestamp range

Handle the case where all hits lack a valid timestamp to avoid returning (i64::MAX,
i64::MIN), which can break later comparisons. Default to using the first/last
timestamps or clamp to request bounds when no valid timestamps are found.

src/service/search/cache/result_utils.rs [82-87]

-hits.iter()
+let (min_ts, max_ts) = hits
+    .iter()
     .map(|hit| get_ts_value(ts_column, hit))
+    .filter(|ts| *ts != 0)
     .fold((i64::MAX, i64::MIN), |(min_ts, max_ts), ts| {
         (min_ts.min(ts), max_ts.max(ts))
-    })
+    });
+if min_ts == i64::MAX || max_ts == i64::MIN {
+    // Fallback: use first/last values (may be 0) to avoid sentinel extremes
+    let first_ts = get_ts_value(ts_column, hits.first().unwrap());
+    let last_ts = get_ts_value(ts_column, hits.last().unwrap());
+    (first_ts.min(last_ts), first_ts.max(last_ts))
+} else {
+    (min_ts, max_ts)
+}
Suggestion importance[1-10]: 7

__

Why: The proposed handling prevents returning (i64::MAX, i64::MIN) when all timestamps are invalid by filtering zeros and falling back, improving robustness; moderate impact since upstream should usually ensure valid timestamps.

Medium

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: Loaki07 | Branch: feat/cache_hist_with_non_ts_col_order_by_main | Commit: e127558

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 364 340 0 19 5 93% 4m 47s

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: Loaki07 | Branch: feat/cache_hist_with_non_ts_col_order_by_main | Commit: 428d4dc

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 364 344 0 19 1 95% 4m 40s

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: Loaki07 | Branch: feat/cache_hist_with_non_ts_col_order_by_main | Commit: 3420129

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
1 test failed 292 271 1 17 3 93% 5m 18s

Test Failure Analysis

  1. dashboard.spec.js: Timeout issues during chart rendering due to network delays
    1. dashboard UI testcases should update the line chart correctly when used camel case in custom sql query: Timeout waiting for page to load, exceeding 60 seconds.

Root Cause Analysis

  • The timeout error in the test is likely related to recent changes in the dashboard rendering logic, as indicated by the extensive modifications in the dashboard.spec.js file.

Recommended Actions

  1. Increase the timeout duration in the test to accommodate longer load times.
  2. Optimize the SQL query used in the test to improve performance.
  3. Investigate network conditions or server response times during test execution.

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: Loaki07 | Branch: feat/cache_hist_with_non_ts_col_order_by_main | Commit: fad012a

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 343 0 19 3 94% 4m 39s

View Detailed Results

@Loaki07 Loaki07 marked this pull request as ready for review October 27, 2025 10:05
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds caching support for histogram queries with non-timestamp column ORDER BY (e.g., ORDER BY count DESC). When such queries are detected, the cache system scans all hits to find min/max timestamps instead of using the fast first/last optimization.

Key Changes:

  • Added is_histogram_non_ts_order flag to CacheQueryRequest to track histogram queries ordered by non-timestamp columns
  • Created utility functions is_timestamp_field(), has_non_timestamp_ordering(), and extract_timestamp_range() in result_utils.rs
  • Modified cache check logic in cacher.rs to allow histogram queries even when ORDER BY is not on timestamp
  • Updated write_results() to scan all hits for time range when is_histogram_non_ts_order=true
  • Propagated flag through both standard and streaming cache write paths

Critical Issue Found:

  • Logic inconsistency between cache check path (cacher.rs:213-215) and cache write paths (mod.rs:427-433, streaming/cache.rs:84-90)
  • Cache check uses has_non_timestamp_ordering() which checks if ANY ORDER BY field is non-timestamp (.any())
  • Cache write paths only check if the FIRST ORDER BY field is non-timestamp (.first())
  • For multi-column ORDER BY like ORDER BY _timestamp, count DESC, this creates a mismatch that could cause incorrect caching behavior

Confidence Score: 2/5

  • This PR has a critical logic bug that will cause cache inconsistencies for multi-column ORDER BY clauses
  • The inconsistent logic between cache check and cache write paths is a blocking issue. While the overall approach is sound, the .any() vs .first() mismatch in ORDER BY field checking will cause the cache check to flag queries as non-time-ordered when they're actually time-ordered (if timestamp is first), leading to unnecessary full-scan overhead and potential cache key mismatches
  • Pay close attention to src/service/search/cache/cacher.rs and src/service/search/cache/result_utils.rs - the logic bug must be fixed before merge

Important Files Changed

File Analysis

Filename Score Overview
src/service/search/cache/cacher.rs 2/5 Added histogram non-ts ORDER BY detection with logic mismatch bug between cache check and write paths
src/service/search/cache/result_utils.rs 2/5 Added utility functions for timestamp range extraction, but has_non_timestamp_ordering() uses wrong logic (any vs first)
src/service/search/cache/mod.rs 4/5 Correctly checks only first ORDER BY field for histogram non-ts ordering, uses extract_timestamp_range() for time range calculation
src/service/search/streaming/cache.rs 4/5 Correctly checks only first ORDER BY field for histogram non-ts ordering in streaming path
src/common/meta/search.rs 5/5 Added well-documented is_histogram_non_ts_order flag to CacheQueryRequest struct

Sequence Diagram

sequenceDiagram
    participant Client
    participant Cacher as cache/cacher.rs
    participant Utils as cache/result_utils.rs
    participant Mod as cache/mod.rs
    participant Stream as streaming/cache.rs
    participant Disk as Cache Storage

    Client->>Cacher: check_cache()
    Cacher->>Cacher: Detect is_histogram_query
    Cacher->>Utils: has_non_timestamp_ordering(order_by, ts_col)
    Note over Utils: BUG: Uses .any() checks ALL fields
    Utils-->>Cacher: is_histogram_non_ts_order
    Cacher->>Cacher: Set flag in CacheQueryRequest
    Cacher->>Disk: Check for cached results
    
    alt Cache Miss - Execute Query
        Cacher-->>Mod: Execute query
        Mod->>Mod: Get results with order_by_metadata
        Mod->>Mod: Compute is_histogram_non_ts_order
        Note over Mod: Only checks FIRST field (.first())
        Mod->>Utils: extract_timestamp_range(hits, is_time_ordered)
        Utils->>Utils: Scan all hits if !is_time_ordered
        Utils-->>Mod: (min_ts, max_ts)
        Mod->>Disk: write_results(is_histogram_non_ts_order)
    else Streaming Path
        Stream->>Stream: Compute is_histogram_non_ts_order
        Note over Stream: Only checks FIRST field (.first())
        Stream->>Mod: write_results()
        Mod->>Utils: extract_timestamp_range()
        Utils-->>Mod: (min_ts, max_ts)
        Mod->>Disk: write_results()
    end
    
    Note over Cacher,Stream: INCONSISTENCY: cacher uses .any(), mod/stream use .first()
Loading

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@Loaki07 Loaki07 merged commit 875d1f3 into main Oct 28, 2025
44 of 48 checks passed
@Loaki07 Loaki07 deleted the feat/cache_hist_with_non_ts_col_order_by_main branch October 28, 2025 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants