Skip to content

Conversation

@Loaki07
Copy link
Contributor

@Loaki07 Loaki07 commented Nov 29, 2025

How I run my query
1. nov 26 10:30 - 11:30 - interval 30 mins
2. nov 26 10:30 - 16:30 - interval 60 mins
3. nov 26 10:30 - 16:30 - interval 60 mins -- refresh cache

Cache dir
1. on first run
- has 2 caches 30 mins
2. on 2nd run
- will havve mixed 30mins cache and 60 mins cache

TODO:
-  [x] Problem 1:
on step 3
- the interval for that duration is 60 mins
- but the discovery of cache will also include the 30 mins cache
[@mod.rs](file:///Users/lin/o2/openobserve/src/service/search/mod.rs)
`discover_cache_for_query`
- the overwrite will over write the 60 mins cache only. How to address
this problem?
- [x] Problem 2:
- fix: add rbac control only user with permussion to edit stream can
`refresh cache`

Proposed Solution
- On `refresh_cache` run streaming aggs without using cache
- at `streaming/execution.rs` , at the end of steaming aggs delete all
the cache files for the given duration when there are miss matched cache
interval files
- then remove the `streaming_id` from the `GLOBAL_CACHE`
- eg: nov 26 10:30 - 16:30 - interval 60 mins -- refresh cache
- Cache consists of 30mins or 4hr interval files all of them will be
deleted
- 60 mins cache interval files will be overwritten or created a new.
- This way on the next run there will be no query results mismatch when
the query runs again with the newly updated cache, especially delayed
data for the same duration.
- add rbac control to only allow users with edit permissions to stream
to use this feature
@github-actions github-actions bot added the ☢️ Bug Something isn't working label Nov 29, 2025
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 29, 2025

Greptile Overview

Greptile Summary

Fixes cache inconsistency issue when queries with different histogram intervals run against the same streaming aggregation. The PR changes the cache refresh strategy from "delete-then-write" to "overwrite-then-cleanup-mismatched", ensuring cache files with the target interval are preserved while removing incompatible intervals.

Key changes:

  • Adds RBAC control requiring stream edit (PUT) permission for cache refresh operations
  • Disables cache discovery when overwrite_cache=true to force fresh computation
  • Deletes mismatched-interval cache files post-execution rather than pre-execution
  • Propagates overwrite_cache/clear_cache flags through the distributed execution pipeline
  • Deprecates the old clear_streaming_agg_cache function

Note: The implementation includes a documented race condition where concurrent queries with different intervals during cache refresh may see partial results. This is an accepted trade-off for simplicity.

Confidence Score: 4/5

  • Safe to merge with minor style improvements possible
  • The core logic correctly addresses the cache inconsistency problem by overwriting then cleaning up mismatched intervals. RBAC implementation properly restricts cache refresh to authorized users. The documented race condition is acknowledged and appears acceptable for this use case. Minor inefficiency with duplicate function call can be optimized.
  • Pay attention to src/service/search/streaming/execution.rs for the cache cleanup logic and race condition behavior in production

Important Files Changed

File Analysis

Filename Score Overview
src/service/search/streaming/execution.rs 4/5 Replaces early cache deletion with post-execution cleanup of mismatched-interval cache files; removes deprecated clear_streaming_agg_cache function
src/handler/http/request/search/search_stream.rs 5/5 Adds RBAC check requiring PUT (edit) permissions for cache refresh operations
src/service/search/mod.rs 5/5 Adds use_aggs_cache parameter to disable cache discovery when overwriting cache
src/config/src/datafusion/request.rs 5/5 Adds overwrite_cache field to Request struct to track cache refresh intent

Sequence Diagram

sequenceDiagram
    participant User
    participant HTTP Handler
    participant RBAC
    participant Search Service
    participant Streaming Exec
    participant Cache System
    participant GLOBAL_CACHE

    User->>HTTP Handler: POST /search?clear_cache=true
    HTTP Handler->>RBAC: check_resource_permissions(PUT)
    alt No edit permission
        RBAC-->>User: 403 Unauthorized
    end
    RBAC-->>HTTP Handler: Permission granted
    HTTP Handler->>HTTP Handler: Set req.clear_cache=true, req.use_cache=false
    HTTP Handler->>Search Service: search(req)
    Search Service->>Search Service: Create Request with overwrite_cache=true
    Search Service->>Search Service: search_partition(use_aggs_cache=false)
    Note over Search Service: Skip cache discovery since<br/>use_aggs_cache=false
    Search Service->>Streaming Exec: do_partitioned_search(clear_cache=true)
    Streaming Exec->>Streaming Exec: Execute query WITHOUT using cache
    Streaming Exec->>Cache System: Write new cache files (60min interval)
    Note over Streaming Exec,GLOBAL_CACHE: After query completes
    Streaming Exec->>GLOBAL_CACHE: Get cache_file_path & target_interval
    Streaming Exec->>Cache System: delete_mismatched_interval_cache_files()
    Note over Cache System: Delete 30min & 4hr interval files<br/>Keep only 60min files
    Cache System-->>Streaming Exec: deleted_count
    Streaming Exec->>GLOBAL_CACHE: remove_cache(streaming_id)
    Streaming Exec-->>User: Query results with fresh cache
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@Loaki07 Loaki07 merged commit 70454dc into branch-v0.20.0 Nov 29, 2025
59 of 68 checks passed
@Loaki07 Loaki07 deleted the fix/streaming_aggs_refresh_cache_v0.20.0 branch November 29, 2025 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants