-
Notifications
You must be signed in to change notification settings - Fork 715
fix: overwrite aggs cache instead of delete and rewrite cache #9390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Failed to generate code suggestions for PR |
Greptile OverviewGreptile SummaryThis PR adds a Key Changes:
Critical Issue:
Confidence Score: 2/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant Client
participant HTTP/gRPC Handler
participant SearchService
participant CacheDiscovery
participant QueryExecution
Client->>HTTP/gRPC Handler: Request with use_cache=false
Note over HTTP/gRPC Handler: SearchPartitionRequest missing use_cache field
HTTP/gRPC Handler->>SearchService: search_partition(..., use_aggs_cache=true)
Note over HTTP/gRPC Handler: ⚠️ Hardcoded true ignores user preference
alt use_aggs_cache=false
SearchService->>SearchService: Create empty CacheDiscoveryResult
Note over SearchService: Skip cache lookup entirely
else use_aggs_cache=true
SearchService->>CacheDiscovery: discover_cache_for_query()
CacheDiscovery-->>SearchService: Return cached ranges
end
SearchService->>QueryExecution: Execute query with/without cache
QueryExecution-->>SearchService: Results
SearchService-->>HTTP/gRPC Handler: Response
HTTP/gRPC Handler-->>Client: Search results
Note over Client,QueryExecution: ✅ streaming/execution.rs correctly uses req.use_cache
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, 2 comments
req.use_cache=falseHow I run my query 1. nov 26 10:30 - 11:30 - interval 30 mins 2. nov 26 10:30 - 16:30 - interval 60 mins 3. nov 26 10:30 - 16:30 - interval 60 mins -- refresh cache Cache dir 1. on first run - has 2 caches 30 mins 2. on 2nd run - will havve mixed 30mins cache and 60 mins cache TODO: - [x] Problem 1: on step 3 - the interval for that duration is 60 mins - but the discovery of cache will also include the 30 mins cache [@mod.rs](file:///Users/lin/o2/openobserve/src/service/search/mod.rs) `discover_cache_for_query` - the overwrite will over write the 60 mins cache only. How to address this problem? - [x] Problem 2: - fix: add rbac control only user with permussion to edit stream can `refresh cache` Proposed Solution - On `refresh_cache` run streaming aggs without using cache - at `streaming/execution.rs` , at the end of steaming aggs delete all the cache files for the given duration when there are miss matched cache interval files - then remove the `streaming_id` from the `GLOBAL_CACHE` - eg: nov 26 10:30 - 16:30 - interval 60 mins -- refresh cache - Cache consists of 30mins or 4hr interval files all of them will be deleted - 60 mins cache interval files will be overwritten or created a new. - This way on the next run there will be no query results mismatch when the query runs again with the newly updated cache, especially delayed data for the same duration. - add rbac control to only allow users with edit permissions to stream to use this feature
How I run my query
Cache dir
TODO:
on step 3
discover_cache_for_queryrefresh cacheProposed Solution
refresh_cacherun streaming aggs without using cachestreaming/execution.rs, at the end of steaming aggs delete all the cache files for the given duration when there are miss matched cache interval filesstreaming_idfrom theGLOBAL_CACHE