Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyang hengfeiyang commented Oct 22, 2025

  • perf: no need watch data retention events
  • perf: cache latest version if get from db
  • fix: cache missing of schema

@github-actions github-actions bot added ☢️ Bug Something isn't working Review effort 2/5 labels Oct 22, 2025
@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logic Regression Risk

Switching to conditional need_watch based on date_range.is_none() changes semantics for partial deletions. Verify that any components relying on watch notifications (e.g., workers, UI, metrics) do not require events for partial deletions; otherwise, those flows may stop reacting to partial delete tasks.

// only watch if deleting all data
let need_watch = if date_range.is_none() {
    db::NEED_WATCH
} else {
    false
};
db::put(&db_key, "OK".into(), need_watch, None).await?;
Ok((key, true)) // return the key and true
Consistency Across Callers

The same watch condition is applied in both create and process paths; ensure all other code paths that write to /compact/delete/{key} follow the same rule to avoid mixed behavior where some writes watch and others do not for the same logical operation.

    // only watch if deleting all data
    let need_watch = if date_range.is_none() {
        db::NEED_WATCH
    } else {
        false
    };
    Ok(db::put(&db_key, node.to_string().into(), need_watch, None).await?)
}

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid downgrading existing watches

Preserve the existing watch flag when it is already enabled to avoid downgrading a
previously watched key on partial deletions. Only disable watching if it is
currently off and a date range is provided.

src/service/db/compact/retention.rs [64-70]

-// only watch if deleting all data
-let need_watch = if date_range.is_none() {
-    db::NEED_WATCH
-} else {
-    false
-};
+// only watch if deleting all data; do not downgrade an existing watch
+let current_watch = db::get_watch_flag(&db_key).await.unwrap_or(false);
+let need_watch = if current_watch { true } else { date_range.is_none() && db::NEED_WATCH };
 db::put(&db_key, "OK".into(), need_watch, None).await?;
Suggestion importance[1-10]: 3

__

Why: The lines match the new hunk and the idea could prevent unintentionally disabling an existing watch, but it introduces an undeclared API (db::get_watch_flag) and changes semantics without PR context, making it speculative and potentially incorrect.

Low
Preserve watch flag consistently

Align watch behavior across both code paths by keeping existing watch flags intact
to prevent race conditions where a partial deletion clears a needed watch. Compute
need_watch using the current stored flag plus the "full delete" condition.

src/service/db/compact/retention.rs [84-90]

-// only watch if deleting all data
-let need_watch = if date_range.is_none() {
-    db::NEED_WATCH
-} else {
-    false
-};
+// only watch if deleting all data; do not downgrade an existing watch
+let current_watch = db::get_watch_flag(&db_key).await.unwrap_or(false);
+let need_watch = if current_watch { true } else { date_range.is_none() && db::NEED_WATCH };
 Ok(db::put(&db_key, node.to_string().into(), need_watch, None).await?)
Suggestion importance[1-10]: 3

__

Why: The existing code lines are correctly identified, and keeping a prior watch might help avoid races, but it relies on a non-existent function and alters intended behavior; thus it is speculative with limited assured impact.

Low

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

Optimizes the data retention deletion system by disabling watch events for date-range-specific deletions while keeping them enabled for full stream deletions.

Key Changes:

  • Modified delete_stream and process_stream to conditionally set need_watch based on whether date_range is None (full deletion) or Some (partial deletion)
  • When deleting all stream data (date_range.is_none()), watch events are still triggered (need_watch = true) to ensure cluster-wide coordination
  • When deleting specific date ranges, watch events are now disabled (need_watch = false) to reduce unnecessary event overhead for routine retention operations

Performance Impact:

  • Reduces watch event traffic for date-range deletions, which are more frequent routine operations
  • Maintains critical watch events for full stream deletions, which require cluster coordination

Identified Issue:

  • delete_stream_done function (line 124) still unconditionally uses db::NEED_WATCH for all deletions, creating an inconsistency with the new pattern established in this PR

Confidence Score: 3/5

  • This PR is safe to merge with moderate risk due to an inconsistency that should be addressed
  • The optimization logic is sound and reduces unnecessary watch events for routine operations. However, there's an inconsistency: delete_stream_done (src/service/db/compact/retention.rs:124) still always uses db::NEED_WATCH regardless of date_range, which contradicts the new pattern. This won't cause failures but creates asymmetry in watch behavior between creation and completion of date-range deletions.
  • src/service/db/compact/retention.rs - The delete_stream_done function should apply the same conditional watch logic for consistency

Important Files Changed

File Analysis

Filename Score Overview
src/service/db/compact/retention.rs 3/5 Optimizes watch events for date-range deletions to reduce overhead, but has an inconsistency with delete_stream_done which still always watches

Sequence Diagram

sequenceDiagram
    participant Client
    participant RetentionAPI as retention::delete_stream
    participant DB as db::put
    participant Watcher as retention::watch
    participant Cache as CACHE
    
    Note over Client,Cache: Scenario 1: Delete ALL stream data
    Client->>RetentionAPI: delete_stream(date_range=None)
    RetentionAPI->>RetentionAPI: Check CACHE (rate limit)
    RetentionAPI->>Cache: Insert key with timestamp
    RetentionAPI->>DB: put(key, "OK", need_watch=true)
    DB->>Watcher: Trigger watch event (Put)
    Watcher->>Cache: Update CACHE
    
    Note over Client,Cache: Scenario 2: Delete specific date range
    Client->>RetentionAPI: delete_stream(date_range=Some((start, end)))
    RetentionAPI->>RetentionAPI: Check CACHE (rate limit)
    RetentionAPI->>Cache: Insert key with timestamp
    RetentionAPI->>DB: put(key, "OK", need_watch=false)
    Note over DB,Watcher: No watch event triggered (optimization)
    
    Note over Client,Cache: Issue: delete_stream_done inconsistency
    Client->>RetentionAPI: delete_stream_done(date_range=Some(...))
    RetentionAPI->>DB: delete_if_exists(key, need_watch=true)
    DB->>Watcher: Triggers watch even for date ranges
    Note over Watcher: Inconsistent: should not watch for date ranges
Loading

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@hengfeiyang hengfeiyang changed the title fix: no need watch data retention events perf: no need watch data retention events Oct 22, 2025
@testdino-playwright-reporter
Copy link

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 346 0 19 0 95% 5m 12s

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: hengfeiyang | Branch: fix/compactor-opts | Commit: 0476805

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 339 0 19 7 93% 5m 12s

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: hengfeiyang | Branch: fix/compactor-opts | Commit: 0476805

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 342 0 19 4 94% 5m 13s

View Detailed Results

@hengfeiyang hengfeiyang requested a review from Subhra264 October 23, 2025 04:56
Copy link
Contributor

@YashodhanJoshi1 YashodhanJoshi1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minor nit, but otherwise lgtm

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: hengfeiyang | Branch: fix/compactor-opts | Commit: 14685d6

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 342 0 19 4 94% 4m 39s

View Detailed Results

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: hengfeiyang | Branch: fix/compactor-opts | Commit: 14685d6

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 365 343 0 19 3 94% 4m 39s

View Detailed Results

@hengfeiyang hengfeiyang merged commit 40f43f2 into main Oct 23, 2025
32 checks passed
@hengfeiyang hengfeiyang deleted the fix/compactor-opts branch October 23, 2025 06:56
uddhavdave pushed a commit that referenced this pull request Oct 27, 2025
- [x] perf: no need watch data retention events
- [x] perf: cache latest version if get from db
- [x] fix: cache missing of schema
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working Review effort 2/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants