Skip to content

Conversation

@Subhra264
Copy link
Contributor

  • When delay is non-zero for derived stream, currently sometimes the start time becomes greater than the end time. This pr fixes that by moving to the next run where the end time eventually becomes greater than the start time and also follows the given delay.
  • Currently when updating a derived stream, it deletes the old scheduled job, and hence the start time for the scheduled job changes. As a result, there can be duplicate results in the derived stream because of updates. This pr fixes this by just updating only the next_run_at field of the scheduled_job.

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Aug 19, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR fixes two critical timing issues in derived stream processing when delays are configured. The first issue occurs when a derived stream has a non-zero delay - the temporal calculations can result in invalid time ranges where start_time > end_time, causing pipeline failures. The second issue involves duplicate results when updating derived stream pipelines because the system would delete and recreate scheduled jobs, resetting their timing state.

The solution involves three key changes:

  1. Added helper methods to PipelineSource enum (src/config/src/meta/pipeline/components.rs): The is_scheduled() and is_realtime() methods provide clean type checking capabilities using idiomatic Rust pattern matching with the matches! macro.

  2. Modified pipeline update logic (src/service/pipeline/mod.rs): The update_pipeline function now conditionally deletes derived stream triggers only when the pipeline source type changes from scheduled to realtime, rather than always deleting on any update. This preserves the scheduled job's next_run_at timing state.

  3. Enhanced scheduler validation (src/service/alerts/scheduler/handlers.rs): Added comprehensive validation in handle_derived_stream_triggers to detect invalid time ranges where start > end time. When detected, the system gracefully skips to the next scheduled run with proper status reporting (TriggerDataStatus::Skipped) and improved logging.

These changes work together to maintain temporal consistency in derived stream processing while preserving scheduled job state during updates, preventing both timing failures and data duplication.

Confidence score: 4/5

  • This PR addresses well-defined timing bugs with a logical approach that preserves scheduler state
  • Score reflects complex scheduler logic changes that require careful testing in production scenarios
  • Pay close attention to the scheduler validation logic in handlers.rs for potential edge cases

3 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@Subhra264 Subhra264 merged commit ee67956 into branch-v0.14.6-rc7 Aug 20, 2025
50 checks passed
@Subhra264 Subhra264 deleted the derived_delay_rc7 branch August 20, 2025 08:20
Subhra264 added a commit that referenced this pull request Aug 20, 2025
- When delay is non-zero for derived stream, currently sometimes the
start time becomes greater than the end time. This pr fixes that by
moving to the next run where the end time eventually becomes greater
than the start time and also follows the given delay.
- Currently when updating a derived stream, it deletes the old scheduled
job, and hence the start time for the scheduled job changes. As a
result, there can be duplicate results in the derived stream because of
updates. This pr fixes this by just updating only the next_run_at field
of the scheduled_job.
Subhra264 added a commit that referenced this pull request Aug 20, 2025
- When delay is non-zero for derived stream, currently sometimes the
start time becomes greater than the end time. This pr fixes that by
moving to the next run where the end time eventually becomes greater
than the start time and also follows the given delay.
- Currently when updating a derived stream, it deletes the old scheduled
job, and hence the start time for the scheduled job changes. As a
result, there can be duplicate results in the derived stream because of
updates. This pr fixes this by just updating only the next_run_at field
of the scheduled_job.
Subhra264 added a commit that referenced this pull request Aug 21, 2025
… (#8065)

- When delay is non-zero for derived stream, currently sometimes the
start time becomes greater than the end time. This pr fixes that by
moving to the next run where the end time eventually becomes greater
than the start time and also follows the given delay.
- Currently when updating a derived stream, it deletes the old scheduled
job, and hence the start time for the scheduled job changes. As a
result, there can be duplicate results in the derived stream because of
updates. This pr fixes this by just updating only the next_run_at field
of the scheduled_job.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants