Skip to content

Conversation

@Subhra264
Copy link
Contributor

  • When delay is non-zero for derived stream, currently sometimes the start time becomes greater than the end time. This pr fixes that by moving to the next run where the end time eventually becomes greater than the start time and also follows the given delay.
  • Currently when updating a derived stream, it deletes the old scheduled job, and hence the start time for the scheduled job changes. As a result, there can be duplicate results in the derived stream because of updates. This pr fixes this by just updating only the next_run_at field of the scheduled_job.

- When delay is non-zero for derived stream, currently sometimes the
start time becomes greater than the end time. This pr fixes that by
moving to the next run where the end time eventually becomes greater
than the start time and also follows the given delay.
- Currently when updating a derived stream, it deletes the old scheduled
job, and hence the start time for the scheduled job changes. As a
result, there can be duplicate results in the derived stream because of
updates. This pr fixes this by just updating only the next_run_at field
of the scheduled_job.
@github-actions github-actions bot added the ☢️ Bug Something isn't working label Aug 20, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR addresses two critical issues with derived stream scheduling when delay configurations are involved:

Issue 1: Invalid Time Ranges
When a derived stream has a non-zero delay configured, the system sometimes calculates a start time that is greater than the end time, causing processing failures. The fix introduces time alignment logic and skip handling in the scheduler. The supposed_to_be_run_at timestamp is now aligned to ensure consistent time boundaries, and when an invalid time range is detected (start > end), the scheduler logs a warning and moves to the next valid run window instead of attempting to process the invalid range.

Issue 2: Duplicate Results from Updates
Previously, updating a scheduled derived stream would always delete the existing trigger and recreate it, causing the start time context to be lost and potentially leading to duplicate results. The solution implements conditional trigger deletion - triggers are only deleted when the pipeline source type actually changes from scheduled to realtime. For scheduled-to-scheduled updates, the existing trigger is preserved and only its next_run_at field is updated.

Implementation Details
The changes span three files: adding helper methods (is_scheduled() and is_realtime()) to the PipelineSource enum for cleaner type checking, implementing comprehensive skip logic in the scheduler handlers with improved logging and early TriggerData initialization, and modifying the pipeline update logic to conditionally preserve existing triggers. The scheduler now properly handles catch-up scenarios using aligned timestamps and reports skipped runs with appropriate status tracking.

Confidence score: 4/5

  • This PR addresses well-defined scheduling issues with a targeted approach that preserves existing functionality
  • Score reflects complex timing logic in critical scheduler code that handles edge cases around delays and time boundaries
  • Pay close attention to the scheduler handlers file where most of the complex time calculation and skip logic is implemented

3 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@Subhra264 Subhra264 merged commit ff07439 into branch-v0.14.6-rc9 Aug 21, 2025
49 checks passed
@Subhra264 Subhra264 deleted the delay_derived_rc9 branch August 21, 2025 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants