Skip to content

fix: preserve edited schedules across suspension rewrites#2043

Merged
yottahmd merged 5 commits intomainfrom
fix/2042-schedule-edit-sse-refresh
Apr 27, 2026
Merged

fix: preserve edited schedules across suspension rewrites#2043
yottahmd merged 5 commits intomainfrom
fix/2042-schedule-edit-sse-refresh

Conversation

@yottahmd
Copy link
Copy Markdown
Collaborator

@yottahmd yottahmd commented Apr 27, 2026

Summary

  • wake DAG list/detail SSE streams after spec and suspension mutations so edited schedules refresh immediately
  • preserve scheduler watermark state across delete-add DAG rewrites and reset skip-if-successful suppression when cron schedule fingerprints change
  • add regression coverage for suspended schedule edits, watermark migration, and DAG mutation notifications

Testing

  • go test ./internal/service/scheduler ./internal/persis/filewatermark ./internal/service/frontend/api/v1 -count=1
  • go test ./internal/intg -run 'TestIssue2042|TestScheduleEditWhileSuspendedDoesNotSuppressNewSlot' -count=1

Closes #2042

Summary by CodeRabbit

Release Notes

  • New Features

    • DAG mutations now trigger real-time notifications, enabling clients to refresh immediately when specs or suspension state change.
  • Bug Fixes

    • Improved handling of edited schedules when DAGs are suspended and resumed.
    • Enhanced skip_if_successful logic to correctly identify and skip runs from edited or removed schedules.
  • Tests

    • Added integration and unit tests for schedule editing during suspension and skip-if-successful behavior with schedule changes.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e274d2a-4c33-4aa6-900b-3e22db6b340c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR resolves a scheduler issue where edited DAG schedules during suspension fail to dispatch. It introduces schedule fingerprint tracking to detect schedule changes, resets skip-successful logic when schedules change, adds DAG mutation notifications to the frontend API and SSE system, and updates watermark versioning/migration to persist new state fields.

Changes

Cohort / File(s) Summary
Integration Tests
internal/intg/issue_2042_test.go, internal/intg/sched_test.go
New tests validating scheduler dispatch behavior when DAG schedules are edited during suspension, using fixed clocks and mocked dispatch recording.
Watermark Migration
internal/persis/filewatermark/store.go, internal/persis/filewatermark/store_test.go
Extended watermark state versioning from v2 to v3, updated migration stages to preserve intermediate version states, and added test coverage for v2→v3 transitions.
Frontend DAG Mutation Notifications
internal/service/frontend/api/v1/api.go, internal/service/frontend/api/v1/dags.go, internal/service/frontend/api/v1/dags_internal_test.go
Added optional dagMutationNotifier callback hook and WithDAGMutationNotifier option; integrated notification calls in UpdateDAGSpec and UpdateDAGSuspensionState; added unit tests verifying callbacks are invoked.
Server SSE Integration
internal/service/frontend/server.go
Wired DAG mutation notifier to SSE multiplexer, triggering topic wakeups for DAGs list and specific DAG topics on mutations.
Scheduler Watermark State
internal/service/scheduler/watermark.go
Extended DAGWatermark struct with StartScheduleFingerprint and SkipSuccessResetAt fields for persisted JSON serialization.
Scheduler Schedule State & Reconciliation
internal/service/scheduler/schedule_state.go
Incremented SchedulerStateVersion to 3; added startScheduleFingerprint helper to compute normalized schedule fingerprints; introduced reconcileStartScheduleState to detect schedule changes and reset skip-successful flags when fingerprints diverge.
TickPlanner Core Logic
internal/service/scheduler/tick_planner.go, internal/service/scheduler/tick_planner_test.go
Refactored shouldRun to use schedule fingerprint identity checking instead of pure timestamp comparison; added deletion grace-window watermark tracking; centralized zero-watermark detection; integrated reconcileStartScheduleState calls during init and DAG update events; added tests for grace-window pruning and skip-successful behavior across edited schedules.

Sequence Diagram(s)

Not applicable. While the PR introduces significant functional changes (schedule fingerprinting, mutation notifications), the control flows are primarily internal to the scheduler or localized to the frontend API. The actual connections between the frontend notification system and scheduler reconciliation are not shown as a cohesive multi-component sequence in the diff.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #1968: Implements the SSE multiplexer and DAG-run event invalidation APIs that this PR's mutation notifier relies on to wake frontend subscribers.
  • PR #2034: Modifies scheduler loop mechanics (cronLoop → waitForTick/runTick transitions) that interact with the TickPlanner refactoring in this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main fix: preserving edited schedules when DAGs are suspended and then rewritten/reimported, which directly addresses issue #2042.
Linked Issues check ✅ Passed All code changes implement the objectives from issue #2042: SSE refresh notifications on mutations, watermark state preservation across rewrites, schedule fingerprint tracking to reset skip-if-successful suppression, and regression tests validating the fix.
Out of Scope Changes check ✅ Passed All changes directly support issue #2042 requirements: watermark state extensions, schedule fingerprint tracking, DAG mutation notifications, SSE wakeups, and integration/unit test coverage for the suspension-edit scenario.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2042-schedule-edit-sse-refresh

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/intg/sched_test.go`:
- Around line 192-249: The test is racy because the fake clock is set to
10:04:59 so the edited schedule/unsuspend can miss the 10:05:00 boundary; change
the clock base used in sc.SetClock from time.Date(2026,4,27,10,4,59,...) to an
earlier time (e.g., 10:04:00) so the scheduler has a full minute before the
target slot, and add an explicit scheduler-registration wait before removing
suspendFlag: replace or augment the initial require.Eventually(sc.IsRunning())
check with a require.Eventually that asserts sc.IsRunning() && the scheduler has
registered the DAG (inspect th.EntryReader.DAGs() for loaded.Name == dagName) so
unsuspend only happens after registration; keep using existing helpers like
pollSchedulerErr, writeSpec, suspendFlag and dispatchCount to locate the
relevant logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: acdeff8c-2756-4edc-84f3-272f696ba6c8

📥 Commits

Reviewing files that changed from the base of the PR and between a2055d9 and 2b70cb8.

📒 Files selected for processing (12)
  • internal/intg/issue_2042_test.go
  • internal/intg/sched_test.go
  • internal/persis/filewatermark/store.go
  • internal/persis/filewatermark/store_test.go
  • internal/service/frontend/api/v1/api.go
  • internal/service/frontend/api/v1/dags.go
  • internal/service/frontend/api/v1/dags_internal_test.go
  • internal/service/frontend/server.go
  • internal/service/scheduler/schedule_state.go
  • internal/service/scheduler/tick_planner.go
  • internal/service/scheduler/tick_planner_test.go
  • internal/service/scheduler/watermark.go

Comment thread internal/intg/sched_test.go Outdated
@yottahmd yottahmd merged commit 5d46777 into main Apr 27, 2026
10 checks passed
@yottahmd yottahmd deleted the fix/2042-schedule-edit-sse-refresh branch April 27, 2026 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

modify spec's schedule time but not execute

1 participant