Skip to content

Resume the pipeline when a new zone is loaded#153

Merged
ximon18 merged 8 commits intomainfrom
resume-pipeline-on-load
Oct 10, 2025
Merged

Resume the pipeline when a new zone is loaded#153
ximon18 merged 8 commits intomainfrom
resume-pipeline-on-load

Conversation

@bal-e
Copy link
Contributor

@bal-e bal-e commented Oct 7, 2025

If a zone's pipeline is soft-halted, and a new version of the zone is loaded, the pipeline doesn't resume. Part of the reason for this was to stop the zone signer from signing an unsigned zone that hasn't been approved. A secondary issue is that a 'SignZone' command (e.g. issued autonomously by the key manager), while the pipeline is halted, causes the zone signer to hard-halt. This accidentally turns soft-halts into hard-halts.

Now, a new zonetree for ready-to-sign zones has been added. Unsigned zones are added to this tree when they are approved. This separates the pipeline mode from the unsigned zone approval state; the zone signer can be run at any time, and it will only touch approved zones. The zone signer has also been adjusted to leave the pipeline mode alone if it is already halted.

If a zone's pipeline is soft-halted, and a new version of the zone is
loaded, the pipeline doesn't resume.  Part of the reason for this was
to stop the zone signer from signing an unsigned zone that hasn't been
approved.  A secondary issue is that a 'SignZone' command (e.g. issued
autonomously by the key manager), while the pipeline is halted, causes
the zone signer to hard-halt.  This accidentally turns soft-halts into
hard-halts.

Now, a new zonetree for ready-to-sign zones has been added.  Unsigned
zones are added to this tree when they are approved.  This separates
the pipeline mode from the unsigned zone approval state; the zone signer
can be run at any time, and it will only touch approved zones.  The zone
signer has also been adjusted to leave the pipeline mode alone if it is
already halted.
@bal-e bal-e requested a review from ximon18 October 7, 2025 08:49
@bal-e bal-e self-assigned this Oct 7, 2025
@bal-e
Copy link
Contributor Author

bal-e commented Oct 7, 2025

Contributes to solving #154.

if let Ok(mut zone_state) = zone.state.lock() {
match zone_state.pipeline_mode.clone() {
PipelineMode::Running => {}
PipelineMode::SoftHalt(message) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you chose to include the soft-halt message in your log message, but didn't also add it to the hard-halt log message?

Copy link
Member

@ximon18 ximon18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @bal-e!

I'll build and test it, if that looks good I'll approve the PR.

@ximon18
Copy link
Member

ximon18 commented Oct 7, 2025

So, I tried testing this but the zone gets stuck showing that signing has been requested but proceeding no further. The logs show "Starting signing operation" but nothing more. Reverting to 0.1.0-alpha fixes the problem.

@ximon18
Copy link
Member

ximon18 commented Oct 7, 2025

So, I tried testing this but the zone gets stuck showing that signing has been requested but proceeding no further. The logs show "Starting signing operation" but nothing more. Reverting to 0.1.0-alpha fixes the problem.

Ah, in trace level logging I see:

Ignoring request to sign unavailable zone 'example.com'

But the system is not left in the correct state from the point of view of the cascade zone status command.

I'll see why it thinks the zone is unavailable.

@ximon18
Copy link
Member

ximon18 commented Oct 7, 2025

Ahha, maybe it only adds the zone to the new signable zones collection if it is approved, but I have no approval required at present.

@ximon18
Copy link
Member

ximon18 commented Oct 7, 2025

@bal-e: I pushed a change that also promotes the zone if review is not required. It now works better for me but I'm still hitting a problem when the unsigned review passed but the signed review did not (in my case I forgot to chmod +x the signed review script). Then it gets stuck again. Investigating.

Also, maybe nothing to do with this PR, if the unsigned review script lacks execute permission and cannot be run the zone is held. Attempting to approve it using cascade zone approve fails as it says the zone is not being reviewed right now.

In the case of a zone that failed review, on zone reload we should:

a. Re-review it, because the content may have changed.
b. Allow it to progress to the next stage in the pipeline (if review
passes).

Prior to this change it was neither re-reviewed nor progressed, new zone
content triggering a re-review was just ignored.
@ximon18
Copy link
Member

ximon18 commented Oct 7, 2025

Right, I think that solves it.

The change makes me a little uncomfortable because I'm changing existing logic, but as far as I can tell it was just wrong, maybe a left over of the earlier HTTP API based approval (because then you could get multiple approvals for the same zone so ignoring a duplicate might make sense, but now with review script exit code based approval it can only be approved once).

FYI manual approval won't work, e.g. cascade zone approve because the zone is not pending review, it was reviewed and rejected.

@ximon18 ximon18 requested a review from a team October 7, 2025 13:17
Copy link
Member

@mozzieongit mozzieongit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have some issues when doing (with local zone file):

  • add policy with rejecting review script (and review required)
  • add zone
  • change policy to use different script (that still rejects)
  • run policy reload
  • run zone status ... (now shows the new review script, and the existing error messsage (so is out of sync, but that is known, I think)
  • run zone reload ...
  • change script to accept zone
  • run zone reload ...
  • zone status says loaded new version bla 1s ago, but is still waiting for approval and error that the review rejected the zone.

(I haven't looked further through the code yet, but want to get this out already)

@mozzieongit
Copy link
Member

I also have some issues when doing (with local zone file):

* add policy with rejecting review script (and review required)

* add zone

* change policy to use different script (that still rejects)

* run `policy reload`

* run `zone status ...` (now shows the new review script, and the existing error messsage (so is out of sync, but that is known, I think)

* run `zone reload ...`

* change script to accept zone

* run `zone reload ...`

* `zone status` says loaded new version bla 1s ago, but is still waiting for approval and error that the review rejected the zone.

(I haven't looked further through the code yet, but want to get this out already)

I might be an idiot ... command not found ...

@ximon18 ximon18 modified the milestone: 0.1.0-alpha2 Oct 7, 2025
@ximon18 ximon18 self-assigned this Oct 7, 2025
@ximon18 ximon18 added this to the 0.1.0-alpha2 milestone Oct 7, 2025
@mozzieongit
Copy link
Member

Seems to work

Copy link
Member

@mozzieongit mozzieongit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After @ximon18 's comment about the Ok(()), this would be approved

@mozzieongit
Copy link
Member

Just a note: #197 still applies in this branch, but I assume that is fix unrelated to this PR, in that the fix touches on how policies are reloaded.

@ximon18
Copy link
Member

ximon18 commented Oct 10, 2025

Indeed I would say they are unrelated.

@ximon18
Copy link
Member

ximon18 commented Oct 10, 2025

@mozzieongit: See c71ee90.

@mozzieongit
Copy link
Member

@mozzieongit: See c71ee90.

Looks good.

Are you going to resolve the Ok(()) comment in this PR?

@ximon18
Copy link
Member

ximon18 commented Oct 10, 2025

Are you going to resolve the Ok(()) comment in this PR?

Done.

@ximon18 ximon18 merged commit 926ad0f into main Oct 10, 2025
27 checks passed
@ximon18 ximon18 deleted the resume-pipeline-on-load branch October 10, 2025 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants