testing framework: introduce interrupts for stopping tests#33477
testing framework: introduce interrupts for stopping tests#33477liamcervante merged 11 commits intomainfrom
Conversation
2ee6eef to
7dffeab
Compare
7dffeab to
9988b67
Compare
…aform into liamcervante/tests/provider
…rraform into liamcervante/tests/cancel
| runner.Cancelled = true | ||
| cancel() | ||
|
|
||
| // TODO(liamcervante): Should we add a timer here? That would mean |
There was a problem hiding this comment.
Maybe there's more info we could get to the user, but a forceful cancellation can (and usually will) always leave untracked resources around because the provider has not been able to return from some number of ApplyResourceChange calls. This is a more frequent hazard with user-executed testing, where they too often decide they don't want to wait for whatever reason and panic hit ctrl+c to try and abort.
There was a problem hiding this comment.
So, I think the implementation right now won't cause this because the forceful cancellation actually only tells it to skip the tidy up, but it still always waits for whatever operation is ongoing to finish (ie. it doesn't pass the forceful cancellation into the terraform context, only the first nice cancel request). I did this because, as you say, we'll almost definitely be left with resources live that Terraform doesn't know about.
If I was to come back and implement the timer here I think we'd start to see the problems, which is why I've just left it as a TODO.
I'm not sure what the best way to deal with this would be. Potentially we could add a validate/refresh/import option to the test command, and it runs through all the tests and attempts to see if any infrastructure that matches the config exists? That might be very complicated. We already print out all the resources we know exist and haven't been tidied up, I don't know how much more Terraform can actually say.
There was a problem hiding this comment.
Yeah, without aborting immediately, or at least within a short delay, this isn't going to do much of anything. Most providers don't fully implement cancellation, so this will prevent Terraform from exiting at all without an external signal until the provider times out internally. This has already passed the "there may be data loss" warning, so users should be willing to accept the risk, and now we are probably more likely to see users killing the process entirely to get unstuck.
There was a problem hiding this comment.
Okay, I'll merge this as is now and will revisit this TODO in a separate PR. I can make the language a bit stronger in regards to warnings about what will happen during a hard interrupt. Could also print out the plan that was being executed when it was cancelled, so users can also see what resources might have been left behind.
| // The command argument decides whether it executes only a plan or also applies | ||
| // the plan it creates during the planning. | ||
| func (runner *TestRunner) execute(run *moduletest.Run, file *moduletest.File, config *configs.Config, state *states.State, opts *terraform.PlanOpts, command configs.TestCommand) (*terraform.Context, *plans.Plan, *states.State, tfdiags.Diagnostics) { | ||
| if opts.Mode == plans.DestroyMode && state.Empty() { |
There was a problem hiding this comment.
Not sure if it matters here, but Empty also looks for root module outputs, because in a normal context those need to be removed explicitly by a plan, but for tests maybe it's useful to skip those?
There was a problem hiding this comment.
Thanks! That's good context for me in the future. I think in this case it's okay, as it'll just go and do a very quick destroy operation and seeing that fail/succeed might be something the user wants to include in the tests.
| return backend.ParseVariableValues(unparsed, config.Module.Variables) | ||
| } | ||
|
|
||
| func captureTestPanic(identifier string, diags *tfdiags.Diagnostics) { |
There was a problem hiding this comment.
If we are trying to handle panics in the tests, we may want to look into overriding the behavior of the global panic handler. Most codepaths in core are covered by logging.PanicHandler, which does not recover and exits with 11 after writing the custom error message. (this also might be a rare enough occurrence that it's not really worth the effort)
There was a problem hiding this comment.
ah yeah, I see that logging.PanicHandler is also referenced directly in the graph walk cycle. Let me take this functionality out of this PR, and I can make a dedicated PR separately that can solve this. It's not as simple as I thought it would be.
…te/tests/provider
…rraform into liamcervante/tests/cancel
|
Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch. |
|
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
This PR adds support for interrupts into the testing process.
We copy the implementation for the plan and apply operations in that the first interrupt is a request that still results in a tidy state and nice printed output, and then a second interrupt results in an immediate exit. In the case of testing, this means that resources created during a test will not be tidied up if the second interrupt is received. Note that the warning and list of resources left alive will still be printed.
This PR also performs a refactoring by introducing the concept of a
TestRunnerthat helps with packaging the various plan and apply operations into a single function such that we only need to implement the interrupt actions in a single place.An alternative would be to reuse the existing operation functionality for the testing framework as that already supports interrupts. However, it also supports/forces a range of other things like state locking, plan rendering, and IO management. All of these we do not need as part of the testing framework, and this functionality is heavily baked into the operation functions. It is my opinion that implementing interrupts directly for the testing framework is less work than attempting to reuse the operation functions and have them selectively only handle interrupts and none of the other features.
This PR is part of a chain of PRs adding support for the beta functionality of the testing framework.
moduleblockprovidersblockThe above PRs depend on one another, and should be reviewed and merged in order. If you want to see all the changes at once you can navigate to the last PR and update the branch it is to be merged into to
mainand this will show the summary of changes.