Skip to content

testing framework: introduce interrupts for stopping tests#33477

Merged
liamcervante merged 11 commits intomainfrom
liamcervante/tests/cancel
Jul 10, 2023
Merged

testing framework: introduce interrupts for stopping tests#33477
liamcervante merged 11 commits intomainfrom
liamcervante/tests/cancel

Conversation

@liamcervante
Copy link
Copy Markdown
Contributor

@liamcervante liamcervante commented Jul 5, 2023

This PR adds support for interrupts into the testing process.

We copy the implementation for the plan and apply operations in that the first interrupt is a request that still results in a tidy state and nice printed output, and then a second interrupt results in an immediate exit. In the case of testing, this means that resources created during a test will not be tidied up if the second interrupt is received. Note that the warning and list of resources left alive will still be printed.

This PR also performs a refactoring by introducing the concept of a TestRunner that helps with packaging the various plan and apply operations into a single function such that we only need to implement the interrupt actions in a single place.

An alternative would be to reuse the existing operation functionality for the testing framework as that already supports interrupts. However, it also supports/forces a range of other things like state locking, plan rendering, and IO management. All of these we do not need as part of the testing framework, and this functionality is heavily baked into the operation functions. It is my opinion that implementing interrupts directly for the testing framework is less work than attempting to reuse the operation functions and have them selectively only handle interrupts and none of the other features.


This PR is part of a chain of PRs adding support for the beta functionality of the testing framework.

Description Branch PR
Preperation / common refactoring liamcervante/tests/prep #33445
Implement module block liamcervante/tests/module #33456
Implement providers block liamcervante/tests/providers #33466
Implement interrupt functionality liamcervante/tests/cancel #33477

The above PRs depend on one another, and should be reviewed and merged in order. If you want to see all the changes at once you can navigate to the last PR and update the branch it is to be merged into to main and this will show the summary of changes.

Comment thread internal/command/test.go
runner.Cancelled = true
cancel()

// TODO(liamcervante): Should we add a timer here? That would mean
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's more info we could get to the user, but a forceful cancellation can (and usually will) always leave untracked resources around because the provider has not been able to return from some number of ApplyResourceChange calls. This is a more frequent hazard with user-executed testing, where they too often decide they don't want to wait for whatever reason and panic hit ctrl+c to try and abort.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think the implementation right now won't cause this because the forceful cancellation actually only tells it to skip the tidy up, but it still always waits for whatever operation is ongoing to finish (ie. it doesn't pass the forceful cancellation into the terraform context, only the first nice cancel request). I did this because, as you say, we'll almost definitely be left with resources live that Terraform doesn't know about.

If I was to come back and implement the timer here I think we'd start to see the problems, which is why I've just left it as a TODO.

I'm not sure what the best way to deal with this would be. Potentially we could add a validate/refresh/import option to the test command, and it runs through all the tests and attempts to see if any infrastructure that matches the config exists? That might be very complicated. We already print out all the resources we know exist and haven't been tidied up, I don't know how much more Terraform can actually say.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, without aborting immediately, or at least within a short delay, this isn't going to do much of anything. Most providers don't fully implement cancellation, so this will prevent Terraform from exiting at all without an external signal until the provider times out internally. This has already passed the "there may be data loss" warning, so users should be willing to accept the risk, and now we are probably more likely to see users killing the process entirely to get unstuck.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll merge this as is now and will revisit this TODO in a separate PR. I can make the language a bit stronger in regards to warnings about what will happen during a hard interrupt. Could also print out the plan that was being executed when it was cancelled, so users can also see what resources might have been left behind.

Comment thread internal/command/test.go
// The command argument decides whether it executes only a plan or also applies
// the plan it creates during the planning.
func (runner *TestRunner) execute(run *moduletest.Run, file *moduletest.File, config *configs.Config, state *states.State, opts *terraform.PlanOpts, command configs.TestCommand) (*terraform.Context, *plans.Plan, *states.State, tfdiags.Diagnostics) {
if opts.Mode == plans.DestroyMode && state.Empty() {
Copy link
Copy Markdown
Member

@jbardin jbardin Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it matters here, but Empty also looks for root module outputs, because in a normal context those need to be removed explicitly by a plan, but for tests maybe it's useful to skip those?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That's good context for me in the future. I think in this case it's okay, as it'll just go and do a very quick destroy operation and seeing that fail/succeed might be something the user wants to include in the tests.

Comment thread internal/command/test.go Outdated
return backend.ParseVariableValues(unparsed, config.Module.Variables)
}

func captureTestPanic(identifier string, diags *tfdiags.Diagnostics) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are trying to handle panics in the tests, we may want to look into overriding the behavior of the global panic handler. Most codepaths in core are covered by logging.PanicHandler, which does not recover and exits with 11 after writing the custom error message. (this also might be a rare enough occurrence that it's not really worth the effort)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah, I see that logging.PanicHandler is also referenced directly in the graph walk cycle. Let me take this functionality out of this PR, and I can make a dedicated PR separately that can solve this. It's not as simple as I thought it would be.

Base automatically changed from liamcervante/tests/provider to main July 10, 2023 13:33
@liamcervante liamcervante merged commit 4862812 into main Jul 10, 2023
@liamcervante liamcervante deleted the liamcervante/tests/cancel branch July 10, 2023 13:53
@github-actions
Copy link
Copy Markdown
Contributor

Reminder for the merging maintainer: if this is a user-visible change, please update the changelog on the appropriate release branch.

@github-actions
Copy link
Copy Markdown
Contributor

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants