fix(winrm): catch cmd err from winrm #298
Merged
Description
Certain "Windows Defender updates" (or Definition updates) caused failures while the PowerShell provisioner was running. While such an update ran, it prevented the creation of the remote shell over WinRM that we use to execute commands. The restriction was intermittent, probably depending on the impact of the particular update being installed. As a result, provisioners failed with command not found errors or were skipped entirely. Windows did return an error, but the winrm library isn't yet equipped to handle it.
Although the winrm library doesn't return an error for the shell creation request itself (we still need to get that fixed), it does surface the error at a later stage, which is good enough for now to know that something went wrong.
Specifically, cmd.err (from the winrm library), which holds errors related to shell creation or command execution, was being ignored. The communicator now takes this error into account by calling the Run() function in winrm's client.go, which does exactly what we previously did manually but also returns that error. Because the err field is private to the package, we still can't inspect the individual errors collected during command execution.
The Run() function is a blocking call, so the communicator had to be refactored to read stdout and stderr in goroutines. This ensures the stream pipes don't block.
More details on the Issue
Windows PowerShell Issues RCA
Actual Issue
Windows updates its Virus & Security definitions on reboot or when the instance boots for the first time. These updates keep Windows Defender up to date with the latest definitions, and they occur independently of the regular Windows updates. They show up in Event Viewer as Event ID 43.
Whenever this Event ID 43 starts, it seems to restart or block some core services, which prevents WinRM from opening a cmd shell at that moment. This lasts only a fraction of a second, which is why the issue is intermittent. Also, not every one of these updates causes the problem, which makes it even harder to reproduce reliably. Whichever PowerShell provisioner is running at that moment is affected: the shell isn't created for that provisioner to execute its commands. However, an immediate retry of the same request should succeed.
This update is difficult to pause or halt temporarily, so pausing it cannot be suggested to users as a workaround.
WinRM libraries used in Packer
Types of Issues occurring due to this
Different errors can occur because the timing of the shell creation failure is highly unpredictable.
Command that runs the script fails
This command runs the script or inline commands that the user has provided in the provisioner. When it fails, the winrm libraries return no error, so Packer thinks it ran successfully and proceeds to the next provisioner.
This makes the provisioner look as if it executed, but it never actually ran on the Windows machine.
Script File upload / move fails
The winrmcp library first uploads the script to a temporary location (%TEMP%) and then moves it to C:\Windows\Temp. These two operations run independently and are not atomic. If either one fails, it can appear as though the script was not found, resulting in errors like CommandNotFoundException, ObjectNotFound, The term … is not recognized, etc.
Content of the script file damaged or missing
The winrmcp library is used to upload the scripts to the Windows machine. It does so by appending chunks of characters to a temp file on the remote machine: each chunk is base64 encoded and appended as a new line in the temp file. Each chunk makes its own request to create a shell and execute the command that echoes the content. If any of these requests fails, the integrity of the script is lost and it may not execute properly. This has caused issues like "Environment variables intermittently not set", because the env file may be missing a few variables.
Temporary files left behind
winrmcp fails to remove the temp file it created during the script file upload. This will leave behind a script in %TEMP%.
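The chunked upload described under "Content of the script file damaged or missing" can be illustrated with a minimal Go sketch. It assumes each chunk is base64 encoded independently and written on its own line, as described in the PR; encodeChunks and decodeLines are hypothetical names and do not reproduce winrmcp's actual API or remote command format. Dropping one line, as happens when that chunk's shell creation request silently fails, shows how the reassembled script ends up corrupted rather than simply missing.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeChunks splits content into chunkSize-byte pieces and base64-encodes
// each piece onto its own line. In the upload flow, each line would be
// appended to the remote temp file by a separate WinRM command, i.e. a
// separate shell creation request.
func encodeChunks(content []byte, chunkSize int) []string {
	if chunkSize <= 0 {
		chunkSize = 1 // guard against an infinite loop
	}
	var lines []string
	for start := 0; start < len(content); start += chunkSize {
		end := start + chunkSize
		if end > len(content) {
			end = len(content)
		}
		lines = append(lines, base64.StdEncoding.EncodeToString(content[start:end]))
	}
	return lines
}

// decodeLines rebuilds the file from whichever lines actually arrived.
func decodeLines(lines []string) (string, error) {
	var out []byte
	for _, line := range lines {
		b, err := base64.StdEncoding.DecodeString(line)
		if err != nil {
			return "", err
		}
		out = append(out, b...)
	}
	return string(out), nil
}

func main() {
	script := "$env:FOO = 'a'\n$env:BAR = 'b'\n"
	lines := encodeChunks([]byte(script), 8)

	full, _ := decodeLines(lines)
	fmt.Printf("all chunks: %q\n", full)

	// Simulate the request for chunk 1 failing silently: its line never
	// reaches the remote file, yet nothing reports an error.
	received := append([]string{}, lines[0])
	received = append(received, lines[2:]...)
	partial, _ := decodeLines(received)
	fmt.Printf("chunk 1 lost: %q\n", partial)
}
```

Each line decodes cleanly on its own, so the loss of a chunk produces a syntactically valid but wrong file, which matches the "variables intermittently not set" symptom.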
Sample request/response to Create Shell
Request to create a cmd shell
Response from create shell request
Error Response
The response from create shell request that fails to provide a shell ID
Why the libraries aren’t returning errors
Winrm
The Shell ID set here is “” (empty). I have created an issue in that repo: “client.CreateShell() does not detect any errors returned by windows” (masterzen/winrm#176).
There’s also a discussion about making this error field accessible: “do we need a cmd.Error() function?” (masterzen/winrm#41).
Command.err (which holds any further errors that occurred during command execution) is private to the package, which makes this issue even harder to detect.
Winrmcp
What issues are fixed
Since the main fix belongs in the libraries, we can’t solve all of the listed issues yet.
For now, this fixes the issue where the provisioner appears to be skipped because the command that runs the script file failed. The communicator now uses the Run() method (in winrm’s client.go), which returns Command.err. This makes it possible to retry based on this error.