fix(winrm): catch cmd err from winrm #298

anurag5sh · 2025-07-01T15:04:49Z

Description

Certain "Windows Defender updates" (also called definition updates) caused issues with the PowerShell provisioner. While such an update was running, it prevented WinRM from creating the remote shell through which we execute commands. The restriction was intermittent, probably depending on the impact of the particular update being installed. As a result, provisioners failed with command-not-found errors or were skipped entirely. Windows did return an error, but the winrm library isn't yet equipped to handle it.

Although the winrm library itself doesn't return an error from the shell creation request (this still needs to be fixed upstream), it does surface the error at a later stage, which is good enough for now to know that something went wrong.

Specifically, cmd.err (from the winrm library), which held errors related to shell creation or command execution, was being ignored. This error is now taken into account by using the Run() function from winrm's client.go, which performs the same steps the communicator previously did manually but also returns that error. Because the err field is private to the package, we still can't inspect the individual errors that occurred during command execution.

Run() is a blocking call, so the communicator had to be refactored to read stdout and stderr in goroutines. This ensures the stream pipes don't block.
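
A minimal sketch of that pattern using only the standard library. `blockingRunner` is a hypothetical function type shaped like winrm's `client.Run(command, stdout, stderr)`, and `runAndCollect` and `demoRunner` are illustrative names, not Packer's actual communicator code. The point it shows is that the goroutines reading the pipes are started before the blocking call, so the call's writes are always consumed.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"sync"
)

// blockingRunner is a hypothetical stand-in for a blocking call shaped like
// winrm's client.Run(command, stdout, stderr) returning (exitCode, err).
type blockingRunner func(command string, stdout, stderr io.Writer) (int, error)

// runAndCollect starts one reader goroutine per output stream *before*
// calling the blocking runner, so writes to the pipes never stall.
func runAndCollect(run blockingRunner, command string) (int, []string, []string, error) {
	outR, outW := io.Pipe()
	errR, errW := io.Pipe()

	var wg sync.WaitGroup
	var outLines, errLines []string
	drain := func(r io.Reader, dst *[]string) {
		defer wg.Done()
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			*dst = append(*dst, sc.Text())
		}
		// If scanning stopped early (e.g. an over-long line), keep reading
		// so the writer side never blocks.
		_, _ = io.Copy(io.Discard, r)
	}
	wg.Add(2)
	go drain(outR, &outLines)
	go drain(errR, &errLines)

	// Blocks until the remote command finishes or fails to start.
	code, err := run(command, outW, errW)

	// Closing the writers delivers EOF to the draining goroutines.
	outW.Close()
	errW.Close()
	wg.Wait()
	return code, outLines, errLines, err
}

// demoRunner is a fake runner that writes some output and then reports a
// shell-creation style error.
func demoRunner(command string, stdout, stderr io.Writer) (int, error) {
	fmt.Fprintln(stdout, "hello from", command)
	fmt.Fprintln(stderr, "warning: something")
	return 1, errors.New("shell creation failed")
}

func main() {
	code, out, errs, err := runAndCollect(demoRunner, "whoami")
	fmt.Println(code, out, errs, err) // 1 [hello from whoami] [warning: something] shell creation failed
}
```

Starting the readers after the call would deadlock here, because `io.Pipe` writes block until something reads them.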

More details on the Issue

Windows Powershell Issues RCA

Actual Issue

Windows updates its Virus & Security definitions on reboot or when the instance boots for the first time. These updates keep Windows Defender up to date with the latest definitions and occur independently of the regular Windows updates. They show up in Event Viewer like this:

Installation Started: Windows has started installing the following update: Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.431.185.0) - Current Channel (Broad)

Whenever this event (Event ID 43) starts, it appears to restart or block some core services, which prevents WinRM from opening a cmd shell at that moment. This lasts only a fraction of a second, which is why the issue is intermittent. Also, not every update of this kind causes the issue, which makes it even harder to reproduce reliably.

Whichever PowerShell provisioner is running at that moment is affected: the shell is not created for that provisioner to execute its commands. However, an immediate retry of the same request should succeed.

This update is difficult to pause or temporarily halt, so pausing it cannot be suggested to users as a workaround.

WinRM libraries used in Packer

https://github.com/masterzen/winrm - creates a cmd shell on Windows in which commands can be executed remotely
https://github.com/packer-community/winrmcp - uses winrm to upload scripts/files to Windows

Types of Issues occurring due to this

Because the timing of the shell creation failure is highly unpredictable, various types of errors can occur.

Command that runs the script fails
```
powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-686436fb-6d60-e170-36a9-87a8054b81e9.ps1; &'c:/Windows/Temp/script-686436fb-bf3b-018b-0cc3-d83628baa833.ps1'; exit $LastExitCode }"
```
This command runs the script or inline commands that the user provided in the provisioner. When it fails, the winrm libraries return no error, so Packer assumes it ran successfully and proceeds to the next provisioner.

This makes the provisioner look as if it executed, even though it never ran on the Windows machine.
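
To illustrate why this looked like success, here is a hypothetical sketch; `remoteResult`, `okExitCodeOnly`, `checkResult` and `errShellNotCreated` are not names from Packer or winrm. It assumes the failed command still reported exit code 0, as the behaviour described above suggests. A caller that only inspects the exit code proceeds, while one that also checks the returned error stops.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteResult is a hypothetical summary of one remote command: the exit
// code the library reported plus any error it recorded along the way.
type remoteResult struct {
	ExitCode int
	Err      error
}

// okExitCodeOnly mirrors a caller that only inspects the exit code.
func okExitCodeOnly(r remoteResult) bool {
	return r.ExitCode == 0
}

// checkResult also treats a recorded error as a failure, which is what
// getting Command.err back from Run() makes possible.
func checkResult(r remoteResult) error {
	if r.Err != nil {
		return fmt.Errorf("remote command failed: %w", r.Err)
	}
	if r.ExitCode != 0 {
		return fmt.Errorf("remote command exited with code %d", r.ExitCode)
	}
	return nil
}

// errShellNotCreated is a hypothetical error standing in for the shell
// creation fault returned by Windows.
var errShellNotCreated = errors.New("shell was not created")

func main() {
	// Assumption: the shell was never created, yet the exit code reads 0.
	silent := remoteResult{ExitCode: 0, Err: errShellNotCreated}
	fmt.Println(okExitCodeOnly(silent)) // true: the provisioner looks successful
	fmt.Println(checkResult(silent))    // remote command failed: shell was not created
}
```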

Script File upload / move fails

The winrmcp library first uploads the script to a temporary location (%TEMP%) and then moves it to C:\Windows\Temp. These two operations run independently and are not atomic, so if either one fails, it can appear as though the script was not found. This results in errors such as CommandNotFoundException, ObjectNotFound, The term … is not recognized, etc.

Content of the script file damaged or missing

The winrmcp library is used to upload the scripts to the Windows machine. It does this by appending chunks of characters to a temp file on the Windows machine: each chunk is base64 encoded and appended as a new line to the remote temp file, and each chunk makes its own request to create a shell and execute the command that echoes the content.

If any of these requests fails, the integrity of the script is lost and it may not execute properly. This has caused issues like “Environment variables intermittently not set”, because the env file may be missing a few variables.
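
A rough sketch of the chunked upload and of what a single failed append does to the file. It assumes one base64-encoded chunk per line, appended with `echo ... >>`, as described above; `encodeChunks`, `appendCommand`, `decodeLines`, the chunk size and the remote path are hypothetical and not winrmcp's actual implementation.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// encodeChunks splits content into chunks of at most chunkSize bytes and
// base64-encodes each chunk as its own line (hypothetical chunking scheme).
func encodeChunks(content []byte, chunkSize int) []string {
	if chunkSize <= 0 {
		chunkSize = 1
	}
	var lines []string
	for start := 0; start < len(content); start += chunkSize {
		end := start + chunkSize
		if end > len(content) {
			end = len(content)
		}
		lines = append(lines, base64.StdEncoding.EncodeToString(content[start:end]))
	}
	return lines
}

// appendCommand builds a cmd.exe command that appends one encoded line to
// the remote temp file. Each such command needs its own remote shell.
func appendCommand(line, remotePath string) string {
	return fmt.Sprintf(`echo %s >> "%s"`, line, remotePath)
}

// decodeLines rebuilds the file from the lines that actually reached the
// remote side, decoding each line independently.
func decodeLines(lines []string) ([]byte, error) {
	var buf bytes.Buffer
	for _, l := range lines {
		b, err := base64.StdEncoding.DecodeString(l)
		if err != nil {
			return nil, err
		}
		buf.Write(b)
	}
	return buf.Bytes(), nil
}

func main() {
	script := []byte("$env:FOO = 'bar'\n$env:BAZ = 'qux'\n")
	lines := encodeChunks(script, 8)
	// Hypothetical remote path; winrmcp chooses its own temp file name.
	fmt.Println(appendCommand(lines[0], `%TEMP%\winrmcp-upload.tmp`))

	full, _ := decodeLines(lines)
	fmt.Println(bytes.Equal(full, script)) // true

	// Simulate the third append request failing because no shell was created.
	partial := append(append([]string{}, lines[:2]...), lines[3:]...)
	got, _ := decodeLines(partial)
	fmt.Println(bytes.Equal(got, script)) // false: the reassembled script is missing bytes
}
```

Nothing in this flow notices the gap by itself; unless each append's error is checked, the damaged file is only discovered when the script misbehaves.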

Temporary files left behind

winrmcp fails to remove the temp file it created during the script upload, which leaves a script behind in %TEMP%.

Sample request/response to Create Shell

Request to create a cmd shell

```xml
<env:Envelope
	xmlns:env="http://www.w3.org/2003/05/soap-envelope"
	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
	xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell"
	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
	<env:Header>
		<a:To>https://34.134.189.249:5986/wsman</a:To>
		<a:ReplyTo>
			<a:Address mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:Address>
		</a:ReplyTo>
		<w:MaxEnvelopeSize mustUnderstand="true">153600</w:MaxEnvelopeSize>
		<w:OperationTimeout>PT15M</w:OperationTimeout>
		<a:MessageID>uuid:054cee86-9d72-4c57-b9e0-6aa4d9931253</a:MessageID>
		<w:Locale mustUnderstand="false" xml:lang="en-US"/>
		<p:DataLocale mustUnderstand="false" xml:lang="en-US"/>
		<a:Action mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/09/transfer/Create</a:Action>
		<w:ResourceURI mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
		<w:OptionSet>
			<w:Option Name="WINRS_NOPROFILE">FALSE</w:Option>
			<w:Option Name="WINRS_CODEPAGE">65001</w:Option>
		</w:OptionSet>
	</env:Header>
	<env:Body>
		<rsp:Shell>
			<rsp:InputStreams>stdin</rsp:InputStreams>
			<rsp:OutputStreams>stdout stderr</rsp:OutputStreams>
		</rsp:Shell>
	</env:Body>
</env:Envelope>
```

Response from create shell request

```xml
<s:Envelope xml:lang="en-US"
	xmlns:s="http://www.w3.org/2003/05/soap-envelope"
	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
	xmlns:x="http://schemas.xmlsoap.org/ws/2004/09/transfer"
	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
	xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell"
	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
	<s:Header>
		<a:Action>http://schemas.xmlsoap.org/ws/2004/09/transfer/CreateResponse</a:Action>
		<a:MessageID>uuid:B880B23F-41FF-41A0-B8FC-C7566630E188</a:MessageID>
		<a:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:To>
		<a:RelatesTo>uuid:054cee86-9d72-4c57-b9e0-6aa4d9931253</a:RelatesTo>
	</s:Header>
	<s:Body>
		<x:ResourceCreated>
			<a:Address>https://34.134.189.249:5986/wsman</a:Address>
			<a:ReferenceParameters>
				<w:ResourceURI>http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
				<w:SelectorSet>
					<w:Selector Name="ShellId">D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</w:Selector>
				</w:SelectorSet>
			</a:ReferenceParameters>
		</x:ResourceCreated>
		<rsp:Shell
			xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
			<rsp:ShellId>D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</rsp:ShellId>
			<rsp:ResourceUri>http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</rsp:ResourceUri>
			<rsp:Owner>packer</rsp:Owner>
			<rsp:ClientIP>34.133.63.210</rsp:ClientIP>
			<rsp:IdleTimeOut>PT7200.000S</rsp:IdleTimeOut>
			<rsp:InputStreams>stdin</rsp:InputStreams>
			<rsp:OutputStreams>stdout stderr</rsp:OutputStreams>
			<rsp:ShellRunTime>P0DT0H0M0S</rsp:ShellRunTime>
			<rsp:ShellInactivity>P0DT0H0M0S</rsp:ShellInactivity>
		</rsp:Shell>
	</s:Body>
</s:Envelope>
```
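
A sketch of extracting the shell ID from a CreateResponse like the one above with Go's `encoding/xml`, treating an empty `ShellId` as an error rather than a success. `parseShellID` and `sampleCreateResponse` are hypothetical, not part of the winrm library, and the sample is trimmed to the elements the parser reads. Fields are matched by local element name, so the namespace prefixes don't matter here.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// envelope maps only the path Envelope/Body/Shell/ShellId.
type envelope struct {
	Body struct {
		Shell struct {
			ShellID string `xml:"ShellId"`
		} `xml:"Shell"`
	} `xml:"Body"`
}

// parseShellID returns the shell ID, or an error if the response could not
// be decoded or carried no ShellId (e.g. a fault response).
func parseShellID(resp []byte) (string, error) {
	var env envelope
	if err := xml.Unmarshal(resp, &env); err != nil {
		return "", fmt.Errorf("decoding CreateResponse: %w", err)
	}
	id := strings.TrimSpace(env.Body.Shell.ShellID)
	if id == "" {
		return "", errors.New("create shell response contained no ShellId")
	}
	return id, nil
}

// sampleCreateResponse is a trimmed copy of the successful response above.
const sampleCreateResponse = `<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
	xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
	<s:Body>
		<rsp:Shell>
			<rsp:ShellId>D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</rsp:ShellId>
		</rsp:Shell>
	</s:Body>
</s:Envelope>`

func main() {
	id, err := parseShellID([]byte(sampleCreateResponse))
	fmt.Println(id, err) // D86A4CED-C2DA-40FF-B6A4-18E1797A48B8 <nil>
}
```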

Error Response

The response to a create shell request that fails to return a shell ID:

```xml
<s:Envelope xml:lang="en-US"
	xmlns:s="http://www.w3.org/2003/05/soap-envelope"
	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
	xmlns:x="http://schemas.xmlsoap.org/ws/2004/09/transfer"
	xmlns:e="http://schemas.xmlsoap.org/ws/2004/08/eventing"
	xmlns:n="http://schemas.xmlsoap.org/ws/2004/09/enumeration"
	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
	<s:Header>
		<a:Action>http://schemas.dmtf.org/wbem/wsman/1/wsman/fault</a:Action>
		<a:MessageID>uuid:0A888267-33ED-4F08-98E6-DDEBDE2067DE</a:MessageID>
		<a:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:To>
		<a:RelatesTo>uuid:1893210b-91a7-45dc-aa90-b0c42e0dd740</a:RelatesTo>
	</s:Header>
	<s:Body>
		<s:Fault>
			<s:Code>
				<s:Value>s:Receiver</s:Value>
				<s:Subcode>
					<s:Value>w:InternalError</s:Value>
				</s:Subcode>
			</s:Code>
			<s:Reason>
				<s:Text xml:lang="en-US">Illegal operation attempted on a registry key that has been marked for deletion. </s:Text>
			</s:Reason>
			<s:Detail>
				<f:WSManFault
					xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2147943418" Machine="34.134.189.249">
					<f:Message>
						<f:ProviderFault provider="Shell cmd plugin" path="%systemroot%\system32\winrscmd.dll">Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault>
					</f:Message>
				</f:WSManFault>
			</s:Detail>
		</s:Fault>
	</s:Body>
</s:Envelope>
```
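
A sketch of how a client could detect this fault instead of carrying on with an empty shell ID. `faultError` and `sampleFault` are hypothetical (not the winrm library's API); the helper reads the SOAP `Fault` code, subcode and reason text plus the `WSManFault` `Code` attribute from a trimmed copy of the response above, and returns nil when the body holds no fault.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// soapFault maps the parts of a WS-Man fault response that are read here.
type soapFault struct {
	Body struct {
		Fault *struct {
			Code struct {
				Value   string `xml:"Value"`
				Subcode struct {
					Value string `xml:"Value"`
				} `xml:"Subcode"`
			} `xml:"Code"`
			Reason struct {
				Text string `xml:"Text"`
			} `xml:"Reason"`
			Detail struct {
				WSManFault struct {
					Code string `xml:"Code,attr"`
				} `xml:"WSManFault"`
			} `xml:"Detail"`
		} `xml:"Fault"`
	} `xml:"Body"`
}

// faultError returns a descriptive error if the response carries a SOAP
// Fault, nil if it does not, or a decoding error for malformed XML.
func faultError(resp []byte) error {
	var env soapFault
	if err := xml.Unmarshal(resp, &env); err != nil {
		return fmt.Errorf("decoding WS-Man response: %w", err)
	}
	f := env.Body.Fault
	if f == nil {
		return nil
	}
	return fmt.Errorf("WS-Man fault %s/%s (code %s): %s",
		f.Code.Value, f.Code.Subcode.Value, f.Detail.WSManFault.Code,
		strings.TrimSpace(f.Reason.Text))
}

// sampleFault is a trimmed copy of the error response above.
const sampleFault = `<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
	<s:Body>
		<s:Fault>
			<s:Code>
				<s:Value>s:Receiver</s:Value>
				<s:Subcode><s:Value>w:InternalError</s:Value></s:Subcode>
			</s:Code>
			<s:Reason>
				<s:Text xml:lang="en-US">Illegal operation attempted on a registry key that has been marked for deletion. </s:Text>
			</s:Reason>
			<s:Detail>
				<f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2147943418"/>
			</s:Detail>
		</s:Fault>
	</s:Body>
</s:Envelope>`

func main() {
	fmt.Println(faultError([]byte(sampleFault)))
	// WS-Man fault s:Receiver/w:InternalError (code 2147943418): Illegal operation attempted on a registry key that has been marked for deletion.
}
```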

Why the libraries aren’t returning errors

Winrm
1. The winrm library doesn’t parse the error components of the CreateShell response, so it never returns an error and the request is treated as successful. The Shell ID set in this case is “”.
   I have created an issue in that repo: client.CreateShell() does not detect any errors returned by windows masterzen/winrm#176
2. The Command struct contains an err field that stores the error when the shell ID is empty. Because this field is private, callers cannot access the error.
   There’s a discussion about making this error field accessible: do we need a cmd.Error() function? masterzen/winrm#41
3. The CreateShell() and Execute() methods don’t return Command.err (which holds any further error that occurred during command execution), which makes this issue even harder to detect.
Winrmcp
1. winrmcp also uses the CreateShell() and Execute() methods, so it has the same limitations and these issues can occur there too. As an alternative for now, Run() can be used, since it returns the errors.

What issues are fixed

Since the major fix belongs in the library, we aren’t able to solve all of the listed issues yet.

The issue where the provisioner appears to be skipped, because the command that runs the script file fails, is fixed for now. The change uses the Run() method (client.go) in winrm, which returns Command.err. This makes it possible to retry based on this error.
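
Because an immediate retry is expected to succeed, the error returned by Run() can drive a simple retry loop. A minimal sketch, assuming a `run` closure that wraps the blocking call; `retryRun`, `errShellCreation`, the attempt count and the delay are hypothetical and not Packer's actual retry logic.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errShellCreation is a hypothetical error standing in for the one Run()
// returns when the remote shell could not be created.
var errShellCreation = errors.New("shell creation failed")

// retryRun calls run up to attempts times, waiting delay between tries,
// and retries only when run returns an error. A non-zero exit code with a
// nil error is the command's own result and is returned as-is.
func retryRun(attempts int, delay time.Duration, run func() (int, error)) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var code int
	var err error
	for i := 1; i <= attempts; i++ {
		code, err = run()
		if err == nil {
			return code, nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return code, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Fails once (as during a Defender update), then succeeds.
	flaky := func() (int, error) {
		calls++
		if calls == 1 {
			return 0, errShellCreation
		}
		return 0, nil
	}
	code, err := retryRun(3, 10*time.Millisecond, flaky)
	fmt.Println(code, err, calls) // 0 <nil> 2
}
```

Retrying is only safe when the failure means the command never started, as with shell creation; this sketch does not distinguish that case from other errors Run() may return.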

cmd.err from the winrm library, which held errors related to shell creation or command execution, was being ignored. Refactored the communicator to account for the winrm changes.

Copilot

Pull Request Overview

This PR addresses handling errors from winrm during remote command execution by refactoring the communicator implementation.

Refactor winrm communicator to use c.client.Run and correctly set exit codes and log command outcomes.
Adjust packer communicator to set up asynchronous output collection through goroutines before starting the command execution.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
sdk-internals/communicator/winrm/communicator.go	Refactored Run() function to replace shell creation and command execution with a blocking run that returns exit code and error.
packer/communicator.go	Reordered output collection by launching a goroutine for UI updates before starting the command, and enhanced draining of output channels.

Comments suppressed due to low confidence (2)

sdk-internals/communicator/winrm/communicator.go:85

Ensure that c.client.Run properly handles resource cleanup (for example, closing the shell) to prevent any leaks. If the cleanup is not handled internally by the library, consider invoking shell.Close() explicitly.

```go
exitCode, err := c.client.Run(rc.Command, rc.Stdout, rc.Stderr)
```

packer/communicator.go:174

The reordering where the output processing goroutine starts before executing the command can be critical to avoid losing any output, so please verify that this ordering is aligned with the intended execution flows across the codebase.

```go
if err := c.Start(ctx, r); err != nil {
```

fix(winrm): catch cmd err from winrm

f6edf98


anurag5sh requested a review from a team as a code owner July 1, 2025 15:04

anurag5sh requested a review from Copilot July 1, 2025 15:08

Copilot AI reviewed Jul 1, 2025


anshulsharma-hashicorp approved these changes Jul 4, 2025


anurag5sh merged commit 523f1ce into main Jul 4, 2025
9 checks passed

anurag5sh deleted the winrm_error_handling branch July 4, 2025 09:00

Neved4 mentioned this pull request Sep 18, 2025

packer-tmp 1.14.2 Neved4/homebrew-tap#285

Closed

anurag5sh mentioned this pull request Oct 9, 2025

bump github.com/masterzen/winrm #305

Merged

ShawnHardwick mentioned this pull request Nov 25, 2025

amazon-ebs: error uploading powershell script after sysprep hashicorp/packer-plugin-amazon#637

Open

awesomenix mentioned this pull request Dec 20, 2025

Error while capturing windows vhd builds (any) hashicorp/packer-plugin-azure#560

Open
