@anurag5sh anurag5sh commented Jul 1, 2025

Description

Certain Windows Defender updates (definition updates) caused issues while the PowerShell provisioner was running. While such an update was installing, it blocked the creation of the remote shell over WinRM through which we execute commands. The restriction was intermittent, probably because it only applies while the update is being installed. As a result, provisioners failed with "command not found" errors or were skipped entirely. Windows returned an error, but the winrm library isn't equipped to handle it yet.

Although the winrm library doesn't return an error for the shell creation request itself (this still needs to be fixed upstream), it does surface the error at a later stage, which is good enough for now to know that something went wrong.

Specifically, cmd.err (from the winrm library), which holds errors pertaining to shell creation or command execution, was being ignored. This error is now picked up through the Run() function in winrm's client.go, which does exactly what the communicator previously did manually. Because the err field is private to the package, we cannot directly examine the errors recorded during command execution.

Run() is a blocking call, so the communicator had to be refactored to read stdout and stderr in a goroutine. This ensures the stream pipes don't block.
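The ordering described above can be sketched as follows. This is a minimal illustration and not the code in this PR: blockingRun is a hypothetical stand-in for a blocking call such as Run(), and runAndCollect and collect are names made up for the example. The point it shows is that the readers must already be running before the blocking call starts, otherwise writes to the pipes would stall.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"sync"
)

// blockingRun is a hypothetical stub for a blocking command runner. It writes
// to the supplied writers and returns only when the "command" has finished.
func blockingRun(stdout, stderr io.Writer) (int, error) {
	fmt.Fprintln(stdout, "line 1")
	fmt.Fprintln(stderr, "warning")
	fmt.Fprintln(stdout, "line 2")
	return 0, nil
}

// collect reads r line by line until EOF and returns the lines.
func collect(r io.Reader) []string {
	var lines []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

// runAndCollect starts one reader goroutine per stream before making the
// blocking call. io.Pipe writes block until a reader consumes them, so
// starting the readers first is what keeps blockingRun from hanging.
func runAndCollect() (outLines, errLines []string, exitCode int, err error) {
	outR, outW := io.Pipe()
	errR, errW := io.Pipe()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); outLines = collect(outR) }()
	go func() { defer wg.Done(); errLines = collect(errR) }()

	exitCode, err = blockingRun(outW, errW)

	// Closing the write ends signals EOF to the readers so they can finish.
	outW.Close()
	errW.Close()
	wg.Wait()
	return
}

func main() {
	out, errs, code, err := runAndCollect()
	fmt.Println(out, errs, code, err)
}
```

Waiting on the WaitGroup after closing the writers ensures all output has been drained before the result is returned to the caller.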

More details on the Issue

Windows Powershell Issues RCA

Actual Issue

Windows updates its virus and security definitions on reboot or when an instance boots for the first time. These updates keep Windows Defender up to date with the latest definitions and occur independently of regular Windows updates. In Event Viewer they show up like this:

Installation Started: Windows has started installing the following update: Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.431.185.0) - Current Channel (Broad)

Whenever this event (Event ID 43) starts, it seems to restart or block some core services, which prevents WinRM from opening a cmd shell at that moment. This lasts only a fraction of a second, which is why the issue is intermittent. Not every such update causes the problem either, which makes it even harder to reproduce reliably.

Whichever PowerShell provisioner is running at that moment is affected: the shell that the provisioner needs to execute its commands is not created. An immediate retry of the same request should succeed, however.

The update is difficult to pause or halt temporarily, so doing that cannot be suggested to users as a workaround.

WinRM libraries used in Packer

  1. https://github.com/masterzen/winrm - creates a cmd shell in windows where we can execute commands remotely
  2. https://github.com/packer-community/winrmcp - uses winrm to upload scripts/files to windows

Types of Issues occurring due to this

Because the timing of the shell creation failure is unpredictable, it can surface as several different kinds of errors.

  1. Command that runs the script fails

    powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-686436fb-6d60-e170-36a9-87a8054b81e9.ps1; &'c:/Windows/Temp/script-686436fb-bf3b-018b-0cc3-d83628baa833.ps1'; exit $LastExitCode }

    This command runs the script or inline commands that the user has provided in the provisioner. When it fails, the winrm libraries return no error, so Packer assumes it ran successfully and proceeds to the next provisioner. The provisioner looks like it executed, but nothing actually ran on the Windows machine.

  2. Script File upload / move fails

    The winrmcp library first uploads the script to a temporary location (%TEMP%) and then moves it to C:\Windows\Temp. These two operations execute independently and are not atomic. If either one fails, it can appear as though the script was not found. This results in errors like: CommandNotFoundException, ObjectNotFound, The term … is not recognized, etc.

  3. Content of the script file damaged or missing

    The winrmcp library uploads scripts to the Windows machine by appending chunks of characters to a temp file on the remote machine. Each chunk is base64 encoded and appended as a new line in the temp file. Each chunk makes its own request to create a shell and run the command that echoes the content.

    If any of these requests fails, the integrity of the script is lost and it may not execute properly. This has caused issues like “Environment variables intermittently not set”, because the env file may be missing a few variables.

  4. Temporary files left behind

    winrmcp fails to remove the temp file it created during the script upload, which leaves a script behind in %TEMP%.
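To make the "damaged or missing content" case concrete, here is a small simulation of a chunked, base64-encoded upload in which one per-chunk request fails. It is only a sketch under assumptions: uploadChunks, decodeLines, and simulate are hypothetical names, the chunk size is arbitrary, and none of this is winrmcp's actual code. It counts failures so the effect is visible, whereas in the scenario described above the failure is not reported to the caller.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// appendFunc models one remote request that appends an encoded chunk as a
// new line in the remote temp file.
type appendFunc func(encodedChunk string) error

// uploadChunks splits content into fixed-size chunks, base64-encodes each
// one, and sends every chunk through its own call to send. It keeps going
// after a failure and returns how many requests failed.
func uploadChunks(content string, chunkSize int, send appendFunc) (failed int) {
	data := []byte(content)
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		if err := send(base64.StdEncoding.EncodeToString(data[start:end])); err != nil {
			failed++
		}
	}
	return failed
}

// decodeLines rebuilds the file from the encoded lines that actually arrived.
func decodeLines(lines []string) (string, error) {
	var sb strings.Builder
	for _, l := range lines {
		raw, err := base64.StdEncoding.DecodeString(l)
		if err != nil {
			return "", err
		}
		sb.Write(raw)
	}
	return sb.String(), nil
}

// simulate uploads content and makes the request with index failAt fail
// (use -1 for no failure). It returns what the remote side ends up with.
func simulate(content string, chunkSize, failAt int) (string, int) {
	var remote []string
	i := 0
	send := func(chunk string) error {
		defer func() { i++ }()
		if i == failAt {
			return errors.New("shell creation failed")
		}
		remote = append(remote, chunk)
		return nil
	}
	failed := uploadChunks(content, chunkSize, send)
	got, _ := decodeLines(remote)
	return got, failed
}

func main() {
	env := "$env:A=\"1\"\n$env:B=\"2\"\n$env:C=\"3\"\n"
	got, failed := simulate(env, 11, 1)
	fmt.Printf("failed requests: %d\nremote file:\n%s", failed, got)
}
```

With an 11-byte chunk size each variable assignment lands in its own chunk, so losing the second request leaves a file that still parses but silently lacks one variable.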

Sample request/response to Create Shell

  • Request to create a cmd shell

    <env:Envelope
    	xmlns:env="http://www.w3.org/2003/05/soap-envelope"
    	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    	xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell"
    	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
    	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
    	<env:Header>
    		<a:To>https://34.134.189.249:5986/wsman</a:To>
    		<a:ReplyTo>
    			<a:Address mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:Address>
    		</a:ReplyTo>
    		<w:MaxEnvelopeSize mustUnderstand="true">153600</w:MaxEnvelopeSize>
    		<w:OperationTimeout>PT15M</w:OperationTimeout>
    		<a:MessageID>uuid:054cee86-9d72-4c57-b9e0-6aa4d9931253</a:MessageID>
    		<w:Locale mustUnderstand="false" xml:lang="en-US"/>
    		<p:DataLocale mustUnderstand="false" xml:lang="en-US"/>
    		<a:Action mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/09/transfer/Create</a:Action>
    		<w:ResourceURI mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
    		<w:OptionSet>
    			<w:Option Name="WINRS_NOPROFILE">FALSE</w:Option>
    			<w:Option Name="WINRS_CODEPAGE">65001</w:Option>
    		</w:OptionSet>
    	</env:Header>
    	<env:Body>
    		<rsp:Shell>
    			<rsp:InputStreams>stdin</rsp:InputStreams>
    			<rsp:OutputStreams>stdout stderr</rsp:OutputStreams>
    		</rsp:Shell>
    	</env:Body>
    </env:Envelope>
  • Response from create shell request

    <s:Envelope xml:lang="en-US"
    	xmlns:s="http://www.w3.org/2003/05/soap-envelope"
    	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    	xmlns:x="http://schemas.xmlsoap.org/ws/2004/09/transfer"
    	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
    	xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell"
    	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
    	<s:Header>
    		<a:Action>http://schemas.xmlsoap.org/ws/2004/09/transfer/CreateResponse</a:Action>
    		<a:MessageID>uuid:B880B23F-41FF-41A0-B8FC-C7566630E188</a:MessageID>
    		<a:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:To>
    		<a:RelatesTo>uuid:054cee86-9d72-4c57-b9e0-6aa4d9931253</a:RelatesTo>
    	</s:Header>
    	<s:Body>
    		<x:ResourceCreated>
    			<a:Address>https://34.134.189.249:5986/wsman</a:Address>
    			<a:ReferenceParameters>
    				<w:ResourceURI>http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
    				<w:SelectorSet>
    					<w:Selector Name="ShellId">D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</w:Selector>
    				</w:SelectorSet>
    			</a:ReferenceParameters>
    		</x:ResourceCreated>
    		<rsp:Shell
    			xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
    			<rsp:ShellId>D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</rsp:ShellId>
    			<rsp:ResourceUri>http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</rsp:ResourceUri>
    			<rsp:Owner>packer</rsp:Owner>
    			<rsp:ClientIP>34.133.63.210</rsp:ClientIP>
    			<rsp:IdleTimeOut>PT7200.000S</rsp:IdleTimeOut>
    			<rsp:InputStreams>stdin</rsp:InputStreams>
    			<rsp:OutputStreams>stdout stderr</rsp:OutputStreams>
    			<rsp:ShellRunTime>P0DT0H0M0S</rsp:ShellRunTime>
    			<rsp:ShellInactivity>P0DT0H0M0S</rsp:ShellInactivity>
    		</rsp:Shell>
    	</s:Body>
    </s:Envelope>

Error Response

The response from a create shell request that fails and does not provide a shell ID:

<s:Envelope xml:lang="en-US"
	xmlns:s="http://www.w3.org/2003/05/soap-envelope"
	xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
	xmlns:x="http://schemas.xmlsoap.org/ws/2004/09/transfer"
	xmlns:e="http://schemas.xmlsoap.org/ws/2004/08/eventing"
	xmlns:n="http://schemas.xmlsoap.org/ws/2004/09/enumeration"
	xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"
	xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd">
	<s:Header>
		<a:Action>http://schemas.dmtf.org/wbem/wsman/1/wsman/fault</a:Action>
		<a:MessageID>uuid:0A888267-33ED-4F08-98E6-DDEBDE2067DE</a:MessageID>
		<a:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:To>
		<a:RelatesTo>uuid:1893210b-91a7-45dc-aa90-b0c42e0dd740</a:RelatesTo>
	</s:Header>
	<s:Body>
		<s:Fault>
			<s:Code>
				<s:Value>s:Receiver</s:Value>
				<s:Subcode>
					<s:Value>w:InternalError</s:Value>
				</s:Subcode>
			</s:Code>
			<s:Reason>
				<s:Text xml:lang="en-US">Illegal operation attempted on a registry key that has been marked for deletion. </s:Text>
			</s:Reason>
			<s:Detail>
				<f:WSManFault
					xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2147943418" Machine="34.134.189.249">
					<f:Message>
						<f:ProviderFault provider="Shell cmd plugin" path="%systemroot%\system32\winrscmd.dll">Illegal operation attempted on a registry key that has been marked for deletion. </f:ProviderFault>
					</f:Message>
				</f:WSManFault>
			</s:Detail>
		</s:Fault>
	</s:Body>
</s:Envelope>
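One way a client could tell the two responses apart is to look for a Fault element in the SOAP body before trusting the shell ID. The sketch below uses only Go's standard encoding/xml package. The type createShellResponse, the function shellIDFromResponse, and the shortened sample documents are assumptions made for illustration, not the winrm library's API.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"strings"
)

// createShellResponse captures only the parts of the envelope needed here.
// Field tags match on local element names, so namespace prefixes are ignored.
type createShellResponse struct {
	Body struct {
		Fault *struct {
			Code struct {
				Value   string `xml:"Value"`
				Subcode struct {
					Value string `xml:"Value"`
				} `xml:"Subcode"`
			} `xml:"Code"`
			Reason struct {
				Text string `xml:"Text"`
			} `xml:"Reason"`
		} `xml:"Fault"`
		Shell struct {
			ShellID string `xml:"ShellId"`
		} `xml:"Shell"`
	} `xml:"Body"`
}

// shellIDFromResponse returns the shell ID, or an error if the response is a
// SOAP fault or carries no shell ID at all.
func shellIDFromResponse(raw string) (string, error) {
	var resp createShellResponse
	if err := xml.Unmarshal([]byte(raw), &resp); err != nil {
		return "", fmt.Errorf("parsing response: %w", err)
	}
	if f := resp.Body.Fault; f != nil {
		return "", fmt.Errorf("create shell fault %s/%s: %s",
			f.Code.Value, f.Code.Subcode.Value, strings.TrimSpace(f.Reason.Text))
	}
	if resp.Body.Shell.ShellID == "" {
		return "", errors.New("create shell response contained no shell ID")
	}
	return resp.Body.Shell.ShellID, nil
}

// Shortened versions of the two responses shown above.
const okXML = `<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
 xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
 <s:Body><rsp:Shell><rsp:ShellId>D86A4CED-C2DA-40FF-B6A4-18E1797A48B8</rsp:ShellId></rsp:Shell></s:Body>
</s:Envelope>`

const faultXML = `<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
 <s:Body><s:Fault>
  <s:Code><s:Value>s:Receiver</s:Value><s:Subcode><s:Value>w:InternalError</s:Value></s:Subcode></s:Code>
  <s:Reason><s:Text xml:lang="en-US">Illegal operation attempted on a registry key that has been marked for deletion. </s:Text></s:Reason>
 </s:Fault></s:Body>
</s:Envelope>`

func main() {
	fmt.Println(shellIDFromResponse(okXML))
	fmt.Println(shellIDFromResponse(faultXML))
}
```

Treating an empty shell ID as an error covers the case where a response is neither a well-formed fault nor a usable CreateResponse.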

Why the libraries aren’t returning an error

  • Winrm

    1. The winrm library doesn’t parse the fault elements in the CreateShell response, so it never returns an error and the request is treated as successful. The shell ID ends up set to “”.

    An issue has been filed in the upstream repo: client.CreateShell() does not detect any errors returned by windows masterzen/winrm#176

    2. The command struct contains an err field that stores the error when the shell ID is empty. Since this is a private field, we cannot access the error.

    There’s a discussion about making this error field accessible: do we need a cmd.Error() function? masterzen/winrm#41

    3. The CreateShell() and Execute() methods don’t return Command.err (which holds any error that occurred during cmd execution), which makes this issue harder to detect.
  • Winrmcp

    1. winrmcp also uses the CreateShell() and Execute() methods, so it has the same limitations and the same issues occur. As an alternative, Run() can be used for now, since it returns the errors.

What issues are fixed

Since the main fix belongs in the library, we aren’t able to solve all of the issues listed above for now.

The issue where the provisioner appears to be skipped because the command that runs the script file fails is fixed for now. The communicator now uses the Run() method (client.go) in winrm, which returns Command.err. This makes it possible to retry based on that error.
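A retry driven by the returned error could look roughly like the sketch below. It is not the retry logic Packer uses: runFunc, runWithRetry, and flakyRunner are hypothetical names, and the runner is a stub with the same return shape (exit code plus error) as the Run() call mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// runFunc models a blocking command runner that reports an exit code and an
// error, the same shape as the call discussed above.
type runFunc func() (int, error)

// runWithRetry calls run until it returns a nil error or the attempts are
// used up. A non-zero exit code with a nil error is passed through as-is,
// since that means the command itself ran.
func runWithRetry(run runFunc, attempts int, delay time.Duration) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		code, err := run()
		if err == nil {
			return code, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return -1, fmt.Errorf("command failed after %d attempts: %w", attempts, lastErr)
}

// flakyRunner returns a stub runner that fails the first n calls, plus a
// pointer to its call counter.
func flakyRunner(n int) (runFunc, *int) {
	calls := 0
	return func() (int, error) {
		calls++
		if calls <= n {
			return -1, errors.New("shell could not be created")
		}
		return 0, nil
	}, &calls
}

func main() {
	run, calls := flakyRunner(1)
	code, err := runWithRetry(run, 3, 10*time.Millisecond)
	fmt.Println(code, err, *calls)
}
```

Retrying only on a transport-level error, and not on a non-zero exit code, avoids re-running a user script that actually executed and failed on its own.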

cmd.err was ignored from winrm lib which had errors pertaining to
shell creation or cmd execution.
Refactored the communicator to account for the winrm changes
@anurag5sh anurag5sh requested a review from a team as a code owner July 1, 2025 15:04
@anurag5sh anurag5sh requested a review from Copilot July 1, 2025 15:08
Copilot AI left a comment

Pull Request Overview

This PR addresses handling errors from winrm during remote command execution by refactoring the communicator implementation.

  • Refactor winrm communicator to use c.client.Run and correctly set exit codes and log command outcomes.
  • Adjust packer communicator to set up asynchronous output collection through goroutines before starting the command execution.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
sdk-internals/communicator/winrm/communicator.go | Refactored Run() function to replace shell creation and command execution with a blocking run that returns exit code and error.
packer/communicator.go | Reordered output collection by launching a goroutine for UI updates before starting the command, and enhanced draining of output channels.
Comments suppressed due to low confidence (2)

sdk-internals/communicator/winrm/communicator.go:85

  • Ensure that c.client.Run properly handles resource cleanup (for example, closing the shell) to prevent any leaks. If the cleanup is not handled internally by the library, consider invoking shell.Close() explicitly.
	exitCode, err := c.client.Run(rc.Command, rc.Stdout, rc.Stderr)

packer/communicator.go:174

  • The reordering where the output processing goroutine starts before executing the command can be critical to avoid losing any output, so please verify that this ordering is aligned with the intended execution flows across the codebase.
	if err := c.Start(ctx, r); err != nil {
