Skip to content

Conversation

@iSazonov
Copy link
Collaborator

@iSazonov iSazonov commented Aug 21, 2019

PR Summary

At startup we create initial runspace and set default location to current file system location (path).
For that we use generic SetLocation() which designed to work with all providers and do many extra checks/operations.
In the startup scenario we know that the path is valid file system path so we can use more simple and more fast set location.

On my system I get win ~10ms or ~1.62% and ~8Mb or ~3%.
The win comes from removing some extra file operations:
image

PR Context

Related #6443

PR Checklist

@iSazonov iSazonov added the CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log label Aug 21, 2019
@iSazonov iSazonov added this to the 7.0.0-preview.4 milestone Aug 21, 2019
@bergmeister
Copy link
Contributor

bergmeister commented Aug 21, 2019

In the startup scenario we know that the path is valid file system path so we can use more simple and more fast set location.

What about the Azure Cloud Shell, which starts by default in the Azure drive, which is a SHiPS provider for a PSDrive?

Requesting a Cloud Shell.Succeeded.
Connecting terminal...
Welcome to Azure Cloud Shell

Type "az" to use Azure CLI
Type "help" to learn about Cloud Shell

get-locationMOTD: Discover installed Azure modules: Get-Module Az* -ListAvailableVERBOSE: Authenticating to Azure ...VERBOSE: Building your Azure drive ...
Azure:/
PS Azure:\> Get-Location

Path
----
Azure:/

PS Azure:\> Get-PSDrive

Name           Used (GB)     Free (GB) Provider      Root                                                                                                                                                                    CurrentLocation
----           ---------     --------- --------      ----                                                                                                                                                                    ---------------
/                  19.71         28.70 FileSystem    /                                                                                                                                                                        home/christoph
Alias                                  Alias
Azure                                  SHiPS         AzurePSDrive#Azure
Env                                    Environment
Function                               Function
Variable                               Variable

@iSazonov
Copy link
Collaborator Author

What about the Azure Cloud Shell, which starts by default in the Azure drive?

I guess it is loaded after initial startup and also it is remoting scenario.

@SteveL-MSFT
Copy link
Member

@bergmeister I'm pretty sure CloudShell uses the PowerShell profile to create the Azure drive, logon, and Set-Location so all that is after pwsh startup anyways.

@iSazonov
Copy link
Collaborator Author

Perhaps this code can still be improved, but it also works for Linux where I can’t debug :-(

@daxian-dbw
Copy link
Member

@TravisEz13 and @adityapatwardhan Given you two have experiences with FileSystemProvider related code, could you please review this PR when you have time? Thanks!

{
// See if the path is a relative or absolute
// path.
if (Globber.IsAbsolutePath(path, out string driveName))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this could be optimized too but I don't know how this works on Unix and I afraid to make breaking change.

Additional thought is that the branch works only on Unix and at startup we possibly already initialized current drive and no need to repeat the step.

Copy link
Member

@TravisEz13 TravisEz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daxian-dbw I only have experience with the location globber.

PathInfo current = null;
if (PublicSessionState.InvokeCommand.LocationChangedAction != null)
{
// It is under a condition to avoid the unneeded allocation - CurrentLocation always allocates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your comment. The two parts of the sentence seem to conflict. How about this:

Suggested change
// It is under a condition to avoid the unneeded allocation - CurrentLocation always allocates.
// Allocate a new location using the CurrentLocation property

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want protect the condition from removing because CurrentLocation always allocates and we should avoid the allocation if we don't use CurrentLocation in code below. Your proposal doesn't reflect this.
What about "CurrentLocation always allocates and we should avoid the allocation if we don't use CurrentLocation later."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand that much better. Please use that.

@ghost ghost added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept labels Sep 3, 2019
@daxian-dbw daxian-dbw added the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Sep 4, 2019
Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally concerned about the risk of changing the provider code for perf purpose, especially when the gain is insignificant, as the code can be exercised in so many different ways, it's hard for us to enumerate all the scenarios that might be affected.
I think it's worth for the committee to take this PR as case for discussion, to think about what the guidelines/balance should be for performance changes and sensitive code.

}
}

string currentPath = path.Substring(CurrentDrive.Root.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see there would be problem to this line. Say CurrentDrive is the Temp drive, and the path passed is Temp:\blah, then this Substring call will throw exception.

PS:121> $temp.Root
C:\Users\AUser\AppData\Local\Temp\

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If current drive is Temp:/ we always call IsAbsolutePath() and always set current drive based on the path. I checked this for shorter code path that is ResetRunspaceState(). Other code paths call previously Bind(), there is a call SetSessionStateDrive(), then again SetSessionStateDrive() and we fall in IsAbsolutePath() too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the following repro:

cd temp:\
mkdir test
$t =$ExecutionContext.SessionState.GetType()
$p = $t.GetProperties("nonpublic, instance")
$p = $p[0]
$v = $p.GetValue($ExecutionContext.SessionState)
$t = $v.GetType()

## Get the existing 'SetLocation(string)' method
$m1 = $t.GetMethod("SetLocation", [System.Reflection.BindingFlags]"nonpublic, instance", $null, [type[]]@([string]), $null)
## Success. Location is set to 'Temp:\test'
$m1.Invoke($v, @("Temp:\test"))

## Get the new 'SetLocationFileSystemFast' method and invoke it.
$m2 = $t.GetMethod("SetLocationFileSystemFast", [System.Reflection.BindingFlags]"nonpublic, instance")
## Fail with 'ArgumentOutOfRangeException'
$m2.Invoke($v, @("Temp:\test"))
> Exception calling "Invoke" with "2" argument(s): "startIndex cannot be larger than length of string. (Parameter 'startIndex')"

I know you have examined the current existing call sites of this method and know they are safe, but the fast-code-path method is internal, and it could be used by someone in other places in future, without knowing the prerequisites or assumptions for this method to work. This is not the right pattern to follow, it will results code hard to maintain.

@joeyaiello
Copy link
Contributor

@PowerShell/powershell-committee reviewed this one, and unfortunately we think we're going to have to reject it.

  • We're concerned about the risk of unknown behavioral changes. We just don't know enough about what this does to make a statement that this is an okay change to make. For example, as an internal API, other parts of the code could be depending on the non-fast path.
  • The performance impact is really too negligible for it to be worth the risk: 1.6% startup time is statistically insignificant, and any RAM savings is going to be gone as soon as PowerShell does anything with the filesystem.

@joeyaiello joeyaiello closed this Sep 4, 2019
@joeyaiello joeyaiello removed the Review - Committee The PR/Issue needs a review from the PowerShell Committee label Sep 4, 2019
@joeyaiello joeyaiello added the Committee-Reviewed PS-Committee has reviewed this and made a decision label Sep 4, 2019
@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 5, 2019

For example, as an internal API, other parts of the code could be depending on the non-fast path.

It is not true. We always call the code with Directory.GetCurrentDirectory() or System.IO.Path.GetDirectoryName(PsUtils.GetMainModule(currentProcess).FileName) that is always a file system file in predictable form. No other entry points exist. So there is no unpredictable risk - we can review all code paths.

The performance impact is really too negligible for it to be worth the risk

My intention was and remains to go through all startup code and remove delays where possible. I would not be surprised if in the end we would get a 20-30 percent increase in performance. (Is startup scenario important?)
After the PR we could optimize the code even more.
We call SetSessionStateDrive() 4 times at startup. We could call only 2 times.
In SetSessionStateDrive() we set current drive and then set location that do again set current drive. We could look if we can avoid the duplication.

Is this enough to continue the work?

@daxian-dbw
Copy link
Member

No other entry points exist. So there is no unpredictable risk - we can review all code paths.

Again, no other entry points exists today doesn't mean it won't in future, because the new API is internal. This is not a good pattern to follow.

We call SetSessionStateDrive() 4 times at startup. We could call only 2 times.

This was already done by #10416, wasn't it? And that PR has been accepted.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 6, 2019

I rebased to get #10416 in the branch only now. :-) I see we set a location only once at startup.
I believe that we will not be able to speed up the startup scenario by doing something in one place. I think we will have to review all startup code and make small improvements to many subsystems. The set location code is a place where we can still get perf win in some ms. This will seed up creation of runspaces in all scenarios too.
I added new commit where made the method private and move to SessionStateInternal class with SetLocationStateDrive() method. I hope this addresses your concern about internal API.

There is still a comment about AppContainer and I don't still understand how address this after removing globbing code. It seems we need to check that Directory.GetCurrentDirectory() exists?

Thank you for great comments and help!

@daxian-dbw
Copy link
Member

SessionStateInternal class with SetLocationStateDrive() method. I hope this addresses your concern about internal API.

My concerns are not just it being internal, but in general that it's risky to change the provider code. We cannot enumerate all possible scenarios that exercise the code path. If I can quickly find one potential problem just by looking at the code (which I'm not familiar with), then that's a signal that there may be more issues with the changes that are unknown yet.

My personal principles on performance work are:

  1. the work should be well-guided by profiling;
  2. it's either low hanging fruits that can be easily reviewed with low risks, or more fundamental algorithm level changes that provide significant perf gain;
  3. the most important one -- no behavioral changes should be introduced for a perf changes, unless it's an approved one.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 9, 2019

@daxian-dbw Thanks! Sorry if I noisy.

In the PR we do not do behavioral changes in the PR.
SetSessionStateDrive() method code and comments say that we use always and exclusively FileSystem provider to set init drive and always set location to Directory.GetCurrentDirectory() if requested.
It seems the method name is misleading. Perhaps InitSessionStateDriveToFileSystemDrive() would be better.
Moreover, this code is enclosed in try-catch and all exceptions is ignored. Even if something happens in the original code, we probably don’t even see it too. We see only the consequences that is another current location than Directory.GetCurrentDirectory() but we have also another path for AppContainer which implies that no script should depend on this.

There are only two scenarios where we set a folder location (that only the PR care).
First, runspace creation scenario, we will see wrong prompt in console. Currently it works well.
Then, we will have init file system location not in Directory.GetCurrentDirectory(). I set current drive to Temp:/ and create new runspace - this works well too. We could add the test.
Second, there is a runspace reset scenario. I can not found how could we test this but at first look it is the same as runspace creation scenario - in the sense that if creation works, then the reset works too.

Last notice is that single scenario where set location is important is startup one because we want get right prompt. Remaining is not important because we use OS current path for initialization and it is not predictable and reliable source in multi runspace environment which means that nobody should lean on the value and set needed location explicitly in current runspace.

Now мy only concern is with AppContainer scenario, which is mentioned in the comments. I do not understand how this can happen and how debug it. It looks like Directory.GetCurrentDirectory() returns nonexistent value in AppContainer.
Update: I run the PR build in Windows container and pwsh is started with right prompt. Not tested for Linux container. Perhaps there was an issue with .Net Core 1.0?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 17, 2019

@joeyaiello @daxian-dbw
I added some commits to make the code more clean and understandable and measure a runspace creation scenario with follow script:

measure-Command {
for( $i=0; $i -lt 100; $i++) {
    [runspace] $rs1 = [runspacefactory]::CreateRunspace()
    [runspace] $rs2 = [runspacefactory]::CreateRunspace()
    [runspace] $rs3 = [runspacefactory]::CreateRunspace()
    [runspace] $rs4 = [runspacefactory]::CreateRunspace()
    [runspace] $rs5 = [runspacefactory]::CreateRunspace()
    [runspace] $rs6 = [runspacefactory]::CreateRunspace()
    [runspace] $rs7 = [runspacefactory]::CreateRunspace()
    [runspace] $rs8 = [runspacefactory]::CreateRunspace()
    [runspace] $rs9 = [runspacefactory]::CreateRunspace()
    [runspace] $rs10 = [runspacefactory]::CreateRunspace()

    $rs1.Open()
    $rs2.Open()
    $rs3.Open()
    $rs4.Open()
    $rs5.Open()
    $rs6.Open()
    $rs7.Open()
    $rs8.Open()
    $rs9.Open()
    $rs10.Open()

    $rs1.Dispose()
    $rs2.Dispose()
    $rs3.Dispose()
    $rs4.Dispose()
    $rs5.Dispose()
    $rs6.Dispose()
    $rs7.Dispose()
    $rs8.Dispose()
    $rs9.Dispose()
    $rs10.Dispose()
}

}

Before the change:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 18
Milliseconds      : 510
Ticks             : 185101449
TotalDays         : 0.000214237788194444
TotalHours        : 0.00514170691666667
TotalMinutes      : 0.308502415
TotalSeconds      : 18.5101449
TotalMilliseconds : 18510.1449

After the change:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 10
Milliseconds      : 654
Ticks             : 106545000
TotalDays         : 0.000123315972222222
TotalHours        : 0.00295958333333333
TotalMinutes      : 0.177575
TotalSeconds      : 10.6545
TotalMilliseconds : 10654.5

That is 41.5% for the samples (~8 ms per 1 runspace).
Perhaps this is not very much for startup scenario (~2%) but clearly visible for scenarios that actively use runspaces.

@daxian-dbw
Copy link
Member

Interesting. I will take a close look this week. Currently working on driving the preview.4 release.

@iSazonov
Copy link
Collaborator Author

PerfView screenshot
image

Bind takes 26.8% (that is nearly 100% in AutomationEngine constructor) including UpdatesType() - 13.4% and SetSessionDrive() - 12.0. The both is good candidates for perf investigations.

@daxian-dbw
Copy link
Member

I added some commits to make the code more clean and understandable ...

@iSazonov I don't see new commits here. Are there new commits in your branch that are not shown here?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Sep 26, 2019

@daxian-dbw Closed PR doesn't show new commits. :-)
master...iSazonov:perf-startup-setlocationfast
I can pull new PR if you want.

@iSazonov
Copy link
Collaborator Author

@daxian-dbw @SteveL-MSFT I'd want to bring back the change. I still believe it is useful for startup and related scenarios.

@daxian-dbw
Copy link
Member

@PoshChan Please remind me in 24 hours

@PoshChan
Copy link
Collaborator

@daxian-dbw, this is the reminder you requested 24 hours ago

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-Performance Indicates that a PR should be marked as a performance improvement in the Change Log Committee-Reviewed PS-Committee has reviewed this and made a decision

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants