Skip to content

Added -Top/-Bottom params to Sort-Object for Top/Bottom N sort and many Pester tests#2518

Merged
lzybkr merged 14 commits intoPowerShell:masterfrom
KirkMunro:sort-top-or-bottom
Nov 4, 2016
Merged

Added -Top/-Bottom params to Sort-Object for Top/Bottom N sort and many Pester tests#2518
lzybkr merged 14 commits intoPowerShell:masterfrom
KirkMunro:sort-top-or-bottom

Conversation

@KirkMunro
Copy link
Copy Markdown
Contributor

@KirkMunro KirkMunro commented Oct 21, 2016

This changeset addresses issue #2462, along with all feedback received in the comments for that issue.

The original issue was identified as an enhancement to add -Top support to Sort-Object, so that (a) you would get results back faster when you only wanted the top N, and (b) you would get code elegance by not having to pipe to Select-Object. Several commenters wanted -Bottom N support added for completeness, so an additional parameter set was created with a -Bottom <ìnt> parameter. Over 40 tests were written for Pester to compare the sort results using Sort -Top or Sort -Bottom with sort results when -Top and -Bottom are not used to ensure that we always get back the data we want according with how Sort-Object sorts data without -Top/-Bottom. That means that if Sort-Object is broken (if it returns results in a different order), these tests will pass; however, the intent of these tests is to compare the sort results with those of a basic Sort-Object invocation, and other tests exist to verify the order of the items returned from Sort-Object, so this is all good.

Additionally, performance testing was done and -Top/-Bottom sorts on very large data sets (1M integers) returned results about twice as fast as when using sort | select -First/-Last, as long as -Unique is not used. This performance improvement is partly because of the reduced pipeline and partly because of a faster sort algorithm (an indexed min-/max-heap sort is used to process Top N/Bottom N results more quickly).

Worth noting: Using -Top/-Bottom sorts using -Unique are really, really slow with very large data sets, just as sort | select -First/-Last are slow with very large data sets. This code is old, and while looking at it I felt it would be worth looking at SortedSet for -Unique sorts (SortedSet did not exist when sort-object was first written) to see if it would return unique sorted results more quickly. That investigation was outside of the scope of this PR though, so I'm simply mentioning it here so that I don't forget about the item. A separate issue should probably be created to investigate SortedSet with Sort-Object -Unique to see if it improves performance like I hope it will.

@msftclas
Copy link
Copy Markdown

Hi @KirkMunro, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

@KirkMunro
Copy link
Copy Markdown
Contributor Author

I'm not entirely sure what to do with the failures in AppVeyor/Travis CI. Saw on another PR thread that @vors indicated there were some problems due to changes in .NET Core that he was looking into. These changes build fine on my machine, so pointers if I need to do anything would be helpful. Thanks.

@vors vors closed this Oct 22, 2016
@vors vors reopened this Oct 22, 2016
@msftclas
Copy link
Copy Markdown

Hi @KirkMunro, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

@vors
Copy link
Copy Markdown
Collaborator

vors commented Oct 22, 2016

Reopened PR to re-trigger the execution on the fixed master

}
}

} No newline at end of file
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// <summary>
///
/// </summary>
[Cmdlet("Sort", "Object", HelpUri = "https://go.microsoft.com/fwlink/?LinkID=113403", RemotingCapability = RemotingCapability.None)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid unnecessary formatting

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting is not unnecessary. I added DefaultParameterSetName to the Cmdlet attribute. The line becomes far too long and hard to read. Formatting makes the content readable without unnecessary side-scrolling. Unless the PS Team requires this change for some reason, this code stays as is.

}
}

Describe 'Sort-Object Top and Bottom Unit Tests' -Tags 'CI' {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider grouping similar tests below in Context

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I need to go into this PR to address issues with the logic I'll consider adding context to the Pester tests. Otherwise they are grouped as other tests in the file were grouped, and I'm comfortable with that.

Describe 'Sort-Object Top and Bottom Unit Tests' -Tags 'CI' {

It 'Return the top N integers sorted in ascending order' {
$unsortedData = 973474993,271612178,-1258909473,659770354,1829227828,-1709391247,-10835210,-1477737798,1125017828,813732193
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Move duplicate initializers to BeforeAll {} (and throughout all code below too)
  2. Why do you use so complex data? I would prefer 1,2,-3,... a,b,z,... Enough 5 or 6 values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BeforeAll isn't documented (I didn't even know it existed), and I simply followed the model in the file already.

The numeric data used was intentionally generated using Get-Random with the range of all int values so that the test would result in a random outcome that covered a broad base of numeric input. With sorting and -Top/-Bottom, 5 or 6 values to start with is not always a sufficient test, so I'll leave it with 10 values.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests on this repository contain BeforeAll. See these examples.

Simple source data easier to understand and easier to maintain. I would say that it is better to have the test which return results you can evaluate your own eyes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our test guidelines, please do use BeforeAll.

for ($i = 0; $i -lt $topNSortResults.Count; $i++) {
$topNSortResults[$i] | Should Be $fullSortResults[$i]
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to move repeated code to functions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had written a function at first, with the tests being simply one or two lines of code; this resulted in a complicated function which would have to be tested by itself. I really didn't like this model, and preferred a short test with a half-dozen lines of code or so, which allows anyone to review a failing test in place and see more easily why it may be failing, plus enable testing outside of Pester by simply copying and pasting the test contents.

@iSazonov
Copy link
Copy Markdown
Collaborator

@KirkMunro Thanks for Sort-Object improvements!

A separate issue should probably be created to investigate SortedSet with Sort-Object -Unique to see if it improves performance like I hope it will.

Feel free to open the Issue now! I believe that performance improving is the right direction.

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Oct 22, 2016

@KirkMunro Could you please rebase your branch (and drop unnecessary commits) ? Your PR contains some merge commits.

/// <summary>
/// This is a single entry in the min-/max-heap used for top N/bottom N sorts
/// </summary>
public sealed class IndexedMinMaxHeapEntry
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be public, right?
Also consider making this a struct, it seems like that would be a win performance wise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed locally, will commit with other changes soon. On the struct comment, should this be a struct as well then, for the performance win?

    internal sealed class OrderByPropertyEntry
    {
        internal PSObject inputObject = null;
        internal List<ObjectCommandPropertyValue> orderValues = new List<ObjectCommandPropertyValue>();
    }

}
}

internal OrderByPropertyEntry Entry { get; } = null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid initializing properties with their default values. Old code may be doing this, but there is no reason to do so and we'll fix the old code eventually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed locally, will commit soon.


internal OrderByPropertyEntry Entry { get; } = null;
internal int Index { get; } = -1;
internal bool Sortable { get; } = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, avoid initializing properties with their default values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed locally, will commit soon.

else
{
// The heap entry comparer will sort the items in the heap
var comparer = new IndexedMinMaxHeapComparer(orderByProperty.Comparer);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual sort should be in it's own method.

Copy link
Copy Markdown
Contributor Author

@KirkMunro KirkMunro Oct 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved entire body of else block except for WriteObject to new Heapify method, will commit soon.

{
// There may be a faster algorithm to sort while eliminating duplicates and retaining only
// the top N records; for now though, this gets the job done
listToOutput = listToOutput.GetRange(0, Top);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needlessly creates a new list and copies the elements.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with RemoveRange, will commit soon.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveRange might be worse - it copies and clears.

I was thinking of iterating over the range explicitly with integers instead of the enumerator of the list.

}

Describe "Sort-Object DRT Unit Tests" -Tags "CI" {
It "Sort-Object with object array should work"{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our guidelines, avoid formatting changes. I realize test files are inconsistent w.r.t. tabs vs. spaces, but we want to keep meaningful additions in different commits from formatting changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzybkr Understood. So should I convert that file back to use tabs instead of spaces?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do.

Describe 'Sort-Object Top and Bottom Unit Tests' -Tags 'CI' {

It 'Return the top N integers sorted in ascending order' {
$unsortedData = 973474993,271612178,-1258909473,659770354,1829227828,-1709391247,-10835210,-1477737798,1125017828,813732193
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our test guidelines, please do use BeforeAll.


It 'Return the top N integers sorted in ascending order' {
$unsortedData = 973474993,271612178,-1258909473,659770354,1829227828,-1709391247,-10835210,-1477737798,1125017828,813732193
$baseSortParameters = @{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems silly to splat an empty hashtable.


It 'Return the top N integers sorted in descending order' {
$unsortedData = 973474993,271612178,-1258909473,659770354,1829227828,-1709391247,-10835210,-1477737798,1125017828,813732193
$baseSortParameters = @{Descending=$true}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little silly to splat a single parameter here.

}

It 'Return the top N objects from a collection of different types of objects, sorted in descending order' {
$unsortedData = @(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not create $unsortedData just once in BeforeAll?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after a closer look, these tests are a good example of why Pester has a parameter -TestCases to It. Please consider changing the tests to use -TestCases, I think it will simplify these test quite a bit.

@KirkMunro
Copy link
Copy Markdown
Contributor Author

@lzybkr: All changes requested on this discussion have been addressed and committed to this PR. I'm not sure what I need to do to respond to your requested changes other than send this note. One very noteworthy change came out of this discussion: since you highlighted that List.Remove had a strong negative impact on performance vs in-place sorting of the items in the list, I was able to get a 5x performance improvement out of -Unique -Top/-Bottom sorts, which is great! On a collection of 250K random values between 1 and 175K, Sort -Unique performs 5x faster than before, it performs slightly faster than that with -Bottom N, and it performs faster than that with -Top N. Non-unique -Top/-Bottom N sorts perform more than 2x faster than a base sort (and even more than that with sort | select -First/-Last). As suggested by @iSazonov, I will blog about this change set, probably in an article for PowerShell Magazine. I'm just waiting for results from appveyor/travis CI and will respond to any issues they have if they come up.

@iSazonov
Copy link
Copy Markdown
Collaborator

@KirkMunro Please rebase your branch.

@KirkMunro
Copy link
Copy Markdown
Contributor Author

@iSazonov I did rebase my branch.

@KirkMunro
Copy link
Copy Markdown
Contributor Author

KirkMunro commented Oct 28, 2016

Something is clearly wrong because this PR now shows 26 files changed, yet when I compare my branch here (master...KirkMunro:sort-top-or-bottom) it properly shows 4 files have changed. sigh I didn't have this issue before I rebased my branch.

@iSazonov
Copy link
Copy Markdown
Collaborator

Rebase is also my headache 😊
From my notes (maybe help you):

# rebase pull request on top of master
git chechout <branch>
git fetch PowerShell master
git pull --rebase PowerShell master
git push -f

@lzybkr
Copy link
Copy Markdown
Contributor

lzybkr commented Oct 28, 2016

I'm glad the improvements were notable. Please do fix the commits in the PR and I'll finish reviewing.

For future reference, when we merge, we decide if we'll squash, rebase, or commit with a merge commit, so it's not normally necessary for most folks to rebase themselves.

That said, it is useful to be comfortable with rebase.

@KirkMunro
Copy link
Copy Markdown
Contributor Author

KirkMunro commented Oct 28, 2016

@lzybkr Commit cleanup finished and checks passed, awaiting your review.

I should add one note: I didn't benchmark using struct vs class because I moved my additional properties into the existing class and changing that to a struct was non-trivial due to other dependencies (Compare-Object, Group-Object) and outside of the scope of this work.

@iSazonov
Copy link
Copy Markdown
Collaborator

@KirkMunro Great work!!!
I looked up the new tests. They are very elegant. But why haven't you moved a initialize codes in BeforeAll?

Copy link
Copy Markdown
Contributor

@lzybkr lzybkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - this is looking pretty good.

I think it's easy to make -Unique much faster with -Top and -Bottom - can you take a look?

Other than that, I just noticed a couple little things that would be nice to clean up.

/// <summary>
/// Sort unsorted OrderByPropertyEntry data using an indexed min-/max-heap sort
/// </summary>
private List<OrderByPropertyEntry> Heapify(List<OrderByPropertyEntry> dataToSort, OrderByPropertyComparer orderByPropertyComparer, out int heapCount)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the heap suggests it's not done in place even though it is.

Maybe return heapCount instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should have made that change when I switched to in-place sorting. It's done locally now.

{
RemoveDuplicates(orderByProperty);
}
// Note: It may be worth compariing List.Sort with SortedSet.Sort when handling unique records in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: compariing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed local, thanks.


// Min-heap: if the child item is larger than its parent, break
// Max-heap: if the child item is smaller than its parent, break
if (comparer.Compare(dataToSort[childIndex], dataToSort[parentIndex]) == comparator)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you handle -Unique here too? If they compare equal, you just skip the item, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's that simple. When I looked at supporting -Unique in heapify, my initial reaction was that it wouldn't work, at least not the way you might think. Since the data in the heap is not sorted, the challenge when checking for uniqueness of each value is which values you compare against. If the next value could go into the heap (as determined by comparing with the root), then I'd have to look at children of the current item (root) to see if I either find the same item or determine whether or not it would be possible to find the same item in a child of the item (in which case I'd have to recurse into that child item and repeat with its children). Worst case would result in N comparisons, where N is the size of the heap. Best case would be 1 comparison. Anyhow, that's roughly what I thought would be required. I could try it and see how well (or not) that performs vs the current method.

param($nSortEntry, $fullSortEntry)
if ($nSortEntry -is [System.Array]) {
# Arrays must be compared differently than everything else
$nSortEntry.Equals($fullSortEntry) | Should Be $true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array.Equals is reference equality - so this can't be what you meant.

Array does implement IStructuralEquatable, so if you can create an EqualityComparer, you could use something like:

([System.Collections.IStructuralEquatable]$nSortEntry).Equals($fullSortEntry,
    [System.Collections.Generic.EqualityComparer[int]]::Default) | Should Be $true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if you really meant to use reference equality, use [object]::ReferenceEquality and add a comment why.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the comment and update to the other reference equality test you suggested. I use reference equality because I'm verifying arrays are sorted into the proper position.


// If -Unique was used, if -Top & -Bottom were not used, or if -Top or -Bottom would return all objects, sort
// all objects, remove duplicates if necessary, and identify the list of items to return
if (_unique || (Top == 0 && Bottom == 0) || Top >= dataToProcess.Count || Bottom >= dataToProcess.Count)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I mistaken in assuming you can easily handle -Unique in heapify?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about -Unique and heapify applies here. I'm testing something with it though to see what impact it might have.


It 'Return the <nSortType> N sorted in <orderType> order' -TestCases $topBottomAscendingDescending {
param([string]$nSortType, [string]$orderType)
$baseSortParameters = if ($orderType -eq 'Descending') {@{Descending = $true}} else {@{}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you're splatting, you can simplify with something like:

... = @{Descending = $orderType -eq 'descending'}

And you decide to change this, you might as well fix the case to be consistent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, if you're doing an ascending sort, no parameter is passed. If I change this then an ascending sort passes -Descending:$false. I suppose for my tests that doesn't matter much since I'm verifying output order and not explicitly testing the switch parameter, so I changed it to simplify the logic (that's what I had before anyway).

Context 'Heterogeneous n-sort' {

$unsortedData = @(
($item0 = 'x')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't using any of the assignments, so can they be removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I removed them. I would use the assignments when testing various things outside of Pester, but it's good now so I took them out.

# Arrays must be compared differently than everything else
$nSortEntry.Equals($fullSortEntry) | Should Be $true
# Arrays are compared using reference equality to ensure that the original array was
# moved to the correct position in both sorts; value equality doesn't verify this
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was about Compare-Object, because many tests using it, but Sort-Object is specific. So maybe for the future explicitly add: 'Compare-Object is not applicable here!'

$topBottom = @(
@{nSortType='Top' }
@{nSortType='Bottom'}
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously I had in mind, that these variables ($topBottom and $topBottomAscendingDescending) are good candidates to move in BeforeAll.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't buy anything though -- those variables don't change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specificity is that this is not an ordinary script - Pester makes parsing and execution operators. BeforeAll is the result of the fact that it does not guarantee the order of execution. Thus all the common code (including function) within Describe must be inside BeforeAll.
You can see the most correct sample here

Copy link
Copy Markdown
Contributor Author

@KirkMunro KirkMunro Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not guarantee the order of execution

Where are you getting that from? Pester is not documented that way, and based on how PowerShell works I don't believe that's the case. And if it is changed to work that way, then plenty of Pester tests could start breaking.

You've shared one example where BeforeAll is being used to define functions, etc. however there are many other examples where BeforeAll is not being used in the test harness for PowerShell Core yet variables are assigned, functions are loaded, etc. There's nothing in the Pester documentation about it running tests asynchronously, and even if it did, it would still have to load those tests, which would execute the code for those tests. Changing that behaviour in Pester would be a breaking change.

Since they are not documented, and since I don't have time to go read the code to figure out what they do, I can only ascertain that BeforeAll/AfterAll are much like begin/end blocks, where you can do resource management, make sure certain things are loaded before tests and unloaded afterwards. Other than that, I don't believe there is any advantage whatsoever to using BeforeAll over defining the functions/variables inside of a Context block instead.

@KirkMunro
Copy link
Copy Markdown
Contributor Author

KirkMunro commented Nov 2, 2016

I just committed an update that improves -unique -top/-bottom sorts performance substantially. On a collection of 500K integers that were generated randomly with values from 1 to 175K, sort -unique took 18 seconds to complete, and with the previous implementation a -Top/-Bottom N -Unique sort would finish slightly faster than that, in around 17 seconds. With these changes, the -Top/-Bottom N -Unique sort finishes in a little over 6 seconds.

I couldn't make this work with the min-/max-heap sort without either walking the heap to see if an item was already added, since an item could be anywhere in the heap. Walking the heap could result in comparing a new item with every item in the heap in the worst case. Instead I decided to add another collection to track the unique items. I use SortedSet to store anything unique that is added to the heap, and then skip items that are already in that set instead of adding them to the heap. This seems pretty efficient, and definitely improves the performance.

@lzybkr Let me know what you think of this commit when you have time to review the most recent changes. Thanks!

[Update] I committed a follow-up change that removes a few lines that weren't necessary due to the changes I just made, and cleaned up the code in sort-object.cs based on my changes so that it's easier to follow.

@lzybkr lzybkr merged commit 244827a into PowerShell:master Nov 4, 2016
@lzybkr
Copy link
Copy Markdown
Contributor

lzybkr commented Nov 4, 2016

Great work @KirkMunro.

A Set should be fine for typical uses of Top/Bottom w/ -Unique because I'd expect Top/Bottom to be small relative to the total number of objects.

If that assumption is wrong, we could look at insertion sort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants