Select-Object -Unique is much slower than Sort-Object -Unique #7707
Description
Reading a file of ~60,000 lines, picking only unique entries:
$lines = [System.IO.File]::ReadLines('/path/to/file.txt')
$lines | Select-Object -Unique # 6-12 minutes
$lines | Sort-Object -Unique # 2-3 seconds
(See the relevant source code for Sort-Object -Unique handling and for Select-Object -Unique handling. This appears to happen on PSv5.1 on Windows and PSv6.1-preview 4 on Linux.)
I see that Select-Object stores a list of the items it has seen and uses a nested loop to compare every incoming item against every item in that list. It does a full object compare, covering every added property, instead of comparing just the property being sorted on, so it is doing more work and should be expected to be slower. Even so, it is far slower than I would expect. The 'unique strings' case seems like it would be common (though it may not be); could it be sped up?
Would it be reasonable to have it also store a HashSet of something like obj.ToString(), and look up each incoming object in the HashSet? If the value is not there, the object must be new and unique, and it can be output without further work. If the value is in the HashSet, it can fall back to the full comparison. Or would that use too much extra memory?
Using Sort-Object is a workaround if you don't mind the order changing. But if you try Select-Object and find it slow, sorting seems like it would add extra work on top and take even longer; it's not obvious that it might be ~100x faster.
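To make the cost of that nested loop concrete, here is a rough model of the behaviour described above. It is only a sketch and not the cmdlet's actual source: the variable names are made up for illustration, and a case-sensitive string comparison stands in for the full object compare. It counts how many comparisons a linear scan of the "seen" list needs when every input is distinct, which is the worst case.

```powershell
# Hypothetical model of a "compare against everything seen so far" filter.
# Not Select-Object's real code; -ceq stands in for its full object compare.
$items = 1..1000 | ForEach-Object { "line $_" }   # 1000 distinct strings

$seen = [System.Collections.Generic.List[string]]::new()
$comparisons = 0

foreach ($item in $items) {
    $isNew = $true
    foreach ($old in $seen) {
        $comparisons++
        if ($old -ceq $item) {
            $isNew = $false
            break
        }
    }
    if ($isNew) {
        $seen.Add($item)
    }
}

# For n distinct items the scan performs n(n-1)/2 comparisons.
$comparisons
```

With n distinct items the count grows as n(n-1)/2, so for roughly 60,000 distinct lines the same formula gives about 1.8 billion comparisons.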
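A minimal sketch of that idea as a script-level function follows; it is not a patch to Select-Object. The name Select-UniqueSketch is hypothetical. It uses a Dictionary keyed on each object's string form instead of a plain HashSet, an assumption made here so that the fallback comparison only has to scan earlier objects with the same string form. `LanguagePrimitives.Equals` with `ignoreCase` set to `$false` is a stand-in for the cmdlet's own full comparison and may not match it for every object type.

```powershell
# Hypothetical helper, not part of PowerShell: an order-preserving unique
# filter that checks a string-keyed lookup before doing any full comparison.
function Select-UniqueSketch {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [AllowNull()]
        [object] $InputObject
    )
    begin {
        # Maps the string form of each seen object to the objects that produced it.
        $seen = [System.Collections.Generic.Dictionary[string, System.Collections.Generic.List[object]]]::new()
    }
    process {
        $key = [string] $InputObject
        $bucket = $null

        if (-not $seen.TryGetValue($key, [ref] $bucket)) {
            # Fast path: string form never seen, so the object must be new.
            $bucket = [System.Collections.Generic.List[object]]::new()
            $bucket.Add($InputObject)
            $seen.Add($key, $bucket)
            $PSCmdlet.WriteObject($InputObject)
            return
        }

        # Slow path: same string form seen before, so compare fully,
        # but only against earlier objects that share that string form.
        foreach ($candidate in $bucket) {
            # Assumption: stand-in for the cmdlet's full object comparison.
            if ([System.Management.Automation.LanguagePrimitives]::Equals($candidate, $InputObject, $false)) {
                return   # duplicate, emit nothing
            }
        }

        $bucket.Add($InputObject)
        $PSCmdlet.WriteObject($InputObject)
    }
}

# Usage: keeps first occurrences in their original order, case-sensitively.
$result = @('b', 'a', 'b', 'A', 'a') | Select-UniqueSketch
```

On the memory question, this sketch keeps one key string per distinct string form on top of the list of seen objects, so the overhead depends on how long those strings are. For input that is already all strings, a plain `HashSet[string]` would be enough on its own.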