Select-Object -Unique is much slower than Sort-Object -Unique #7707

@HumanEquivalentUnit

Reading a file of ~60,000 lines, picking only unique entries:

$lines = [System.Io.File]::ReadLines('/path/to/file.txt')

$lines | Select-Object -Unique   # 6-12 minutes

$lines | Sort-Object -Unique    # 2-3 seconds

(See the relevant source code for Sort-Object -Unique handling and the relevant source code for Select-Object -Unique handling. This appears to happen on PSv5.1 on Windows and PSv6.1-preview 4 on Linux.)

I see that Select-Object stores a list of items it has seen and uses a nested loop to compare every incoming item against every item in that list. It does a full object compare, including every added property, instead of comparing only the property being sorted on. So it is doing more work and should be expected to be slower. Even so, it is so much slower for the case of unique strings, which seems like it would be common (but may not be). Could it be sped up?

Would it be reasonable to have it also store a HashSet of something like obj.ToString()? Then, for each incoming object, it could look the value up in the HashSet: if it's not there, the object must be new and unique, and it can be output without further work. If the value is in the HashSet, it can fall back to the full comparison. Or would that use too much extra memory?
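To make the proposal concrete, here is a minimal sketch of the pre-filter idea written as a script function. Select-UniqueSketch is a hypothetical name, not an existing cmdlet, and the -ceq operator is only a stand-in for the full object comparison that Select-Object performs. It also assumes that objects which compare equal produce the same string key.

```powershell
# Hypothetical helper for illustration; this is not how Select-Object is implemented.
function Select-UniqueSketch {
    param(
        [Parameter(ValueFromPipeline)]
        [object] $InputObject
    )
    begin {
        # String keys seen so far (ordinal, case-sensitive), for fast lookups.
        $seenKeys = [System.Collections.Generic.HashSet[string]]::new()
        # Objects already emitted, used only for the fallback comparison.
        $seenObjects = [System.Collections.Generic.List[object]]::new()
    }
    process {
        $key = [string]$InputObject
        $isDuplicate = $false

        # HashSet.Add returns $false when the key is already present.
        if (-not $seenKeys.Add($key)) {
            # Key seen before: fall back to scanning the emitted objects.
            # -ceq stands in for the full comparison (an assumption).
            foreach ($candidate in $seenObjects) {
                if ($candidate -ceq $InputObject) {
                    $isDuplicate = $true
                    break
                }
            }
        }

        if (-not $isDuplicate) {
            $seenObjects.Add($InputObject)
            $InputObject   # emit in first-seen order
        }
    }
}

'b', 'a', 'b', 'c', 'a' | Select-UniqueSketch
```

In this sketch the linear scan still runs for every repeated key, so it only helps when most inputs are unique. Grouping the emitted objects by key would bound that scan, at the cost of more memory.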

Using Sort-Object is a workaround if you don't mind the order changing. But if you try Select-Object and find it slow, sorting seems like it would add extra work on top and take longer; it's not obvious that it might be ~100x faster.
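For the plain-strings case, an order-preserving alternative to sorting can be sketched with a HashSet used directly. This is an illustration only, and the sample data is made up. It relies on HashSet.Add returning $true only the first time a value is added.

```powershell
# Made-up sample data standing in for the lines of a file.
$sample = 'pear', 'apple', 'pear', 'fig', 'apple'

# Ordinal, case-sensitive set of strings seen so far.
$seen = [System.Collections.Generic.HashSet[string]]::new()

# Keep each string only the first time it appears, preserving input order.
$unique = foreach ($line in $sample) {
    if ($seen.Add($line)) { $line }
}

$unique -join ', '   # pear, apple, fig
```

This only covers strings (or other values with cheap, reliable equality); it does not do the property-by-property comparison that Select-Object -Unique applies to arbitrary objects.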

Metadata

Assignees

No one assigned

    Labels

    Hacktoberfest: Potential candidate to participate in Hacktoberfest
    Issue-Enhancement: the issue is more of a feature request than a bug
    Resolution-No Activity: Issue has had no activity for 6 months or more
    Up-for-Grabs: Up-for-grabs issues are not high priorities, and may be opportunities for external contributors
    WG-Cmdlets-Utility: cmdlets in the Microsoft.PowerShell.Utility module
    WG-Engine-Performance: core PowerShell engine, interpreter, and runtime performance
