Skip to content

linter: Optimize string lookups with better data structure #6408

@camchenry

Description

@camchenry

There are many cases in the linter where store Vec<String>, Vec<&str>, or Vec<CompactStr> and then later check if vec.contains(something). At runtime, this takes O(n) to find if the given value is in the array. However, we can do better by replacing Vec with a different data structure.

FxHashSet

If we replace Vec with FxHashSet, we get the benefit of lookups taking O(1) time, at the cost of inserts taking longer due to hashing.

Sorted Vec

If we sort the Vec after inserting values, then we can use binary_search to look up if the element is in the array, which take O(log(n)) time. The sorting process itself is additional overhead of O(n log(n)), but might be negligible if elements are not inserted to the array afterwards. Otherwise, if elements are inserted later, it is probably faster to use a set.


The ideal data structure here probably depends on the usecase. For re-inserting values, a HashSet might be better. For a smaller set of values that don't change, a sorted Vec might be better.

Metadata

Metadata

Assignees

Labels

A-linterArea - LinterC-performanceCategory - Solution not expected to change functional behavior, only performance

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions