
match_files() is not a pure generator function, and it impacts tree_*() gravely #52

@orens


Hey @cpburnz, thanks for the great lib!
In match_files() (https://github.com/cpburnz/python-path-specification/blob/c00b332b2075548ee0c0673b72d7f2570d12ffe6/pathspec/pathspec.py#L170), the line

file_map = util.normalize_files(files, separators=separators)

(L190) forces files to be fully consumed before even the first file is matched. If files is list-like, this is not a problem. But when match_files() is called from the tree_*() methods, the entire directory tree is walked before the first result is yielded, which makes the whole iterator mechanism pretty much useless.
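To illustrate the effect, here is a minimal, self-contained sketch (not pathspec code). `walk()` is a hypothetical stand-in for the lazy tree_*() walk that records every path it produces, and `is_match()` is a placeholder rule standing in for real pattern matching. `eager_match()` mimics the file_map approach, while `lazy_match()` tests each path as it arrives.

```python
from typing import Dict, Iterable, Iterator, List

def walk(paths: Iterable[str], log: List[str]) -> Iterator[str]:
    # Hypothetical stand-in for the lazy tree_*() walk: records every path it produces.
    for path in paths:
        log.append(path)
        yield path

def is_match(path: str) -> bool:
    # Placeholder rule standing in for real pattern matching.
    return path.endswith(".py")

def eager_match(files: Iterable[str]) -> Iterator[str]:
    # Like the file_map line: building the dict drains `files` before anything is yielded.
    # (Normalization is skipped here; keys and values are the same path.)
    file_map: Dict[str, str] = {f: f for f in files}
    for norm_file, orig_file in file_map.items():
        if is_match(norm_file):
            yield orig_file

def lazy_match(files: Iterable[str]) -> Iterator[str]:
    # Tests each path as it arrives, so the walk only advances as far as the caller reads.
    for f in files:
        if is_match(f):
            yield f

paths = ["a.py", "b.txt", "c.py", "d.txt"]
for matcher in (eager_match, lazy_match):
    log: List[str] = []
    first = next(matcher(walk(paths, log)))
    print(matcher.__name__, first, len(log))
# eager_match a.py 4
# lazy_match a.py 1
```

Both return the same first match, but the eager version has already pulled every path out of the walk by then. On a large tree, that full walk is exactly the delay described here.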
It also means that if I have an ignored folder with a very deep or complex structure, pathspec will still search through all of it, even though nothing inside it can play a role in the results.
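Skipping such a folder requires pruning during the walk itself, which only a lazy pipeline can take advantage of. Below is a rough sketch of directory pruning with the standard library's `os.walk(topdown=True)`, which lets the caller edit `dirnames` in place to stop descent. `iter_unignored_files()` is a hypothetical helper, not part of pathspec. It matches directory *names* with `fnmatch` rather than full gitignore rules, and it assumes (as in `.gitignore`) that nothing under an excluded directory can be re-included.

```python
import fnmatch
import os
import tempfile
from typing import Iterable, Iterator

def iter_unignored_files(root: str, ignored_dir_patterns: Iterable[str]) -> Iterator[str]:
    """Yield '/'-separated file paths relative to `root`, never entering ignored dirs.

    Hypothetical helper: directory names are matched with fnmatch, not gitignore rules.
    """
    patterns = list(ignored_dir_patterns)
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Editing `dirnames` in place stops os.walk from descending into pruned dirs.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in patterns)
        ]
        rel_dir = os.path.relpath(dirpath, root)
        for name in filenames:
            rel = name if rel_dir == os.curdir else os.path.join(rel_dir, name)
            yield rel.replace(os.sep, "/")

# Illustrative tree: node_modules is pruned, so its contents are never listed.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "node_modules", "pkg"))
    os.makedirs(os.path.join(tmp, "src"))
    for rel in ("node_modules/pkg/index.js", "src/app.js", "README.md"):
        with open(os.path.join(tmp, *rel.split("/")), "w"):
            pass
    print(sorted(iter_unignored_files(tmp, ["node_modules"])))
    # ['README.md', 'src/app.js']
```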

As an example, in an automation I'm writing against a real-life repository containing a frontend application, scanning the npm-generated files ran for about 10 minutes without yielding a single result, at which point I gave up and stopped it.

I think a possible solution is to remove this dictionary and simply do:

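for file in files:
    # Normalize and test each file as it arrives, keeping the caller's separators.
    norm_file = util.normalize_file(file, separators=separators)
    if util.match_file(self.patterns, norm_file):
        yield file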

(I bypassed util.match_files() here because it, too, is not a generator and converts files to a list first.)
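As a self-contained sketch of the proposed generator (not pathspec's actual implementation), the code below uses hypothetical stand-ins. `Pattern` models a compiled pattern with an `include` flag, `normalize_file()` roughly mirrors what util.normalize_file() does to separators, and `match_file()` lets the last matching pattern decide. Globs are compiled with `fnmatch.translate`, so `*` also matches `/`, unlike gitignore. Feeding the generator an infinite stream shows that results arrive without the input ever being exhausted.

```python
import fnmatch
import itertools
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence

@dataclass(frozen=True)
class Pattern:
    """Hypothetical stand-in for a compiled pathspec pattern."""
    regex: re.Pattern
    include: Optional[bool]  # True = include, False = negated ("!"), None = no-op

    @classmethod
    def from_glob(cls, glob: str) -> "Pattern":
        # fnmatch-style globs only; real gitwildmatch semantics differ.
        if glob.startswith("!"):
            return cls(re.compile(fnmatch.translate(glob[1:])), False)
        return cls(re.compile(fnmatch.translate(glob)), True)

def normalize_file(file: str, separators: Sequence[str] = ("\\",)) -> str:
    # Convert alternate separators to "/".
    for sep in separators:
        file = file.replace(sep, "/")
    return file

def match_file(patterns: Sequence[Pattern], file: str) -> bool:
    # The last matching pattern wins, so a later "!" pattern can undo an earlier match.
    matched = False
    for pattern in patterns:
        if pattern.include is not None and pattern.regex.match(file):
            matched = pattern.include
    return matched

def match_files_lazy(
    patterns: Sequence[Pattern],
    files: Iterable[str],
    separators: Sequence[str] = ("\\",),
) -> Iterator[str]:
    # Generator: each file is normalized and tested as soon as it is produced.
    for file in files:
        if match_file(patterns, normalize_file(file, separators)):
            yield file

patterns = [Pattern.from_glob("*.py"), Pattern.from_glob("!*/test_*")]
# An endless stream: an eager implementation would never return from this.
stream = (
    f"pkg\\test_{i}.py" if i % 2 == 0 else f"pkg\\mod{i}.py"
    for i in itertools.count()
)
print(list(itertools.islice(match_files_lazy(patterns, stream), 3)))
# ['pkg\\mod1.py', 'pkg\\mod3.py', 'pkg\\mod5.py']
```

As in the proposal, the original (un-normalized) path is yielded, and the normalized form is used only for matching.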
