improved file indexing logic for faster results #30

Merged
SUSTAPLE117 merged 4 commits into main from maint/improvements
Mar 19, 2026

@SUSTAPLE117 commented Mar 16, 2026

This pull request refactors the file indexing logic in pkg/fileindex/fileindex.go to improve concurrency, simplify the code, and enable inline file matching during directory traversal. The worker-based pattern is replaced with a goroutine-per-subdirectory traversal (GDU-style), resulting in more efficient and scalable directory scanning. Several related tests have also been updated and expanded.

Concurrency and traversal refactor:

  • Replaced the worker pool and file channel approach with a goroutine-per-subdirectory traversal using a semaphore to limit concurrency, enabling inline file matching and improving scalability. The runWorker and fileEntry structures were removed, and file matching is now performed directly within the traversal logic. ([[1]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL101-L106), [[2]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL163-R166), [[3]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL205-R234), [[4]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL301-R326))
  • The walkDirectory function was rewritten to traverse directories in parallel, match files inline, and handle cancellation responsively. Subdirectories are now processed by spawning goroutines gated by a semaphore, avoiding deadlocks and maximizing CPU utilization. ([[1]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL205-R234), [[2]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL272-R256), [[3]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL301-R326))
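
The traversal described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from pkg/fileindex/fileindex.go: the `walker` type, `findMatches`, and `demo` are hypothetical names, the matcher is a plain `filepath.Match` on the base name, and the "walk inline when no semaphore slot is free" fallback is one common way to avoid deadlock that is assumed here rather than confirmed by the PR.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// walker holds shared traversal state. All names are hypothetical.
type walker struct {
	sem     chan struct{} // bounds the number of extra goroutines
	wg      sync.WaitGroup
	mu      sync.Mutex
	matches []string
	pattern string // glob applied to each file's base name
}

// walk reads one directory, matches files inline, and fans out
// into subdirectories.
func (w *walker) walk(ctx context.Context, dir string) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return // unreadable directories are skipped in this sketch
	}
	for _, e := range entries {
		if ctx.Err() != nil {
			return // stop promptly once the context is canceled
		}
		path := filepath.Join(dir, e.Name())
		if e.IsDir() {
			select {
			case w.sem <- struct{}{}:
				// A slot is free: traverse the subdirectory concurrently.
				w.wg.Add(1)
				go func() {
					defer w.wg.Done()
					defer func() { <-w.sem }()
					w.walk(ctx, path)
				}()
			default:
				// No slot free: recurse in this goroutine instead of
				// blocking, so a full semaphore cannot deadlock the walk.
				w.walk(ctx, path)
			}
			continue
		}
		// Inline matching: no channel hand-off to a worker pool.
		if ok, _ := filepath.Match(w.pattern, e.Name()); ok {
			w.mu.Lock()
			w.matches = append(w.matches, path)
			w.mu.Unlock()
		}
	}
}

// findMatches walks root with at most limit extra goroutines.
func findMatches(ctx context.Context, root, pattern string, limit int) []string {
	w := &walker{sem: make(chan struct{}, limit), pattern: pattern}
	w.walk(ctx, root)
	w.wg.Wait()
	return w.matches
}

// demo builds a small temporary tree and returns (found, expected).
func demo() (int, int, error) {
	root, err := os.MkdirTemp("", "walk-sketch")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	want := 0
	for i := 0; i < 20; i++ {
		dir := filepath.Join(root, fmt.Sprintf("d%02d", i), "nested")
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return 0, 0, err
		}
		if err := os.WriteFile(filepath.Join(dir, ".env"), nil, 0o644); err != nil {
			return 0, 0, err
		}
		if err := os.WriteFile(filepath.Join(dir, "notes.txt"), nil, 0o644); err != nil {
			return 0, 0, err
		}
		want++
	}
	got := findMatches(context.Background(), root, ".env", 4)
	return len(got), want, nil
}

func main() {
	got, want, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("found %d of %d expected files\n", got, want)
}
```

The non-blocking `select` is the key design choice in this sketch: a goroutine that already holds a slot never waits for another one, so the concurrency limit only caps fan-out and cannot stall the walk.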

Simplification and cleanup:

  • Removed the dependency on errgroup and simplified error handling and progress reporting, making cancellation and completion logic more straightforward. ([[1]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL18), [[2]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL127-L138), [[3]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL163-R166))
  • Updated function signatures and documentation to reflect the new traversal and matching strategy; BuildIndex and runDiscovery now perform inline matching. ([[1]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL116-R109), [[2]](https://github.com/boostsecurityio/bagel/pull/30/files#diff-e6bfd96a585bbbdffc4b4e1d19ead84777166e2e2a29d4b888cb35293dcdb59eL205-R234))

Testing improvements:

  • Added TestBuildIndex_ParallelTraversal to verify that the new goroutine-per-subdirectory traversal correctly finds all files in a wide directory tree, and updated cancellation tests to ensure proper handling of context cancellation. ([pkg/fileindex/fileindex_test.go L565-R615](https://github.com/boostsecurityio/bagel/pull/30/files#diff-5c2276b5dbb6e0f6c0161f7738778a610d4c46c60467efa6c97f30ccbdde79b1L565-R615))
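
For readers unfamiliar with this style of check, the shape of a wide-tree test might look like the sketch below. It is an assumption-laden illustration, not the contents of TestBuildIndex_ParallelTraversal: `makeWideTree`, `indexEnvFiles`, `compare`, and `checkWideTree` are hypothetical helpers, and `indexEnvFiles` is a sequential stand-in built on `filepath.WalkDir` because the real BuildIndex signature is not shown in this thread. A real test would presumably use `t.TempDir()` and call the package's own indexer.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// makeWideTree creates width sibling directories under root, each with
// one .env file, and returns the set of paths that must be found.
func makeWideTree(root string, width int) (map[string]bool, error) {
	want := make(map[string]bool, width)
	for i := 0; i < width; i++ {
		dir := filepath.Join(root, fmt.Sprintf("dir-%03d", i))
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return nil, err
		}
		p := filepath.Join(dir, ".env")
		if err := os.WriteFile(p, []byte("KEY=value\n"), 0o600); err != nil {
			return nil, err
		}
		want[p] = true
	}
	return want, nil
}

// indexEnvFiles is a stand-in for the indexer under test.
func indexEnvFiles(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && d.Name() == ".env" {
			found = append(found, path)
		}
		return nil
	})
	return found, err
}

// compare counts expected paths that were not found and found paths
// that were not expected.
func compare(want map[string]bool, got []string) (missing, extra int) {
	seen := make(map[string]bool, len(got))
	for _, p := range got {
		seen[p] = true
		if !want[p] {
			extra++
		}
	}
	for p := range want {
		if !seen[p] {
			missing++
		}
	}
	return missing, extra
}

// checkWideTree builds the tree, runs the indexer, and reports mismatches.
func checkWideTree(width int) (missing, extra int, err error) {
	root, err := os.MkdirTemp("", "wide-tree")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	want, err := makeWideTree(root, width)
	if err != nil {
		return 0, 0, err
	}
	got, err := indexEnvFiles(root)
	if err != nil {
		return 0, 0, err
	}
	missing, extra = compare(want, got)
	return missing, extra, nil
}

func main() {
	missing, extra, err := checkWideTree(200)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("missing=%d extra=%d\n", missing, extra)
}
```

Comparing sets rather than counts catches both dropped files and duplicates, which are the two failure modes a concurrent walker is most likely to introduce.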

These changes result in a more efficient, scalable, and maintainable file indexing implementation.

Copilot AI left a comment

Pull request overview

This PR updates the file indexing implementation to traverse directories in parallel (with inline pattern matching) to improve indexing speed, and adds a test that exercises parallel traversal behavior.

Changes:

  • Reworked BuildIndex to remove the worker/channel pipeline and perform inline matching during traversal.
  • Added parallel subdirectory traversal using goroutines gated by a semaphore.
  • Added a new test to validate results on a wide directory tree.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/fileindex/fileindex.go | Replaces worker-based processing with inline matching and parallel directory walkers. |
| pkg/fileindex/fileindex_test.go | Adds a test that builds a wide directory tree and asserts all expected .env files are indexed. |

Copilot AI left a comment

Pull request overview

This PR refactors the file indexing implementation to improve indexing speed by parallelizing directory traversal and doing inline pattern matching during the walk, and adds a test to exercise the new traversal strategy.

Changes:

  • Replace the worker-pool + channel pipeline with goroutine-per-subdirectory traversal gated by a semaphore.
  • Perform pattern matching inline during traversal rather than dispatching discovered files to workers.
  • Add a new test that builds a wide directory tree and asserts the index finds all expected .env files.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/fileindex/fileindex.go | Refactors BuildIndex to parallelize traversal with a semaphore and match files inline (removes errgroup/worker pipeline). |
| pkg/fileindex/fileindex_test.go | Adds a parallel traversal test that builds a wide directory structure and validates expected matches. |

Copilot AI left a comment

Pull request overview

This PR updates the file indexing implementation to perform inline matching during directory traversal and introduces a new test aimed at exercising parallel traversal behavior.

Changes:

  • Replaced the worker/channel + errgroup indexing pipeline with inline matching during recursive traversal.
  • Added goroutine-per-subdirectory traversal gated by a semaphore, plus a new test for parallel traversal.
  • Updated the context-cancellation test expectations to align with the new return behavior (index returned even on cancellation).
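
The "index returned even on cancellation" behavior noted in the last bullet can be illustrated with a small sketch. Everything here is hypothetical: the `Index` type, the `buildIndex` signature, and the `onVisit` hook exist only for this example, and the sketch assumes the context error is returned alongside the partial index, which the thread does not confirm for the real BuildIndex.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Index is a hypothetical result type for this sketch.
type Index struct {
	Paths []string
}

// buildIndex adds candidates to the index in order and stops as soon as
// ctx is canceled. It always returns the index built so far; on
// cancellation it also returns ctx.Err() (an assumption of this sketch).
func buildIndex(ctx context.Context, candidates []string, onVisit func(i int)) (*Index, error) {
	idx := &Index{}
	for i, p := range candidates {
		if err := ctx.Err(); err != nil {
			return idx, err // partial index, not nil
		}
		idx.Paths = append(idx.Paths, p)
		if onVisit != nil {
			onVisit(i)
		}
	}
	return idx, nil
}

// demoCancel cancels the context after the second candidate is indexed
// and reports how many paths were kept and whether the error is
// context.Canceled.
func demoCancel() (int, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	candidates := []string{"a/.env", "b/.env", "c/.env", "d/.env"}
	idx, err := buildIndex(ctx, candidates, func(i int) {
		if i == 1 {
			cancel()
		}
	})
	return len(idx.Paths), errors.Is(err, context.Canceled)
}

func main() {
	n, canceled := demoCancel()
	fmt.Printf("indexed %d paths before cancellation; canceled=%v\n", n, canceled)
}
```

Returning a non-nil index on cancellation lets callers use whatever was discovered before the deadline, at the cost of requiring them to check the error before treating the index as complete.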

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/fileindex/fileindex.go | Switches indexing to inline matching and introduces semaphore-gated parallel directory walking; updates cancellation/error behavior. |
| pkg/fileindex/fileindex_test.go | Adjusts cancellation test assertions and adds a new parallel traversal test case. |

Copilot AI left a comment

Pull request overview

This PR updates the fileindex package to speed up indexing by removing the worker/channel pipeline and instead performing pattern matching inline while traversing directories in parallel (goroutine-per-subdirectory with a semaphore). It also expands test coverage to exercise the new traversal strategy.

Changes:

  • Replace errgroup + worker pool indexing with inline matching during directory traversal.
  • Add semaphore-gated parallel subdirectory traversal (including for symlinked directories when enabled).
  • Update/add tests for cancellation behavior and parallel traversal coverage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/fileindex/fileindex.go | Reworks indexing to inline matching and introduces semaphore-gated parallel directory walking. |
| pkg/fileindex/fileindex_test.go | Adds a parallel traversal test and adjusts cancellation assertions to align with new behavior. |

@SUSTAPLE117 merged commit 8a41c9e into main Mar 19, 2026
10 checks passed
@SUSTAPLE117 deleted the maint/improvements branch March 19, 2026 14:50

3 participants