@2bndy5 2bndy5 commented Oct 3, 2025

I found that the glob patterns for some extensions matched the same files more than once.

pathlib.Path(".").rglob("*.c")
# matches demo.c and demo.cpp

Worse, the duplicates were being analyzed more than once, so this change offers a performance improvement as well.
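
A minimal sketch of how duplicates can arise when matches from several glob patterns are appended to one list. The helper `collect` and the patterns `*.c*` and `*.cpp` are hypothetical, chosen only because they clearly overlap; they are not taken from cpp-linter's code.

```python
import pathlib
import tempfile


def collect(root: pathlib.Path, patterns: list[str]) -> list[str]:
    """Append every match of every pattern to one list (no deduplication)."""
    found = []
    for pattern in patterns:
        for path in root.rglob(pattern):
            found.append(path.relative_to(root).as_posix())
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "demo.c").touch()
    (root / "demo.cpp").touch()
    # Both patterns match demo.cpp, so it is collected twice
    matches = collect(root, ["*.c*", "*.cpp"])

print(sorted(matches))       # ['demo.c', 'demo.cpp', 'demo.cpp']
print(sorted(set(matches)))  # ['demo.c', 'demo.cpp']
```

Every entry in the list would be analyzed, so a repeated path means repeated work and repeated messages.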

Summary by CodeRabbit

  • New Features
    • Guaranteed deduplication of source files during analysis to avoid repeated processing and messages.
  • Performance
    • Minor efficiency improvement by eliminating duplicates before processing.
  • Behavior Change
    • File processing/output order may vary between runs due to deduplication changes; content remains the same.

@2bndy5 2bndy5 added the bug Something isn't working label Oct 3, 2025
coderabbitai bot commented Oct 3, 2025

Walkthrough

list_source_files now accumulates matching source file paths in a set instead of a list, guaranteeing uniqueness and not preserving order. The function constructs and returns FileObj instances from this set of unique paths. No exported/public declarations were altered.
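
The set-based accumulation described above might look roughly like the sketch below. This is an assumption-laden stand-in, not the actual cpp-linter implementation: `list_unique_sources` is a hypothetical name, it returns plain path strings rather than FileObj instances, and the `*.{ext}` pattern is assumed.

```python
import pathlib
import tempfile


def list_unique_sources(root: pathlib.Path, extensions: list[str]) -> list[str]:
    """Hypothetical stand-in for list_source_files; returns plain path strings."""
    unique: set[str] = set()
    for ext in extensions:
        for path in root.rglob(f"*.{ext}"):
            # set.add() is a no-op when the path was already recorded
            unique.add(path.relative_to(root).as_posix())
    # Built from a set, so the order of the returned list is not guaranteed
    return list(unique)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "demo.c").touch()
    (root / "src" / "demo.cpp").touch()
    # The repeated "cpp" entry stands in for any overlap between patterns
    sources = list_unique_sources(root, ["c", "cpp", "cpp"])

print(sorted(sources))  # ['src/demo.c', 'src/demo.cpp']
```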

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| File filtering logic: `cpp_linter/common_fs/file_filter.py` | Replaced list accumulation with a set in list_source_files to enforce uniqueness; adjusted addition of matches accordingly; final FileObj list now built from the set, removing order guarantees. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and concisely describes the primary fix of preventing duplicate file matches when the `--files-changed-only=false` flag is used, directly reflecting the changes made in the pull request. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4a252 and 78f87b8.

📒 Files selected for processing (1)
  • cpp_linter/common_fs/file_filter.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cpp_linter/common_fs/file_filter.py (1)
cpp_linter/common_fs/__init__.py (1)
  • FileObj (17-240)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: test (ubuntu-latest, 10)
  • GitHub Check: test (ubuntu-latest, 11)
  • GitHub Check: test (ubuntu-latest, 20)
  • GitHub Check: test (ubuntu-latest, 15)
  • GitHub Check: test (ubuntu-latest, 19)
  • GitHub Check: test (ubuntu-latest, 16)
  • GitHub Check: test (ubuntu-latest, 9)
  • GitHub Check: test (ubuntu-latest, 21)
  • GitHub Check: test (ubuntu-latest, 14)
  • GitHub Check: test (ubuntu-latest, 12)
  • GitHub Check: test (ubuntu-latest, 17)
  • GitHub Check: test (ubuntu-latest, 18)
  • GitHub Check: test (ubuntu-latest, 13)
  • GitHub Check: test (windows-latest, 14)
  • GitHub Check: test (windows-latest, 15)
  • GitHub Check: test (windows-latest, 21)
  • GitHub Check: test (windows-latest, 12)
  • GitHub Check: test (windows-latest, 18)
  • GitHub Check: test (windows-latest, 17)
  • GitHub Check: test (windows-latest, 20)
🔇 Additional comments (1)
cpp_linter/common_fs/file_filter.py (1)

169-180: LGTM! Effective deduplication fix.

The change from list to set correctly prevents duplicate file matches when glob patterns overlap (e.g., *.c matching both .c and .cpp files). This eliminates redundant analysis and improves performance with O(1) membership checks instead of O(n).

Note: Sets do not guarantee a specific iteration order. If file processing order matters, consider using dict.fromkeys() instead: dicts preserve insertion order (a language guarantee since Python 3.7) while still keeping keys unique.
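
The order-preserving alternative can be sketched as follows. The sample paths are made up for illustration, and sorting a set is shown only as a second way to get a reproducible order; neither is part of the merged change.

```python
paths = ["src/b.cpp", "src/a.c", "src/b.cpp", "src/a.c", "src/c.h"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
ordered_unique = list(dict.fromkeys(paths))
print(ordered_unique)  # ['src/b.cpp', 'src/a.c', 'src/c.h']

# Sorting a set gives a deterministic order, though not the discovery order
sorted_unique = sorted(set(paths))
print(sorted_unique)  # ['src/a.c', 'src/b.cpp', 'src/c.h']
```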




codecov bot commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.26%. Comparing base (4d51599) to head (78f87b8).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #164   +/-   ##
=======================================
  Coverage   98.26%   98.26%           
=======================================
  Files          23       23           
  Lines        1899     1899           
=======================================
  Hits         1866     1866           
  Misses         33       33           


@2bndy5 2bndy5 merged commit 4d2df05 into main Oct 3, 2025
43 checks passed
@2bndy5 2bndy5 deleted the no-duplicate-found-files branch October 3, 2025 03:37