
fix: skip oversized AI CLI chat files#34

Merged
SUSTAPLE117 merged 8 commits into main from fix-ai-cli-file-size-limit
Mar 26, 2026

Conversation

@tveronezi
Contributor

Summary

The ai_cli probe calls os.ReadFile() on every matched chat log file. This call is synchronous and cannot be interrupted by context cancellation. On machines where users have accumulated large session histories the scan hangs until the caller's deadline is exceeded, making it appear to crash with no results.

Changes

  • Add maxChatFileSize = 1 MB constant
  • Call os.Stat() before os.ReadFile() and skip files above the threshold with a debug log line — AI CLI credential files (the security-relevant targets) are always small; only accumulated conversation logs grow beyond 1 MB
  • Add a context-cancellation check between files so the probe responds promptly to timeouts
  • Refactor the per-tool file loops into a single fileSets [][]string loop

Tests

TestAICliProbe_OversizedFileSkipped — verifies the probe returns no findings for a 1.2 MB chat file.

Two fixes to prevent bagel from hanging when scanning developer machines
with large AI CLI conversation histories.

**1. ai_cli probe: 1MB file-size limit**
- Add `maxChatFileSize = 1MB` constant and check file size via `os.Stat()`
  before `os.ReadFile()` in `processFile()`; files above the threshold are
  logged and skipped, avoiding unbounded memory use and scan hangs caused by
  `os.ReadFile()` being non-cancellable by context
- Refactor separate per-tool file loops into a single `fileSets [][]string`
  loop with an inline context-cancellation check

**2. File index: `exclude_paths` config option**
- Add `ExcludePaths []string` to `models.FileIndexConfig` (`exclude_paths`
  in YAML) so users can exclude high-file-count directories (e.g. repos with
  `node_modules`) from the file index walk entirely
- Expand `~` / `$HOME` / `%USERPROFILE%` in exclude paths (same logic as
  `base_dirs`)
- Add `isExcludedPath()` helper to `fileindex` package; `walkDirectory()`
  returns early when `currentDir` is excluded
- Thread `ExcludePaths` through `cache.LoadInput`, `cache.SaveInput`,
  `cacheKeyInput`, and `cache.Metadata` so cache entries are invalidated
  when the exclude list changes; bump `SchemaVersion` to 4
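A `bagel.yaml` fragment for the new option might look like the following; the `file_index` section name and the example paths are illustrative assumptions, only the `exclude_paths` key and its `~`/`$HOME`/`%USERPROFILE%` expansion come from this PR:

```yaml
# Hypothetical bagel.yaml fragment; the section/key nesting is an assumption.
file_index:
  base_dirs:
    - ~/code
  exclude_paths:
    - ~/code/big-monorepo/node_modules  # expanded with the same logic as base_dirs
    - $HOME/.cache
```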

Tests added:
- `TestAICliProbe_OversizedFileSkipped` — verifies probe skips a 1.2 MB chat file
- `TestBuildIndex_WithExcludePaths` — verifies excluded directories are not indexed

Follow-up commits:
- Fix typo in AICliProbe comment: ALI CLI -> AI CLI
- Return ctx.Err() (not nil) when context is cancelled in file loop so
  the collector can log and surface probe timeouts/cancellations
- Normalize ExcludePaths with filepath.Clean/filepath.FromSlash at
  expansion time so trailing separators and Windows paths match correctly
- Guard isExcludedPath against empty/whitespace entries that would
  otherwise match every absolute path on Unix
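The exclusion check, including the normalization and empty-entry guards from the follow-up commits above, can be sketched as below. Only `isExcludedPath` is named in the PR; `expandExcludePath` is a hypothetical helper standing in for the `base_dirs`-style expansion:

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// expandExcludePath (hypothetical name) expands ~/$HOME/%USERPROFILE%
// prefixes and normalizes separators, mirroring the base_dirs logic.
func expandExcludePath(p string) string {
	home, _ := os.UserHomeDir()
	for _, prefix := range []string{"~", "$HOME", "%USERPROFILE%"} {
		if strings.HasPrefix(p, prefix) {
			p = home + p[len(prefix):]
			break
		}
	}
	return filepath.Clean(filepath.FromSlash(p))
}

// isExcludedPath reports whether dir equals, or sits under, any exclude
// entry. Entries are cleaned so trailing separators and forward slashes on
// Windows still match; empty/whitespace entries are skipped so they cannot
// match every absolute path.
func isExcludedPath(dir string, excludes []string) bool {
	dir = filepath.Clean(dir)
	for _, ex := range excludes {
		ex = strings.TrimSpace(ex)
		if ex == "" {
			continue
		}
		ex = filepath.Clean(filepath.FromSlash(ex))
		if dir == ex || strings.HasPrefix(dir, ex+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

Appending the separator before the prefix check keeps a sibling like `node_modules_extra` from matching an exclude entry of `node_modules`.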

Copilot AI left a comment


Pull request overview

This PR prevents the ai_cli probe from hanging on large accumulated AI chat history files by skipping files over a fixed size threshold and improving responsiveness to context cancellation.

Changes:

  • Add maxChatFileSize (1MB) and skip oversized AI CLI files before attempting os.ReadFile.
  • Add a context-cancellation check between file scans.
  • Refactor per-tool loops into a unified fileSets iteration and add a test for oversized file skipping.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
pkg/probe/ai_cli.go Adds max-size guard, context cancellation check, and loop refactor to avoid blocking on huge chat logs.
pkg/probe/ai_cli_test.go Adds TestAICliProbe_OversizedFileSkipped to validate oversized chat logs are skipped.
CLA_SIGNATURES.md Adds a new CLA signature entry.


Follow-up commit:
- Return findings, nil on context cancellation (with debug log) so partial
  findings gathered before cancellation are not discarded by the collector
- Fix log message: 'Skipping oversized AI CLI chat file' -> 'Skipping
  oversized AI CLI file' since the size limit applies to all files, not
  only chat logs
- Remove unused fmt import
```go
// maxChatFileSize is the maximum size of an AI chat log file that will be scanned.
// Files larger than this limit are skipped to prevent unbounded memory usage and
// scan hangs when users accumulate large conversation histories.
const maxChatFileSize = 1 * 1024 * 1024 // 1MB
```
Contributor


@tveronezi this should be configurable through config

Contributor Author


👍

Contributor Author


Done in c7e71f6. `maxFileSize` is now read from `config.Flags["max_file_size"]` (int, bytes), falling back to the existing 1 MB default. Documented in `bagel.yaml` and covered by `TestAICliProbe_CustomMaxFileSizeFlag`.

Replace the hardcoded `maxChatFileSize` constant with a `maxFileSize`
field on AICliProbe read from `config.Flags["max_file_size"]`, falling
back to the existing 1 MB default.

Addresses SUSTAPLE117's review comment on #34.
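The flag lookup might be sketched as follows; the `map[string]any` shape of `config.Flags` and the helper name are assumptions, since the PR only states that `config.Flags["max_file_size"]` holds an int in bytes with a 1 MB fallback:

```go
package main

const defaultMaxFileSize = 1 * 1024 * 1024 // 1 MB fallback

// maxFileSizeFromFlags (hypothetical name) reads the probe's size limit
// from a generic flags map, falling back to the default when the key is
// absent, non-integer, or non-positive.
func maxFileSizeFromFlags(flags map[string]any) int64 {
	v, ok := flags["max_file_size"]
	if !ok {
		return defaultMaxFileSize
	}
	if n, ok := v.(int); ok && n > 0 {
		return int64(n)
	}
	return defaultMaxFileSize
}
```

Falling back on a bad type rather than erroring keeps a malformed config entry from disabling the probe outright.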
@tveronezi tveronezi marked this pull request as ready for review March 23, 2026 22:20
@SUSTAPLE117 SUSTAPLE117 merged commit 44d4451 into main Mar 26, 2026
6 checks passed
@SUSTAPLE117 SUSTAPLE117 deleted the fix-ai-cli-file-size-limit branch March 26, 2026 13:10