
Add support for multiple document paths with glob patterns #2

Merged
tomohiro-owada merged 1 commit into tomohiro-owada:main from badri:claude/add-multiple-docs-011CUpsanwN1BK5KtdbNwCE1
Dec 15, 2025

Conversation

Contributor

@badri badri commented Nov 5, 2025

User description

This enhancement allows users to specify multiple document directories and glob patterns in config.json, providing more flexibility in organizing and indexing markdown files.

Features:

  • Config: Changed from single 'documents_dir' to 'document_patterns' array
  • Supports both directory paths and glob patterns (e.g., "./docs/**/*.md")
  • Backwards compatible: old 'documents_dir' automatically migrated
  • Sync: Uses pattern matching to find files instead of single directory walk
  • MCP tools: Updated path validation to work with multiple base directories
  • Tests: Added comprehensive tests for pattern expansion and migration
  • Documentation: Updated README with examples and pattern syntax

Example config.json:
{
  "document_patterns": [
    "./documents",
    "./notes/**/*.md",
    "./projects/backend/**/*.md"
  ],
  ...
}

Pattern examples:

  • "./documents" - All .md files in directory
  • "./docs/**/*.md" - Recursive search with **
  • "./projects/*/docs/*.md" - Wildcard patterns
  • "/path/to/external/docs" - Absolute paths
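
A minimal sketch of how the backwards-compatible migration could work, assuming the two JSON keys named above. The `Config` struct here is trimmed to those two fields, and `parseConfig` is a hypothetical helper, not the PR's actual `Load()`:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors only the two fields discussed in this PR. The JSON keys come
// from the description; the rest of the real struct is omitted.
type Config struct {
	DocumentPatterns []string `json:"document_patterns"`
	DocumentsDir     string   `json:"documents_dir,omitempty"` // deprecated
}

// parseConfig is a hypothetical helper: it decodes the config and, when only
// the old documents_dir key is set, migrates it into document_patterns.
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	// Old format: a single directory becomes a one-element pattern list.
	if len(cfg.DocumentPatterns) == 0 && cfg.DocumentsDir != "" {
		cfg.DocumentPatterns = []string{cfg.DocumentsDir}
	}
	if len(cfg.DocumentPatterns) == 0 {
		return nil, errors.New("at least one document pattern is required")
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{"documents_dir": "./documents"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.DocumentPatterns) // prints [./documents]
}
```

When both keys are present, this sketch lets `document_patterns` win; whether the PR does the same is not stated in the description.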

PR Type

Enhancement


Description

  • Replace single documents_dir with document_patterns array for flexible multi-location indexing

  • Support glob patterns including ** for recursive directory matching

  • Implement backwards compatibility with automatic migration from old config format

  • Add comprehensive pattern expansion logic with deduplication across multiple patterns

  • Update path validation in MCP tools to work with multiple base directories

  • Enhance test coverage with pattern expansion and migration scenarios


Diagram Walkthrough

flowchart LR
  A["Config: documents_dir"] -->|"Migrate"| B["Config: document_patterns[]"]
  B -->|"Expand patterns"| C["GetDocumentFiles"]
  C -->|"Match globs & dirs"| D["Markdown files list"]
  D -->|"Deduplicate"| E["Final file set"]
  B -->|"Extract base dirs"| F["GetBaseDirectories"]
  F -->|"Validate paths"| G["MCP tools"]

File Walkthrough

Relevant files
Enhancement
main.go
Update main to support multiple document directories         

cmd/main.go

  • Updated logging to display DocumentPatterns array instead of single
    DocumentsDir
  • Changed directory creation logic to iterate over multiple base
    directories from GetBaseDirectories()
  • Converted fatal error to warning for individual directory creation
    failures
+7/-5     
config.go
Add glob pattern support with backwards compatibility       

internal/config/config.go

  • Added DocumentPatterns field as array and deprecated DocumentsDir
    field
  • Implemented GetDocumentFiles() to expand all patterns and return
    deduplicated markdown files
  • Added expandPattern() to handle both directory paths and glob patterns
  • Implemented expandDoubleStarPattern() for recursive ** pattern
    matching
  • Added GetBaseDirectories() to extract base directories from patterns
    for path validation
  • Implemented backwards compatibility migration in Load() function
  • Enhanced Validate() to ensure at least one document pattern is
    configured
+241/-9 
sync.go
Use pattern-based file discovery instead of directory walk

internal/indexer/sync.go

  • Replaced filepath.Walk() on single directory with GetDocumentFiles()
    call
  • Simplified file scanning logic by delegating pattern matching to
    config
  • Improved error handling to continue processing despite individual file
    access errors
+12/-18 
tools.go
Update path validation for multiple base directories         

internal/mcp/tools.go

  • Updated validatePath() to accept multiple base directories array
  • Fixed path validation logic to check if file is within any configured
    base directory
  • Updated handleDeleteDocument() and handleReindexDocument() to use full
    file paths
  • Updated handleIndexMarkdown(), handleAddFrontmatter(), and
    handleUpdateFrontmatter() to use GetBaseDirectories()
+25/-22 
Tests
config_test.go
Add comprehensive tests for pattern expansion and migration

internal/config/config_test.go

  • Updated existing tests to use DocumentPatterns array instead of
    DocumentsDir
  • Added TestLoadConfig_BackwardsCompatibility() to verify migration from
    old format
  • Added TestGetDocumentFiles() with multiple test cases for pattern
    expansion
  • Added TestGetBaseDirectories() to validate base directory extraction
  • Added helper functions contains() and containsMiddle() for flexible
    string matching
+198/-7 
Documentation
README.md
Document glob pattern support and configuration examples 

README.md

  • Updated configuration examples to use document_patterns array with
    multiple patterns
  • Added documentation for pattern syntax including directory paths and
    glob patterns
  • Added new "Pattern Examples" section with detailed examples
  • Updated configuration options description with pattern capabilities
  • Added note about backwards compatibility with old documents_dir field
  • Updated all code examples throughout English and Japanese sections
+56/-6   

This enhancement allows users to specify multiple document directories
and glob patterns in config.json, providing more flexibility in
organizing and indexing markdown files.

Features:
- Config: Changed from single 'documents_dir' to 'document_patterns' array
- Supports both directory paths and glob patterns (e.g., "./docs/**/*.md")
- Backwards compatible: old 'documents_dir' automatically migrated
- Sync: Uses pattern matching to find files instead of single directory walk
- MCP tools: Updated path validation to work with multiple base directories
- Tests: Added comprehensive tests for pattern expansion and migration
- Documentation: Updated README with examples and pattern syntax

Example config.json:
{
  "document_patterns": [
    "./documents",
    "./notes/**/*.md",
    "./projects/backend/**/*.md"
  ],
  ...
}

Pattern examples:
- "./documents" - All .md files in directory
- "./docs/**/*.md" - Recursive search with **
- "./projects/*/docs/*.md" - Wildcard patterns
- "/path/to/external/docs" - Absolute paths
@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified: no security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Action logging: The new filesystem scanning and pattern expansion code logs file counts but does not
clearly log critical actions such as individual file indexing decisions or outcomes, which
may limit the ability to reconstruct events.

Referred Code
matchedFiles, err := idx.config.GetDocumentFiles()
if err != nil {
	return nil, fmt.Errorf("failed to get document files: %w", err)
}

// Get modification times for all matched files
for _, path := range matchedFiles {
	info, err := os.Stat(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "[WARN] Error accessing %s: %v\n", path, err)
		continue
	}

	// Store file path and modification time
	fsFiles[path] = info.ModTime()
}

fmt.Fprintf(os.Stderr, "[INFO] Found %d markdown files in filesystem\n", len(fsFiles))
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Error handling: Pattern expansion and directory walking continue past errors (e.g., the filepath.Walk
callback in expandPattern returns nil when err is non-nil), which may hide failures because
the errors are neither aggregated nor reported with actionable context.

Referred Code
err := filepath.Walk(pattern, func(path string, info os.FileInfo, err error) error {
	if err != nil {
		return nil // Continue despite errors
	}
	if !info.IsDir() && filepath.Ext(path) == ".md" {
		files = append(files, path)
	}
	return nil
})
if err != nil {
	return nil, err
}
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Log verbosity: Logging full document patterns and base directories may reveal filesystem paths
and directory structure, which could be sensitive in some environments.

Referred Code
fmt.Fprintf(os.Stderr, "[INFO] Document patterns: %v\n", cfg.DocumentPatterns)
fmt.Fprintf(os.Stderr, "[INFO] Database path: %s\n", cfg.DBPath)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
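Path validation: The updated validatePath accepts any path whose relative path does not start with
'.'. This blocks '..' traversal only as a side effect and appears to also reject
in-base paths that begin with a dot (e.g., '.notes/a.md'); a strict check for a leading
'..' component is needed to avoid traversal across different roots.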

Referred Code
func validatePath(filePath string, baseDirs []string) error {
	absPath, err := filepath.Abs(filePath)
	if err != nil {
		return err
	}

	// Check if path is within any of the base directories
	for _, baseDir := range baseDirs {
		absBase, err := filepath.Abs(baseDir)
		if err != nil {
			continue
		}

		relPath, err := filepath.Rel(absBase, absPath)
		if err != nil {
			continue
		}

		// Check if path escapes base directory
		if len(relPath) > 0 && relPath[0] != '.' {
			// Path is within this base directory


 ... (clipped 6 lines)
Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Simplify globbing by using a library

Replace the custom glob pattern implementation, particularly the logic for **,
with a robust third-party library like github.com/bmatcuk/doublestar. This will
simplify the code in internal/config/config.go and reduce maintenance.

Examples:

internal/config/config.go [159-289]
func (c *Config) expandPattern(pattern string) ([]string, error) {
	var files []string

	// Check if pattern looks like a directory (no wildcards and no .md extension)
	if !strings.Contains(pattern, "*") && !strings.Contains(pattern, "?") {
		// Treat as directory - walk it for all .md files
		err := filepath.Walk(pattern, func(path string, info os.FileInfo, err error) error {
			if err != nil {
				return nil // Continue despite errors
			}

 ... (clipped 121 lines)

Solution Walkthrough:

Before:

// internal/config/config.go
func (c *Config) expandPattern(pattern string) ([]string, error) {
	// Check if pattern is a directory vs glob
	if !strings.Contains(pattern, "*") && !strings.Contains(pattern, "?") {
		// Manually walk directory for .md files
		err := filepath.Walk(pattern, ...)
		return files, err
	}

	// Custom handling for **
	if strings.Contains(pattern, "**") {
		return c.expandDoubleStarPattern(pattern)
	}

	// Simple glob
	matches, err := filepath.Glob(pattern)
	// ... filter for .md files
	return files, nil
}

func (c *Config) expandDoubleStarPattern(pattern string) ([]string, error) {
	// Custom implementation for ** glob
	parts := strings.SplitN(pattern, "**", 2)
	// ... walk base directory and match suffix
}

After:

// internal/config/config.go
import (
	"os"
	"path/filepath"

	"github.com/bmatcuk/doublestar/v4"
)

func (c *Config) expandPattern(pattern string) ([]string, error) {
	globPattern := pattern
	// If pattern is an existing directory, match all markdown files under it
	if info, err := os.Stat(pattern); err == nil && info.IsDir() {
		globPattern = filepath.Join(pattern, "**", "*.md")
	}

	// Use library for all globbing, including **.
	// FilepathGlob takes OS file paths rather than an fs.FS.
	matches, err := doublestar.FilepathGlob(globPattern)
	if err != nil {
		return nil, err
	}

	// Filter results to ensure they are markdown files
	var files []string
	for _, m := range matches {
		if filepath.Ext(m) == ".md" {
			files = append(files, m)
		}
	}
	return files, nil
}
Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies that the custom globbing logic is complex and error-prone, and proposes replacing it with a standard library, which significantly improves code robustness and maintainability.

High
Security
Improve path traversal validation logic

Improve the security of validatePath by checking if the relative path starts
with .. to more reliably prevent path traversal attacks.

internal/mcp/tools.go [363-367]

 // Check if path escapes base directory
-if len(relPath) > 0 && relPath[0] != '.' {
+if !strings.HasPrefix(relPath, "..") {
 	// Path is within this base directory
 	return nil
 }
Suggestion importance[1-10]: 8

Why: The suggestion identifies a security flaw in the path traversal check and provides a more robust and secure implementation, which is critical for preventing unauthorized file access.
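
A bare `strings.HasPrefix(relPath, "..")` check would still reject a legitimate file whose name merely begins with two dots (such as `..notes.md`). A slightly stricter sketch, assuming paths are compared lexically and symlinks are not resolved; `isWithin` is a hypothetical helper and this `validatePath` only mirrors the signature quoted earlier:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isWithin reports whether target lies inside base (or is base itself).
// The comparison is purely lexical: symlinks are NOT resolved.
func isWithin(base, target string) (bool, error) {
	absBase, err := filepath.Abs(base)
	if err != nil {
		return false, err
	}
	absTarget, err := filepath.Abs(target)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absBase, absTarget)
	if err != nil {
		return false, err
	}
	// Escaping paths are exactly ".." or start with a ".." path component.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false, nil
	}
	return true, nil
}

// validatePath accepts filePath if it is inside any of the base directories.
func validatePath(filePath string, baseDirs []string) error {
	for _, base := range baseDirs {
		if ok, err := isWithin(base, filePath); err == nil && ok {
			return nil
		}
	}
	return fmt.Errorf("path %q is outside all configured base directories", filePath)
}

func main() {
	bases := []string{"docs", "notes"}
	for _, p := range []string{"docs/a.md", "docs/.hidden/b.md", "docs/../secret.md", "other/c.md"} {
		fmt.Printf("%-20s -> %v\n", p, validatePath(p, bases))
	}
}
```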

Medium
General
Use a library for globbing

Replace the custom ** glob pattern implementation in expandDoubleStarPattern
with a dedicated library like github.com/bmatcuk/doublestar/v4 to simplify the
code and improve correctness.

internal/config/config.go [235-238]

-// Check if path matches the suffix pattern
-if c.matchesSuffix(path, baseDir, suffix) {
-	files = append(files, path)
-}
+// We can use doublestar to handle the globbing directly.
+// No need to manually walk and match.
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that the custom implementation for ** glob patterns is complex and can be replaced by a standard library, which simplifies the code and improves robustness.

Medium
Possible issue
Propagate errors during directory walk

In expandPattern, propagate errors from filepath.Walk instead of ignoring them
to ensure issues like permission errors are not silently skipped.

internal/config/config.go [162-178]

 // Check if pattern looks like a directory (no wildcards and no .md extension)
 if !strings.Contains(pattern, "*") && !strings.Contains(pattern, "?") {
 	// Treat as directory - walk it for all .md files
 	err := filepath.Walk(pattern, func(path string, info os.FileInfo, err error) error {
 		if err != nil {
-			return nil // Continue despite errors
+			return err // Propagate errors
 		}
 		if !info.IsDir() && filepath.Ext(path) == ".md" {
 			files = append(files, path)
 		}
 		return nil
 	})
 	if err != nil {
 		return nil, err
 	}
 	return files, nil
 }
Suggestion importance[1-10]: 6

Why: The suggestion correctly points out that errors in filepath.Walk are being suppressed, which could hide issues like permission errors, and proposes propagating them for better error handling.

Low

@tomohiro-owada tomohiro-owada self-requested a review December 15, 2025 09:11
Owner

@tomohiro-owada tomohiro-owada left a comment


Hi, thank you so much for this PR! The implementation of multiple document paths with glob patterns is really well done.

I made a small follow-up commit to fix a couple of minor issues:

  1. Removed unused import: path/filepath was imported but not used in internal/indexer/sync.go
  2. Fixed integration tests: TestEndToEnd_Sync and TestEndToEnd_EmptyDirectory were still using the deprecated cfg.DocumentsDir instead of the new
    cfg.DocumentPatterns, which caused them to read from the actual ./documents directory rather than the test's temp directory

With these fixes, all tests are now passing. Great work on the backward compatibility migration and the glob pattern expansion logic!

@tomohiro-owada tomohiro-owada merged commit e414d44 into tomohiro-owada:main Dec 15, 2025