codebase-to-text is a Python tool that converts a codebase (a folder tree and its files) into a single text file or Microsoft Word document (.docx), preserving both the folder structure and the file contents. It is suited to AI/LLM processing, documentation generation, and code analysis.
- Multi-source input: Local directories and GitHub repositories
- Flexible output: Text files (.txt) and Microsoft Word documents (.docx)
- Smart exclusions: Advanced pattern matching for files and directories
- Performance optimized: Efficient traversal of large codebases
- Comprehensive logging: Detailed verbose mode for transparency
- Encoding support: Handles various file encodings gracefully
pip install codebase-to-text
codebase-to-text --input "path_or_github_url" --output "output_path" --output_type "txt"
# Exclude specific patterns
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.log,temp/,**/__pycache__/**"
# Multiple exclude arguments
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude "*.pyc" --exclude "build/" --exclude "venv/"
# Exclude hidden files
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --exclude_hidden
# Verbose mode for detailed logging
codebase-to-text --input "./my_project" --output "output.txt" --output_type "txt" --verbose
from codebase_to_text import CodebaseToText
# Basic usage
converter = CodebaseToText(
    input_path="path_or_github_url",
    output_path="output_path",
    output_type="txt"
)
converter.get_file()
# Advanced usage with exclusions
converter = CodebaseToText(
    input_path="./my_project",
    output_path="./output.txt",
    output_type="txt",
    exclude=["*.log", "temp/", "**/__pycache__/**"],
    exclude_hidden=True,
    verbose=True
)
converter.get_file()
# Get text content without saving to file
text_content = converter.get_text()
print(text_content)
The tool supports several kinds of exclusion patterns for filtering out unwanted files and directories:
- Exact filename: `README.md`, `config.yaml`
- Wildcard patterns: `*.log`, `*.tmp`, `test_*`
- Directory patterns: `__pycache__/`, `.git/`, `node_modules/`
- Recursive patterns: `**/__pycache__/**`, `**/node_modules/**`
- Path-based patterns: `src/temp/`, `docs/build/`
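To make the pattern categories above concrete, here is a minimal sketch of how such patterns could be matched against a file path. It is an illustration built on the standard library's `fnmatch`, not the tool's actual matcher; the function name `is_excluded` and its exact matching rules are assumptions.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Return True if a file path (POSIX-style, relative to the root) matches any pattern.

    Hypothetical helper for illustration; the real tool may match differently.
    """
    path = PurePosixPath(rel_path)
    parents = path.parts[:-1]  # folder names above the file
    for pattern in patterns:
        if pattern.startswith("**/") and pattern.endswith("/**"):
            # Recursive pattern such as **/node_modules/**: the named
            # directory may sit at any depth above the file.
            if pattern[3:-3] in parents:
                return True
        elif pattern.endswith("/"):
            # Directory or path-based pattern such as build/ or src/temp/:
            # look for its components as a consecutive run of parent folders.
            wanted = PurePosixPath(pattern).parts
            for i in range(len(parents) - len(wanted) + 1):
                if parents[i:i + len(wanted)] == wanted:
                    return True
        elif fnmatchcase(path.name, pattern) or fnmatchcase(rel_path, pattern):
            # Exact filenames and wildcards such as README.md, *.log, test_*.
            return True
    return False
```

For example, `is_excluded("src/__pycache__/mod.pyc", ["**/__pycache__/**"])` is true under these rules, while `is_excluded("src/main.py", ["*.log", "temp/"])` is false.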
- CLI arguments: Use the `--exclude` flag (can be used multiple times)
- `.exclude` file: Place in your project root (see example below)
- Default patterns: Common files/folders are excluded automatically
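The three sources above could be combined roughly as follows. This is a sketch under stated assumptions: the helper names, the merge order, and the de-duplication are all hypothetical, and `DEFAULT_PATTERNS` holds only a few of the documented defaults. It also assumes `.exclude` files use `#` for comments, as in the example file in this README.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative subset of the documented default patterns.
DEFAULT_PATTERNS = [".git/", "__pycache__/", "*.pyc", "node_modules/"]


def read_exclude_file(project_root: str) -> list[str]:
    """Read patterns from <project_root>/.exclude, skipping blank lines and # comments."""
    exclude_file = Path(project_root) / ".exclude"
    if not exclude_file.is_file():
        return []
    patterns = []
    for line in exclude_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def collect_patterns(project_root: str, cli_values: list[str]) -> list[str]:
    """Merge default, .exclude-file, and CLI patterns, dropping duplicates."""
    # Each --exclude value may itself be a comma-separated list.
    cli_patterns = [
        part.strip()
        for value in cli_values
        for part in value.split(",")
        if part.strip()
    ]
    merged = DEFAULT_PATTERNS + read_exclude_file(project_root) + cli_patterns
    # dict.fromkeys keeps the first occurrence of each pattern, in order.
    return list(dict.fromkeys(merged))
```

Calling `collect_patterns("./my_project", ["*.log,temp/", "build/"])` would return the defaults, then any `.exclude` entries, then `*.log`, `temp/`, and `build/`.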
The tool automatically excludes common development files:
- `.git/`, `__pycache__/`, `*.pyc`, `*.pyo`
- `node_modules/`, `.venv/`, `venv/`, `env/`
- `*.log`, `*.tmp`, `.DS_Store`
- `.pytest_cache/`, `build/`, `dist/`
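One common way to apply defaults like these efficiently is to prune excluded directories during traversal, so their contents are never visited. The sketch below shows that technique with `os.walk`; it is an assumption about how such filtering could work, not a description of the tool's internals, and `iter_included_files` is a hypothetical name.

```python
import os
from fnmatch import fnmatchcase

# Directory and file defaults taken from the list above.
DEFAULT_DIRS = {
    ".git", "__pycache__", "node_modules", ".venv", "venv", "env",
    ".pytest_cache", "build", "dist",
}
DEFAULT_FILE_PATTERNS = ["*.pyc", "*.pyo", "*.log", "*.tmp", ".DS_Store"]


def iter_included_files(root):
    """Yield paths (relative to root) of files that survive the default exclusions."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into
        # the removed directories at all.
        dirnames[:] = sorted(d for d in dirnames if d not in DEFAULT_DIRS)
        for filename in sorted(filenames):
            if any(fnmatchcase(filename, p) for p in DEFAULT_FILE_PATTERNS):
                continue
            yield os.path.relpath(os.path.join(dirpath, filename), root)
```

Pruning matters most for folders such as `node_modules/`, which can hold far more files than the project itself.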
Create a .exclude file in your project root:
# .exclude file - Patterns for files/folders to exclude
# Version control
.git/
.gitignore
# Python
__pycache__/
*.pyc
venv/
.pytest_cache/
# Node.js
node_modules/
*.log
# IDE files
.vscode/
.idea/
# Project specific
config/secrets.yaml
data/large_files/

| Parameter | Description | Example |
|---|---|---|
| `--input` | Input path (local folder or GitHub URL) | `./my_project` or `https://github.com/user/repo` |
| `--output` | Output file path | `./output.txt` |
| `--output_type` | Output format (`txt` or `docx`) | `txt` |
| `--exclude` | Exclusion patterns (repeatable) | `--exclude "*.log" --exclude "temp/"` |
| `--exclude_hidden` | Exclude hidden files/folders | Flag (no value) |
| `--verbose` | Enable detailed logging | Flag (no value) |
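For readers who want to wrap or mirror this interface, the parameters in the table map naturally onto `argparse`. The following is a sketch only: marking the first three options as required is an assumption, and the tool's own parser may be defined differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative, not the tool's own)."""
    parser = argparse.ArgumentParser(prog="codebase-to-text")
    # Assumption: these three options are treated as required here.
    parser.add_argument("--input", required=True, help="Local folder or GitHub URL")
    parser.add_argument("--output", required=True, help="Output file path")
    parser.add_argument("--output_type", required=True, choices=["txt", "docx"],
                        help="Output format")
    # action="append" collects every occurrence of a repeatable flag into a list.
    parser.add_argument("--exclude", action="append", default=[],
                        help="Exclusion pattern (repeatable)")
    # store_true flags take no value and default to False.
    parser.add_argument("--exclude_hidden", action="store_true",
                        help="Exclude hidden files/folders")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable detailed logging")
    return parser
```

With this parser, passing `--exclude "*.pyc" --exclude "build/"` yields `args.exclude == ["*.pyc", "build/"]`.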
# Basic conversion
codebase-to-text --input ~/projects/my_app --output "my_app_code.txt" --output_type "txt"
# With custom exclusions
codebase-to-text --input ~/projects/my_app --output "my_app_code.txt" --output_type "txt" --exclude "*.log,build/,dist/" --verbose
# Public repository
codebase-to-text --input "https://github.com/username/repo" --output "repo_analysis.docx" --output_type "docx"
# With exclusions for cleaner output
codebase-to-text --input "https://github.com/username/repo" --output "repo_clean.txt" --output_type "txt" --exclude "*.md,docs/,examples/"
# Analyze a codebase programmatically
from codebase_to_text import CodebaseToText
def analyze_codebase(project_path):
    converter = CodebaseToText(
        input_path=project_path,
        output_path="analysis.txt",
        output_type="txt",
        exclude=["*.log", "test/", "**/__pycache__/**"],
        verbose=True
    )
    # Get the content
    content = converter.get_text()
    # Process with your preferred LLM/AI tool
    # analysis_result = your_ai_tool.analyze(content)
    return content


# Usage
code_content = analyze_codebase("./my_project")
- AI/LLM Training: Prepare codebases for language model training
- Code Review: Generate comprehensive code overviews for review
- Documentation: Create single-file documentation from projects
- Analysis: Feed entire codebases to AI tools for analysis
- Migration: Document legacy codebases before migration
- Learning: Study open-source projects more effectively
The generated output includes:
- Folder Structure: Tree-like representation of the directory structure
- File Contents: Full content of each file with metadata
- Clear Separators: Distinct sections for easy navigation
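As a rough picture of that layout, the sketch below builds a tree listing followed by each file's contents between separator lines. The section titles, indentation, and separator style are assumptions made for illustration; the tool's real output format may differ, and `render_codebase` is a hypothetical name.

```python
import os


def render_codebase(root: str) -> str:
    """Return a tree listing of root followed by the contents of every file."""
    tree_lines = []
    file_sections = []
    separator = "-" * 40
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        rel_dir = os.path.relpath(dirpath, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        name = os.path.basename(os.path.abspath(dirpath))
        tree_lines.append("    " * depth + name + "/")
        for filename in sorted(filenames):
            tree_lines.append("    " * (depth + 1) + filename)
            full_path = os.path.join(dirpath, filename)
            # errors="replace" keeps the run going on undecodable bytes.
            with open(full_path, encoding="utf-8", errors="replace") as handle:
                body = handle.read()
            rel_path = os.path.relpath(full_path, root)
            file_sections.append(f"{separator}\n{rel_path}\n{separator}\n{body}")
    return (
        "Folder Structure\n" + "\n".join(tree_lines)
        + "\n\nFile Contents\n" + "\n".join(file_sections)
    )
```

Keeping the relative path directly above each file body lets a reader, or an LLM, tie every snippet back to its place in the tree.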
License: This project is licensed under the MIT License; see the LICENSE file for details.