53 changes: 53 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,53 @@
# GitHub Copilot Instructions

## Contribution Guidelines

### Before Committing

1. **Run linters:** `make lint` (must pass without warnings or errors)
2. **Run tests:** `make test` (must pass all tests)
3. **Build successfully:** `make build` (must compile without warnings or errors)

### Code Standards

- Follow Go best practices and idiomatic patterns
- Use Australian English spelling throughout code and documentation (except for functions or parameters that belong to an upstream library)
- No marketing terms like "comprehensive" or "production-grade"
- Focus on clear, concise, actionable technical guidance
- Keep responses token-efficient (avoid returning unnecessary data)

## Code Quality Checks

### General Code Quality
- Verify proper module imports and dependencies
- Check for hardcoded credentials or sensitive data
- Ensure proper resource cleanup (defer statements)
- Validate input parameters thoroughly
- Use appropriate data types and structures
- Follow consistent error message formatting

## Configuration & Environment
- Environment variables should have sensible defaults
- Configuration should be documented in README
- Support both development and production modes
- Handle missing optional dependencies gracefully

## General Guidelines

- Do not use marketing terms such as 'comprehensive' or 'production-grade' in documentation or code comments.
- Focus on clear, concise, actionable technical guidance.

## Review Checklist for Every PR

Before approving any pull request, verify:

- [ ] Code follows current Go best practices
- [ ] No security issues or vulnerabilities introduced
- [ ] All linting and tests pass successfully
- [ ] Documentation updated if required
- [ ] Australian English spelling used throughout, with no American English spelling (except for functions or parameters that belong to an upstream library)
- [ ] Context cancellation handled properly if applicable
- [ ] Resource cleanup with defer statements if applicable
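
The last two checklist items can be illustrated with a minimal sketch. It is not taken from the ingest codebase; the names `sum` and `runDemo` are hypothetical and exist only to show a loop that checks for cancellation and a `defer` that releases the context on every return path.

```go
package main

import (
	"context"
	"fmt"
)

// sum adds up items but stops early if ctx has been cancelled.
// (Hypothetical helper, used only for illustration.)
func sum(ctx context.Context, items []int) (int, error) {
	total := 0
	for _, item := range items {
		// Check for cancellation before each unit of work.
		if err := ctx.Err(); err != nil {
			return total, err
		}
		total += item
	}
	return total, nil
}

// runDemo runs sum once with a live context and once after cancellation.
// It returns the first total and the error from the cancelled call.
func runDemo() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // cleanup runs on every return path

	total, err := sum(ctx, []int{1, 2, 3})
	if err != nil {
		return total, err
	}

	cancel() // calling cancel more than once is safe
	_, cancelledErr := sum(ctx, []int{4, 5, 6})
	return total, cancelledErr
}

func main() {
	total, err := runDemo()
	fmt.Println(total, err) // prints "6 context canceled"
}
```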

If you are re-reviewing a PR you have reviewed before and your previous comments or suggestions have been addressed or are no longer valid, resolve those review comments to keep the review history clean and easy to follow.
17 changes: 11 additions & 6 deletions .github/workflows/build-release-publish.yml
@@ -11,7 +11,7 @@ on:
- 'README.md'

env:
GO_VERSION: '1.24.3' # Updated to match toolchain
GO_VERSION: '1.25.4' # Updated to match toolchain
BINARY_NAME: 'ingest'

permissions:
@@ -45,14 +45,14 @@ jobs:

steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
uses: actions/checkout@v6
with:
fetch-depth: 0

- name: Set up Go and cache dependencies
uses: actions/setup-go@41dfa10bad2bb2ae585af6ee5bb4d7d973ad74ed # v5
uses: actions/setup-go@v6
with:
go-version: ${{ env.GO_VERSION }}
go-version-file: "go.mod"

- name: Get version
id: set_version
@@ -70,8 +70,13 @@ jobs:
sudo apt-get update
sudo apt-get install -y ${{ matrix.target.c_compiler_package }}

- name: golangci-lint
uses: golangci/golangci-lint-action@v9
with:
version: v2.6

- name: Run tests
run: go test -v ./...
run: make test

- name: Build
env:
@@ -81,7 +86,7 @@ jobs:
VERSION: ${{ steps.set_version.outputs.new_tag }}

run: |
go build -v -ldflags "-X main.Version=$VERSION" -o build/${{ env.BINARY_NAME }}-${{ matrix.target.os }}-${{ matrix.target.arch }} .
go build -v -ldflags "-w -s -X main.Version=$VERSION" -o build/${{ env.BINARY_NAME }}-${{ matrix.target.os }}-${{ matrix.target.arch }} .
ls -ltarh build/

- name: Upload artifact
30 changes: 30 additions & 0 deletions .golangci.yml
@@ -0,0 +1,30 @@
version: "2"
Copilot AI Nov 26, 2025
The golangci-lint configuration version "2" doesn't align with the golangci-lint version format. The version field in .golangci.yml typically isn't used to specify the golangci-lint tool version. If this is intended as a configuration schema version, it should be documented. Otherwise, consider removing this line as it may cause confusion or compatibility issues.

Suggested change
version: "2"
linters:
enable:
- unparam
settings:
unparam:
check-exported: false
exclusions:
generated: lax
presets:
- comments
- common-false-positives
- legacy
- std-error-handling
paths:
- third_party$
- builtin$
- examples$
formatters:
exclusions:
generated: lax
paths:
- third_party$
- builtin$
- examples$
- screenshots$
- .github$
- .claude$
- bin$

6 changes: 4 additions & 2 deletions Makefile
@@ -13,7 +13,7 @@ BINARY_NAME=ingest
# Version information
VERSION := $(shell git describe --tags --always)
BUILD_TIME := $(shell date -u '+%Y-%m-%d_%I:%M:%S%p')
LDFLAGS := -ldflags "-X main.Version=$(VERSION) -X main.BuildTime=$(BUILD_TIME)"
LDFLAGS := -ldflags "-w -s -X main.Version=$(VERSION) -X main.BuildTime=$(BUILD_TIME)"

# Main package path
MAIN_PACKAGE=.
@@ -30,7 +30,9 @@ clean:
rm -f $(BINARY_NAME)

lint:
gofmt -s -w .
gofmt -w -s .
golangci-lint run
go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...

test:
$(GOTEST) -v ./...
54 changes: 48 additions & 6 deletions README.md
@@ -24,7 +24,7 @@ And ingest web URLs.
- Estimate vRAM requirements and check model compatibility using another package I've created called [quantest](https://github.com/sammcj/quantest)
- Pass output directly to LLMs such as Ollama or any OpenAI compatible API for processing
- Generate and include git diffs and logs
- Count approximate tokens for LLM compatibility
- Count tokens using offline tokeniser (default) or optionally use Anthropic API (API key required, but no charge for counting)
- Customisable output templates
- Copy output to clipboard (when available)
- Export to file or print to console
@@ -210,6 +210,50 @@ You can provide a prompt suffix to append to the generated prompt:
ingest --llm -p "explain this code" /path/to/project
```

## Token Counting

Ingest provides token counting using either an offline tokeniser (default) or the Anthropic API for more accurate counts.

### Offline Token Counting (Default)

By default, ingest uses an offline tokeniser with a correction factor for improved accuracy:

```shell
ingest /path/to/project
# [ℹ️] Tokens (Approximate): 15,945
```

The offline tokeniser applies a 1.18x multiplier based on empirical analysis comparing it with Anthropic's API. This correction reduces average estimation error from ~17% to ~2%, giving more accurate token counts without requiring an API key.
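
As a rough illustration of the correction described above, the following sketch scales a raw count by the 1.18 multiplier. It is not ingest's implementation: the function name `correctedTokens` and the choice to round to the nearest integer are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// correctionFactor is the 1.18x multiplier quoted in the README text.
const correctionFactor = 1.18

// correctedTokens scales a raw offline count by the correction factor unless
// the caller opts out, mirroring the idea behind --no-correction.
// The name and the rounding rule are assumptions made for this sketch.
func correctedTokens(raw int, noCorrection bool) int {
	if noCorrection {
		return raw
	}
	return int(math.Round(float64(raw) * correctionFactor))
}

func main() {
	fmt.Println(correctedTokens(1000, false)) // prints 1180
	fmt.Println(correctedTokens(1000, true))  // prints 1000
}
```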

To disable the correction factor and use raw token counts, use the `--no-correction` flag:

```shell
ingest --no-correction /path/to/project
# Uses raw offline tokeniser without correction multiplier
```

The first time ingest runs, it downloads a small tokeniser file for offline use.

### Anthropic API Token Counting

For accurate token counts using Anthropic's counting API, use the `-a` or `--anthropic` flag:

```shell
export ANTHROPIC_API_KEY="your-api-key"
ingest -a /path/to/project
# ✓ Using Anthropic API (claude-sonnet-4-5) for token counting
# [ℹ️] Tokens (Approximate): 15,942
```

The API accepts keys from these environment variables (checked in order):
- `ANTHROPIC_API_KEY`
- `ANTHROPIC_TOKEN`
- `ANTHROPIC_TOKEN_COUNT_KEY`
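
The ordered lookup above could be sketched as follows. Only the three variable names come from the README; the helper `firstAPIKey` and its injected `lookup` parameter are hypothetical and are not ingest's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// candidateKeys lists the environment variables in the order given above.
var candidateKeys = []string{
	"ANTHROPIC_API_KEY",
	"ANTHROPIC_TOKEN",
	"ANTHROPIC_TOKEN_COUNT_KEY",
}

// firstAPIKey returns the first non-empty value and the variable it came
// from. lookup is passed in (os.Getenv in normal use) so the order can be
// exercised without touching the real environment. Hypothetical helper.
func firstAPIKey(lookup func(string) string) (name, value string, ok bool) {
	for _, key := range candidateKeys {
		if v := lookup(key); v != "" {
			return key, v, true
		}
	}
	return "", "", false
}

func main() {
	if name, _, ok := firstAPIKey(os.Getenv); ok {
		fmt.Println("using key from", name)
	} else {
		fmt.Println("no API key found in the environment")
	}
}
```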

**Performance optimisation**: When counting tokens for multiple files (e.g. in the "Top 15 largest files" report), ingest processes API requests in parallel batches of 4, significantly reducing the time needed for token counting.
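
One way to picture "parallel batches of 4" is the sketch below. It is an assumption about the general shape rather than ingest's code: `countInBatches` is a hypothetical helper and `fakeCount` stands in for a single API request.

```go
package main

import (
	"fmt"
	"sync"
)

// batchSize matches the "batches of 4" figure quoted in the README text.
const batchSize = 4

// countInBatches calls count for every input, running at most batchSize
// calls at once and waiting for each batch before starting the next.
// Results keep the order of inputs. Hypothetical helper for illustration.
func countInBatches(inputs []string, count func(string) int) []int {
	results := make([]int, len(inputs))
	for start := 0; start < len(inputs); start += batchSize {
		end := start + batchSize
		if end > len(inputs) {
			end = len(inputs)
		}
		var wg sync.WaitGroup
		for i := start; i < end; i++ {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				results[i] = count(inputs[i]) // each goroutine writes its own slot
			}(i)
		}
		wg.Wait() // finish this batch before launching the next
	}
	return results
}

// fakeCount stands in for one API request; it returns the byte length.
func fakeCount(s string) int { return len(s) }

func main() {
	files := []string{"a", "bb", "ccc", "dddd", "eeeee", "ffffff"}
	fmt.Println(countInBatches(files, fakeCount)) // prints [1 2 3 4 5 6]
}
```

Waiting per batch keeps the sketch simple; a semaphore-style limiter would keep four requests in flight continuously, at the cost of slightly more code.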

If the API call fails, ingest automatically falls back to the offline tokeniser.
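
The fallback behaviour can be sketched as a small wrapper. Everything here is hypothetical (the `counter` type, `countWithFallback`, and the two stand-in counters); it only shows the pattern of trying the API first and using the offline path when that fails.

```go
package main

import (
	"errors"
	"fmt"
)

// errAPIDown simulates a failed API request in this sketch.
var errAPIDown = errors.New("simulated API failure")

// counter is a hypothetical signature shared by both counting strategies.
type counter func(text string) (int, error)

// countWithFallback tries api first and falls back to offline on any error
// (or when api is nil). It also reports which source produced the count.
func countWithFallback(text string, api, offline counter) (int, string, error) {
	if api != nil {
		if n, err := api(text); err == nil {
			return n, "api", nil
		}
	}
	n, err := offline(text)
	return n, "offline", err
}

// failingAPI and offlineCount are stand-ins, not real tokenisers.
func failingAPI(string) (int, error)        { return 0, errAPIDown }
func offlineCount(text string) (int, error) { return len(text), nil }

func main() {
	n, source, err := countWithFallback("hello world", failingAPI, offlineCount)
	fmt.Println(n, source, err) // prints "11 offline <nil>"
}
```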

## Code Compression with Tree-sitter

**Experimental**
@@ -311,8 +355,10 @@ These directories will be created automatically on first run, along with README

### Flags

- `--compress`: **New** Enable code compression using Tree-sitter to extract key structural information while omitting implementation details
- `-a, --anthropic`: Use Anthropic API for token counting (requires API key in environment)
- `--compress`: Enable code compression using Tree-sitter to extract key structural information while omitting implementation details
- `--config`: Opens the config file in the default editor
- `--no-correction`: Disable offline tokeniser correction factor (use raw token count)
- `--context`: Specify the context length for VRAM estimation
- `--exclude-from-tree`: Exclude files/folders from the source tree based on exclude patterns
- `--git-diff-branch`: Generate git diff between two branches
@@ -376,8 +422,4 @@ Contributions are welcome, Please feel free to submit a Pull Request.
- Copyright 2024 Sam McLeod
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgements

- Initially inspired by [mufeedvh/code2prompt](https://github.com/mufeedvh/code2prompt)

<script src="http://api.html5media.info/1.1.8/html5media.min.js"></script>
3 changes: 3 additions & 0 deletions filesystem/defaultExcludes.go
@@ -194,12 +194,15 @@ const defaultGlobContent = `
**/.vimrc
**/.whitesource
**/.zcompdump*
**/.claude/*.json
**/.mcp.json
**/bat-config
**/changelog.md
**/CHANGELOG*
**/CLA.md
**/CODE_OF_CONDUCT.md
**/CODEOWNERS
**/CONTRIBUTORS.md
**/commitlint.config.js
**/contributing.md
**/CONTRIBUTING*
54 changes: 46 additions & 8 deletions filesystem/filesystem.go
@@ -150,7 +150,7 @@ func trackExcludedDirectory(excluded *ExcludedInfo, path string, mu *sync.Mutex)
excluded.Directories[path] = 0 // Initialize directory count
}

func WalkDirectory(rootPath string, includePatterns, excludePatterns []string, patternExclude string, includePriority, lineNumber, relativePaths, excludeFromTree, noCodeblock, noDefaultExcludes bool, comp *compressor.GenericCompressor) (string, []FileInfo, *ExcludedInfo, error) {
func WalkDirectory(rootPath string, includePatterns, excludePatterns []string, patternExclude string, includePriority, lineNumber, relativePaths, excludeFromTree, noCodeblock, noDefaultExcludes, followSymlinks bool, comp *compressor.GenericCompressor) (string, []FileInfo, *ExcludedInfo, error) {
var files []FileInfo
var mu sync.Mutex
var wg sync.WaitGroup
@@ -210,14 +210,24 @@ func WalkDirectory(rootPath string, includePatterns, excludePatterns []string, p
var treeString string

if !fileInfo.IsDir() {
// Check if the single file is a symlink
if !followSymlinks {
linkInfo, err := os.Lstat(rootPath)
if err != nil {
return "", nil, nil, fmt.Errorf("failed to get symlink info: %w", err)
}
if linkInfo.Mode()&os.ModeSymlink != 0 {
utils.PrintColouredMessage("ℹ️", fmt.Sprintf("Skipping symlinked file: %s", rootPath), color.FgCyan)
return fmt.Sprintf("File: %s (symlink, skipped)", rootPath), []FileInfo{}, excluded, nil
}
}

// Handle single file
relPath := filepath.Base(rootPath)
if shouldIncludeFile(relPath, includePatterns, allExcludePatterns, gitignore, includePriority) {
wg.Add(1)
go func() {
defer wg.Done()
wg.Go(func() {
processFile(rootPath, relPath, filepath.Dir(rootPath), lineNumber, relativePaths, noCodeblock, &mu, &files, comp)
}()
})
} else {
trackExcludedFile(excluded, rootPath, &mu)
}
@@ -240,6 +250,22 @@ func WalkDirectory(rootPath string, includePatterns, excludePatterns []string, p
return err
}

// Check if the path is a symlink
if !followSymlinks {
linkInfo, err := os.Lstat(path)
if err != nil {
return err
}
if linkInfo.Mode()&os.ModeSymlink != 0 {
if linkInfo.IsDir() || (info != nil && info.IsDir()) {
utils.PrintColouredMessage("ℹ️", fmt.Sprintf("Skipping symlinked directory: %s", path), color.FgCyan)
return filepath.SkipDir
}
utils.PrintColouredMessage("ℹ️", fmt.Sprintf("Skipping symlinked file: %s", path), color.FgCyan)
return nil
}
}

// Check if the current path (file or directory) should be excluded
if shouldExcludePath(relPath, allExcludePatterns, gitignore) {
if info.IsDir() {
@@ -257,10 +283,10 @@ func WalkDirectory(rootPath string, includePatterns, excludePatterns []string, p

if !info.IsDir() && shouldIncludeFile(relPath, includePatterns, allExcludePatterns, gitignore, includePriority) {
wg.Add(1)
go func(path, relPath string, info os.FileInfo) {
go func(path, relPath string) {
defer wg.Done()
processFile(path, relPath, rootPath, lineNumber, relativePaths, noCodeblock, &mu, &files, comp)
}(path, relPath, info)
}(path, relPath)
}

return nil
@@ -631,7 +657,19 @@ func isExcluded(path string, patterns []string) bool {
return false
}

func ProcessSingleFile(path string, lineNumber, relativePaths, noCodeblock bool, comp *compressor.GenericCompressor) (FileInfo, error) {
func ProcessSingleFile(path string, lineNumber, relativePaths, noCodeblock, followSymlinks bool, comp *compressor.GenericCompressor) (FileInfo, error) {
// Check if the file is a symlink
if !followSymlinks {
linkInfo, err := os.Lstat(path)
if err != nil {
return FileInfo{}, fmt.Errorf("failed to get symlink info: %w", err)
}
if linkInfo.Mode()&os.ModeSymlink != 0 {
utils.PrintColouredMessage("ℹ️", fmt.Sprintf("Skipping symlinked file: %s", path), color.FgCyan)
return FileInfo{}, fmt.Errorf("file is a symlink and --follow-symlinks is not set")
}
}

// Check if it's a PDF first
isPDF, err := pdf.IsPDF(path)
if err != nil {
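
The symlink checks added in `filesystem/filesystem.go` all rely on `os.Lstat`, which describes the link itself rather than its target. The standalone sketch below isolates that check; `isSymlink` and `demo` are hypothetical names for this example, and it assumes a platform where creating symlinks is permitted. Note that the `FileInfo` returned by `os.Lstat` for a symlink does not report `IsDir()` as true even when the target is a directory, so telling directory links apart would need a separate `os.Stat` call.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isSymlink reports whether path itself is a symbolic link. os.Lstat is used
// because os.Stat would follow the link and describe its target instead.
func isSymlink(path string) (bool, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return false, fmt.Errorf("failed to get symlink info: %w", err)
	}
	return info.Mode()&os.ModeSymlink != 0, nil
}

// demo creates a regular file and a symlink to it in a temporary directory,
// then reports what isSymlink says about each. Hypothetical helper.
func demo() (regularIsLink, linkIsLink bool, err error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir) // remove the temporary directory on every return path

	target := filepath.Join(dir, "target.txt")
	if err := os.WriteFile(target, []byte("hello"), 0o600); err != nil {
		return false, false, err
	}
	link := filepath.Join(dir, "link.txt")
	if err := os.Symlink(target, link); err != nil {
		return false, false, err
	}

	if regularIsLink, err = isSymlink(target); err != nil {
		return false, false, err
	}
	linkIsLink, err = isSymlink(link)
	return regularIsLink, linkIsLink, err
}

func main() {
	regular, link, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("regular file is symlink:", regular)
	fmt.Println("link is symlink:", link)
}
```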