cmd: add corpus clean command to remove invalid sequences #777

Merged
anishnaik merged 5 commits into master from dev/remove-unhealthy-corpus
Feb 6, 2026

@dguido commented Jan 22, 2026

Summary

  • Adds a new medusa corpus clean command to remove invalid sequences from the on-disk corpus
  • After contract refactoring, the corpus may contain many sequences that can no longer execute (e.g., because function signatures changed or contracts were removed)
  • The command compiles contracts, sets up a test chain, and validates each sequence
  • Invalid sequences are removed from disk (a --dry-run option previews the deletions without performing them)
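
The validate-then-remove flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Sequence` and `CleanStats` types, the `ErrInvalidSequence` value, and the `CleanSequences` function are hypothetical names, and the real command validates sequences by executing them on a test chain rather than through an injected callback.

```go
package main

import (
	"errors"
	"fmt"
)

// Sequence is a hypothetical stand-in for one corpus entry on disk.
type Sequence struct {
	Path  string   // file that stores the sequence
	Calls []string // simplified call list; the real corpus stores richer data
}

// ErrInvalidSequence marks a sequence that no longer executes (hypothetical).
var ErrInvalidSequence = errors.New("sequence no longer executes")

// CleanStats holds the valid/invalid/removed counts reported at the end.
type CleanStats struct {
	Valid, Invalid, Removed int
}

// CleanSequences validates every sequence and removes the ones that fail.
// validate and remove are injected so the loop does not care how sequences
// are executed or stored. With dryRun set, nothing is deleted.
func CleanSequences(seqs []Sequence, validate func(Sequence) error,
	remove func(path string) error, dryRun bool) (CleanStats, error) {
	var stats CleanStats
	for _, seq := range seqs {
		if err := validate(seq); err == nil {
			stats.Valid++
			continue
		}
		stats.Invalid++
		if dryRun {
			continue // preview only: count the sequence, keep the file
		}
		if err := remove(seq.Path); err != nil {
			return stats, fmt.Errorf("removing %s: %w", seq.Path, err)
		}
		stats.Removed++
	}
	return stats, nil
}
```

Passing the validator and the remover as functions keeps the loop easy to exercise without compiling contracts; a caller would supply a validator that returns `ErrInvalidSequence` (or any other error) for sequences that fail.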

Usage

# Clean invalid sequences from corpus
medusa corpus clean

# Preview what would be deleted without actually deleting
medusa corpus clean --dry-run

# Use a custom config file
medusa corpus clean --config path/to/medusa.json

Example output:

Reading configuration file at: medusa.json
Initializing fuzzer...
Loading and validating corpus from: corpus/
Corpus cleaning completed in 5s
Results: 121 valid, 7361 invalid out of 7482 total sequences
7361 invalid sequences removed from disk

Implementation Details

  • cmd/corpus.go: New CLI command using Cobra, similar to the existing fuzz and init commands
  • fuzzing/corpus_cleaner.go: CorpusCleaner type that reuses fuzzer infrastructure
  • fuzzing/corpus/corpus.go: Added CleanInvalidSequences() method
  • fuzzing/corpus/corpus_files.go: Added removeFileFromDisk() method
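
As an illustration of what a file-removal helper like the one listed above might involve, here is a small sketch using only the Go standard library. The function name `removeSequenceFile`, its parameters, and its behavior (rejecting names that escape the corpus directory, treating an already-missing file as success) are assumptions made for this example; the actual removeFileFromDisk() method may differ.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeSequenceFile deletes the file called name inside dir.
// Hypothetical helper: it only accepts a bare file name so a malformed
// entry cannot delete anything outside the corpus directory, and it
// treats a file that is already gone as success.
func removeSequenceFile(dir, name string) error {
	if name == "" || name == "." || name == ".." || name != filepath.Base(name) {
		return fmt.Errorf("invalid sequence file name %q", name)
	}
	err := os.Remove(filepath.Join(dir, name))
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("removing %s: %w", name, err)
	}
	return nil
}
```

Treating a missing file as success makes the cleanup idempotent, so rerunning it after an interrupted run does not fail on files that were already deleted.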

Test plan

  • Code compiles (go build ./...)
  • Code passes formatting (go fmt ./...)
  • Linter passes on new files (no new issues in golangci-lint)
  • Corpus tests pass (go test -v ./fuzzing/corpus/...)
  • Manual testing with a real corpus (requires contract project)

Fixes #743


Generated with Claude Code

After contract refactoring, the corpus may contain many invalid sequences that
cannot be executed. This adds a new `medusa corpus clean` command to remove
these invalid sequences from disk.

The command:
- Compiles contracts and sets up a test chain
- Loads and validates each call sequence in the corpus
- Removes sequences that fail to execute (contract resolution failures,
  ABI mismatches, or execution errors)
- Supports --dry-run to preview what would be deleted
- Reports statistics on valid/invalid sequences

Fixes #743

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dguido force-pushed the dev/remove-unhealthy-corpus branch from 6010ae3 to 83dfd8a on January 22, 2026 05:47
anishnaik and others added 4 commits February 5, 2026 07:57
Move CorpusCleaner to fuzzing/corpus/ package to improve separation of
concerns and match the organizational pattern of corpus_pruner.go.

Changes:
- Create fuzzing/corpus/corpus_cleaner.go with refactored CorpusCleaner
  that receives dependencies as parameters (no *Fuzzer dependency)
- Update fuzzing/corpus_cleaner.go to be a thin wrapper that sets up
  dependencies and delegates to corpus.CorpusCleaner
- Follow the same pattern as corpus_pruner.go for consistency

Benefits:
- Better separation of concerns (corpus package independent of Fuzzer)
- More testable (CorpusCleaner can be tested with mocked dependencies)
- Consistent architecture within the corpus package
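
The testability benefit can be illustrated with a rough sketch in which the cleaner depends only on small interfaces. Every name below (`CallExecutor`, `SequenceStore`, `Cleaner`, `memStore`, `fakeExecutor`) is hypothetical and chosen for this example; the real CorpusCleaner's dependencies and signatures are not shown in this conversation.

```go
package main

import (
	"errors"
	"sort"
)

// CallExecutor is a hypothetical abstraction over the test chain.
type CallExecutor interface {
	Execute(calls []string) error
}

// SequenceStore is a hypothetical abstraction over the on-disk corpus.
type SequenceStore interface {
	IDs() []string
	Load(id string) ([]string, error)
	Delete(id string) error
}

// Cleaner depends only on the two interfaces, not on a fuzzer type.
type Cleaner struct {
	exec  CallExecutor
	store SequenceStore
}

func NewCleaner(exec CallExecutor, store SequenceStore) *Cleaner {
	return &Cleaner{exec: exec, store: store}
}

// Clean deletes every sequence that fails to load or execute and
// returns the ids it deleted.
func (c *Cleaner) Clean() ([]string, error) {
	var deleted []string
	for _, id := range c.store.IDs() {
		calls, err := c.store.Load(id)
		if err == nil {
			err = c.exec.Execute(calls)
		}
		if err == nil {
			continue // sequence is still valid
		}
		if derr := c.store.Delete(id); derr != nil {
			return deleted, derr
		}
		deleted = append(deleted, id)
	}
	return deleted, nil
}

// memStore is an in-memory fake of SequenceStore for tests.
type memStore map[string][]string

func (m memStore) IDs() []string {
	ids := make([]string, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for tests
	return ids
}

func (m memStore) Load(id string) ([]string, error) {
	calls, ok := m[id]
	if !ok {
		return nil, errors.New("no such sequence: " + id)
	}
	return calls, nil
}

func (m memStore) Delete(id string) error { delete(m, id); return nil }

// fakeExecutor rejects any call whose function name is not in known.
type fakeExecutor struct{ known map[string]bool }

func (f fakeExecutor) Execute(calls []string) error {
	for _, name := range calls {
		if !f.known[name] {
			return errors.New("unknown function: " + name)
		}
	}
	return nil
}
```

With fakes like these, a unit test can build a `memStore` holding one sequence that calls a removed function, run `Clean`, and assert that only that sequence was deleted, without compiling contracts or starting a chain.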

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…ze architecture

- Remove --dry-run flag completely (invalid sequences are always deleted)
- Add CreateTestChainForCleaning() public helper to Fuzzer for CLI use
- Delete fuzzing/corpus_cleaner.go wrapper (CLI now calls corpus package directly)
- Standardize flag handling by creating cmd/corpus_flags.go
- Move all cleaning logic to corpus package (no Fuzzer dependency)
- Follow corpus_pruner pattern for clean separation of concerns

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@anishnaik merged commit bc3d1d0 into master on Feb 6, 2026
15 checks passed
@anishnaik deleted the dev/remove-unhealthy-corpus branch on February 6, 2026 11:02

Development

Successfully merging this pull request may close these issues.

[Feature]: Option to remove unhealthy sequences from corpus

2 participants