A modular runtime and orchestration system
for AI agents.
Structured pipelines, gated phases, specialized agents. Works with Claude Code, OpenCode, Codex CLI, Cursor, and Kiro. 3,750 tests. Production-grade.
AI models write code.
That's not the hard part anymore.
The hard part is everything else. Picking what to work on. Managing branches. Reviewing output. Cleaning up AI artifacts. Handling CI. Addressing reviewer comments. Deploying. AgentSys automates all of it.
20 Commands. One Toolkit.
Each works standalone. Together, they automate everything.
/next-task
Task to production, fully automated
- 12-phase pipeline: discovery through deployment
- Multi-agent review loop (code, security, perf, tests)
- Persistent state - resume from any phase
- GitHub Issues, GitLab, or local task files
$ /next-task # Start new workflow
$ /next-task --resume # Resume interrupted workflow
/agnix
Lint agent configs before they break
- 385 validation rules across 36 categories
- 10+ AI tools: Claude Code, Cursor, Copilot, Codex, OpenCode, Gemini CLI
- 102 auto-fixable rules with --fix flag
- SARIF output for GitHub Code Scanning
$ /agnix # Validate current project
$ /agnix --fix # Auto-fix fixable issues
/ship
Branch to merged PR in one command
- Commits, pushes, creates PR, monitors CI
- Waits for auto-reviewers, addresses every comment
- Platform auto-detection (GitHub Actions, Railway, Vercel)
- Merges, deploys, and cleans up
$ /ship # Full workflow
$ /ship --dry-run # Preview without executing
/deslop
Kill AI slop before it ships
- 3-phase detection: regex, multi-pass analyzers, CLI tools
- Certainty-graded findings (HIGH / MEDIUM / LOW)
- JS/TS, Python, Rust, Go, Java
- Auto-fix HIGH certainty issues
$ /deslop # Report only (safe)
$ /deslop apply # Fix HIGH certainty issues
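The phase-1 regex pass with certainty grading can be sketched roughly like this. The patterns, grades, and `Finding` shape here are illustrative stand-ins, not /deslop's actual rule set:

```typescript
// Minimal sketch of regex-based slop detection with certainty grading.
// Patterns and their grades are hypothetical examples.
type Certainty = "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  line: number;
  match: string;
  certainty: Certainty;
}

const rules: { pattern: RegExp; certainty: Certainty }[] = [
  // Leftover placeholder comments are almost always slop.
  { pattern: /\/\/ TODO: implement/i, certainty: "HIGH" },
  // Apologetic or meta filler comments are usually slop.
  { pattern: /\/\/ (Note|As an AI)/i, certainty: "MEDIUM" },
  // A one-line empty catch might be intentional, so grade it LOW.
  { pattern: /catch\s*\(\w*\)\s*\{\s*\}/, certainty: "LOW" },
];

function detect(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const { pattern, certainty } of rules) {
      const m = text.match(pattern);
      if (m) findings.push({ line: i + 1, match: m[0], certainty });
    }
  });
  return findings;
}
```

Only findings graded HIGH would be eligible for `/deslop apply`; MEDIUM and LOW stay report-only.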
/perf
Evidence-backed performance investigation
- 10-phase methodology with baselines and profiling
- Hypothesis generation and controlled experiments
- Breaking point analysis via binary search
- Based on recordings of real investigation sessions
$ /perf # Start new investigation
$ /perf --resume # Resume previous investigation
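The breaking-point analysis above reduces to a binary search over load. A minimal sketch, where `holdsAt` stands in for an actual load-test run against the target (not the /perf harness itself):

```typescript
// Binary-search the highest load (e.g. requests/sec) at which the
// system still meets its SLO. `holdsAt` is a stand-in for running a
// real controlled experiment at that load level.
function findBreakingPoint(
  holdsAt: (load: number) => boolean,
  lo: number,
  hi: number,
): number {
  // Invariant: the system holds at `lo` and fails at `hi`.
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (holdsAt(mid)) lo = mid;
    else hi = mid;
  }
  return lo; // highest load that still met the SLO
}
```

Each probe is one controlled experiment, so the breaking point is located in O(log n) runs instead of a linear sweep.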
/drift-detect
Find what's documented but not built
- AST-based plan vs code semantic analysis
- JavaScript collectors + single Opus call
- 77% token reduction vs multi-agent approaches
- Tested on 1,000+ repositories
$ /drift-detect # Full analysis
$ /drift-detect --depth quick # Quick scan
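Conceptually, the plan-vs-code comparison is a set difference: symbols the plan mentions minus symbols the code actually exports. A simplified sketch (the real collectors walk ASTs rather than comparing name lists):

```typescript
// Simplified drift detection: anything the plan promises that the
// codebase never exports is flagged as drift. Inputs here are plain
// name lists; the actual collectors extract symbols from the AST.
function findDrift(planSymbols: string[], exported: string[]): string[] {
  const built = new Set(exported);
  return planSymbols.filter((s) => !built.has(s));
}
```

The collectors produce these symbol sets mechanically; the single Opus call only judges the resulting diff, which is where the token savings come from.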
/audit-project
Multi-agent code review that iterates until clean
- Up to 10 specialized agents per project
- Security, performance, architecture, DB, API, frontend
- Iterates until zero open issues remain
- Auto-fixes all non-false-positive findings
$ /audit-project # Full review
$ /audit-project --domain security # Security only
/enhance
Analyze everything that shapes agent behavior
- 7 parallel analyzers for prompts, agents, plugins, docs
- Certainty-graded findings with auto-fix support
- Auto-learns false positives over time
- Hooks and skills analysis included
$ /enhance # Run all analyzers
$ /enhance --apply # Apply HIGH certainty fixes
/repo-intel
Unified static analysis for AI agents
- Git history intelligence: hotspots, coupling, ownership, bus factor
- AST symbols: exports, functions, classes, imports
- 9 plugins consume repo-intel data automatically
- Incremental updates, 20 query types
$ /repo-intel init # First-time scan
$ /repo-intel query hotspots # Most active files
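The hotspot query above can be approximated from raw git history: files touched by the most commits rank highest. A sketch with an illustrative commit shape, not repo-intel's actual schema:

```typescript
// Hotspot scoring sketch: count how many commits touched each file,
// then return the most frequently changed files. The commit shape
// here is illustrative, not repo-intel's stored format.
function hotspots(commits: { files: string[] }[], top: number): string[] {
  const counts = new Map<string, number>();
  for (const c of commits)
    for (const f of c.files) counts.set(f, (counts.get(f) ?? 0) + 1);
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([file]) => file);
}
```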
/sync-docs
Keep docs in sync with code
- Finds outdated references and stale examples
- Detects missing CHANGELOG entries
- Version mismatch detection
- Auto-fixes safe issues like version numbers
$ /sync-docs # Check what needs updates
$ /sync-docs apply # Apply safe fixes
/learn
Research any topic, build a learning guide
- Progressive discovery: broad to specific to deep
- Quality-scored sources (authority, recency, depth)
- Structured guide with examples and best practices
- RAG index for future agent lookups
$ /learn react hooks --depth=deep # Comprehensive
$ /learn kubernetes --depth=brief # Quick overview
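Source quality scoring along the three axes above can be sketched as a weighted sum. The weights and 0-to-1 axis scores are illustrative, not /learn's actual model:

```typescript
// Rank discovered sources by a weighted quality score. Weights and
// axis values (0..1) are hypothetical examples.
interface Source {
  url: string;
  authority: number; // e.g. official docs high, anonymous blog low
  recency: number;   // newer material scores higher
  depth: number;     // worked examples and detail score higher
}

function rankSources(sources: Source[]): Source[] {
  const score = (s: Source) =>
    0.4 * s.authority + 0.3 * s.recency + 0.3 * s.depth;
  return [...sources].sort((a, b) => score(b) - score(a));
}
```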
/consult
Get a second opinion from another AI tool
- Cross-tool AI consultation via ACP transport
- 6 providers: Claude, Gemini, Codex, Copilot, Kiro, OpenCode
- Effort-mapped model selection per provider
- Session continuations and context injection
$ /consult "Is this the right approach?" --tool=gemini # Second opinion
$ /consult "Review for performance" --tool=codex # Codex review
/debate
Structured adversarial debate between AI tools
- Multi-round proposer/challenger format
- Evidence-backed arguments with mandatory counterpoints
- Any two AI tools as debaters (Claude, Gemini, Codex, Kiro, etc.)
- Final verdict from the orchestrator
$ /debate codex vs gemini about microservices vs monolith # Structured debate
$ /debate claude vs kiro about our auth implementation # Codebase debate
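The multi-round format above can be sketched as an orchestration loop: the proposer speaks, the challenger must respond, and after the final round a judge renders the verdict. `ask` stands in for a call out to an AI tool; round count and role names are illustrative:

```typescript
// Sketch of a proposer/challenger debate loop. `ask` is a stand-in
// for invoking an AI tool with the transcript so far; the orchestrator
// alternates sides each round and asks a judge for the final verdict.
type Role = "proposer" | "challenger" | "judge";

function debate(
  ask: (role: Role, context: string[]) => string,
  topic: string,
  rounds: number,
): { transcript: string[]; verdict: string } {
  const transcript: string[] = [topic];
  for (let r = 0; r < rounds; r++) {
    transcript.push(ask("proposer", transcript));
    transcript.push(ask("challenger", transcript)); // mandatory counterpoint
  }
  return { transcript, verdict: ask("judge", transcript) };
}
```

Because each side sees the full transcript, every round's argument must engage the previous counterpoint rather than restate the opening position.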
/web-ctl
Browser automation for AI agents
- Headless Playwright with encrypted session persistence
- Human-in-the-loop auth handoff with CAPTCHA detection
- Anti-bot measures and output sanitization
- Snapshot-based accessibility tree for element discovery
$ /web-ctl goto https://example.com # Navigate
$ /web-ctl auth github --url https://github.com/login # Auth handoff
/prepare-delivery
Pre-ship quality gates
- Deslop, simplify, review loop, delivery validation, docs sync
- Conditional agnix + enhance for config changes
- Works standalone or as part of /next-task
- Does not ship - use /gate-and-ship for full pipeline
$ /prepare-delivery # Run all quality gates
$ /prepare-delivery --skip-review # Skip review loop
/gate-and-ship
Quality gates then ship
- Chains /prepare-delivery then /ship
- One command from code-complete to merged PR
- All flags forwarded to sub-commands
- Each piece runs independently too
$ /gate-and-ship # Full pipeline
$ /gate-and-ship --base=develop # Custom base branch
/release
Versioned release with ecosystem detection
- Auto-detects: npm, cargo, go, python, maven, gradle
- Discovers release tools (semantic-release, goreleaser, etc.)
- Pre-release health check with repo-intel
- Tag, publish, create GitHub release
$ /release # Create release
$ /release --dry-run # Preview without publishing
/skillers
Learn from your workflow patterns
- Reads transcripts from Claude Code, Codex, OpenCode
- Clusters patterns into themed knowledge
- Suggests skills, hooks, and agents to automate repetitive work
- No per-turn overhead - works from saved transcripts
$ /skillers # Analyze workflow patterns
$ /skillers compact # Compact transcripts into knowledge
/onboard
Codebase orientation for newcomers
- Automated project data collection
- Interactive guided tour of the codebase
- Identifies key files, patterns, conventions
- Works on any codebase - no setup required
$ /onboard # Full onboarding tour
$ /onboard --quick # Quick overview
/can-i-help
Find where to contribute
- Matches developer skills to project needs
- Finds test gaps, stale docs, open issues
- Good-first-task identification
- Uses repo-intel for data-driven suggestions
$ /can-i-help # Find contribution opportunities
$ /can-i-help --skills=typescript # Match specific skills
Built Different
Not another AI wrapper. Engineering-grade workflow automation.
Code does code work. AI does AI work.
Static analysis, regex, and AST for detection. LLMs only for synthesis and judgment. 77% fewer tokens than multi-agent approaches.
One agent, one job, done well
47 specialized agents, each with a narrow scope and clear success criteria. No agent tries to do everything.
Pipeline with gates
Each step must pass before the next begins. Can't push before review. Can't merge before CI. Hooks enforce it.
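The gating idea can be sketched as a phase runner that halts the moment a gate fails, so later phases can never run on unreviewed or red-CI work. Phase names here are illustrative, not the actual pipeline:

```typescript
// Sketch of a gated pipeline: each phase must pass before the next
// begins. The phases below are illustrative examples.
interface Phase {
  name: string;
  run: () => boolean; // true = gate passed
}

function runPipeline(
  phases: Phase[],
): { completed: string[]; failedAt: string | null } {
  const completed: string[] = [];
  for (const phase of phases) {
    if (!phase.run()) return { completed, failedAt: phase.name };
    completed.push(phase.name);
  }
  return { completed, failedAt: null };
}
```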
Validate plan and results
Approve the plan. See the results. The middle is automated. One approval unlocks autonomous execution.
Benchmarks
Structured prompts and enriched context do more for output quality than model tier.
Sonnet + AgentSys beats raw Opus
Sonnet + AgentSys: $0.66, 6,084 tokens, specific recommendations. Raw Opus: $1.10, 2,841 tokens, generic output. 40% cheaper, 2x more output.
Model tier matters less
With AgentSys, Sonnet matches Opus quality. Pipeline structure captures the gains. 73-83% cost reduction with equivalent outcomes.
Invest in pipeline, not model spend
Better prompts, richer context, enforced phases - these compound in ways that model upgrades alone don't. Tested on real tasks against glide-mq.
47 Agents. 40 Skills.
Right model for the task. Opus reasons. Sonnet validates. Haiku executes.
Deep codebase analysis and context gathering
Step-by-step implementation design
Autonomous code writing and modification
Performance investigation coordination
Deep performance analysis and profiling
Web research and learning guide creation
Multi-source plan synthesis and merging
Agent configuration quality analysis
CLAUDE.md file optimization
Documentation quality improvement
Git hooks and automation analysis
Prompt engineering best practices
Skill definition quality analysis
Structured adversarial debate coordination
Workflow pattern analysis and automation suggestions
Task source scanning and prioritization
Pre-ship quality gate validation
CI failure diagnosis and auto-repair
Test coverage analysis and gap detection
AI slop pattern detection and cleanup
Cross-file semantic analysis
Plugin configuration validation
Hot code path identification
Performance investigation logging
Performance hypothesis generation
Controlled experiment execution
Documentation sync and update
Agent config linting orchestration
Cross-tool AI consultation orchestration
Browser automation and session management
Transcript compaction into knowledge themes
Versioned release with ecosystem detection
Codebase orientation and guided onboarding
Git worktree creation and cleanup
CI pipeline status polling
Mechanical code fixes and formatting
Repo map structural validation
Code style and quality patterns
Security vulnerability detection
Runtime performance optimization
Test quality and coverage review
System architecture analysis
Database schema and query review
API design and consistency
Frontend patterns and accessibility
Backend architecture and scaling
CI/CD and infrastructure review
40 Skills across 19 Plugins
Get Started in 30 Seconds
Recommended
$ /plugin marketplace add agent-sh/agentsys
$ /plugin install next-task@agentsys
$ /plugin install ship@agentsys
Interactive installer for Claude Code, OpenCode, and Codex CLI
$ npm install -g agentsys && agentsys
Clone and install from source
$ git clone https://github.com/agent-sh/agentsys.git
$ cd agentsys
$ npm install