Feature Request: Improve Claude Code Token Management with MCP Servers
Problem Statement
Claude Code's current MCP server architecture creates significant workflow friction and inefficient resource utilization. All configured MCP servers load their complete tool schemas into the context at session initialization, consuming tokens regardless of actual usage.
Specific Issues:
- Static token overhead: 18.3k tokens (9.2% of context) consumed by unused AWS MCP servers
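- Configuration-time resource decisions: servers must be chosen before the session starts, even though discovery-driven workflows only reveal which tools are needed mid-session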
- Session restart required to modify MCP server availability
- Premature optimization pressure: users must choose between token efficiency and tool availability
Impact on Developer Workflow
Real development scenarios require dynamic tool access patterns that the current architecture cannot support:
- Mid-conversation discovery: A developer realizes they need AWS documentation while debugging, but the relevant MCP server wasn't loaded
- Context-dependent tooling: Different projects require different AWS services (Lambda vs CDK vs pricing analysis)
- Token budget management: 18k static overhead reduces effective context window by ~4-5k lines of code
- Workflow interruption: Restarting sessions to change MCP configuration breaks conversation continuity
Technical Root Cause
The system treats MCP servers as session-scoped heavyweight resources rather than on-demand lightweight services. Tool schema definitions are eagerly loaded rather than lazily initialized, violating efficient resource allocation principles.
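The difference between eager and lazy schema loading can be sketched as follows. This is a minimal illustration, not Claude Code's implementation: `LazyServer`, `fetch_schemas`, `eager_overhead`, and the 4-characters-per-token estimate are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def approx_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return len(text) // 4


@dataclass
class LazyServer:
    """Hypothetical wrapper: schemas are fetched on first use, not at session start."""
    name: str
    fetch_schemas: Callable[[], List[str]]
    _schemas: Optional[List[str]] = None

    def schemas(self) -> List[str]:
        # The first reference triggers the fetch; later calls reuse the result.
        if self._schemas is None:
            self._schemas = self.fetch_schemas()
        return self._schemas

    def context_tokens(self) -> int:
        # A server that was never referenced contributes nothing to the context.
        if self._schemas is None:
            return 0
        return sum(approx_tokens(s) for s in self._schemas)


def eager_overhead(servers: List[LazyServer]) -> int:
    """Cost when every configured server is loaded up front, as the issue describes."""
    return sum(approx_tokens(s) for srv in servers for s in srv.fetch_schemas())
```

Under this model, an unused server reports `context_tokens() == 0` until `schemas()` is first called, which is the behaviour the request asks for.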
Proposed Solutions
Primary: Runtime MCP Server Management
- Enable/disable servers within active sessions without configuration changes
- UI controls in the /mcp interface for real-time server toggling
- Tool schema loading/unloading on demand
- Preserve conversation context during server state changes
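A rough sketch of the intended semantics, assuming a hypothetical `Session` object (not an actual Claude Code type): toggling a server changes only the set of loaded schemas and never touches the conversation history. The per-server token costs are placeholders, since the issue reports only the 18.3k total.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Session:
    """Hypothetical session state: configured servers, enabled servers, history."""
    schema_tokens: Dict[str, int]                      # token cost per configured server
    enabled: Set[str] = field(default_factory=set)     # servers whose schemas are in context
    history: List[str] = field(default_factory=list)   # conversation turns

    def enable(self, server: str) -> None:
        # Only servers present in the configuration can be enabled.
        if server not in self.schema_tokens:
            raise KeyError(f"unknown server: {server}")
        self.enabled.add(server)

    def disable(self, server: str) -> None:
        # Disabling an already-disabled server is a no-op.
        self.enabled.discard(server)

    def mcp_overhead(self) -> int:
        # Only enabled servers count against the context window.
        return sum(self.schema_tokens[name] for name in self.enabled)


# Placeholder costs for illustration only.
session = Session(schema_tokens={"aws-documentation": 4000, "aws-cdk": 6000})
session.history.append("user: why does this Lambda time out?")
session.enable("aws-documentation")   # mid-conversation, no restart
session.disable("aws-documentation")  # history is untouched either way
```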
Secondary: Intelligent Tool Loading
- Lazy schema initialization: Load tool definitions only when first referenced
- Contextual server suggestions: Claude identifies and requests needed servers mid-conversation
- Automatic schema eviction: Unload unused tool definitions to reclaim tokens
- Token-aware prioritization: Prefer lightweight servers when context pressure exists
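One possible reading of automatic schema eviction is a least-recently-used policy under a token budget. The sketch below assumes that policy; `SchemaCache`, `touch`, and the budget figures are hypothetical and chosen only to illustrate the idea.

```python
from collections import OrderedDict
from typing import List


class SchemaCache:
    """Hypothetical LRU cache of loaded tool schemas, bounded by a token budget."""

    def __init__(self, budget_tokens: int) -> None:
        self.budget = budget_tokens
        self._loaded: "OrderedDict[str, int]" = OrderedDict()  # server -> token cost

    def used(self) -> int:
        return sum(self._loaded.values())

    def touch(self, server: str, cost: int) -> List[str]:
        """Record a use of `server`, loading it if needed; return the servers evicted."""
        if server in self._loaded:
            # Already loaded: just mark it as most recently used.
            self._loaded.move_to_end(server)
            return []
        evicted: List[str] = []
        # Evict least-recently-used servers until the new schema fits.
        # If `cost` alone exceeds the budget, everything else is evicted
        # and the server is still loaded.
        while self._loaded and self.used() + cost > self.budget:
            name, _ = self._loaded.popitem(last=False)
            evicted.append(name)
        self._loaded[server] = cost
        return evicted
```

A real implementation would also need to decide what happens to tool results already in the conversation when their schema is evicted; this sketch does not address that.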
Tertiary: Enhanced Configuration Scoping
- Session profiles: Quick-switch between predefined MCP server combinations
- Project-based auto-configuration: Automatically load relevant servers based on project type detection
- Usage analytics: Track MCP server utilization to inform configuration optimization
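Session profiles could be modelled as named sets of servers, with a switch computed as a set difference so that only the changed servers are loaded or unloaded. The profile names and the `switch_profile` helper below are hypothetical; the server names are the ones listed under Current Environment.

```python
from typing import Dict, Set, Tuple

# Hypothetical profiles; the groupings are illustrative, not a recommendation.
PROFILES: Dict[str, Set[str]] = {
    "serverless-terraform": {"aws-core", "aws-documentation"},
    "cdk": {"aws-core", "aws-cdk"},
    "cost-review": {"aws-pricing"},
}


def switch_profile(active: Set[str], profile: str) -> Tuple[Set[str], Set[str]]:
    """Return (servers to load, servers to unload) when moving to `profile`."""
    target = PROFILES[profile]
    return target - active, active - target


# Moving from a CDK session to the Terraform-focused profile keeps
# aws-core loaded and swaps only the servers that differ.
to_load, to_unload = switch_profile({"aws-core", "aws-cdk"}, "serverless-terraform")
```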
Success Criteria
- Zero-restart server management: Developers can enable the AWS documentation MCP server mid-conversation without session interruption
- Token efficiency: Unused servers consume zero context tokens
- Workflow preservation: MCP server changes maintain conversation history and context
- Predictable performance: Server loading/unloading operations complete within 2-3 seconds
Business Justification
This directly affects the productivity of developers adopting Claude Code:
- Reduced cognitive overhead: No need to predict entire toolchain requirements at session start
- Improved context utilization: Recover 9%+ of context window for actual code and conversation
- Enhanced user experience: Eliminate artificial workflow constraints that force suboptimal behavior
Current Environment
- Claude Code with global MCP server configuration
- AWS MCP servers: aws-core, aws-documentation, aws-cdk, aws-pricing
- Context usage: 89k/200k tokens with 18.3k MCP overhead
- Development focus: Serverless/Lambda with Terraform (not CDK)
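The figures above can be checked with a few lines of arithmetic. The sketch assumes that the 89k of used context already includes the 18.3k of MCP overhead, which the issue implies but does not state outright.

```python
CONTEXT_WINDOW = 200_000  # total context window, from the issue
CONTEXT_USED = 89_000     # tokens in use, assumed to include MCP overhead
MCP_OVERHEAD = 18_300     # tokens consumed by MCP tool schemas

share_of_window = MCP_OVERHEAD / CONTEXT_WINDOW  # the issue rounds this to 9.2%
share_of_used = MCP_OVERHEAD / CONTEXT_USED      # overhead as a share of current usage
free_now = CONTEXT_WINDOW - CONTEXT_USED
free_if_reclaimed = free_now + MCP_OVERHEAD

print(f"MCP overhead: {share_of_window:.2%} of the window, {share_of_used:.2%} of used context")
print(f"Free tokens: {free_now} now, {free_if_reclaimed} if the overhead were reclaimed")
```

On these numbers the overhead is a modest share of the full window but roughly a fifth of the context actually in use.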
Priority Classification
High Priority - This addresses a fundamental architectural constraint that forces users into inefficient resource allocation patterns, directly impacting the core value proposition of Claude Code as a development productivity tool.