Description
Problem
Currently, all tool definitions for all active MCP servers are loaded into the conversation context for every session, consuming significant token budget regardless of whether the tools are actually used.
Real-world example: With 7 MCP servers active, tool definitions consume 67,300 tokens (33.7% of 200k context budget) before any conversation begins.
Impact
- Context Pressure: Large token overhead limits conversation depth and file analysis
- Scalability: As MCP ecosystem grows, users must choose between functionality and context availability
- Inefficiency: Tools loaded but rarely used still consume tokens (e.g., GitHub MCP with 27 tools consumes ~18k tokens even in sessions that never touch GitHub)
Current Workaround
Manually enable/disable servers per session:
```
claude mcp remove github  # Disable when not needed
claude mcp add github     # Re-enable for specific tasks
```
Problems with workaround:
- Requires user intervention and manual tracking
- Disruptive to workflow (server connection overhead)
- Requires knowing tool needs in advance
- Requires session restart for changes to take effect (no mid-session toggling)
- Even with aggressive trimming (only 3 core servers), still consuming 42.6k tokens (21.3% of context)
Proposed Solution: Three-Tier Context Loading
Tier 1: Minimal Context (Always Loaded)
Purpose: Claude knows tools exist and what they do
Format: Server name + tool name + one-line description
Example:
docker-mcp:
- list-containers: List all Docker containers
- get-logs: Retrieve logs for a container
- create-container: Create new standalone container
- deploy-compose: Deploy Docker Compose stack
Token Cost: ~50-100 tokens per tool (current: 550-850 tokens per tool)
Savings: ~85-90% reduction in baseline context usage
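To make the Tier 1 transformation concrete, here is a minimal sketch of deriving a one-line entry from a full tool definition. The field names follow the MCP `tools/list` result shape (`name`, `description`, `inputSchema`); the ~4-characters-per-token figure is a rough heuristic, not a real tokenizer, and the sample definition is invented for illustration.

```python
# Sketch: derive a Tier 1 "minimal context" entry from a full MCP tool
# definition, and compare rough token footprints.
import json

def minimal_entry(tool: dict) -> str:
    """Tool name plus the first sentence of its description only."""
    one_liner = tool["description"].split(". ")[0].rstrip(".")
    return f'- {tool["name"]}: {one_liner}'

def estimate_tokens(text: str) -> int:
    """Crude ~4 chars/token heuristic, not a real tokenizer."""
    return max(1, len(text) // 4)

# Invented example definition in the MCP tools/list shape.
full_definition = {
    "name": "create-container",
    "description": "Create new standalone container. Accepts image, "
                   "ports, volumes, environment variables and restart policy.",
    "inputSchema": {  # the full JSON Schema normally shipped with every tool
        "type": "object",
        "properties": {
            "image": {"type": "string", "description": "Image reference"},
            "ports": {"type": "array", "items": {"type": "string"}},
            "env": {"type": "object"},
        },
        "required": ["image"],
    },
}

tier1 = minimal_entry(full_definition)
full_tokens = estimate_tokens(json.dumps(full_definition))
tier1_tokens = estimate_tokens(tier1)
print(tier1)
print(f"full: ~{full_tokens} tokens, tier 1: ~{tier1_tokens} tokens")
```

Only the Tier 1 line would sit in context by default; the `inputSchema` (the bulk of the footprint) stays out until requested.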
Tier 2: Full Definition (On-Demand)
Purpose: When Claude decides to use a tool, fetch complete parameter schema
Trigger: Claude attempts to use tool OR proactively fetches definition
Similar to: How MCP resources work (ListMcpResourcesTool → ReadMcpResourceTool)
Alternative approach: Allow Claude to call tools using only the minimal context, with the server returning actionable parameter errors when the call doesn't match the full schema.
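A sketch of how the Tier 1/Tier 2 split might look, mirroring the ListMcpResourcesTool → ReadMcpResourceTool pattern referenced above. `ToolRegistry` and its methods are hypothetical illustrations, not an existing Claude Code or MCP SDK API:

```python
# Hypothetical registry: Tier 1 listing always visible, Tier 2 schemas
# fetched on demand and cached for the rest of the session.
class ToolRegistry:
    def __init__(self, full_definitions: dict):
        self._full = full_definitions   # complete schemas, kept out of context
        self.loaded: dict = {}          # schemas actually pulled into context

    def list_tools(self) -> list:
        """Tier 1 view: name + one-line description only."""
        return [f"- {name}: {d['description']}"
                for name, d in self._full.items()]

    def fetch_definition(self, name: str) -> dict:
        """Tier 2: load the full schema the first time a tool is needed,
        then serve it from the session cache on later requests."""
        if name not in self.loaded:
            self.loaded[name] = self._full[name]
        return self.loaded[name]
```

By default Claude would see only the `list_tools()` output; `fetch_definition()` would be invoked either explicitly or transparently on first use.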
Tier 3: Extended Documentation (User-Provided)
Purpose: Detailed usage patterns, examples, best practices
Current State: Already possible via project context files
No changes needed: Users can load detailed docs on-demand when needed
Expected Benefits
- Context Efficiency:
  - Baseline MCP context: 67k → ~10k tokens (85% reduction)
  - Load full definitions only for tools actually used
- Scalability:
  - Users can enable 20+ MCP servers without context pressure
  - MCP ecosystem growth doesn't force tool/context tradeoffs
- Better UX:
  - No manual server management required
  - All tools available, minimal context cost
  - Natural discoverability (Tier 1 descriptions always visible)
Implementation Considerations
Tool Discovery:
- Claude should be able to list all available tools with descriptions
- Similar to how `claude mcp list` shows installed servers
Lazy Loading Mechanism:
- Option A: Explicit fetch tool (similar to `ReadMcpResourceTool`)
- Option B: Automatic fetch on first use (transparent to Claude)
- Option C: Error-driven (call fails → return schema → retry with correct params)
Caching:
- Once fetched, keep full definition in context for session
- Avoid repeated fetches for same tool
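The error-driven variant (Option C) could be sketched as follows: attempt the call with whatever arguments were inferred from the Tier 1 one-liner, and if required parameters are missing, return the full schema instead of failing opaquely, so the retry can be correct. All names here are illustrative, not an existing API:

```python
def call_with_schema_fallback(name, args, definitions, execute):
    """Option C sketch. `definitions` maps tool name -> full MCP
    definition (including inputSchema); `execute` performs the call."""
    schema = definitions[name].get("inputSchema", {})
    missing = [p for p in schema.get("required", []) if p not in args]
    if missing:
        # Pay the schema's token cost only when the first guess was wrong.
        return {"error": "missing required parameters",
                "missing": missing,
                "schema": schema}
    return execute(name, args)
```

This keeps the happy path schema-free: a correct first call never loads the full definition at all.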
Alternative Considered
Session Profiles: User-defined sets of servers to enable
```
claude mcp session docker-work  # Enables filesystem, git, docker
```
Why insufficient:
- Still loads all tool definitions for enabled servers
- User must predict tool needs in advance
- Doesn't solve context scaling problem
User Feedback
From user discussion:
"You generally know about all of the tools and what they do. When you need more information on how to use a tool, you pull the instruction files. Then use the tool if needed. You should only need enough context to decide IF you need to use the tool, then you can pull the specifics."
This mirrors how developers use API documentation - you know the API exists, you read the docs when you need to call it.
Context Budget Breakdown (Current State)
Configuration A: All 7 MCP servers active (unoptimized)
System prompt: 2.7k tokens (1.3%)
System tools: 14.4k tokens (7.2%)
MCP tools: 67.3k tokens (33.7%) ← Problem area
Custom agents: 0.4k tokens (0.2%)
Memory files: 1.7k tokens (0.9%)
Messages: 0.1k tokens (0.0%)
-------------------------------------------
Used: 86.6k tokens (43.3%)
Free: 113.4k tokens (56.7%)
Configuration B: Only 3 core MCP servers (aggressive optimization via manual workaround)
System prompt: 2.7k tokens (1.3%)
System tools: 14.4k tokens (7.2%)
MCP tools: 42.6k tokens (21.3%) ← Still consuming significant context
Custom agents: 0.4k tokens (0.2%)
Memory files: 1.7k tokens (0.9%)
Messages: 0.1k tokens (0.0%)
-------------------------------------------
Used: 62.8k tokens (31.4%)
Free: 137.2k tokens (68.6%)
Analysis: Even with only 3 MCP servers (filesystem, git, mcp-gateway), tool definitions consume 42.6k tokens. This demonstrates that the problem isn't just "too many servers" - it's the architectural overhead of loading full tool definitions upfront.
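As a sanity check on the headline projection, a back-of-the-envelope calculation using only the per-tool figures quoted earlier in this issue (taking midpoints of the 550-850 and 50-100 ranges is my assumption):

```python
# Rough arithmetic behind the projected baseline, from figures in this issue.
current_total = 67_300            # measured MCP tool context today
per_tool_full = (550 + 850) / 2   # midpoint of quoted full-definition cost
per_tool_min = (50 + 100) / 2     # midpoint of quoted Tier 1 cost

tool_count = current_total / per_tool_full   # implied number of tools
projected = tool_count * per_tool_min        # Tier 1-only baseline
reduction = 1 - projected / current_total
print(f"~{tool_count:.0f} tools, projected baseline ~{projected/1000:.1f}k "
      f"tokens ({reduction:.0%} reduction)")
```

This lands in the same range as the ~10k-token / 85% figures above; the exact outcome depends on how verbose each server's descriptions are.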
Requested Action
Implement lazy-loading for MCP tool definitions to reduce baseline context usage from ~67k to ~10k tokens while maintaining full functionality.
Additional Context
- MCP servers affected: All transport types (stdio, SSE)
- User environment: Linux, 7 active MCP servers (Docker, Filesystem, GitHub, Git, SSH, MCP Gateway w/ Memory/Fetch/Playwright)
- Testing conducted: 2025-11-10, validated 36.7% context reduction with manual workaround (67.3k → 42.6k tokens)