Description
Problem
Currently, all tool definitions for all active MCP servers are loaded into the conversation context for every session, consuming significant token budget regardless of whether the tools are actually used.
Real-world example: With 7 MCP servers active, tool definitions consume 67,300 tokens (33.7% of 200k context budget) before any conversation begins.
Impact
- Context Pressure: Large token overhead limits conversation depth and file analysis
- Scalability: As MCP ecosystem grows, users must choose between functionality and context availability
- Inefficiency: Tools loaded but rarely used still consume tokens (e.g., GitHub MCP with 27 tools consumes ~18k tokens even in sessions that never touch GitHub)
Current Workaround
Manually enable/disable servers per session:
```
claude mcp remove github  # Disable when not needed
claude mcp add github     # Re-enable for specific tasks
```
Problems with workaround:
- Requires user intervention and manual tracking
- Disruptive to workflow (server connection overhead)
- Requires knowing tool needs in advance
- Requires session restart for changes to take effect (no mid-session toggling)
- Even with aggressive trimming (only 3 core servers), still consuming 42.6k tokens (21.3% of context)
Proposed Solution: Three-Tier Context Loading
Tier 1: Minimal Context (Always Loaded)
Purpose: Claude knows tools exist and what they do
Format: Server name + tool name + one-line description
Example:
docker-mcp:
- list-containers: List all Docker containers
- get-logs: Retrieve logs for a container
- create-container: Create new standalone container
- deploy-compose: Deploy Docker Compose stack
Token Cost: ~50-100 tokens per tool (current: 550-850 tokens per tool)
Savings: ~85-90% reduction in baseline context usage
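To make the Tier 1 transformation concrete, here is a minimal sketch of deriving a one-line entry from a full tool definition. The field names follow the MCP `tools/list` result shape (`name`, `description`, `inputSchema`); the ~4-characters-per-token figure is a rough heuristic, not a real tokenizer, and the sample definition is invented for illustration.

```python
# Sketch: derive a Tier 1 "minimal context" entry from a full MCP tool
# definition, and compare rough token footprints.
import json

def minimal_entry(tool: dict) -> str:
    """Tool name plus the first sentence of its description only."""
    one_liner = tool["description"].split(". ")[0].rstrip(".")
    return f'- {tool["name"]}: {one_liner}'

def estimate_tokens(text: str) -> int:
    """Crude ~4 chars/token heuristic, not a real tokenizer."""
    return max(1, len(text) // 4)

# Invented example definition in the MCP tools/list shape.
full_definition = {
    "name": "create-container",
    "description": "Create new standalone container. Accepts image, "
                   "ports, volumes, environment variables and restart policy.",
    "inputSchema": {  # the full JSON Schema normally shipped with every tool
        "type": "object",
        "properties": {
            "image": {"type": "string", "description": "Image reference"},
            "ports": {"type": "array", "items": {"type": "string"}},
            "env": {"type": "object"},
        },
        "required": ["image"],
    },
}

tier1 = minimal_entry(full_definition)
full_tokens = estimate_tokens(json.dumps(full_definition))
tier1_tokens = estimate_tokens(tier1)
print(tier1)
print(f"full: ~{full_tokens} tokens, tier 1: ~{tier1_tokens} tokens")
```

Only the Tier 1 line would sit in context by default; the `inputSchema` (the bulk of the footprint) stays out until requested.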
Tier 2: Full Definition (On-Demand)
Purpose: When Claude decides to use a tool, fetch complete parameter schema
Trigger: Claude attempts to use tool OR proactively fetches definition
Similar to: How MCP resources work (ListMcpResourcesTool → ReadMcpResourceTool)
Alternative approach: Allow Claude to call tools using only the minimal context, with the server returning actionable parameter errors when the call doesn't match the full schema.
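A sketch of how the Tier 1/Tier 2 split might look, mirroring the ListMcpResourcesTool → ReadMcpResourceTool pattern referenced above. `ToolRegistry` and its methods are hypothetical illustrations, not an existing Claude Code or MCP SDK API:

```python
# Hypothetical registry: Tier 1 listing always visible, Tier 2 schemas
# fetched on demand and cached for the rest of the session.
class ToolRegistry:
    def __init__(self, full_definitions: dict):
        self._full = full_definitions   # complete schemas, kept out of context
        self.loaded: dict = {}          # schemas actually pulled into context

    def list_tools(self) -> list:
        """Tier 1 view: name + one-line description only."""
        return [f"- {name}: {d['description']}"
                for name, d in self._full.items()]

    def fetch_definition(self, name: str) -> dict:
        """Tier 2: load the full schema the first time a tool is needed,
        then serve it from the session cache on later requests."""
        if name not in self.loaded:
            self.loaded[name] = self._full[name]
        return self.loaded[name]
```

By default Claude would see only the `list_tools()` output; `fetch_definition()` would be invoked either explicitly or transparently on first use.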
Tier 3: Extended Documentation (User-Provided)
Purpose: Detailed usage patterns, examples, best practices
Current State: Already possible via project context files
No changes needed: Users can load detailed docs on-demand when needed
Expected Benefits
- Context Efficiency:
  - Baseline MCP context: 67k → ~10k tokens (85% reduction)
  - Load full definitions only for tools actually used
- Scalability:
  - Users can enable 20+ MCP servers without context pressure
  - MCP ecosystem growth doesn't force tool/context tradeoffs
- Better UX:
  - No manual server management required
  - All tools available, minimal context cost
  - Natural discoverability (Tier 1 descriptions always visible)
Implementation Considerations
Tool Discovery:
- Claude should be able to list all available tools with descriptions
- Similar to how `claude mcp list` shows installed servers
Lazy Loading Mechanism:
- Option A: Explicit fetch tool (similar to `ReadMcpResourceTool`)
- Option B: Automatic fetch on first use (transparent to Claude)
- Option C: Error-driven (call fails → return schema → retry with correct params)
Caching:
- Once fetched, keep full definition in context for session
- Avoid repeated fetches for same tool
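The error-driven variant (Option C) could be sketched as follows: attempt the call with whatever arguments were inferred from the Tier 1 one-liner, and if required parameters are missing, return the full schema instead of failing opaquely, so the retry can be correct. All names here are illustrative, not an existing API:

```python
def call_with_schema_fallback(name, args, definitions, execute):
    """Option C sketch. `definitions` maps tool name -> full MCP
    definition (including inputSchema); `execute` performs the call."""
    schema = definitions[name].get("inputSchema", {})
    missing = [p for p in schema.get("required", []) if p not in args]
    if missing:
        # Pay the schema's token cost only when the first guess was wrong.
        return {"error": "missing required parameters",
                "missing": missing,
                "schema": schema}
    return execute(name, args)
```

This keeps the happy path schema-free: a correct first call never loads the full definition at all.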
Alternative Considered
Session Profiles: User-defined sets of servers to enable
```
claude mcp session docker-work  # Enables filesystem, git, docker
```
Why insufficient:
- Still loads all tool definitions for enabled servers
- User must predict tool needs in advance
- Doesn't solve context scaling problem
User Feedback
From user discussion:
"You generally know about all of the tools and what they do. When you need more information on how to use a tool, you pull the instruction files. Then use the tool if needed. You should only need enough context to decide IF you need to use the tool, then you can pull the specifics."
This mirrors how developers use API documentation - you know the API exists, you read the docs when you need to call it.
Context Budget Breakdown (Current State)
Configuration A: All 7 MCP servers active (unoptimized)
System prompt: 2.7k tokens (1.3%)
System tools: 14.4k tokens (7.2%)
MCP tools: 67.3k tokens (33.7%) ← Problem area
Custom agents: 0.4k tokens (0.2%)
Memory files: 1.7k tokens (0.9%)
Messages: 0.1k tokens (0.0%)
-------------------------------------------
Used: 86.6k tokens (43.3%)
Free: 113.4k tokens (56.7%)
Configuration B: Only 3 core MCP servers (aggressive optimization via manual workaround)
System prompt: 2.7k tokens (1.3%)
System tools: 14.4k tokens (7.2%)
MCP tools: 42.6k tokens (21.3%) ← Still consuming significant context
Custom agents: 0.4k tokens (0.2%)
Memory files: 1.7k tokens (0.9%)
Messages: 0.1k tokens (0.0%)
-------------------------------------------
Used: 62.8k tokens (31.4%)
Free: 137.2k tokens (68.6%)
Analysis: Even with only 3 MCP servers (filesystem, git, mcp-gateway), tool definitions consume 42.6k tokens. This demonstrates that the problem isn't just "too many servers" - it's the architectural overhead of loading full tool definitions upfront.
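As a sanity check on the headline projection, a back-of-the-envelope calculation using only the per-tool figures quoted earlier in this issue (taking midpoints of the 550-850 and 50-100 ranges is my assumption):

```python
# Rough arithmetic behind the projected baseline, from figures in this issue.
current_total = 67_300            # measured MCP tool context today
per_tool_full = (550 + 850) / 2   # midpoint of quoted full-definition cost
per_tool_min = (50 + 100) / 2     # midpoint of quoted Tier 1 cost

tool_count = current_total / per_tool_full   # implied number of tools
projected = tool_count * per_tool_min        # Tier 1-only baseline
reduction = 1 - projected / current_total
print(f"~{tool_count:.0f} tools, projected baseline ~{projected/1000:.1f}k "
      f"tokens ({reduction:.0%} reduction)")
```

This lands in the same range as the ~10k-token / 85% figures above; the exact outcome depends on how verbose each server's descriptions are.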
Requested Action
Implement lazy-loading for MCP tool definitions to reduce baseline context usage from ~67k to ~10k tokens while maintaining full functionality.
Additional Context
- MCP servers affected: All transport types (stdio, SSE)
- User environment: Linux, 7 active MCP servers (Docker, Filesystem, GitHub, Git, SSH, MCP Gateway w/ Memory/Fetch/Playwright)
- Testing conducted: 2025-11-10, validated 36.7% context reduction with manual workaround (67.3k → 42.6k tokens)