-
Notifications
You must be signed in to change notification settings - Fork 7.2k
Description
Feature Request: Lazy Loading for MCP Servers and Tools
Problem Statement
Currently, Claude Code loads all configured MCP servers, tools, and agents at session startup, consuming significant context before any conversation begins. In my environment:
- MCP tools: 39.8k tokens (19.9%)
- Custom agents: 9.7k tokens (4.9%)
- System tools: 22.6k tokens (11.3%)
- Memory files: 36.0k tokens (18.0%)
- Total: ~108k tokens (54% of 200k limit)
This leaves only 92k tokens for actual conversation and work, severely limiting complex tasks.
Proposed Solution
Implement lazy loading for MCP servers and tools, loading them only when needed based on conversation context.
Core Features
-
Lightweight Registry System
- Load only a small index (~5k tokens) at startup
- Registry contains tool names, descriptions, and trigger keywords
- Tools load on-demand when keywords are detected
-
Intelligent Loading
- Analyze user input for relevant keywords
- Load only required tools for the task
- Cache loaded tools for session duration
- Preload related tools that commonly work together
-
Configuration Options
{ "optimization": { "lazyLoading": true, "maxInitialTokens": 5000, "autoLoadThreshold": 0.8, "cacheMinutes": 30 }, "mcpServers": { "example-server": { "command": "...", "lazyLoad": true, "triggers": ["keyword1", "keyword2"], "preloadWith": ["related-server"] } } }
Benefits
- 95% Token Reduction: From 108k to ~5k initial tokens
- Longer Conversations: 195k tokens available vs 92k currently
- Better Performance: Faster startup, lower memory usage
- Scalability: Can add more tools without context penalty
Implementation Suggestion
Phase 1: Basic Lazy Loading
- Add
lazyLoadflag to MCP server configs - Load registry instead of full tool definitions
- Implement on-demand loading when tool is called
Phase 2: Intelligent Preloading
- Keyword-based auto-loading
- Pattern recognition for common workflows
- Tool relationship mapping
Phase 3: Advanced Optimization
- Session-based learning of tool usage patterns
- Predictive preloading based on project context
- Dynamic unloading of unused tools
User Experience
Before (Current)
Starting session...
Loading 73 MCP tools... [39.8k tokens]
Loading 56 agents... [9.7k tokens]
Loading system tools... [22.6k tokens]
Ready with 92k tokens remaining.
After (With Lazy Loading)
Starting session...
Loading tool registry... [5k tokens]
Ready with 195k tokens available.
User: "I need to build a React component"
> Auto-loading: context7, magic [+3.5k tokens]
> 191.5k tokens remaining
Test Case
I've already built a proof-of-concept in ~/.claude/optimization/ that demonstrates:
- Registry generation from existing MCP configs
- Keyword extraction and mapping
- Lazy loader with pattern matching
- 95% token reduction achieved
Files available for reference:
tool-registry.json- Example registry structurelazy-loader.py- Proof of concept implementationgenerate-index.py- Registry generation logic
Priority
High - This is a critical limitation for power users with many tools and complex workflows. The current 54% context consumption at startup makes many advanced use cases impossible.
Similar Products
- VSCode: Lazy loads extensions based on file types and activation events
- JetBrains IDEs: Load plugins on-demand based on project type
- Vim/Neovim: Lazy loading plugins (lazy.nvim, vim-plug with lazy loading)
Contact
Happy to provide more details, test beta implementations, or share the proof-of-concept code.
Submitted by: dtaylor
Date: July 9, 2025
Claude Code Version: [Current Version]
Impact: Affects all users with multiple MCP servers configured