Production-ready Claude Code proxy supporting 9+ LLM providers with 60-80% cost reduction through token optimization.
Lynkr is a self-hosted proxy server that unlocks Claude Code CLI and Cursor IDE by enabling:
- 🌐 **Any LLM Provider** - Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Ollama (local), llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio
- 💰 **60-80% Cost Reduction** - Built-in token optimization with smart tool selection, prompt caching, and memory deduplication
- 🔒 **100% Local/Private** - Run completely offline with Ollama or llama.cpp
- 🎯 **Zero Code Changes** - Drop-in replacement for Anthropic's backend
- 🏢 **Enterprise-Ready** - Circuit breakers, load shedding, Prometheus metrics, health checks
Perfect for:
- Developers who want provider flexibility and cost control
- Enterprises needing self-hosted AI with observability
- Privacy-focused teams requiring local model execution
- Teams seeking 60-80% cost reduction through optimization
Lynkr reduces AI costs by 60-80% through intelligent token optimization.

Scenario: 100,000 API requests/month, averaging 50k input tokens and 2k output tokens per request. At a 60% token reduction, the $16,000/month Databricks bill below drops to $6,400.
| Provider | Without Lynkr | With Lynkr | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| Claude Sonnet 4.5 (Databricks) | $16,000 | $6,400 | $9,600 | $115,200 |
| GPT-4o (OpenRouter) | $12,000 | $4,800 | $7,200 | $86,400 |
| Ollama (Local) | Your current cloud API spend | $0 | $12,000+ | $144,000+ |
6 Token Optimization Phases (phase 1 is sketched in code after the list):

1. **Smart Tool Selection** (50-70% reduction)
   - Filters tools based on request type
   - Chat queries don't get file/git tools
   - Only sends relevant tools to the model
2. **Prompt Caching** (30-45% reduction)
   - Caches repeated prompts and system messages
   - Reuses context across conversations
   - Reduces redundant token usage
3. **Memory Deduplication** (20-30% reduction)
   - Removes duplicate conversation context
   - Compresses historical messages
   - Eliminates redundant information
4. **Tool Response Truncation** (15-25% reduction)
   - Truncates long tool outputs intelligently
   - Keeps only relevant portions
   - Reduces tool result tokens
5. **Dynamic System Prompts** (10-20% reduction)
   - Adapts prompts to request complexity
   - Shorter prompts for simple queries
   - Full prompts only when needed
6. **Conversation Compression** (15-25% reduction)
   - Summarizes old conversation turns
   - Keeps recent context detailed
   - Archives historical context
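To make phase 1 concrete, here is a minimal sketch of request-type-based tool filtering. Everything in it is illustrative: the category names, the keyword heuristic, and the `filterTools` helper are invented for this example and are not Lynkr's actual internals.

```js
// Hypothetical sketch of smart tool selection (phase 1).
// Classify a request, then forward only the tool definitions that
// category plausibly needs, trimming the input token count.

const TOOL_GROUPS = {
  chat: [],                                  // plain Q&A needs no tools
  code: ["read_file", "write_file", "grep"], // file-oriented work
  git:  ["git_status", "git_diff", "git_commit"],
};

function classifyRequest(prompt) {
  if (/\b(commit|branch|rebase|merge)\b/i.test(prompt)) return "git";
  if (/\b(file|function|refactor|bug|implement)\b/i.test(prompt)) return "code";
  return "chat";
}

function filterTools(prompt, allTools) {
  const allowed = new Set(TOOL_GROUPS[classifyRequest(prompt)]);
  return allTools.filter((t) => allowed.has(t.name));
}

// A chat question sends zero tool schemas upstream:
const tools = [{ name: "read_file" }, { name: "git_diff" }];
console.log(filterTools("What is a monad?", tools)); // []
```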
📖 Detailed Token Optimization Guide
- ✅ **Cloud Providers**: Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
- ✅ **Local Providers**: Ollama (free), llama.cpp (free), LM Studio (free)
- ✅ **Hybrid Routing**: Automatically route between local (fast/free) and cloud (powerful) models based on request complexity (see the routing sketch below)
- ✅ **Automatic Fallback**: Transparent failover if the primary provider is unavailable
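As a rough illustration of complexity-based routing: send short, simple prompts to a local provider and everything else to the cloud. The token threshold, the keyword check, and the `pickProvider` helper are invented for this sketch; consult the provider configuration guide for Lynkr's real routing options.

```js
// Hypothetical complexity heuristic for hybrid routing:
// cheap local model for simple requests, cloud model otherwise.

function estimateComplexity(messages) {
  const text = messages.map((m) => m.content).join(" ");
  const approxTokens = Math.ceil(text.length / 4); // ~4 chars per token
  const needsTools = /\b(refactor|debug|implement|analyze)\b/i.test(text);
  return approxTokens > 2000 || needsTools ? "high" : "low";
}

function pickProvider(messages) {
  return estimateComplexity(messages) === "low"
    ? { provider: "ollama", model: "qwen2.5-coder:latest" }   // free, local
    : { provider: "databricks", model: "claude-sonnet-4-5" }; // hypothetical cloud model id
}

console.log(pickProvider([{ content: "Rename this variable." }]));
// -> { provider: "ollama", model: "qwen2.5-coder:latest" }
```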
- 💰 **60-80% Token Reduction** - 6-phase optimization pipeline
- 💰 **$77k-$115k Annual Savings** - For typical enterprise usage (100k requests/month)
- 💰 **100% FREE Option** - Run completely locally with Ollama or llama.cpp
- 💰 **Hybrid Routing** - 65-100% cost savings by using local models for simple requests
- 🔒 **100% Local Operation** - Run completely offline with Ollama/llama.cpp
- 🔒 **Air-Gapped Deployments** - No internet required for local providers
- 🔒 **Self-Hosted** - Full control over your data and infrastructure
- 🔒 **Local Embeddings** - Private @Codebase search with Ollama/llama.cpp
- 🔒 **Policy Enforcement** - Git restrictions, test requirements, web fetch controls
- 🔒 **Sandboxing** - Optional Docker isolation for MCP tools
- 🏢 **Production-Ready** - Circuit breakers, load shedding, graceful shutdown
- 🏢 **Observability** - Prometheus metrics, structured logging, health checks
- 🏢 **Kubernetes-Ready** - Liveness, readiness, startup probes
- 🏢 **High Performance** - ~7µs overhead, 140K req/sec throughput
- 🏢 **Reliability** - Exponential backoff, automatic retries, error resilience
- 🏢 **Scalability** - Horizontal scaling, connection pooling, load balancing
- ✅ **Claude Code CLI** - Drop-in replacement for the Anthropic backend
- ✅ **Cursor IDE** - Full OpenAI API compatibility (requires Cursor Pro)
- ✅ **Continue.dev** - Works with any OpenAI-compatible client
- ✅ **Cline + VS Code** - Configure it like Cursor, via the OpenAI-compatible settings
- 🧠 **Long-Term Memory** - Titans-inspired memory system with surprise-based filtering
- 🧠 **Semantic Memory** - FTS5 search with multi-signal retrieval (recency, importance, relevance; see the scoring sketch below)
- 🧠 **Automatic Extraction** - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
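Multi-signal retrieval can be pictured as a weighted score over stored memories. The weights, field names, and `scoreMemory` helper below are hypothetical, shown only to make "recency, importance, relevance" concrete; they are not Lynkr's actual scoring function.

```js
// Hypothetical multi-signal memory scoring (recency, importance, relevance).
// Higher score = more likely to be injected into the prompt.

function scoreMemory(memory, queryRelevance, now = Date.now()) {
  const ageDays = (now - memory.createdAt) / 86_400_000;
  const recency = Math.exp(-ageDays / 30); // exponential decay over ~a month
  const importance = memory.importance;    // 0..1, assigned at extraction time
  const relevance = queryRelevance;        // 0..1, e.g. normalized FTS5 rank
  return 0.3 * recency + 0.2 * importance + 0.5 * relevance;
}

const m = { createdAt: Date.now() - 7 * 86_400_000, importance: 0.8 };
console.log(scoreMemory(m, 0.9).toFixed(2)); // recent + relevant -> high score
```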
- 🔧 **MCP Integration** - Automatic Model Context Protocol server discovery
- 🔧 **Tool Calling** - Full tool support with server and client execution modes
- 🔧 **Custom Tools** - Easy integration of custom tool implementations
- 📊 **Embeddings Support** - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- 📊 **Token Tracking** - Real-time usage monitoring and cost attribution
- 🎯 **Zero Code Changes** - Works with existing Claude Code CLI/Cursor setups
- 🎯 **Hot Reload** - Development mode with auto-restart
- 🎯 **Comprehensive Logging** - Structured logs with request ID correlation
- 🎯 **Easy Configuration** - Environment variables or .env file
- 🎯 **Docker Support** - docker-compose with GPU support
- 🎯 **400+ Tests** - Comprehensive test coverage for reliability
- ⚡ **Real-Time Streaming** - Token-by-token streaming for all providers
- ⚡ **Low Latency** - Minimal overhead (~7µs per request)
- ⚡ **High Throughput** - 140K requests/second capacity
- ⚡ **Connection Pooling** - Efficient connection reuse
- ⚡ **Prompt Caching** - LRU cache with SHA-256 keying (sketched below)
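To illustrate "LRU cache with SHA-256 keying": the cache key is a SHA-256 hash of the normalized prompt, and a `Map`'s insertion order provides cheap LRU eviction. The class and method names are invented for this sketch, not Lynkr's API.

```js
// Minimal sketch of an LRU prompt cache keyed by SHA-256.
const { createHash } = require("node:crypto");

class PromptCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // Map preserves insertion order -> cheap LRU
  }
  key(prompt) {
    return createHash("sha256").update(prompt.trim()).digest("hex");
  }
  get(prompt) {
    const k = this.key(prompt);
    if (!this.map.has(k)) return undefined;
    const v = this.map.get(k);
    this.map.delete(k); // re-insert to mark as most recently used
    this.map.set(k, v);
    return v;
  }
  set(prompt, response) {
    const k = this.key(prompt);
    this.map.delete(k);
    this.map.set(k, response);
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict least recent
    }
  }
}

const cache = new PromptCache(2);
cache.set("You are a helpful assistant.", "cached-system-context");
console.log(cache.get("You are a helpful assistant.")); // cache hit
```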
📖 Complete Feature Documentation
Option 1: NPM Package (Recommended)

```bash
# Install globally
npm install -g lynkr

# Or run directly with npx
npx lynkr
```

Option 2: Git Clone

```bash
# Clone repository
git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr

# Install dependencies
npm install

# Create .env from example
cp .env.example .env

# Edit .env with your provider credentials
nano .env

# Start server
npm start
```

Option 3: Homebrew (macOS/Linux)

```bash
brew tap vishalveerareddy123/lynkr
brew install lynkr
lynkr start
```

Option 4: Docker

```bash
docker-compose up -d
```

Lynkr supports 9+ LLM providers:
| Provider | Type | Models | Cost | Privacy |
|---|---|---|---|---|
| AWS Bedrock | Cloud | 100+ (Claude, Titan, Llama, Mistral, etc.) | Varies by model | Cloud |
| Databricks | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud |
| OpenRouter | Cloud | 100+ (GPT, Claude, Llama, Gemini, etc.) | Varies by model | Cloud |
| Ollama | Local | Unlimited (free, offline) | FREE | 🔒 100% Local |
| llama.cpp | Local | GGUF models | FREE | 🔒 100% Local |
| Azure OpenAI | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud |
| Azure Anthropic | Cloud | Claude models | $$$ | Cloud |
| OpenAI | Cloud | GPT-4o, o1, o3 | $$$ | Cloud |
| LM Studio | Local | Local models with GUI | FREE | 🔒 100% Local |
📖 Full Provider Configuration Guide
Configure Claude Code CLI to use Lynkr:

```bash
# Set Lynkr as backend
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy

# Run Claude Code
claude "Your prompt here"
```

That's it! Claude Code now uses your configured provider.
📖 Detailed Claude Code Setup
Configure Cursor IDE to use Lynkr:

1. Open Cursor Settings
   - Mac: `Cmd+,` | Windows/Linux: `Ctrl+,`
   - Navigate to: Features → Models
2. Configure OpenAI API Settings
   - API Key: `sk-lynkr` (any non-empty value)
   - Base URL: `http://localhost:8081/v1`
   - Model: `claude-3.5-sonnet` (or your provider's model)
3. Test It
   - Chat: `Cmd+L` / `Ctrl+L`
   - Inline edits: `Cmd+K` / `Ctrl+K`
   - @Codebase search: requires embeddings setup
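The same OpenAI-compatible surface can be exercised outside Cursor. A sketch, assuming the standard `/v1/chat/completions` path under the `/v1` base URL configured above:

```js
// Save as check.mjs and run with `node check.mjs` (Node 18+, global fetch).
const res = await fetch("http://localhost:8081/v1/chat/completions", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    authorization: "Bearer sk-lynkr", // any non-empty key, per the setup above
  },
  body: JSON.stringify({
    model: "claude-3.5-sonnet",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log((await res.json()).choices?.[0]?.message);
```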
📖 Full Cursor Setup Guide | Embeddings Configuration
- 📦 Installation Guide - Detailed installation for all methods
- ⚙️ Provider Configuration - Complete setup for all 9+ providers
- 🎯 Quick Start Examples - Copy-paste configs
- 🖥️ Claude Code CLI Setup - Connect Claude Code CLI
- 🎨 Cursor IDE Setup - Full Cursor integration with troubleshooting
- 🔍 Embeddings Guide - Enable @Codebase semantic search (4 options: Ollama, llama.cpp, OpenRouter, OpenAI)
- ✨ Core Features - Architecture, request flow, format conversion
- 🧠 Memory System - Titans-inspired long-term memory
- 💰 Token Optimization - 60-80% cost reduction strategies
- 🔧 Tools & Execution - Tool calling, execution modes, custom tools
- 🐳 Docker Deployment - docker-compose setup with GPU support
- 🔒 Production Hardening - Circuit breakers, load shedding, metrics
- 📖 API Reference - All endpoints and formats
- 🔧 Troubleshooting - Common issues and solutions
- ❓ FAQ - Frequently asked questions
- 🧪 Testing Guide - Running tests and validation
- 🌐 DeepWiki Documentation - AI-powered documentation search
- 💬 GitHub Discussions - Community Q&A
- 🐛 Report Issues - Bug reports and feature requests
- 📦 NPM Package - Official npm package
- ✅ **Multi-Provider Support** - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
- ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
- ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
- ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
- ✅ **Embeddings Support** - 4 options for @Codebase search: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- ✅ **MCP Integration** - Automatic Model Context Protocol server discovery and orchestration
- ✅ **Enterprise Features** - Circuit breakers, load shedding, Prometheus metrics, K8s health checks
- ✅ **Streaming Support** - Real-time token streaming for all providers
- ✅ **Memory System** - Titans-inspired long-term memory with surprise-based filtering
- ✅ **Tool Calling** - Full tool support with server and passthrough execution modes
- ✅ **Production Ready** - Battle-tested with 400+ tests, observability, and error resilience
```
┌──────────────────┐
│ Claude Code CLI  │  or Cursor IDE
└────────┬─────────┘
         │ Anthropic/OpenAI format
         ▼
┌──────────────────┐
│   Lynkr Proxy    │
│   Port: 8081     │
│                  │
│ • Format Conv.   │
│ • Token Optim.   │
│ • Provider Route │
│ • Tool Calling   │
│ • Caching        │
└────────┬─────────┘
         │
         ├─→ Databricks (Claude 4.5)
         ├─→ AWS Bedrock (100+ models)
         ├─→ OpenRouter (100+ models)
         ├─→ Ollama (local, free)
         ├─→ llama.cpp (local, free)
         ├─→ Azure OpenAI (GPT-4o, o1)
         ├─→ OpenAI (GPT-4o, o3)
         └─→ Azure Anthropic (Claude)
```
100% Local (FREE)

```bash
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:latest
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
npm start
```

AWS Bedrock (100+ models)

```bash
export MODEL_PROVIDER=bedrock
export AWS_BEDROCK_API_KEY=your-key
export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
npm start
```

OpenRouter (simplest cloud)

```bash
export MODEL_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-v1-your-key
npm start
```

📖 More Examples
We welcome contributions! Please see:
- Contributing Guide - How to contribute
- Testing Guide - Running tests
Apache 2.0 - See LICENSE file for details.
- ⭐ Star this repo if Lynkr helps you!
- 💬 Join Discussions - Ask questions, share tips
- 🐛 Report Issues - Bug reports welcome
- 📖 Read the Docs - Comprehensive guides

Made with ❤️ by developers, for developers.