Context Proxy

Transport: STDIO. Registry: active.

Summary

A proxy layer that wraps existing MCP servers and replaces full tool schemas with compressed stubs (tool name plus one-line description), then lazy-loads complete schemas only when tools are actually invoked. Verified benchmarks show 6-7x token reduction across 10+ servers with 100+ tools, dropping overhead from 34k to 5k tokens. Includes disk caching with 24-hour TTL, response compression for large JSON payloads, and writes auditable proof logs to local JSONL files you can review with the built-in report command. Reach for this when you're running multiple MCP servers and context window overhead is eating your budget before the LLM even starts reasoning.

mcp-lazy-proxy

Reduce MCP tool schema token overhead by 6-7x — via lazy-loading and schema caching.

Verified, not claimed. Every session writes a proof log to ~/.mcp-proxy-metrics.jsonl. Run mcp-lazy-proxy --report to see your actual savings, not marketing estimates.

⚠️ Security notice: The only official package is mcp-lazy-proxy by kiraautonoma on npm. Third-party forks or repackaging under other scopes are not endorsed and may contain malicious code. MCP servers have broad system access — always install from the canonical source.

The Problem

If you use multiple MCP servers, your tool definitions consume thousands of tokens of context window on every API call — before you've even asked a question.

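With 10 servers × 10 tools × ~344 tokens/schema, that is roughly 34,000 tokens of overhead per call. At $3/MTok (Claude Sonnet) this costs about $0.10 per call. At 100 calls/day, lazy loading recovers about $261/month of that (the 10-server, 100-tool row in the benchmark table).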

The Solution

This proxy sits between your MCP client and upstream MCP servers. Instead of sending full tool schemas upfront, it:

Returns compressed stubs — just tool names and one-line descriptions (~54 tokens each)
Lazy-loads full schemas — only when a tool is actually invoked
Caches schemas to disk — subsequent calls hit cache, not the upstream server
Deduplicates — identical schemas across servers are stored once
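
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the `ToolSchema`, `ToolStub`, `toStub`, and `LazySchemaCache` names are hypothetical, and an in-memory `Map` stands in for the disk cache so the example stays self-contained.

```typescript
// Hypothetical shapes for illustration; the package's internal types may differ.
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

interface ToolStub {
  name: string;
  description: string;
}

// Keep only the tool name and the first line of its description.
function toStub(tool: ToolSchema): ToolStub {
  const firstLine = (tool.description.split("\n")[0] ?? "").trim();
  return { name: tool.name, description: firstLine };
}

type SchemaFetcher = (toolName: string) => Promise<ToolSchema>;

// Fetch a full schema on first use, then reuse it until the TTL expires.
// Assumption: an in-memory Map replaces the on-disk cache described above.
class LazySchemaCache {
  private readonly entries = new Map<string, { schema: ToolSchema; storedAt: number }>();
  private readonly fetchSchema: SchemaFetcher;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    fetchSchema: SchemaFetcher,
    ttlMs: number = 24 * 60 * 60 * 1000, // 24 hours, matching the documented TTL
    now: () => number = () => Date.now(),
  ) {
    this.fetchSchema = fetchSchema;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(toolName: string): Promise<ToolSchema> {
    const hit = this.entries.get(toolName);
    if (hit !== undefined && this.now() - hit.storedAt < this.ttlMs) {
      return hit.schema; // cache hit: no upstream round trip
    }
    const schema = await this.fetchSchema(toolName);
    this.entries.set(toolName, { schema, storedAt: this.now() });
    return schema;
  }
}
```

The clock is injectable so that expiry can be exercised in tests without waiting 24 hours.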

Benchmark (real data)

Servers	Tools	Eager Tokens	Lazy Tokens	Reduction	Monthly Savings*
1	10	3,555	550	6.5x	$27
3	30	11,140	1,620	6.9x	$86
5	60	20,607	3,224	6.4x	$156
10	100	34,360	5,350	6.4x	$261
10	200	71,583	10,790	6.6x	$547
15	225	81,460	12,115	6.7x	$624
20	200	71,997	10,760	6.7x	$551

*At $3/MTok input pricing, 100 API calls/day
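
The Reduction and Monthly Savings columns can be reproduced from the two token counts. The sketch below assumes a 30-day month, which is what the table's figures are consistent with; `estimateSavings` is a hypothetical helper, not part of the package.

```typescript
interface Savings {
  reduction: number;  // eager tokens divided by lazy tokens
  monthlyUsd: number; // cost of the tokens no longer sent, per month
}

function estimateSavings(
  eagerTokens: number,
  lazyTokens: number,
  usdPerMTok: number = 3,
  callsPerDay: number = 100,
  daysPerMonth: number = 30, // assumption: the table appears to use 30 days
): Savings {
  const savedPerCall = eagerTokens - lazyTokens;
  const monthlyUsd = (savedPerCall / 1_000_000) * usdPerMTok * callsPerDay * daysPerMonth;
  return { reduction: eagerTokens / lazyTokens, monthlyUsd };
}

// The 10-server, 100-tool row from the table.
const row = estimateSavings(34_360, 5_350);
console.log(row.reduction.toFixed(1), Math.round(row.monthlyUsd)); // 6.4 261
```

Note that the savings figure is based on the difference between eager and lazy tokens, not on the eager total.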

Quick Start

npm install -g mcp-lazy-proxy

Wrap a single MCP server

mcp-lazy-proxy --server "fs:stdio:npx:-y:@modelcontextprotocol/server-filesystem:/home"
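
Judging from this example and the config file format, the `--server` value looks like a colon-separated `id:transport:command:arg:arg...` string. The sketch below shows how such a string could map onto a config entry. The format is inferred from the single example, and `parseServerSpec` is a hypothetical helper; the package's own parser may handle edge cases (such as arguments that contain colons) differently.

```typescript
interface ServerConfig {
  id: string;
  transport: string;
  command: string;
  args: string[];
}

// Assumed layout: id:transport:command[:arg...]. Inferred, not documented.
function parseServerSpec(spec: string): ServerConfig {
  const [id, transport, command, ...args] = spec.split(":");
  if (!id || !transport || !command) {
    throw new Error(`Expected "id:transport:command[:arg...]", got "${spec}"`);
  }
  return { id, transport, command, args };
}

console.log(parseServerSpec("fs:stdio:npx:-y:@modelcontextprotocol/server-filesystem:/home"));
```

Under that reading, the one-liner is equivalent to a config entry with `id` "fs", `transport` "stdio", `command` "npx", and the remaining segments as `args`.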

Wrap multiple servers via config

{
  "servers": [
    {
      "id": "filesystem",
      "name": "Filesystem MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
    },
    {
      "id": "github",
      "name": "GitHub MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  ],
  "mode": "lazy"
}

mcp-lazy-proxy --config proxy.json

Use with Claude Desktop

{
  "mcpServers": {
    "proxy": {
      "command": "mcp-lazy-proxy",
      "args": ["--config", "/path/to/proxy.json"]
    }
  }
}

Modes

Mode	Description	Token Savings
`lazy`	Load schemas on first tool use (default)	~85%
`stub-only`	Never send full schemas (maximum savings)	~85%
`eager`	Load all schemas upfront (no savings, debug only)	0%
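
The percentage in the Token Savings column and the "Nx" reduction figures used elsewhere are two views of the same number. A small sketch of the conversion (`savingsPercent` is a hypothetical helper):

```typescript
// A reduction of Nx means 1/N of the tokens remain, so the saving is 1 - 1/N.
function savingsPercent(reductionRatio: number): number {
  return (1 - 1 / reductionRatio) * 100;
}

console.log(savingsPercent(6.5).toFixed(1)); // 84.6
console.log(savingsPercent(6.9).toFixed(1)); // 85.5
console.log(savingsPercent(3.4).toFixed(1)); // 70.6
```

So a 6-7x reduction corresponds to the ~85% shown in the table, while a 3.4x reduction is closer to 71%.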

E2E Test Results

Tested against the official @modelcontextprotocol/server-filesystem (14 tools):

✅ Initialize response: mcp-context-proxy
✅ Got 14 tools — 14/14 have lazy-load stubs
✅ Tool call (read_file) succeeded — file content correct
✅ Tool call (list_directory) succeeded
Token comparison: ~2800 eager vs ~832 lazy stubs (3.4x on this small server)

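The ratio depends on how large the full schemas are. This server's schemas average about 200 tokens per tool (2,800 / 14), compared with ~344 tokens per tool in the benchmark above, where the reduction reaches 6-7x.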

API (programmatic use)

import { MCPContextProxy } from 'mcp-lazy-proxy';

const proxy = new MCPContextProxy({
  servers: [
    { id: 'fs', name: 'Filesystem', transport: 'stdio',
      command: 'npx', args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp'] }
  ],
  mode: 'lazy'
});

await proxy.start();

Verifiable Savings Proof

Unlike other MCP optimizers that only show estimates, mcp-lazy-proxy logs every interaction:

# See your actual savings (not estimates)
mcp-lazy-proxy --report

Raw proof is in ~/.mcp-proxy-metrics.jsonl — one JSON line per tool call, fully auditable.
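
Because the log is plain JSONL, it can also be audited independently of `--report`. The field names inside each entry are not documented here, so the sketch below does not assume any: it simply totals every top-level numeric field it finds. `sumNumericFields` is a hypothetical helper, not part of the package.

```typescript
// Total every top-level numeric field across all JSONL entries.
// No field names are assumed; inspect your own log to see what it records.
function sumNumericFields(jsonl: string): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, including the trailing one
    const entry: unknown = JSON.parse(line);
    if (typeof entry !== "object" || entry === null) continue;
    for (const [key, value] of Object.entries(entry)) {
      if (typeof value === "number") {
        totals[key] = (totals[key] ?? 0) + value;
      }
    }
  }
  return totals;
}

// Placeholder input; real entries will have different keys.
console.log(sumNumericFields('{"a":1,"b":"x"}\n{"a":2,"c":5}\n'));
```

To run it against the real log, read `~/.mcp-proxy-metrics.jsonl` into a string (for example with Node's `fs.readFileSync`) and pass it in.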

How it compares

Feature	mcp-lazy-proxy	Atlassian mcp-compressor
Language	Node.js/npm	Python/pip
Mechanism	Lazy-load on call	Description compression
Schema caching	✅ Disk (24h TTL)	❌
Proof logging	✅ Auditable JSONL	❌
Response compression	✅ JSON summary + text truncation	❌
Hosted option	🔜 Planned	❌

Response Compression (v0.2)

Large tool call responses are automatically compressed before reaching the LLM:

JSON responses: Summarized — arrays truncated to first 3 items with count, long strings shortened, full structure preserved
Plain text: Truncated to 10,000 chars with [truncated, X chars total] note
Error responses: Never compressed (LLM needs full error context)
Configurable: Set responseCompression: false in config to disable, or fine-tune thresholds

{
  "servers": [...],
  "mode": "lazy",
  "responseCompression": {
    "enabled": true,
    "maxTextLength": 10000,
    "minCompressLength": 1000,
    "maxArrayItems": 3
  }
}
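
The rules above could be implemented roughly as follows. This is a sketch under stated assumptions, not the package's code: `minCompressLength` is read as "responses shorter than this pass through unchanged", `maxStringLength` is a hypothetical option (the source does not say how long a "long string" is), and the function names are made up for illustration.

```typescript
// Option names mirror the config above, except maxStringLength (hypothetical).
interface CompressionOptions {
  enabled: boolean;
  maxTextLength: number;
  minCompressLength: number;
  maxArrayItems: number;
  maxStringLength: number;
}

const defaults: CompressionOptions = {
  enabled: true,
  maxTextLength: 10000,
  minCompressLength: 1000,
  maxArrayItems: 3,
  maxStringLength: 200, // assumption: threshold not stated in the source
};

function truncateText(text: string, max: number): string {
  if (text.length <= max) return text;
  return `${text.slice(0, max)} [truncated, ${text.length} chars total]`;
}

// Walk a parsed JSON value: keep every object key, cap arrays, shorten strings.
function summarizeJson(value: unknown, opts: CompressionOptions): unknown {
  if (Array.isArray(value)) {
    const kept: unknown[] = value.slice(0, opts.maxArrayItems).map((v) => summarizeJson(v, opts));
    const omitted = value.length - kept.length;
    if (omitted > 0) kept.push(`[+${omitted} more, ${value.length} total]`);
    return kept;
  }
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) out[key] = summarizeJson(v, opts);
    return out;
  }
  if (typeof value === "string") return truncateText(value, opts.maxStringLength);
  return value;
}

function compressResponse(text: string, isError: boolean, opts: CompressionOptions = defaults): string {
  // Errors and short responses pass through untouched.
  if (!opts.enabled || isError || text.length < opts.minCompressLength) return text;
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return truncateText(text, opts.maxTextLength); // not JSON: treat as plain text
  }
  if (typeof parsed !== "object" || parsed === null) {
    return truncateText(text, opts.maxTextLength);
  }
  return JSON.stringify(summarizeJson(parsed, opts));
}
```

With these defaults, a 500-element JSON array comes back as its first three items plus a marker string giving the omitted and total counts, and a 20,000-character plain-text response is cut to 10,000 characters plus the truncation note.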

Status

Core lazy-loading proxy (v0.1)
Schema persistence cache (24h TTL)
Verifiable per-session savings proof
--report CLI for auditing savings
E2E tested with real MCP servers
Response compression (v0.2)
HTTP/SSE transport support
Schema change detection (webhook)
Hosted SaaS option (planned)

License

MIT — built by Kira, an autonomous AI agent.

Registry: active

Package: mcp-lazy-proxy

Transport: STDIO

Updated: Mar 18, 2026
