Why mcpzip?

Every MCP server you add to Claude dumps all its tool schemas into the context window. That sounds harmless until you realize what happens at scale.

The Problem in Numbers

Say you use 5 MCP servers, each exposing 30 tools. That is 150 tool schemas loaded into Claude's context on every single message.

| Metric | Value |
| --- | --- |
| Average tool schema size | ~350 tokens |
| Tools across 5 servers | 150 |
| Total tool overhead | 52,500 tokens |
| Claude's context window | 200,000 tokens |
| Context consumed by tools alone | 26.3% |

Now add 5 more servers. You are at 300 tools, 105,000 tokens, and over half your context is gone before the conversation starts.
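
The arithmetic above can be reproduced with a short sketch. The ~350-token average and the 200,000-token window are the figures quoted on this page; the helper name `tool_overhead` is a hypothetical one chosen for illustration and is not part of mcpzip.

```python
# Illustrative only: constants mirror the figures quoted above,
# and tool_overhead is a hypothetical helper, not an mcpzip API.
AVG_SCHEMA_TOKENS = 350     # approximate average tool schema size
CONTEXT_WINDOW = 200_000    # Claude's context window, in tokens


def tool_overhead(servers: int, tools_per_server: int) -> tuple[int, int, float]:
    """Return (tool count, schema tokens, share of the context window)."""
    tools = servers * tools_per_server
    tokens = tools * AVG_SCHEMA_TOKENS
    return tools, tokens, tokens / CONTEXT_WINDOW


for servers in (5, 10):
    tools, tokens, share = tool_overhead(servers, 30)
    print(f"{servers} servers: {tools} tools, {tokens:,} tokens, {share:.2%} of context")
```

Swapping in your own server and tool counts gives a rough estimate of how much context is spent before the first message.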

The Real Cost

Context window tokens are not free. They increase latency, reduce the space available for your actual conversation, and degrade the model's tool selection accuracy. Research shows LLMs make worse tool choices when presented with more than ~60 options.

The Analogy

Think of it this way:

Without mcpzip: Every employee in a 500-person company introduces themselves to every visitor, reciting their full job description and qualifications. The visitor forgets most of it, gets confused, and ends up talking to the wrong person.

With mcpzip: A receptionist greets the visitor. "Who are you looking for?" The visitor says "someone who can help with payroll." The receptionist directs them to exactly the right person.

mcpzip is the receptionist. It replaces hundreds of self-introductions with three simple interactions: search, describe, execute.

What mcpzip Does

  - Context Compression -- Replace hundreds of tool schemas with just 3 meta-tools. 99%+ token savings.
  - Smart Search -- Keyword + LLM-powered semantic search across all your tools.
  - Instant Startup -- Serves from disk cache immediately. Background refresh is non-blocking.
  - All Transports -- stdio, HTTP (Streamable HTTP), and legacy SSE. Connect to any MCP server.
  - OAuth 2.1 -- Browser-based PKCE flow with token persistence. Reuses mcp-remote tokens.
  - Connection Pool -- Lazy connects, idle timeout, automatic reconnection. Zero manual management.
  - Single Binary -- ~5.8MB static binary. No runtime dependencies. Just download and run.
  - Auto-Migration -- Import your existing Claude Code MCP config in one command.
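
To make the connection-pool behavior concrete (lazy connects, idle timeout, automatic reconnection), here is a minimal sketch of the general pattern. It is not mcpzip's implementation: the `LazyPool` class, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class LazyPool:
    """Hypothetical pool: connect on first use, drop idle connections, reconnect on demand."""

    def __init__(self, connectors: dict[str, Callable[[], object]],
                 idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._connectors = connectors   # server name -> function that opens a connection
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._live: dict[str, tuple[object, float]] = {}  # name -> (connection, last used)

    def get(self, server: str) -> object:
        self.evict_idle()
        entry = self._live.get(server)
        # Lazy connect: nothing is opened until a server is actually needed.
        conn = entry[0] if entry else self._connectors[server]()
        self._live[server] = (conn, self._clock())
        return conn

    def evict_idle(self) -> None:
        now = self._clock()
        for name, (conn, last_used) in list(self._live.items()):
            if now - last_used > self._idle_timeout:
                close = getattr(conn, "close", None)
                if callable(close):
                    close()
                del self._live[name]  # the next get() reconnects transparently
```

The clock is injectable only so the timeout logic can be exercised without waiting; a real pool would also need locking if it is shared across threads.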

Try the Calculator

See how many tokens mcpzip saves for your setup:

Token Savings Calculator

Example setup: 5 servers with 25 tools each (125 tools).

| | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Tools loaded | 125 | 3 |
| Token overhead | 43,750 tokens | 1,200 tokens |

Savings: 42,550 tokens (97%).
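
The same calculation can be run for any setup with a few lines of code. This sketch assumes the figures quoted on this page (about 350 tokens per tool schema, about 1,200 tokens for the 3 meta-tools); `token_savings` is a hypothetical helper, not an mcpzip command.

```python
# Hypothetical calculator using the approximate figures quoted on this page.
AVG_SCHEMA_TOKENS = 350          # tokens per tool schema (approximate)
MCPZIP_OVERHEAD_TOKENS = 1_200   # tokens for the 3 meta-tools (approximate)


def token_savings(servers: int, tools_per_server: int) -> dict:
    """Compare schema token overhead with and without the 3 meta-tools."""
    tools = servers * tools_per_server
    without = tools * AVG_SCHEMA_TOKENS
    saved = without - MCPZIP_OVERHEAD_TOKENS
    return {
        "tools": tools,
        "without": without,
        "with": MCPZIP_OVERHEAD_TOKENS,
        "saved": saved,
        # Whole-number percentage; 0 when there are no tools to compress.
        "savings_pct": round(100 * saved / without) if without else 0,
    }


print(token_savings(5, 25))
```

For very small setups the `saved` value can come out negative, since a handful of schemas may cost less than the meta-tools themselves.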

Before and After

| | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Tools loaded per message | All 500+ | Just 3 |
| Token overhead | 175,000+ | 1,200 |
| Startup time | 2-10 seconds | < 5 milliseconds |
| Adding a new server | More context bloat | Zero impact |
| Tool selection accuracy | Degrades with scale | Consistent |
| Search | No | Yes |
| Connection pooling | No | Yes |
| Idle timeout management | No | Yes |
| Background catalog refresh | No | Yes |

How does semantic search work?

When you configure a Gemini API key, mcpzip runs two search strategies in parallel:

  1. Keyword search -- fast, token-based matching against tool names, descriptions, and parameters. Great for direct queries like "slack send message".

  2. LLM semantic search -- sends the query and a compact tool catalog to Gemini, which understands natural language intent. Great for queries like "help me schedule a meeting" or "find something to track my tasks".

Results from both are merged, deduplicated, and cached. The semantic search adds ~200-500ms latency but dramatically improves result quality for natural language queries.

What is context window compression?

Context window compression means reducing the number of tokens consumed by tool definitions in the AI's context window.

Without compression, every tool's full JSON Schema is sent to the model on every message. mcpzip compresses this by replacing all tool schemas with 3 meta-tools, and serving full schemas on demand via the describe_tool meta-tool.

The result: instead of 175,000 tokens for 500 tools, you use ~1,200 tokens. That is a 99.3% reduction.
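
A minimal sketch of the meta-tool pattern makes the idea concrete: the model sees only three small tools, and full schemas are fetched one at a time. Only `describe_tool` is named on this page; `search_tools`, `execute_tool`, the catalog layout, and the handler are hypothetical names used for illustration.

```python
# Hypothetical upstream catalog. In a real proxy this would be built from each
# server's tool listing; the handler stands in for forwarding the call upstream.
CATALOG = {
    "slack__send_message": {
        "description": "Send a message to a Slack channel",
        "schema": {
            "type": "object",
            "properties": {"channel": {"type": "string"}, "text": {"type": "string"}},
            "required": ["channel", "text"],
        },
        "handler": lambda args: f"sent to {args['channel']}: {args['text']}",
    },
}


def search_tools(query: str) -> list[dict]:
    """Return compact matches: name and description only, no schema."""
    words = query.lower().split()
    return [
        {"name": name, "description": tool["description"]}
        for name, tool in CATALOG.items()
        if any(w in f"{name} {tool['description']}".lower() for w in words)
    ]


def describe_tool(name: str) -> dict:
    """Return the full JSON Schema for one tool, on demand."""
    return CATALOG[name]["schema"]


def execute_tool(name: str, arguments: dict):
    """Check required arguments, then forward the call to the underlying tool."""
    tool = CATALOG[name]
    missing = [k for k in tool["schema"].get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return tool["handler"](arguments)
```

A typical exchange is `search_tools("slack")`, then `describe_tool("slack__send_message")`, then `execute_tool(...)` with arguments that match the returned schema, so only the schemas actually needed ever enter the context.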

Who Is mcpzip For?

| You should use mcpzip if... | You might not need mcpzip if... |
| --- | --- |
| You have 3+ MCP servers | You use only 1-2 servers |
| Your servers have 50+ tools total | Your servers have fewer than 20 tools total |
| You want faster response times | Context overhead is not a concern |
| You use Claude Code daily | You rarely use MCP tools |
| You value clean context windows | You prefer direct tool access |

Even With 2 Servers

Even with just 2 servers, mcpzip's search capability, connection pooling, and instant startup can be worthwhile. The main question is whether your total tool count is high enough to benefit from context compression.