Why mcpzip?
Every MCP server you add to Claude dumps all its tool schemas into the context window. That sounds harmless until you realize what happens at scale.
The Problem in Numbers
Say you use 5 MCP servers, each exposing 30 tools. That is 150 tool schemas loaded into Claude's context on every single message.
| Metric | Value |
|---|---|
| Average tool schema size | ~350 tokens |
| Tools across 5 servers | 150 |
| Total tool overhead | 52,500 tokens |
| Claude's context window | 200,000 tokens |
| Context consumed by tools alone | 26.3% |
Now add 5 more servers. You are at 300 tools, 105,000 tokens, and over half your context is gone before the conversation starts.
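The arithmetic behind these numbers can be reproduced with a short script, using the ~350-token average schema size and 200K context window quoted above:

```python
AVG_SCHEMA_TOKENS = 350      # average tool schema size, per the table above
CONTEXT_WINDOW = 200_000     # Claude's context window

def tool_overhead(servers: int, tools_per_server: int) -> tuple[int, float]:
    """Return (total schema tokens, fraction of the context window consumed)."""
    total_tokens = servers * tools_per_server * AVG_SCHEMA_TOKENS
    return total_tokens, total_tokens / CONTEXT_WINDOW

for servers in (5, 10):
    tokens, share = tool_overhead(servers, 30)
    print(f"{servers} servers: {tokens:,} tokens ({share * 100:.1f}% of context)")
```

At 5 servers the overhead is 52,500 tokens (26.25% of context); at 10 it is 105,000 tokens (52.5%).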
Context window tokens are not free. They increase latency, shrink the space left for your actual conversation, and degrade the model's tool selection accuracy: published evaluations suggest LLMs choose tools less reliably once they are presented with more than roughly 60 options.
The Analogy
Think of it this way:
Without mcpzip: Every employee in a 500-person company introduces themselves to every visitor, reciting their full job description and qualifications. The visitor forgets most of it, gets confused, and ends up talking to the wrong person.
With mcpzip: A receptionist greets the visitor. "Who are you looking for?" The visitor says "someone who can help with payroll." The receptionist directs them to exactly the right person.
mcpzip is the receptionist. It replaces hundreds of self-introductions with three simple interactions: search, describe, execute.
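Those three interactions can be sketched as request payloads. Note that only `describe_tool` is named elsewhere in these docs; `search_tools` and `execute_tool`, the tool name `payroll_server.run_report`, and all payload shapes are illustrative assumptions, not mcpzip's actual API:

```python
# Hypothetical payloads for the three meta-tool interactions.
# Only describe_tool is a confirmed name; the rest is illustrative.

# 1. Search: find candidate tools by intent instead of loading all schemas.
search = {"tool": "search_tools",
          "arguments": {"query": "someone who can help with payroll"}}

# 2. Describe: pull the full JSON Schema for one promising candidate.
describe = {"tool": "describe_tool",
            "arguments": {"name": "payroll_server.run_report"}}

# 3. Execute: call the underlying tool through the proxy.
execute = {"tool": "execute_tool",
           "arguments": {"name": "payroll_server.run_report",
                         "input": {"month": "2024-06"}}}

for step in (search, describe, execute):
    print(step["tool"])
```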
What mcpzip Does
mcpzip sits between Claude and your MCP servers as a single aggregating server. Claude sees only the three meta-tools; mcpzip forwards search, describe, and execute requests to the right upstream server and serves full tool schemas only on demand.
How does semantic search work?
When you configure a Gemini API key, mcpzip runs two search strategies in parallel:
- Keyword search -- fast, token-based matching against tool names, descriptions, and parameters. Great for direct queries like "slack send message".
- LLM semantic search -- sends the query and a compact tool catalog to Gemini, which understands natural language intent. Great for queries like "help me schedule a meeting" or "find something to track my tasks".
Results from both are merged, deduplicated, and cached. The semantic search adds ~200-500ms latency but dramatically improves result quality for natural language queries.
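A minimal sketch of that parallel merge-and-dedupe flow, assuming a catalog of `{name, description}` dicts and with the Gemini call stubbed out (mcpzip's real internals may differ):

```python
from concurrent.futures import ThreadPoolExecutor

def keyword_search(query: str, catalog: list[dict]) -> list[dict]:
    """Fast token-based match against tool names and descriptions."""
    terms = query.lower().split()
    return [t for t in catalog
            if any(term in f"{t['name']} {t['description']}".lower()
                   for term in terms)]

def semantic_search(query: str, catalog: list[dict]) -> list[dict]:
    """Stub: the real strategy sends the query plus a compact tool
    catalog to Gemini and parses ranked tool names from the response."""
    return []

def hybrid_search(query: str, catalog: list[dict]) -> list[dict]:
    # Run both strategies in parallel, then merge and deduplicate by name.
    with ThreadPoolExecutor(max_workers=2) as pool:
        kw = pool.submit(keyword_search, query, catalog)
        sem = pool.submit(semantic_search, query, catalog)
    seen: set[str] = set()
    merged = []
    for tool in kw.result() + sem.result():
        if tool["name"] not in seen:
            seen.add(tool["name"])
            merged.append(tool)
    return merged

catalog = [
    {"name": "slack.send_message", "description": "Send a Slack message"},
    {"name": "jira.create_issue", "description": "Create a Jira issue"},
]
print(hybrid_search("slack send message", catalog))
```

Running both strategies concurrently means the keyword results never wait on the slower Gemini round trip; the dedupe pass keeps the first occurrence of each tool name.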
What is context window compression?
Context window compression means reducing the number of tokens consumed by tool definitions in the AI's context window.
Without compression, every tool's full JSON Schema is sent to the model on every message. mcpzip compresses this by replacing all tool schemas with 3 meta-tools, and serving full schemas on demand via the describe_tool meta-tool.
The result: instead of 175,000 tokens for 500 tools, you use ~1,200 tokens. That is a 99.3% reduction.
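The on-demand half of this design can be sketched as a lazy cache in front of the upstream servers. The class name, callback, and schema shape here are illustrative, not mcpzip's implementation:

```python
class LazySchemaStore:
    """Sketch: only the 3 meta-tool definitions sit in the model's context;
    full tool schemas stay server-side and are fetched on first describe."""

    def __init__(self, fetch_schema):
        self._fetch_schema = fetch_schema   # callable: tool name -> full JSON Schema
        self._cache: dict[str, dict] = {}

    def describe_tool(self, name: str) -> dict:
        if name not in self._cache:         # fetch once, then serve from cache
            self._cache[name] = self._fetch_schema(name)
        return self._cache[name]

calls = []
def fake_fetch(name):
    calls.append(name)
    return {"name": name, "inputSchema": {"type": "object"}}

store = LazySchemaStore(fake_fetch)
store.describe_tool("slack.send_message")
store.describe_tool("slack.send_message")   # cached; no second upstream fetch
print(len(calls))   # → 1
```

The model pays the full ~350-token schema cost only for tools it actually decides to use, instead of for every tool on every message.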
Who Is mcpzip For?
| You should use mcpzip if... | You might not need mcpzip if... |
|---|---|
| You have 3+ MCP servers | You use only 1-2 servers |
| Your servers have 50+ tools total | Your servers have fewer than 20 tools total |
| You want faster response times | Context overhead is not a concern |
| You use Claude Code daily | You rarely use MCP tools |
| You value clean context windows | You prefer direct tool access |
Even with just 2 servers, mcpzip's search capability, connection pooling, and instant startup can be worthwhile. The main question is whether your total tool count is high enough to benefit from context compression.