Performance

mcpzip is designed to be fast and lightweight. This page covers the performance characteristics you can expect.

Context Window Savings

The primary benefit of mcpzip is context window compression.

| Metric | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Tool schemas loaded | All (N) | 3 (always) |
| Tokens per tool schema | ~350 | ~400 (meta-tool) |
| Total tool tokens (10 servers, 50 tools each) | ~175,000 | ~1,200 |
| Context overhead | 87.5% of 200K | 0.6% of 200K |
| Savings | -- | 99.3% |
How does context window compression work?

Every MCP tool has a JSON Schema that describes its parameters. This schema is sent to the AI model in every message as part of the "tool definitions" block.

A typical tool schema consumes ~350 tokens. With 500 tools, that is 175,000 tokens consumed before your conversation starts.

mcpzip replaces all of those with 3 meta-tools (~1,200 tokens total). When Claude needs a tool, it searches on demand and loads only the schema it needs.
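The arithmetic above can be sketched in a few lines. The per-schema figure (~350 tokens) and the meta-tool total (~1,200 tokens) are the estimates from this page, not measurements:

```rust
// Token cost of loading every tool schema directly, using this page's
// estimate of ~350 tokens per schema.
fn direct_tokens(tool_count: u64) -> u64 {
    tool_count * 350
}

// Percentage saved by replacing all schemas with the ~1,200-token
// meta-tool set.
fn savings_pct(tool_count: u64) -> f64 {
    (1.0 - 1_200.0 / direct_tokens(tool_count) as f64) * 100.0
}

fn main() {
    // 10 servers x 50 tools = 500 tools.
    assert_eq!(direct_tokens(500), 175_000);
    println!("{:.1}% saved", savings_pct(500)); // ~99.3%
}
```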

Interactive Calculator

[Interactive token savings calculator -- available in the web docs.] Example output: with 5 servers at 25 tools each (125 tools), direct loading costs 43,750 tokens versus 1,200 with mcpzip's 3 meta-tools, saving 42,550 tokens (~97%).

Real-World Scenarios

| Scenario | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Small setup (3 servers, 30 tools) | 10,500 tokens | 1,200 tokens |
| Medium setup (5 servers, 125 tools) | 43,750 tokens | 1,200 tokens |
| Large setup (10 servers, 500 tools) | 175,000 tokens | 1,200 tokens |
| Power user (15 servers, 900 tools) | 315,000 tokens | 1,200 tokens |
Context Window Exhaustion

With 15+ MCP servers loaded directly, the tool schemas alone can exceed the context window of many models, and tool selection degrades well before that point (GPT-4, for example, starts degrading past ~60 tools). mcpzip eliminates this problem entirely.

Startup Time

| Phase | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Time to first request | 2-10 seconds | < 5 milliseconds |
| Background refresh | N/A | 2-10 seconds (non-blocking) |
| Catalog available | After all servers connect | Immediately (from cache) |
Instant Start

mcpzip's disk cache means it is ready to serve within milliseconds. The background refresh updates the catalog without blocking any requests.
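The cache-first startup described above can be sketched as follows. The function names here are illustrative, not mcpzip's actual internals:

```rust
use std::thread;

// Hypothetical stand-in for reading the catalog cache from disk.
fn load_cached_catalog() -> Vec<String> {
    vec!["github.create_issue".into(), "slack.post_message".into()]
}

fn main() {
    // 1. Serve immediately from the on-disk catalog cache.
    let catalog = load_cached_catalog();
    assert!(!catalog.is_empty()); // ready within milliseconds

    // 2. Refresh in the background without blocking requests.
    let refresher = thread::spawn(|| {
        // Connect to all servers in parallel and rebuild the catalog here.
    });

    // Requests are answered from `catalog` while the refresh runs.
    refresher.join().unwrap();
}
```

The key design point is that the stale-but-instant catalog answers the first request, and the 2-10 second refresh only swaps in fresher data afterwards.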

Search Latency

| Search Type | Latency | When Used |
| --- | --- | --- |
| Cache hit | < 0.1 ms | Repeated or similar queries |
| Keyword search | < 1 ms | Always (parallel with LLM) |
| LLM search (Gemini) | 200-500 ms | When gemini_api_key is set |
| Combined (cache miss) | 200-500 ms | First search with LLM enabled |
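The layering in the table can be sketched like this: a result cache answers repeated queries instantly, and a substring-style keyword match always runs. (This is an illustrative sketch; the names and the matching logic are assumptions, and the LLM re-ranking pass is omitted.)

```rust
use std::collections::HashMap;

// Naive keyword match over the catalog, standing in for the <1 ms
// keyword search path.
fn keyword_search<'a>(catalog: &'a [&'a str], query: &str) -> Vec<&'a str> {
    catalog.iter().filter(|t| t.contains(query)).copied().collect()
}

fn main() {
    let catalog = ["github.create_issue", "github.list_issues", "slack.post_message"];
    let mut cache: HashMap<String, Vec<&str>> = HashMap::new();

    // First search: cache miss, so run the keyword pass and memoize it.
    let query = "issue";
    let hits = cache
        .entry(query.to_string())
        .or_insert_with(|| keyword_search(&catalog, query))
        .clone();
    assert_eq!(hits.len(), 2);

    // A second identical query is answered from `cache` (<0.1 ms path)
    // without touching the catalog again.
}
```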

Memory Usage

| State | Memory |
| --- | --- |
| Idle (cached catalog loaded) | ~15 MB |
| Active (5 stdio connections) | ~20 MB + child processes |
| Active (5 HTTP connections) | ~18 MB |
| Peak (catalog refresh) | ~25 MB |
note

stdio connections spawn child processes. Those processes have their own memory footprint, typically 30-100 MB each depending on the MCP server implementation. mcpzip itself stays lean.

Binary Size

| Build | Size |
| --- | --- |
| mcpzip (Rust, release) | 5.8 MB |
| Previous Go version | 11 MB |
| Typical Node.js MCP server | 50-200 MB (with node_modules) |

The Rust binary is statically linked with no runtime dependencies.

Connection Pooling

| Feature | Behavior |
| --- | --- |
| Connection strategy | Lazy (connect on first use) |
| Idle timeout | 5 minutes (configurable) |
| Reconnection | Automatic on next request |
| Concurrent startup | All servers connect in parallel |
| Per-server timeout | 30 seconds during catalog refresh |
| Call timeout | 120 seconds (configurable) |
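The lazy-connect and auto-reconnect behavior in the table can be sketched as a small state machine. The types and fields here are illustrative, not mcpzip's actual internals:

```rust
use std::time::{Duration, Instant};

// Matches the table's 5-minute idle timeout.
const IDLE_TIMEOUT: Duration = Duration::from_secs(300);

struct PooledConn {
    connected: bool,
    last_used: Instant,
}

impl PooledConn {
    fn new() -> Self {
        // Lazy strategy: nothing is opened until the first call.
        PooledConn { connected: false, last_used: Instant::now() }
    }

    fn call(&mut self) {
        // Reconnect automatically if never opened or idle-closed.
        if !self.connected || self.last_used.elapsed() > IDLE_TIMEOUT {
            self.connected = true; // open (or reopen) the transport here
        }
        self.last_used = Instant::now();
        // ...forward the tool call, bounded by the 120 s call timeout.
    }
}

fn main() {
    let mut conn = PooledConn::new();
    assert!(!conn.connected); // lazy: no connection at startup
    conn.call();
    assert!(conn.connected);  // opened on first use
}
```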