
How It Works

mcpzip acts like a receptionist for your MCP tools. Instead of every tool introducing itself to Claude on every message (flooding the context window), the receptionist directs Claude to exactly the tools it needs, on demand.

The Three Meta-Tools

Every interaction follows a simple three-step pattern: Search, Describe, Execute.

  1. Search: Claude calls search_tools with a natural language query like "send a message". mcpzip searches across all upstream servers and returns matching tool names.

  2. Describe (optional): Claude calls describe_tool to get the full JSON schema for a specific tool. This reveals all parameters, types, and descriptions. Claude often skips this step when the compact search results are sufficient.

  3. Execute: Claude calls execute_tool with the tool name and arguments. mcpzip routes the call to the correct upstream server, waits for the result, and returns it.

Skip the Describe Step

Claude is smart enough to infer parameters from the compact search results in many cases. The describe_tool step is optional -- Claude uses it when it needs the full schema for complex tools with many parameters.

Full Interaction Sequence

Here is what happens when Claude wants to send a Slack message:
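The sequence diagram is not reproduced here, but the flow can be sketched as plain function calls. The stubs below are illustrative stand-ins for the meta-tool calls, not mcpzip's actual API:

```python
# Hypothetical transcript of the three-step flow for "send a Slack message".
# search_tools and execute_tool are stubbed to show the shape of each exchange.

def search_tools(query):
    # Step 1: mcpzip returns compact matches across all upstream servers.
    return ["slack__send_message: Send a Slack message [channel:string*, text:string*]"]

def execute_tool(name, arguments):
    # Step 3: mcpzip routes the call to the upstream server named in the prefix.
    server, tool = name.split("__", 1)
    return {"routed_to": server, "tool": tool, "args": arguments}

results = search_tools("send a message")
tool_name = results[0].split(":")[0]   # "slack__send_message"
# Step 2 (describe_tool) is skipped: the compact result already lists the params.
outcome = execute_tool(tool_name, {"channel": "#general", "text": "Hello!"})
print(outcome["routed_to"])   # slack
```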

Search Decision Tree

mcpzip uses a two-tier search system. Here is how it decides which path to take:

What is MCP?

The Model Context Protocol (MCP) is an open standard that lets AI assistants use external tools. It works like this:

  1. An MCP server exposes tools (functions the AI can call)
  2. An MCP client (like Claude Code) connects to the server
  3. The client calls tools/list to discover available tools
  4. The client calls tools/call to invoke a specific tool

Each tool has a JSON Schema describing its parameters. The problem: every tool schema gets loaded into the AI's context window, consuming tokens on every single message.
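For concreteness, the two discovery-and-invocation calls are JSON-RPC 2.0 messages that look roughly like this (payloads abridged and shown as Python dicts; the arguments are illustrative):

```python
import json

# What an MCP client sends to discover tools.
tools_list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# What it sends to invoke one. This is also the request mcpzip issues upstream
# after Claude calls execute_tool.
tools_call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "send_message",
               "arguments": {"channel": "#general", "text": "Hello!"}},
}

print(json.dumps(tools_list_request))
```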

Learn more at spec.modelcontextprotocol.io.

What is a tool schema?

A tool schema is a JSON document describing a tool's interface. For example, Slack's send_message tool might have this schema:

{
  "name": "send_message",
  "description": "Send a message to a Slack channel",
  "inputSchema": {
    "type": "object",
    "properties": {
      "channel": { "type": "string", "description": "Channel ID or name" },
      "text": { "type": "string", "description": "Message text" },
      "thread_ts": { "type": "string", "description": "Thread timestamp for replies" }
    },
    "required": ["channel", "text"]
  }
}

A typical tool schema consumes 300-500 tokens. With 10 servers averaging 25 tools each, that is 75,000-125,000 tokens loaded on every message -- before the conversation even starts.
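The arithmetic behind those numbers, for anyone who wants to plug in their own setup:

```python
# Token overhead of loading every schema on every message.
servers, tools_per_server = 10, 25
for tokens_per_schema in (300, 500):
    total = servers * tools_per_server * tokens_per_schema
    print(f"{tokens_per_schema} tokens/schema -> {total:,} tokens per message")
# 300 tokens/schema -> 75,000 tokens per message
# 500 tokens/schema -> 125,000 tokens per message
```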

Before vs After

| | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Tools in context | All 250+ loaded every message | Only 3 meta-tools loaded |
| Token overhead | ~87,500 tokens per message | ~1,200 tokens per message |
| Tool selection accuracy | Degrades with more tools | Stays consistent |
| Adding a new server | Context grows linearly | Zero context impact |
| Startup time | Must connect to all servers | Instant (serves from cache) |

mcpzip additionally provides search capability, context window compression, and connection pooling.

Architecture Overview

Claude Code (downstream)
        │
        ▼
mcpzip proxy
  exposes 3 meta-tools: search_tools, describe_tool, execute_tool
  internal components: Searcher, Catalog, Manager
        │
        ▼
Upstream servers: Slack (stdio), Todoist (http), GitHub (stdio), Gmail (http)

Why not just use all tools directly?

Loading all tools directly has several compounding problems:

  1. Context window saturation: With 250 tools at ~350 tokens each, you burn 87,500 tokens before the conversation starts. On Claude with 200K context, that is 44% of your context gone.

  2. Degraded tool selection: Research shows LLMs make worse tool choices when presented with too many options. With 250 tools, Claude may pick the wrong tool or hallucinate parameters.

  3. Higher latency: More context tokens mean slower inference. Every message pays the cost of processing all those tool schemas.

  4. Hard limits: Some models cap the number of tools they support. GPT-4 starts degrading past ~60 tools.

mcpzip solves all of these by giving Claude just 3 tools and letting it search for what it needs on demand.

What Happens Under the Hood

When Claude calls search_tools, mcpzip:

  1. Normalizes the query (lowercase, tokenize)
  2. Checks cache for a previous identical or similar query
  3. Runs keyword search against tool names, descriptions, and parameter names
  4. Runs LLM search (if Gemini configured) in parallel
  5. Merges results, deduplicates, and ranks by score
  6. Returns compact results in the format:

     slack__send_message: Send a Slack message [channel:string*, text:string*]
     telegram__send_msg: Send a Telegram message [chat_id:string*, text:string*]

The * marks required parameters. This compact format gives Claude enough information to decide which tool to use without loading the full schema.
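As a sketch of what a consumer of this format has to do, one line can be split back into its parts. The parser below is illustrative, not part of mcpzip:

```python
import re

def parse_compact(line):
    """Parse 'server__tool: description [p1:type*, p2:type]' into its parts."""
    m = re.match(r"(\w+)__(\w+): (.*) \[(.*)\]", line)
    server, tool, description, params = m.groups()
    parsed = []
    for p in params.split(", "):
        name, type_ = p.split(":")
        parsed.append({"name": name,
                       "type": type_.rstrip("*"),
                       "required": type_.endswith("*")})   # '*' marks required
    return {"server": server, "tool": tool,
            "description": description, "params": parsed}

info = parse_compact(
    "slack__send_message: Send a Slack message [channel:string*, text:string*]")
print([p["name"] for p in info["params"] if p["required"]])   # ['channel', 'text']
```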

The Compact Representation

mcpzip compresses tool schemas into a one-line format for search results:

{server}__{tool}: {description} [{param1}:{type}*, {param2}:{type}]

| Component | Example | Purpose |
| --- | --- | --- |
| Server prefix | slack__ | Identifies which upstream server owns the tool |
| Tool name | send_message | The original tool name |
| Description | Send a Slack message | Natural language summary |
| Parameters | [channel:string*, text:string*] | Compact param list with types |
| Required marker | * | Asterisk marks required parameters |

This representation is typically ~50 tokens compared to ~350 tokens for a full JSON schema -- a 7x compression that still gives Claude enough context to decide which tool to use.
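A minimal sketch of such a compression step, assuming the schema shape shown earlier (this is not mcpzip's actual code):

```python
def compact(server, schema):
    """Compress a tool schema into the one-line search-result format."""
    required = set(schema["inputSchema"].get("required", []))
    params = ", ".join(
        f"{name}:{spec['type']}{'*' if name in required else ''}"
        for name, spec in schema["inputSchema"]["properties"].items()
    )
    return f"{server}__{schema['name']}: {schema['description']} [{params}]"

schema = {
    "name": "send_message",
    "description": "Send a message to a Slack channel",
    "inputSchema": {
        "type": "object",
        "properties": {
            "channel": {"type": "string"},
            "text": {"type": "string"},
            "thread_ts": {"type": "string"},
        },
        "required": ["channel", "text"],
    },
}
print(compact("slack", schema))
# slack__send_message: Send a message to a Slack channel [channel:string*, text:string*, thread_ts:string]
```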