
How It Works

mcpzip acts like a receptionist for your MCP tools. Instead of every tool introducing itself to Claude on every message (flooding the context window), the receptionist directs Claude to exactly the tools it needs, on demand.

The Three Meta-Tools

Every interaction follows a simple three-step pattern: Search, Describe, Execute.

  1. Search: Claude calls search_tools with a natural language query like "send a message". mcpzip searches across all upstream servers and returns matching tool names.

  2. Describe (optional): Claude calls describe_tool to get the full JSON schema for a specific tool. This reveals all parameters, types, and descriptions. Claude often skips this step when the compact search results are sufficient.

  3. Execute: Claude calls execute_tool with the tool name and arguments. mcpzip routes the call to the correct upstream server, waits for the result, and returns it.

Skip the Describe Step

Claude is smart enough to infer parameters from the compact search results in many cases. The describe_tool step is optional -- Claude uses it when it needs the full schema for complex tools with many parameters.

Full Interaction Sequence

Here is what happens when Claude wants to send a Slack message:
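The sequence diagram is not reproduced here, but the flow can be sketched as plain function calls. The stubs below are illustrative stand-ins for the meta-tool calls, not mcpzip's actual API:

```python
# Hypothetical transcript of the three-step flow for "send a Slack message".
# search_tools and execute_tool are stubbed to show the shape of each exchange.

def search_tools(query):
    # Step 1: mcpzip returns compact matches across all upstream servers.
    return ["slack__send_message: Send a Slack message [channel:string*, text:string*]"]

def execute_tool(name, arguments):
    # Step 3: mcpzip routes the call to the upstream server named in the prefix.
    server, tool = name.split("__", 1)
    return {"routed_to": server, "tool": tool, "args": arguments}

results = search_tools("send a message")
tool_name = results[0].split(":")[0]   # "slack__send_message"
# Step 2 (describe_tool) is skipped: the compact result already lists the params.
outcome = execute_tool(tool_name, {"channel": "#general", "text": "Hello!"})
print(outcome["routed_to"])   # slack
```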

Search Decision Tree

mcpzip uses a two-tier search system. Here is how it decides which path to take:

What is MCP?

The Model Context Protocol (MCP) is an open standard that lets AI assistants use external tools. It works like this:

  1. An MCP server exposes tools (functions the AI can call)
  2. An MCP client (like Claude Code) connects to the server
  3. The client calls tools/list to discover available tools
  4. The client calls tools/call to invoke a specific tool

Each tool has a JSON Schema describing its parameters. The problem: every tool schema gets loaded into the AI's context window, consuming tokens on every single message.
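For concreteness, the two discovery-and-invocation calls are JSON-RPC 2.0 messages that look roughly like this (payloads abridged and shown as Python dicts; the arguments are illustrative):

```python
import json

# What an MCP client sends to discover tools.
tools_list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# What it sends to invoke one. This is also the request mcpzip issues upstream
# after Claude calls execute_tool.
tools_call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "send_message",
               "arguments": {"channel": "#general", "text": "Hello!"}},
}

print(json.dumps(tools_list_request))
```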

Learn more at spec.modelcontextprotocol.io.

What is a tool schema?

A tool schema is a JSON document describing a tool's interface. For example, Slack's send_message tool might have this schema:

{
  "name": "send_message",
  "description": "Send a message to a Slack channel",
  "inputSchema": {
    "type": "object",
    "properties": {
      "channel": { "type": "string", "description": "Channel ID or name" },
      "text": { "type": "string", "description": "Message text" },
      "thread_ts": { "type": "string", "description": "Thread timestamp for replies" }
    },
    "required": ["channel", "text"]
  }
}

A typical tool schema consumes 300-500 tokens. With 10 servers averaging 25 tools each, that is 75,000-125,000 tokens loaded on every message -- before the conversation even starts.
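The arithmetic behind those numbers, for anyone who wants to plug in their own setup:

```python
# Token overhead of loading every schema on every message.
servers, tools_per_server = 10, 25
for tokens_per_schema in (300, 500):
    total = servers * tools_per_server * tokens_per_schema
    print(f"{tokens_per_schema} tokens/schema -> {total:,} tokens per message")
# 300 tokens/schema -> 75,000 tokens per message
# 500 tokens/schema -> 125,000 tokens per message
```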

Before vs After

| | Without mcpzip | With mcpzip |
| --- | --- | --- |
| Tools in context | All 250+ loaded every message | Only 3 meta-tools loaded |
| Token overhead | ~87,500 tokens per message | ~1,200 tokens per message |
| Tool selection accuracy | Degrades with more tools | Stays consistent |
| Adding a new server | Context grows linearly | Zero context impact |
| Startup time | Must connect to all servers | Instant (serves from cache) |

mcpzip additionally provides search capability, context window compression, and connection pooling.

Architecture Overview

Claude Code (downstream)
        │
        ▼
mcpzip proxy
  exposes 3 meta-tools: search_tools, describe_tool, execute_tool
  internal components: Searcher, Catalog, Manager
        │
        ▼
Upstream servers: Slack (stdio), Todoist (http), GitHub (stdio), Gmail (http)

Why not just use all tools directly?

Loading all tools directly has several compounding problems:

  1. Context window saturation: With 250 tools at ~350 tokens each, you burn 87,500 tokens before the conversation starts. On Claude with 200K context, that is 44% of your context gone.

  2. Degraded tool selection: Research shows LLMs make worse tool choices when presented with too many options. With 250 tools, Claude may pick the wrong tool or hallucinate parameters.

  3. Higher latency: More context tokens mean slower inference. Every message pays the cost of processing all those tool schemas.

  4. Hard limits: Some models cap the number of tools they support. GPT-4 starts degrading past ~60 tools.

mcpzip solves all of these by giving Claude just 3 tools and letting it search for what it needs on demand.

What Happens Under the Hood

When Claude calls search_tools, mcpzip:

  1. Normalizes the query (lowercase, tokenize)
  2. Checks cache for a previous identical or similar query
  3. Runs keyword search against tool names, descriptions, and parameter names
  4. Runs LLM search (if Gemini configured) in parallel
  5. Merges results, deduplicates, and ranks by score
  6. Returns compact results in the format:

     slack__send_message: Send a Slack message [channel:string*, text:string*]
     telegram__send_msg: Send a Telegram message [chat_id:string*, text:string*]

The * marks required parameters. This compact format gives Claude enough information to decide which tool to use without loading the full schema.
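As a sketch of what a consumer of this format has to do, one line can be split back into its parts. The parser below is illustrative, not part of mcpzip:

```python
import re

def parse_compact(line):
    """Parse 'server__tool: description [p1:type*, p2:type]' into its parts."""
    m = re.match(r"(\w+)__(\w+): (.*) \[(.*)\]", line)
    server, tool, description, params = m.groups()
    parsed = []
    for p in params.split(", "):
        name, type_ = p.split(":")
        parsed.append({"name": name,
                       "type": type_.rstrip("*"),
                       "required": type_.endswith("*")})   # '*' marks required
    return {"server": server, "tool": tool,
            "description": description, "params": parsed}

info = parse_compact(
    "slack__send_message: Send a Slack message [channel:string*, text:string*]")
print([p["name"] for p in info["params"] if p["required"]])   # ['channel', 'text']
```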

The Compact Representation

mcpzip compresses tool schemas into a one-line format for search results:

{server}__{tool}: {description} [{param1}:{type}*, {param2}:{type}]

| Component | Example | Purpose |
| --- | --- | --- |
| Server prefix | slack__ | Identifies which upstream server owns the tool |
| Tool name | send_message | The original tool name |
| Description | Send a Slack message | Natural language summary |
| Parameters | [channel:string*, text:string*] | Compact param list with types |
| Required marker | * | Asterisk marks required parameters |

This representation is typically ~50 tokens compared to ~350 tokens for a full JSON schema -- a 7x compression that still gives Claude enough context to decide which tool to use.
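A minimal sketch of such a compression step, assuming the schema shape shown earlier (this is not mcpzip's actual code):

```python
def compact(server, schema):
    """Compress a tool schema into the one-line search-result format."""
    required = set(schema["inputSchema"].get("required", []))
    params = ", ".join(
        f"{name}:{spec['type']}{'*' if name in required else ''}"
        for name, spec in schema["inputSchema"]["properties"].items()
    )
    return f"{server}__{schema['name']}: {schema['description']} [{params}]"

schema = {
    "name": "send_message",
    "description": "Send a message to a Slack channel",
    "inputSchema": {
        "type": "object",
        "properties": {
            "channel": {"type": "string"},
            "text": {"type": "string"},
            "thread_ts": {"type": "string"},
        },
        "required": ["channel", "text"],
    },
}
print(compact("slack", schema))
# slack__send_message: Send a message to a Slack channel [channel:string*, text:string*, thread_ts:string]
```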