Internals
A deep dive into mcpzip's internal architecture, data flow, and algorithms. This page is for contributors and curious engineers who want to understand how the proxy works under the hood.
Core Types
What are Arc and RwLock?
These are Rust concurrency primitives:
- Arc<T> (Atomically Reference Counted): a thread-safe smart pointer that allows shared ownership of a value. Multiple parts of the code can hold a reference to the same data.
- RwLock<T> (Read-Write Lock): allows either multiple concurrent readers or a single writer. The catalog uses this so that many searches can read at the same time, while the background refresh takes exclusive access only for as long as it writes.
Together, Arc<RwLock<HashMap>> gives mcpzip a thread-safe, concurrent tool catalog.
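A minimal sketch of this pattern using only the standard library: one thread stands in for the background refresh and takes the write lock, then several threads stand in for concurrent searches and take read locks. ToolEntry here is a hypothetical one-field stand-in for mcpzip's real type, and the sketch uses std::sync::RwLock with OS threads for brevity. An async codebase built on Tokio might use tokio::sync::RwLock instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical one-field stand-in for mcpzip's ToolEntry type.
struct ToolEntry {
    description: String,
}

// Shared catalog handle: cloning the Arc only bumps a reference count.
type Catalog = Arc<RwLock<HashMap<String, ToolEntry>>>;

// One writer inserts a tool, then `n_readers` threads look it up concurrently.
// Returns how many readers found it.
fn run_demo(n_readers: usize) -> usize {
    let catalog: Catalog = Arc::new(RwLock::new(HashMap::new()));

    // Writer (e.g. background refresh): exclusive access while it mutates.
    let writer_catalog = Arc::clone(&catalog);
    thread::spawn(move || {
        let mut map = writer_catalog.write().unwrap();
        map.insert(
            "slack__send_message".to_string(),
            ToolEntry { description: "Send a Slack message".to_string() },
        );
    })
    .join()
    .unwrap();

    // Readers (e.g. searches): many may hold the read lock at the same time.
    let handles: Vec<_> = (0..n_readers)
        .map(|_| {
            let reader_catalog = Arc::clone(&catalog);
            thread::spawn(move || {
                let map = reader_catalog.read().unwrap();
                map.get("slack__send_message")
                    .map_or(false, |tool| !tool.description.is_empty())
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&found| found)
        .count()
}

fn main() {
    println!("{} of 4 readers found the tool", run_demo(4));
}
```

Arc::clone does not copy the map: every thread works against the same catalog, and the RwLock decides who may touch it at any moment.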
Startup Sequence
mcpzip serves from cache immediately: the first request is answered from the cached catalog in under 1 ms, while the background refresh runs concurrently and updates the catalog in place.
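The startup ordering can be sketched as follows. This is a simplified illustration, not mcpzip's code: the catalog maps tool names to plain strings, the refresh is an arbitrary closure standing in for contacting upstream servers, and a channel acts as a gate so the "refresh still running" window is deterministic. The names start and demo are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, RwLock};
use std::thread::{self, JoinHandle};

type Catalog = Arc<RwLock<HashMap<String, String>>>;

// Hypothetical startup sketch: publish the disk cache first, then refresh
// the catalog from upstream servers on a background thread.
fn start(
    cached: HashMap<String, String>,
    refresh: impl FnOnce() -> HashMap<String, String> + Send + 'static,
) -> (Catalog, JoinHandle<()>) {
    // 1. The cached catalog is readable before any upstream is contacted.
    let catalog: Catalog = Arc::new(RwLock::new(cached));

    // 2. The refresh runs concurrently; readers keep using the cache meanwhile.
    let bg = Arc::clone(&catalog);
    let handle = thread::spawn(move || {
        let fresh = refresh(); // slow part: talking to upstream servers
        // 3. Update in place; the write lock is held only for this step.
        //    (Simplified: fresh entries overwrite cached ones by name.)
        bg.write().unwrap().extend(fresh);
    });
    (catalog, handle)
}

// Returns the value seen before and after the refresh completes.
fn demo() -> (String, String) {
    let mut cached = HashMap::new();
    cached.insert("slack__send_message".to_string(), "cached".to_string());

    // The gate stands in for slow upstream connections.
    let (gate_tx, gate_rx) = mpsc::channel::<()>();
    let (catalog, handle) = start(cached, move || {
        gate_rx.recv().unwrap(); // block until the gate opens
        let mut fresh = HashMap::new();
        fresh.insert("slack__send_message".to_string(), "fresh".to_string());
        fresh
    });

    // Served from cache while the refresh is still blocked.
    let before = catalog.read().unwrap()["slack__send_message"].clone();
    gate_tx.send(()).unwrap();
    handle.join().unwrap();
    let after = catalog.read().unwrap()["slack__send_message"].clone();
    (before, after)
}

fn main() {
    let (before, after) = demo();
    println!("before refresh: {before}, after refresh: {after}");
}
```

The key design point is that nothing on the request path waits for the refresh: readers only ever contend for the lock during the short write in step 3.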
Tool Call Lifecycle
Connection Pool State Machine
Each upstream server connection follows this state machine:
Connection Pool Properties
| Property | Value | Configurable |
|---|---|---|
| Default idle timeout | 5 minutes | idle_timeout_minutes |
| Default call timeout | 120 seconds | call_timeout_seconds |
| Concurrent list_tools timeout | 30 seconds per server | No |
| Reconnection strategy | Automatic on next use | No |
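The idle-timeout and reconnect-on-next-use rows can be illustrated with a small pool sketch. Pool, Conn, get, and reap_idle are hypothetical names; a connection is reduced to a numeric id, and neither the 120-second call timeout nor the full state machine is modeled. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a live upstream connection.
struct Conn {
    id: u64,
    last_used: Instant,
}

// Sketch of two pool properties: idle connections are closed after
// `idle_timeout`, and a closed server is reconnected lazily on its next use.
struct Pool {
    idle_timeout: Duration,
    conns: HashMap<String, Conn>,
    connects: u64, // total connections opened; doubles as a connection id
}

impl Pool {
    fn new(idle_timeout: Duration) -> Self {
        Pool { idle_timeout, conns: HashMap::new(), connects: 0 }
    }

    // Returns the id of the connection used for `server`, opening one if needed.
    fn get(&mut self, server: &str, now: Instant) -> u64 {
        let connects = &mut self.connects;
        let conn = self.conns.entry(server.to_string()).or_insert_with(|| {
            *connects += 1; // a real pool would spawn a process or handshake here
            Conn { id: *connects, last_used: now }
        });
        conn.last_used = now;
        conn.id
    }

    // Closes connections idle for longer than the timeout.
    fn reap_idle(&mut self, now: Instant) {
        let timeout = self.idle_timeout;
        self.conns
            .retain(|_, conn| now.duration_since(conn.last_used) <= timeout);
    }
}

fn main() {
    let t0 = Instant::now();
    let mut pool = Pool::new(Duration::from_secs(5 * 60)); // default: 5 minutes

    let first = pool.get("slack", t0);
    let reused = pool.get("slack", t0 + Duration::from_secs(60)); // within timeout
    pool.reap_idle(t0 + Duration::from_secs(60 + 301)); // idle 301 s: closed
    let reconnected = pool.get("slack", t0 + Duration::from_secs(400)); // reopened
    println!("{first} {reused} {reconnected}"); // 1 1 2
}
```

Reconnecting lazily means an idle server costs nothing until Claude actually calls one of its tools again.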
Catalog Refresh Merge Algorithm
When the background refresh completes, mcpzip merges new tool data with the existing cache:
Why Partial Merge?
Consider this scenario:
- You have 5 servers configured: Slack, GitHub, Todoist, Gmail, Linear
- On startup, mcpzip loads 250 cached tools
- Background refresh connects to all 5 servers
- Todoist is temporarily down
- The other 4 servers respond with their tools
Without partial merge: You would lose all Todoist tools until the next refresh.
With partial merge: Todoist's cached tools are preserved. The catalog stays complete with all 250 tools, with 4 servers' tools freshly updated.
This is why mcpzip always serves from cache first. Even if every upstream server is down, the cached catalog ensures Claude can still search and describe tools.
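The partial merge can be expressed as a per-server replace that skips failures. This sketch assumes a hypothetical catalog shape (tool names grouped by server) and a per-server Result from the refresh; it illustrates the idea described above rather than mcpzip's actual merge code.

```rust
use std::collections::HashMap;

// Hypothetical catalog shape: tool names grouped by the server that owns them.
type Catalog = HashMap<String, Vec<String>>;

// Partial merge: each server that answered the refresh replaces its cached
// tools; each server that failed keeps whatever the cache already held.
fn partial_merge(cached: &mut Catalog, refreshed: HashMap<String, Result<Vec<String>, String>>) {
    for (server, result) in refreshed {
        match result {
            Ok(tools) => {
                cached.insert(server, tools);
            }
            Err(_reason) => {
                // Keep the cached entry; a real implementation would likely log `_reason`.
            }
        }
    }
}

fn main() {
    let servers = ["slack", "github", "todoist", "gmail", "linear"];
    let mut catalog: Catalog = HashMap::new();
    for s in servers {
        catalog.insert(s.to_string(), vec![format!("{s}__cached_tool")]);
    }

    // Todoist is down; the other four servers respond.
    let mut refreshed = HashMap::new();
    for s in servers {
        let result = if s == "todoist" {
            Err("connection refused".to_string())
        } else {
            Ok(vec![format!("{s}__fresh_tool")])
        };
        refreshed.insert(s.to_string(), result);
    }

    partial_merge(&mut catalog, refreshed);
    println!("{:?}", catalog["todoist"]); // ["todoist__cached_tool"]
    println!("{:?}", catalog["slack"]); // ["slack__fresh_tool"]
}
```

Because the unit of replacement is a whole server, a failed server never leaves a half-updated tool list behind: it is either fully fresh or fully cached.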
Tool Name Convention
Tools are namespaced as {server}__{tool}, using double underscore as a separator:
```text
slack__send_message
todoist__create_task
gmail-personal__send_email
```
Why Double Underscore?
| Separator | Trade-off |
|---|---|
| Single _ | Common in tool names (send_message), so the split point would be ambiguous |
| . | Used in namespaces; can confuse parsers |
| / | URL separator; can break routing |
| __ (chosen) | Rare in tool names; easy to split on the first occurrence |
The split happens on the first __ occurrence. So a__b__c resolves to server a, tool b__c.
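Building and splitting names under this convention takes one standard-library call each, since str::split_once splits at the first match. The helper names are hypothetical, and rejecting empty server or tool parts is an assumption added for the sketch. One consequence of splitting on the first occurrence: a server name that itself contained __ would be split in the wrong place.

```rust
// Hypothetical helpers for the {server}__{tool} naming convention.
const SEP: &str = "__";

fn namespaced(server: &str, tool: &str) -> String {
    format!("{}{}{}", server, SEP, tool)
}

// Splits on the FIRST separator: everything after it, including any further
// "__", belongs to the tool name. Rejecting empty parts is an assumption.
fn split_name(name: &str) -> Option<(&str, &str)> {
    name.split_once(SEP)
        .filter(|(server, tool)| !server.is_empty() && !tool.is_empty())
}

fn main() {
    for name in ["slack__send_message", "gmail-personal__send_email", "a__b__c"] {
        if let Some((server, tool)) = split_name(name) {
            println!("{name} -> server={server}, tool={tool}");
        }
    }
    println!("{}", namespaced("todoist", "create_task"));
}
```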
Project Structure
```text
src/
  main.rs           Entry point, CLI dispatch
  lib.rs            Module declarations
  config.rs         Config loading, validation, paths
  error.rs          Error types (McpzipError enum)
  types.rs          Core types: ToolEntry, ServerConfig, ProxyConfig
  cli/
    mod.rs          CLI definition (clap)
    serve.rs        serve command, MCP server setup
    init.rs         Interactive setup wizard
    migrate.rs      Claude Code config migration
  auth/
    oauth.rs        OAuth 2.1 handler (PKCE, browser flow)
    store.rs        Token persistence (disk storage)
  proxy/
    server.rs       ProxyServer: meta-tool handlers
    handlers.rs     search/describe/execute implementation
  catalog/
    catalog.rs      Catalog: tool storage, disk cache, refresh
  search/
    keyword.rs      KeywordSearcher: tokenization, scoring
    gemini.rs       GeminiSearcher: LLM-powered search
    orchestrator.rs OrchestratedSearcher: merge + cache
    cache.rs        QueryCache: normalized query caching
  transport/
    manager.rs      Manager: connection pool
    stdio.rs        StdioUpstream: process spawning, NDJSON
    http.rs         HttpUpstream: Streamable HTTP, SSE, OAuth
    sse.rs          SseUpstream: legacy SSE transport
  mcp/
    protocol.rs     MCP protocol types (JSON-RPC, tool schemas)
    server.rs       McpServer: NDJSON stdio server
    transport.rs    McpTransport trait, NdjsonTransport
```
Memory Layout
| Component | Typical Size | Notes |
|---|---|---|
| Tool catalog | 2-5 MB | Depends on tool count and schema sizes |
| Connection pool | 1-2 MB per active connection | stdio processes have their own memory |
| Query cache | Under 1 MB | Bounded by unique queries |
| Base runtime | ~8 MB | Tokio runtime, MCP server, etc. |
| Total (idle) | ~15 MB | With cached catalog, no active connections |
stdio connections (spawned processes) run as separate OS processes with their own memory. The figures above are for mcpzip itself, not the upstream servers it manages.