Internals

A deep dive into mcpzip's internal architecture, data flow, and algorithms. This page is for contributors and curious engineers who want to understand how the proxy works under the hood.

Core Types

What are Arc and RwLock?

These are Rust concurrency primitives:

Arc<T> (Atomic Reference Counted) -- a thread-safe smart pointer that allows shared ownership of a value. Multiple parts of the code can hold a reference to the same data.
RwLock<T> (Read-Write Lock) -- allows multiple concurrent readers OR a single writer. The catalog uses this so searches can read concurrently while background refresh can write.

Together, Arc<RwLock<HashMap>> gives mcpzip a thread-safe, concurrent tool catalog.
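To make the pattern concrete, here is a minimal sketch of a shared catalog using only the standard library. The `ToolEntry` name comes from `types.rs`, but its single `description` field, the `SharedCatalog` alias, and both helper functions are assumptions made for illustration, not mcpzip's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical, simplified stand-in for the real ToolEntry type.
#[derive(Clone, Debug, PartialEq)]
struct ToolEntry {
    description: String,
}

/// Assumed alias: tool name -> entry, shared across threads.
type SharedCatalog = Arc<RwLock<HashMap<String, ToolEntry>>>;

/// Insert or overwrite an entry. Takes the exclusive write lock.
fn upsert(catalog: &SharedCatalog, name: &str, description: &str) {
    let mut guard = catalog.write().unwrap();
    guard.insert(
        name.to_string(),
        ToolEntry { description: description.to_string() },
    );
}

/// Spawn `readers` threads that look up `name` concurrently and
/// return how many of them found it.
fn concurrent_lookups(catalog: &SharedCatalog, name: &str, readers: usize) -> usize {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            // Each thread gets its own Arc handle to the same map.
            let catalog = Arc::clone(catalog);
            let name = name.to_string();
            thread::spawn(move || {
                // Many threads may hold the read lock at the same time.
                let guard = catalog.read().unwrap();
                guard.contains_key(&name)
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|found| *found)
        .count()
}
```

Readers only block while a writer holds the lock, so keeping write sections short (as `upsert` does) keeps searches responsive.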

Startup Sequence

Non-Blocking Startup

mcpzip serves from cache immediately. The background refresh runs concurrently and updates the catalog in place. The first request is served from cache in under 1 ms.
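The idea can be sketched with a plain OS thread standing in for the background task. This is an illustration under stated assumptions: mcpzip runs on Tokio, and the `start` function, the `Catalog` alias, and the `fetch` callback are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};

/// Assumed simplification: tool name -> description.
type Catalog = Arc<RwLock<HashMap<String, String>>>;

/// Build the catalog from cached data and start a refresh in the background.
/// Returns immediately, so the caller can serve from `Catalog` right away.
fn start<F>(cached: HashMap<String, String>, fetch: F) -> (Catalog, JoinHandle<()>)
where
    F: FnOnce() -> HashMap<String, String> + Send + 'static,
{
    let catalog: Catalog = Arc::new(RwLock::new(cached));
    let for_refresh = Arc::clone(&catalog);

    let handle = thread::spawn(move || {
        // Slow part: talking to upstream servers. No lock is held here.
        let fresh = fetch();
        // Fast part: take the write lock only to apply the results in place.
        let mut guard = for_refresh.write().unwrap();
        guard.extend(fresh);
    });

    (catalog, handle)
}
```

The slow fetch happens outside the lock, so requests arriving during a refresh still read the cached entries without waiting.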

Tool Call Lifecycle

Connection Pool State Machine

Each upstream server connection follows this state machine:

Connection Pool Properties

Property	Value	Configurable
Default idle timeout	5 minutes	`idle_timeout_minutes`
Default call timeout	120 seconds	`call_timeout_seconds`
Concurrent list_tools timeout	30 seconds per server	No
Reconnection strategy	Automatic on next use	No
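As a rough sketch of how these properties could interact, the following models a single connection with two states. Only the default values come from the table; the `PoolConfig` and `ConnState` types, their fields, and the two-state model are assumptions for illustration and may not match the real state machine.

```rust
use std::time::{Duration, Instant};

/// Hypothetical config struct; defaults mirror the table above.
struct PoolConfig {
    idle_timeout: Duration,
    call_timeout: Duration,
}

impl Default for PoolConfig {
    fn default() -> Self {
        PoolConfig {
            idle_timeout: Duration::from_secs(5 * 60), // 5 minutes
            call_timeout: Duration::from_secs(120),    // 120 seconds
        }
    }
}

/// Assumed two-state model of one upstream connection.
#[derive(Debug, PartialEq)]
enum ConnState {
    Disconnected,
    Connected { last_used: Instant },
}

impl ConnState {
    /// "Automatic on next use": any use (re)connects and refreshes the timestamp.
    fn on_use(&mut self, now: Instant) {
        *self = ConnState::Connected { last_used: now };
    }

    /// Drop the connection once it has been idle for at least `idle_timeout`.
    fn reap_if_idle(&mut self, now: Instant, cfg: &PoolConfig) {
        if let ConnState::Connected { last_used } = *self {
            if now.saturating_duration_since(last_used) >= cfg.idle_timeout {
                *self = ConnState::Disconnected;
            }
        }
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the methods, keeps the transitions easy to test with synthetic timestamps.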

Catalog Refresh Merge Algorithm

When the background refresh completes, mcpzip merges new tool data with the existing cache:

Why Partial Merge?

Consider this scenario:

You have 5 servers configured: Slack, GitHub, Todoist, Gmail, Linear
On startup, mcpzip loads 250 cached tools
Background refresh connects to all 5 servers
Todoist is temporarily down
The other 4 servers respond with their tools

Without partial merge: You would lose all Todoist tools until the next refresh.

With partial merge: Todoist's cached tools are preserved. The catalog stays complete with all 250 tools, with 4 servers' tools freshly updated.
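The scenario above can be sketched as a small merge function. This is not mcpzip's actual implementation: the `merge_catalog` name, the string-valued map, and the `responded` set are hypothetical. It also assumes that a server that responds has its cached tools replaced wholesale, which the text does not spell out.

```rust
use std::collections::{HashMap, HashSet};

/// Merge freshly fetched tools into the cached catalog.
///
/// Keys are namespaced `{server}__{tool}` names. `responded` holds the
/// servers that answered during this refresh.
fn merge_catalog(
    cached: &HashMap<String, String>,
    fresh: HashMap<String, String>,
    responded: &HashSet<String>,
) -> HashMap<String, String> {
    let mut merged = HashMap::new();

    for (name, entry) in cached {
        // The server is everything before the first `__`.
        let server = name
            .split_once("__")
            .map_or(name.as_str(), |(server, _)| server);

        // Keep cached tools only for servers that did NOT respond.
        if !responded.contains(server) {
            merged.insert(name.clone(), entry.clone());
        }
    }

    // Tools from responding servers come entirely from the fresh data
    // (assumption: stale tools from those servers are dropped).
    merged.extend(fresh);
    merged
}
```

With Todoist absent from `responded`, its cached entries pass through unchanged, while entries for the other servers are taken from `fresh`.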

Tip: This is why mcpzip always serves from cache first. Even if every upstream server is down, the cached catalog ensures Claude can still search and describe tools.

Tool Name Convention

Tools are namespaced as {server}__{tool}, using a double underscore as the separator:

slack__send_message
todoist__create_task
gmail-personal__send_email

Separator	Trade-off
Single `_`	Common in tool names (`send_message`)
`.`	Used in namespaces, can confuse parsers
`/`	URL separator, can break routing
`__`	Rare in tool names, easy to split on first occurrence

The split happens on the first __ occurrence. So a__b__c resolves to server a, tool b__c.
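A minimal sketch of the split rule, using the standard library's `str::split_once`. The function names `split_tool_name` and `join_tool_name` are hypothetical and not taken from the mcpzip source.

```rust
/// Split a namespaced tool name into (server, tool) at the first `__`.
/// Returns None when the name contains no separator.
fn split_tool_name(name: &str) -> Option<(&str, &str)> {
    name.split_once("__")
}

/// Build a namespaced name from its parts.
fn join_tool_name(server: &str, tool: &str) -> String {
    format!("{}__{}", server, tool)
}
```

Under this rule, `split_tool_name("a__b__c")` yields `Some(("a", "b__c"))`, so a tool name may itself contain `__`, while a server name cannot contain it without being split early.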

Project Structure

src/
  main.rs              Entry point, CLI dispatch
  lib.rs               Module declarations
  config.rs            Config loading, validation, paths
  error.rs             Error types (McpzipError enum)
  types.rs             Core types: ToolEntry, ServerConfig, ProxyConfig

  cli/
    mod.rs             CLI definition (clap)
    serve.rs           serve command, MCP server setup
    init.rs            Interactive setup wizard
    migrate.rs         Claude Code config migration

  auth/
    oauth.rs           OAuth 2.1 handler (PKCE, browser flow)
    store.rs           Token persistence (disk storage)

  proxy/
    server.rs          ProxyServer: meta-tool handlers
    handlers.rs        search/describe/execute implementation

  catalog/
    catalog.rs         Catalog: tool storage, disk cache, refresh

  search/
    keyword.rs         KeywordSearcher: tokenization, scoring
    gemini.rs          GeminiSearcher: LLM-powered search
    orchestrator.rs    OrchestratedSearcher: merge + cache
    cache.rs           QueryCache: normalized query caching

  transport/
    manager.rs         Manager: connection pool
    stdio.rs           StdioUpstream: process spawning, NDJSON
    http.rs            HttpUpstream: Streamable HTTP, SSE, OAuth
    sse.rs             SseUpstream: legacy SSE transport

  mcp/
    protocol.rs        MCP protocol types (JSON-RPC, tool schemas)
    server.rs          McpServer: NDJSON stdio server
    transport.rs       McpTransport trait, NdjsonTransport

Memory Layout

Component	Typical Size	Notes
Tool catalog	2-5 MB	Depends on tool count and schema sizes
Connection pool	1-2 MB per active connection	stdio processes have their own memory
Query cache	Under 1 MB	Bounded by unique queries
Base runtime	~8 MB	Tokio runtime, MCP server, etc.
Total (idle)	~15 MB	With cached catalog, no active connections

Note: stdio connections (spawned processes) run as separate OS processes with their own memory. The figures above are for mcpzip itself, not the upstream servers it manages.
