MCP Internals: The Wire Protocol That Connects LLMs to Everything
Most descriptions of the Model Context Protocol stop at "it's like USB-C for AI." That analogy is fine for marketing. It tells you nothing about what actually happens when Claude calls a tool — which bytes travel over which transport, how capabilities are negotiated, what the rejection path looks like, or why the design makes certain tradeoffs that will bite you in production. This post goes to the wire level.
Why This Matters Right Now
MCP is rapidly becoming the default integration layer between LLMs and external systems. Claude Desktop, Cursor, and a growing list of agent frameworks all implement it. If you are building anything that gives an LLM access to tools — databases, APIs, file systems, browser automation — you will either implement MCP yourself or depend on something that does. Understanding the protocol mechanics is the difference between debugging blind and knowing exactly which message is malformed.
The Foundation: JSON-RPC 2.0
MCP is a thin semantic layer on top of JSON-RPC 2.0. If you have never touched JSON-RPC, the core idea is simple: every message is either a request, a response, or a notification, all encoded as JSON objects over some transport.
A request looks like this:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "/etc/hosts" }
  }
}
The id field ties the response back to the request. Notifications drop the id — they are fire-and-forget. A response carries either a result or an error field, never both.
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "127.0.0.1 localhost\n..." }
    ],
    "isError": false
  }
}
Everything in MCP — capability negotiation, tool calls, resource reads, prompt fetches, sampling requests — is one of these three message shapes. The protocol is intentionally boring at the wire level. That boringness is a feature: any HTTP client, any subprocess library, any WebSocket stack can speak it.
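Because the three shapes differ only in which keys are present, a receiver can classify any incoming message before dispatching it. The function below is a minimal sketch of that rule; `classify_message` is a name introduced for illustration, not part of any SDK.

```python
def classify_message(msg: dict) -> str:
    """Classify a JSON-RPC 2.0 message as "request", "notification", or "response"."""
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in msg:
        # Requests carry an id and expect a reply; notifications omit it.
        return "request" if "id" in msg else "notification"
    # A response echoes the request id and carries exactly one of result / error.
    if "id" in msg and (("result" in msg) != ("error" in msg)):
        return "response"
    raise ValueError("malformed JSON-RPC message")
```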
Two Transports, Same Payloads
MCP defines two standard transports. The payloads are identical; only the framing differs.
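stdio: The MCP client spawns the server as a child process. Messages travel as newline-delimited JSON over stdin/stdout: one message per line, with no embedded newlines. The server may write logs to stderr, but nothing that is not a valid MCP message may go to stdout. This is the right choice for local servers: a Python script, a Node.js binary, a compiled Rust tool. Startup latency is process-spawn latency (roughly 50–200ms), and you get a separate OS process, with its own memory and failure domain, for free.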
client --[JSON\n]--> server stdin
client <--[JSON\n]-- server stdout
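A minimal sketch of that framing, assuming messages are plain dicts; `write_message` and `read_messages` are illustrative names, not an SDK API. `json.dumps` without `indent` escapes any newline inside a string, so each encoded message stays on one line, and the explicit flush matters because a pipe is block-buffered. An `io.StringIO` stands in for the real pipe in the round trip at the bottom.

```python
import io
import json
from typing import Iterator, TextIO


def write_message(out: TextIO, msg: dict) -> None:
    """Frame one JSON-RPC message as a single line on a text stream."""
    # Without indent, json.dumps escapes newlines inside strings,
    # so the encoded message never spans more than one line.
    out.write(json.dumps(msg, separators=(",", ":")) + "\n")
    # Flush immediately: pipes are block-buffered, and an unflushed
    # message never reaches the other side.
    out.flush()


def read_messages(inp: TextIO) -> Iterator[dict]:
    """Yield one decoded message per non-empty line."""
    for line in inp:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round trip through an in-memory stream instead of a real pipe.
buf = io.StringIO()
write_message(buf, {"jsonrpc": "2.0", "id": 1, "result": {"text": "line one\nline two"}})
buf.seek(0)
messages = list(read_messages(buf))
```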
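Streamable HTTP: The server exposes a single MCP endpoint (for example /mcp). The client sends every JSON-RPC message as an HTTP POST to that endpoint. For a request, the server replies either with a single application/json response or with a text/event-stream (SSE) response, over which it can send related notifications before the final result. The client may also issue a GET to the same endpoint to open an SSE stream for server-initiated messages. If the server assigns a session, it returns an Mcp-Session-Id header with the initialize response and the client echoes it on every later request. This lets remote MCP servers run behind ordinary HTTP infrastructure such as load balancers and CDNs.
client --POST /mcp---------------> server (request or notification)
client <--JSON or SSE stream------ server (response + related messages)
client --GET /mcp----------------> server (optional SSE stream for server-initiated messages)
Streamable HTTP arrived in the 2025-03-26 revision of the spec and replaced the original HTTP+SSE transport, in which the client first opened a persistent SSE connection and then POSTed requests to a separate endpoint. That older transport is deprecated but still appears in existing servers and clients. The spec also permits custom transports (some implementations use WebSocket), but stdio and Streamable HTTP cover the production cases. The examples in this post use protocolVersion 2025-11-25, the latest revision as of early 2026.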
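Whichever HTTP variant a server speaks, messages flowing from server to client over SSE are ordinary Server-Sent Events whose data: field carries one JSON-RPC message. A client-side parser for a complete SSE body can be sketched as follows; it applies the SSE rules for data: lines and blank-line event boundaries and ignores every other field. `parse_sse_messages` is a hypothetical helper, and a production client would parse incrementally as bytes arrive rather than from a finished body.

```python
import json


def parse_sse_messages(body: str) -> list[dict]:
    """Extract JSON-RPC messages from a complete Server-Sent Events body."""
    messages: list[dict] = []
    data_lines: list[str] = []
    for line in body.splitlines():
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: fields and ":" comment lines are ignored here.
    # Per the SSE rules, an event not terminated by a blank line is discarded.
    return messages
```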
The critical insight: because transports are just framing, you can test your MCP server logic entirely with stdio locally, then deploy the same server behind HTTP without changing a line of business logic.
The Handshake: Capability Negotiation
Before any tool gets called, client and server negotiate capabilities. This is the initialize / initialized exchange.
The client sends:
{
  "jsonrpc": "2.0",
  "id": 0,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-11-25",
    "capabilities": {
      "roots": { "listChanged": true },
      "sampling": {}
    },
    "clientInfo": {
      "name": "claude-desktop",
      "version": "3.2.1"
    }
  }
}
The server responds with its own capability advertisement:
{
  "jsonrpc": "2.0",
  "id": 0,
  "result": {
    "protocolVersion": "2025-11-25",
    "capabilities": {
      "tools": { "listChanged": true },
      "resources": { "subscribe": true, "listChanged": true },
      "prompts": { "listChanged": true }
    },
    "serverInfo": {
      "name": "filesystem-server",
      "version": "1.4.0"
    }
  }
}
Once the client sends the notifications/initialized notification (no id, no response expected), the session is live.
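The version half of the handshake follows a simple rule: the client proposes the newest protocolVersion it supports; a server that supports it echoes it back, otherwise the server answers with a version of its own, and the client should disconnect if it cannot speak that one. A client-side sketch, where `SUPPORTED_VERSIONS` and the clientInfo values are assumptions for illustration:

```python
# Protocol revisions this hypothetical client implements, newest first.
SUPPORTED_VERSIONS = ["2025-11-25", "2025-06-18", "2025-03-26"]


def initialize_request(request_id: int = 0) -> dict:
    """Build the initialize request, proposing the newest supported version."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": SUPPORTED_VERSIONS[0],
            "capabilities": {"roots": {"listChanged": True}},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def accept_initialize_result(result: dict) -> tuple[str, dict]:
    """Check the server's chosen version; return it plus the follow-up notification."""
    version = result["protocolVersion"]
    if version not in SUPPORTED_VERSIONS:
        # The server answered with a version this client cannot speak: disconnect.
        raise ConnectionError(f"unsupported protocol version {version!r}")
    # Notifications have no id, so the server sends nothing back.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return version, initialized
```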
The capability fields are not just metadata — they gate behavior. A server that advertises tools.listChanged: true is promising it will send notifications/tools/list_changed when its tool roster changes dynamically. A client that does not advertise sampling in its capabilities is telling the server "do not try to ask me for LLM completions." Mismatched assumptions here cause silent failures that look like tools not showing up or notifications being dropped.
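Gating on capabilities is a lookup into the advertised objects. In this sketch, `has_capability` is a hypothetical helper that treats an empty object such as "sampling": {} as present and a missing key or an explicit false as absent:

```python
def has_capability(capabilities: dict, path: str) -> bool:
    """Return True if a dotted capability path was advertised.

    An empty object such as {"sampling": {}} counts as advertised;
    a missing key or an explicit false flag does not.
    """
    node = capabilities
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False


server_caps = {"tools": {"listChanged": True}, "resources": {"subscribe": True, "listChanged": True}}
client_caps = {"roots": {"listChanged": True}, "sampling": {}}
```

A server would check has_capability(client_caps, "sampling") before ever sending sampling/createMessage, and a client would only rely on list-change notifications when has_capability(server_caps, "tools.listChanged") holds.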
The Three Primitives
MCP gives servers three things to expose:
Tools are the most important. A tool is a JSON Schema-described function the LLM can call. The server lists tools via tools/list, the client passes the schemas to the LLM's context, and when the LLM generates a tool call, the client executes tools/call and returns the result.
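# Minimal MCP tool handlers: an SDK-agnostic sketch of the payload shapes.
# web_search() and format_results() are hypothetical stand-ins for a real backend.
class McpError(Exception):
    """Protocol-level failure, sent to the client as a JSON-RPC error object."""
    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code
        self.message = message

def web_search(query: str, max_results: int) -> list[str]:
    # Placeholder: call a real search API here.
    return [f"Result {i + 1} for {query!r}" for i in range(max_results)]

def format_results(results: list[str]) -> str:
    return "\n".join(results)

def handle_tools_list() -> dict:
    return {
        "tools": [{
            "name": "search_web",
            "description": "Search the web and return top results",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "max_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }]
    }

def handle_tools_call(name: str, arguments: dict) -> dict:
    if name == "search_web":
        results = web_search(arguments["query"], arguments.get("max_results", 5))
        return {
            "content": [{"type": "text", "text": format_results(results)}],
            "isError": False
        }
    # The method (tools/call) exists but the tool does not, so the spec's example
    # reports this as -32602 (Invalid params) rather than -32601.
    raise McpError(-32602, f"Unknown tool: {name}")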
Resources are data the server exposes for the LLM to read — files, database rows, API responses. Resources have URIs (e.g., file:///etc/hosts, postgres://mydb/users/42). The client can list resources, read them on demand, or subscribe to changes if the server supports it. Resources let you inject structured context without burning it all into the system prompt upfront.
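On the server side, the two core resource methods, resources/list and resources/read, reduce to a listing and a lookup by URI. A minimal in-memory sketch: `RESOURCES` and its contents are hypothetical, and a real server would turn the KeyError into a JSON-RPC error response.

```python
# Hypothetical in-memory store: URI -> (name, mimeType, text contents).
RESOURCES = {
    "file:///etc/hosts": ("hosts", "text/plain", "127.0.0.1 localhost\n"),
    "postgres://mydb/users/42": ("user 42", "application/json", '{"id": 42}'),
}


def handle_resources_list() -> dict:
    return {
        "resources": [
            {"uri": uri, "name": name, "mimeType": mime}
            for uri, (name, mime, _text) in RESOURCES.items()
        ]
    }


def handle_resources_read(uri: str) -> dict:
    # Unknown URIs raise KeyError; a real server maps this to a JSON-RPC error.
    _name, mime, text = RESOURCES[uri]
    # Text resources use "text"; binary ones would use a base64 "blob" field instead.
    return {"contents": [{"uri": uri, "mimeType": mime, "text": text}]}
```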
Prompts are server-defined prompt templates: reusable, parameterized message sequences that are meant to be user-controlled, so the user picks them explicitly rather than the model invoking them on its own. Think of them as macros: a summarize_pr prompt template that takes {repo} and {pr_number} and expands to a structured summary request. They surface in host UIs as slash commands or quick-actions.
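The summarize_pr example maps onto prompts/list and prompts/get roughly as below. This is a sketch: the handler names and the template wording are illustrative, and prompt arguments arrive as strings, which is why pr_number is used without parsing it as an integer.

```python
def handle_prompts_list() -> dict:
    return {
        "prompts": [{
            "name": "summarize_pr",
            "description": "Summarize a pull request",
            "arguments": [
                {"name": "repo", "description": "Repository, e.g. owner/name", "required": True},
                {"name": "pr_number", "description": "Pull request number", "required": True},
            ],
        }]
    }


def handle_prompts_get(name: str, arguments: dict[str, str]) -> dict:
    if name != "summarize_pr":
        raise ValueError(f"Unknown prompt: {name}")
    repo, pr_number = arguments["repo"], arguments["pr_number"]
    # Expand the template into the message sequence the host will send.
    return {
        "description": f"Summary request for {repo}#{pr_number}",
        "messages": [{
            "role": "user",
            "content": {
                "type": "text",
                "text": (f"Summarize pull request #{pr_number} in {repo}: "
                         "motivation, main changes, and risks."),
            },
        }],
    }
```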
The Tool Call Lifecycle, Precisely
Here is the full sequence for a single tool call, from LLM generation to result delivery:
sequenceDiagram
participant LLM
participant Client
participant Server
Client->>Server: initialize (capabilities)
Server-->>Client: result (server capabilities)
Client->>Server: initialized (notification)
Client->>Server: tools/list
Server-->>Client: {tools: [...schemas...]}
Client->>LLM: system prompt + tool schemas
LLM-->>Client: generates tool_use block {name, input}
Client->>Server: tools/call {name, arguments}
Server-->>Client: {content: [...], isError: false}
Client->>LLM: tool_result block
LLM-->>Client: continues generation
The LLM never talks to the MCP server directly. The client mediates everything. This is intentional — the client controls which tool schemas are injected into context, can rate-limit or filter tool calls, and owns the trust boundary between the model and external systems.
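The content array in a tool result can contain multiple content blocks: text, image (base64-encoded data plus a mimeType), and resource (an embedded resource: its URI together with its text or base64 blob contents). Later spec revisions add audio blocks and resource_link blocks, which point at a resource by URI without inlining it. This lets a tool return rich multi-modal results, such as a screenshot alongside a text description.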
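Because the client mediates everything, it also translates tool results into whatever shape its model API expects. The sketch below assumes the Anthropic Messages API tool_result block as the target; other vendors use different shapes, and only text and image blocks are mapped here. `to_anthropic_tool_result` is a hypothetical helper name.

```python
def to_anthropic_tool_result(tool_use_id: str, mcp_result: dict) -> dict:
    """Translate an MCP tools/call result into an Anthropic tool_result block."""
    blocks = []
    for item in mcp_result.get("content", []):
        if item["type"] == "text":
            blocks.append({"type": "text", "text": item["text"]})
        elif item["type"] == "image":
            # MCP images carry base64 data plus a mimeType.
            blocks.append({
                "type": "image",
                "source": {"type": "base64", "media_type": item["mimeType"], "data": item["data"]},
            })
        else:
            # Audio, embedded resources, and links are not mapped in this sketch.
            blocks.append({"type": "text", "text": f"[unsupported MCP content: {item['type']}]"})
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": blocks,
        # MCP's isError maps onto the API's is_error flag.
        "is_error": bool(mcp_result.get("isError", False)),
    }
```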
Sampling: The Inversion That Most Engineers Miss
Most MCP documentation focuses on clients calling servers to run tools. Sampling is the reverse: the server asks the client to perform an LLM completion.
The server sends a sampling/createMessage request:
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {"role": "user", "content": {"type": "text", "text": "Summarize this diff: ..."}}
    ],
    "maxTokens": 512
  }
}
The client decides whether to honor it — checking user consent, rate limits, cost controls — routes it to whatever LLM it is using, and returns the completion. The server never touches an LLM API key directly.
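A client's handling of sampling/createMessage can be sketched as a consent check followed by a completion call. `user_approves` and `complete` are hypothetical callbacks standing in for the host's consent UI and its LLM client; the -1 rejection code mirrors the example in the spec's sampling section, and stopReason is hard-coded for brevity.

```python
from typing import Callable


def handle_create_message(
    request: dict,
    user_approves: Callable[[dict], bool],
    complete: Callable[[list, int], str],
    model_name: str,
) -> dict:
    """Client-side handler for a server's sampling/createMessage request."""
    req_id, params = request["id"], request["params"]
    if not user_approves(params):
        # Mirrors the rejection example in the spec's sampling section.
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -1, "message": "User rejected sampling request"}}
    # The client, not the server, holds the model credentials and makes the call.
    text = complete(params["messages"], params["maxTokens"])
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {
            "role": "assistant",
            "content": {"type": "text", "text": text},
            "model": model_name,
            # Hard-coded for brevity; report the model's real stop reason in practice.
            "stopReason": "endTurn",
        },
    }
```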
This enables a class of servers that are themselves mini-agents: they call tools, receive results, ask the LLM to reason over them, act on the reasoning, and repeat. All through the MCP protocol, all mediated by a client that can log, audit, or throttle every step. It is a clean pattern for building recursive reasoning loops without coupling your server code to a specific LLM vendor or SDK.
Roots: Grounding Servers in Your Filesystem
The roots capability lets the client tell the server which filesystem locations are in scope; in the current spec every root is a file:// URI. After initialization, the server can send a roots/list request, and the client responds with entries like:
{
  "roots": [
    { "uri": "file:///home/user/project", "name": "My Project" }
  ]
}
Servers should treat roots as a scope hint: a file server should not crawl /etc if the client only advertised the project directory. When roots change (the user opens a different workspace), the client sends a notifications/roots/list_changed notification and the server re-fetches. This lets a host scope a filesystem server to the user's current workspace without any server-side configuration.
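A server can honor that hint with a lexical containment check. The sketch assumes POSIX paths and file:// roots, normalizes away .. segments before comparing, and does not resolve symlinks, which a production server would also need to handle. `in_scope` and `root_paths` are hypothetical helpers; `PurePath.is_relative_to` requires Python 3.9 or later.

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def root_paths(roots: list[dict]) -> list[PurePosixPath]:
    """Convert file:// root URIs into normalized POSIX paths."""
    paths = []
    for root in roots:
        parsed = urlparse(root["uri"])
        if parsed.scheme == "file":
            paths.append(PurePosixPath(posixpath.normpath(unquote(parsed.path))))
    return paths


def in_scope(path: str, roots: list[dict]) -> bool:
    """True if path lies inside any advertised root (lexical check only)."""
    # normpath collapses ".." so "project/../../etc" cannot slip past the comparison.
    candidate = PurePosixPath(posixpath.normpath(path))
    return any(candidate.is_relative_to(root) for root in root_paths(roots))
```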
Error Handling and the Failure Modes That Will Find You
JSON-RPC errors use standard codes: -32700 is parse error (malformed JSON), -32600 is invalid request, -32601 is method not found, -32602 is invalid params, and -32603 is internal error. MCP layers a second error channel on top: a tool can return isError: true in its result to signal a handled failure (file not found, upstream API timeout) distinct from a protocol error. The distinction matters because a protocol error fails the request itself and the client may never show it to the model, whereas an isError result is an ordinary tool result the LLM can read and react to, for example by retrying with different arguments.
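A server-side dispatcher can keep the two channels apart: protocol failures become a JSON-RPC error object, while exceptions raised inside the tool become a normal result with isError: true. A sketch, where `McpError` and `dispatch_tools_call` are illustrative names rather than an SDK API:

```python
from typing import Callable


class McpError(Exception):
    """A protocol-level failure with a JSON-RPC error code."""

    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code
        self.message = message


def dispatch_tools_call(request: dict, handler: Callable[[str, dict], dict]) -> dict:
    """Run a tools/call request and build the JSON-RPC response."""
    req_id = request["id"]
    params = request.get("params", {})
    name = params.get("name")
    if not isinstance(name, str):
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32602, "message": "Missing tool name"}}
    try:
        result = handler(name, params.get("arguments", {}))
    except McpError as exc:
        # Protocol error: the request itself failed.
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": exc.code, "message": exc.message}}
    except Exception as exc:
        # Tool failure: report it as a result the model can read and react to.
        return {"jsonrpc": "2.0", "id": req_id, "result": {
            "content": [{"type": "text", "text": f"Tool failed: {exc}"}],
            "isError": True,
        }}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}
```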
The failure modes worth knowing before you ship:
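Version mismatch: Client and server advertise different protocolVersion strings. The client proposes the newest version it supports; if the server cannot support it, the server answers with a version it does support, and the client should disconnect if it cannot speak that one. Implementations that skip this check and carry on anyway degrade in confusing ways. Always log the negotiated version on both sides.
Capability assumption: A server sends notifications/tools/list_changed without having advertised tools.listChanged, or the client never handles the notification and never re-fetches tools/list. Either way the update is effectively dropped and your tool roster appears stale. Check capability negotiation first when tools go missing.
Oversized tool schemas: Claude and other models have context limits. If you expose 200 tools, the combined JSON Schema consumes thousands of tokens before the user's message even appears. tools/list supports cursor-based pagination, but that only limits the size of each response; a client that fetches every page still injects every schema. The real fix is to filter what you expose per session based on user intent.
stdio buffering: Under stdio transport, servers must flush stdout after every message. Line-buffered stdout (the Python default when stdout is a terminal) is fine. Block-buffered stdout (the Python default when stdout is a pipe, which is exactly how an MCP client connects) leaves responses sitting in the process's own output buffer, and the client hangs waiting for bytes that never arrive. Add sys.stdout.reconfigure(line_buffering=True) at server startup, or flush explicitly after each write.
Long-running tools and timeouts: A tool result arrives in one piece, so a tool that runs for 30 seconds returns nothing until it finishes. MCP does provide progress notifications: if the client includes a progressToken in the request's _meta field, the server can send notifications/progress messages carrying progress, an optional total, and an optional message. Clients still set their own timeouts; the spec lets them reset the clock on each progress notification but expects them to enforce a maximum. Design long-running tools to report progress, return intermediate results as resources the client can poll, or split the work into stages.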
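Paging through tools/list follows a cursor protocol: each result may include nextCursor, which the client passes back as params.cursor until no cursor is returned. The sketch below collects every page and applies a per-session filter; `send_request` is a hypothetical hook for whatever transport the client uses, and `keep` is where intent-based filtering would go.

```python
from typing import Callable, Optional


def list_all_tools(
    send_request: Callable[[str, dict], dict],
    keep: Callable[[dict], bool] = lambda tool: True,
) -> list[dict]:
    """Follow tools/list cursors to the end, keeping only tools that pass `keep`."""
    tools: list[dict] = []
    cursor: Optional[str] = None
    while True:
        params = {"cursor": cursor} if cursor is not None else {}
        result = send_request("tools/list", params)
        tools.extend(tool for tool in result["tools"] if keep(tool))
        cursor = result.get("nextCursor")
        if cursor is None:
            # No nextCursor means this was the last page.
            return tools
```

Filtering with keep is what actually shrinks the context; pagination alone only splits the same roster across several responses.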
Security: The Trust Model You Need to Understand
MCP's trust model has three zones: the user, the client (the host application), and the server.
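The client is the trust broker. It decides which servers to connect to, which tool calls to allow without confirmation, and whether to honor sampling requests. The server is untrusted by default. Within the protocol it can only do what the client permits; outside the protocol, a local stdio server is an ordinary process running with your user's privileges.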
This means the interesting attack surface sits between the client and server. A malicious MCP server can lie in tool descriptions to manipulate the LLM's behavior (prompt injection via schema), request sampling calls that exfiltrate context, or return crafted tool results designed to poison subsequent reasoning steps. Client implementors must sanitize tool schemas before injecting them into LLM context. Users should treat third-party MCP servers with the same skepticism they would apply to third-party npm packages — they execute code on your machine.
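The spec's authorization framework, introduced in the 2025-03-26 revision and reworked since, covers HTTP-based transports and builds on OAuth 2.1. The MCP server acts as an OAuth resource server and points clients to its authorization server through OAuth 2.0 Protected Resource Metadata (RFC 9728), published under /.well-known/oauth-protected-resource; stdio servers are expected to take credentials from the environment instead. Client support for the full flow still varies, but it is the direction the ecosystem is heading.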
The Practitioner's Lens
If you are shipping an agent product today, MCP gives you a clean separation between your orchestration logic (the client) and your capability extensions (servers). The stdio transport makes local testing trivially reproducible: CI can spin up an MCP server as a child process and run deterministic tool call tests without any network. The Streamable HTTP transport makes remote servers deployable behind standard infrastructure. Version and capability negotiation let clients and servers add features without breaking older peers. The sampling primitive lets you build recursive agent loops without coupling your server code to a specific LLM vendor.
The thing most teams get wrong: they build one monolithic MCP server that does everything. The protocol is designed for composition — multiple specialized servers, each exposing a small, well-defined set of tools. A file server, a web search server, a database server, a code execution server. The client assembles them. This keeps individual servers testable, replaceable, and scoped to least privilege. Treat MCP servers like Unix processes: small, composable, doing one thing well.
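On the client side, composition mostly means merging several tool rosters and routing each call back to the right server. One common approach, sketched here as an assumption rather than anything the spec prescribes, is to namespace tool names by server so that two servers can both expose a tool called search without colliding. `aggregate_tools` and the "__" separator are illustrative choices.

```python
def aggregate_tools(
    servers: dict[str, list[dict]], sep: str = "__"
) -> tuple[list[dict], dict[str, tuple[str, str]]]:
    """Merge per-server tool lists into one namespaced roster plus a routing table."""
    exposed: list[dict] = []
    routes: dict[str, tuple[str, str]] = {}
    for server, tools in servers.items():
        for tool in tools:
            qualified = f"{server}{sep}{tool['name']}"
            if qualified in routes:
                raise ValueError(f"duplicate tool name: {qualified}")
            # Copy the tool definition unchanged, renaming only the tool itself.
            exposed.append({**tool, "name": qualified})
            routes[qualified] = (server, tool["name"])
    return exposed, routes
```

When the model calls fs__read_file, the client looks up routes["fs__read_file"] to find the owning server and the original tool name, then sends tools/call to that server alone.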