Aly Sawft · Founder & Engineer, Sawftware LLC · July 1, 2026 · 8 min read

What is the Model Context Protocol and why document tools need it

What MCP is

The Model Context Protocol (MCP) is an open standard developed by Anthropic that defines how AI agents discover and call tools. Before MCP, every AI framework had its own tool-calling convention: LangChain tools, OpenAI function calling, Claude tool_use — each with different schemas, different discovery mechanisms, and different semantics.

MCP standardizes the interface. An MCP server exposes tools with:

A machine-readable description of what the tool does
A JSON Schema for the input parameters
A standardized response format

An MCP client (any MCP-compatible agent: Claude, LangChain, Cursor, Zed, and others) can discover and call any MCP server without custom integration code. The agent learns what tools are available, chooses the right one for the task, and calls it with structured parameters.
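
To make those three parts concrete, here is a minimal Python sketch of a tool descriptor, a call, and a response as they appear on the wire. The keys (name, description, inputSchema, the tools/call method, the content list) come from the MCP specification. The "source" parameter is an assumption made for illustration, not DocImprint's documented schema.

```python
import json

# Hypothetical descriptor for one tool. The keys are the ones MCP defines;
# the "source" parameter is an assumption made for this sketch.
EXTRACT_TEXT_TOOL = {
    "name": "extract_text",
    "description": "Extract Markdown text from a URL or uploaded file",
    "inputSchema": {
        "type": "object",
        "properties": {"source": {"type": "string"}},
        "required": ["source"],
    },
}


def missing_required(tool: dict, arguments: dict) -> list[str]:
    """Return required parameters that the caller did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


def build_call(request_id: int, tool: dict, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the given tool."""
    missing = missing_required(tool, arguments)
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    })


def text_of(response: dict) -> str:
    """Join the text blocks of a tools/call result, skipping other block types."""
    blocks = response["result"]["content"]
    return "\n".join(b["text"] for b in blocks if b.get("type") == "text")
```

In practice an MCP SDK handles this serialization for you; the sketch only shows what the "standardized interface" amounts to.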

The analogy is REST for APIs: just as REST standardized how servers expose resources over HTTP, MCP standardizes how tools are exposed to AI agents. You build the tool once; every MCP-compatible agent can use it.

Why document tools specifically need MCP

Document intelligence — extracting, analyzing, verifying, and searching documents — is one of the most common tasks AI agents are asked to perform. But document APIs are historically complex:

Multiple endpoints for different operations (extract, summarize, verify, search)
Binary inputs (PDFs, images) that agents cannot easily pass through text-based tool calls
Multi-step workflows (capture → index → search → verify citation)
Results that need to be referenced across multiple agent turns (bundle IDs, collection IDs)

Without MCP, wiring a document API into an agent requires custom integration code per framework: a LangChain tool definition, a Claude tool_use schema, an OpenAI function spec. Each needs to be maintained separately and updated when the API changes.
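
The duplication is easy to see when one tool is written out in each shape. The sketch below takes an MCP-style descriptor and produces the equivalent Anthropic tool definition and OpenAI function spec. The check_claims parameters ("source", "claims") are assumptions made for illustration, not DocImprint's documented schema.

```python
# Hypothetical MCP-style descriptor; parameter names are assumed for this sketch.
CHECK_CLAIMS = {
    "name": "check_claims",
    "description": "Evaluate a list of claims against a document",
    "inputSchema": {
        "type": "object",
        "properties": {
            "source": {"type": "string"},
            "claims": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["source", "claims"],
    },
}


def to_anthropic(tool: dict) -> dict:
    """Anthropic tool definition: the schema lives under input_schema."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["inputSchema"],
    }


def to_openai(tool: dict) -> dict:
    """OpenAI function spec: the schema lives under function.parameters."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }
```

The content is identical in all three; only the envelope differs. Without a shared protocol, each envelope is a separate artifact to keep in sync with the API.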

With MCP, you define the tools once on the server. Every MCP-compatible agent uses them without additional integration work. The agent can discover the full document intelligence capability surface — all 14 DocImprint tools — from a single endpoint.

DocImprint's 14 MCP tools

The DocImprint MCP server exposes document intelligence as a complete tool suite. Each tool has a machine-readable description, input schema, and structured output:

Extract tools:

extract_text — extract Markdown text from a URL or uploaded file
extract_tables — extract structured tables using Textract (better accuracy for data-heavy documents)
parse_invoice — extract structured invoice fields: vendor, line items, totals, dates
extract_structured — extract any structured data against a custom JSON schema (BYOS mode)
extract_url — focused URL extraction with lean response (no bundle stored)

Analysis tools:

check_claims — evaluate a list of claims against a document: supported / contradicted / not_found
summarize_document — generate a summary with key points and word count
summarize_url — lean summary without storage
qa_url — answer a question about a URL with cited evidence
translate_url — translate a URL's content to any language
verify_bundle — verify bundle integrity: re-hash artifacts and check the manifest signature

Collection tools:

create_collection — create a named document corpus
search_collection — semantic search across all indexed bundles in a collection
ask_collection — cross-document Q&A with Merkle-verified citations
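
This article does not describe how DocImprint builds the Merkle proofs behind ask_collection, so the following is a generic sketch of the technique rather than its actual scheme. It assumes SHA-256, a binary tree in which an odd node is paired with itself, and a proof expressed as a list of sibling hashes with a left/right flag. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of the input."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; an odd node is paired with itself."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the right."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from a leaf to the root and compare with the expected root."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

The useful property for citations is that a verifier needs only the cited chunk, its short proof, and the root to confirm the chunk belongs to the indexed document, without re-fetching every other chunk.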

Configure DocImprint MCP in Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "docimprint": {
      "command": "npx",
      "args": ["mcp-remote", "https://api.docimprint.com/mcp"],
      "env": {
        "AUTHORIZATION": "Bearer dr_live_..."
      }
    }
  }
}

MCP vs raw HTTP for agent integrations

DocImprint exposes the same capabilities over MCP and REST. When should an agent use MCP vs direct HTTP?

Use MCP when:

Your agent framework already supports MCP (Claude Desktop, Cursor, LangChain MCP adapter)
You want tool discovery without reading OpenAPI specs
You want binary document handling abstracted away: the MCP server accepts URLs and handles fetch/render internally
You are prototyping agent workflows and want the fastest path to working tool calls

Use raw HTTP when:

You need fine-grained control over request timing, retries, and error handling
You are building a backend service (not an interactive agent) with established HTTP client libraries
You want x402 payment flow with custom wallet signing logic
You need webhook-driven async jobs with your own queue infrastructure

Both paths hit the same Cloudflare Workers backend. MCP tools are thin wrappers over the REST endpoints — no capability gap between protocols. An agent configured with MCP can call extract_text; a backend service can call POST /v1/extract with identical results.

For production agent marketplaces, MCP is increasingly the default discovery layer. For enterprise integrations with existing API gateways, REST with API keys remains common. DocImprint supports both without forcing a choice.

How tool discovery works

When an MCP client connects to the DocImprint MCP server, it sends a tools/list request (exposed as list_tools() in the official Python SDK). The server returns descriptions and schemas for all 14 tools. The agent keeps these in its context for the session.
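
On the wire, discovery is a single JSON-RPC request with the method tools/list, sent after the initialize handshake. The sketch below builds that request and indexes a response by tool name. The helper names are hypothetical, and the sample response in the usage note is abbreviated to two tools.

```python
import json


def build_list_request(request_id: int) -> str:
    """Serialize the JSON-RPC 2.0 request that asks a server for its tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def index_tools(response: dict) -> dict[str, dict]:
    """Map tool name -> descriptor from a tools/list result."""
    return {tool["name"]: tool for tool in response["result"]["tools"]}


def describe_for_prompt(tools: dict[str, dict]) -> str:
    """Render one 'name: description' line per tool, sorted by name."""
    return "\n".join(f"{name}: {t['description']}" for name, t in sorted(tools.items()))
```

Given a response such as {"result": {"tools": [{"name": "extract_text", "description": "..."}, {"name": "check_claims", "description": "..."}]}}, index_tools returns a dictionary keyed by tool name, and describe_for_prompt produces the text the model reads when it picks a tool.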

When the user (or orchestrating agent) asks a question, the agent selects the appropriate tool. The selection is based on the tool description — this is where writing good descriptions matters. DocImprint's tool descriptions specify not just what the tool does, but when to use it:

extract_text: "Use when you need the full text content of a document for reading or analysis"
check_claims: "Use when you need to verify whether specific statements are supported by a document"
ask_collection: "Use when you need to answer a question across multiple documents in a matter corpus"

This guidance helps the agent choose correctly without being explicitly told which tool to use for each request.

A typical agent workflow over MCP

Here is a complete workflow showing how an agent uses DocImprint MCP tools to answer a compliance question:

User: "Do all of our supplier contracts from Q1 2026 include a data processing addendum?"
Agent calls create_collection to set up a corpus for the Q1 supplier contracts, or reuses an existing collection ID.
Agent calls ask_collection with the question "Does this contract include a data processing addendum or DPA clause?"
The MCP server runs vector search across the indexed bundles, retrieves relevant chunks, generates an answer, and returns: a verdict per bundle (yes/no/unclear), the cited passages with chunk_ids, and the bundle_ids where the relevant clauses were found.
Agent calls check_claims on any ambiguous bundles: "This contract includes a data processing addendum" — to get a structured supported/contradicted/not_found verdict with evidence.
Agent calls verify_bundle on the bundles before presenting the final answer — confirming integrity before citing them.
Agent returns a complete answer: which contracts include a DPA, which do not, with citations traceable to specific paragraphs in each contract.

The entire workflow runs through MCP tool calls. The agent does not need to know anything about DocImprint's HTTP API. It uses tools the way a human uses a word processor — pick the right one for the task, use it, move to the next step.
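
In a real agent the model decides which tool to call at each step. As a deterministic sketch of the same sequence, the function below takes any call_tool(name, arguments) callable and runs the ask, check, and verify steps. Every argument name and result key here (collection_id, verdicts, results, status, valid) is an assumption made for illustration; the article does not document the actual request or response shapes.

```python
from typing import Callable

# Hypothetical transport: takes a tool name and arguments, returns a parsed result.
CallTool = Callable[[str, dict], dict]

QUESTION = "Does this contract include a data processing addendum or DPA clause?"
CLAIM = "This contract includes a data processing addendum"


def dpa_audit(call_tool: CallTool, collection_id: str) -> dict[str, str]:
    """Return bundle_id -> 'yes' / 'no' / 'unclear' for the DPA question."""
    answer = call_tool("ask_collection", {
        "collection_id": collection_id,
        "question": QUESTION,
    })
    verdicts: dict[str, str] = {}
    for item in answer["verdicts"]:
        bundle_id, verdict = item["bundle_id"], item["verdict"]
        if verdict == "unclear":
            # Ambiguous bundles get a structured claim check.
            check = call_tool("check_claims", {"bundle_id": bundle_id, "claims": [CLAIM]})
            status = check["results"][0]["status"]
            verdict = {"supported": "yes", "contradicted": "no"}.get(status, "unclear")
        # Confirm integrity before the bundle is cited in the final answer.
        if not call_tool("verify_bundle", {"bundle_id": bundle_id})["valid"]:
            raise RuntimeError(f"bundle {bundle_id} failed integrity verification")
        verdicts[bundle_id] = verdict
    return verdicts
```

Passing the transport in as a parameter keeps the sketch independent of any particular MCP client library and makes the sequence easy to exercise with a stub.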

A2A and OpenAPI: the broader discovery stack

MCP is one layer in DocImprint's agent discovery stack. The full stack:

MCP (https://api.docimprint.com/mcp): tool-oriented interface for document intelligence operations. Best for agents that use tools interactively.

A2A agent card (https://api.docimprint.com/.well-known/agent.json): Google's Agent-to-Agent protocol. Declares DocImprint as an agent with capabilities, endpoints, and authentication requirements. Used by agent orchestrators that route tasks between agents.

OpenAPI 3.1 (https://api.docimprint.com/openapi.json): machine-readable HTTP spec for the full REST API. Used by code generation tools, SDK generators, and agents that prefer raw HTTP over MCP.

SKILL.md (https://api.docimprint.com/SKILL.md): a plain-text capability declaration that LLMs can read directly. Describes what DocImprint does, when to use it, and how to call it — without any protocol overhead.

Together these four endpoints mean any agent, regardless of which discovery protocol it uses, can find and use DocImprint without manual configuration.

Setting up DocImprint MCP: two minutes

Connecting DocImprint MCP to Claude Desktop or any MCP-compatible agent requires one configuration block and an API key.

Get an API key from docimprint.com/get-started. Then add the server configuration to your MCP client. For Claude Desktop, edit the config file at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS.

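The mcp-remote package (from npm) proxies remote MCP servers over stdio, which is what most desktop MCP clients expect. In this configuration the API key is supplied through the AUTHORIZATION environment variable. mcp-remote also accepts a --header argument for custom headers such as Authorization, which is worth trying if the key is not picked up.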
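
If the config file already lists other MCP servers, the DocImprint entry has to be merged in rather than pasted over them. Here is a small Python sketch of that merge. The entry shape mirrors the configuration shown in this article; the function names are hypothetical, and you would pass in the config path for your own platform.

```python
import json
from pathlib import Path


def add_server(config: dict, name: str, url: str, api_key: str) -> dict:
    """Return a copy of config with one mcp-remote server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": "npx",
        "args": ["mcp-remote", url],
        "env": {"AUTHORIZATION": f"Bearer {api_key}"},
    }
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, name: str, url: str, api_key: str) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config, name, url, api_key), indent=2))
```

add_server leaves the original dictionary untouched and preserves unrelated keys, so existing servers and settings survive the update.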

After restarting Claude Desktop, you should see DocImprint's tools in the tool list. Ask Claude "what DocImprint tools do you have?" to confirm discovery. Then try: "summarize this PDF" with a URL — Claude will call extract_text or summarize_document automatically.

Verify your API key and MCP connection (bash)

# Testing the MCP server directly requires mcp-remote or a local MCP client.
# Alternatively, confirm that your key works with the equivalent REST call:
curl https://api.docimprint.com/v1/summarize \
  -H "Authorization: Bearer dr_live_..." \
  -H "Content-Type: application/json" \
  -d '{"source":"https://example.com/annual-report.pdf"}'

# Once MCP is configured in Claude Desktop:
# "Summarize the key findings in this annual report: https://example.com/annual-report.pdf"
# Claude will call summarize_document automatically

What comes next for MCP and documents

MCP is young — the spec was released in late 2024 and the ecosystem is growing quickly. Several developments are worth watching for document intelligence specifically:

Streaming tool responses: current MCP tools return a complete response synchronously. For large document extractions, streaming (returning partial results as they are available) would significantly improve perceived latency. This is on the MCP roadmap.

Tool chaining in the protocol: today, agents chain tools by calling them sequentially. Future MCP versions may support declarative tool chains, letting the server express "after extract_text, you probably want check_claims" — reducing agent reasoning overhead.

Persistent sessions and context: an agent currently cannot "remember" a collection ID across MCP sessions without storing it externally. Protocol-level session state would make multi-turn document workflows more natural.
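
Until the protocol offers session state, "storing it externally" can be as simple as a small file the agent reads at the start of each session. The sketch below is one hypothetical way to do that: a JSON file that maps a human-readable label to a collection ID. The class and method names are illustrative and not part of any DocImprint or MCP API.

```python
import json
from pathlib import Path
from typing import Optional


class CollectionStore:
    """Tiny JSON-file store mapping a label to a collection ID."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, str]:
        # A missing file simply means nothing has been remembered yet.
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, label: str, collection_id: str) -> None:
        """Persist the ID so a later session can look it up by label."""
        data = self._load()
        data[label] = collection_id
        self.path.write_text(json.dumps(data, indent=2))

    def recall(self, label: str) -> Optional[str]:
        """Return the stored ID, or None if the label is unknown."""
        return self._load().get(label)
```

An agent would call remember() right after create_collection returns and recall() before deciding whether a new collection is needed.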

Cross-agent tool sharing: if Agent A creates a collection, Agent B should be able to use it without re-authenticating. This is an open problem in multi-agent MCP deployments — wallet-based identity (x402) is part of the answer.

The combination of MCP discovery + x402 payment + evidence bundles represents a complete protocol stack for trustworthy, economically self-sustaining AI agent document workflows. That is the direction the agent economy is moving.
