Aly Sawft · Founder & Engineer, Sawftware LLC · 9 min read
In physical evidence law, chain of custody is the documented record of who handled a piece of evidence, when, and under what conditions. Digital evidence must be authenticated under Federal Rule of Evidence 901, and a cryptographic hash taken at capture time is a widely used way to support that authentication for digital documents.
For AI-extracted content, the problem compounds: not only must you prove the original document existed and was unmodified, you must also prove that the extraction output accurately reflects what was in that document at capture time. An AI that paraphrases, abbreviates, or hallucinates produces output that cannot be traced back to the source.
A cryptographic chain of custody solves both problems. This guide walks through building one with DocImprint, covering the full lifecycle from initial capture through legal hold, on-chain timestamping, and court-ready export.
The chain begins at capture. Use POST /v1/extract with store=true (the default) to create an evidence bundle. The request includes the source document (URL or file upload) and the outputs you need.
The response includes the bundle_id, the manifest_sha256, and the captured_at timestamp, among other fields.
Store bundle_id and manifest_sha256 in your own system of record immediately. Do not rely solely on DocImprint to custody these values — they are your independent verification anchors.
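A minimal sketch of recording those anchors locally, assuming a plain tab-separated ledger file stands in for your system of record. The `LEDGER` path and the `record_capture` function are hypothetical names used only for illustration; a production system would more likely write to a database.

```bash
# Hypothetical local ledger; a real system of record would be a database.
LEDGER="${LEDGER:-custody-ledger.tsv}"

# Append one tab-separated line: recorded_at, bundle_id, manifest_sha256, captured_at.
record_capture() {
  bundle_id="$1"
  manifest_sha256="$2"
  captured_at="$3"
  # Refuse anything that is not a hex string (guards against truncated values).
  case "$manifest_sha256" in
    *[!0-9a-fA-F]*|"") echo "not a hex digest: $manifest_sha256" >&2; return 1 ;;
  esac
  # A SHA-256 digest is always 64 hex characters.
  if [ "${#manifest_sha256}" -ne 64 ]; then
    echo "expected 64 hex characters, got ${#manifest_sha256}" >&2
    return 1
  fi
  recorded_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '%s\t%s\t%s\t%s\n' "$recorded_at" "$bundle_id" "$manifest_sha256" "$captured_at" >> "$LEDGER"
}

# Usage (pass the full 64-character digest from the response):
# record_capture ev_abc123 "$MANIFEST_SHA256" 2026-06-24T10:30:00Z
```

Rejecting anything shorter than a full digest is deliberate: a truncated hash copied from a log line or a UI would be useless as a verification anchor later.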
# Upload a PDF directly
curl -X POST https://api.docimprint.com/v1/extract \
-H "Authorization: Bearer dr_live_..." \
-H "Content-Type: application/pdf" \
-H "X-Source-Filename: contract-2026.pdf" \
--data-binary @contract-2026.pdf
# Or capture from URL
curl -X POST https://api.docimprint.com/v1/extract \
-H "Authorization: Bearer dr_live_..." \
-H "Content-Type: application/json" \
-d '{"source":"https://example.com/filing.pdf","include":["markdown","screenshot","summary"]}'
# Store these values in your own database:
# bundle_id: ev_abc123
# manifest_sha256: a3f2c1d4e5b6...
# captured_at: 2026-06-24T10:30:00Z
As soon as a bundle is captured in a matter with litigation potential, place it under legal hold. Legal hold prevents deletion by the API, by retention policies, and by any automated garbage collection.
In e-discovery terms, a legal hold obligation arises when litigation is "reasonably anticipated" — often before a complaint is filed. Placing holds proactively is better practice than scrambling after the fact.
PUT /v1/extract/:id/hold takes a hold_reason and the holder identity (held_by), plus an optional hold_until date. The response confirms the hold is active and returns a hold_id for your records.
A held bundle returns 409 LEGAL_HOLD on any DELETE attempt. Notarized bundles additionally require acknowledge_notarized: true before deletion (after the hold is released).
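One way to confirm that a hold is actually enforced is to classify the HTTP status of a deliberate DELETE attempt in a test environment. The sketch below is an assumption-laden illustration: only the 409 case comes from the text above, the `classify_delete_status` name is hypothetical, and the DELETE route in the commented usage is assumed rather than documented here.

```bash
# Map the HTTP status of a DELETE attempt to a custody decision.
# Only the 409 case is described in this guide; the other branches are assumptions.
classify_delete_status() {
  case "$1" in
    409) echo "blocked-by-hold" ;;   # expected for a held bundle
    2??) echo "deleted" ;;           # should never happen while a hold is active
    *)   echo "unexpected" ;;        # auth failure, not found, server error, ...
  esac
}

# Hypothetical usage (the DELETE route is an assumption):
# status="$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
#   -H "Authorization: Bearer dr_live_..." \
#   https://api.docimprint.com/v1/extract/ev_abc123)"
# classify_delete_status "$status"
```

Treating a "deleted" result on a held bundle as an alert condition gives you an independent check that the hold behaves as described.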
curl -X PUT https://api.docimprint.com/v1/extract/ev_abc123/hold \
-H "Authorization: Bearer dr_live_..." \
-H "Content-Type: application/json" \
-d '{
"hold_reason": "Active litigation: Smith v. Acme Corp, Case No. 2026-CV-1234",
"held_by": "legal@acme.com",
"hold_until": "2028-01-01"
}'
# Response: { "hold_id": "hold_xyz", "held_at": "2026-06-24T10:31:00Z", "status": "active" }
# Confirm hold is active
curl https://api.docimprint.com/v1/extract/ev_abc123/verify
# { "valid": true, "legal_hold": true, "hold_id": "hold_xyz", "held_by": "legal@acme.com" }
Legal hold protects the bundle from deletion. On-chain notarization proves it existed at a specific time.
POST /v1/extract/:id/notarize submits the manifest_sha256 to Base L2. The transaction writes the hash as calldata — minimal and permanent. Optionally, an EAS (Ethereum Attestation Service) attestation is also created, which is structured, queryable, and revocable if needed.
The on-chain transaction hash and block number are stored in the bundle and returned by GET .../verify. Anyone with access to a Base RPC node or a block explorer such as Basescan can verify the transaction independently: look up the transaction hash, confirm the calldata matches the manifest_sha256, and read the block timestamp.
This produces an independently verifiable timestamp: "this document, containing this content, was captured and hashed before block 12345678 on Base, at approximately 10:32 AM UTC on 2026-06-24." DocImprint does not need to be trusted for this assertion — the blockchain record speaks for itself.
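The calldata comparison in that procedure can be sketched as a small shell check. This assumes the transaction's input data is exactly the 32-byte manifest hash with an optional 0x prefix, which is an assumption about the on-chain encoding rather than something this guide specifies. The function names are hypothetical, and the calldata value would come from your own lookup on a block explorer or RPC node.

```bash
# Lowercase a hex string and drop an optional 0x prefix.
normalize_hex() {
  printf '%s' "$1" | tr 'A-F' 'a-f' | sed 's/^0[xX]//'
}

# Succeeds only if the transaction calldata equals the manifest hash.
# Assumption: the calldata carries the bare hash with no selector or padding.
calldata_matches_manifest() {
  calldata="$(normalize_hex "$1")"
  manifest="$(normalize_hex "$2")"
  # An empty manifest hash must never count as a match.
  [ -n "$manifest" ] && [ "$calldata" = "$manifest" ]
}

# Usage:
# calldata_matches_manifest "$CALLDATA_FROM_EXPLORER" "$MANIFEST_SHA256" \
#   && echo "on-chain record matches" || echo "MISMATCH"
```

Normalizing case and prefix before comparing avoids false mismatches when one tool prints uppercase hex or omits the 0x.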
curl -X POST https://api.docimprint.com/v1/extract/ev_abc123/notarize \
-H "Authorization: Bearer dr_live_..."
# Response: {
# "tx_hash": "0x1a2b3c...",
# "block": 12345678,
# "block_timestamp": "2026-06-24T10:32:15Z",
# "network": "base",
# "eas_uid": "0xabc123..."
# }
# Store tx_hash and eas_uid in your system of record alongside bundle_id
Do not rely solely on DocImprint to custody the artifacts. Download the full evidence ZIP and store it in your own secure archive (S3, Azure Blob, on-premises storage, encrypted backup).
GET /v1/extract/:id/download returns a ZIP containing manifest.json plus every artifact stored at capture time. The ZIP is the complete, self-contained evidence package.
Store the ZIP alongside your bundle_id and manifest_sha256 in your system of record. If DocImprint is ever unavailable (or if you want to verify independently), the ZIP plus the public key at /.well-known/docimprint-keys.json is all you need for full offline verification.
Date the storage entry. Your system of record now contains: bundle_id, manifest_sha256, DocImprint signature, on-chain tx_hash, and a local copy of the artifacts — a complete chain of custody entry.
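For the hash-check part of offline verification, a minimal sketch might look like the following. It covers only the comparison of an artifact's SHA-256 against the value recorded in manifest.json; signature verification against the published key is not shown because the signature scheme is not described here. The function names are hypothetical, and the expected digest is passed in by hand rather than parsed from the manifest.

```bash
# Print the SHA-256 of a file as lowercase hex, using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Succeeds only if the file exists and its digest equals the expected value.
verify_artifact() {
  file="$1"
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  # Guard against a missing file or an empty expected value matching by accident.
  [ -f "$file" ] || { echo "MISSING $file" >&2; return 1; }
  [ -n "$expected" ] || { echo "NO EXPECTED HASH for $file" >&2; return 1; }
  actual="$(sha256_of "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file" >&2
    return 1
  fi
}

# Usage, after unzipping the evidence package:
# verify_artifact markdown.md "<artifacts.markdown.sha256 from manifest.json>"
```

Because the check needs nothing beyond a standard SHA-256 tool, it can be rerun years later on an air-gapped machine.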
# Download artifact ZIP (free endpoint, no payment required)
curl -o "evidence-ev_abc123.zip" \
https://api.docimprint.com/v1/extract/ev_abc123/download \
-H "Authorization: Bearer dr_live_..."
# The ZIP contains:
# - manifest.json (with manifest_sha256, signature, all artifact hashes)
# - markdown.md
# - screenshot.png
# - original.pdf (if uploaded)
# - ocr.txt
# Local verification (any SHA-256 tool):
# sha256sum markdown.md # compare to manifest.json artifacts.markdown.sha256
A chain of custody log is a record of every action taken on the evidence. DocImprint supports this natively:
POST /v1/extract/:id/provenance records an agent action: which agent did what to the bundle, when, and with what result. POST /v1/extract/:id/handoff records when the bundle was passed from one agent or person to another.
GET /v1/extract/:id/chain returns the full delegation graph. This is the chain of custody record for multi-agent and multi-user workflows.
Supplement this with your own system's audit log. Every API call that touches a bundle should be logged in your own system alongside the DocImprint provenance entries. A complete chain of custody includes:
DocImprint's provenance endpoints return structured JSON suitable for append-only audit logs. Export GET /v1/extract/:id/chain periodically to your SIEM or compliance archive. Combine with your application's own access logs for a complete picture of who touched the evidence and when.
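For the audit log you keep on your own side, one option is a hash-chained file in which each line carries the SHA-256 of the line before it. This is a generic technique offered as a sketch, not something DocImprint is stated to do; `AUDIT_LOG`, `append_entry`, and `verify_log` are hypothetical names, and entries must be single-line messages.

```bash
# Hypothetical log path; point this at storage your retention policy covers.
AUDIT_LOG="${AUDIT_LOG:-custody-audit.log}"
TAB="$(printf '\t')"

# Hash stdin with whichever SHA-256 tool is installed.
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d ' ' -f 1
  else
    shasum -a 256 | cut -d ' ' -f 1
  fi
}

# Append one line: hash of the previous line, UTC time, single-line message.
append_entry() {
  if [ -s "$AUDIT_LOG" ]; then
    prev="$(tail -n 1 "$AUDIT_LOG" | sha256_stdin)"
  else
    prev="genesis"
  fi
  printf '%s\t%s\t%s\n' "$prev" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$AUDIT_LOG"
}

# Walk the log and recompute each link. Fails if a line that a later
# line chains to has been altered, reordered, or removed.
verify_log() {
  expected="genesis"
  while IFS= read -r line; do
    [ "${line%%"$TAB"*}" = "$expected" ] || return 1
    expected="$(printf '%s\n' "$line" | sha256_stdin)"
  done < "$AUDIT_LOG"
}

# Usage:
# append_entry "ev_abc123 chain exported to compliance archive"
# verify_log && echo "log intact"
```

The chain cannot detect a change to the final line's message or truncation of the tail on its own, so it helps to store the hash of the latest line somewhere separate, such as alongside the bundle_id in your system of record.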
# Record an agent action on a bundle
curl -X POST https://api.docimprint.com/v1/extract/ev_abc123/provenance \
-H "Authorization: Bearer dr_live_..." \
-H "Content-Type: application/json" \
-d '{"agent_id":"compliance-bot-v2","action":"claim_check","note":"Verified DPA clause presence"}'
# Retrieve full chain of custody
curl https://api.docimprint.com/v1/extract/ev_abc123/chain \
-H "Authorization: Bearer dr_live_..."
Chain of custody requirements appear across multiple regulatory frameworks, though none mandate DocImprint specifically. The pattern (capture, hash, timestamp, hold, audit) maps to common obligations:
eDiscovery (FRCP Rules 34 and 37(e)): Rule 34 governs the production of electronically stored information, and failure to preserve that information can lead to sanctions under Rule 37(e). A broken chain of custody also invites challenges to the evidence itself. Cryptographic hashes plus on-chain timestamps provide defensible preservation records.
Financial recordkeeping (SEC, FINRA): broker-dealers and investment advisers must maintain records with integrity controls. Evidence bundles with manifest signatures can help demonstrate "reasonable safeguards against alteration" when paired with independent verification.
Healthcare (HIPAA): while DocImprint is not a HIPAA BAA-covered service by default, the tamper-evident bundle model supports audit trails for document processing in compliance workflows where PHI handling is governed by your own policies.
EU AI Act (high-risk systems): emerging requirements for documentation of AI system inputs and outputs. Evidence bundles with Merkle citation proofs provide traceable input-output linkage that generic RAG pipelines cannot produce.
The common thread: regulators and courts ask "how do you know this is what the document said?" Cryptographic chain of custody turns that question into a verifiable procedure, not a trust assertion.
When the time comes to produce evidence — in discovery, a regulatory inquiry, or litigation — the evidence package consists of:
With these five components, opposing counsel can independently verify: the document existed, contained this content (per the artifacts), at this time (per the on-chain timestamp), and has not been modified since (per the hash check). DocImprint's cooperation is not required for any of the verification steps.
This is the difference between "here is a document our system processed" and "here is cryptographically verifiable evidence with an independent on-chain timestamp and an auditable chain of custody."
Documents are amended. Contracts are revised. Filings are corrected. Your chain of custody system must handle document versioning without destroying the evidence record for prior versions.
When a new version of a document is captured, include parent_bundle_id in the POST /v1/extract request:
curl -X POST .../extract -d '{"source":"...","parent_bundle_id":"ev_abc123"}'
The new bundle is linked to the prior bundle. GET /v1/extract/:id/history returns the full version chain. Both bundles are independently verifiable — the old version is preserved exactly as captured, and the new version is a new, independently signed bundle.
This means: "here is what the contract said on 2026-01-15 (ev_abc123), and here is what it said after the amendment on 2026-06-24 (ev_def456), and here is the cryptographic proof of each version." Version history is an auditable fact, not a claim.
# Capture amended version with parent link
curl -X POST https://api.docimprint.com/v1/extract \
-H "Authorization: Bearer dr_live_..." \
-H "Content-Type: application/json" \
-d '{"source":"https://example.com/contract-v2.pdf","parent_bundle_id":"ev_abc123"}'
# Returns new bundle_id: ev_def456
# Get version history
curl https://api.docimprint.com/v1/extract/ev_def456/history \
-H "Authorization: Bearer dr_live_..."
# [{ "bundle_id": "ev_abc123", "captured_at": "2026-01-15..." }, { "bundle_id": "ev_def456", ... }]