MCP DLP at the gateway: six rules that bite in production

MCP tool responses join the LLM context as trusted tokens. A leaked API key or customer email becomes part of the prompt the model reads and of the traces your stack logs.


You ship an MCP server that wraps your billing system. The agent calls get_invoice. The response includes the customer's name, address, the last four digits of the card, and the API key your script used to talk to Stripe. The LLM has no way to tell which of those fields are sensitive: the response enters its context window as one continuous block of trusted tokens. The conversation is logged. The trace is shipped to whatever observability tool you use. MCP DLP rules at the gateway redact this kind of content before it lands in the prompt. This post walks through six redaction categories, each paired with a real failure mode it closes.

Why MCP DLP belongs at the gateway, not in the tool

Tool descriptions become trusted context. Tool responses become trusted context. Anything that crosses the protocol boundary on the way into an MCP-aware agent feeds the same continuous stream of tokens the model reasons over. There is no field the LLM looks at and treats as "untrusted output, ignore for instructions" — there is just the prompt.

The 2025 incident record is reason enough to take this seriously. In June 2025, Asana took its newly launched MCP integration offline for two weeks after a cross-tenant data leak: customer information from one tenant ended up in other tenants' MCP instances. In April 2025, Invariant Labs published a working exploit in which a benign-looking trivia-game MCP modifies its own tool description after install to coerce the agent's WhatsApp MCP, co-loaded in the same context, into leaking the user's entire chat history through legitimate tool calls. A month later the same team demonstrated an attacker posting a malicious GitHub Issue, getting the agent to read it, and watching the agent dump the user's salary, physical address, and private repository contents into an automatically created public PR. Snyk later disclosed postmark-mcp, a malicious npm MCP package that registered itself as a "send mail" tool while quietly harvesting the body of every email it touched.

The common thread isn't a bug in any one tool. It's data crossing the MCP boundary in a context the tool author didn't anticipate. Defending against it inside each tool's code means asking every author — including the third-party ones already in your mcp.json — to remember to scrub every response field forever. They won't.

Move the redaction one layer up. A gateway in front of every MCP server runs the same response-redaction policy across native tools, third-party tools, and tools you'll add next quarter. The same machinery that lets you set per-API-key rate limits on an n8n MCP lets you redact response bodies before they reach the agent. AIronClaw ships six DLP categories ready to enable. Below is what each one catches and the failure mode it closes.
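To make "one layer up" concrete: in MCP, a tools/call result carries a content array, and its text items hold the strings the agent will read. Here is a minimal Python sketch of where a response_replace step could sit in a gateway, assuming rules shaped like the JSON used throughout this post. redact_tool_result, the rule dicts, and the shell_exec tool name are hypothetical, not AIronClaw's implementation; only text items are rewritten, so a real gateway would also have to cover other content types and structuredContent.

```python
import re

# Hypothetical rules shaped like the response_replace JSON in this post.
RULES = [
    {"tools": ["*"], "pattern": r"AKIA[0-9A-Z]{16}", "replacement": "***"},
]


def redact_tool_result(tool_name: str, result: dict, rules: list) -> dict:
    """Apply matching response_replace rules to the text items of a tools/call result."""
    content = []
    for item in result.get("content", []):
        if item.get("type") == "text":
            text = item["text"]
            for rule in rules:
                # "*" covers every tool; otherwise the tool must be listed by name.
                if "*" in rule["tools"] or tool_name in rule["tools"]:
                    text = re.sub(rule["pattern"], rule["replacement"], text)
            # Build a new item so the upstream result is never mutated.
            item = {**item, "text": text}
        # Non-text items pass through unchanged in this sketch.
        content.append(item)
    return {**result, "content": content}


raw = {
    "content": [{"type": "text", "text": "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"}],
    "isError": False,
}
print(redact_tool_result("shell_exec", raw, RULES))
```

The point of the placement is that the tool author never has to remember anything: the policy runs on whatever the server returns, for every server behind the gateway.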

API keys leak through shell-and-log tools

The shell MCP, the log-reader MCP, the error-tracking MCP — anything that prints process state — is a prime carrier for credentials. A developer asks the agent to "check why the build failed" and the agent runs env through a shell tool. The response is a wall of KEY=value lines, several of which are real production credentials.

Now the LLM has them. The agent might use one of them to make a "helpful" call (you'd be surprised). The trace logs them. The conversation log file contains them. A teammate searches old chats and gets them.

The DLP rule catches the wire format of the major providers' keys:

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "(?:AKIA[0-9A-Z]{16}|sk-ant-[A-Za-z0-9_\\-]{40,}|sk-proj-[A-Za-z0-9_\\-]{20,}|sk-[A-Za-z0-9]{20,}|AIza[0-9A-Za-z_\\-]{35}|gh[pousr]_[A-Za-z0-9]{30,}|xox[baprs]-[A-Za-z0-9\\-]{10,})",
  "replacement": "***",
  "dlp_rule_id": "api_keys"
}

AWS access keys (AKIA…), OpenAI (sk-…, sk-proj-…), Anthropic (sk-ant-…), Google (AIza…), GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_), and Slack (xox[baprs]-…) all get scrubbed to *** before the response leaves the gateway. The dlp_rule_id field lets the AIronClaw dashboard's DLP page display the rule under its catalog name; omit it if you're hand-rolling the rule for an API-only deployment.
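Before enabling a rule, it's worth running its pattern against a sample of what the tool really returns. A minimal sketch using Python's re, assuming the gateway's regex dialect agrees with Python's for these plain character classes and alternations. The env output is made up, and AKIAIOSFODNN7EXAMPLE is the example key from AWS's documentation. Loading the rule through json.loads also shows the escaping at work: each doubled backslash in the JSON becomes a single backslash in the compiled regex.

```python
import json
import re

# The api_keys rule as written in the catalog JSON (doubled backslashes included).
RULE_JSON = r'''{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "(?:AKIA[0-9A-Z]{16}|sk-ant-[A-Za-z0-9_\\-]{40,}|sk-proj-[A-Za-z0-9_\\-]{20,}|sk-[A-Za-z0-9]{20,}|AIza[0-9A-Za-z_\\-]{35}|gh[pousr]_[A-Za-z0-9]{30,}|xox[baprs]-[A-Za-z0-9\\-]{10,})",
  "replacement": "***",
  "dlp_rule_id": "api_keys"
}'''

rule = json.loads(RULE_JSON)  # "\\-" in the JSON text becomes "\-" in the regex
pattern = re.compile(rule["pattern"])

# Made-up `env` output with two credentials mixed into ordinary settings.
env_output = "\n".join([
    "HOME=/home/ci",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "GITHUB_TOKEN=ghp_" + "a" * 36,
    "LOG_LEVEL=debug",
])
redacted = pattern.sub(rule["replacement"], env_output)
print(redacted)
```

The non-secret lines come through untouched, which is what you want from a rule applied to every tool.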

Banking data leaks through CRM and ticket tools

A support agent powered by an LLM browses tickets through an MCP tool wrapping your helpdesk. One ticket is a chargeback inquiry; the customer pasted their full card number into the body. The agent summarises the ticket, the model reads the card number as one more token, the conversation log captures it. PCI scope just expanded to wherever your conversation logs live.

The gateway-side rule covers IBAN and the four card networks:

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "\\b(?:[A-Z]{2}\\d{2}[A-Z0-9]{11,30}|(?:4\\d{3}|5[1-5]\\d{2}|2(?:2[2-9]\\d|[3-6]\\d{2}|7[01]\\d|720)|3[47]\\d{2}|6(?:011|5\\d{2}))(?:[ \\-]?\\d{4}){3}|3[47]\\d{2}(?:[ \\-]?\\d{6})(?:[ \\-]?\\d{5}))\\b",
  "replacement": "***",
  "dlp_rule_id": "banking"
}

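IBAN (IT60X0542811101000000123456), Visa (4xxx), Mastercard (51-55 and 2221-2720 prefixes), American Express (34xx, 37xx), Discover (6011, 65xx). The pattern tolerates common formatting too: an optional space or dash may separate the groups (4-4-4-4 for most cards, 4-6-5 for American Express), since real-world payment data shows up both as 4111-1111-1111-1111 and as 4111111111111111. Two limits are worth knowing. IBANs match only in their compact form, not in the spaced groups-of-four layout they are often printed in. And there is no checksum validation (Luhn for cards, mod-97 for IBANs), so any digit run of the right length with a matching prefix is redacted, which is the safe direction for DLP to err.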
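A quick way to see the grouping rules in action is to run the banking pattern over a few sample strings. This sketch uses Python's re as a stand-in for the gateway's engine (an assumption), with the pattern split across adjacent raw strings purely for readability; the concatenated string is the same regex as in the rule above. 4111 1111 1111 1111 and 378282246310005 are well-known test card numbers.

```python
import re

# The banking pattern from the rule above, split across raw strings for readability.
BANKING = re.compile(
    r"\b(?:[A-Z]{2}\d{2}[A-Z0-9]{11,30}"
    r"|(?:4\d{3}|5[1-5]\d{2}|2(?:2[2-9]\d|[3-6]\d{2}|7[01]\d|720)|3[47]\d{2}|6(?:011|5\d{2}))"
    r"(?:[ \-]?\d{4}){3}"
    r"|3[47]\d{2}(?:[ \-]?\d{6})(?:[ \-]?\d{5}))\b"
)


def redact(text: str) -> str:
    return BANKING.sub("***", text)


samples = [
    "Card: 4111 1111 1111 1111, exp 12/27",  # Visa, spaced
    "4111-1111-1111-1111",                   # Visa, dashed
    "Amex 3782 822463 10005 on file",        # Amex, 4-6-5 grouping
    "IBAN IT60X0542811101000000123456",      # IBAN, compact form
    "Ticket 4521 reopened",                  # short number, left alone
]
for s in samples:
    print(redact(s))
```

The Amex line is caught by the dedicated 4-6-5 branch, since a 15-digit number cannot fill the 16-digit 4-4-4-4 branch.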

Emails leak through inbox tools

Inbox MCPs are some of the most useful agentic tools to build and the most dangerous to expose unfiltered. An agent with a Gmail MCP can summarise a thread, draft a reply, file follow-ups. It can also unwittingly forward every recipient's address into the LLM trace, the prompt cache, and any downstream system that consumes the conversation log.

This is a GDPR question, not just a hygiene one. Personal email addresses are personal data; processing them means having a lawful basis, a retention policy, and a deletion mechanism. Conversation logs in your LLM observability stack often have none of these.

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}",
  "replacement": "***",
  "dlp_rule_id": "emails"
}

The regex matches the common local-part@domain shape (a practical subset of what RFC 5322 allows), including plus-tags, dotted local parts, and multi-label domains like example.co.uk. Apply it on the tools that read mail and the tools that surface CRM contact records; leave it off the ones whose entire output is a curated address list (a transactional confirmation MCP, for example), where redaction would defeat the tool's purpose.
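A short check of what the emails pattern does and doesn't catch, again with Python's re standing in for the gateway's engine (an assumption; the pattern only uses basic classes). The thread text is made up.

```python
import re

# The emails pattern from the rule above.
EMAILS = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

thread = (
    "From: jane.doe+billing@example.co.uk\n"  # plus-tag, dotted local part, .co.uk
    "Cc: ops@example.com\n"
    "Reply to jane at example dot com"         # spelled out: no "@", so no match
)
print(EMAILS.sub("***", thread))
```

The last line passes through unchanged: the rule matches the wire shape of an address, not an address a human has spelled out in words.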

Internal IP addresses leak through log tools

The log-reader MCP returns the last hour of nginx errors. The errors include client IPs, upstream IPs, internal load-balancer IPs, the database server's IP. The LLM digests them all. If, three weeks later, an attacker tricks the agent into listing what it knows about your infrastructure (through one of the prompt-injection routes demonstrated against GitHub MCP), they get a free reconnaissance map.

The IP redaction handles both v4 and v6 layouts:

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b|(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,6}(?::[0-9a-fA-F]{1,4}){1,6}|(?:[0-9a-fA-F]{1,4}:){1,7}:|::(?:[0-9a-fA-F]{1,4}:){0,6}[0-9a-fA-F]{1,4}",
  "replacement": "***",
  "dlp_rule_id": "ip_addresses"
}

A reasonable counter-argument: external IPs in production logs are sometimes the actual signal you want the agent to reason over (geolocating a brute-force attempt, for example). Apply this rule on log tools that surface internal-facing infrastructure (your reverse-proxy, your database, your worker pools) and skip it on the ones that surface CDN access logs where client IPs are the point.
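One detail matters more than it looks for IPv6: in Python's re, as in most Perl-style engines, alternation is leftmost-first. The first alternative that matches wins, even if a later one would match more text. For redaction, that ordering decides what leaks. A minimal sketch with two hypothetical patterns built from the same two IPv6 alternatives in opposite order:

```python
import re

H = r"[0-9a-fA-F]{1,4}"  # one IPv6 group

# Two hypothetical patterns built from the same two alternatives, in opposite order.
trailing_first = re.compile(rf"(?:{H}:){{1,7}}:|(?:{H}:){{1,6}}:{H}")
compressed_first = re.compile(rf"(?:{H}:){{1,6}}:{H}|(?:{H}:){{1,7}}:")

line = "peer fe80::1 up"
print(trailing_first.sub("***", line))    # peer ***1 up
print(compressed_first.sub("***", line))  # peer *** up
```

With the trailing-:: branch first, the engine stops at fe80:: and leaves the final group in the response. Listing longer forms before shorter ones avoids that partial redaction.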

URLs leak through wiki and storage tools

A Confluence MCP returns a wiki page. The page links to your internal Jira board, a pre-signed S3 URL valid for an hour, and a staging-environment dashboard nobody outside the team should know about. Each one is a URL the LLM now has, and an attacker who can prompt-inject the agent through any other surface can ask "what URLs do you remember from our conversation?" and walk away with the list.

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "(?:https?|ftp)://[^\\s<>\"'`]+",
  "replacement": "***",
  "dlp_rule_id": "urls"
}

The pattern is deliberately greedy on the path and query side: a pre-signed S3 URL carries its credential, expiry, and signature in a long query string after the path, and stopping at the first ? would leave that signed query string in the response. Apply this rule on tools that traverse internal documentation surfaces. It's heavy-handed by design. For a tool whose entire output is one canonical URL the agent should follow, narrow this rule's tools field from ["*"] to the tools that should stay covered, leaving that one out.
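The difference between greedy-to-whitespace and stopping at the query string is easy to see side by side. This sketch runs the urls pattern and a hypothetical variant that excludes ? against a made-up pre-signed-style URL, using Python's re (assumed to behave like the gateway's engine for this pattern). The query parameters are shortened placeholders, not a working signature.

```python
import re

# The urls pattern from the rule above.
URLS = re.compile(r"""(?:https?|ftp)://[^\s<>"'`]+""")
# Hypothetical variant that stops at the first "?", for comparison only.
STOP_AT_QUERY = re.compile(r"""(?:https?|ftp)://[^\s<>"'`?]+""")

page = (
    "Export: https://reports.s3.amazonaws.com/q3/export.csv"
    "?X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20250101&X-Amz-Signature=abc123 (valid 1h)"
)
print(URLS.sub("***", page))           # Export: *** (valid 1h)
print(STOP_AT_QUERY.sub("***", page))  # the signed query string survives
```

The second line of output still carries the credential and signature parameters, which is exactly what a pre-signed URL needs to be usable by whoever reads it.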

Auth tokens leak through HTTP-style tools

The API-explorer MCP returns example curl invocations from your OpenAPI spec. The examples come back populated with the developer's session token, because that's the credential the tool used to fetch them. Now the LLM has a session token. JWTs in particular are dangerous: they're long and look like compressed noise, so the model has no reason to treat them as secrets and redact them voluntarily.

{
  "rule_type": "response_replace",
  "tools": ["*"],
  "pattern": "eyJ[A-Za-z0-9_\\-]+\\.eyJ[A-Za-z0-9_\\-]+\\.[A-Za-z0-9_\\-]+|(?:Bearer|Basic)\\s+[A-Za-z0-9._~+/=\\-]+",
  "replacement": "***",
  "regex_flags": "i",
  "dlp_rule_id": "auth_tokens"
}

JWTs (eyJ…header.eyJ…payload.signature), Authorization: Bearer …, and Authorization: Basic … all match. The case-insensitive flag matters: agents writing curl examples occasionally lowercase bearer, and so do some real services in error messages.
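To see why the flag matters, here is the auth_tokens pattern applied with and without case-insensitivity. It's a sketch in Python, assuming regex_flags "i" corresponds to re.IGNORECASE in the gateway's engine; the curl line and token values are made up.

```python
import re

# The auth_tokens pattern from the rule above.
AUTH_TOKENS = (
    r"eyJ[A-Za-z0-9_\-]+\.eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"
    r"|(?:Bearer|Basic)\s+[A-Za-z0-9._~+/=\-]+"
)

example = 'curl -H "authorization: bearer s3ss10n.t0k3n" https://api.internal/v1/me'
jwt_line = "token=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjMifQ.c2lnbmF0dXJl"

# Without the i flag, a lowercased "bearer" slips through unchanged.
print(re.sub(AUTH_TOKENS, "***", example))
# With re.IGNORECASE, the scheme and token are replaced.
print(re.sub(AUTH_TOKENS, "***", example, flags=re.IGNORECASE))
print(re.sub(AUTH_TOKENS, "***", jwt_line, flags=re.IGNORECASE))
```

Note that the match starts at the scheme, so the header name (authorization:) stays visible; only the credential is removed.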

Putting them on every MCP

Six categories, one bundled rule set. You apply them either through the dashboard's DLP page (toggle each one, save) or via the API by writing all six response_replace rules into the proxy's rule array in one PUT:

# Quoted heredoc ('EOF'): quotes, backticks and backslashes in the patterns reach curl unchanged.
curl -fsS -X PUT "${AIRONCLAW_URL}/api/mcp/<mcp-id>/rules" \
  -H "Authorization: Bearer ${PAT}" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'EOF'
{
  "rules": [
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"api_keys",     "replacement":"***","pattern":"<api-keys-pattern>"},
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"banking",      "replacement":"***","pattern":"<banking-pattern>"},
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"emails",       "replacement":"***","pattern":"<emails-pattern>"},
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"ip_addresses", "replacement":"***","pattern":"<ip-pattern>"},
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"urls",         "replacement":"***","pattern":"<urls-pattern>"},
    {"rule_type":"response_replace","tools":["*"],"dlp_rule_id":"auth_tokens",  "replacement":"***","pattern":"<auth-tokens-pattern>","regex_flags":"i"}
  ]
}
EOF

The patterns are the ones from the catalog above; substitute them in verbatim, with the doubled backslashes the JSON body needs. Each rule applied this way also shows up on the dashboard's DLP page because of dlp_rule_id, so you can toggle it off there if you ever need to. Rule evaluation is in array order, and rules don't short-circuit each other: a URL with an email address in its query string has the address redacted by the emails rule and then the whole URL redacted by the urls rule, leaving *** either way.
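A small model of that ordering behavior. This is a hypothetical sketch in Python, not AIronClaw's implementation: apply_in_order simply feeds each rule the previous rule's output, which is one reading of "array order, no short-circuit". The CRM URL is made up.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    dlp_rule_id: str
    pattern: str
    replacement: str = "***"


def apply_in_order(text: str, rules: list) -> str:
    """Hypothetical model: every rule runs, each on the previous rule's output."""
    for rule in rules:
        text = re.sub(rule.pattern, rule.replacement, text)
    return text


EMAILS = Rule("emails", r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
URLS = Rule("urls", r"""(?:https?|ftp)://[^\s<>"'`]+""")

body = "Contact: https://crm.example.com/contacts?email=jane@example.com"
print(apply_in_order(body, [EMAILS]))        # address gone, rest of the URL intact
print(apply_in_order(body, [EMAILS, URLS]))  # Contact: ***
print(apply_in_order(body, [URLS, EMAILS]))  # Contact: ***
```

Because the replacement text *** contains no characters the urls pattern excludes, the second rule still swallows the whole URL after the first rule has edited it, so the order of these two rules does not change the final result.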

For tools where a category is genuinely required output (the email-list MCP whose whole point is to return contact addresses), narrow the tools field on that one rule from ["*"] to the list of tool names that should stay covered, leaving the other five rules on every tool. The default-on shape is the right way around for DLP: a missed exemption means a working tool with extra *** in its output, while a missed inclusion means a leak. The same trade-off applies to the narrowed rule itself, since a tool you add later is not in its list until you add it.
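Pasting regexes into a hand-written JSON body means getting the escaping right by hand. A minimal sketch that builds the PUT body in Python instead, letting json.dumps add the JSON escaping and narrowing one rule's tools list. build_payload and the tool names gmail_read_thread and crm_get_contact are hypothetical; only two of the six catalog patterns are spelled out.

```python
import json

# Plain regex strings (single backslashes); json.dumps adds the JSON escaping.
PATTERNS = {
    "emails": r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}",
    "urls": r"""(?:https?|ftp)://[^\s<>"'`]+""",
}
# Hypothetical tool names: the emails rule covers only these two tools.
NARROWED = {"emails": ["gmail_read_thread", "crm_get_contact"]}


def build_payload(patterns: dict, narrowed: dict) -> dict:
    """Build the PUT body; rules default to tools ["*"] unless narrowed."""
    rules = []
    for rule_id, pattern in patterns.items():
        rules.append({
            "rule_type": "response_replace",
            "tools": narrowed.get(rule_id, ["*"]),
            "dlp_rule_id": rule_id,
            "replacement": "***",
            "pattern": pattern,
        })
    return {"rules": rules}


print(json.dumps(build_payload(PATTERNS, NARROWED), indent=2))
```

Saved as a script (build_rules.py is a hypothetical name), its output can be sent as the PUT body with curl's --data-binary @-, which reads the request body from standard input.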

What to try

  • Read the Model Context Protocol specification if MCP's wire-format threat model isn't already familiar — the boundaries are clearer once the protocol is.
  • The AuthZed timeline of MCP breaches is a useful reading list for anyone keeping a private threat model up to date through 2026.
  • Try AIronClaw on top of an existing MCP server. The six DLP rules above are the ones to enable on day one.