env.dev

How AI Agents Get Compromised: 6 Attack Patterns

Real-world AI agent breaches and CVEs from 2024–2026 mapped to six attack patterns — EchoLeak, Clinejection, the Replit DB wipe — with primary sources.

Your AI agents are already inside the perimeter. Cursor, Claude Code, Copilot Workspace, and a dozen MCP servers now run inside the same trust boundary as your IDE, your terminal, and your CI runner, with read access to .env files, write access to source, and a shell that does not stop at sudo. Six attack patterns documented between mid-2024 and early 2026 turn that access into real damage. Three examples: an indirect prompt injection in Microsoft 365 Copilot exfiltrated mailbox content with zero clicks (Aim Security's EchoLeak, CVE-2025-32711, June 2025); a Replit agent deleted a production database during a self-declared code freeze (reported by Jason Lemkin, July 2025); and a single poisoned README turned a CI/CD coding agent into a supply-chain attack vector (Snyk's "Clinejection" writeup, late 2025).

This page catalogues six patterns with at least one publicly verifiable incident for each — a breach, a CVE, or a responsible-disclosure POC with a primary-source link. Every pattern maps to an entry in the OWASP Top 10 for Agentic Applications (2026) and links into Dark Factory Part 5 for the defensive playbook. We do not relitigate defenses here — the value of this page is named incidents and the pattern they fit.

What counts as an incident on this page

  • Breach — confirmed exploitation against a production system with a vendor advisory or public victim report.
  • CVE — an assigned CVE identifier with a published advisory.
  • Disclosed POC — a working proof-of-concept written up by a named researcher or vendor, usually under responsible disclosure.

Each item is labeled where it is cited. Tweets, vendor case studies without technical detail, and claims retold secondhand are deliberately excluded.

Pattern 1 — Indirect Prompt Injection Through Untrusted Context

Maps to ASI01 — Prompt Injection. An attacker plants instructions in content the agent will read — an email body, a Jira ticket, a GitHub issue, a search result, a README — and the model treats those instructions as user intent. The agent reads attacker content with the same trust as the operator who launched it.

Indirect injection is the most-cited failure mode in the OWASP 2026 ranking and the one with the most public incidents:

  • EchoLeak — Microsoft 365 Copilot (CVE-2025-32711). Disclosed POC + assigned CVE. Aim Security disclosed a zero-click exfiltration in M365 Copilot in June 2025. A single attacker-controlled email, never opened by the victim, smuggled instructions through Copilot's retrieval pipeline and caused it to leak the user's most sensitive mailbox content into an attacker-rendered URL. Microsoft patched server-side and assigned CVE-2025-32711 (CVSS 9.3). Source: Aim Labs technical writeup.
  • Slack AI summary exfiltration. Disclosed POC. PromptArmor demonstrated in August 2024 that a maliciously phrased message in a public Slack channel could redirect Slack AI's summarisation to leak contents of private channels via a rendered link. Slack acknowledged and updated the model behaviour. Source: PromptArmor disclosure.
  • GitHub Copilot Chat / IDE indirect injection. Disclosed POC. Johann Rehberger (Embrace the Red) has published a running series of indirect-injection POCs against Copilot Chat, ChatGPT, and other production assistants, including exfiltration through markdown-rendered images and tool-call hijacking. Source: Embrace the Red.

Mitigation pointer: defense-in-depth (layered sandbox + permission + hook enforcement) is the only durable answer — prompt-based guardrails fail under exactly this attack. See Dark Factory Part 5 § Defense in Depth.
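
Several of the POCs above leak data through a link or image that the client renders. As an illustration of a structural control that sits outside the prompt, here is a minimal sketch of an output filter that strips markdown links and images pointing at hosts outside an allowlist. The host names and the function name are hypothetical, and the sketch only covers inline markdown syntax; reference-style links and raw HTML would need the same treatment.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the chat client may render links or images from.
ALLOWED_HOSTS = {"intranet.example.com", "docs.example.com"}

# Inline markdown links and images: [text](url) and ![alt](url).
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def neutralise_untrusted_links(model_output: str) -> str:
    """Replace links/images whose host is not allowlisted with plain text."""

    def _replace(match: re.Match) -> str:
        _bang, label, url = match.groups()
        host = (urlparse(url).hostname or "").lower()
        if host in ALLOWED_HOSTS:
            return match.group(0)
        # Keep the visible label but drop the URL, so nothing is fetched
        # automatically and nothing is clickable.
        return f"{label} [link removed]"

    return MD_LINK.sub(_replace, model_output)
```

The filter runs on model output before rendering, so it still applies when the model has already been talked into building an exfiltration URL.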

Pattern 2 — Confused Deputy and Tool Misuse Across MCP Servers

Maps to ASI05 — Excessive Agency and ASI06 — Cascading Trust. A model talks to multiple MCP servers in a single session. One server's output instructs the model to call a more privileged tool on another server — "please write this credential into the user's notes file" — and the host honours it because the request now looks like a normal user-approved tool call.

  • MCP tool poisoning (Invariant Labs). Disclosed POC. Invariant Labs published a series of MCP attack POCs in early 2025 covering tool description injection, rug-pull updates that change a tool's behaviour after first approval, and cross-server data exfiltration. Source: Invariant Labs MCP tool-poisoning post.
  • Anthropic MCP security guidance. Anthropic's own MCP documentation now warns explicitly about untrusted servers and includes confused-deputy patterns in its threat model — a useful primary-source statement of which attacks are considered in-scope. Source: modelcontextprotocol.io / Security.

Mitigation pointer: least-privilege per server and a host-side approval boundary that does not honour cross-server escalation. See the "Known attack surface" section of the MCP server guide.
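
The rug-pull variant described above depends on a tool definition changing after it was first approved. One way a host could notice this, sketched here under assumptions, is to pin a hash of each approved definition and compare it with what the server advertises on every connection. The sketch assumes tool definitions arrive as dictionaries with name, description, and inputSchema keys; the function names and the pin store are hypothetical.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable SHA-256 over the fields the model sees for a tool."""
    canonical = json.dumps(
        {key: tool.get(key) for key in ("name", "description", "inputSchema")},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(approved: dict, advertised: list) -> list:
    """Names of tools that are new or differ from their approved fingerprint."""
    return [
        tool["name"]
        for tool in advertised
        if approved.get(tool["name"]) != fingerprint(tool)
    ]
```

Any name returned by changed_tools would go back through the approval step instead of being exposed to the model. This catches silent description changes; it does nothing about a server whose behaviour changes while its description stays the same.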

Pattern 3 — Supply Chain Compromise Through AI-Pulled Dependencies

Maps to ASI04 — Supply Chain. A coding agent pulls a dependency, an MCP server, or a rules file from a registry it does not own. The package is malicious or has been compromised. The agent runs install scripts, executes the new tool, and now an attacker has code execution inside whatever sandbox the agent runs in — which, on most laptops, is no sandbox at all.

  • Clinejection — CI/CD agent supply-chain attack. Disclosed POC, real CI runs. Snyk published an end-to-end attack in which a poisoned README in a target repository instructed a Cline-based GitHub Actions reviewer agent to fetch and execute attacker-controlled code, achieving code execution inside the CI runner. Source: Snyk Clinejection writeup.
  • Typosquats and credential stealers targeting AI-tool registries. Confirmed campaigns. Socket and Phylum have repeatedly documented npm and PyPI packages impersonating AI SDKs (e.g. openai-helper-style typosquats and packages targeting the increasingly common @anthropic-ai/sdk install path) that exfiltrate environment variables on postinstall. Source: Socket blog.

Mitigation pointer: lockfile-only installs, scoped network allowlists for the agent runtime, and dependency auditing in CI. See Dark Factory Part 5 § Supply Chain Hardening.
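
To make the typosquat risk concrete, the following sketch shows an allowlist-first install check that also flags names suspiciously close to a reviewed package. The package list, the similarity cutoff, and the function name are assumptions chosen for illustration, not a vetted detection rule.

```python
import difflib

# Hypothetical allowlist of packages the team has already reviewed.
KNOWN_PACKAGES = ["anthropic", "langchain", "openai", "requests"]


def review_install(name: str, cutoff: float = 0.8) -> str:
    """Classify an install request as allow, block, or review."""
    name = name.lower()
    if name in KNOWN_PACKAGES:
        return "allow"
    # A near miss of a reviewed name is more suspicious than an unknown name.
    near = difflib.get_close_matches(name, KNOWN_PACKAGES, n=1, cutoff=cutoff)
    if near:
        return f"block: looks like a typosquat of {near[0]}"
    return "review: not on the allowlist"
```

String similarity only catches near-miss spellings. Suffix-style impersonations such as the openai-helper pattern fall below the cutoff and land in the review bucket, which is why the allowlist, not the similarity check, is the actual control.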

Pattern 4 — Secret Exfiltration Through the Agent's Own Context

Maps to ASI01 combined with ASI06 — Cascading Trust. The agent has read access to .env, ~/.aws/credentials, or a CI secret. A prompt injection — usually delivered through a tool result, not the operator's message — causes the agent to include that secret in a subsequent tool call, log line, or chat output where the attacker can recover it.

  • EchoLeak (again) — Graph data exfiltration. The same M365 Copilot bug described under Pattern 1 exfiltrated mailbox content, calendar items, and other Microsoft Graph data by smuggling it into an attacker-rendered URL. The defensive lesson is the same as in Pattern 1: the agent had legitimate Graph access, and a single prompt injection turned that access into an exfiltration channel.
  • Cursor / Claude Code rules-file injection. Disclosed POC. Researchers have demonstrated that a hostile .cursorrules or CLAUDE.md committed into a repository can persuade the editor agent to read .env and include the contents in a generated diff or a web-fetch tool call. Trail of Bits' claude-code-config starter denies these reads at the permission layer precisely because the prompt layer cannot be trusted.

Mitigation pointer: the agent should never have read access to a secret it does not need for the current task. Deny rules on .env*, ~/.ssh/**, and credential paths are non-negotiable. See environment variable security.
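
A deny rule is only as good as its path matching. The sketch below shows one way a permission layer could evaluate a read request against the paths named above. It resolves "~", "..", and symlinks before matching so that a traversal cannot sidestep the rule. The constant and function names are hypothetical, and real harnesses ship their own rule syntax.

```python
import fnmatch
from pathlib import Path

# Deny rules taken from the text: file names matching .env*, plus
# everything beneath a few credential directories.
DENIED_NAME_GLOBS = [".env*"]
DENIED_DIRS = ["~/.ssh", "~/.aws", "~/.kube"]


def is_read_denied(requested: str) -> bool:
    """True if the resolved target of a read request falls under a deny rule."""
    # Resolve "~", "..", and symlinks first so the check sees the real target.
    path = Path(requested).expanduser().resolve()
    if any(fnmatch.fnmatch(path.name, glob) for glob in DENIED_NAME_GLOBS):
        return True
    for directory in DENIED_DIRS:
        root = Path(directory).expanduser().resolve()
        if path == root or root in path.parents:
            return True
    return False
```

Because the check runs in the harness and not in the prompt, an injected instruction to read .env fails regardless of how persuasive it is.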

Pattern 5 — Agent-as-RCE: The Shell Tool Is the Vulnerability

Maps to ASI05 — Excessive Agency. The agent has a shell tool. The shell tool can do whatever your user can do. A misinterpreted instruction, an injected prompt, or a miscalibrated "just go fix it" command turns the assistant into a remote code execution channel — and the operator authorised the channel.

  • Replit Agent deleted a production database. Breach (in-the-wild, named victim). In July 2025, SaaStr founder Jason Lemkin publicly reported that a Replit AI agent ran destructive database commands against a live production database during a self-declared "code freeze." Replit's CEO acknowledged the failure publicly and announced product changes. Source: Lemkin's original post and the Register writeup.
  • Destructive-command POCs against coding agents. Disclosed POC. Multiple research teams have demonstrated indirect prompt injection causing IDE agents to run rm -rf, git push --force, or destructive SQL through their shell tools when a single attacker-controlled file enters the working set. The class is generic — any agent with an unfiltered shell tool inherits it.

Mitigation pointer: destructive-command hooks, OS-level sandbox with allowUnsandboxedCommands: false, and never giving the agent write credentials to a production datastore. See Dark Factory Part 5 § Hooks.
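
A destructive-command hook can be as simple as a pattern check that runs before the shell tool executes anything. The sketch below covers the three command families named above. The patterns are deliberately coarse and illustrative, the names are hypothetical, and wiring the check into a particular harness's hook mechanism is left out.

```python
import re
from typing import Optional

# Coarse patterns for the command families named in the text.
DESTRUCTIVE = [
    ("recursive delete", re.compile(r"\brm\s+(?:-[a-zA-Z]+\s+)*-[a-zA-Z]*[rR]")),
    ("force push", re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)")),
    (
        "destructive SQL",
        re.compile(r"\b(?:DROP\s+(?:TABLE|DATABASE)|TRUNCATE\s+TABLE)\b", re.IGNORECASE),
    ),
]


def check_command(command: str) -> Optional[str]:
    """Return a label if the command looks destructive, else None."""
    for label, pattern in DESTRUCTIVE:
        if pattern.search(command):
            return label
    return None
```

A regex hook is easy to evade with aliases, scripts, or encoded payloads, so it belongs in front of an OS-level sandbox, not in place of one.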

Pattern 6 — Capability Misuse by Otherwise-Working Agents

Maps to ASI09 — Identity and Authorisation and ASI05. The agent works correctly, the harness behaves as designed, and the model still takes an action the operator did not intend — over-broad PRs that touch sensitive files, autonomous emails, calls to paid APIs, or transactions in production systems where the operator only wanted a dry run.

  • Anthropic agentic-misuse reports. Anthropic has published a running series of reports on real-world misuse of Claude, including agentic cases — fraud-pipeline automation, coordinated harassment, and use of Claude as a vulnerability-research accelerant — with named campaigns and mitigations. Source: Anthropic research and threat reports.
  • "Risky agent behaviours" enterprise telemetry. Vendor data. The 2026 Help Net Security enterprise survey reports 80% of organisations seeing risky agent behaviours and 64% of $1B+ enterprises losing over $1M to AI-related failures. Treat this as directional rather than per-incident proof. Source: Help Net Security 2026.

Mitigation pointer: distinct identities per agent, scoped credentials, and human approval on actions whose blast radius extends past the working copy. See Dark Factory Part 5 § Governance Policies.
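
The mitigation above combines a per-agent identity, a spending limit, and an approval step for anything that reaches beyond the working copy. A minimal sketch of that decision logic follows; the class, the action names, and the dollar figures are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentPolicy:
    agent_id: str       # distinct identity per agent
    workdir: Path       # the working copy this agent may modify freely
    budget_usd: float   # per-agent spending ceiling
    spent_usd: float = 0.0

    def decide(self, action: str, target: str = "", cost_usd: float = 0.0) -> str:
        """Return 'allow', 'needs_approval', or a 'deny: ...' reason."""
        if self.spent_usd + cost_usd > self.budget_usd:
            return "deny: budget exceeded"
        if action == "write_file":
            root = self.workdir.resolve()
            path = (self.workdir / target).resolve()
            # Writes that stay inside the working copy are low blast radius.
            if path == root or root in path.parents:
                return "allow"
        # Everything else (email, paid APIs, deploys, writes outside the
        # working copy) goes to a human.
        return "needs_approval"
```

The default branch is the important part: actions the policy does not recognise fall through to human approval instead of being allowed.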

What This Means for Your Team — Today

Six items, each with a link to where the work actually gets done. None of these are speculative; every one of them blocks at least one incident on this page.

  1. Deny-list .env*, ~/.ssh/**, ~/.aws/**, ~/.kube/**, and credential paths at the agent permission layer. The Trail of Bits config is a tested starting point.
  2. Turn on OS-level sandboxing and set allowUnsandboxedCommands: false. See Dark Factory Part 5 § Defense in Depth.
  3. Add destructive-command hooks (rm -rf, DROP TABLE, --force) and secret-pattern hooks on edits.
  4. Vet every MCP server you mount. Pin to a known good version. Disallow auto-update. Treat tool descriptions as untrusted input. See the MCP server guide.
  5. Lockfile-only installs in the agent runtime. Audit dependencies in CI. Block postinstall scripts where the ecosystem supports it.
  6. Distinct identities per agent. Scoped tokens. No shared production credentials. Per-agent budgets and audit logs so a single misbehaving agent does not become an enterprise-wide blast radius.
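
Item 3 mentions secret-pattern hooks on edits. As a rough sketch of what such a hook could check, the function below scans proposed file content for a few well-known credential formats before an edit is applied. The pattern set is a small illustrative sample and the names are hypothetical; a dedicated secret scanner covers far more formats.

```python
import re

# A small sample of well-known credential formats.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text: str) -> list:
    """Labels of every secret pattern that appears in the proposed content."""
    return [label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

A hook built on this would reject the edit whenever find_secrets returns a non-empty list, which keeps a secret the agent has already read from being written into a diff.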

Where This Catalogue Falls Short

Three honest limitations. First, the most damaging incidents in 2026 are the ones that never get written up: enterprise breaches with NDAs, governance failures buried in postmortems, and agent-driven incidents that look identical to ordinary human errors. We can only catalogue what is public, and the publicly disclosed pool skews toward POCs and named-researcher findings rather than in-the-wild exploitation. Second, attribution is hard: a coding agent that produces vulnerable code is only counted as an AI security incident when someone traces the vulnerability back to the agent. Most are not traced that way. Third, model and harness behaviour changes monthly; an attack that works against Claude Code 1.x or Cursor 2 may not reproduce against the current release. Treat the patterns as durable; treat the specific POCs as snapshots.

For the defensive side of this story — what to actually configure, in what order — go to Dark Factory Part 5: Security & Governance. For an overview of how agents are built in the first place, see harness engineering and the Shapiro autonomy levels.

Frequently Asked Questions

Have real AI agents leaked secrets through prompt injection?

Yes. The clearest public case is EchoLeak (CVE-2025-32711, disclosed by Aim Security in June 2025), a zero-click indirect prompt injection in Microsoft 365 Copilot that exfiltrated mailbox content and other Microsoft Graph data into an attacker-rendered URL without the victim ever opening the malicious email. Microsoft patched it server-side; the CVE carries a CVSS score of 9.3.

Have AI coding agents caused real production incidents?

Yes. The most-cited public incident is from July 2025, when a Replit AI agent ran destructive database commands against SaaStr founder Jason Lemkin’s production database during a self-declared "code freeze." Lemkin described the incident publicly on X and Replit’s CEO acknowledged the failure. Separately, Snyk’s "Clinejection" writeup (late 2025) demonstrated a Cline-based GitHub Actions reviewer being turned into a code-execution channel by a single poisoned README in a target repository.

Are MCP servers a known attack surface?

Yes. Invariant Labs published a series of MCP security disclosures in 2025 covering tool description injection, rug-pull updates that change a tool’s behaviour after first approval, and cross-server data exfiltration via the confused-deputy pattern. Anthropic’s own MCP documentation now explicitly calls out untrusted-server threats and includes confused-deputy in the documented threat model.

What is the OWASP Top 10 for Agentic Applications?

A peer-reviewed ranking of the ten most critical security risks for autonomous AI agent systems, published December 2025 by the OWASP GenAI Security Project. The top entries are ASI01 (prompt injection), ASI04 (supply chain), ASI05 (excessive agency), and ASI06 (cascading trust). Every pattern on this page maps to at least one entry in the list.

What is the single most effective defence against these attacks?

There is no single defence. The pattern that recurs across every incident on this page is that a prompt-layer guardrail (an AGENTS.md rule, a system prompt warning) failed, and a structural enforcement layer (OS sandbox, permission deny list, hook) was either absent or misconfigured. Defense in depth — sandbox + permission deny + hooks + CI scanning — is the only durable answer because no single layer survives prompt injection.

Is this catalogue complete?

No. The page only catalogues publicly verifiable incidents — breaches with vendor advisories, assigned CVEs, or named-researcher POCs. Enterprise breaches under NDA, agent-introduced vulnerabilities that were never traced back to the agent, and POCs that have since been patched out of reproduction are deliberately excluded. The patterns are durable; the specific POCs are snapshots.