Dark Factory Series
In Part 1, we mapped the six levels of AI-driven development — from manual coding all the way to the fully autonomous dark factory. Now it's time to actually build something. This guide walks you through setting up your coding agent, writing a production-grade AGENTS.md, delegating your first real task, and building the feedback loops that make every future task go more smoothly.
By the end of this guide, you'll have a working setup that puts you solidly at Level 2 — pair-programming with AI — and ready to move toward Level 3 in the next part.
What We're Building Today
Part 1 was theory. This is practice. By the end of this guide you'll have four things set up:
A configured coding agent
Claude Code installed, permissions set, hooks wired up — ready to do real work.
A production-grade AGENTS.md
Not a template. A living document tuned to your actual codebase.
A task decomposition habit
How to break features into agent-sized chunks that produce reliable output.
A feedback loop
A process for learning from agent mistakes and preventing them from happening again.
Step 1: Set Up Your Coding Agent
We're using Claude Code throughout this series. It's a terminal-native agent that reads your codebase, edits files, runs commands, and creates PRs. If you're using a different agent (Cursor, Windsurf, Devin), the AGENTS.md patterns still apply — but the setup steps will differ.
Install and initialize
```bash
# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash

# Navigate to your project
cd your-project

# Start Claude Code — it will prompt you to log in on first run
claude

# Generate a starter CLAUDE.md from your project structure
/init
```
The /init command scans your project and generates a starter CLAUDE.md file. Think of this as a rough first draft — it captures your tech stack, directory structure, and build commands. We'll refine it heavily in the next section.
Understand the permission model
Before you let an agent loose on your code, understand the guardrails. Claude Code has three permission modes — start with the safest and loosen as you build trust:
| Mode | What it does | When to use it |
|---|---|---|
| Default | Prompts you before file edits and commands | First week — learn what the agent does before trusting it |
| Accept Edits | Auto-approves file changes, still asks for shell commands | After you've seen 10-20 edits you would have approved anyway |
| YOLO | Auto-approves everything including shell commands | Only in sandboxed / containerized environments |
Start with Default mode. The temptation to skip ahead is strong — resist it. The first few sessions in Default mode teach you what kinds of operations the agent performs. That pattern recognition is what makes you comfortable loosening permissions later.
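Default mode also doesn't have to mean approving every single action by hand. Claude Code supports permission rules in `.claude/settings.json` that pre-approve specific commands and block others. A hedged sketch — the exact matcher syntax may change, so check the current permissions documentation; the `pnpm` commands here are illustrative:

```json
{
  "permissions": {
    "allow": [
      "Bash(pnpm test:*)",
      "Bash(pnpm lint)"
    ],
    "deny": [
      "Bash(curl:*)"
    ]
  }
}
```

The idea is the same as with hooks: anything you would approve every single time belongs in the allow list, so prompts are reserved for decisions that actually need you.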
Add your first hook
Hooks are shell commands that fire automatically at specific points in Claude Code's lifecycle. They're how you add guardrails without slowing down. Start with one that auto-formats code after every edit — add this to your .claude/settings.json:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "npx @biomejs/biome format --write \"$TOOL_INPUT_file_path\""
          }
        ]
      }
    ]
  }
}
```

Swap `biome format` for `prettier --write` or whatever formatter your project uses — the pattern is the same. This hook means the agent can't ship badly formatted code, no matter what. You don't have to review formatting. You don't have to ask the agent to format. It just happens. That's the pattern: automate the things you'd always approve.
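Hooks can also veto actions, not just clean up after them. Per the Claude Code hooks documentation, a PreToolUse hook receives the pending tool call as JSON on stdin, and exiting with code 2 blocks the call. A sketch — assuming `jq` is installed; the blocked patterns here are only examples:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.command' | grep -qiE 'rm -rf|drop table' && exit 2 || exit 0"
          }
        ]
      }
    ]
  }
}
```

Treat this as a safety net, not a substitute for the permission model — it catches the patterns you name, nothing more.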
Step 2: Write a Production-Grade AGENTS.md
In Part 1, we showed a basic AGENTS.md template. Now we're going deeper. A good AGENTS.md is the difference between an agent that produces generic code you have to rewrite and one that produces code that looks like your team wrote it.
The five sections that matter
Every AGENTS.md should have these five sections. Order matters — agents process files top-to-bottom, and the most important context should come first.
1. Project overview
What this project is, in one paragraph. The agent uses this to make architectural decisions when your instructions are ambiguous.
```
High-performance REST API for a fintech platform. Handles payment
processing, account management, and regulatory reporting. Uptime and
correctness are non-negotiable.
```
2. Directory structure
The map of your codebase. Don't just list folders — explain what goes where and why.
```
src/routes/        → Express handlers (thin: validate input, call service, return response)
src/services/      → Business logic (no HTTP awareness, no database imports)
src/repositories/  → Database queries (raw SQL, no ORMs)
```
3. Build & test commands
Exact commands. No ambiguity. Include what "success" looks like.
```bash
pnpm test    # Must exit 0. Anything else = broken.
pnpm lint    # Uses Biome. Zero warnings policy.
pnpm build   # TypeScript strict mode. No implicit any.
```
4. Conventions
The rules your team actually follows. Be specific — "clean code" means nothing to an agent.
```
Errors: always throw AppError, never raw Error or string
Logging: structured JSON via src/lib/logger.ts
Naming: camelCase files, PascalCase classes, kebab-case routes
```
5. Architecture rules
The constraints that prevent architectural drift. Write these as MUST / MUST NOT rules — agents respond to strong language.
```
Services MUST NOT import from routes (dependency inversion)
All external API calls MUST go through src/integrations/
Never use `any` — prefer `unknown` with type narrowing
```
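The `unknown` rule is worth a concrete illustration. With `any`, the compiler lets the agent dereference a value it knows nothing about; with `unknown`, the code won't compile until the value is narrowed. A minimal sketch (the function name is illustrative, not from any real codebase):

```typescript
// Minimal sketch of "prefer unknown with type narrowing".
// With `any`, `raw.toUpperCase()` would silently compile even for numbers;
// with `unknown`, it is a type error until we prove `raw` is a string.
function parsePayload(raw: unknown): string {
  if (typeof raw === "string") {
    // Inside this branch, `raw` is narrowed to `string` — safe to use.
    return raw.toUpperCase();
  }
  throw new Error("expected a string payload");
}

console.log(parsePayload("ok")); // prints "OK"
```

This is exactly the kind of rule agents follow reliably once it's written down: it's mechanical, checkable, and leaves no room for interpretation.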
Before vs. after: a real AGENTS.md rewrite
Here's what a bad AGENTS.md looks like versus a good one — same project, same information, dramatically different agent output.
Before (vague)
```markdown
# AGENTS.md

## Stack
Node.js, Express, PostgreSQL

## Commands
npm install
npm test
npm run build

## Rules
- Write clean code
- Add tests
- Follow best practices
```
After (precise)
```markdown
# AGENTS.md

## Project
Payment processing API. Express 5 + TypeScript 5.4 + PostgreSQL 16.
Strict correctness requirements.

## Structure
src/routes/   → HTTP handlers only
src/services/ → Business logic
src/repos/    → SQL queries (no ORM)
src/models/   → Zod schemas + types

## Build & Test
pnpm install --frozen-lockfile
pnpm test    # vitest, must exit 0
pnpm build   # tsc --strict, no errors

## Conventions
- Errors: throw AppError, never raw
- Validation: Zod in models/, check in middleware
- Routes MUST NOT contain logic
- Services MUST NOT import routes
```
The "before" version sounds professional but gives the agent zero actionable information. "Write clean code" is subjective. "Follow best practices" is meaningless without specifying which practices. The "after" version is terse, specific, and leaves no room for interpretation.
Monorepo tip: nested AGENTS.md files
If you work in a monorepo, put a root AGENTS.md with shared conventions and a nested AGENTS.md in each package with package-specific rules. Agents automatically pick up the nearest AGENTS.md in the directory tree.
```
project/
├── AGENTS.md              # Shared: monorepo conventions, CI commands
├── apps/
│   ├── web/
│   │   └── AGENTS.md      # React + Vite conventions, component patterns
│   └── api/
│       └── AGENTS.md      # Express conventions, DB query patterns
└── packages/
    └── shared/
        └── AGENTS.md      # "This package is imported by apps. No side effects."
```

Step 3: Learn to Decompose Tasks
The number one reason agent output disappoints: the task was too big. Agents work best on focused, well-scoped tasks. When you hand an agent a vague, multi-step feature, it loses context halfway through and the output degrades.
The sizing checklist
Before you delegate a task, run it through this checklist. If you can't answer yes to all four, break the task down further.
Single responsibility?
The task does one thing. "Add an endpoint" — yes. "Add auth and also refactor the user model" — no.
Clear inputs and outputs?
You can describe exactly what exists before and what should exist after. No ambiguity about "done."
Fits in ~1-3 files?
If the task touches more than three files, it's probably too big. More files = more context = more drift.
Testable in isolation?
You can verify the output without running the entire system. Unit tests, a curl command, a type check.
Decomposition in practice
Let's take a real feature — "add password reset" — and break it into agent-sized tasks.
| Task | Scope | Files touched |
|---|---|---|
| 1. Add ResetToken Zod schema | Type + validation only | models/reset-token.ts |
| 2. Add POST /forgot-password route | Route + service method | routes/auth.ts, services/auth.ts |
| 3. Add token generation + email sending | Service + integration | services/auth.ts, integrations/email.ts |
| 4. Add POST /reset-password route | Route + service method | routes/auth.ts, services/auth.ts |
| 5. Add tests for all reset flows | Tests only | __tests__/auth-reset.test.ts |
Each task is one prompt to the agent. Each produces a reviewable diff. Each can be tested independently. If task 3 goes wrong, you don't lose the work from tasks 1 and 2.
Step 4: Your First Autonomous Task
Let's walk through a real task end-to-end. We'll delegate task 1 from the password reset example above — adding a Zod schema for the reset token.
The prompt
Notice how the prompt references AGENTS.md conventions, gives explicit constraints, and defines exactly what "done" looks like:
```
Create a Zod schema for password reset tokens in src/models/reset-token.ts.

Requirements:
- Token field: string, min 32 chars (JWT format)
- Email field: string, valid email
- ExpiresAt field: date, must be in the future
- CreatedAt field: date, defaults to now
- UsedAt field: date, nullable (null means unused)

Export:
- The Zod schema as resetTokenSchema
- The inferred TypeScript type as ResetToken

Follow the patterns in src/models/user.ts for style reference.
Run the type checker after creating the file.
```
What happens next
Here's what the agent does with that prompt (assuming you have a well-configured AGENTS.md):
1. Reads context
Agent reads AGENTS.md and CLAUDE.md. Learns your conventions, directory structure, naming patterns.
2. Reads reference
Agent reads src/models/user.ts to match your existing style. This is why we said "follow patterns in user.ts."
3. Writes the file
Agent creates src/models/reset-token.ts with the Zod schema, matching your conventions.
4. Runs verification
Agent runs the type checker (pnpm build or tsc). If it fails, agent fixes the error and tries again.
5. Reports results
Agent shows you the file it created and the type checker output. You review.
Review the output — not the code
At Level 2, you're still reviewing the code. But start training yourself to review against the prompt, not line by line. Ask three questions:
Does it match the spec?
All five fields present? Correct types? Correct validation?
Does it follow conventions?
File location correct? Naming matches? Export style matches?
Does it pass checks?
Type checker passes? Linter passes? No new warnings?
If all three answers are "yes," you're done. The exact code shape doesn't matter — only the behavior and compliance. This mindset shift is what prepares you for Level 3, where you stop reviewing code entirely.
Step 5: Build the Feedback Loop
Your AGENTS.md is a living document. Every time the agent makes a mistake, that's not a failure — it's a signal that your context is incomplete. The feedback loop is how you turn agent mistakes into permanent improvements.
The three-step fix
When the agent produces wrong output, follow this exact process:
1. Identify the category
Was it a convention violation? A missing constraint? A wrong assumption about architecture?
Agent used console.log instead of the structured logger. → Convention gap.
2. Add the rule to AGENTS.md
Write the rule as a MUST/MUST NOT statement. Be specific enough that the agent can't misinterpret it.
Added: "Logging: MUST use src/lib/logger.ts. MUST NOT use console.log/warn/error anywhere."
3. Verify on the next task
On the next task, watch if the same mistake happens. If it does, your rule isn't specific enough — rewrite it.
Next task: agent uses logger correctly. Rule is working. Move on.
Common mistakes and their AGENTS.md fixes
Here are the most common agent mistakes teams encounter at Level 1-2, and the AGENTS.md rules that fix them:
| Agent mistake | AGENTS.md rule to add |
|---|---|
| Uses wrong import style | "Use ESM imports. No require(). No default exports." |
| Puts files in wrong directory | "Tests go in __tests__/ next to the source file, not in a top-level test/ dir." |
| Adds unnecessary dependencies | "MUST NOT add packages without explicit permission. Use built-in Node APIs first." |
| Over-engineers simple tasks | "Keep it simple. No abstractions for one-time operations. Three similar lines > a premature helper." |
| Ignores error handling patterns | "All errors MUST be AppError instances. Wrap external errors: AppError.from(err)" |
| Generates verbose comments | "Only add comments where logic isn't self-evident. Never add JSDoc to private methods." |
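Rules like the AppError one stick best when the class itself makes compliance the path of least resistance. A hypothetical sketch of such a class — the real one would live in your codebase, and the field names here are illustrative:

```typescript
// Illustrative AppError supporting the "wrap external errors" rule above.
export class AppError extends Error {
  readonly code: string;
  readonly original?: unknown;

  constructor(message: string, code = "INTERNAL", original?: unknown) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.original = original; // keep the wrapped error for logging
  }

  // Normalize anything thrown by external code into an AppError,
  // without double-wrapping errors that are already AppErrors.
  static from(err: unknown, code = "EXTERNAL"): AppError {
    if (err instanceof AppError) return err;
    const message = err instanceof Error ? err.message : String(err);
    return new AppError(message, code, err);
  }
}
```

With a helper like `AppError.from(err)`, the AGENTS.md rule becomes a one-line habit rather than a judgment call — exactly the kind of convention agents follow consistently.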
Step 6: Know When You're Ready for Level 3
Level 3 is where you stop pair-programming and start managing agent output. You're ready when these signals are consistent:
Agent output is predictable
You can predict the structure of what the agent will produce before it runs. Your AGENTS.md is specific enough that surprise is rare.
Review time is under 5 minutes
You spend more time writing the prompt than reviewing the output. The output consistently matches your expectations.
AGENTS.md is stable
You're no longer adding new rules every session. The document has settled. Agent mistakes are rare, not systemic.
You trust the permission model
You've moved to "Accept Edits" mode and feel comfortable. The hooks catch the edge cases you care about.
When you're seeing all four signals, it's time for Part 3: Spec-Driven Development — where you write structured specifications and stop reviewing code entirely. The agent works from the spec. You review the results, not the diff.
Your Checklist
Bookmark this. Work through it over the next week. Each item builds on the last.
1. Install and configure Claude Code
Run /init, generate your starter CLAUDE.md, set Default permission mode.
2. Add a formatting hook
PostToolUse hook that auto-formats after every Edit/Write. One less thing to review.
3. Write your AGENTS.md
All five sections: project, structure, commands, conventions, architecture rules. Be precise.
4. Delegate 5 small tasks
Start with types, tests, and utility functions. Review the output. Note what went wrong.
5. Update AGENTS.md from mistakes
Every mistake becomes a MUST/MUST NOT rule. Run the feedback loop 3 times.
6. Delegate a medium task
A route + service method. Two files. One clear responsibility. Review against the prompt.
7. Assess your readiness
Check the four signals. If they're consistent, you're ready for Part 3.
References
- Claude Code Documentation — official setup guide, CLI reference, and feature documentation
- Claude Code Subagents Guide — creating and configuring custom subagents for specialized tasks
- AGENTS.md Specification — the open standard for guiding AI coding agents, adopted by 60,000+ projects
- How StrongDM's AI Team Builds Software Without Looking at the Code — real-world dark factory case study that inspired this series
- Creating the Perfect CLAUDE.md for Claude Code — practical guide to writing effective CLAUDE.md files
- The Dark Factory Pattern: Moving From AI-Assisted to Fully Autonomous Coding — architectural breakdown of the full dark factory pattern
Up Next
Part 3: Spec-Driven Development (coming soon)