Dark Factory Series
In Part 1, we mapped the six levels of AI-driven development — from manual coding all the way to the fully autonomous dark factory. Now it's time to actually build something. This guide walks you through setting up your coding agent, writing a production-grade AGENTS.md, delegating your first real task, and building the feedback loops that make every future task go more smoothly.
By the end of this guide, you'll have a working setup that puts you solidly at Level 2 — pair-programming with AI — and ready to move toward Level 3 in the next part.
What We're Building Today
Part 1 was theory. This is practice. By the end of this guide you'll have four things set up:
A configured coding agent
Claude Code installed, permissions set, hooks wired up — ready to do real work.
A production-grade AGENTS.md
Not a template. A living document tuned to your actual codebase.
A task decomposition habit
How to break features into agent-sized chunks that produce reliable output.
A feedback loop
A process for learning from agent mistakes and preventing them from happening again.
Step 1: Set Up Your Coding Agent
We're using Claude Code throughout this series. It's a terminal-native agent that reads your codebase, edits files, runs commands, and creates PRs. If you're using a different agent (Cursor, Windsurf, Devin), the AGENTS.md patterns still apply — but the setup steps will differ.
Install and initialize
```bash
# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash

# Navigate to your project
cd your-project

# Start Claude Code — it will prompt you to log in on first run
claude

# Generate a starter CLAUDE.md from your project structure
/init
```
The /init command scans your project and generates a starter CLAUDE.md file. Think of this as a rough first draft — it captures your tech stack, directory structure, and build commands. We'll refine it heavily in the next section.
Understand the permission model
Before you let an agent loose on your code, understand the guardrails. Claude Code has three permission modes — start with the safest and loosen as you build trust:
| Mode | What it does | When to use it |
|---|---|---|
| Default | Prompts you before file edits and commands | First week — learn what the agent does before trusting it |
| Accept Edits | Auto-approves file changes, still asks for shell commands | After you've seen 10-20 edits you would have approved anyway |
| YOLO | Auto-approves everything including shell commands | Only in sandboxed / containerized environments |
Start with Default mode. The temptation to skip ahead is strong — resist it. The first few sessions in Default mode teach you what kinds of operations the agent performs. That pattern recognition is what makes you comfortable loosening permissions later.
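Default mode also doesn't have to mean approving every single action by hand. Claude Code supports permission rules in `.claude/settings.json` that pre-approve specific commands and block others. A hedged sketch — the exact matcher syntax may change, so check the current permissions documentation; the `pnpm` commands here are illustrative:

```json
{
  "permissions": {
    "allow": [
      "Bash(pnpm test:*)",
      "Bash(pnpm lint)"
    ],
    "deny": [
      "Bash(curl:*)"
    ]
  }
}
```

The idea is the same as with hooks: anything you would approve every single time belongs in the allow list, so prompts are reserved for decisions that actually need you.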
Add your first hook
Hooks are shell commands that fire automatically at specific points in Claude Code's lifecycle. They're how you add guardrails without slowing down. Start with one that auto-formats code after every edit — add this to your .claude/settings.json:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "npx @biomejs/biome format --write \"$TOOL_INPUT_file_path\""
          }
        ]
      }
    ]
  }
}
```

Swap `biome format` for `prettier --write` or whatever formatter your project uses — the pattern is the same. This hook means the agent can't ship badly formatted code, no matter what. You don't have to review formatting. You don't have to ask the agent to format. It just happens. That's the pattern: automate the things you'd always approve.
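Hooks can also veto actions, not just clean up after them. Per the Claude Code hooks documentation, a PreToolUse hook receives the pending tool call as JSON on stdin, and exiting with code 2 blocks the call. A sketch — assuming `jq` is installed; the blocked patterns here are only examples:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.command' | grep -qiE 'rm -rf|drop table' && exit 2 || exit 0"
          }
        ]
      }
    ]
  }
}
```

Treat this as a safety net, not a substitute for the permission model — it catches the patterns you name, nothing more.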
Step 2: Write a Production-Grade AGENTS.md
In Part 1, we showed a basic AGENTS.md template. Now we're going deeper. A good AGENTS.md is the difference between an agent that produces generic code you have to rewrite and one that produces code that looks like your team wrote it.
The five sections that matter
Every AGENTS.md should have these five sections. Order matters — agents process files top-to-bottom, and the most important context should come first.
1. Project overview
What this project is, in one paragraph. The agent uses this to make architectural decisions when your instructions are ambiguous.
```
High-performance REST API for a fintech platform. Handles payment
processing, account management, and regulatory reporting. Uptime and
correctness are non-negotiable.
```
2. Directory structure
The map of your codebase. Don't just list folders — explain what goes where and why.
```
src/routes/        → Express handlers (thin: validate input, call service, return response)
src/services/      → Business logic (no HTTP awareness, no database imports)
src/repositories/  → Database queries (raw SQL, no ORMs)
```
3. Build & test commands
Exact commands. No ambiguity. Include what "success" looks like.
```bash
pnpm test    # Must exit 0. Anything else = broken.
pnpm lint    # Uses Biome. Zero warnings policy.
pnpm build   # TypeScript strict mode. No implicit any.
```
4. Conventions
The rules your team actually follows. Be specific — "clean code" means nothing to an agent.
```
Errors: always throw AppError, never raw Error or string
Logging: structured JSON via src/lib/logger.ts
Naming: camelCase files, PascalCase classes, kebab-case routes
```
5. Architecture rules
The constraints that prevent architectural drift. Write these as MUST / MUST NOT rules — agents respond to strong language.
```
Services MUST NOT import from routes (dependency inversion)
All external API calls MUST go through src/integrations/
Never use `any` — prefer `unknown` with type narrowing
```
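The `unknown` rule is worth a concrete illustration. With `any`, the compiler lets the agent dereference a value it knows nothing about; with `unknown`, the code won't compile until the value is narrowed. A minimal sketch (the function name is illustrative, not from any real codebase):

```typescript
// Minimal sketch of "prefer unknown with type narrowing".
// With `any`, `raw.toUpperCase()` would silently compile even for numbers;
// with `unknown`, it is a type error until we prove `raw` is a string.
function parsePayload(raw: unknown): string {
  if (typeof raw === "string") {
    // Inside this branch, `raw` is narrowed to `string` — safe to use.
    return raw.toUpperCase();
  }
  throw new Error("expected a string payload");
}

console.log(parsePayload("ok")); // prints "OK"
```

This is exactly the kind of rule agents follow reliably once it's written down: it's mechanical, checkable, and leaves no room for interpretation.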
Before vs. after: a real AGENTS.md rewrite
Here's what a bad AGENTS.md looks like versus a good one — same project, same information, dramatically different agent output.
Before (vague)
```markdown
# AGENTS.md

## Stack
Node.js, Express, PostgreSQL

## Commands
npm install
npm test
npm run build

## Rules
- Write clean code
- Add tests
- Follow best practices
```
After (precise)
```markdown
# AGENTS.md

## Project
Payment processing API. Express 5 + TypeScript 5.4 + PostgreSQL 16.
Strict correctness requirements.

## Structure
src/routes/   → HTTP handlers only
src/services/ → Business logic
src/repos/    → SQL queries (no ORM)
src/models/   → Zod schemas + types

## Build & Test
pnpm install --frozen-lockfile
pnpm test    # vitest, must exit 0
pnpm build   # tsc --strict, no errors

## Conventions
- Errors: throw AppError, never raw
- Validation: Zod in models/, check in middleware
- Routes MUST NOT contain logic
- Services MUST NOT import routes
```
The "before" version sounds professional but gives the agent zero actionable information. "Write clean code" is subjective. "Follow best practices" is meaningless without specifying which practices. The "after" version is terse, specific, and leaves no room for interpretation.
Monorepo tip: nested AGENTS.md files
If you work in a monorepo, put a root AGENTS.md with shared conventions and a nested AGENTS.md in each package with package-specific rules. Agents automatically pick up the nearest AGENTS.md in the directory tree.
```
project/
├── AGENTS.md              # Shared: monorepo conventions, CI commands
├── apps/
│   ├── web/
│   │   └── AGENTS.md      # React + Vite conventions, component patterns
│   └── api/
│       └── AGENTS.md      # Express conventions, DB query patterns
└── packages/
    └── shared/
        └── AGENTS.md      # "This package is imported by apps. No side effects."
```

Step 3: Learn to Decompose Tasks
The number one reason agent output disappoints: the task was too big. Agents work best on focused, well-scoped tasks. When you hand an agent a vague, multi-step feature, it loses context halfway through and the output degrades.
The sizing checklist
Before you delegate a task, run it through this checklist. If you can't answer yes to all four, break the task down further.
Single responsibility?
The task does one thing. "Add an endpoint" — yes. "Add auth and also refactor the user model" — no.
Clear inputs and outputs?
You can describe exactly what exists before and what should exist after. No ambiguity about "done."
Fits in ~1-3 files?
If the task touches more than three files, it's probably too big. More files = more context = more drift.
Testable in isolation?
You can verify the output without running the entire system. Unit tests, a curl command, a type check.
Decomposition in practice
Let's take a real feature — "add password reset" — and break it into agent-sized tasks.
| Task | Scope | Files touched |
|---|---|---|
| 1. Add ResetToken Zod schema | Type + validation only | models/reset-token.ts |
| 2. Add POST /forgot-password route | Route + service method | routes/auth.ts, services/auth.ts |
| 3. Add token generation + email sending | Service + integration | services/auth.ts, integrations/email.ts |
| 4. Add POST /reset-password route | Route + service method | routes/auth.ts, services/auth.ts |
| 5. Add tests for all reset flows | Tests only | __tests__/auth-reset.test.ts |
Each task is one prompt to the agent. Each produces a reviewable diff. Each can be tested independently. If task 3 goes wrong, you don't lose the work from tasks 1 and 2.
Step 4: Your First Autonomous Task
Let's walk through a real task end-to-end. We'll delegate task 1 from the password reset example above — adding a Zod schema for the reset token.
The prompt
Notice how the prompt references AGENTS.md conventions, gives explicit constraints, and defines exactly what "done" looks like:
```
Create a Zod schema for password reset tokens in src/models/reset-token.ts.

Requirements:
- Token field: string, min 32 chars (JWT format)
- Email field: string, valid email
- ExpiresAt field: date, must be in the future
- CreatedAt field: date, defaults to now
- UsedAt field: date, nullable (null means unused)

Export:
- The Zod schema as resetTokenSchema
- The inferred TypeScript type as ResetToken

Follow the patterns in src/models/user.ts for style reference.
Run the type checker after creating the file.
```
What happens next
Here's what the agent does with that prompt (assuming you have a well-configured AGENTS.md):
1. Reads context
Agent reads AGENTS.md and CLAUDE.md. Learns your conventions, directory structure, naming patterns.
2. Reads reference
Agent reads src/models/user.ts to match your existing style. This is why we said "follow patterns in user.ts."
3. Writes the file
Agent creates src/models/reset-token.ts with the Zod schema, matching your conventions.
4. Runs verification
Agent runs the type checker (pnpm build or tsc). If it fails, agent fixes the error and tries again.
5. Reports results
Agent shows you the file it created and the type checker output. You review.
Review the output — not the code
At Level 2, you're still reviewing the code. But start training yourself to review against the prompt, not line by line. Ask three questions:
Does it match the spec?
All five fields present? Correct types? Correct validation?
Does it follow conventions?
File location correct? Naming matches? Export style matches?
Does it pass checks?
Type checker passes? Linter passes? No new warnings?
If all three answers are "yes," you're done. The exact code shape doesn't matter — only the behavior and compliance. This mindset shift is what prepares you for Level 3, where you stop reviewing code entirely.
Step 5: Build the Feedback Loop
Your AGENTS.md is a living document. Every time the agent makes a mistake, that's not a failure — it's a signal that your context is incomplete. The feedback loop is how you turn agent mistakes into permanent improvements.
The three-step fix
When the agent produces wrong output, follow this exact process:
1. Identify the category
Was it a convention violation? A missing constraint? A wrong assumption about architecture?
Agent used console.log instead of the structured logger. → Convention gap.
2. Add the rule to AGENTS.md
Write the rule as a MUST/MUST NOT statement. Be specific enough that the agent can't misinterpret it.
Added: "Logging: MUST use src/lib/logger.ts. MUST NOT use console.log/warn/error anywhere."
3. Verify on the next task
On the next task, watch if the same mistake happens. If it does, your rule isn't specific enough — rewrite it.
Next task: agent uses logger correctly. Rule is working. Move on.
Common mistakes and their AGENTS.md fixes
Here are the most common agent mistakes teams encounter at Level 1-2, and the AGENTS.md rules that fix them:
| Agent mistake | AGENTS.md rule to add |
|---|---|
| Uses wrong import style | "Use ESM imports. No require(). No default exports." |
| Puts files in wrong directory | "Tests go in __tests__/ next to the source file, not in a top-level test/ dir." |
| Adds unnecessary dependencies | "MUST NOT add packages without explicit permission. Use built-in Node APIs first." |
| Over-engineers simple tasks | "Keep it simple. No abstractions for one-time operations. Three similar lines > a premature helper." |
| Ignores error handling patterns | "All errors MUST be AppError instances. Wrap external errors: AppError.from(err)" |
| Generates verbose comments | "Only add comments where logic isn't self-evident. Never add JSDoc to private methods." |
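Rules like the AppError one stick best when the class itself makes compliance the path of least resistance. A hypothetical sketch of such a class — the real one would live in your codebase, and the field names here are illustrative:

```typescript
// Illustrative AppError supporting the "wrap external errors" rule above.
export class AppError extends Error {
  readonly code: string;
  readonly original?: unknown;

  constructor(message: string, code = "INTERNAL", original?: unknown) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.original = original; // keep the wrapped error for logging
  }

  // Normalize anything thrown by external code into an AppError,
  // without double-wrapping errors that are already AppErrors.
  static from(err: unknown, code = "EXTERNAL"): AppError {
    if (err instanceof AppError) return err;
    const message = err instanceof Error ? err.message : String(err);
    return new AppError(message, code, err);
  }
}
```

With a helper like `AppError.from(err)`, the AGENTS.md rule becomes a one-line habit rather than a judgment call — exactly the kind of convention agents follow consistently.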
Step 6: Know When You're Ready for Level 3
Level 3 is where you stop pair-programming and start managing agent output. You're ready when these signals are consistent:
Agent output is predictable
You can predict the structure of what the agent will produce before it runs. Your AGENTS.md is specific enough that surprise is rare.
Review time is under 5 minutes
You spend more time writing the prompt than reviewing the output. The output consistently matches your expectations.
AGENTS.md is stable
You're no longer adding new rules every session. The document has settled. Agent mistakes are rare, not systemic.
You trust the permission model
You've moved to "Accept Edits" mode and feel comfortable. The hooks catch the edge cases you care about.
When you're seeing all four signals, it's time for Part 3: Spec-Driven Development — where you write structured specifications and stop reviewing code entirely. The agent works from the spec. You review the results, not the diff.
Your Checklist
Bookmark this. Work through it over the next week. Each item builds on the last.
1. Install and configure Claude Code
Run /init, generate your starter CLAUDE.md, set Default permission mode.
2. Add a formatting hook
PostToolUse hook that auto-formats after every Edit/Write. One less thing to review.
3. Write your AGENTS.md
All five sections: project, structure, commands, conventions, architecture rules. Be precise.
4. Delegate 5 small tasks
Start with types, tests, and utility functions. Review the output. Note what went wrong.
5. Update AGENTS.md from mistakes
Every mistake becomes a MUST/MUST NOT rule. Run the feedback loop 3 times.
6. Delegate a medium task
A route + service method. Two files. One clear responsibility. Review against the prompt.
7. Assess your readiness
Check the four signals. If they're consistent, you're ready for Part 3.
References
- Claude Code Documentation — official setup guide, CLI reference, and feature documentation
- Claude Code Subagents Guide — creating and configuring custom subagents for specialized tasks
- AGENTS.md Specification — the open standard for guiding AI coding agents, adopted by 60,000+ projects
- How StrongDM's AI Team Builds Software Without Looking at the Code — real-world dark factory case study that inspired this series
- Creating the Perfect CLAUDE.md for Claude Code — practical guide to writing effective CLAUDE.md files
- The Dark Factory Pattern: Moving From AI-Assisted to Fully Autonomous Coding — architectural breakdown of the full dark factory pattern
Up Next
Part 3: Spec-Driven Development (coming soon)