The Dark Factory Pattern Part 2: Setting Up Your AI Development Foundation

Hands-on guide to setting up your coding agent, writing a production-grade AGENTS.md, decomposing tasks for AI delegation, and building feedback loops that improve agent output over time.

Dark Factory Series

Part 1: The Implementation Playbook
Part 2: Setting Up Your AI Dev Foundation (you are here)

In Part 1, we mapped the six levels of AI-driven development — from manual coding all the way to the fully autonomous dark factory. Now it's time to actually build something. This guide walks you through setting up your coding agent, writing a production-grade AGENTS.md, delegating your first real task, and building the feedback loops that make every future task go smoother.

By the end of this guide, you'll have a working setup that puts you solidly at Level 2 — pair-programming with AI — and ready to move toward Level 3 in the next part.

What We're Building Today

Part 1 was theory. This is practice. By the end of this guide you'll have four things set up:

A configured coding agent

Claude Code installed, permissions set, hooks wired up — ready to do real work.

A production-grade AGENTS.md

Not a template. A living document tuned to your actual codebase.

A task decomposition habit

How to break features into agent-sized chunks that produce reliable output.

A feedback loop

A process for learning from agent mistakes and preventing them from happening again.

[Diagram: The context layer — CLAUDE.md, AGENTS.md, .claude/settings.json, hooks/, subagents/ — feeds the coding agent, which reads the context files, plans an approach, and writes code + tests. The output: code changes, test results, a PR / commit. Better context → better output. Everything starts with the context layer.]

Step 1: Set Up Your Coding Agent

We're using Claude Code throughout this series. It's a terminal-native agent that reads your codebase, edits files, runs commands, and creates PRs. If you're using a different agent (Cursor, Windsurf, Devin), the AGENTS.md patterns still apply — but the setup steps will differ.

Install and initialize

# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash

# Navigate to your project
cd your-project

# Start Claude Code — it will prompt you to log in on first run
claude

# Generate a starter CLAUDE.md from your project structure
/init

The /init command scans your project and generates a starter CLAUDE.md file. Think of this as a rough first draft — it captures your tech stack, directory structure, and build commands. We'll refine it heavily in the next section.

Understand the permission model

Before you let an agent loose on your code, understand the guardrails. Claude Code has three permission modes — start with the safest and loosen as you build trust:

Mode         | What it does                                              | When to use it
Default      | Prompts you before file edits and commands                | First week — learn what the agent does before trusting it
Accept Edits | Auto-approves file changes, still asks for shell commands | After you've seen 10-20 edits you would have approved anyway
YOLO         | Auto-approves everything including shell commands         | Only in sandboxed / containerized environments

Start with Default mode. The temptation to skip ahead is strong — resist it. The first few sessions in Default mode teach you what kinds of operations the agent performs. That pattern recognition is what makes you comfortable loosening permissions later.

Add your first hook

Hooks are shell commands that fire automatically at specific points in Claude Code's lifecycle. They're how you add guardrails without slowing down. Start with one that auto-formats code after every edit — add this to your .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "npx @biomejs/biome format --write \"$TOOL_INPUT_file_path\""
          }
        ]
      }
    ]
  }
}

Swap biome format for prettier --write or whatever formatter your project uses — the pattern is the same. This hook means the agent can't ship badly formatted code, no matter what. You don't have to review formatting. You don't have to ask the agent to format. It just happens. That's the pattern: automate the things you'd always approve.

Step 2: Write a Production-Grade AGENTS.md

In Part 1, we showed a basic AGENTS.md template. Now we're going deeper. A good AGENTS.md is the difference between an agent that produces generic code you have to rewrite and one that produces code that looks like your team wrote it.

The five sections that matter

Every AGENTS.md should have these five sections. Order matters — agents process files top-to-bottom, and the most important context should come first.

1. Project overview

What this project is, in one paragraph. The agent uses this to make architectural decisions when your instructions are ambiguous.

High-performance REST API for a fintech platform. Handles payment processing, account management, and regulatory reporting. Uptime and correctness are non-negotiable.

2. Directory structure

The map of your codebase. Don't just list folders — explain what goes where and why.

src/routes/ → Express handlers (thin: validate input, call service, return response)
src/services/ → Business logic (no HTTP awareness, no database imports)
src/repositories/ → Database queries (raw SQL, no ORMs)
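To make the "what goes where and why" concrete, here is a minimal sketch of how the three layers relate. All names (userRepo, getUserOrThrow, handleGetUser) are illustrative, not from any real codebase, and the repository is stubbed in memory so the example stands alone:

```typescript
type User = { id: string; email: string };

// repositories layer: the only place queries live (stubbed in memory here)
const userRepo = {
  findByEmail(email: string): User | null {
    return email === "a@example.com" ? { id: "u1", email } : null;
  },
};

// services layer: business logic, no HTTP awareness, no database imports
function getUserOrThrow(email: string): User {
  const user = userRepo.findByEmail(email);
  if (!user) throw new Error("NotFound"); // a real codebase might use AppError here
  return user;
}

// routes layer: thin — validate input, call the service, shape the response
function handleGetUser(query: { email?: string }): { status: number; body: unknown } {
  if (!query.email) return { status: 400, body: { error: "email required" } };
  try {
    return { status: 200, body: getUserOrThrow(query.email) };
  } catch {
    return { status: 404, body: { error: "not found" } };
  }
}
```

The point of spelling this out in AGENTS.md is that the agent can now tell, for any new code, exactly which file it belongs in.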

3. Build & test commands

Exact commands. No ambiguity. Include what "success" looks like.

pnpm test    # Must exit 0. Anything else = broken.
pnpm lint    # Uses Biome. Zero warnings policy.
pnpm build   # TypeScript strict mode. No implicit any.

4. Conventions

The rules your team actually follows. Be specific — "clean code" means nothing to an agent.

Errors: always throw AppError, never raw Error or string
Logging: structured JSON via src/lib/logger.ts
Naming: camelCase files, PascalCase classes, kebab-case routes

5. Architecture rules

The constraints that prevent architectural drift. Write these as MUST / MUST NOT rules — agents respond to strong language.

Services MUST NOT import from routes (dependency inversion)
All external API calls MUST go through src/integrations/
Never use `any` — prefer `unknown` with type narrowing
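The last rule is worth a concrete illustration, since "prefer `unknown`" is the kind of instruction an agent follows better with an example. A minimal sketch: with `unknown`, the compiler refuses to touch the value until you narrow it, so bad inputs fail loudly instead of silently.

```typescript
// With `any`, raw.toFixed() would compile and crash at runtime for strings.
// With `unknown`, every use must first be narrowed to a concrete type.
function parsePort(raw: unknown): number {
  if (typeof raw === "number" && Number.isInteger(raw) && raw > 0) return raw;
  if (typeof raw === "string" && /^\d+$/.test(raw)) return Number(raw);
  throw new Error(`invalid port: ${String(raw)}`);
}
```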

Before vs. after: a real AGENTS.md rewrite

Here's what a bad AGENTS.md looks like versus a good one — same project, same information, dramatically different agent output.

Before (vague)

# AGENTS.md

## Stack
Node.js, Express, PostgreSQL

## Commands
npm install
npm test
npm run build

## Rules
- Write clean code
- Add tests
- Follow best practices

After (precise)

# AGENTS.md

## Project
Payment processing API. Express 5
+ TypeScript 5.4 + PostgreSQL 16.
Strict correctness requirements.

## Structure
src/routes/    → HTTP handlers only
src/services/  → Business logic
src/repos/     → SQL queries (no ORM)
src/models/    → Zod schemas + types

## Build & Test
pnpm install --frozen-lockfile
pnpm test  # vitest, must exit 0
pnpm build # tsc --strict, no errors

## Conventions
- Errors: throw AppError, never raw
- Validation: Zod in models/, check
  in middleware
- Routes MUST NOT contain logic
- Services MUST NOT import routes

The "before" version sounds professional but gives the agent zero actionable information. "Write clean code" is subjective. "Follow best practices" is meaningless without specifying which practices. The "after" version is terse, specific, and leaves no room for interpretation.

Monorepo tip: nested AGENTS.md files

If you work in a monorepo, put a root AGENTS.md with shared conventions and a nested AGENTS.md in each package with package-specific rules. Agents automatically pick up the nearest AGENTS.md in the directory tree.

project/
├── AGENTS.md                    # Shared: monorepo conventions, CI commands
├── apps/
│   ├── web/
│   │   └── AGENTS.md            # React + Vite conventions, component patterns
│   └── api/
│       └── AGENTS.md            # Express conventions, DB query patterns
└── packages/
    └── shared/
        └── AGENTS.md            # "This package is imported by apps. No side effects."

Step 3: Learn to Decompose Tasks

The number one reason agent output disappoints: the task was too big. Agents work best on focused, well-scoped tasks. When you hand an agent a vague, multi-step feature, it loses context halfway through and the output degrades.

[Diagram: A feature request decomposes into Task 1 (API route), Task 2 (service logic), Task 3 (tests) — each a single file, ~100-300 lines, one responsibility, clear input/output, isolated scope, no hidden deps. Too big: "Build the entire auth system" — the agent loses context and output degrades. Right size: "Add POST /auth/login that validates credentials and returns a JWT".]

The sizing checklist

Before you delegate a task, run it through this checklist. If you can't answer yes to all four, break the task down further.

Single responsibility?

The task does one thing. "Add an endpoint" — yes. "Add auth and also refactor the user model" — no.

Clear inputs and outputs?

You can describe exactly what exists before and what should exist after. No ambiguity about "done."

Fits in ~1-3 files?

If the task touches more than three files, it's probably too big. More files = more context = more drift.

Testable in isolation?

You can verify the output without running the entire system. Unit tests, a curl command, a type check.

Decomposition in practice

Let's take a real feature — "add password reset" — and break it into agent-sized tasks.

Task                                    | Scope                  | Files touched
1. Add ResetToken Zod schema            | Type + validation only | models/reset-token.ts
2. Add POST /forgot-password route      | Route + service method | routes/auth.ts, services/auth.ts
3. Add token generation + email sending | Service + integration  | services/auth.ts, integrations/email.ts
4. Add POST /reset-password route       | Route + service method | routes/auth.ts, services/auth.ts
5. Add tests for all reset flows        | Tests only             | __tests__/auth-reset.test.ts

Each task is one prompt to the agent. Each produces a reviewable diff. Each can be tested independently. If task 3 goes wrong, you don't lose the work from tasks 1 and 2.

Step 4: Your First Autonomous Task

Let's walk through a real task end-to-end. We'll delegate task 1 from the password reset example above — adding a Zod schema for the reset token.

The prompt

Notice how the prompt references AGENTS.md conventions, gives explicit constraints, and defines exactly what "done" looks like:

Create a Zod schema for password reset tokens in src/models/reset-token.ts.

Requirements:
- Token field: string, min 32 chars (JWT format)
- Email field: string, valid email
- ExpiresAt field: date, must be in the future
- CreatedAt field: date, defaults to now
- UsedAt field: date, nullable (null means unused)

Export:
- The Zod schema as resetTokenSchema
- The inferred TypeScript type as ResetToken

Follow the patterns in src/models/user.ts for style reference.
Run the type checker after creating the file.
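For reference, here is roughly the shape the delivered file should take. The agent's real output would use Zod as the prompt requires; this sketch mirrors the same fields, defaults, and exports with a hand-rolled validator (parseResetToken is a hypothetical stand-in for resetTokenSchema.parse) so it runs without the dependency:

```typescript
export type ResetToken = {
  token: string;       // min 32 chars
  email: string;       // valid email
  expiresAt: Date;     // must be in the future
  createdAt: Date;     // defaults to now
  usedAt: Date | null; // null means unused
};

// Stand-in for resetTokenSchema.parse(): throws on invalid input.
export function parseResetToken(input: {
  token: string;
  email: string;
  expiresAt: Date;
  createdAt?: Date;
  usedAt?: Date | null;
}): ResetToken {
  if (input.token.length < 32) throw new Error("token too short");
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input.email)) throw new Error("invalid email");
  if (input.expiresAt.getTime() <= Date.now()) throw new Error("expiresAt must be in the future");
  return {
    token: input.token,
    email: input.email,
    expiresAt: input.expiresAt,
    createdAt: input.createdAt ?? new Date(),
    usedAt: input.usedAt ?? null,
  };
}
```

Knowing the expected shape in advance is what makes the review step fast: you compare the diff against this mental model, not against a blank slate.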

What happens next

Here's what the agent does with that prompt (assuming you have a well-configured AGENTS.md):

1

Reads context

Agent reads AGENTS.md and CLAUDE.md. Learns your conventions, directory structure, naming patterns.

2

Reads reference

Agent reads src/models/user.ts to match your existing style. This is why we said "follow patterns in user.ts."

3

Writes the file

Agent creates src/models/reset-token.ts with the Zod schema, matching your conventions.

4

Runs verification

Agent runs the type checker (pnpm build or tsc). If it fails, agent fixes the error and tries again.

5

Reports results

Agent shows you the file it created and the type checker output. You review.

Review the output — not the code

At Level 2, you're still reviewing the code. But start training yourself to review against the prompt, not line by line. Ask three questions:

Does it match the spec?

All five fields present? Correct types? Correct validation?

Does it follow conventions?

File location correct? Naming matches? Export style matches?

Does it pass checks?

Type checker passes? Linter passes? No new warnings?

If all three answers are "yes," you're done. The exact code shape doesn't matter — only the behavior and compliance. This mindset shift is what prepares you for Level 3, where you stop reviewing code entirely.

Step 5: Build the Feedback Loop

Your AGENTS.md is a living document. Every time the agent makes a mistake, that's not a failure — it's a signal that your context is incomplete. The feedback loop is how you turn agent mistakes into permanent improvements.

[Diagram: Delegate task → agent delivers → you review output → update AGENTS.md → continuous improvement.]

The three-step fix

When the agent produces wrong output, follow this exact process:

1. Identify the category

Was it a convention violation? A missing constraint? A wrong assumption about architecture?

Agent used console.log instead of the structured logger. → Convention gap.

2. Add the rule to AGENTS.md

Write the rule as a MUST/MUST NOT statement. Be specific enough that the agent can't misinterpret it.

Added: "Logging: MUST use src/lib/logger.ts. MUST NOT use console.log/warn/error anywhere."

3. Verify on the next task

Watch whether the same mistake recurs. If it does, your rule isn't specific enough — rewrite it.

Next task: agent uses logger correctly. Rule is working. Move on.
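The rule added in step 2 assumes a structured logger exists. A minimal sketch of what src/lib/logger.ts might contain (hypothetical implementation — a real one would also write the line to stdout or a transport): one JSON object per log call, so aggregators can filter on fields instead of parsing free text.

```typescript
type Level = "info" | "warn" | "error";

// Build a single structured log line: level, message, timestamp, plus
// any extra fields merged in. Returned as a string for easy testing.
export function formatLog(
  level: Level,
  msg: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({ level, msg, ts: new Date().toISOString(), ...fields });
}
```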

Common mistakes and their AGENTS.md fixes

Here are the most common agent mistakes teams encounter at Levels 1-2, and the AGENTS.md rules that fix them:

Agent mistake                   | AGENTS.md rule to add
Uses wrong import style         | "Use ESM imports. No require(). No default exports."
Puts files in wrong directory   | "Tests go in __tests__/ next to the source file, not in a top-level test/ dir."
Adds unnecessary dependencies   | "MUST NOT add packages without explicit permission. Use built-in Node APIs first."
Over-engineers simple tasks     | "Keep it simple. No abstractions for one-time operations. Three similar lines > a premature helper."
Ignores error handling patterns | "All errors MUST be AppError instances. Wrap external errors: AppError.from(err)"
Generates verbose comments      | "Only add comments where logic isn't self-evident. Never add JSDoc to private methods."
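The AppError convention referenced above is easy to state as a rule but easier for an agent to follow when it can see the pattern. A hypothetical sketch of what such a class might look like — the field and method names beyond AppError.from are assumptions for illustration:

```typescript
export class AppError extends Error {
  readonly code: string;
  readonly origin?: unknown; // the original error we wrapped, if any

  constructor(message: string, code = "INTERNAL", origin?: unknown) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.origin = origin;
  }

  // Wrap anything thrown by external code so callers only ever see AppError.
  static from(err: unknown, code = "EXTERNAL"): AppError {
    if (err instanceof AppError) return err;
    return new AppError(err instanceof Error ? err.message : String(err), code, err);
  }
}
```

With this in place, the rule "wrap external errors: AppError.from(err)" becomes mechanical for the agent to apply in every catch block.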

Step 6: Know When You're Ready for Level 3

Level 3 is where you stop pair-programming and start managing agent output. You're ready when these signals are consistent:

Agent output is predictable

You can predict the structure of what the agent will produce before it runs. Your AGENTS.md is specific enough that surprise is rare.

Review time is under 5 minutes

You spend more time writing the prompt than reviewing the output. The output consistently matches your expectations.

AGENTS.md is stable

You're no longer adding new rules every session. The document has settled. Agent mistakes are rare, not systemic.

You trust the permission model

You've moved to "Accept Edits" mode and feel comfortable. The hooks catch the edge cases you care about.

When you're seeing all four signals, it's time for Part 3: Spec-Driven Development — where you write structured specifications and stop reviewing code entirely. The agent works from the spec. You review the results, not the diff.

Your Checklist

Bookmark this. Work through it over the next week. Each item builds on the last.

1

Install and configure Claude Code

Run /init, generate your starter CLAUDE.md, set Default permission mode.

2

Add a formatting hook

PostToolUse hook that auto-formats after every Edit/Write. One less thing to review.

3

Write your AGENTS.md

All five sections: project, structure, commands, conventions, architecture rules. Be precise.

4

Delegate 5 small tasks

Start with types, tests, and utility functions. Review the output. Note what went wrong.

5

Update AGENTS.md from mistakes

Every mistake becomes a MUST/MUST NOT rule. Run the feedback loop 3 times.

6

Delegate a medium task

A route + service method. Two files. One clear responsibility. Review against the prompt.

7

Assess your readiness

Check the four signals. If they're consistent, you're ready for Part 3.

Up Next

Part 3: Spec-Driven Development (coming soon)