Without automations, state files, and verifiers, you are missing the biggest shift in AI development. The leverage point has moved from typing prompts to designing systems that prompt.
This is the 14-step roadmap to make that shift, sourced from Anthropic engineering docs, Addy Osmani’s deep dives on loop engineering, and recent measurement studies.
We can break this journey into three distinct tiers: figuring out if you actually need a loop, mastering the five essential building blocks, and building the smallest viable loop that works without draining your wallet.
14 steps. 3 tiers. Stop prompting. Start designing.
In this guide, you’ll discover:
For the last two years, getting value out of a coding agent followed a predictable pattern: write a prompt, share the context, review the output, and write the next prompt. The agent was a tool, and you held it the entire time. That phase is ending.
Loop engineering is the practice of building a small system that finds the work, hands it to the agent, checks the result, records the outcome, and decides the next move — entirely on its own. You design the system once, and the system prompts the agent from then on.
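To make the definition concrete, here is a minimal sketch of those five moves in Python. Everything in it is an assumption for illustration: `run_loop`, `LoopResult`, and the three callables are hypothetical names, and the "agent" is a stand-in lambda, not a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    """Outcome log for one loop run (hypothetical structure)."""
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def run_loop(
    find_work: Callable[[], Optional[str]],
    run_agent: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Find work, hand it to the agent, check, record, decide the next move."""
    result = LoopResult()
    for _ in range(max_iterations):        # hard stop: the loop cannot run forever
        task = find_work()                 # find the work
        if task is None:                   # decide the next move: nothing left, stop
            break
        output = run_agent(task)           # hand it to the agent
        if check(task, output):            # check the result
            result.completed.append(task)  # record the outcome
        else:
            result.failed.append(task)
    return result


# Usage with stand-in callables; no real agent is called here.
queue = ["fix-lint", "fix-flaky-test"]
outcome = run_loop(
    find_work=lambda: queue.pop(0) if queue else None,
    run_agent=lambda task: f"patch for {task}",
    check=lambda task, output: task in output,
)
print(outcome)
```

The point of injecting the callables is that the loop logic stays the same whether the work comes from a CI feed or an issue tracker; only the three functions change.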
Loops earn their keep under very specific circumstances. Miss just one condition, and your loop will cost far more than it returns. To keep it entirely honest — and to bypass the typical overhyped tech threads — a loop only makes sense if it passes these four tests:
The economics of loop engineering are not universal. The builders calling loops “obvious” typically enjoy unmetered enterprise API access. The people calling it “reckless” are usually solo developers on a $20 consumer plan trying to run heavy verification loops while dodging surprise invoices.
While the 4-condition test handles your high-level strategy, this tactical checklist is what you run on a specific task before turning it over to a loop. If you can’t check every box, keep it as a manual prompt.
The task occurs at least once a week.
A test, type check, build, or linter can instantly reject bad output.
The agent has a live environment to run and test its changes.
The loop has an absolute hard stop (token cap, timeout, or iteration limit).
A human approval gate exists before any merge or production deployment.
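The checklist above can be encoded so a task is rejected before it ever reaches a loop. This is a rough sketch, not a standard tool: `LoopChecklist`, `unmet_conditions`, and `ready_for_loop` are hypothetical names, and the field names are simply the five boxes restated.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class LoopChecklist:
    """One boolean per box in the tactical checklist."""
    runs_at_least_weekly: bool
    has_automated_rejection: bool   # test, type check, build, or linter
    has_live_environment: bool
    has_hard_stop: bool             # token cap, timeout, or iteration limit
    has_human_approval_gate: bool


def unmet_conditions(checklist: LoopChecklist) -> List[str]:
    """Return the names of every box that is still unchecked."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def ready_for_loop(checklist: LoopChecklist) -> bool:
    """A task qualifies only when every box is checked."""
    return not unmet_conditions(checklist)


# Usage: a task with no hard stop stays a manual prompt.
candidate = LoopChecklist(
    runs_at_least_weekly=True,
    has_automated_rejection=True,
    has_live_environment=True,
    has_hard_stop=False,
    has_human_approval_gate=True,
)
print(ready_for_loop(candidate), unmet_conditions(candidate))
```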
Automations turn a single manual run into an ongoing system. They trigger based on a schedule, a repository event, or a specific condition. They act as the heartbeat of your loop; everything else hangs off them.
Modern developer environments approach this through specific primitives:
> /loop 30m /goal All tests in test/auth pass and lint is clean.
Scan src/auth for new failures, propose fixes in claude/auth-fixes,
open draft PR when goal holds.
▲ Claude Cron Create(*/30 * * * * : auth quality loop)
Stop condition: tests pass + lint clean (verified by independent checker)
✓ Scheduled. Will continue past intermediate completions until goal condition is met.
The moment you run multiple agents simultaneously, files collide. Two agents writing to the same file cause the same merge conflicts as two engineers committing to the same lines without syncing.
The solution is a Git Worktree — a separate working directory on its own isolated branch that shares the same repository history.
Isolating agent execution environments using Git Worktrees to prevent merge conflicts.
By utilizing isolation flags (like --worktree), subagents get a clean checkout that automatically removes itself after execution. Worktrees eliminate mechanical file collisions, but remember: your own review bandwidth remains the ultimate ceiling on how many parallel loops you should run.
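If your tooling does not manage worktrees for you, the same isolation can be approximated by hand with plain `git worktree add` and `git worktree remove`. The sketch below assumes `git` is on the PATH; the helper names (`worktree_add_cmd`, `worktree_remove_cmd`, `isolated_worktree`) and the example paths are hypothetical.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, List


def worktree_add_cmd(repo: Path, path: Path, branch: str) -> List[str]:
    """Build the command that creates a new branch in its own directory."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def worktree_remove_cmd(repo: Path, path: Path) -> List[str]:
    """Build the command that deletes the worktree directory."""
    return ["git", "-C", str(repo), "worktree", "remove", "--force", str(path)]


@contextmanager
def isolated_worktree(
    repo: Path,
    path: Path,
    branch: str,
    run: Callable[..., object] = subprocess.run,
) -> Iterator[Path]:
    """Create a worktree, yield its path, and always clean it up afterwards."""
    run(worktree_add_cmd(repo, path, branch), check=True)
    try:
        yield path
    finally:
        # Removes the directory only; the branch itself is kept for review.
        run(worktree_remove_cmd(repo, path), check=True)


# Usage: print the command instead of running it against a real repository.
print(worktree_add_cmd(Path("."), Path("../agent-auth"), "claude/auth-fixes"))
```

Passing `run` as a parameter keeps the helper testable without touching a real repository; in normal use you would leave the default.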
A Skill prevents your loop from acting like a goldfish that has to re-learn your entire project context every single session. Skills are structured as dedicated directories containing a SKILL.md file alongside necessary helper scripts, references, and assets.
Without skills, a loop wastes immense token volume re-deriving your architecture rules from scratch on every single cycle. With skills, intent compounds. Your team’s architectural conventions, build steps, and historical “don’t do this because of that outage” notes are written once on the outside and read by the agent on every run.
# CI Triage Skill
## Classification Rules
- env: Missing secrets or unprovisioned infrastructure. -> Escalate to human.
- flake: Test passes on a clean retry without code changes. -> File a report.
- bug: Deterministic failure tied directly to a recent commit. -> Draft a fix.
## Fix Patterns
- Auth tests -> Verify src/auth/middleware first.
- Database tests -> Check if recent migrations were applied in the CI env.
## Never Do
- Never disable a failing test to pass the build; always escalate.
- Never touch code inside src/payments/ or src/billing/.
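The rules in this skill are simple enough to enforce in code as well as in prose, which gives the loop a guard that does not depend on the agent reading carefully. A minimal sketch follows; the `Failure` fields, the function names, and the choice to escalate unclassified failures are all assumptions layered on top of the skill file above.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Paths listed under "Never Do" in the skill file.
FORBIDDEN_PREFIXES = ("src/payments/", "src/billing/")

# Action per classification; "unknown" escalating is an assumption, not in the skill.
ACTIONS = {
    "env": "escalate to human",
    "flake": "file a report",
    "bug": "draft a fix",
    "unknown": "escalate to human",
}


@dataclass(frozen=True)
class Failure:
    """Hypothetical summary of one CI failure."""
    missing_secret_or_infra: bool
    passes_on_clean_retry: bool
    tied_to_recent_commit: bool


def classify(failure: Failure) -> str:
    """Apply the classification rules in the order the skill lists them."""
    if failure.missing_secret_or_infra:
        return "env"
    if failure.passes_on_clean_retry:
        return "flake"
    if failure.tied_to_recent_commit:
        return "bug"
    return "unknown"


def forbidden_paths(changed: Iterable[str]) -> List[str]:
    """Return every changed path that falls inside a protected directory."""
    return [p for p in changed if p.startswith(FORBIDDEN_PREFIXES)]


# Usage: a deterministic failure whose proposed fix strays into billing code.
kind = classify(Failure(False, False, True))
blocked = forbidden_paths(["src/auth/middleware.py", "src/billing/invoice.py"])
print(kind, ACTIONS[kind], blocked)
```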
A loop that can only see your local filesystem is severely limited. Connectors, built on the Model Context Protocol (MCP), give your agent the ability to read your issue trackers, query live databases, hit staging APIs, and drop notifications into communication channels.
Connectors are the reason an agent moves from saying “here is the fix” to actively opening the PR, linking the tracking ticket, and alerting the team over Slack once the build turns green.
The most critical structural pattern in loop engineering is separating the agent that writes code from the agent that verifies it. As Addy Osmani points out, the model that wrote the code is always “way too nice grading its own homework.” This maps directly to the Evaluator-Optimizer pattern where one model generates the code, a completely separate sub-agent critiques it against the specification, and the cycle repeats.
The Evaluator-Optimizer pattern: dividing labor between a Generator and a Verifier.
Modern setups allow you to declare teams of subagents via local configuration files. You can configure your explorer to be a fast, cost-efficient model, while assigning your security and verification checker to a high-reasoning model running on maximum effort. Sub-agents burn more tokens since each one performs its own processing, but a verifier you actually trust is the only reason you can walk away.
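Stripped of any particular vendor's configuration, the generate-critique-repeat cycle looks roughly like this. It is a sketch under stated assumptions: `evaluator_optimizer` is a hypothetical function, and the generator and evaluator are stand-in lambdas where real setups would call two separate agents.

```python
from typing import Callable, Optional, Tuple


def evaluator_optimizer(
    spec: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Alternate between a generator and an independent evaluator.

    Returns (approved_candidate, rounds_used), or (None, max_rounds)
    if the evaluator never approves within the round limit.
    """
    feedback = ""
    for round_number in range(1, max_rounds + 1):
        candidate = generate(spec, feedback)           # maker proposes
        approved, feedback = evaluate(spec, candidate)  # checker critiques
        if approved:
            return candidate, round_number
    return None, max_rounds


# Usage with stand-ins: the evaluator insists on tests before approving.
result = evaluator_optimizer(
    spec="add token refresh",
    generate=lambda spec, fb: "draft with tests" if fb else "draft",
    evaluate=lambda spec, cand: (True, "") if "tests" in cand else (False, "add tests"),
)
print(result)
```

The evaluator only ever sees the specification and the candidate, never the generator's reasoning, which is what keeps it from grading its own homework.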
This is a component that sounds almost too simple to matter, yet it forms the structural backbone of every production loop. Whether it’s a Markdown file, a Linear board, or a JSON blob, you must maintain a persistent record of state outside of the active conversation window.
LLM sessions carry no memory from one run to the next and lose context within long ones. A loop without a persistent state file restarts its entire mental model from zero on every run; a loop with a state file seamlessly resumes where it left off.
{
  "loop_id": "ci-triage",
  "last_run": "2026-06-15T03:30:00Z",
  "status": {
    "failures_classified": 7,
    "fixes_drafted": 3,
    "escalated_to_humans": 4
  },
  "in_progress": [
    {"branch": "claude/fix-auth-refresh", "status": "awaiting_ci"}
  ],
  "lessons_learned": [
    "PowerShell runner hits TLS issues on Windows; always fallback to bash.",
    "E2E checkouts require the stripe webhook secret; skip if missing."
  ]
}
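Reading and writing a file like this takes only a few lines. The sketch below is one way to do it, assuming a JSON file on local disk; `load_state` and `save_state` are hypothetical helper names, and the write goes through a temporary file so a crash mid-write cannot leave a half-written state file behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path, loop_id: str) -> Dict[str, Any]:
    """Load the previous run's state, or start fresh on the first run."""
    if not path.exists():
        return {
            "loop_id": loop_id,
            "last_run": None,
            "in_progress": [],
            "lessons_learned": [],
        }
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)  # atomic rename on the same filesystem


# Usage: record a lesson in one "run" and read it back in the next.
with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "ci-triage.state.json"
    state = load_state(state_path, "ci-triage")
    state["lessons_learned"].append("E2E checkouts require the stripe webhook secret.")
    save_state(state_path, state)
    resumed = load_state(state_path, "ci-triage")
    print(resumed["lessons_learned"])
```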
If your target task successfully passed the initial 4-condition test, your goal is to build the absolute smallest functional loop possible. Avoid complex multi-agent swarms out of the gate. Stick to these four basic pillars:
Anatomy of a Minimum Viable Loop (MVL). A linear, four-part architectural pipeline: scheduled Automation, targeted Skill, a JSON-based State File for continuity, and an automated Gate for strict quality verification.
Execution Order Rules: Always make sure a manual run is 100% reliable first. Document that process into a single static Skill. Wrap that skill into a functional Loop execution. Only then do you schedule it as an automated background process. Skipping straight to scheduling is the number one reason loops fail in production.
Engineer Geoffrey Huntley documented this specific failure mode and named it after Ralph Wiggum (the cartoon character from The Simpsons known for being completely oblivious to his own mistakes). In this scenario, an agent prematurely emits a completion token before the job is genuinely complete, causing the loop to exit on a half-done task while claiming everything is fine. Without objective, rigid gates, loops will routinely fail quietly while continuing to drain your budget.
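One way to guard against this is to treat the agent's "done" as a request to run the gate, never as the verdict. The sketch below assumes the gate is a command whose exit code signals pass or fail (a test runner, linter, or build); `gate_passes` and `is_done` are hypothetical names, and the demo commands are trivial Python one-liners standing in for a real test suite.

```python
import subprocess
import sys
from typing import Sequence


def gate_passes(command: Sequence[str], timeout_seconds: float = 600.0) -> bool:
    """Run an objective check and trust only its exit code."""
    try:
        completed = subprocess.run(
            list(command), capture_output=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # a gate that hangs or cannot start counts as a failure
    return completed.returncode == 0


def is_done(agent_claims_done: bool, gate_command: Sequence[str]) -> bool:
    """The agent's claim triggers the gate; only the gate can end the loop."""
    return agent_claims_done and gate_passes(gate_command)


# Usage: stand-in gates that always pass or always fail.
passing_gate = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing_gate = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(is_done(True, passing_gate), is_done(True, failing_gate))
```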
Anatomy of a silent failure where superficial success metrics mask underlying system crashes.
This non-technical failure mode grows more dangerous as your automated loops get better, not worse. Addy Osmani highlights two major psychological risks to watch out for:
An unattended automation loop running with repository access is a live, unattended attack surface. You must architect your loops to explicitly defend against these clear vectors:
Did you run the 4-Condition Test? (Step 02)
Is there an objective, automated gate (test/linter/build) instead of just an LLM "review"?
Are your maker and checker tasks split across completely separate agents?
Does the loop write its progress to a persistent state file?
Have you configured a strict, unbypassable token budget cap?
Is the loop blocked from touching subjective, architectural, or payment code?
Are you actively reading every line of the diffs before hitting merge?
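For the budget cap item in this checklist, "unbypassable" in practice means the cap lives in the code that calls the agent, not in the prompt. Here is a minimal sketch covering all three hard stops (token cap, iteration limit, timeout); the `Budget` class and `BudgetExceeded` exception are hypothetical, and the token counts are assumed to come from whatever usage figures your provider reports.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when the loop hits any of its hard stops."""


@dataclass
class Budget:
    max_tokens: int
    max_iterations: int
    max_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    tokens_used: int = 0
    iterations: int = 0
    started_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def charge(self, tokens: int) -> None:
        """Record one agent call and stop the loop if any cap is exceeded."""
        self.tokens_used += tokens
        self.iterations += 1
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration limit reached")
        if self.clock() - self.started_at > self.max_seconds:
            raise BudgetExceeded("timeout reached")


# Usage: the second call pushes the loop over its token cap.
budget = Budget(max_tokens=100, max_iterations=10, max_seconds=60.0)
budget.charge(60)
try:
    budget.charge(60)
except BudgetExceeded as stop:
    print(f"loop halted: {stop}")
```

Calling `charge` after every agent response means a runaway loop fails loudly at the cap instead of quietly at the invoice.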
For the past two years, the ultimate leverage in working with coding agents lived directly at the prompt level. Success was determined by who wrote the best instructions, provided the cleanest context window, and generated the best one-shot output.
That phase is officially over. LLMs have become sophisticated enough that the true engineering leverage point has moved up a level: into the design of the system that orchestrates them. Your value as an engineer now lies in defining what they work on, when they trigger, how they log state, and what automated gates validate their success.
But remember the core truth of this shift: loop engineering isn’t for every developer, and it isn’t for every codebase. Until your target task repeats regularly, your validation is fully automated, your budget can absorb the computational overhead, and your agent has access to raw runtime tools, stay in the chair.
Miss just one condition, and a loop will cost you more than it ever saves. If you pass the test, build small, build structured, and maintain the human gate. Build the loop. Stay the engineer.
Loop Engineering: The 14-Step Roadmap from Prompter to Loop Designer was originally published in Coinmonks on Medium.


