ralph-loop – Deep Dive Guide
ralph-loop is Anthropic's official implementation of the Ralph Wiggum pattern as a Claude Code plugin. It uses a stop hook to re-feed the same prompt whenever Claude tries to end the session — no external script, no second terminal, no dependencies. One plugin install and you have an autonomous loop.
Repo: anthropics/claude-plugins-official → ralph-loop
1. How the stop hook works
Most Ralph variants run an external Bash/Bun/Go process that calls the agent in a loop. ralph-loop is different: the loop lives inside the Claude Code session itself.
┌─────────────────────────────────────────────────────────┐
│ Claude Code session │
│ │
│ 1. /ralph-loop "…" starts the session │
│ 2. Claude works on the prompt │
│ 3. Claude tries to stop the session │
│ └─ stop hook (stop-hook.sh) intercepts the exit │
│ 4. Hook injects the SAME prompt again │
│ 5. Claude reads its own git diff + log → continues │
│ 6. Repeat until: │
│ • Output contains the completion promise │
│ • OR max-iterations is reached │
│ • OR /cancel-ralph is called │
└─────────────────────────────────────────────────────────┘
Because state lives in the filesystem and git history, each restart is not "starting over" — Claude reads what it already built and picks up where it left off.
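The control flow in the diagram can be modelled in a few lines of Python. This is only an illustrative sketch, not the plugin's code: the real loop is driven by stop-hook.sh inside the Claude Code session. The names `run_loop`, `agent`, and `is_cancelled` are hypothetical stand-ins for the session, one Claude turn, and /cancel-ralph.

```python
from typing import Callable


def run_loop(
    prompt: str,
    agent: Callable[[str, int], str],
    completion_promise: str,
    max_iterations: int,
    is_cancelled: Callable[[], bool] = lambda: False,
) -> tuple[str, int]:
    """Re-feed the same prompt until one of the three exit conditions holds.

    Returns (reason, iterations_run).
    """
    for iteration in range(1, max_iterations + 1):
        if is_cancelled():                    # stands in for /cancel-ralph
            return "cancelled", iteration - 1
        output = agent(prompt, iteration)     # the SAME prompt every time
        if completion_promise in output:      # promise seen: let the session stop
            return "promise", iteration
    return "max-iterations", max_iterations


# Hypothetical agent that "finishes" on its third pass.
def fake_agent(prompt: str, iteration: int) -> str:
    return "<promise>COMPLETE</promise>" if iteration == 3 else "still working"


print(run_loop("Build X", fake_agent, "COMPLETE", max_iterations=50))
```

With the fake agent above, the loop ends on the third pass because the promise appears. Lowering `max_iterations` to 2 would end it on the iteration limit instead.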
2. Plugin structure
- README.md
3. Installation & basic usage
# Install the plugin
/plugin install ralph-loop
# Start a loop
/ralph-loop "<your prompt>" \
--completion-promise "COMPLETE" \
--max-iterations 50
# Cancel the loop at any time
/cancel-ralph
Required parameters
| Parameter | What it does |
|---|---|
| `<prompt>` | The task — runs again every iteration |
| `--completion-promise` | String Claude must output to stop the loop |
| `--max-iterations` | Hard limit on how many times to re-run (always set this) |
Optional parameters
| Parameter | Default | What it does |
|---|---|---|
| `--branch` | current | Git branch for the work |
| `--no-commit` | false | Disable auto-commit per iteration |
| `--model` | claude-sonnet-4-5 | Override the model |
4. Windows pitfall
On Windows, the stop hook can fail with `wsl: Unknown key 'automount.crossDistro'` or `execvpe(/bin/bash) failed`.
Fix: Edit ~/.claude/plugins/cache/.../hooks/hooks.json and set the hook command explicitly to Git Bash:
"command": "\"C:/Program Files/Git/bin/bash.exe\" ${CLAUDE_PLUGIN_ROOT}/hooks/stop-hook.sh"
Use Git/bin/bash.exe (with PATH wrappers), not Git/usr/bin/bash.exe (raw MinGW).
5. Prompt-writing handbook
This is the core skill. A ralph-loop prompt runs dozens of times. A vague prompt produces dozens of wrong iterations. A precise prompt produces a working result, usually in 5–15 iterations.
The anatomy of a good ralph-loop prompt
[1] CONTEXT — what exists already, what stack, what constraints
[2] TASK — what exactly must be built/changed
[3] CRITERIA — specific, testable acceptance conditions
[4] VALIDATION — the command(s) that prove it works
[5] MEMORY — remind Claude to read AGENTS.md / progress.txt
[6] COMPLETION — when to output the promise
Not every prompt needs all six, but the more complex the task, the more you need all of them.
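One way to keep any of the six parts from being forgotten is to assemble the prompt from named fields. The sketch below is a hypothetical helper (`RalphPrompt` is not part of the plugin); it only joins the text in the order listed above.

```python
from dataclasses import dataclass


@dataclass
class RalphPrompt:
    """Hypothetical container for the six prompt parts; not part of ralph-loop."""
    context: str            # [1] stack, existing code, constraints
    task: str               # [2] what must be built or changed
    criteria: list[str]     # [3] testable acceptance conditions
    validation: str         # [4] command that proves the work
    memory: str = "Read AGENTS.md before each iteration."   # [5]
    promise: str = "COMPLETE"                                # [6]

    def render(self) -> str:
        # Number the criteria so each one can be referred to individually.
        numbered = [f"{i}. {c}" for i, c in enumerate(self.criteria, start=1)]
        lines = [
            self.context,
            self.task,
            "Acceptance criteria:",
            *numbered,
            f"Validation command: `{self.validation}`",
            self.memory,
            f"When all criteria are met → output <promise>{self.promise}</promise>.",
        ]
        return "\n".join(lines)


prompt = RalphPrompt(
    context="Stack: Laravel 11, PHPUnit.",
    task="Extract business logic from UserController into UserService.",
    criteria=[
        "Each controller method must be ≤ 10 lines.",
        "No direct Eloquent calls in the controller.",
    ],
    validation="php artisan test --filter UserTest",
)
print(prompt.render())
```

Because `criteria` and `validation` have no defaults, leaving either out raises an error before a loop is ever started.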
6. Bad prompts vs. good prompts
6.1 REST API
Build an API for users.
Output COMPLETE when done.
Why it fails:
No endpoints specified. No authentication. No validation. No test requirement. Claude will produce something — but every iteration will produce something different, and "done" is undefined.
Stack: Laravel 11, Sanctum, PostgreSQL, PHPUnit.
Build a User management REST API:
POST /api/users — create user (name, email, password; unique email)
GET /api/users/{id} — show user (auth required)
PUT /api/users/{id} — update user (auth required, own user only)
DELETE /api/users/{id} — soft-delete (auth required, own user only)
Acceptance criteria:
1. All routes respond with the correct HTTP status codes (201, 200, 403, 404).
2. Validation errors return 422 with an "errors" key.
3. `php artisan test --filter UserApiTest --coverage` passes and reports ≥ 90 % coverage on the controller.
4. An OpenAPI doc is generated at /api/documentation.
Read AGENTS.md for existing conventions before starting each iteration.
When all criteria are met → output <promise>COMPLETE</promise>.
6.2 Laravel feature
Add authentication to my Laravel app.
COMPLETE when done.
Why it fails:
"Authentication" could mean anything: Sanctum, Passport, Breeze, Fortify, custom JWT. No routes, no tests, no session vs. token distinction defined.
Stack: Laravel 11, Sanctum, MySQL, Pest.
Add token-based API authentication:
POST /api/auth/register — name, email, password; returns bearer token
POST /api/auth/login — email, password; returns bearer token
POST /api/auth/logout — invalidates the current token (auth:sanctum)
GET /api/auth/me — returns the authenticated user (auth:sanctum)
Rules:
- Email must be unique; duplicate registration returns 409.
- Password: min 8 chars, at least one uppercase, one digit.
- Tokens expire after 24 h (set in Sanctum config).
- Do NOT touch existing routes or migrations.
Tests: `php artisan test --filter AuthTest` must pass (create the test if it doesn't exist).
Read AGENTS.md before each iteration for project conventions.
When all tests are green → output <promise>COMPLETE</promise>.
6.3 Laravel refactoring
Refactor the UserController to be cleaner.
Output COMPLETE when done.
Why it fails:
"Cleaner" is subjective. The agent will keep refactoring indefinitely, undoing its own work in the next iteration.
Refactor UserController (app/Http/Controllers/UserController.php) according to these rules:
1. Extract business logic into UserService (app/Services/UserService.php).
2. Extract validation rules into UserRequest (app/Http/Requests/UserRequest.php).
3. Each controller method must be ≤ 10 lines.
4. No direct Eloquent calls in the controller — only via UserService.
5. All existing tests in tests/Feature/UserTest.php must still pass.
Do NOT change the public method signatures of UserController.
Do NOT add new routes.
Validation command: `php artisan test --filter UserTest`
When all tests pass and the controller meets all 5 rules → output <promise>COMPLETE</promise>.
6.4 TypeScript / Node.js API
Create a todo API in TypeScript.
Say DONE when finished.
Why it fails:
Framework? Validation? Database? Tests? Error handling? Undefined on all counts.
Stack: Hono, Zod, Drizzle ORM, SQLite (dev), Vitest.
Build a Todo API:
POST /todos — { title: string (min 3), done?: boolean }
GET /todos — list all todos, optional ?done=true|false filter
GET /todos/:id — single todo or 404
PATCH /todos/:id — partial update { title?, done? }
DELETE /todos/:id — 204 or 404
Rules:
- All inputs validated with Zod; invalid → 400 with { error, details }.
- Timestamps: created_at, updated_at (ISO 8601 in responses).
- Drizzle migration in /drizzle/migrations/.
- Tests: `pnpm vitest run --coverage` must pass; ≥ 80 % branch coverage on the route handlers.
Read AGENTS.md before starting each iteration.
When all tests pass → output <promise>COMPLETE</promise>.
6.5 Business goal / performance
Make the app faster.
Output COMPLETE when it's fast.
Why it fails:
"Faster" is not measurable. The loop will never know when it's done.
Performance target: reduce the response time of GET /api/products (with 10,000 rows) from the current ~1,200 ms to < 200 ms.
Context:
- Stack: Laravel 11, PostgreSQL, Redis available.
- Current query: Product::with('category','tags')->paginate(20) — no indexes.
Allowed changes:
1. Add DB indexes via a new migration.
2. Implement Redis caching with a 5-minute TTL on the query result.
3. Replace eager loading with a targeted select() if needed.
Do NOT change the response shape or any existing tests.
Validation:
Run: `php artisan tinker --execute="echo app(\App\Services\ProductBenchmark::class)->measure();"`
It must output a number below 200.
When the benchmark returns < 200 → output <promise>COMPLETE</promise>.
6.6 Hosting / infrastructure / DevOps
Add Docker to this project.
COMPLETE when done.
Why it fails:
Dev Docker or production Docker? What services? What ports? What environment variables?
Add Docker Compose for local development of this Laravel 11 app.
Services required:
app — PHP 8.3-FPM, listens internally on port 9000
nginx — serves app on host port 8080, config in docker/nginx/default.conf
db — Postgres 16, user=laravel, password=secret, db=laravel, port 5432
redis — Redis 7, port 6379
Files to create:
Dockerfile — php:8.3-fpm-alpine, composer install, APP_ENV=local
docker-compose.yml — all four services with named volumes
docker/nginx/default.conf
Rules:
- `docker compose up -d` must start all services with no errors.
- `docker compose exec app php artisan migrate --force` must succeed.
- No existing source files may be modified.
- .env.example must get entries for DB_HOST=db, REDIS_HOST=redis.
Validation: `docker compose up -d && sleep 5 && curl -s http://localhost:8080 | grep -q "Laravel" && echo OK`
When validation prints OK → output <promise>COMPLETE</promise>.
6.7 AI agent / prompt engineering
Build an AI agent that can answer questions about our docs.
COMPLETE when done.
Why it fails:
What LLM? What retrieval? What tools? No evaluation criteria. No test for "it works."
Build a RAG agent for our Markdown documentation (docs/ folder).
Stack: Python 3.12, LangChain 0.3, OpenAI (text-embedding-3-small for embeddings + gpt-4o for answers), Chroma (local).
Steps:
1. Chunk all .mdx/.md files in docs/ recursively (chunk_size=800, overlap=80).
2. Embed via text-embedding-3-small, store in chroma_db/ (local persist).
3. On query: retrieve top-5 chunks, pass to gpt-4o as context, return answer + sources.
4. CLI: `python agent.py "What is ralph-loop?"` → answer printed, sources listed.
Acceptance criteria:
1. `python agent.py "What is the completion promise in ralph-loop?"` returns an answer that contains the word "COMPLETE".
2. `python agent.py "What is snarktank/ralph?"` returns an answer that mentions "Bash".
3. `pytest tests/test_agent.py -v` passes (write tests if they don't exist).
Read AGENTS.md before each iteration.
When all three criteria pass → output <promise>COMPLETE</promise>.
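Step 1 of this prompt leaves open whether chunk_size and overlap are measured in characters or tokens, which is worth settling before the loop starts. A minimal sketch of the overlap arithmetic, assuming characters and plain Python rather than any LangChain splitter (`chunk_text` is a hypothetical name):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap          # each window starts 720 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # the last window already reached the end
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
```

Each chunk repeats the last 80 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.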
6.8 Frontend / React component
Build a data table component in React.
COMPLETE when it looks good.
Why it fails:
"Looks good" is not a test. Ralph can't run a visual check. Every iteration will produce a different UI and the loop will never end.
Stack: React 19, TypeScript, TanStack Table v8, Tailwind CSS 4, Vitest + Testing Library.
Build a <DataTable<T>> component in src/components/DataTable.tsx:
Props:
data: T[]
columns: ColumnDef<T>[]
pageSize?: number (default 20)
onRowClick?: (row: T) => void
Features:
- Client-side pagination (pageSize rows per page, prev/next buttons)
- Click on a column header → sort ascending; second click → descending
- Search input above the table filters all string columns (case-insensitive)
- Empty state: "No results found" centered in the table body
Tests in src/components/DataTable.test.tsx:
1. Renders the correct number of rows per page.
2. Sort on "name" column: ascending order matches Array.sort().
3. Search "alice" with fixture data returns only the Alice row.
4. onRowClick is called with the correct row when clicked.
`pnpm vitest run DataTable` must pass all 4 tests.
TypeScript must compile without errors: `pnpm tsc --noEmit`.
Read AGENTS.md before each iteration.
When tests + typecheck pass → output <promise>COMPLETE</promise>.
6.9 Database migration
Add a payments table.
COMPLETE when done.
Why it fails:
What columns? What foreign keys? What indexes? What validation in the model?
Stack: Laravel 11, PostgreSQL.
Create a payments table via a new migration:
Columns:
id uuid, primary key
order_id uuid, foreign key → orders.id (cascade delete)
amount decimal(10,2), not null
currency char(3), not null, default 'EUR'
status enum('pending','completed','failed','refunded'), default 'pending'
provider varchar(50), not null (e.g. "stripe", "paypal")
provider_ref varchar(255), nullable, unique
paid_at timestamp, nullable
created_at / updated_at timestamps
Indexes:
- payments(order_id)
- (status, created_at) composite
- provider_ref unique
Model: app/Models/Payment.php
- $guarded = []
- casts: amount → decimal:2, status → enum, paid_at → datetime
- Relationship: belongsTo(Order::class)
- Scope: scopePending(), scopeCompleted()
Tests: `php artisan test --filter PaymentTest` must pass (create the test).
`php artisan migrate --pretend` must run without errors.
When migration + tests pass → output <promise>COMPLETE</promise>.
6.10 Goal with no code – planning / research
Write a technical spec for our new feature.
COMPLETE when it's done.
Why it fails:
No length, no required sections, no definition of "done." The agent will output something arbitrary on the first iteration and output COMPLETE immediately.
Write a technical specification for a "Waitlist" feature in SPEC.md.
Required sections (all must be present and non-empty):
## Overview — 2–3 sentences, the user problem
## User Stories — ≥ 3 acceptance-criteria-style stories
## Data Model — table definitions with column names and types
## API Endpoints — HTTP method, path, request body, response
## Edge Cases — ≥ 5 specific edge cases with resolution
## Out of Scope — explicit list of what is NOT included
## Open Questions — ≥ 2 unresolved decisions that need a human call
Rules:
- Only describe the Waitlist feature; do not spec any other feature.
- Do not write any code.
- Total length: 400–800 words.
Validation: `wc -w < SPEC.md` must print a number between 400 and 800.
All 7 sections must be present: `grep -c "^## " SPEC.md` must output 7.
When both validation commands pass → output <promise>COMPLETE</promise>.
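The word-count and heading-count checks can also be combined into one script that names what is missing. This is a sketch under assumptions: `check_spec` and `REQUIRED_SECTIONS` are hypothetical names, the section titles are copied from the prompt above, and words are counted by splitting on whitespace, roughly as `wc -w` does.

```python
import re

# Section titles taken from the prompt above.
REQUIRED_SECTIONS = [
    "Overview", "User Stories", "Data Model", "API Endpoints",
    "Edge Cases", "Out of Scope", "Open Questions",
]


def check_spec(text: str, min_words: int = 400, max_words: int = 800) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    word_count = len(text.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} outside {min_words}-{max_words}")
    # Collect the titles of all level-2 headings ("## Title").
    headings = re.findall(r"^## (.+?)\s*$", text, flags=re.MULTILINE)
    for name in REQUIRED_SECTIONS:
        if name not in headings:
            problems.append(f"missing section: {name}")
    if len(headings) != len(REQUIRED_SECTIONS):
        problems.append(f"expected {len(REQUIRED_SECTIONS)} sections, found {len(headings)}")
    return problems


print(check_spec("## Overview\nToo short."))
```

Unlike a bare heading count, this also catches a spec that has seven headings but the wrong ones.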
7. Prompt checklist
Before starting a loop, answer yes to each:
- Stack specified — language, framework, version, test tool
- Scope bounded — what must NOT be changed is stated explicitly
- Acceptance criteria are statements, not adjectives — "response time < 200 ms" not "fast"
- A shell command proves completion — tests, grep, curl, wc
- AGENTS.md referenced — "Read AGENTS.md before each iteration" in the prompt
- Completion promise is unique — <promise>COMPLETE</promise>, not "done" or "finished"
- --max-iterations is set — prevents runaway loops on broken tasks
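Parts of this checklist can be checked mechanically before a loop starts. The sketch below is a rough heuristic, not a plugin feature: `lint_prompt` is a hypothetical helper, and it covers only the items that can be detected from text. It does not judge whether the stack is specified or whether the criteria are measurable.

```python
import re
from typing import Optional


def lint_prompt(prompt: str, max_iterations: Optional[int]) -> list[str]:
    """Return heuristic warnings; an empty list means no obvious gaps."""
    warnings = []
    if not re.search(r"<promise>[^<]+</promise>", prompt):
        warnings.append("no <promise>...</promise> completion tag")
    if "AGENTS.md" not in prompt:
        warnings.append("AGENTS.md is not referenced")
    if "`" not in prompt:
        warnings.append("no backtick-quoted validation command")
    if not re.search(r"\bdo not\b", prompt, flags=re.IGNORECASE):
        warnings.append("no explicit 'Do NOT' scope boundary")
    if max_iterations is None:
        warnings.append("--max-iterations is not set")
    return warnings


print(lint_prompt("Make the app faster.", max_iterations=None))
```

The vague prompt from section 6.5 triggers all five warnings. A prompt that passes still needs a human read, since the checks only look for surface markers.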
8. Token and cost control
| Technique | Saving |
|---|---|
| Use `--model claude-haiku-4-5` for exploration iterations, then switch to Sonnet/Opus for the final pass | 5–10× cheaper per iteration |
| Set `--max-iterations 10` for the first run; raise only if needed | Prevents runaway costs |
| Keep the prompt under 2,000 tokens | Shorter context = cheaper per call |
| Use `--no-commit` and squash later | No cost impact, but a cleaner history |
| Break big features into 3–5 smaller loops | Faster convergence, more control per unit |
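The effect of the iteration cap and the model choice is easy to estimate with back-of-the-envelope arithmetic: cost ≈ iterations × (input tokens × input price + output tokens × output price). The sketch below is only that formula; the token counts and per-million-token prices are placeholder numbers chosen for illustration, not actual pricing, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    iterations: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Rough loop cost in the same currency as the prices (per million tokens)."""
    per_iteration = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
    return iterations * per_iteration


# Placeholder figures: 20k input and 4k output tokens per iteration.
capped = estimate_cost(10, 20_000, 4_000, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
uncapped = estimate_cost(50, 20_000, 4_000, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"10 iterations: {capped:.2f}, 50 iterations: {uncapped:.2f}")
```

Input tokens per iteration are usually far larger than the prompt itself, because Claude also reads files, diffs, and test output on every pass. That is why the iteration cap matters more than trimming a few hundred tokens from the prompt.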
9. Combining ralph-loop with other tools
# 1. Start with snarktank/ralph to generate prd.json from a conversation
/skill prd
# 2. Run the stories with ralph-loop
/ralph-loop "$(cat prd.json)" --completion-promise "COMPLETE" --max-iterations 30
# 3. Feed the result to ralphex for automated review
ralphex --review --branch feature/my-feature
# 4. If you want parallel tasks on the same result → ralphy
ralphy --prd prd.json --parallel --max-parallel 3 --create-pr
See the Ralph overview for the recommended full-day stack.