ralph-loop – Deep Dive Guide

What is ralph-loop?

ralph-loop is Anthropic's official implementation of the Ralph Wiggum pattern as a Claude Code plugin. It uses a stop hook to re-feed the same prompt whenever Claude tries to end the session: no external loop script, no second terminal, no extra dependencies. One plugin install gives you an autonomous loop.

Repo: anthropics/claude-plugins-official → ralph-loop

1. How the stop hook works

Most Ralph variants run an external Bash/Bun/Go process that calls the agent in a loop. ralph-loop is different: the loop lives inside the Claude Code session itself.

┌─────────────────────────────────────────────────────────┐
│  Claude Code session                                    │
│                                                         │
│  1. /ralph-loop "…" starts the session                  │
│  2. Claude works on the prompt                          │
│  3. Claude tries to stop the session                    │
│     └─ stop hook (stop-hook.sh) intercepts the exit     │
│  4. Hook injects the SAME prompt again                  │
│  5. Claude reads its own git diff + log → continues     │
│  6. Repeat until:                                       │
│       • Output contains the completion promise          │
│       • OR max-iterations is reached                    │
│       • OR /cancel-ralph is called                      │
└─────────────────────────────────────────────────────────┘

Because state lives in the filesystem and git history, each restart is not "starting over" — Claude reads what it already built and picks up where it left off.
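To make the exit rules in the diagram concrete, here is a minimal Python sketch of the decision a stop hook has to make each time Claude tries to stop. It is not the plugin's actual `stop-hook.sh`: the function name, its parameters, and the convention that `max_iterations=0` means "no cap" are assumptions for illustration, and `/cancel-ralph` is not modeled. In a real hook, Claude's last message would come from the session transcript Claude Code hands to the hook; here it is just a string argument. The one piece taken from Claude Code itself is the output shape: a Stop hook can print `{"decision": "block", "reason": ...}` to keep the session going, and the reason text is given back to Claude.

```python
import json
import re
from typing import Optional

# Matches the text inside <promise>...</promise> in Claude's last message.
PROMISE_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)


def stop_hook_decision(last_output: str, prompt: str, completion_promise: str,
                       iteration: int, max_iterations: int) -> Optional[dict]:
    """Return a Stop-hook payload that blocks the exit, or None to allow it."""
    match = PROMISE_RE.search(last_output)
    if match and match.group(1).strip() == completion_promise:
        return None  # completion promise emitted: let the session end
    if max_iterations > 0 and iteration >= max_iterations:
        return None  # iteration cap reached (0 is treated as "no cap" here)
    # Block the stop; the reason text is fed back to Claude as the next instruction.
    return {"decision": "block", "reason": prompt}


if __name__ == "__main__":
    payload = stop_hook_decision("Tests still failing.", "Build the Todo API.", "COMPLETE",
                                 iteration=3, max_iterations=50)
    print(json.dumps(payload))  # {"decision": "block", "reason": "Build the Todo API."}
```

Matching the exact tagged string, rather than the bare word, keeps an incidental "complete" in Claude's prose from ending the loop early.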

2. Plugin structure

- README.md

3. Installation & basic usage

```
# Install the plugin
/plugin install ralph-loop

# Start a loop
/ralph-loop "<your prompt>" \
  --completion-promise "COMPLETE" \
  --max-iterations 50

# Cancel the loop at any time
/cancel-ralph
```

Required parameters

| Parameter | What it does |
|---|---|
| `<prompt>` | The task; it is re-sent every iteration |
| `--completion-promise` | String Claude must output, wrapped in `<promise>…</promise>` tags, to end the loop |
| `--max-iterations` | Hard limit on how many times to re-run (always set this) |

Optional parameters

| Parameter | Default | What it does |
|---|---|---|
| `--branch` | current branch | Git branch for the work |
| `--no-commit` | false | Disable auto-commit per iteration |
| `--model` | claude-sonnet-4-5 | Override the model |

4. Windows pitfall

On Windows the stop hook can fail with `wsl: Unknown key 'automount.crossDistro'` or `execvpe(/bin/bash) failed`, typically because `bash` resolves to the WSL launcher rather than Git Bash.

Fix: Edit `~/.claude/plugins/cache/.../hooks/hooks.json` and set the hook command explicitly to Git Bash:

```json
"command": "\"C:/Program Files/Git/bin/bash.exe\" ${CLAUDE_PLUGIN_ROOT}/hooks/stop-hook.sh"
```

Use `Git/bin/bash.exe` (a launcher that sets up Git's PATH), not `Git/usr/bin/bash.exe` (the bare MSYS2 shell without that PATH setup).
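If you would rather script this edit (for example because a plugin update may refresh the cache and undo it), a small helper can apply the same change. This is a hypothetical Python sketch, not part of the plugin: it assumes the hook command ends with the path to `stop-hook.sh`, and it walks the JSON generically instead of relying on a particular `hooks.json` nesting.

```python
import json
from pathlib import Path

GIT_BASH = "C:/Program Files/Git/bin/bash.exe"


def patch_commands(node, bash_path: str = GIT_BASH) -> int:
    """Point every hook command that runs stop-hook.sh at Git Bash.

    Returns the number of commands changed; already-patched commands are skipped.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if (key == "command" and isinstance(value, str)
                    and "stop-hook.sh" in value and bash_path not in value):
                # Assumption: the script path is the last whitespace-separated token.
                script = value.split()[-1]
                node[key] = f'"{bash_path}" {script}'
                changed += 1
            else:
                changed += patch_commands(value, bash_path)
    elif isinstance(node, list):
        for item in node:
            changed += patch_commands(item, bash_path)
    return changed


def patch_file(path: Path) -> int:
    """Rewrite hooks.json in place if any command needed patching."""
    data = json.loads(path.read_text(encoding="utf-8"))
    changed = patch_commands(data)
    if changed:
        path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return changed
```

Call `patch_file(Path(...))` with the full `hooks.json` path you found under `~/.claude/plugins/cache/`. Running it twice is harmless, because commands that already point at Git Bash are skipped.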

5. Prompt-writing handbook

This is the core skill. A ralph-loop prompt runs dozens of times. A vague prompt produces dozens of wrong iterations. A precise prompt produces a working result, usually in 5–15 iterations.

The anatomy of a good ralph-loop prompt

[1] CONTEXT      — what exists already, what stack, what constraints
[2] TASK         — what exactly must be built/changed
[3] CRITERIA     — specific, testable acceptance conditions
[4] VALIDATION   — the command(s) that prove it works
[5] MEMORY       — remind Claude to read AGENTS.md / progress.txt
[6] COMPLETION   — when to output the promise

Not every prompt needs all six, but the more complex the task, the more of them you need.
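One way to keep the six parts consistent across loops is to assemble prompts from a template. The sketch below is a hypothetical Python helper, not something ralph-loop provides; the class name, the field names, and the default AGENTS.md reminder are assumptions. Empty parts are dropped, so a short prompt simply omits what it does not need.

```python
from dataclasses import dataclass, field


@dataclass
class RalphPrompt:
    """Hypothetical container for the six prompt parts; empty parts are omitted."""
    task: str                                              # [2] TASK
    context: str = ""                                      # [1] CONTEXT
    criteria: list[str] = field(default_factory=list)      # [3] CRITERIA
    validation: list[str] = field(default_factory=list)    # [4] VALIDATION commands
    memory: str = "Read AGENTS.md before each iteration."  # [5] MEMORY
    promise: str = "COMPLETE"                              # [6] COMPLETION

    def render(self) -> str:
        parts = [self.context, self.task]
        if self.criteria:
            lines = [f"  {i}. {c}" for i, c in enumerate(self.criteria, start=1)]
            parts.append("Acceptance criteria:\n" + "\n".join(lines))
        if self.validation:
            parts.append("Validation:\n" + "\n".join(f"  `{cmd}`" for cmd in self.validation))
        parts.append(self.memory)
        parts.append(f"When all criteria are met → output <promise>{self.promise}</promise>.")
        # Drop empty parts (e.g. no context) and separate the rest with blank lines.
        return "\n\n".join(p for p in parts if p.strip())
```

The rendered string is what you would pass as the `<prompt>` argument to `/ralph-loop`. Keeping the promise text in one field also makes it easy to pass the same value to `--completion-promise`.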

6. Bad prompts vs. good prompts

6.1 REST API

Bad prompt

Build an API for users.
Output COMPLETE when done.

Why it fails:
No endpoints specified. No authentication. No validation. No test requirement. Claude will produce something — but every iteration will produce something different, and "done" is undefined.

Good prompt

Stack: Laravel 11, Sanctum, PostgreSQL, PHPUnit.

Build a User management REST API:
  POST   /api/users         — create user (name, email, password; unique email)
  GET    /api/users/{id}    — show user (auth required)
  PUT    /api/users/{id}    — update user (auth required, own user only)
  DELETE /api/users/{id}    — soft-delete (auth required, own user only)

Acceptance criteria:
  1. All routes respond with the correct HTTP status codes (201, 200, 403, 404).
  2. Validation errors return 422 with an "errors" key.
  3. `php artisan test --filter UserApiTest` passes with ≥ 90 % coverage on the controller.
  4. An OpenAPI doc is generated at /api/documentation.

Read AGENTS.md for existing conventions before starting each iteration.
When all criteria are met → output <promise>COMPLETE</promise>.

6.2 Laravel feature

Bad prompt

Add authentication to my Laravel app.
COMPLETE when done.

Why it fails:
"Authentication" could mean anything: Sanctum, Passport, Breeze, Fortify, custom JWT. No routes, no tests, no session vs. token distinction defined.

Good prompt

Stack: Laravel 11, Sanctum, MySQL, Pest.

Add token-based API authentication:
  POST /api/auth/register — name, email, password; returns bearer token
  POST /api/auth/login    — email, password; returns bearer token
  POST /api/auth/logout   — invalidates the current token (auth:sanctum)
  GET  /api/auth/me       — returns the authenticated user (auth:sanctum)

Rules:
  - Email must be unique; duplicate registration returns 409.
  - Password: min 8 chars, at least one uppercase, one digit.
  - Tokens expire after 24 h (set in Sanctum config).
  - Do NOT touch existing routes or migrations.

Tests: `php artisan test --filter AuthTest` must pass (create the test if it doesn't exist).

Read AGENTS.md before each iteration for project conventions.
When all tests are green → output <promise>COMPLETE</promise>.

6.3 Laravel refactoring

Bad prompt

Refactor the UserController to be cleaner.
Output COMPLETE when done.

Why it fails:
"Cleaner" is subjective. The agent will keep refactoring indefinitely, undoing its own work in the next iteration.

Good prompt

Refactor UserController (app/Http/Controllers/UserController.php) according to these rules:
  1. Extract business logic into UserService (app/Services/UserService.php).
  2. Extract validation rules into UserRequest (app/Http/Requests/UserRequest.php).
  3. Each controller method must be ≤ 10 lines.
  4. No direct Eloquent calls in the controller — only via UserService.
  5. All existing tests in tests/Feature/UserTest.php must still pass.

Do NOT change the public method signatures of UserController.
Do NOT add new routes.

Validation command: `php artisan test --filter UserTest`

When all tests pass and the controller meets all 5 rules → output <promise>COMPLETE</promise>.

6.4 TypeScript / Node.js API

Bad prompt

Create a todo API in TypeScript.
Say DONE when finished.

Why it fails:
Framework? Validation? Database? Tests? Error handling? Undefined on all counts.

Good prompt

Stack: Hono, Zod, Drizzle ORM, SQLite (dev), Vitest.

Build a Todo API:
  POST   /todos       — { title: string (min 3), done?: boolean }
  GET    /todos       — list all todos, optional ?done=true|false filter
  GET    /todos/:id   — single todo or 404
  PATCH  /todos/:id   — partial update { title?, done? }
  DELETE /todos/:id   — 204 or 404

Rules:
  - All inputs validated with Zod; invalid → 400 with { error, details }.
  - Timestamps: created_at, updated_at (ISO 8601 in responses).
  - Drizzle migration in /drizzle/migrations/.
  - Tests: `pnpm vitest run --coverage` must pass with ≥ 80 % branch coverage on the route handlers.

Read AGENTS.md before starting each iteration.
When all tests pass → output <promise>COMPLETE</promise>.

6.5 Business goal / performance

Bad prompt

Make the app faster.
Output COMPLETE when it's fast.

Why it fails:
"Faster" is not measurable. The loop will never know when it's done.

Good prompt

Performance target: reduce the response time of GET /api/products (with 10,000 rows) from the current ~1,200 ms to < 200 ms.

Context:
  - Stack: Laravel 11, PostgreSQL, Redis available.
  - Current query: Product::with('category','tags')->paginate(20) — no indexes.

Allowed changes:
  1. Add DB indexes via a new migration.
  2. Implement Redis caching with a 5-minute TTL on the query result.
  3. Replace eager loading with a targeted select() if needed.

Do NOT change the response shape or any existing tests.

Validation:
  Run: `php artisan tinker --execute="echo app(\App\Services\ProductBenchmark::class)->measure();"`
  It must output a number below 200.

When the benchmark returns < 200 → output <promise>COMPLETE</promise>.

6.6 Hosting / infrastructure / DevOps

Bad prompt

Add Docker to this project.
COMPLETE when done.

Why it fails:
Dev Docker or production Docker? What services? What ports? What environment variables?

Good prompt

Add Docker Compose for local development of this Laravel 11 app.

Services required:
  app    — PHP 8.3-FPM, listens internally on port 9000
  nginx  — serves app on host port 8080, config in docker/nginx/default.conf
  db     — Postgres 16, user=laravel, password=secret, db=laravel, port 5432
  redis  — Redis 7, port 6379

Files to create:
  Dockerfile           — php:8.3-fpm-alpine, composer install, APP_ENV=local
  docker-compose.yml   — all four services with named volumes
  docker/nginx/default.conf

Rules:
  - `docker compose up -d` must start all services with no errors.
  - `docker compose exec app php artisan migrate --force` must succeed.
  - No existing source files may be modified.
  - .env.example must get entries for DB_HOST=db, REDIS_HOST=redis.

Validation: `docker compose up -d && sleep 5 && curl -s http://localhost:8080 | grep -q "Laravel" && echo OK`

When validation prints OK → output <promise>COMPLETE</promise>.
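The fixed `sleep 5` in this validation command can be too short on a cold start (image build, `composer install`), which makes the check fail for reasons unrelated to the work. A sturdier option is to poll until the page answers. The Python sketch below is an illustrative alternative, not part of the original prompt; the function names and default timeouts are assumptions.

```python
import time
import urllib.request
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `check` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # connection refused, HTTP errors, etc. while containers are starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def page_contains(url: str, needle: str) -> Callable[[], bool]:
    """Build a check that fetches `url` and looks for `needle` in the body."""
    def check() -> bool:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return needle in resp.read().decode("utf-8", errors="replace")
    return check
```

After `docker compose up -d`, `wait_until(page_contains("http://localhost:8080", "Laravel"), timeout=90)` returns `True` as soon as the page contains "Laravel", and `False` if it never does within 90 seconds.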

6.7 AI agent / prompt engineering

Bad prompt

Build an AI agent that can answer questions about our docs.
COMPLETE when done.

Why it fails:
What LLM? What retrieval? What tools? No evaluation criteria. No test for "it works."

Good prompt

Build a RAG agent for our Markdown documentation (docs/ folder).

Stack: Python 3.12, LangChain 0.3, OpenAI (text-embedding-3-small for embeddings, gpt-4o for answers), Chroma (local).

Steps:
  1. Chunk all .mdx/.md files in docs/ recursively (chunk_size=800, overlap=80).
  2. Embed via text-embedding-3-small, store in chroma_db/ (local persist).
  3. On query: retrieve top-5 chunks, pass to gpt-4o as context, return answer + sources.
  4. CLI: `python agent.py "What is ralph-loop?"` → answer printed, sources listed.

Acceptance criteria:
  1. `python agent.py "What is the completion promise in ralph-loop?"` returns an answer that contains the word "COMPLETE".
  2. `python agent.py "What is snarktank/ralph?"` returns an answer that mentions "Bash".
  3. `pytest tests/test_agent.py -v` passes (write tests if they don't exist).

Read AGENTS.md before each iteration.
When all three criteria pass → output <promise>COMPLETE</promise>.

6.8 Frontend / React component

Bad prompt

Build a data table component in React.
COMPLETE when it looks good.

Why it fails:
"Looks good" is not a test. Ralph can't run a visual check. Every iteration will produce a different UI and the loop will never end.

Good prompt

Stack: React 19, TypeScript, TanStack Table v8, Tailwind CSS 4, Vitest + Testing Library.

Build a <DataTable<T>> component in src/components/DataTable.tsx:

Props:
  data: T[]
  columns: ColumnDef<T>[]
  pageSize?: number (default 20)
  onRowClick?: (row: T) => void

Features:
  - Client-side pagination (pageSize rows per page, prev/next buttons)
  - Click on a column header → sort ascending; second click → descending
  - Search input above the table filters all string columns (case-insensitive)
  - Empty state: "No results found" centered in the table body

Tests in src/components/DataTable.test.tsx:
  1. Renders the correct number of rows per page.
  2. Sort on "name" column: ascending order matches Array.sort().
  3. Search "alice" with fixture data returns only the Alice row.
  4. onRowClick is called with the correct row when clicked.

`pnpm vitest run DataTable` must pass all 4 tests.
TypeScript must compile without errors: `pnpm tsc --noEmit`.

Read AGENTS.md before each iteration.
When tests + typecheck pass → output <promise>COMPLETE</promise>.

6.9 Database migration

Bad prompt

Add a payments table.
COMPLETE when done.

Why it fails:
What columns? What foreign keys? What indexes? What validation in the model?

Good prompt

Stack: Laravel 11, PostgreSQL.

Create a payments table via a new migration:

Columns:
  id              uuid, primary key
  order_id        uuid, foreign key → orders.id (cascade delete)
  amount          decimal(10,2), not null
  currency        char(3), not null, default 'EUR'
  status          enum('pending','completed','failed','refunded'), default 'pending'
  provider        varchar(50), not null  (e.g. "stripe", "paypal")
  provider_ref    varchar(255), nullable, unique
  paid_at         timestamp, nullable
  created_at / updated_at  timestamps

Indexes:
  - payments(order_id)  (PostgreSQL does not index foreign keys automatically)
  - (status, created_at) composite
  - provider_ref unique

Model: app/Models/Payment.php
  - $guarded = []
  - casts: amount → decimal:2, status → enum, paid_at → datetime
  - Relationship: belongsTo(Order::class)
  - Scope: scopePending(), scopeCompleted()

Tests: `php artisan test --filter PaymentTest` must pass (create the test).

`php artisan migrate --pretend` must run without errors.
When migration + tests pass → output <promise>COMPLETE</promise>.

6.10 Goal with no code – planning / research

Bad prompt

Write a technical spec for our new feature.
COMPLETE when it's done.

Why it fails:
No length, no required sections, no definition of "done." The agent will output something arbitrary on the first iteration and output COMPLETE immediately.

Good prompt

Write a technical specification for a "Waitlist" feature in SPEC.md.

Required sections (all must be present and non-empty):
  ## Overview          — 2–3 sentences, the user problem
  ## User Stories      — ≥ 3 acceptance-criteria-style stories
  ## Data Model        — table definitions with column names and types
  ## API Endpoints     — HTTP method, path, request body, response
  ## Edge Cases        — ≥ 5 specific edge cases with resolution
  ## Out of Scope      — explicit list of what is NOT included
  ## Open Questions    — ≥ 2 unresolved decisions that need a human call

Rules:
  - Only describe the Waitlist feature; do not spec any other feature.
  - Do not write any code.
  - Total length: 400–800 words.

Validation: `wc -w < SPEC.md` must print a number between 400 and 800.
All 7 sections must be present: `grep -c "^## " SPEC.md` must output 7.

When both validation commands pass → output <promise>COMPLETE</promise>.
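The two shell checks catch length and heading count, but not the "non-empty" rule, and `grep -c` still returns 7 if a section name is misspelled. A slightly stricter checker, sketched here in Python as an illustration (the function name and messages are assumptions, not part of any tool), covers all three rules and reports what is wrong instead of just a number:

```python
import re

REQUIRED_SECTIONS = ["Overview", "User Stories", "Data Model", "API Endpoints",
                     "Edge Cases", "Out of Scope", "Open Questions"]


def check_spec(text: str, min_words: int = 400, max_words: int = 800) -> list[str]:
    """Return a list of problems with SPEC.md; an empty list means it passes."""
    problems = []

    # Whitespace-separated tokens, roughly what `wc -w` counts.
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")

    # Level-2 headings, like `grep "^## "`; trailing spaces are ignored.
    headings = re.findall(r"^## (.+?)[ \t]*$", text, flags=re.MULTILINE)
    missing = [h for h in REQUIRED_SECTIONS if h not in headings]
    if missing:
        problems.append("missing sections: " + ", ".join(missing))
    if len(headings) != len(REQUIRED_SECTIONS):
        problems.append(f"expected {len(REQUIRED_SECTIONS)} sections, found {len(headings)}")

    # Body text between headings; index 0 is whatever precedes the first heading.
    bodies = re.split(r"^## .*$", text, flags=re.MULTILINE)[1:]
    empty = [h for h, body in zip(headings, bodies) if not body.strip()]
    if empty:
        problems.append("empty sections: " + ", ".join(empty))
    return problems
```

Run it as `check_spec(open("SPEC.md", encoding="utf-8").read())`; an empty list means the spec meets every rule in the prompt.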

7. Prompt checklist

Before starting a loop, answer yes to each:

- Stack specified: language, framework, version, test tool
- Scope bounded: what must NOT be changed is stated explicitly
- Acceptance criteria are measurable statements, not adjectives: "response time < 200 ms", not "fast"
- A shell command proves completion: tests, grep, curl, wc
- AGENTS.md referenced: "Read AGENTS.md before each iteration" is in the prompt
- Completion promise is unique: <promise>COMPLETE</promise>, not "done" or "finished"
- --max-iterations is set: prevents runaway loops on broken tasks
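Part of this checklist can be automated as a rough pre-flight lint. The Python sketch below is a hypothetical heuristic, not a ralph-loop feature: it only looks for an AGENTS.md reference, a non-generic `<promise>` tag, at least one backticked command, and a few of the vague adjectives from section 6. Stack, scope, and `--max-iterations` (a CLI flag, not prompt text) still need a human look, and the word list will produce false positives.

```python
import re

# Words that section 6 flags as unmeasurable; extend to taste.
VAGUE_WORDS = r"\b(?:cleaner|faster|better|looks good|nice)\b"


def lint_prompt(prompt: str) -> list[str]:
    """Heuristic pre-flight check; returns the problems found (empty list = none)."""
    problems = []
    if "AGENTS.md" not in prompt:
        problems.append("AGENTS.md not referenced")

    promise = re.search(r"<promise>(.+?)</promise>", prompt, flags=re.DOTALL)
    if promise is None:
        problems.append("no <promise>...</promise> completion tag")
    elif promise.group(1).strip().lower() in {"done", "finished"}:
        problems.append("completion promise is too generic")

    # A backticked span is taken as a sign that a runnable check is named.
    if not re.search(r"`[^`]+`", prompt):
        problems.append("no backticked validation command")

    vague = {w.lower() for w in re.findall(VAGUE_WORDS, prompt, flags=re.IGNORECASE)}
    if vague:
        problems.append("vague wording: " + ", ".join(sorted(vague)))
    return problems
```

Run against the bad prompt from 6.5, it reports all four problems; the good prompts in section 6 pass.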

8. Token and cost control

| Technique | Effect |
|---|---|
| Use `--model claude-haiku-4-5` for exploration iterations, switch to Sonnet/Opus for the final ones | About 3× cheaper per token than Sonnet 4.5 at list prices |
| Set `--max-iterations 10` for the first run; raise only if needed | Prevents runaway costs |
| Keep the prompt under 2,000 tokens | Shorter context = cheaper per call |
| Use `--no-commit`, or squash the per-iteration commits afterwards | No cost impact; keeps the git history clean |
| Break big features into 3–5 smaller loops | Faster convergence, more control per unit |
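To see what an iteration cap means in money, multiply it out. This is back-of-the-envelope arithmetic under stated assumptions: the per-million-token prices below are Anthropic's standard list prices for Claude Haiku 4.5 and Claude Sonnet 4.5 at the time of writing (check current pricing before relying on them), every iteration is assumed to use the same number of tokens, and prompt caching is ignored.

```python
# Assumed list prices in USD per million tokens: (input, output).
# Verify against Anthropic's current pricing page; these change over time.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-5": (3.00, 15.00),
}


def loop_cost(model: str, iterations: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a loop, assuming identical token use per iteration."""
    price_in, price_out = PRICES[model]
    per_iteration = input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out
    return iterations * per_iteration
```

With a hypothetical 150,000 input and 10,000 output tokens per iteration, `loop_cost("claude-sonnet-4-5", 10, 150_000, 10_000)` comes to about $6, and the same ten iterations on Haiku 4.5 to about $2. Doubling `--max-iterations` doubles the worst case, which is why a low first cap is the cheapest safeguard.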

9. Combining ralph-loop with other tools

```
# 1. Start with snarktank/ralph to generate prd.json from a conversation
/skill prd

# 2. Run the stories with ralph-loop
/ralph-loop "$(cat prd.json)" --completion-promise "COMPLETE" --max-iterations 30

# 3. Feed the result to ralphex for automated review
ralphex --review --branch feature/my-feature

# 4. If you want parallel tasks on the same result → ralphy
ralphy --prd prd.json --parallel --max-parallel 3 --create-pr
```

See the Ralph overview for the recommended full-day stack.

10. Further reading

- Ralph Wiggum overview – all 6 implementations
- Geoffrey Huntley: "Ralph Wiggum as a Software Engineer"
- anthropics/claude-plugins-official
- snarktank/ralph: the original community Bash variant
- open-ralph-wiggum – dedicated guide
