Agents Development Guide

What is this about?

This guide explains how to design agent behavior for Codex and OpenAI-based developer workflows. It focuses on the practical choice: when to write AGENTS.md, when to create a skill, when to define a custom subagent, when MCP or plugins are the right extension point, and when to build a real application with the OpenAI Agents SDK.

1. Start with the right meaning of "agent"

"Agent" can mean several different things. Choose the smallest surface that matches the job.

Term	What it is	Best for
Main Codex agent	The active Codex session in app, CLI, or IDE	Normal repo work, implementation, review, debugging
`AGENTS.md`	Durable instructions loaded before Codex works	Repo conventions, commands, safety rules, language policy
Skill	Reusable workflow with instructions, references, and optional scripts	Repeatable tasks such as docs review, release checks, migration steps
Custom subagent	A named Codex worker with its own instructions and optional model/config defaults	Parallel exploration, review specialists, focused analysis workers
MCP server	Live tools and context exposed to Codex	External systems, private data, docs lookup, browser/Figma/GitHub/Sentry tools
Plugin	Installable package that can bundle skills, MCP config, hooks, and assets	Sharing reusable capabilities across machines, repos, or teams
OpenAI Agents SDK	Developer framework for agentic applications	Production agents with tools, handoffs, tracing, guardrails, and product-owned logic

Rule of thumb: use AGENTS.md for rules, skills for workflows, custom subagents for parallel specialists, MCP for live tools, plugins for distribution, and the Agents SDK for products.

2. Decision guide

Use this table before creating anything.

Need	Create	Why
Codex should always follow repository rules	`AGENTS.md`	It is automatically discovered and applies to every task in scope
A repeated task needs a consistent workflow	Skill	It gives Codex a reusable procedure without bloating every prompt
A task should be split into parallel specialists	Custom subagent	It keeps noisy exploration out of the main thread
Codex needs private or live external data	MCP server or connector	It gives Codex tool access instead of relying on memory or web search
A workflow should be installable by other people	Plugin	It packages skills and related integration metadata
You are building an agentic product	OpenAI Agents SDK	It gives you programmatic control, tracing, handoffs, and guardrails
You only need a one-off behavior	Prompt instructions	Persistent files would be unnecessary overhead

Do not turn every preference into an agent. Durable agent behavior should pay rent: it should reduce repeated prompting, prevent mistakes, or make a workflow easier to review.
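
The decision table can also be read as a plain lookup. The sketch below is only an illustration of that reading: the `Need` enum and the `choose_surface` function are hypothetical names made up for this example and are not part of Codex or any OpenAI tooling.

```python
from enum import Enum


class Need(Enum):
    """Hypothetical labels for the rows of the decision table."""

    REPO_RULES = "repo_rules"
    REPEATED_WORKFLOW = "repeated_workflow"
    PARALLEL_SPECIALISTS = "parallel_specialists"
    LIVE_EXTERNAL_DATA = "live_external_data"
    INSTALLABLE_WORKFLOW = "installable_workflow"
    AGENTIC_PRODUCT = "agentic_product"
    ONE_OFF = "one_off"


# Mapping copied from the "Need" and "Create" columns of the table.
SURFACE_FOR_NEED = {
    Need.REPO_RULES: "AGENTS.md",
    Need.REPEATED_WORKFLOW: "Skill",
    Need.PARALLEL_SPECIALISTS: "Custom subagent",
    Need.LIVE_EXTERNAL_DATA: "MCP server or connector",
    Need.INSTALLABLE_WORKFLOW: "Plugin",
    Need.AGENTIC_PRODUCT: "OpenAI Agents SDK",
    Need.ONE_OFF: "Prompt instructions",
}


def choose_surface(need: Need) -> str:
    """Return the surface the decision table recommends for a need."""
    return SURFACE_FOR_NEED[need]


print(choose_surface(Need.REPEATED_WORKFLOW))
```

In practice the choice is a judgment call rather than a dictionary lookup; the point of the sketch is that each need maps to exactly one smallest surface.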

3. Agent design checklist

Before writing a file, answer these questions:

Question	Good answer
What job should this agent do?	One sentence with a clear verb and object
When should it run?	Explicit trigger words, scope, and exclusions
What inputs does it need?	Files, branches, URLs, issue numbers, docs, environment values
What outputs should it produce?	Patch, report, checklist, summary, test result, decision
What tools may it use?	Shell, file edits, MCP tools, web, browser, external APIs
What must it avoid?	Destructive commands, secret exposure, broad refactors, unsupported paths
How is success verified?	Build, tests, lint, manual check, source citation, diff review
What should happen on uncertainty?	Ask, report assumptions, stop before risky actions, or use a fallback

Strong agents are narrow. A "reviewer for correctness, security, regressions, and missing tests" is useful. A "do everything better" agent is not.
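
One way to keep the checklist honest is to record the answers in a small structure and flag whatever is still empty. This is a minimal sketch, assuming one field per checklist question; `AgentSpec` and `unanswered` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentSpec:
    """Hypothetical record holding one answer per checklist question."""

    job: str
    trigger: str
    inputs: list[str]
    outputs: list[str]
    tools: list[str]
    must_avoid: list[str]
    verification: str
    on_uncertainty: str


def unanswered(spec: AgentSpec) -> list[str]:
    """Return the names of checklist fields that are still empty."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


spec = AgentSpec(
    job="Review documentation changes for build risks.",
    trigger="Docs files changed; skip code-only branches.",
    inputs=["changed MDX files"],
    outputs=["report"],
    tools=["shell"],
    must_avoid=[],
    verification="yarn build",
    on_uncertainty="",
)

# Lists the questions that still need an answer before any file is written.
print(unanswered(spec))
```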

4. `AGENTS.md`: durable rules for Codex

Use AGENTS.md when you want Codex to follow project rules automatically.

Good content:

repository structure,
build, test, lint, and validation commands,
code style and naming rules,
documentation language policy,
security and secret-handling rules,
review expectations,
local workflow notes such as WSL, Docker, package manager, or CI assumptions.

Avoid:

long tutorials,
stale product docs,
personal preferences that do not belong to the repo,
huge pasted READMEs,
rules that conflict with existing tooling.

Example:

# Repository Guidelines

## Documentation

- Write all source docs under `docs/` in English.
- Put German translations under `i18n/de/`.
- Keep code examples, comments, variable names, and command labels in English.

## Validation

- Run `yarn build` after documentation changes.
- Mention existing unrelated warnings in the final summary.

Placement

Scope	Location
Personal defaults	`~/.codex/AGENTS.md`
Temporary personal override	`~/.codex/AGENTS.override.md`
Whole repository	`AGENTS.md` in the Git root
Subtree-specific rules	`AGENTS.md` or `AGENTS.override.md` in that directory

Files closer to the current working directory are loaded later and can override broader guidance.
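
The ordering rule can be sketched as a walk from the Git root down to the working directory. This is an illustration, not Codex's actual implementation: the function name `instruction_files` is hypothetical, the sketch covers only the repository part of the table (not `~/.codex/`), and it assumes that an `AGENTS.override.md` in a directory is used instead of the `AGENTS.md` next to it.

```python
from pathlib import Path


def instruction_files(repo_root: Path, cwd: Path) -> list[Path]:
    """List instruction files from broadest to most specific.

    Assumption: within one directory, AGENTS.override.md (if present)
    is used instead of AGENTS.md. Raises ValueError if cwd is not
    inside repo_root.
    """
    repo_root = repo_root.resolve()
    cwd = cwd.resolve()

    # Directories from the repository root down to cwd, inclusive.
    directories = [repo_root]
    for part in cwd.relative_to(repo_root).parts:
        directories.append(directories[-1] / part)

    found = []
    for directory in directories:
        for name in ("AGENTS.override.md", "AGENTS.md"):
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
                break  # at most one file per directory
    return found
```

Files later in the returned list sit closer to the working directory, so under the rule above their guidance would take precedence over earlier entries.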

5. Skills: reusable workflows

Create a skill when Codex should follow the same workflow again and again.

Good skill examples:

"Review Docusaurus MDX changes for broken links and build risks."
"Prepare a release note from merged PRs."
"Migrate one API endpoint to the new auth helper."
"Audit a Terraform change for IAM risk."
"Create a plugin scaffold with the required manifest."

Minimal structure:

.agents/
  skills/
    docs-review/
      SKILL.md

SKILL.md:

---
name: docs-review
description: Review Docusaurus MDX changes for broken links, sidebar metadata, build risks, and translation drift.
---

1. Inspect changed MDX files and nearby docs conventions.
2. Check links, headings, frontmatter, admonitions, code fences, and i18n counterparts.
3. Run the documented validation command when practical.
4. Report correctness and build issues before style suggestions.

Writing good skill descriptions

Codex can choose a skill implicitly from its description, so write the description like a trigger:

name the exact task,
include important tools or file types,
mention boundaries,
keep it short enough to survive truncation when many skills exist.

Good:

description: Review Docusaurus MDX changes for links, frontmatter, sidebars, i18n, and build failures.

Weak:

description: Helps with docs.
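
The difference between the two descriptions can be checked mechanically. The sketch below is a rough lint, not a Codex feature: `read_frontmatter` and `description_problems` are hypothetical helpers, the parser handles only the flat `key: value` frontmatter shown in the SKILL.md example, and the thresholds `max_chars=200` and `min_words=6` are arbitrary assumptions rather than documented limits.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between two `---` lines.

    This is not a general YAML parser; it covers only simple string
    values like the ones in the SKILL.md example.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # the closing delimiter was never found


def description_problems(meta: dict[str, str],
                         max_chars: int = 200,
                         min_words: int = 6) -> list[str]:
    """Flag missing, vague, or overly long skill metadata.

    Both thresholds are illustrative assumptions.
    """
    problems = []
    description = meta.get("description", "")
    if not meta.get("name"):
        problems.append("missing name")
    if not description:
        problems.append("missing description")
    elif len(description.split()) < min_words:
        problems.append("description too vague")
    elif len(description) > max_chars:
        problems.append("description may be truncated")
    return problems


weak = "---\nname: docs\ndescription: Helps with docs.\n---\n"
print(description_problems(read_frontmatter(weak)))
```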

When to add scripts or references

Keep a skill instruction-only until it needs deterministic behavior.

Add this	When
`references/`	The workflow needs stable examples, policies, templates, or schemas
`scripts/`	The workflow needs repeatable parsing, generation, validation, or API calls
`assets/`	The workflow needs templates, images, fixtures, or starter files
`agents/openai.yaml`	You want Codex app metadata, tool dependencies, or explicit invocation policy

Prefer scripts for mechanical checks and instructions for judgment. A script can find broken frontmatter quickly; Codex should still decide whether the resulting page reads well.
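
As an example of a mechanical check that belongs in `scripts/`, the sketch below looks for relative Markdown links whose target file is missing. It is deliberately narrow: the function name `broken_relative_links` is hypothetical, only links ending in `.md` or `.mdx` are checked, and external URLs, site-absolute paths, anchors, and extensionless routes are skipped.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(doc: Path) -> list[str]:
    """Return relative .md/.mdx link targets in `doc` that do not exist."""
    broken = []
    for target in LINK_PATTERN.findall(doc.read_text(encoding="utf-8")):
        path_part = target.split("#", 1)[0]  # drop any #fragment
        if not path_part.endswith((".md", ".mdx")):
            continue  # anchors and extensionless routes are not checked
        if "://" in path_part or path_part.startswith("/"):
            continue  # external URLs and site-absolute paths are not checked
        if not (doc.parent / path_part).is_file():
            broken.append(target)
    return broken
```

A skill could tell Codex to run a script like this first and then spend its judgment on the findings, for example deciding whether a link should be fixed or removed.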

6. Custom subagents: parallel specialists

Create a custom subagent when a task benefits from a named worker with its own instructions. This is useful for parallel review, exploration, or analysis.

Good subagent examples:

reviewer: correctness, regressions, security, and missing tests.
explorer: read-heavy codebase mapping before implementation.
docs-auditor: docs structure, i18n drift, broken links, and MDX risks.
test-triage: failing test logs, likely root cause, and smallest fix path.
security-checker: secret exposure, auth boundaries, injection, and unsafe commands.

Personal agents live under:

~/.codex/agents/

Project agents live under:

.codex/agents/

Example custom agent:

name = "docs-auditor"
description = "Review documentation changes for structure, broken links, i18n drift, MDX issues, and validation gaps."
developer_instructions = """
Review documentation like a maintainer.
Prioritize broken builds, broken links, incorrect routing, stale translations, and missing validation.
Do not rewrite prose unless the user explicitly asks for edits.
Return findings with file references and a short validation summary.
"""
model_reasoning_effort = "medium"
sandbox_mode = "workspace-write"

Use it from a prompt:

Spawn a docs-auditor subagent to review this branch for Docusaurus and i18n issues. Wait for the result, then summarize only actionable findings with file references.

When not to use subagents

Avoid subagents when:

the task is small,
edits would overlap heavily,
you need one tightly controlled implementation thread,
token cost matters more than parallel speed,
the workflow needs sensitive approvals that are easier to monitor in one thread.

Subagents inherit the current sandbox and approval controls. Treat them as workers inside the same safety boundary, not as a way to bypass it.

7. MCP: tools and live context

Use MCP when an agent needs capabilities it cannot get from files alone.

Good MCP uses:

search internal docs,
read GitHub issues or PRs,
inspect a browser,
query Sentry,
access Figma designs,
call internal developer tools,
fetch official documentation.

MCP is different from a skill:

Skill	MCP
Tells Codex how to work	Gives Codex tools or data
Lives as files	Runs as a server or connector
Best for repeatable procedures	Best for live systems and private context
Can call scripts	Exposes typed tools

Example config:

[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp"]
startup_timeout_sec = 20
tool_timeout_sec = 60

Good MCP server instructions should explain workflow constraints, rate limits, and tool boundaries. Keep the opening sentences self-contained, because Codex may use them while deciding whether to call the server.

8. Plugins: package and share capabilities

Create a plugin when a skill or integration should be installable instead of copied manually.

Use plugins for:

sharing skills across teammates,
bundling multiple skills,
bundling MCP configuration,
shipping app mappings or UI metadata,
distributing lifecycle hooks,
creating a curated internal marketplace.

Minimal plugin shape:

my-plugin/
  .codex-plugin/
    plugin.json
  skills/
    docs-review/
      SKILL.md

plugin.json:

{
  "name": "docs-workflows",
  "version": "1.0.0",
  "description": "Reusable documentation review workflows.",
  "skills": "./skills/"
}
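
A plugin folder with this shape can be sanity-checked before sharing. The sketch below assumes exactly the layout shown above (`.codex-plugin/plugin.json` with a `skills` key pointing at a folder of skill directories). The helper `manifest_problems` and the `EXPECTED_KEYS` list are hypothetical and mirror the example manifest rather than an official schema.

```python
import json
from pathlib import Path

# Assumption: the keys used in the example manifest are the ones to expect.
EXPECTED_KEYS = ("name", "version", "description", "skills")


def manifest_problems(plugin_root: Path) -> list[str]:
    """Return a list of problems found in a plugin folder."""
    manifest_path = plugin_root / ".codex-plugin" / "plugin.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    problems = [
        f"missing key: {key}" for key in EXPECTED_KEYS if key not in manifest
    ]

    # Each skill is expected to be a directory containing a SKILL.md file.
    skills_dir = plugin_root / manifest.get("skills", "skills")
    if not any(skills_dir.glob("*/SKILL.md")):
        problems.append(f"no SKILL.md found under {skills_dir.name}/")
    return problems
```

An empty list means the folder matches the minimal shape; it does not prove the plugin installs or behaves correctly.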

Start with local skills while you are still designing the workflow. Package a plugin once the workflow is stable enough to share.

9. OpenAI Agents SDK: build agentic products

Use the OpenAI Agents SDK when you are not just configuring Codex, but building your own agentic application.

Choose the Agents SDK when you need:

application-owned tools,
handoffs between agents,
guardrails,
traces and observability,
repeatable production behavior,
product-specific auth and data access,
tests and deployment around the agent.

Minimal pattern:

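# Requires the `openai-agents` package and an OPENAI_API_KEY in the environment.
from agents import Agent, Runner, function_tool


@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by its ID."""
    # Placeholder: a real tool would query the application database.
    return "Order data from the application database"


# The agent combines instructions with the tools it is allowed to call.
support_agent = Agent(
    name="Support Agent",
    instructions=(
        "Answer using company policy. "
        "Use tools when the user asks about an order. "
        "Escalate when policy is missing."
    ),
    tools=[lookup_order],
)

# Run the agent synchronously until it produces a final answer.
result = Runner.run_sync(
    support_agent,
    "Summarize order A123 and suggest the next support action.",
)

print(result.final_output)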

The key difference: Codex agents help you develop software. Agents SDK agents are software you develop.

10. Creation workflow

Use this sequence for every new agent capability.

Write the job statement.
Choose the smallest surface from the decision guide.
Draft the instructions in plain language.
Add only the tools and files the workflow truly needs.
Test with two easy prompts and one adversarial or ambiguous prompt.
Check whether the agent asks for clarification at the right time.
Verify outputs with build, tests, source citations, or review.
Move from prompt to AGENTS.md, skill, subagent, MCP, plugin, or SDK only after the behavior proves useful.

Example job statement:

When documentation files change, review the branch for Docusaurus build risks, broken links, stale German translations, and source-language policy violations. Return actionable findings first and mention validation performed.

That statement probably becomes a skill. If you want it to run in parallel with security and implementation review, also create a custom subagent that uses the skill.

11. Quality checklist

Before calling an agent "ready", check:

The name is specific and stable.
The description says when it should and should not run.
Instructions define inputs, outputs, tools, and boundaries.
Code examples and commands are in English.
It has a verification path.
It does not require secrets in source files.
It fails safely when data is missing.
It avoids broad refactors unless explicitly asked.
It can be tested with a small prompt.
Team-scoped behavior lives in the repo, not only in one person's home directory.

12. Recommended setup for this repository

For this Docusaurus repo, use this layering:

Need	Recommended location
Source docs must be English and translations live in i18n	Root `AGENTS.md`
Repeatable docs review workflow	`.agents/skills/docs-review/SKILL.md`
Parallel docs specialist for larger branches	`.codex/agents/docs-auditor.toml`
Official OpenAI docs lookup	MCP server or existing OpenAI docs skill
Shared team installation later	Plugin after the workflow stabilizes

Start with AGENTS.md plus one docs-review skill. Add custom subagents only when the branch is large enough that parallel review saves time.

13. Useful prompts

Create a skill:

Create a repo-local skill named docs-review. It should review changed MDX files for Docusaurus build risks, broken links, sidebar metadata, i18n drift, and source-language policy violations. Keep the skill instruction-only for now.

Create a custom subagent:

Create a project custom agent named docs-auditor. It should review documentation changes, use medium reasoning, avoid rewriting prose by default, and return actionable findings with file references.

Use subagents for review:

Review this branch with parallel subagents. Spawn one reviewer for correctness, one docs-auditor for Docusaurus and i18n, and one test-triage agent for validation gaps. Wait for all results and summarize only actionable findings.

Decide the right surface:

I want Codex to do this repeatedly: <workflow>. Should this be a prompt, AGENTS.md rule, skill, custom subagent, MCP server, plugin, or Agents SDK app? Recommend the smallest durable option and explain why.
