Agents Development Guide

What is this about?

This guide explains how to design agent behavior for Codex and OpenAI-based developer workflows. It focuses on the practical choice: when to write AGENTS.md, when to create a skill, when to define a custom subagent, when MCP or plugins are the right extension point, and when to build a real application with the OpenAI Agents SDK.

1. Start with the right meaning of "agent"

"Agent" can mean several different things. Choose the smallest surface that matches the job.

| Term | What it is | Best for |
| --- | --- | --- |
| Main Codex agent | The active Codex session in app, CLI, or IDE | Normal repo work, implementation, review, debugging |
| AGENTS.md | Durable instructions loaded before Codex works | Repo conventions, commands, safety rules, language policy |
| Skill | Reusable workflow with instructions, references, and optional scripts | Repeatable tasks such as docs review, release checks, migration steps |
| Custom subagent | A named Codex worker with its own instructions and optional model/config defaults | Parallel exploration, review specialists, focused analysis workers |
| MCP server | Live tools and context exposed to Codex | External systems, private data, docs lookup, browser/Figma/GitHub/Sentry tools |
| Plugin | Installable package that can bundle skills, MCP config, hooks, and assets | Sharing reusable capabilities across machines, repos, or teams |
| OpenAI Agents SDK | Developer framework for agentic applications | Production agents with tools, handoffs, tracing, guardrails, and product-owned logic |

Rule of thumb: use AGENTS.md for rules, skills for workflows, custom subagents for parallel specialists, MCP for live tools, plugins for distribution, and the Agents SDK for products.

2. Decision guide

Use this table before creating anything.

| Need | Create | Why |
| --- | --- | --- |
| Codex should always follow repository rules | AGENTS.md | It is automatically discovered and applies to every task in scope |
| A repeated task needs a consistent workflow | Skill | It gives Codex a reusable procedure without bloating every prompt |
| A task should be split into parallel specialists | Custom subagent | It keeps noisy exploration out of the main thread |
| Codex needs private or live external data | MCP server or connector | It gives Codex tool access instead of relying on memory or web search |
| A workflow should be installable by other people | Plugin | It packages skills and related integration metadata |
| You are building an agentic product | OpenAI Agents SDK | It gives you programmatic control, tracing, handoffs, and guardrails |
| You only need a one-off behavior | Prompt instructions | Persistent files would be unnecessary overhead |

Do not turn every preference into an agent. Durable agent behavior should pay rent: it should reduce repeated prompting, prevent mistakes, or make a workflow easier to review.
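
The decision guide above can be read as a simple lookup. The sketch below encodes it that way; the need keys (`repeated_workflow` and so on) are hypothetical labels invented for this example, not names used by Codex.

```python
# Hypothetical need labels mapped to the surfaces from the decision guide.
DECISION_GUIDE = {
    "always_follow_repo_rules": "AGENTS.md",
    "repeated_workflow": "Skill",
    "parallel_specialists": "Custom subagent",
    "live_or_private_data": "MCP server or connector",
    "installable_by_others": "Plugin",
    "agentic_product": "OpenAI Agents SDK",
    "one_off_behavior": "Prompt instructions",
}


def recommend(needs: list[str]) -> list[str]:
    """Return the surfaces to create for the given needs, without duplicates."""
    unknown = [need for need in needs if need not in DECISION_GUIDE]
    if unknown:
        # A need that is not in the guide is a signal to rethink the job,
        # not a reason to invent a new kind of agent.
        raise ValueError(f"not in the decision guide: {unknown}")
    surfaces: list[str] = []
    for need in needs:
        surface = DECISION_GUIDE[need]
        if surface not in surfaces:
            surfaces.append(surface)
    return surfaces


print(recommend(["repeated_workflow", "parallel_specialists"]))
```

A job can map to more than one surface, which is why the function returns a list rather than a single answer.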

3. Agent design checklist

Before writing a file, answer these questions:

| Question | Good answer |
| --- | --- |
| What job should this agent do? | One sentence with a clear verb and object |
| When should it run? | Explicit trigger words, scope, and exclusions |
| What inputs does it need? | Files, branches, URLs, issue numbers, docs, environment values |
| What outputs should it produce? | Patch, report, checklist, summary, test result, decision |
| What tools may it use? | Shell, file edits, MCP tools, web, browser, external APIs |
| What must it avoid? | Destructive commands, secret exposure, broad refactors, unsupported paths |
| How is success verified? | Build, tests, lint, manual check, source citation, diff review |
| What should happen on uncertainty? | Ask, report assumptions, stop before risky actions, or use a fallback |

Strong agents are narrow. A "reviewer for correctness, security, regressions, and missing tests" is useful. A "do everything better" agent is not.
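
One way to keep the checklist honest is to write the answers down as data and see which ones are still empty. The `AgentSpec` class and its field names below are hypothetical, chosen only to mirror the eight questions above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentSpec:
    """One field per checklist question (field names are illustrative)."""

    job: str = ""
    trigger: str = ""
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)
    on_uncertainty: str = ""


def unanswered(spec: AgentSpec) -> list[str]:
    """Return the names of checklist questions that have no answer yet."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


draft = AgentSpec(
    job="Review documentation changes for build risks and broken links.",
    trigger="Documentation files changed on the current branch.",
    outputs=["findings with file references"],
    verification=["documented build command"],
)
print(unanswered(draft))  # ['inputs', 'tools', 'avoid', 'on_uncertainty']
```

An empty result does not make the agent good, but a non-empty one is a cheap reason not to write the file yet.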

4. AGENTS.md: durable rules for Codex

Use AGENTS.md when you want Codex to follow project rules automatically.

Good content:

  • repository structure,
  • build, test, lint, and validation commands,
  • code style and naming rules,
  • documentation language policy,
  • security and secret-handling rules,
  • review expectations,
  • local workflow notes such as WSL, Docker, package manager, or CI assumptions.

Avoid:

  • long tutorials,
  • stale product docs,
  • personal preferences that do not belong to the repo,
  • huge pasted READMEs,
  • rules that conflict with existing tooling.

Example:

# Repository Guidelines

## Documentation

- Write all source docs under `docs/` in English.
- Put German translations under `i18n/de/`.
- Keep code examples, comments, variable names, and command labels in English.

## Validation

- Run `yarn build` after documentation changes.
- Mention existing unrelated warnings in the final summary.

Placement

| Scope | Location |
| --- | --- |
| Personal defaults | ~/.codex/AGENTS.md |
| Temporary personal override | ~/.codex/AGENTS.override.md |
| Whole repository | AGENTS.md in the Git root |
| Subtree-specific rules | AGENTS.md or AGENTS.override.md in that directory |

Files closer to the current working directory are loaded later and can override broader guidance.
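
To see which files would apply to a given working directory, the placement table can be approximated with a short script. This is a sketch of the ordering described above, not Codex's actual discovery code: the function name is hypothetical, and it assumes that an `AGENTS.override.md` takes the place of the `AGENTS.md` in the same directory.

```python
from pathlib import Path


def instruction_files(home: Path, repo_root: Path, cwd: Path) -> list[Path]:
    """List existing instruction files, broadest scope first.

    Raises ValueError if cwd is not inside repo_root.
    """
    repo_root = repo_root.resolve()
    relative = cwd.resolve().relative_to(repo_root)

    # Personal scope first, then every directory from the Git root to cwd.
    directories = [home / ".codex", repo_root]
    current = repo_root
    for part in relative.parts:
        current = current / part
        directories.append(current)

    found: list[Path] = []
    for directory in directories:
        # Assumption: the override file wins over AGENTS.md in the same directory.
        for name in ("AGENTS.override.md", "AGENTS.md"):
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
                break
    return found
```

Later entries in the returned list sit closer to the working directory, so they are the ones that can override broader guidance.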

5. Skills: reusable workflows

Create a skill when Codex should follow the same workflow again and again.

Good skill examples:

  • "Review Docusaurus MDX changes for broken links and build risks."
  • "Prepare a release note from merged PRs."
  • "Migrate one API endpoint to the new auth helper."
  • "Audit a Terraform change for IAM risk."
  • "Create a plugin scaffold with the required manifest."

Minimal structure:

.agents/
  skills/
    docs-review/
      SKILL.md

SKILL.md:

---
name: docs-review
description: Review Docusaurus MDX changes for broken links, sidebar metadata, build risks, and translation drift.
---

1. Inspect changed MDX files and nearby docs conventions.
2. Check links, headings, frontmatter, admonitions, code fences, and i18n counterparts.
3. Run the documented validation command when practical.
4. Report correctness and build issues before style suggestions.

Writing good skill descriptions

Codex can choose a skill implicitly from its description, so write the description like a trigger:

  • name the exact task,
  • include important tools or file types,
  • mention boundaries,
  • keep it short enough to survive truncation when many skills exist.

Good:

description: Review Docusaurus MDX changes for links, frontmatter, sidebars, i18n, and build failures.

Weak:

description: Helps with docs.
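
The gap between the good and the weak description above can be caught with a rough lint. The thresholds and the list of vague phrases below are assumptions made for this sketch; Codex does not publish such limits here.

```python
# Hypothetical heuristics; tune them to your own skill collection.
VAGUE_PHRASES = ("helps with", "assists with", "various", "general purpose")
MIN_WORDS = 6
MAX_CHARS = 200


def lint_description(description: str) -> list[str]:
    """Return human-readable problems found in a skill description."""
    text = description.strip()
    problems: list[str] = []
    if len(text.split()) < MIN_WORDS:
        problems.append("too short to name a task, file types, or boundaries")
    if len(text) > MAX_CHARS:
        problems.append("long enough to risk truncation when many skills exist")
    lowered = text.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: {phrase!r}")
    return problems


print(lint_description("Helps with docs."))
```

A check like this only flags obviously weak triggers. Whether a description names the right task still needs a human read.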

When to add scripts or references

Keep a skill instruction-only until it needs deterministic behavior.

| Add this | When |
| --- | --- |
| references/ | The workflow needs stable examples, policies, templates, or schemas |
| scripts/ | The workflow needs repeatable parsing, generation, validation, or API calls |
| assets/ | The workflow needs templates, images, fixtures, or starter files |
| agents/openai.yaml | You want Codex app metadata, tool dependencies, or explicit invocation policy |

Prefer scripts for mechanical checks and instructions for judgment. A script can find broken frontmatter quickly; Codex should still decide whether the resulting page reads well.
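
As an illustration of a mechanical check that belongs in `scripts/`, the sketch below looks for missing or malformed frontmatter in MDX files. It is deliberately naive (top-level `key: value` lines only, no YAML parser), and the required `title` key is a hypothetical repo policy, not a Docusaurus requirement.

```python
import re
from pathlib import Path

# Frontmatter is the block between two '---' lines at the very top of the file.
FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---\r?\n", re.DOTALL)


def frontmatter_problems(text: str, required: tuple[str, ...] = ("title",)) -> list[str]:
    """Return problems found in the frontmatter of one document."""
    match = FRONTMATTER.match(text)
    if match is None:
        return ["missing or unterminated frontmatter block"]
    keys: set[str] = set()
    problems: list[str] = []
    for line in match.group(1).splitlines():
        # Skip blanks, comments, nested values, and list items.
        if not line.strip() or line.lstrip().startswith("#") or line[0] in " \t-":
            continue
        key, separator, _ = line.partition(":")
        if separator:
            keys.add(key.strip())
        else:
            problems.append(f"not a 'key: value' line: {line!r}")
    problems.extend(f"missing key: {key}" for key in required if key not in keys)
    return problems


def check_tree(root: Path) -> dict[Path, list[str]]:
    """Check every .mdx file under root and return only the files with problems."""
    report: dict[Path, list[str]] = {}
    for path in sorted(root.rglob("*.mdx")):
        problems = frontmatter_problems(path.read_text(encoding="utf-8"))
        if problems:
            report[path] = problems
    return report
```

The script answers "is the frontmatter well-formed?" and nothing more. Judging whether the page reads well stays with Codex and the reviewer.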

6. Custom subagents: parallel specialists

Create a custom subagent when a task benefits from a named worker with its own instructions. This is useful for parallel review, exploration, or analysis.

Good subagent examples:

  • reviewer: correctness, regressions, security, and missing tests.
  • explorer: read-heavy codebase mapping before implementation.
  • docs-auditor: docs structure, i18n drift, broken links, and MDX risks.
  • test-triage: failing test logs, likely root cause, and smallest fix path.
  • security-checker: secret exposure, auth boundaries, injection, and unsafe commands.

Personal agents live under:

~/.codex/agents/

Project agents live under:

.codex/agents/

Example custom agent:

name = "docs-auditor"
description = "Review documentation changes for structure, broken links, i18n drift, MDX issues, and validation gaps."
developer_instructions = """
Review documentation like a maintainer.
Prioritize broken builds, broken links, incorrect routing, stale translations, and missing validation.
Do not rewrite prose unless the user explicitly asks for edits.
Return findings with file references and a short validation summary.
"""
model_reasoning_effort = "medium"
sandbox_mode = "workspace-write"

Use it from a prompt:

Spawn a docs-auditor subagent to review this branch for Docusaurus and i18n issues. Wait for the result, then summarize only actionable findings with file references.

When not to use subagents

Avoid subagents when:

  • the task is small,
  • edits would overlap heavily,
  • you need one tightly controlled implementation thread,
  • token cost matters more than parallel speed,
  • the workflow needs sensitive approvals that are easier to monitor in one thread.

Subagents inherit the current sandbox and approval controls. Treat them as workers inside the same safety boundary, not as a way to bypass it.

7. MCP: tools and live context

Use MCP when an agent needs capabilities it cannot get from files alone.

Good MCP uses:

  • search internal docs,
  • read GitHub issues or PRs,
  • inspect a browser,
  • query Sentry,
  • access Figma designs,
  • call internal developer tools,
  • fetch official documentation.

MCP is different from a skill:

| Skill | MCP |
| --- | --- |
| Tells Codex how to work | Gives Codex tools or data |
| Lives as files | Runs as a server or connector |
| Best for repeatable procedures | Best for live systems and private context |
| Can call scripts | Exposes typed tools |

Example config:

[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp"]
startup_timeout_sec = 20
tool_timeout_sec = 60

Good MCP server instructions should explain workflow constraints, rate limits, and tool boundaries. Keep the first part self-contained because Codex may use it while deciding whether to call the server.

8. Plugins: package and share capabilities

Create a plugin when a skill or integration should be installable instead of copied manually.

Use plugins for:

  • sharing skills across teammates,
  • bundling multiple skills,
  • bundling MCP configuration,
  • shipping app mappings or UI metadata,
  • distributing lifecycle hooks,
  • creating a curated internal marketplace.

Minimal plugin shape:

my-plugin/
  .codex-plugin/
    plugin.json
  skills/
    docs-review/
      SKILL.md

plugin.json:

{
  "name": "docs-workflows",
  "version": "1.0.0",
  "description": "Reusable documentation review workflows.",
  "skills": "./skills/"
}

Start with local skills while you are still designing the workflow. Package a plugin once the workflow is stable enough to share.
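
When a workflow is ready to package, a small pre-flight check can confirm that the plugin folder matches the minimal shape shown above. The function below is a sketch: which manifest keys are mandatory is an assumption here, so adjust the list to whatever your plugin tooling actually enforces.

```python
import json
from pathlib import Path


def check_plugin(plugin_root: Path) -> list[str]:
    """Return problems found in a plugin folder (an empty list means it looks fine)."""
    manifest_path = plugin_root / ".codex-plugin" / "plugin.json"
    if not manifest_path.is_file():
        return ["missing .codex-plugin/plugin.json"]

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems: list[str] = []

    # Assumed required keys, taken from the minimal example manifest.
    for key in ("name", "version", "description"):
        if not manifest.get(key):
            problems.append(f"manifest key missing or empty: {key}")

    # Every skill folder should contain a SKILL.md file.
    skills_dir = plugin_root / manifest.get("skills", "./skills/")
    if not skills_dir.is_dir():
        problems.append("skills directory not found")
    else:
        for skill in sorted(p for p in skills_dir.iterdir() if p.is_dir()):
            if not (skill / "SKILL.md").is_file():
                problems.append(f"{skill.name}: missing SKILL.md")
    return problems
```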

9. OpenAI Agents SDK: build agentic products

Use the OpenAI Agents SDK when you are not just configuring Codex, but building your own agentic application.

Choose the Agents SDK when you need:

  • application-owned tools,
  • handoffs between agents,
  • guardrails,
  • traces and observability,
  • repeatable production behavior,
  • product-specific auth and data access,
  • tests and deployment around the agent.

Minimal pattern:

from agents import Agent, Runner, function_tool


@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by its ID."""
    # Placeholder: a real tool would query the application database.
    return "Order data from the application database"


support_agent = Agent(
    name="Support Agent",
    instructions=(
        "Answer using company policy. "
        "Use tools when the user asks about an order. "
        "Escalate when policy is missing."
    ),
    tools=[lookup_order],
)

# Run the agent synchronously for a single user message.
result = Runner.run_sync(
    support_agent,
    "Summarize order A123 and suggest the next support action.",
)

print(result.final_output)

The key difference: Codex agents help you develop software. Agents SDK agents are software you develop.
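
Because SDK agents are software you own, their tools deserve ordinary unit tests. One possible approach, sketched below, keeps the tool logic in a plain function that runs without the SDK or an API key. The `ORDERS` dictionary and `lookup_order_text` are hypothetical stand-ins for a real data layer.

```python
# Hypothetical in-memory data; a real application would query its database.
ORDERS = {"A123": {"status": "delayed", "carrier": "ExampleShip"}}


def lookup_order_text(order_id: str) -> str:
    """Return a short, model-readable summary of one order."""
    key = order_id.strip().upper()
    order = ORDERS.get(key)
    if order is None:
        # A clear miss lets the agent escalate instead of guessing.
        return f"No order found for {key}. Escalate to a human."
    return f"Order {key}: status={order['status']}, carrier={order['carrier']}"


# Plain assertions are enough to pin the behavior down.
assert lookup_order_text(" a123 ") == "Order A123: status=delayed, carrier=ExampleShip"
assert lookup_order_text("Z999").startswith("No order found")
```

In the application itself, the body of the decorated `lookup_order` tool would then delegate to this function, so the tested logic and the registered tool stay in sync.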

10. Creation workflow

Use this sequence for every new agent capability.

  1. Write the job statement.
  2. Choose the smallest surface from the decision guide.
  3. Draft the instructions in plain language.
  4. Add only the tools and files the workflow truly needs.
  5. Test with two easy prompts and one adversarial or ambiguous prompt.
  6. Check whether the agent asks for clarification at the right time.
  7. Verify outputs with build, tests, source citations, or review.
  8. Move from prompt to AGENTS.md, skill, subagent, MCP, plugin, or SDK only after the behavior proves useful.

Example job statement:

When documentation files change, review the branch for Docusaurus build risks, broken links, stale German translations, and source-language policy violations. Return actionable findings first and mention validation performed.

That statement probably becomes a skill. If you want it to run in parallel with security and implementation review, also create a custom subagent that uses the skill.

11. Quality checklist

Before calling an agent "ready", check:

  • The name is specific and stable.
  • The description says when it should and should not run.
  • Instructions define inputs, outputs, tools, and boundaries.
  • Code examples and commands are in English.
  • It has a verification path.
  • It does not require secrets in source files.
  • It fails safely when data is missing.
  • It avoids broad refactors unless explicitly asked.
  • It can be tested with a small prompt.
  • Team-scoped behavior lives in the repo, not only in one person's home directory.

For this Docusaurus repo, use this layering:

| Need | Recommended location |
| --- | --- |
| Source docs must be English and translations live in i18n | Root AGENTS.md |
| Repeatable docs review workflow | .agents/skills/docs-review/SKILL.md |
| Parallel docs specialist for larger branches | .codex/agents/docs-auditor.toml |
| Official OpenAI docs lookup | MCP server or existing OpenAI docs skill |
| Shared team installation later | Plugin after the workflow stabilizes |

Start with AGENTS.md plus one docs-review skill. Add custom subagents only when the branch is large enough that parallel review saves time.
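
To see how far a checkout has progressed along this layering, a short script can report which of the file-based pieces exist. The paths come from the table above; the function name and the report format are illustrative only.

```python
from pathlib import Path

# File-based layers from the table; MCP and plugin rows have no fixed path.
RECOMMENDED_LAYOUT = {
    "AGENTS.md": "repo-wide rules",
    ".agents/skills/docs-review/SKILL.md": "docs review skill",
    ".codex/agents/docs-auditor.toml": "docs auditor subagent",
}


def layering_report(repo_root: Path) -> dict[str, bool]:
    """Map each recommended path to whether it exists under repo_root."""
    return {relative: (repo_root / relative).is_file() for relative in RECOMMENDED_LAYOUT}


if __name__ == "__main__":
    for relative, present in layering_report(Path.cwd()).items():
        marker = "ok     " if present else "missing"
        print(f"{marker}  {relative}  ({RECOMMENDED_LAYOUT[relative]})")
```

A missing subagent file is not a problem in itself, since the advice above is to add it only once branches are large enough to need parallel review.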

13. Useful prompts

Create a skill:

Create a repo-local skill named docs-review. It should review changed MDX files for Docusaurus build risks, broken links, sidebar metadata, i18n drift, and source-language policy violations. Keep the skill instruction-only for now.

Create a custom subagent:

Create a project custom agent named docs-auditor. It should review documentation changes, use medium reasoning, avoid rewriting prose by default, and return actionable findings with file references.

Use subagents for review:

Review this branch with parallel subagents. Spawn one reviewer for correctness, one docs-auditor for Docusaurus and i18n, and one test-triage agent for validation gaps. Wait for all results and summarize only actionable findings.

Decide the right surface:

I want Codex to do this repeatedly: <workflow>. Should this be a prompt, AGENTS.md rule, skill, custom subagent, MCP server, plugin, or Agents SDK app? Recommend the smallest durable option and explain why.