OpenAI Agents SDK tracing

Trace OpenAI Agents SDK runs beyond the final response.

Opswald helps teams inspect OpenAI Agents SDK workflows across instructions, tool calls, handoffs, guardrails, model outputs, retries, and external side effects, then replay the failed path with evidence pinned.

By Opswald Team, OpenAI Agents SDK tracing specialists • Last updated June 29, 2026

Request Early Access → Read the debugging guides

agent-run.trace

01 Prompt + retrieved context captured

02 Planner chose tool with incomplete state

03 Tool output contradicted the next decision

04 Replay pins the first divergent step
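
As a rough illustration of what "pinning the first divergent step" could mean, the sketch below compares a known-good trace with a failed one and reports the first index where they differ. The trace shape (a list of dicts with `step` and `output` keys) and the function name `first_divergent_step` are assumptions made for this example, not Opswald's or the SDK's format.

```python
from typing import Any, Optional


def first_divergent_step(
    expected: list[dict[str, Any]], actual: list[dict[str, Any]]
) -> Optional[int]:
    """Return the index of the first step where two traces differ, or None."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    # One trace is a strict prefix of the other: divergence starts where it ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


# Hypothetical traces: the tool step returned a different value in the failed run.
baseline = [
    {"step": "prompt", "output": "context captured"},
    {"step": "plan", "output": "invoice_lookup"},
    {"step": "tool", "output": "tier=pro"},
]
failed = [
    {"step": "prompt", "output": "context captured"},
    {"step": "plan", "output": "invoice_lookup"},
    {"step": "tool", "output": "tier=unknown"},
]

divergence = first_divergent_step(baseline, failed)
```

Comparing whole step records keeps the sketch short; a real comparison would likely ignore volatile fields such as timestamps.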

Direct answer

What is OpenAI Agents SDK tracing?

OpenAI Agents SDK tracing records the steps inside an agent workflow: instructions, model generations, tool calls, handoffs, guardrails, errors, retries, and side effects. For production debugging, those traces should connect SDK events to application state and external systems so teams can replay the failed path safely.
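
One way to picture a trace record that connects an SDK event to application state is a small data class serialized as one JSON line per event. This is a minimal sketch under stated assumptions: the `TraceEvent` class, its field names, and the sample values are hypothetical and do not mirror the SDK's own trace schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceEvent:
    """One step in an agent run, joined with the application context around it."""

    run_id: str
    kind: str  # e.g. "generation", "tool_call", "handoff", "guardrail"
    name: str
    inputs: dict[str, Any]
    outputs: dict[str, Any]
    app_state: dict[str, Any] = field(default_factory=dict)
    side_effects: list[str] = field(default_factory=list)


def to_json_line(event: TraceEvent) -> str:
    """Serialize an event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


# Hypothetical event: a tool call recorded with the session state it ran under.
event = TraceEvent(
    run_id="run_001",
    kind="tool_call",
    name="invoice_lookup",
    inputs={"account_id": "acct_123"},
    outputs={"status": "paid"},
    app_state={"session_id": "sess_42", "permissions": ["billing:read"]},
)
line = to_json_line(event)
```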

4 SDK event groups to inspect during incidents: generations, tool calls, handoffs, and guardrails

2 contexts needed for each handoff: state sent by the previous agent and state received by the next agent

0 external writes repeated during safe replay: Opswald replay uses stubs or sandboxed side effects

OpenAI Agents SDK tracing: Documents traces for agent workflows, generations, tool calls, handoffs, guardrails, and custom spans.

OpenAI Agents SDK tools: Explains hosted tools, function tools, and agent-as-tool patterns that need debugging evidence.

OpenAI Agents SDK handoffs: Reference for handoff behavior and context transfer between agents.

What breaks

Agent failures are rarely a single stack trace.

They happen across prompts, memory, retrieved documents, tool schemas, model choices, retries, and side effects. Opswald is built to make that chain inspectable instead of asking engineers to reconstruct it from logs.

Tool choice needs context

When an agent calls the wrong tool, engineers need the instructions, input state, tool schema, model output, and validation result that led there.

Handoffs can obscure blame

A downstream agent may fail because an upstream handoff passed incomplete state or over-compressed context.

Guardrails need evidence

A guardrail failure is easier to fix when it is tied to the model output, tool output, and user-visible behavior it protected.

Production runs are hard to replay

Provider logs do not usually include your application state, external API results, or safe stubs for side effects.

How to trace OpenAI Agents SDK workflows

Capture each agent step as a decision with inputs, outputs, tools, handoffs, guardrails, and effects attached.

Instructions: Record system instructions, developer messages, user input, active agent configuration, and run metadata.
Tools: Capture tool schemas, selected tool, arguments, validation, output, errors, retries, and side effects.
Handoffs: Preserve the context passed between agents and the reason the handoff occurred.
Replay: Pin model inputs and tool outputs so failed paths can become regression tests.
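
The tool-capture step above can be sketched as a plain Python decorator that records arguments, output, and errors for every call. This deliberately avoids the SDK's own tracing API; `traced_tool`, the in-memory `TRACE` list, and the sample `invoice_lookup` body are hypothetical stand-ins.

```python
import functools
from typing import Any, Callable

# In-memory trace buffer; a real system would ship records to durable storage.
TRACE: list[dict[str, Any]] = []


def traced_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the arguments, output, and error of each call to a tool function."""

    @functools.wraps(func)
    def wrapper(**arguments: Any) -> Any:
        record: dict[str, Any] = {
            "tool": func.__name__,
            "arguments": arguments,
            "output": None,
            "error": None,
        }
        try:
            record["output"] = func(**arguments)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Append on success and on failure so failed calls stay visible.
            TRACE.append(record)

    return wrapper


@traced_tool
def invoice_lookup(account_id: str) -> dict[str, str]:
    # Hypothetical tool body; a real tool would call a billing system.
    return {"account_id": account_id, "status": "paid"}


result = invoice_lookup(account_id="acct_123")
```

Keyword-only arguments keep each record self-describing, which makes the trace easier to diff during replay.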

openai-agents-run.trace

agent: support_triage
handoff: billing_agent with missing account tier
tool: invoice_lookup(account_id) returned stale cache
guardrail: confidence_check passed incorrectly
fix: handoff contract + cache freshness check + replay fixture
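
The fix line in the trace above names a handoff contract and a cache freshness check. A minimal sketch of both follows, assuming the billing agent requires `account_id` and `account_tier` and that cache entries carry a timestamp in seconds; the field names, function names, and thresholds are illustrative assumptions.

```python
from typing import Any

# Hypothetical contract: fields the billing agent needs before it can act.
REQUIRED_BILLING_FIELDS = ("account_id", "account_tier")


def missing_handoff_fields(
    payload: dict[str, Any], required: tuple[str, ...] = REQUIRED_BILLING_FIELDS
) -> list[str]:
    """Return required fields that are absent or empty in a handoff payload."""
    return [name for name in required if payload.get(name) in (None, "")]


def is_fresh(cached_at: float, now: float, max_age_seconds: float) -> bool:
    """Return True when a cache entry is no older than max_age_seconds."""
    return (now - cached_at) <= max_age_seconds


# The failing handoff from the trace: the account tier never reached billing_agent.
missing = missing_handoff_fields({"account_id": "acct_123"})
```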

Practical debugging

OpenAI Agents SDK debugging questions Opswald answers

Why this tool?

Connect tool selection back to the instructions, context, and model output that selected it.

Why this handoff?

Inspect what state moved between agents and whether the receiving agent had enough context.

Why did the guardrail fire?

Tie guardrail events to the exact content, tool output, or decision they evaluated.

Comparison

Opswald vs traditional observability for AI agents

Capability: SDK visibility
Traditional logs and APM: SDK tracing shows agent events, but the application state, downstream APIs, and side-effect receipts may sit outside the provider trace.
Opswald: Joins SDK events with product state, tool outputs, permissions, retries, and replay evidence.

Capability: Handoff debugging
Traditional logs and APM: Teams inspect final outputs and individual agent logs to infer what context moved between agents.
Opswald: Shows the handoff reason, transferred state, receiving-agent context, and first unsupported assumption.

Capability: Regression safety
Traditional logs and APM: A prompt or guardrail fix is tested manually against a few examples.
Opswald: Promotes the failed SDK trace into a replay fixture for instructions, tools, handoffs, and guardrails.

Keep reading

Related Opswald guides

AI agent tracing: The core evidence model for production agent traces.
AI agent debugging: Use traces to move from symptom to root cause.
Debug tool calling failures: Find schema, argument, output, and side-effect bugs.
Why observability cannot debug agents: Understand why SDK traces still need application state and replay evidence.
LangSmith alternative: Compare framework-neutral production debugging requirements.

FAQ

Questions teams ask before instrumenting agents

Does this replace OpenAI tracing?

Opswald joins SDK trace events with your full application context. Its focus is production debugging across instructions, tools, handoffs, external APIs, side effects, and replay.

What should be captured from the Agents SDK?

Capture agent config, instructions, model inputs and outputs, tool calls, handoffs, guardrails, errors, retries, and writes to external systems.

How should teams use OpenAI Agents SDK tracing docs during incidents?

Use the tracing docs to identify the SDK event boundary, then attach the production context the docs cannot know: session state, permissions, tool outputs, handoff payloads, guardrail decisions, retries, and side-effect receipts.

How do you debug a wrong tool call?

Inspect the instructions, available tools, selected schema, generated arguments, validation result, tool output, retry behavior, and side effects.
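
To make the "generated arguments" and "validation result" steps concrete, here is a small sketch that checks model-generated arguments against a JSON-Schema-style tool definition. It covers only required fields and a few primitive types. The `validate_arguments` helper and the sample schema are assumptions for illustration, not the SDK's validator.

```python
from typing import Any

# Maps JSON-Schema-style type names to Python types (a small, illustrative subset).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(schema: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return human-readable problems with tool arguments (empty list means valid)."""
    errors: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in properties:
            errors.append(f"unexpected argument: {name}")
            continue
        type_name = properties[name].get("type")
        expected = TYPE_MAP.get(type_name)
        if expected is None:
            continue  # Unknown or unspecified type: nothing to check in this sketch.
        # bool is a subclass of int in Python, so reject it unless a boolean is expected.
        is_bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if is_bool_mismatch or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors


# Hypothetical schema for the invoice_lookup tool.
invoice_lookup_schema = {
    "type": "object",
    "properties": {"account_id": {"type": "string"}},
    "required": ["account_id"],
}
errors = validate_arguments(invoice_lookup_schema, {"account_id": 42})
```

Storing the returned error list next to the tool call in the trace shows whether a bad call came from the model's arguments or from the tool itself.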

What makes handoffs hard to debug?

The receiving agent may act on compressed, stale, or incomplete state. Tracing should show both the handoff reason and the exact context transferred.
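
A simple way to surface compressed or incomplete handoff state is to diff what the sending agent held against what the receiving agent actually got. The sketch below assumes both sides can be captured as flat dictionaries; `handoff_diff` and the sample payloads are hypothetical.

```python
from typing import Any


def handoff_diff(sent: dict[str, Any], received: dict[str, Any]) -> dict[str, list[str]]:
    """Summarize which keys were dropped, changed, or added across a handoff."""
    shared = set(sent) & set(received)
    return {
        "dropped": sorted(set(sent) - set(received)),
        "changed": sorted(key for key in shared if sent[key] != received[key]),
        "added": sorted(set(received) - set(sent)),
    }


# Hypothetical handoff: the tier was dropped and the summary was compressed.
sent_state = {
    "account_id": "acct_123",
    "account_tier": "pro",
    "summary": "Customer disputes the March invoice and was double charged.",
}
received_state = {"account_id": "acct_123", "summary": "Invoice dispute."}

diff = handoff_diff(sent_state, received_state)
```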

Can failed SDK runs become tests?

Yes. Preserve the model inputs, tool outputs, handoff context, guardrail decisions, and side-effect receipts, then replay them as regression fixtures.
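
As a sketch of how a preserved run might become a regression fixture, the example below replays recorded tool outputs through a stub so that no real external call is made. The `ReplayTools` class, the `triage` routing function, and the fixture contents are assumptions for illustration and do not describe Opswald's replay mechanism.

```python
import json
from typing import Any


class ReplayTools:
    """Serve recorded tool outputs instead of calling real external systems."""

    def __init__(self, recorded: list[dict[str, Any]]) -> None:
        self._recorded = {
            self._key(item["tool"], item["arguments"]): item["output"]
            for item in recorded
        }
        self.calls: list[str] = []

    @staticmethod
    def _key(tool: str, arguments: dict[str, Any]) -> str:
        # Sorted keys make the lookup independent of argument order.
        return tool + ":" + json.dumps(arguments, sort_keys=True)

    def call(self, tool: str, arguments: dict[str, Any]) -> Any:
        key = self._key(tool, arguments)
        self.calls.append(key)
        if key not in self._recorded:
            # Failing loudly avoids silently falling through to a live system.
            raise LookupError(f"no recorded output for {key}")
        return self._recorded[key]


def triage(tools: ReplayTools, account_id: str) -> str:
    """Hypothetical routing logic under test: overdue invoices go to billing."""
    invoice = tools.call("invoice_lookup", {"account_id": account_id})
    return "billing_agent" if invoice["status"] == "overdue" else "support_agent"


# Fixture captured from a (hypothetical) failed production run.
fixture = [
    {
        "tool": "invoice_lookup",
        "arguments": {"account_id": "acct_123"},
        "output": {"status": "overdue"},
    }
]
replay = ReplayTools(fixture)
route = triage(replay, "acct_123")
```

Raising on an unrecorded call is a deliberate choice: a replay that quietly reaches a live API would repeat the side effects the fixture is meant to prevent.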

Debug the next failed agent run with evidence.

Opswald is in early access for teams shipping AI agents that call tools, use MCP servers, or run multi-step workflows in production.

Request Early Access →