OpenTelemetry for agents

Use OpenTelemetry for agents without losing the debugging evidence.

OpenTelemetry gives production teams a standard way to emit spans. Opswald adds the agent-specific evidence layer: prompts, context, tool decisions, retries, outputs, and replayable failure state.

By Opswald Team, OpenTelemetry and AI agent tracing specialists • Last updated May 18, 2026

agent-run.trace

01 Prompt + retrieved context captured

02 Planner chose tool with incomplete state

03 Tool output contradicted the next decision

04 Replay pins the first divergent step

Direct answer

What is OpenTelemetry for AI agents?

OpenTelemetry for AI agents means using standard traces, spans, metrics, and logs to correlate agent work across model calls, retrievers, tools, MCP servers, services, queues, and side effects. For debugging, teams also need agent-specific evidence—prompts, retrieved context, tool schemas, model outputs, retries, and replay fixtures—attached to those traces.

3 OpenTelemetry signals most agent teams correlate first: traces, metrics, and logs in the OpenTelemetry signals model

4 agent boundaries that need span context: model calls, retrieval, tool calls, and downstream service writes

1 trace ID to preserve across agent orchestration and tools, following the OpenTelemetry trace context propagation model

OpenTelemetry traces: Defines traces and spans for reconstructing work across distributed services.

OpenTelemetry logs: Explains log records and correlation with traces for production diagnostics.

OpenTelemetry GenAI conventions: Semantic conventions for describing generative AI operations in telemetry.

What breaks

Agent failures are rarely a single stack trace.

They happen across prompts, memory, retrieved documents, tool schemas, model choices, retries, and side effects. Opswald is built to make that chain inspectable instead of asking engineers to reconstruct it from logs.

Spans show timing, not intent

A span can show that a tool call happened. It usually does not explain why the agent selected that tool with the context it had.

Attributes get too thin

Teams often drop prompts, retrieved documents, and tool outputs because they are large, sensitive, or hard to model as simple attributes.

Semantic conventions are still evolving

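AI agent workflows need stable names for model calls, tool calls, memory, retrieval, MCP servers, retries, and side effects. Until the conventions settle, attribute names can differ between teams and releases, which makes traces harder to compare.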

Replay is outside normal tracing

Debugging agents requires safe reproduction with pinned context and mocked mutations, not just a historical waterfall chart.

A practical OpenTelemetry pattern for agents

Keep OpenTelemetry as the transport and correlation layer, then attach rich agent evidence where engineers need it.

Correlate: Use trace IDs across HTTP requests, queues, model calls, tools, MCP servers, and background jobs.
Annotate: Add agent-aware events for planning, retrieval, memory reads, tool selection, validation, retries, and final decisions.
Protect: Redact or reference sensitive prompts, documents, and tool outputs instead of forcing everything into span attributes.
Replay: Persist enough evidence to reproduce the failed path after the trace has explained where to look.
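
The Correlate step can be sketched with the W3C Trace Context `traceparent` header, which is the format OpenTelemetry propagates by default. This is a minimal, standard-library-only illustration, not Opswald's implementation: `SpanContext`, `parse_traceparent`, and `child_of` are hypothetical helper names, and it accepts only version `00` headers. Production code would normally rely on the OpenTelemetry SDK's propagators rather than hand-parsing.

```python
import re
import secrets
from dataclasses import dataclass

# W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>"
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical container for the IDs that travel with an agent step."""
    trace_id: str       # shared by every step of one agent run
    span_id: str        # unique per step
    flags: str = "01"   # "01" marks the trace as sampled

    def to_header(self) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{self.flags}"


def parse_traceparent(header: str) -> SpanContext:
    """Parse an incoming traceparent header, rejecting malformed values."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError(f"all-zero id in traceparent: {header!r}")
    return SpanContext(trace_id, span_id, flags)


def child_of(parent: SpanContext) -> SpanContext:
    """New span ID, same trace ID: the step stays inside the parent's trace."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.flags)


# The agent receives a request, then calls a tool over HTTP.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
request_ctx = parse_traceparent(incoming)
tool_ctx = child_of(request_ctx)
outgoing_headers = {"traceparent": tool_ctx.to_header()}
```

Every tool call, queue message, and MCP request that forwards `outgoing_headers` keeps the same trace ID, so the whole agent run can be found from any one of its steps.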

otel-agent-trace.md

trace_id: 4f2c...
span: agent.plan selected refund_lookup
span: tool.call refund_lookup(customer_id)
event: validation skipped idempotency_key
span: tool.call refund_create duplicated side effect
replay: pinned policy chunk + stubbed refund API

Practical debugging

Where OpenTelemetry needs an agent debugging layer

Prompt and context evidence

Tie every model span back to the exact instructions, retrieved content, and memory used for that decision.

Tool-call causality

Connect tool arguments and outputs to the model decision and downstream side effects they triggered.
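
One way to picture tool-call causality is to walk parent links from a side effect back to the decision that triggered it. The sketch below assumes spans have already been exported as simple records. `SpanRecord`, `causal_chain`, and the parent relationships in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical exported span: just enough fields to follow parent links."""
    span_id: str
    parent_id: Optional[str]
    name: str


def causal_chain(spans: List[SpanRecord], span_id: str) -> List[str]:
    """Return span names from the root down to the given span."""
    by_id: Dict[str, SpanRecord] = {s.span_id: s for s in spans}
    chain: List[str] = []
    seen = set()  # guards against cycles in malformed data
    current = by_id.get(span_id)
    while current is not None and current.span_id not in seen:
        seen.add(current.span_id)
        chain.append(current.name)
        current = by_id.get(current.parent_id) if current.parent_id else None
    chain.reverse()
    return chain


spans = [
    SpanRecord("a1", None, "agent.run"),
    SpanRecord("b2", "a1", "agent.plan"),
    SpanRecord("c3", "b2", "tool.call refund_lookup"),
    SpanRecord("d4", "b2", "tool.call refund_create"),
]

print(causal_chain(spans, "d4"))
# ['agent.run', 'agent.plan', 'tool.call refund_create']
```

The chain only shows structure. Explaining why the planner chose the tool still needs the prompt, context, and tool output attached to each of those spans.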

Failure fixtures

Convert production traces into regression tests for prompts, tools, and orchestration code.
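
As a rough sketch of what such a fixture could look like, the example below pins the retrieved context and replaces tools with stubs that return recorded outputs and log calls instead of mutating anything. `ReplayFixture` and `refund_step` are hypothetical names, and the refund logic is a stand-in for real orchestration code, not a description of how Opswald builds fixtures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ReplayFixture:
    """Hypothetical fixture: pinned context plus tool outputs captured from a failed run."""
    pinned_context: str
    recorded_outputs: Dict[str, str]
    calls: List[Tuple[str, dict]] = field(default_factory=list)

    def call_tool(self, name: str, **args) -> str:
        # Stub: log the call and return the recorded output; no real API is touched.
        self.calls.append((name, args))
        if name not in self.recorded_outputs:
            raise KeyError(f"no recorded output for tool {name!r}")
        return self.recorded_outputs[name]


def refund_step(policy: str, call_tool: Callable[..., str], idempotency_key: str) -> str:
    """Stand-in for the orchestration code under test."""
    status = call_tool("refund_lookup", customer_id="c-1")
    if status == "already_refunded" and "one refund per order" in policy:
        return "skipped"
    call_tool("refund_create", customer_id="c-1", idempotency_key=idempotency_key)
    return "created"


fixture = ReplayFixture(
    pinned_context="Policy 4.2: one refund per order.",
    recorded_outputs={"refund_lookup": "already_refunded", "refund_create": "ok"},
)
result = refund_step(fixture.pinned_context, fixture.call_tool, idempotency_key="k-123")

# Regression check: the duplicate refund must not be attempted on replay.
assert result == "skipped"
assert all(name != "refund_create" for name, _ in fixture.calls)
```

Because the fixture records every call, a test can assert on attempted side effects as well as on the final answer.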

Comparison

Opswald vs traditional observability for AI agents

Capability: Trace correlation
Traditional logs and APM: OpenTelemetry connects services and timing across a request, but agent-specific evidence may live outside the span waterfall.
Opswald: Keeps trace IDs while attaching prompts, context, tool outputs, and replay state to the agent decision graph.

Capability: Sensitive evidence
Traditional logs and APM: Teams often drop large prompts, documents, or tool outputs from span attributes to avoid leakage or cardinality problems.
Opswald: Stores redacted references and evidence bundles so authorized engineers can inspect what affected the decision.

Capability: Replay
Traditional logs and APM: A distributed trace explains what happened historically but does not make the failed agent path reproducible.
Opswald: Turns the traced failure into a pinned replay fixture with safe tool stubs and side-effect receipts.

Keep reading

Related Opswald guides

AI agent tracing: See what evidence belongs in every production agent trace.

MCP debugging: Debug agent failures that cross MCP server boundaries.

Observability cannot debug agents: Why normal observability needs an agent-specific layer.

Why agent failures are invisible: See why agent failures hide across context, tools, and delayed outcomes.

Docs: Explore Opswald concepts and integration guidance.

FAQ

Questions teams ask before instrumenting agents

Should AI agent teams still use OpenTelemetry?

Yes. OpenTelemetry is a good standard for correlation and spans. Opswald focuses on the agent-specific debugging evidence and replay workflow on top.

Can prompts be stored safely?

Teams should redact, hash, sample, or store references where needed. The key is preserving enough evidence for authorized engineers to debug the failure.
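
A minimal sketch of the redact-then-reference approach, assuming a simple email pattern and an in-memory dictionary standing in for an access-controlled evidence store. `redact`, `evidence_ref`, and the `evidence.*` attribute names are hypothetical. Real redaction needs patterns suited to the data the agent actually handles.

```python
import hashlib
import re
from typing import Dict

# Deliberately simple pattern for illustration; it will not catch every address.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace email addresses with a placeholder before anything is stored."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def evidence_ref(text: str, store: Dict[str, str]) -> Dict[str, object]:
    """Store the redacted text out of band and return small, span-safe attributes."""
    redacted = redact(text)
    payload = redacted.encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    store[digest] = redacted  # stand-in for an access-controlled evidence store
    return {"evidence.sha256": digest, "evidence.bytes": len(payload)}


store: Dict[str, str] = {}
ref = evidence_ref("Refund jane@example.com under policy 4.2", store)

print(store[ref["evidence.sha256"]])
# Refund [REDACTED_EMAIL] under policy 4.2
```

The span carries only the digest and size, while the redacted text stays in a store where access can be controlled and audited.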

What should be added to spans for agent debugging?

Record model calls, references to retrieved context, memory reads, tool schemas, tool arguments and outputs, retries, errors, and downstream writes, and tie each one to the trace ID of the agent run.
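
OpenTelemetry attribute values are limited to strings, booleans, numbers, and arrays of a single primitive type, so nested agent evidence has to be flattened before it can go on a span. The sketch below shows one possible flattening. `to_attributes`, the `agent.` prefix, and the 256-character cap are assumptions made for illustration, not names from any convention.

```python
import json
from typing import Any, Dict

PRIMITIVES = (str, bool, int, float)
MAX_LEN = 256  # hypothetical cap; large payloads belong in referenced evidence


def to_attributes(evidence: Dict[str, Any], prefix: str = "agent") -> Dict[str, Any]:
    """Flatten nested evidence into dotted keys with attribute-safe values."""
    attrs: Dict[str, Any] = {}
    for key, value in evidence.items():
        name = f"{prefix}.{key}"
        if value is None:
            continue  # skip missing values rather than inventing a placeholder
        if isinstance(value, dict):
            attrs.update(to_attributes(value, name))
        elif isinstance(value, PRIMITIVES):
            attrs[name] = value[:MAX_LEN] if isinstance(value, str) else value
        elif (
            isinstance(value, (list, tuple))
            and value
            and isinstance(value[0], PRIMITIVES)
            and all(type(v) is type(value[0]) for v in value)
        ):
            attrs[name] = list(value)  # homogeneous primitive array is allowed
        else:
            # Anything else is serialized and truncated so it stays a plain string.
            attrs[name] = json.dumps(value, default=str)[:MAX_LEN]
    return attrs


evidence = {
    "tool": {"name": "refund_create", "retries": 2},
    "context_refs": ["doc-17", "doc-42"],
    "error": None,
}
print(to_attributes(evidence))
# {'agent.tool.name': 'refund_create', 'agent.tool.retries': 2, 'agent.context_refs': ['doc-17', 'doc-42']}
```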

How is Opswald different from an OpenTelemetry backend?

Opswald uses trace correlation, then adds an agent debugging workflow: decision graphs, evidence bundles, comparison, replay fixtures, and root-cause review.

Where should teams start?

Start by propagating trace context through the agent run, then capture model decisions, tool boundaries, and side-effect receipts for failed production runs.

Debug the next failed agent run with evidence.

Opswald is in early access for teams shipping AI agents that call tools, use MCP servers, or run multi-step workflows in production.
