
Tool calling failures

Debug tool calls without guessing which step broke the run.

Opswald records the model decision, selected tool, arguments, schema, response, retry behavior, and downstream state so engineers can diagnose tool-calling failures quickly.

By Opswald Team, AI agent debugging infrastructure • Last updated May 18, 2026

agent-run.trace
01 Prompt + retrieved context captured
02 Planner chose tool with incomplete state
03 Tool output contradicted the next decision
04 Replay pins the first divergent step
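Pinning the first divergent step comes down to an ordered comparison between the recorded steps and the replayed ones. Below is a minimal Python sketch that assumes each step has already been serialized into a comparable record. The `Step` type and its `kind` labels are hypothetical and are not Opswald's trace format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One recorded agent step (hypothetical shape, for illustration only)."""
    kind: str     # e.g. "prompt", "tool_call", "tool_result"
    payload: str  # serialized content compared between runs


def first_divergent_step(recorded: Sequence[Step], replayed: Sequence[Step]) -> Optional[int]:
    """Return the index of the first step where the two runs differ, or None if they match."""
    for i, (a, b) in enumerate(zip(recorded, replayed)):
        if a != b:
            return i
    if len(recorded) != len(replayed):
        # One run is a prefix of the other, so they diverge where the shorter one stops.
        return min(len(recorded), len(replayed))
    return None


recorded = [
    Step("prompt", "refund order o_123"),
    Step("tool_call", "lookup_order(o_123)"),
    Step("tool_call", "issue_refund(o_123)"),
]
replayed = [
    Step("prompt", "refund order o_123"),
    Step("tool_call", "lookup_order(o_123)"),
    Step("tool_call", "issue_refund(o_456)"),
]
print(first_divergent_step(recorded, replayed))  # 2
```

A linear scan is enough because steps are ordered. Everything before the returned index matched, so the investigation can start at that step.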

What breaks

Agent failures are rarely a single stack trace.

They happen across prompts, memory, retrieved documents, tool schemas, model choices, retries, and side effects. Opswald is built to make that chain inspectable instead of asking engineers to reconstruct it from logs.

Wrong tool selected

The agent had the right capability available but chose a broader, stale, or unsafe tool because the plan looked plausible.
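Selection mistakes are easier to flag when each tool carries a little metadata. The sketch below assumes hypothetical `mutates` and `deprecated` flags on every tool, plus a task-level `needs_mutation` flag. It catches a stale choice, a tool the model was never offered, and a state-changing tool picked for a read-only task. It does not judge whether a tool is "too broad", which still needs human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical per-tool metadata used only for auditing selections."""
    name: str
    mutates: bool            # does the tool change external state?
    deprecated: bool = False


def audit_tool_choice(selected: str, available: list[ToolSpec], needs_mutation: bool) -> list[str]:
    """Return warnings about one tool selection; an empty list means nothing was flagged."""
    by_name = {t.name: t for t in available}
    tool = by_name.get(selected)
    if tool is None:
        return [f"{selected!r} was not in the available tool list"]
    warnings = []
    if tool.deprecated:
        warnings.append(f"{selected!r} is deprecated")
    if tool.mutates and not needs_mutation:
        read_only = [t.name for t in available if not t.mutates and not t.deprecated]
        warnings.append(f"{selected!r} mutates state; read-only alternatives: {read_only}")
    return warnings


tools = [
    ToolSpec("get_order", mutates=False),
    ToolSpec("update_order", mutates=True),
    ToolSpec("legacy_orders_api", mutates=True, deprecated=True),
]
print(audit_tool_choice("update_order", tools, needs_mutation=False))
# ["'update_order' mutates state; read-only alternatives: ['get_order']"]
```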

Malformed arguments

Arguments pass JSON validation in one layer but then fail business validation, drop IDs, or have dates and enums coerced unexpectedly.
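Here is a concrete case of arguments that a type-only check would accept. In the sketch below, every value is a string or a number, yet the call is still wrong. The refund fields and the `ALLOWED_REASONS` enum are hypothetical, chosen to mirror the failures named above.

```python
from datetime import date

ALLOWED_REASONS = {"damaged", "late", "duplicate"}  # hypothetical enum for a refund tool


def business_errors(args: dict) -> list[str]:
    """Checks that a loose, type-only schema pass would not catch (illustrative rules)."""
    errors = []
    if not args.get("order_id"):
        errors.append("order_id is missing; the refund cannot be tied to an order")
    amount = args.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        errors.append(f"amount must be a number, got {amount!r}")
    if args.get("reason") not in ALLOWED_REASONS:
        errors.append(f"reason {args.get('reason')!r} is not one of {sorted(ALLOWED_REASONS)}")
    try:
        date.fromisoformat(str(args.get("requested_on")))
    except ValueError:
        errors.append(f"requested_on {args.get('requested_on')!r} is not an ISO date (YYYY-MM-DD)")
    return errors


# Every value is a string or number, so a schema that only checks types accepts this.
loose = {"customer_id": "c_42", "amount": "49.90", "reason": "Damaged", "requested_on": "05/18/2026"}
for err in business_errors(loose):
    print(err)
```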

Missing or partial output

Timeouts, truncated pagination, empty results, and MCP transport errors reach the model as context that looks authoritative.
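One way to keep a partial result from reading as the whole truth is to make the tool wrapper state its own completeness. The sketch below assumes a hypothetical `ToolResult` shape with a four-value `status`. A timeout becomes an explicit `error`, and a page that still has a next cursor becomes `partial`. Neither passes as an ordinary answer.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolResult:
    """Tool output plus an explicit statement of how complete it is (hypothetical shape)."""
    status: str                 # "ok", "partial", "empty", or "error"
    data: list[Any] = field(default_factory=list)
    note: Optional[str] = None  # completeness hint passed to the model with the data


def wrap_page(items: list[Any], next_cursor: Optional[str], error: Optional[str] = None) -> ToolResult:
    """Label one page of raw results so gaps are not mistaken for the full answer."""
    if error is not None:
        return ToolResult("error", [], f"tool failed ({error}); this is not an empty result")
    if next_cursor is not None:
        return ToolResult("partial", items, f"more results exist; fetch cursor {next_cursor!r}")
    if not items:
        return ToolResult("empty", [], "the query succeeded and matched nothing")
    return ToolResult("ok", items)


print(wrap_page([], None, error="MCP transport timeout").status)  # error
print(wrap_page(["o_1", "o_2"], next_cursor="page-2").status)     # partial
```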

Silent side effects

A tool succeeds, retries, or partially mutates state before the agent decides what to do next.
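The retry case can be reproduced with a small simulation. The `FlakyRefundAPI` below is hypothetical. It commits the refund and then loses the reply. A naive retry wrapper sees only the timeout, so it sends the request again and reports a clean `200 OK` even though two refunds now exist.

```python
from typing import Any, Callable


class FlakyRefundAPI:
    """Hypothetical API that commits the refund but loses the first reply."""

    def __init__(self) -> None:
        self.refunds: list[tuple[str, float]] = []
        self.calls = 0

    def issue_refund(self, order_id: str, amount: float) -> str:
        self.calls += 1
        self.refunds.append((order_id, amount))  # the mutation is committed first...
        if self.calls == 1:
            raise TimeoutError("reply lost")    # ...then the caller never hears about it
        return "200 OK"


def call_with_retry(fn: Callable[..., Any], *args: Any, attempts: int = 3) -> Any:
    """Naive retry: resend on any timeout, without an idempotency key."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except TimeoutError:
            if attempt == attempts - 1:
                raise


api = FlakyRefundAPI()
print(call_with_retry(api.issue_refund, "o_123", 49.9))  # 200 OK
print(api.refunds)  # [('o_123', 49.9), ('o_123', 49.9)]
```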

Tool-call debugging loop

Debugging starts before the exception. Compare what the model intended to do with the tool contract and the state returned to the next step.

  1. Inspect choice: Review the prompt, tool descriptions, available tools, and reasoning context around the selected action.
  2. Validate args: Check generated arguments against the schema, business rules, permissions, and idempotency requirements.
  3. Verify output: Confirm the tool response represented real state and was not partial, stale, truncated, or retried incorrectly.
  4. Trace impact: Follow how the agent interpreted the output and whether subsequent decisions were justified by evidence.

tool-call-trace.json
{
  "tool": "issue_refund",
  "args": ["customer_id", "amount", "reason"],
  "missing": ["order_id", "idempotency_key"],
  "response": "200 OK, but a retry created a second mutation",
  "fix": ["schema", "replay fixture", "side-effect receipt"]
}
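The first three steps of the loop can be partly automated over a recorded call. The sketch below assumes a trace dict with hypothetical `available_tools`, `args`, `status`, and `mutations` fields, and it uses the `issue_refund` case from the trace above. The fourth step, tracing impact, still means reading how the next decision used the output.

```python
def review_tool_call(trace: dict, required_args: set[str]) -> list[str]:
    """Return findings for the inspect, validate, and verify steps of one recorded call."""
    findings = []
    # Inspect choice: the model should only pick from tools it was actually offered.
    if trace["tool"] not in trace["available_tools"]:
        findings.append(f"inspect choice: {trace['tool']!r} was not offered to the model")
    # Validate args: required fields the model did not emit (iterating a dict yields its keys).
    missing = sorted(required_args.difference(trace["args"]))
    if missing:
        findings.append(f"validate args: missing {missing}")
    # Verify output: a single successful call should correspond to exactly one mutation.
    if trace["status"] == 200 and trace["mutations"] > 1:
        findings.append(f"verify output: 200 OK but {trace['mutations']} mutations recorded")
    return findings


trace = {
    "tool": "issue_refund",
    "available_tools": ["lookup_order", "issue_refund"],
    "args": {"customer_id": "c_42", "amount": 49.9, "reason": "damaged"},
    "status": 200,
    "mutations": 2,
}
required = {"customer_id", "order_id", "amount", "reason", "idempotency_key"}
for finding in review_tool_call(trace, required):
    print(finding)
```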

Practical debugging

Common fixes after a tool-call investigation

Tighten schemas

Add required IDs, enum descriptions, units, and validation errors the model can recover from.
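A tightened refund schema might look like the one below. It uses standard JSON Schema keywords (`type`, `required`, `properties`, `enum`, `description`, `additionalProperties`), and the fields themselves are hypothetical. The validator is a hand-rolled stand-in for a real JSON Schema validator and handles only these keywords. Its job is to phrase each error so the model can correct its next call.

```python
REFUND_SCHEMA = {
    "type": "object",
    "required": ["order_id", "amount_cents", "reason", "idempotency_key"],
    "properties": {
        "order_id": {"type": "string", "description": "ID returned by lookup_order, e.g. 'o_123'"},
        "amount_cents": {"type": "integer", "description": "refund amount in cents, not dollars"},
        "reason": {
            "type": "string",
            "enum": ["damaged", "late", "duplicate"],
            "description": "lowercase reason code",
        },
        "idempotency_key": {"type": "string", "description": "reuse the same key when retrying"},
    },
    "additionalProperties": False,
}


def recoverable_errors(args: dict, schema: dict = REFUND_SCHEMA) -> list[str]:
    """Validate args and phrase each error so the model can fix its next call."""
    props = schema["properties"]
    errors = []
    for name in schema["required"]:
        if name not in args:
            errors.append(f"missing {name!r}: {props[name]['description']}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unknown argument {name!r}; allowed: {sorted(props)}")
        elif spec["type"] == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            errors.append(f"{name!r} must be an integer ({spec['description']}), got {value!r}")
        elif spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name!r} must be a string, got {value!r}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name!r} must be one of {spec['enum']}, got {value!r}")
    return errors


for err in recoverable_errors({"order_id": "o_123", "amount_cents": 49.9, "reason": "Damaged"}):
    print(err)
```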

Make side effects explicit

Capture dry-run modes, idempotency keys, confirmation gates, and mutation receipts.
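Here is a minimal sketch of a mutating tool that makes its side effects explicit, assuming a hypothetical in-memory `RefundService`. It supports a dry run that changes nothing, an idempotency key that turns a retry into a lookup, and a receipt returned for every call. Confirmation gates are not shown. The service also rejects a reused key with different arguments, a common choice in idempotent APIs, though not the only one.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    receipt_id: str
    order_id: str
    amount_cents: int
    applied: bool  # False for dry runs


class RefundService:
    """Hypothetical in-memory refund service with dry runs and idempotency keys."""

    def __init__(self) -> None:
        self._by_key: dict[str, Receipt] = {}

    def issue_refund(self, order_id: str, amount_cents: int,
                     idempotency_key: str, dry_run: bool = False) -> Receipt:
        if dry_run:
            # Describe what would happen without mutating anything.
            return Receipt("dry-run", order_id, amount_cents, applied=False)
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # A retry with the same key gets the original receipt, not a second refund.
            if (existing.order_id, existing.amount_cents) != (order_id, amount_cents):
                raise ValueError("idempotency key reused with different arguments")
            return existing
        receipt = Receipt(str(uuid.uuid4()), order_id, amount_cents, applied=True)
        self._by_key[idempotency_key] = receipt
        return receipt


svc = RefundService()
preview = svc.issue_refund("o_123", 4990, "key-1", dry_run=True)
first = svc.issue_refund("o_123", 4990, "key-1")
retry = svc.issue_refund("o_123", 4990, "key-1")  # same key: no second refund
print(preview.applied, first.applied, first == retry)  # False True True
```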

Replay with fixtures

Freeze the prompt, tool response, and state transition so a fixed regression stays fixed.
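Below is a sketch of a replay regression test built from a frozen fixture. The fixture contents are hypothetical, and `decide_next_step` is a deterministic stand-in for the agent's decision logic. In a real agent, that decision comes from a model call, which would need a recorded completion or pinned model settings to replay reliably. The tool itself is never called. Its response comes from the fixture, so the test is repeatable.

```python
import json

# Hypothetical fixture frozen from the failing run: prompt, tool response, and the
# state transition the fixed agent is expected to make next.
FIXTURE = json.loads("""
{
  "prompt": "Refund order o_123, it arrived damaged",
  "tool": "lookup_order",
  "tool_response": {"order_id": "o_123", "status": "refunded", "refund_count": 1},
  "expected_next": "reply_already_refunded"
}
""")


def decide_next_step(prompt: str, tool_response: dict) -> str:
    """Stand-in for the agent's post-lookup decision (hypothetical).
    The fix under test: never issue a second refund for an already-refunded order."""
    if tool_response.get("status") == "refunded":
        return "reply_already_refunded"
    return "issue_refund"


def test_replay_refund_fixture() -> None:
    # Feed the frozen tool response instead of calling the live tool.
    actual = decide_next_step(FIXTURE["prompt"], FIXTURE["tool_response"])
    assert actual == FIXTURE["expected_next"], f"regression: agent chose {actual!r}"


test_replay_refund_fixture()
```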

FAQ

Questions teams ask before instrumenting agents

Can tool calls pass schema validation and still be wrong?

Absolutely. Many incidents come from valid JSON that violates product rules, permissions, idempotency, or the real user intent.

What should we capture for tool calls?

Capture the available tool list, selected tool, schema version, raw arguments, validation result, response, retries, latency, and side-effect receipts.
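A capture record could hold exactly those fields. The sketch below shows one hypothetical shape as a Python dataclass serialized with `json.dumps`. The field names, the `lookup_order` tool, and the `schema_version` string are illustrative. The helper calls the tool once and does not track retries or receipts, which a retry layer and the mutating tools themselves would need to append.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class ToolCallRecord:
    """One tool call with the fields listed above (names are illustrative)."""
    available_tools: list[str]
    selected_tool: str
    schema_version: str
    raw_arguments: str               # exactly what the model emitted, before parsing
    validation_errors: list[str]
    response: Any = None
    retries: int = 0
    latency_ms: float = 0.0
    side_effect_receipts: list[str] = field(default_factory=list)


def run_and_record(record: ToolCallRecord, tool: Callable[..., Any]) -> ToolCallRecord:
    """Parse the raw arguments, call the tool once, and store the response and latency."""
    kwargs = json.loads(record.raw_arguments)
    start = time.perf_counter()
    record.response = tool(**kwargs)
    record.latency_ms = (time.perf_counter() - start) * 1000
    return record


def lookup_order(order_id: str) -> dict:
    """Hypothetical read-only tool."""
    return {"order_id": order_id, "status": "shipped"}


rec = ToolCallRecord(
    available_tools=["lookup_order", "issue_refund"],
    selected_tool="lookup_order",
    schema_version="lookup_order@v3",
    raw_arguments='{"order_id": "o_123"}',
    validation_errors=[],
)
print(json.dumps(asdict(run_and_record(rec, lookup_order)), indent=2))
```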

Debug the next failed agent run with evidence.

Opswald is in early access for teams shipping AI agents that call tools, use MCP servers, or run multi-step workflows in production.
