Debug Skyvern task and workflow failures locally
Skyvern workflows can fail after several browser decisions. BrowserTrace wraps task and workflow calls so you can inspect inputs, outputs, status, and errors, then export a trace for review.
Improving this guide or a Skyvern adapter note? Use the First PR Recipe to keep the first contribution small and reviewable.
Try the trace viewer first
uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace
Open http://127.0.0.1:3000 and inspect the failed checkout-agent run before wiring a real Skyvern client.
Wrap a Skyvern-shaped client
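import asyncio

from skyvern import Skyvern
from browsertrace import Tracer
from browsertrace.integrations.skyvern import wrap_skyvern

async def main() -> None:
    tracer = Tracer()
    # Wrap the client so run_task and run_workflow calls are recorded.
    skyvern = wrap_skyvern(Skyvern(...), tracer, name="skyvern invoice run")
    try:
        await skyvern.run_task(
            url="https://example.com",
            prompt="extract the invoice total",
            wait_for_completion=True,
        )
    finally:
        # Close the client even if the task call raises.
        skyvern.close()

asyncio.run(main())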
The wrapper records high-level run_task and run_workflow calls without importing Skyvern at BrowserTrace install time.
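To make the idea concrete, here is a minimal sketch of how a duck-typed wrapper can record run_task and run_workflow calls without importing the wrapped library. RecordingProxy, FakeClient, and the event field names are hypothetical and exist only for this illustration; this is not BrowserTrace's actual implementation.

```python
import asyncio
from typing import Any


class RecordingProxy:
    """Hypothetical tracing proxy; illustrates the pattern only."""

    TRACED = ("run_task", "run_workflow")

    def __init__(self, client: Any, events: list) -> None:
        self._client = client
        self._events = events

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes the proxy itself does not define.
        attr = getattr(self._client, name)
        if name not in self.TRACED:
            return attr  # pass everything else through untouched

        async def traced(*args: Any, **kwargs: Any) -> Any:
            event = {"method": name, "kwargs": kwargs, "status": "running"}
            self._events.append(event)
            try:
                result = await attr(*args, **kwargs)
            except Exception as exc:
                event.update(
                    status="failed",
                    error_type=type(exc).__name__,
                    error=str(exc),
                )
                raise  # the caller still sees the original exception
            event.update(status="completed", result=result)
            return result

        return traced


class FakeClient:
    """Stand-in client: any object with an async run_task works here."""

    async def run_task(self, *, url: str, prompt: str) -> dict:
        if "broken" in url:
            raise RuntimeError("navigation timed out")
        return {"run_id": "tsk_1", "status": "completed"}

    def close(self) -> None:
        pass


events: list = []
client = RecordingProxy(FakeClient(), events)
asyncio.run(
    client.run_task(url="https://example.com", prompt="extract the invoice total")
)
```

Because the proxy depends only on method names, the wrapped library has to be importable only in the code that constructs the real client.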
What the trace captures
- The Skyvern method, task prompt, URL, and keyword arguments.
- The returned task/workflow payload when available.
- Run IDs and status fields when the response exposes them.
- Exception type and message if the task call fails.
- Exportable HTML with optional prompt/model I/O redaction.
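Run IDs and status are captured only when the response exposes them, which implies tolerant extraction. The sketch below shows one way to do that for dict-like or object-like payloads. The field names in ID_FIELDS and the helper names are assumptions for illustration, not the names Skyvern or BrowserTrace necessarily use.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence

# Assumed candidate field names; adjust to whatever the real payload exposes.
ID_FIELDS = ("run_id", "task_id", "workflow_run_id")


def pick(payload: Any, names: Sequence[str]) -> Optional[str]:
    """Return the first non-None field among names, as a string."""
    for name in names:
        if isinstance(payload, dict):
            value = payload.get(name)
        else:
            value = getattr(payload, name, None)
        if value is not None:
            return str(value)
    return None


def summarize(payload: Any) -> dict:
    """Extract a run ID and status without failing on missing fields."""
    return {
        "run_id": pick(payload, ID_FIELDS),
        "status": pick(payload, ("status",)),
    }


from_dict = summarize({"run_id": "tsk_123", "status": "completed"})
from_object = summarize(SimpleNamespace(task_id="t1"))
from_nothing = summarize(None)
```

Missing fields come back as None instead of raising, so a partial response still produces a usable trace entry.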
Debug action confidence and authorization
For Skyvern-style browser agents, confidence is diagnostic. It is not the same as authorized execution, and it does not prove the selected action is correct. The failure to watch for is high confidence plus the wrong target, stale context, incomplete browser state, or an unsafe action boundary.
When Skyvern or a deeper adapter exposes the data, preserve three linked records for consequential actions:
- action proposal: intended operation, target metadata, confidence, screenshot or DOM evidence, and candidate element references.
- authorization decision: policy or scope checks, required approvals, status (allowed, blocked, or escalated), and reason.
- execution result: what actually happened, URL or state delta, status, error, and retry decision.
Linking the three records by a shared ID lets a reviewer separate "high confidence plus allowed plus wrong target" from "high confidence plus blocked" or "low confidence plus escalated". That makes policy tuning possible without treating model confidence itself as the policy.
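A minimal sketch of the three linked records and the reviewer-side classification, assuming a shared proposal_id as the link. All class names, field names, and the 0.8 confidence threshold are hypothetical; Skyvern may expose this data in a different shape or not at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionProposal:
    proposal_id: str
    operation: str
    target: str
    confidence: float


@dataclass(frozen=True)
class AuthorizationDecision:
    proposal_id: str
    status: str  # "allowed", "blocked", or "escalated"
    reason: str


@dataclass(frozen=True)
class ExecutionResult:
    proposal_id: str
    actual_target: str
    status: str
    error: Optional[str] = None


def classify(
    proposal: ActionProposal,
    decision: AuthorizationDecision,
    result: Optional[ExecutionResult],
    high: float = 0.8,  # assumed threshold, for illustration only
) -> str:
    """Label one action using all three records, not confidence alone."""
    if decision.proposal_id != proposal.proposal_id or (
        result is not None and result.proposal_id != proposal.proposal_id
    ):
        raise ValueError("records are not linked to the same proposal")
    level = "high confidence" if proposal.confidence >= high else "low confidence"
    if decision.status != "allowed":
        return f"{level} plus {decision.status}"
    if result is None:
        return f"{level} plus allowed plus not executed"
    outcome = "right target" if result.actual_target == proposal.target else "wrong target"
    return f"{level} plus allowed plus {outcome}"


proposal = ActionProposal("p1", "click", "button#pay", confidence=0.95)
allowed = AuthorizationDecision("p1", "allowed", "within checkout scope")
misfire = ExecutionResult("p1", actual_target="button#cancel", status="completed")
label = classify(proposal, allowed, misfire)
```

Keeping confidence in the proposal and the verdict in a separate decision record is what lets the threshold change without rewriting policy.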
Related community case: Skyvern-AI/skyvern#5637.
Debug VNC and CDP evidence together
When a Skyvern workflow needs both VNC visual debugging and CDP browser-state capture, keep both streams linked to the same task or workflow step. Separate screenshots, DOM summaries, console or network snippets, and performance data are much less useful if the trace cannot show which action window they explain.
- Record connect/probe start, success, timeout, or cleanup events before the first browser action.
- Link VNC screenshot or recording artifact IDs with CDP DOM snapshot or selected-element summaries for the same time window.
- Preserve task ID, workflow ID, step ID, URL, frame/page ID, action/tool name, status, error, and retry or recovery decision.
- Track redaction state for screenshots, URLs, headers, cookies, and form values before public export.
In BrowserTrace terms, connection and debug-session lifecycle is trace evidence too. A run can fail before any screenshot exists if the browser session never attaches cleanly.
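One way to keep both streams tied to the action they explain is to assign every artifact to a step window by timestamp. This is a sketch under assumed names: StepWindow, Artifact, and link_artifacts are hypothetical and are not part of BrowserTrace or Skyvern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    source: str  # "vnc" or "cdp"
    kind: str  # e.g. "screenshot", "dom_snapshot", "attach_timeout"
    timestamp: float  # seconds since run start
    redacted: bool = False


@dataclass
class StepWindow:
    step_id: str
    action: str
    start: float
    end: float
    artifacts: List[Artifact] = field(default_factory=list)


def link_artifacts(windows: List[StepWindow], artifacts: List[Artifact]) -> List[Artifact]:
    """Attach each artifact to the window containing it; return the leftovers."""
    unlinked = []
    for artifact in sorted(artifacts, key=lambda a: a.timestamp):
        for window in windows:
            if window.start <= artifact.timestamp < window.end:
                window.artifacts.append(artifact)
                break
        else:
            unlinked.append(artifact)  # evidence no step can explain
    return unlinked


windows = [
    # The connect/probe phase is a window too, so early failures are visible.
    StepWindow("connect", "cdp_attach", start=0.0, end=2.0),
    StepWindow("step-1", "click", start=2.0, end=5.0),
]
artifacts = [
    Artifact("vnc-001", "vnc", "screenshot", 2.4),
    Artifact("cdp-001", "cdp", "dom_snapshot", 2.6),
    Artifact("cdp-000", "cdp", "attach_timeout", 1.5),
    Artifact("vnc-099", "vnc", "screenshot", 9.0),
]
unlinked = link_artifacts(windows, artifacts)
```

Returning the unlinked artifacts rather than dropping them makes orphaned screenshots or snapshots visible to the reviewer.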
Related community case: Skyvern-AI/skyvern#3260.
Debug multi-session VNC and Take Control drift
In local or self-hosted deployments, a VNC stream can become a shared UI resource while workflows, CDP targets, and manual control state are per-session concerns. When those identities are implicit, a reviewer cannot tell whether the browser failed, the stream attached to the wrong display, or Take Control was lost after reconnect.
- Preserve VNC stream identity, CDP target identity, workflow ID, task ID, browser session ID, browser context ID, page ID, and redacted display ID.
- Record the manual-control lease: agent/manual owner, acquire/renew/release timestamps, persisted-across-reconnect flag, and release reason.
- Store isolation metadata: own X display, container, or browser context versus a shared display.
- Classify failure causes such as no VNC server, connected without frames, stale VNC stream, manual-control lease lost, or display conflict.
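The checklist above can be sketched as a per-session snapshot plus a small classifier. The field names, the 10-second staleness threshold, and the order of the checks are assumptions for illustration; a real deployment would derive them from its own VNC and session metadata.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class VncFailure(Enum):
    NO_VNC_SERVER = "no VNC server"
    CONNECTED_WITHOUT_FRAMES = "connected without frames"
    STALE_STREAM = "stale VNC stream"
    LEASE_LOST = "manual-control lease lost"
    DISPLAY_CONFLICT = "display conflict"


@dataclass(frozen=True)
class SessionSnapshot:
    browser_session_id: str
    display_id: str  # already redacted
    vnc_server_up: bool
    frames_received: int
    seconds_since_last_frame: float
    expected_owner: Optional[str]  # "agent", "manual", or None
    lease_owner: Optional[str]  # who actually holds the control lease
    sessions_on_display: int  # 1 means the display is isolated


def classify(s: SessionSnapshot, stale_after: float = 10.0) -> Optional[VncFailure]:
    """Return the first matching failure cause, or None if the session looks healthy."""
    if not s.vnc_server_up:
        return VncFailure.NO_VNC_SERVER
    if s.sessions_on_display > 1:
        return VncFailure.DISPLAY_CONFLICT
    if s.frames_received == 0:
        return VncFailure.CONNECTED_WITHOUT_FRAMES
    if s.seconds_since_last_frame > stale_after:
        return VncFailure.STALE_STREAM
    if s.expected_owner is not None and s.lease_owner != s.expected_owner:
        return VncFailure.LEASE_LOST
    return None


healthy = SessionSnapshot(
    browser_session_id="bs-1",
    display_id="display-redacted",
    vnc_server_up=True,
    frames_received=120,
    seconds_since_last_frame=0.5,
    expected_owner="manual",
    lease_owner="manual",
    sessions_on_display=1,
)
# After a reconnect the lease silently reverted to the agent.
after_reconnect = replace(healthy, lease_owner="agent")
cause = classify(after_reconnect)
```

Checking server and display state before the lease keeps a shared-display problem from being misreported as a Take Control bug.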
Related community case: Skyvern-AI/skyvern#4392.
Share only what is safe
browsertrace list
browsertrace export <run_id> -o full.html
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html
Use --public before attaching a real trace to a public issue or community thread. Use individual redaction flags when you want to keep some fields visible.
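As an extra safety net before posting, you could scan the exported file for obvious leftovers. The sketch below is a hypothetical pre-share check, not a BrowserTrace feature; the patterns are illustrative assumptions and will miss many kinds of sensitive data.

```python
import re
from typing import List

# Illustrative patterns only; extend them for your own secrets and formats.
SUSPECT_PATTERNS = {
    "authorization header": re.compile(r"authorization\s*[:=]\s*\S+", re.IGNORECASE),
    "bearer token": re.compile(r"\bbearer\s+[A-Za-z0-9._\-]{8,}", re.IGNORECASE),
    "cookie": re.compile(r"\b(?:set-)?cookie\s*[:=]\s*\S+", re.IGNORECASE),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_leaks(text: str) -> List[str]:
    """Return the sorted labels of every suspect pattern found in text."""
    return sorted(
        label for label, pattern in SUSPECT_PATTERNS.items() if pattern.search(text)
    )


sample = "GET /checkout\nAuthorization: Bearer abcdefgh12345678\n"
leaks = find_leaks(sample)
```

In practice you would read the exported file with pathlib.Path("public.html").read_text() and hold off on sharing if find_leaks returns anything.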
For a compact checklist, see the share-safe export recipe.