Summary
When the dev UI/app can reach the local Agents server but the model provider is unreachable (for example, no Internet during a demo, Anthropic down, or a provider request that hangs), the UI can stay in a "thinking" state for a long time with little feedback.
We should add default model-provider timeout/error handling in the Pi adapter/runtime path and surface a clear durable error to the UI.
Current behavior / code paths
Electric Agents uses Pi in:
packages/agents-runtime/src/pi-adapter.ts
- constructs new Agent(...) from @mariozechner/pi-agent-core
- resolves models via getModel(...) from @mariozechner/pi-ai
- subscribes to Pi events and maps them to Electric runtime events
packages/agents-runtime/src/context-factory.ts
- calls handle.run(runInput, config.runSignal) inside ctx.agent.run()
packages/agents-runtime/src/outbound-bridge.ts
- writes runs, steps, texts, toolCalls
- maps finishReason === 'error' to run status: 'failed'
packages/agents-runtime/src/process-wake.ts
- catches handler failures and writes an errors row with error_code: 'HANDLER_FAILED'
packages/agents-server-ui/src/components/AgentResponse.tsx
- already renders run/errors rows inline
The UI can already render failures once the runtime writes them. The likely missing piece is making provider hangs fail fast enough and classifying failures into useful messages.
Upstream Pi research
Current upstream Pi repo:
Relevant upstream files:
packages/ai/src/types.ts
- defines stream options including signal?: AbortSignal, timeoutMs?: number, maxRetries?: number, maxRetryDelayMs?: number
packages/ai/src/stream.ts
- passes stream options through to providers
packages/ai/src/providers/anthropic.ts
- passes signal to SDK request options
- maps timeoutMs to Anthropic SDK timeout
- supports maxRetries
- emits terminal error/aborted stream events
packages/agent/src/types.ts
- stream contract expects failures to be encoded as stream protocol events and final assistant messages with stopReason: 'error' | 'aborted' and errorMessage
packages/agent/src/agent.ts
- agent has an internal AbortController per run and stores run failures in state/error messages
packages/agent/src/agent-loop.ts
- passes the active run signal into streamSimple/custom stream functions
packages/agent/src/harness/agent-harness.ts
- shows a wrapper streamFn injecting timeoutMs, retry settings, auth, headers, signal, etc.
Related upstream issues:
Takeaway: upstream Pi has useful primitives (timeoutMs, AbortSignal, retry settings, terminal error events), but does not appear to provide a rich normalized taxonomy like offline | timeout | auth | rate_limit | provider_unavailable. Electric should use Pi's primitives and add Electric-specific classification/messages at the adapter/runtime boundary.
Goals
- Provider calls should not leave the UI hanging indefinitely.
- If a model provider is unreachable/offline/timed out, the run should settle as failed.
- The UI should show a clear message, e.g.:
Could not reach Anthropic. Check your Internet connection or Anthropic status.
- Preserve the original provider error details for debugging/logs.
- Keep behavior configurable for development and future production use.
Non-goals
- Do not implement a browser-side Internet/offline detector as the main solution.
- Do not make the UI guess provider state from client connectivity.
- Do not replace Pi's stream/error contract.
- Do not hide provider error details entirely.
The server/runtime is the right place to know whether model calls are timing out or failing.
Proposed implementation
1. Add default model provider timeout/retry settings
In packages/agents-runtime/src/pi-adapter.ts, ensure the Pi stream path receives defaults such as:
const DEFAULT_MODEL_TIMEOUT_MS = 30_000
const DEFAULT_MODEL_MAX_RETRIES = 0
Use upstream Pi options where available:
timeoutMs: DEFAULT_MODEL_TIMEOUT_MS,
maxRetries: DEFAULT_MODEL_MAX_RETRIES,
signal: abortSignal,
If the currently installed @mariozechner/pi-ai / @mariozechner/pi-agent-core version does not expose timeoutMs, fall back to composing an AbortController timeout around the existing run signal.
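The fallback described above could look roughly like the following. This is a minimal sketch, not the adapter's actual code: withTimeout and the TimedSignal shape are hypothetical names introduced here, and the only platform APIs used are AbortController, AbortSignal and setTimeout.

```typescript
// Hypothetical helper (not part of Pi or the Electric runtime today).
interface TimedSignal {
  /** Aborts when either the caller's run signal aborts or the timeout fires. */
  signal: AbortSignal
  /** True only when the timeout, not the caller, triggered the abort. */
  timedOut: () => boolean
  /** Clears the timer and detaches the listener; call from a finally block. */
  dispose: () => void
}

function withTimeout(
  runSignal: AbortSignal | undefined,
  timeoutMs: number
): TimedSignal {
  const controller = new AbortController()
  let timedOut = false

  // Forward an explicit abort (for example SIGINT) from the existing run signal.
  const onParentAbort = () => controller.abort(runSignal?.reason)
  if (runSignal?.aborted) {
    onParentAbort()
  } else {
    runSignal?.addEventListener('abort', onParentAbort, { once: true })
  }

  // Abort on our own if the provider has not finished within timeoutMs.
  const timer = setTimeout(() => {
    if (controller.signal.aborted) return
    timedOut = true
    controller.abort(new Error(`model call timed out after ${timeoutMs}ms`))
  }, timeoutMs)

  return {
    signal: controller.signal,
    timedOut: () => timedOut,
    dispose: () => {
      clearTimeout(timer)
      runSignal?.removeEventListener('abort', onParentAbort)
    },
  }
}
```

Keeping timedOut() separate from signal.aborted lets the caller tell a timeout (which should fail the run) apart from an explicit abort (which should keep its current behavior).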
Possible env/config knobs:
ELECTRIC_AGENTS_MODEL_TIMEOUT_MS=30000
ELECTRIC_AGENTS_MODEL_MAX_RETRIES=0
Open question: should these live in AgentConfig, runtime config, env vars, or all of the above?
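If the env-var route is chosen, parsing could be as small as the sketch below. The variable names and defaults come from this issue; resolveModelCallDefaults and parseIntAtLeast are hypothetical helpers, and the env is passed in as a plain object (for example process.env) so the function stays easy to test.

```typescript
const DEFAULT_MODEL_TIMEOUT_MS = 30_000
const DEFAULT_MODEL_MAX_RETRIES = 0

interface ModelCallDefaults {
  timeoutMs: number
  maxRetries: number
}

// Returns `fallback` for missing, non-integer, or too-small values.
function parseIntAtLeast(
  raw: string | undefined,
  min: number,
  fallback: number
): number {
  if (raw === undefined || raw.trim() === '') return fallback
  const n = Number(raw)
  return Number.isInteger(n) && n >= min ? n : fallback
}

// Hypothetical resolver: env vars override the built-in defaults.
function resolveModelCallDefaults(
  env: Record<string, string | undefined>
): ModelCallDefaults {
  return {
    timeoutMs: parseIntAtLeast(
      env['ELECTRIC_AGENTS_MODEL_TIMEOUT_MS'],
      1,
      DEFAULT_MODEL_TIMEOUT_MS
    ),
    maxRetries: parseIntAtLeast(
      env['ELECTRIC_AGENTS_MODEL_MAX_RETRIES'],
      0,
      DEFAULT_MODEL_MAX_RETRIES
    ),
  }
}
```

Falling back silently on invalid values is one possible choice; logging a warning or failing at startup may be preferable outside of demos.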
2. Ensure provider errors terminate the run
pi-adapter.ts already handles message_end with:
const isError =
msg?.stopReason === `error` ||
(!!msg?.errorMessage && msg.stopReason !== `aborted`)
and throws:
throw new Error(
`pi-agent message_end error: ${msg.errorMessage ?? `unknown error`} ...`
)
Verify that provider timeout/offline failures reliably produce one of:
- message_end with stopReason: 'error' and errorMessage
- rejected agent.prompt(...) / agent.continue() promise
- abort path via timeout signal
In all cases, the run should call bridge.onRunEnd({ finishReason: 'error' }) or bridge.onRunEnd({ finishReason: 'aborted' }) and not stay streaming forever.
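One way to make "always settles" hard to get wrong is to route every exit path through a single finally block. The sketch below assumes a bridge with an onRunEnd({ finishReason }) method as referenced above; the RunEndSink interface, the 'stop' finish reason and runAndSettle itself are illustrative assumptions, not the current OutboundBridge API.

```typescript
// Assumed subset of finish reasons; only 'error' and 'aborted' appear in this issue.
type FinishReason = 'stop' | 'error' | 'aborted'

// Assumed minimal view of the bridge used by the adapter.
interface RunEndSink {
  onRunEnd(end: { finishReason: FinishReason }): void
}

// Hypothetical wrapper: guarantees exactly one onRunEnd call per run.
// `explicitAbort` should be the caller's run signal, so a timeout-driven
// abort is still reported as 'error' rather than 'aborted'.
async function runAndSettle(
  bridge: RunEndSink,
  explicitAbort: AbortSignal,
  run: () => Promise<void>
): Promise<void> {
  let finishReason: FinishReason = 'stop'
  try {
    await run()
  } catch (err) {
    finishReason = 'error'
    throw err // let process-wake.ts record the durable error row
  } finally {
    if (explicitAbort.aborted) finishReason = 'aborted'
    bridge.onRunEnd({ finishReason })
  }
}
```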
3. Add Electric-specific error classification
Add a small classifier near the adapter/runtime boundary, for example in pi-adapter.ts or a new runtime utility:
type ModelProviderErrorCode =
| 'MODEL_PROVIDER_TIMEOUT'
| 'MODEL_PROVIDER_UNREACHABLE'
| 'MODEL_PROVIDER_AUTH_FAILED'
| 'MODEL_PROVIDER_RATE_LIMITED'
| 'MODEL_PROVIDER_UNAVAILABLE'
| 'MODEL_PROVIDER_ERROR'
Classification can start out string/error-based:
- timeout: AbortError, TimeoutError, timeout, timed out
- offline/network: ENOTFOUND, ECONNREFUSED, ECONNRESET, EAI_AGAIN, fetch failed, network, Failed to fetch
- auth: 401, invalid api key, authentication, unauthorized
- rate limit: 429, rate limit
- provider unavailable: 502, 503, 504, overloaded, unavailable
- fallback: MODEL_PROVIDER_ERROR
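The rules above can be sketched as an ordered list of patterns. This is a starting point under stated assumptions, not a finished classifier: classifyModelProviderError is the name used in the testing plan below, describeError is a hypothetical helper, and the rule order (HTTP-status rules before the generic timeout/network rules) is a choice made here so that, for example, "504 gateway timeout" is reported as unavailable.

```typescript
type ModelProviderErrorCode =
  | 'MODEL_PROVIDER_TIMEOUT'
  | 'MODEL_PROVIDER_UNREACHABLE'
  | 'MODEL_PROVIDER_AUTH_FAILED'
  | 'MODEL_PROVIDER_RATE_LIMITED'
  | 'MODEL_PROVIDER_UNAVAILABLE'
  | 'MODEL_PROVIDER_ERROR'

// First matching rule wins.
const RULES: Array<[ModelProviderErrorCode, RegExp]> = [
  ['MODEL_PROVIDER_AUTH_FAILED', /\b401\b|invalid api key|authentication|unauthorized/i],
  ['MODEL_PROVIDER_RATE_LIMITED', /\b429\b|rate limit/i],
  ['MODEL_PROVIDER_UNAVAILABLE', /\b50[234]\b|overloaded|unavailable/i],
  ['MODEL_PROVIDER_TIMEOUT', /AbortError|TimeoutError|timeout|timed out/i],
  [
    'MODEL_PROVIDER_UNREACHABLE',
    /ENOTFOUND|ECONNREFUSED|ECONNRESET|EAI_AGAIN|fetch failed|network|Failed to fetch/i,
  ],
]

// Flattens name, code and message of an error and its `cause` chain into one
// string, since low-level network codes often sit on the cause.
function describeError(err: unknown, depth = 0): string {
  if (err === null || typeof err !== 'object' || depth > 3) return String(err)
  const e = err as { name?: unknown; code?: unknown; message?: unknown; cause?: unknown }
  const parts: unknown[] = [e.name, e.code, e.message].filter(
    (p) => typeof p === 'string' || typeof p === 'number'
  )
  if (e.cause !== undefined) parts.push(describeError(e.cause, depth + 1))
  return parts.join(' ')
}

// Explicit user aborts should be handled before calling this, because an
// AbortError is treated as a timeout here.
function classifyModelProviderError(err: unknown): ModelProviderErrorCode {
  const text = describeError(err)
  for (const [code, pattern] of RULES) {
    if (pattern.test(text)) return code
  }
  return 'MODEL_PROVIDER_ERROR'
}
```

String matching is brittle by nature; if the provider SDK exposes structured status codes on its errors, those would be a better primary signal, with the patterns kept as a fallback.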
4. Surface a clearer durable error
Currently process-wake.ts catches handler errors and writes:
error_code: `HANDLER_FAILED`,
message: errMsg,
We should preserve compatibility but expose model-provider errors more clearly.
Options:
Option A: throw a classified error and let process-wake.ts map it
Create a runtime error class:
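class ModelProviderError extends Error {
  constructor(
    message: string,
    readonly code: ModelProviderErrorCode,
    readonly provider?: string,
    readonly model?: string,
    readonly cause?: unknown // original provider error, kept for logs
  ) {
    super(message)
    this.name = 'ModelProviderError'
  }
}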
Then process-wake.ts can write:
error_code: error instanceof ModelProviderError
  ? error.code
  : 'HANDLER_FAILED',
message: error.message,
Option B: write an error row directly from the adapter
This is probably less clean because pi-adapter.ts currently writes run/step/text/tool events through OutboundBridge, not generic runtime errors.
Recommendation: Option A.
5. Make UI messaging friendly
AgentResponse.tsx already renders:
✗ {error_code}: {message}
A minimal first slice can rely on this.
A follow-up could map specific error codes to friendlier copy or hide noisy internals. Example:
Could not reach Anthropic. Check your Internet connection or Anthropic status.
Instead of:
MODEL_PROVIDER_UNREACHABLE: fetch failed ENOTFOUND api.anthropic.com
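The follow-up mapping could be a small pure function that the UI calls before falling back to the raw {error_code}: {message} line. A sketch, with assumptions: friendlyModelErrorMessage is a hypothetical name, and only the "Could not reach" copy comes from this issue; the other strings are placeholder copy.

```typescript
// Hypothetical UI helper. Returns undefined for codes it does not know, so the
// caller can keep rendering the raw `{error_code}: {message}` line.
function friendlyModelErrorMessage(
  errorCode: string,
  provider: string = 'the model provider'
): string | undefined {
  switch (errorCode) {
    case 'MODEL_PROVIDER_UNREACHABLE':
      return `Could not reach ${provider}. Check your Internet connection or ${provider} status.`
    case 'MODEL_PROVIDER_TIMEOUT':
      // Placeholder copy, not specified in this issue.
      return `${provider} did not respond in time. Try again in a moment.`
    case 'MODEL_PROVIDER_AUTH_FAILED':
      // Placeholder copy, not specified in this issue.
      return `${provider} rejected the API key. Check your credentials.`
    case 'MODEL_PROVIDER_RATE_LIMITED':
      // Placeholder copy, not specified in this issue.
      return `${provider} is rate limiting requests. Try again shortly.`
    default:
      return undefined
  }
}
```

For example, friendlyModelErrorMessage('MODEL_PROVIDER_UNREACHABLE', 'Anthropic') yields the message shown above, while 'HANDLER_FAILED' yields undefined and keeps today's rendering.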
Example desired behavior
If the dev app is running locally but the machine has no Internet:
- User sends a message to Horton.
- UI shows thinking.
- Runtime starts model call with timeout.
- Provider call fails/times out.
- Runtime marks the run failed.
- UI exits thinking state and shows:
Could not reach Anthropic. Check your Internet connection or Anthropic status.
The wake should close normally after recording the error.
Testing plan
Unit tests
Add tests around error classification:
classifyModelProviderError(new Error('fetch failed'))
// MODEL_PROVIDER_UNREACHABLE
classifyModelProviderError(new Error('timeout'))
// MODEL_PROVIDER_TIMEOUT
classifyModelProviderError(new Error('401 invalid api key'))
// MODEL_PROVIDER_AUTH_FAILED
classifyModelProviderError(new Error('429 rate limit'))
// MODEL_PROVIDER_RATE_LIMITED
Adapter tests
Mock/override streamFn or Pi agent stream behavior to simulate:
- no response until timeout
- rejected provider promise
- terminal message_end with stopReason: 'error'
- aborted run
Assert:
- run is marked failed for provider errors
- run is marked completed/aborted for explicit aborts as appropriate
- no indefinite pending state
- classified error is written or thrown through the process-wake path
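For the "no response until timeout" case, a test double only needs a promise that stays pending until the signal aborts. The helper below is a sketch; hangUntilAborted is a hypothetical name, and how it gets plugged into a mocked streamFn depends on Pi's actual stream function signature, which is deliberately not assumed here.

```typescript
// Hypothetical test helper: simulates a provider request that never answers.
// The returned promise only settles (by rejecting) once `signal` aborts.
function hangUntilAborted(signal: AbortSignal): Promise<never> {
  return new Promise<never>((_resolve, reject) => {
    const fail = () => reject(signal.reason ?? new Error('aborted'))
    if (signal.aborted) {
      fail()
    } else {
      signal.addEventListener('abort', fail, { once: true })
    }
  })
}
```

A test can then await hangUntilAborted(signal) inside the mocked stream and assert that the run is marked failed once the timeout signal fires, rather than staying pending.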
UI smoke test
Create an entity run that writes a classified error and verify AgentResponse.tsx renders it.
Open questions
- What should the default timeout be?
  - 30s is demo-friendly.
  - Longer may be safer for slow providers/models.
- Should timeout be per model/provider?
  - Some reasoning models may legitimately take longer before first token.
- Should timeout mean time-to-first-event or total model call duration?
  - For the demo offline case, time-to-first-event timeout is probably sufficient.
  - A separate max total run duration could be useful later.
- Should retries default to 0?
  - Upstream Anthropic provider appears to default retries to 0 in current code.
  - For demos/offline handling, retries can make failures feel like hangs.
- Should classified provider errors be represented in runs, steps, errors, or all of the above?
  - Today the UI can read run errors; we need to confirm the best durable shape.
- Should this be configured through AgentConfig?
Example:
ctx.useAgent({
  model,
  provider,
  modelTimeoutMs: 30_000,
  modelMaxRetries: 0,
})
Or keep as runtime/env config first.
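On the time-to-first-event question: if the model call is consumed as an async iterable of events, a first-event timeout can be layered on without limiting total duration. The sketch below assumes nothing about Pi's event types; withFirstEventTimeout and firstWithin are hypothetical helpers over a generic AsyncIterable.

```typescript
// Waits for the first result from `it`, or rejects after `ms` milliseconds.
async function firstWithin<T>(
  it: AsyncIterator<T>,
  ms: number
): Promise<IteratorResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`no model event within ${ms}ms`)),
      ms
    )
  })
  try {
    return await Promise.race([it.next(), timeout])
  } finally {
    clearTimeout(timer)
  }
}

// Hypothetical wrapper: only the first event is subject to the timeout; once
// the provider has started streaming, later events pass through untimed.
async function* withFirstEventTimeout<T>(
  source: AsyncIterable<T>,
  firstEventTimeoutMs: number
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]()
  let result = await firstWithin(it, firstEventTimeoutMs)
  while (!result.done) {
    yield result.value
    result = await it.next()
  }
}
```

This sketch only stops waiting; it does not cancel the underlying request. In practice the timeout would also need to abort the provider call through the run's AbortSignal so the connection is released.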
Acceptance criteria
- Model-provider calls have a default timeout or equivalent abort mechanism.
- Anthropic/OpenAI unreachable/offline failures do not leave the UI thinking indefinitely.
- A failed provider call settles the run and closes the wake.
- The entity timeline contains a clear durable error code/message.
- The UI shows a useful error without requiring a page refresh.
- Existing explicit abort/SIGINT behavior remains correct.