Add an API Endpoint to Cancel In-Progress Agent Tasks #2425

@vzegna

The Problem

Currently, when an agent is running a long task (e.g., involving multiple tool calls or a complex chain of thought), there is no way to programmatically stop or cancel the execution via the ADK web server API.

In a web application scenario, a user might trigger a long-running agent task. If the user decides to stop the process, the front-end can send a request, but there is no endpoint on the ADK server to handle this cancellation. Disconnecting a streaming connection (like SSE or WebSocket) only stops the client from receiving further updates; the agent continues to run on the server, consuming resources until it completes its task.

This makes it difficult to build responsive and user-friendly interfaces on top of ADK agents, as there is no mechanism for user-initiated interruption.

Proposed Solution

I propose implementing a session-state-based cancellation mechanism. This would involve a few coordinated changes to the framework that should integrate cleanly with the existing architecture.

1. Introduce a cancelled Flag in the Session State

The Session object's state dictionary (defined via TypeAlias in src/google/adk/sessions/state.py) is a natural place to manage this. We can introduce a boolean cancelled flag; a missing key is treated as "not cancelled".
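As a minimal sketch of that convention, the two helpers below set and read the flag on a plain dict. The names CANCELLED_KEY, mark_cancelled, and is_cancelled are hypothetical and are not part of ADK; only the "cancelled" key comes from this proposal.

```python
from typing import Any, Optional

# Hypothetical constant; the key name itself is taken from this proposal.
CANCELLED_KEY = "cancelled"


def mark_cancelled(state: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return a state dict with the cancellation flag set to True."""
    state = {} if state is None else state
    state[CANCELLED_KEY] = True
    return state


def is_cancelled(state: Optional[dict[str, Any]]) -> bool:
    """Treat a missing state or a missing key as 'not cancelled'."""
    return bool(state and state.get(CANCELLED_KEY))
```

Keeping the check in one place means every integration point interprets a missing or empty state the same way.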

2. Create a New Cancellation Endpoint

A new FastAPI endpoint should be added to src/google/adk/cli/adk_web_server.py:

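# Sketch: assumes this is defined where `app` and `self` are already in scope,
# and that HTTPException is imported from fastapi.
@app.post("/apps/{app_name}/users/{user_id}/sessions/{session_id}:cancel")
async def cancel_session(app_name: str, user_id: str, session_id: str):
    session = await self.session_service.get_session(
        app_name=app_name, user_id=user_id, session_id=session_id
    )
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    # Set the flag, creating the state dict if the session has none.
    if session.state is None:
        session.state = {}
    session.state["cancelled"] = True

    # Persist the change. `update_session` is assumed here; the session service
    # may need a new method (or an existing write path) to support this.
    await self.session_service.update_session(session)

    return {"status": "cancelled", "session_id": session_id}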

This endpoint would be responsible for retrieving the session, setting the cancelled flag to True, and persisting the change.
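From the caller's side, using the proposed route could look like the sketch below. It only builds the request and does not send it. The base URL, the identifiers, and the helper name build_cancel_request are illustrative assumptions, and the route itself does not exist in ADK today.

```python
import urllib.parse
import urllib.request


def build_cancel_request(
    base_url: str, app_name: str, user_id: str, session_id: str
) -> urllib.request.Request:
    """Build a POST request for the proposed ':cancel' route (hypothetical)."""
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    app, user, sess = (
        urllib.parse.quote(part, safe="")
        for part in (app_name, user_id, session_id)
    )
    url = f"{base_url.rstrip('/')}/apps/{app}/users/{user}/sessions/{sess}:cancel"
    return urllib.request.Request(url, data=b"", method="POST")


# Example values only; a real client would pass the IDs of the running session.
req = build_cancel_request("http://localhost:8000", "my_app", "user_1", "session_42")
# Sending it would be: urllib.request.urlopen(req)
```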

3. Modify Core Agent Logic to Respect the Flag

The agent's execution loops need to be updated to check for this flag periodically. The key integration points would be:

  • In src/google/adk/flows/llm_flows/base_llm_flow.py: Before making a call to the LLM in _call_llm_async(), the agent should check session.state.get("cancelled"). If true, it should stop execution and yield a final "cancelled" message.
  • In Streaming Loops: For both SSE and WebSocket (/run_live) connections, the message processing loops should check the flag. If it's true, the connection should be gracefully closed.
  • In Tool Execution: Long-running tools should also be designed to accept the ToolContext and check the session's cancelled flag, allowing them to terminate early.
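The integration points above all follow the same cooperative pattern: check the flag at a safe boundary, emit a final event, and stop. The sketch below illustrates that pattern with a plain async generator; run_steps and demo_cooperative_cancel are hypothetical names, and the shared dict stands in for session state. In a real deployment the running invocation may hold its own copy of the session, so the flag would probably need to be re-read from the session service rather than from a shared object.

```python
import asyncio
from typing import Any, AsyncIterator


async def run_steps(state: dict[str, Any], steps: list[str]) -> AsyncIterator[str]:
    """Run steps in order, checking the cancellation flag before each one."""
    for step in steps:
        if state.get("cancelled"):
            yield "cancelled"  # final event, then stop the generator
            return
        await asyncio.sleep(0)  # stands in for an LLM or tool call
        yield f"done: {step}"


async def demo_cooperative_cancel() -> list[str]:
    state: dict[str, Any] = {}
    events: list[str] = []
    async for event in run_steps(state, ["plan", "search", "summarize"]):
        events.append(event)
        if event == "done: plan":
            # Stands in for the cancel endpoint setting the flag mid-run.
            state["cancelled"] = True
    return events
```

Because the check happens only between steps, a step that is already in flight runs to completion; that is the main trade-off of a flag-based design.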

Here is a conceptual example of how the check in base_llm_flow.py might look:

# In base_llm_flow.py's _call_llm_async
session = invocation_context.session
if session and session.state and session.state.get("cancelled"):
    yield LlmResponse(
        model_response=types.GenerateContentResponse(
            # ... create a response indicating cancellation ...
        ),
        turn_complete=True
    )
    return

Additional Implementation Details

4. Function/Tool Execution Cancellation

In src/google/adk/flows/llm_flows/functions.py, the parallel tool execution should check for cancellation:

# In handle_function_calls_live() before creating tasks
session = invocation_context.session
if session and session.state and session.state.get("cancelled"):
    return None  # Skip tool execution if cancelled
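Skipping execution up front does not help a tool that is already running, so long-running tools would need to poll the flag themselves. A minimal sketch, assuming the tool can read session state through its context: FakeToolContext is a hypothetical stand-in that models only a state dict, and process_items is an illustrative tool, not an ADK function.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeToolContext:
    """Hypothetical stand-in for a tool context; only `state` is modelled."""
    state: dict[str, Any] = field(default_factory=dict)


def process_items(items: list[int], tool_context: FakeToolContext) -> dict[str, Any]:
    """Double each item, stopping early if the session has been cancelled."""
    processed: list[int] = []
    for item in items:
        # Check between units of work so partial results can be returned.
        if tool_context.state.get("cancelled"):
            return {"status": "cancelled", "processed": processed}
        processed.append(item * 2)
    return {"status": "completed", "processed": processed}
```

Returning the partial result alongside a status lets the agent report what was finished before the interruption.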

5. WebSocket Connection Handling

The WebSocket endpoint should also monitor the cancellation flag:

# In agent_live_run() message processing loop
async def process_messages():
    while True:
        if session.state and session.state.get("cancelled"):
            await websocket.close(code=1000, reason="Session cancelled")
            break
        # ... continue processing

Alternative Approaches Considered

  1. AsyncIO Task Cancellation: Python's native task cancellation is an option, but it would require significant refactoring of the execution model.
  2. Separate Cancellation Service: A dedicated service to track cancellations would add complexity without a clear benefit over session state.
  3. Timeout-Based Approach: Timeouts alone bound how long a task runs but cannot respond immediately when a user asks to stop.
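For comparison, the first alternative can be sketched with standard asyncio alone. This is not how ADK runs agents; long_task and demo_task_cancel are hypothetical names used only to show what native task cancellation looks like.

```python
import asyncio


async def long_task(log: list[str]) -> None:
    """Simulate a long-running agent step that cleans up when cancelled."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append("cleanup")
        raise  # re-raise so the task is actually marked as cancelled


async def demo_task_cancel() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(long_task(log))
    await asyncio.sleep(0)  # give the task a chance to start
    task.cancel()  # request cancellation at the task's next await point
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log
```

Unlike the flag check, this interrupts the task at its current await point, but the server would need to keep a handle to each running task in order to cancel it.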

Testing Considerations

The implementation should include tests for:

  • Setting and retrieving the cancellation flag
  • Agent respecting cancellation during LLM calls
  • Tool execution stopping when cancelled
  • Streaming connections closing properly
  • Partial results being returned correctly

Justification

This feature is critical for building production-ready applications with ADK. It provides:

  • Better User Experience: Users can interrupt tasks without having to wait for them to complete or time out.
  • Resource Management: Prevents orphaned, long-running agent processes from consuming unnecessary server resources.
  • Robustness: Creates a more complete and professional API for managing the agent lifecycle.
  • Alignment with Industry Standards: Other AI/LLM platforms and frameworks (e.g., OpenAI, Anthropic, LangChain) provide cancellation mechanisms.

I believe this approach is minimally invasive and leverages the existing session management infrastructure effectively.

Related Issues/Context

This feature request arose from real-world usage where developers are building web applications on top of ADK agents deployed on Agent Engine and need responsive user interfaces with stop/cancel functionality.

(Co-authored by Gemini CLI and Claude Code)

Labels

  • core [Component]: This issue is related to the core interface and implementation
  • needs review [Status]: The PR/issue is awaiting review from the maintainer