Related issue: #4162
Description
When an MCP server (or a proxy/gateway in front of one) returns a non-2xx HTTP response (e.g., 403 Forbidden), the ADK agent hangs for approximately 5 minutes before timing out. The MCP Python SDK's streamable HTTP transport crashes in a background TaskGroup, but the crash never propagates to the send_request() caller -- it remains blocked on an anyio memory stream that will never receive data.
The on_tool_error_callback never fires because send_request() never raises an exception during the hang period. Once the 5-minute sse_read_timeout expires, send_request() raises McpError(REQUEST_TIMEOUT) rather than the original HTTP error.
Environment
- google-adk: 1.27.2
- mcp (Python SDK): 1.26.0
- Python: 3.12
- OS: Linux (Debian 12 / GKE)
- Model: gemini-3-flash-preview (via Vertex AI Agent Engine)
Steps to Reproduce
- Set up any MCP server (or HTTP endpoint) that returns HTTP 403 for certain requests.
- Create an ADK agent with McpToolset pointing to that server.
- Invoke a tool that triggers the 403 response.
- Observe the agent hangs for ~5 minutes with no response.
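For step 1, a throwaway endpoint can stand in for the gateway. The sketch below uses only the Python standard library; the names ForbiddenHandler, start_forbidden_server, and probe_status are hypothetical helpers, not part of ADK or the MCP SDK. It answers every POST with 403, so the failure would hit the first request of the session rather than one specific tool call; reproducing a gateway that blocks only selected tools still needs a real MCP server behind it.

```python
"""Stand-in endpoint that answers every POST with HTTP 403 (illustrative only)."""
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ForbiddenHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so the client gets a clean response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"Forbidden"
        self.send_response(403)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_forbidden_server(port=0):
    """Start the server on a daemon thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ForbiddenHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe_status(port):
    """POST an empty JSON body to /mcp and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/mcp", body=b"{}",
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

Pointing MCP_URL at http://127.0.0.1:<port>/mcp (with the port taken from server.server_address[1]) should then exercise the non-2xx path.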
Minimal reproduction
"""Minimal reproduction: ADK agent hangs on MCP HTTP 403."""
from google.adk.agents.llm_agent import Agent
from google.adk.tools import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams

# Point to any MCP server behind a proxy that returns HTTP 403
# for certain tool calls (e.g., an authorization gateway)
MCP_URL = "https://example.com/mcp"  # returns 403 on blocked tools

agent = Agent(
    model="gemini-3-flash-preview",
    name="repro_agent",
    instruction="Call the 'blocked_tool' tool.",
    tools=[
        McpToolset(
            connection_params=StreamableHTTPConnectionParams(url=MCP_URL),
        ),
    ],
)
When the agent calls a tool and the server responds with HTTP 403, the agent hangs for ~5 minutes instead of reporting the error.
Observed Behavior
- The agent UI/client keeps spinning with no response for ~5 minutes.
- Server logs show:
Error on session runner task: unhandled errors in a TaskGroup (1 sub-exception)
- After ~5 minutes, the request times out with McpError(REQUEST_TIMEOUT).
- on_tool_error_callback is never invoked during the hang.
Expected Behavior
- The agent should immediately surface the HTTP error (or a user-friendly message) back to the LLM/user.
- on_tool_error_callback should be invoked with the HTTP error.
- The agent should not hang.
Root Cause Analysis
Full code path trace through mcp SDK and ADK:
1. Tool call initiation
mcp_tool.py:376 -> session.call_tool("tool_name") -> session.send_request() writes a JSON-RPC message to write_stream, then blocks waiting on response_stream_reader.receive() (session.py:292) with a ~5-minute timeout (sse_read_timeout).
2. Transport sends HTTP POST
streamable_http.py:569 -> post_writer spawns handle_request_async via tg.start_soon() inside a TaskGroup (line ~647).
3. HTTP error raised
streamable_http.py:358 -> response.raise_for_status() raises httpx.HTTPStatusError on the 403 response.
4. TaskGroup crashes
The TaskGroup in streamable_http.py (line ~647) catches the exception, wraps it in an ExceptionGroup, and re-raises it, which takes down the transport. The finally block closes read_stream_writer (line ~576).
5. Error logged but not propagated to caller
session_context.py:189-191 -> catches BaseException, logs "Error on session runner task", and re-raises. But this is in a background task -- it doesn't reach send_request().
6. send_request() hangs
Meanwhile, send_request() at session.py:292 is still blocked on response_stream_reader.receive(). This is a per-request anyio memory stream. The transport crashed before writing any response to it. The stream never receives data, so send_request() hangs until the sse_read_timeout (~5 minutes) expires.
7. on_tool_error_callback never fires
Because send_request() doesn't raise during the hang, the ADK error callback path is never triggered. When it finally times out, it raises McpError(REQUEST_TIMEOUT) -- losing the original HTTP 403 context entirely.
Why on_tool_error_callback cannot work for HTTP-level errors
The error occurs in a background task (the transport's TaskGroup). The send_request() call waiting for a response doesn't receive the exception -- it's blocked on a memory stream that will never be written to. The two execution contexts (background transport task vs. foreground request) are not connected for error propagation.
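The disconnect can be modeled in a few lines. This is a simplified analogy, not the SDK's code: it uses asyncio queues in place of anyio memory streams, and transport_task, send_request, and demo are hypothetical stand-ins. The background task fails before writing to the per-request queue, so the caller only ever sees its own timeout.

```python
import asyncio


async def transport_task(outbox: asyncio.Queue, pending: dict) -> None:
    """Stand-in for the background transport: fails before answering."""
    request_id = await outbox.get()
    # Models raise_for_status() on a 403. Nothing is written to
    # pending[request_id], so the waiting caller is never woken up.
    raise RuntimeError(f"HTTP 403 for request {request_id}")


async def send_request(outbox: asyncio.Queue, pending: dict, timeout: float) -> str:
    """Stand-in for the foreground caller waiting on a per-request queue."""
    request_id = 1
    pending[request_id] = asyncio.Queue()
    await outbox.put(request_id)
    try:
        return await asyncio.wait_for(pending[request_id].get(), timeout)
    except asyncio.TimeoutError:
        # The only signal the caller gets is its own timeout.
        return "REQUEST_TIMEOUT"


async def demo(timeout: float = 0.2):
    outbox, pending = asyncio.Queue(), {}
    background = asyncio.create_task(transport_task(outbox, pending))
    outcome = await send_request(outbox, pending, timeout)
    # The original error exists only on the background task object.
    return outcome, str(background.exception())
```

Running asyncio.run(demo()) yields the timeout outcome for the caller while the 403 stays attached to the background task, which mirrors the behavior described above with a much shorter timeout.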
Suggested Fix
The mcp SDK's StreamableHTTPTransport should propagate HTTP errors from background tasks to the waiting send_request() call. Options:
- Write an error sentinel to the per-request memory stream before closing it, so send_request() can raise immediately with the original error.
- Cancel the pending request when the transport's TaskGroup crashes, instead of letting it hang until timeout.
- Catch httpx.HTTPStatusError in handle_request_async and convert it to a JSON-RPC error response written to the response stream.
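The first and third options share one idea: the transport delivers the failure to the waiting caller instead of dying silently. A minimal sketch of that idea follows, again using asyncio queues as a stand-in for the SDK's anyio streams. TransportError and every function name here are hypothetical; how the real transport would locate the per-request stream is not shown in this issue and is assumed.

```python
import asyncio


class TransportError(Exception):
    """Hypothetical wrapper that carries the HTTP status to the caller."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


async def transport_task(outbox: asyncio.Queue, pending: dict) -> None:
    request_id = await outbox.get()
    try:
        raise RuntimeError("HTTP 403")  # models raise_for_status()
    except RuntimeError:
        # Hand the failure to the waiting caller before giving up.
        await pending[request_id].put(TransportError(403))


async def send_request(outbox: asyncio.Queue, pending: dict, timeout: float):
    request_id = 1
    pending[request_id] = asyncio.Queue()
    await outbox.put(request_id)
    item = await asyncio.wait_for(pending[request_id].get(), timeout)
    if isinstance(item, Exception):
        raise item  # surfaces immediately, with the original status
    return item


async def demo(timeout: float = 5.0):
    outbox, pending = asyncio.Queue(), {}
    background = asyncio.create_task(transport_task(outbox, pending))
    try:
        await send_request(outbox, pending, timeout)
    except TransportError as error:
        return error.status
    finally:
        await background
```

With this shape the caller raises as soon as the transport fails, so an error callback in the layer above would have something to catch, and the HTTP status is preserved rather than replaced by a timeout.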
Workaround
If you control the proxy/gateway returning the non-2xx response, return HTTP 200 with a JSON-RPC response containing CallToolResult(isError=true) instead. This keeps the error at the MCP protocol level where the SDK handles it correctly:
{
  "jsonrpc": "2.0",
  "id": "<request-id>",
  "result": {
    "content": [{"type": "text", "text": "Tool call denied by authorization policy."}],
    "isError": true
  }
}
The MCP SDK parses this as a normal tool result, no exceptions are raised, and the LLM sees the denial message directly.
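On the gateway side, the body above can be generated from the incoming request so that the id matches. The following is a sketch under the assumption that the gateway has access to the raw JSON-RPC request bytes; denial_response and rewrite_denial are hypothetical helper names, and wiring them into a particular proxy is left out.

```python
import json


def denial_response(request_id, message):
    """Build a JSON-RPC 2.0 success response whose tool result is flagged as an error."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": message}],
            "isError": True,
        },
    }


def rewrite_denial(raw_request: bytes, message: str) -> bytes:
    """Echo the id of the incoming JSON-RPC request into the denial body."""
    request = json.loads(raw_request)
    return json.dumps(denial_response(request["id"], message)).encode("utf-8")
```

The gateway would send the returned bytes with HTTP 200 and a JSON content type in place of the 403.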