Fix CPU spike while streaming tool-call arguments #455
Open
itkonen wants to merge 1 commit into editor-code-assistant:master from
Conversation
Reuse the prompt-turn tool list while streaming tool-call arguments instead of recomputing all available tools for every streamed delta. Add a dev benchmark for reproducing the streamed tool-call prepare path. 🤖 Generated with [eca](https://eca.dev) Co-Authored-By: eca-agent <git@eca.dev>
Contributor
Author
The test failure appears to be unrelated to this PR’s code; it may be a transient server problem.
I noticed that ECA could become very CPU-heavy while the model was streaming tool calls, especially when writing or editing files. In those cases the ECA process could reach around 100% CPU and the workflow became noticeably slower.
The issue seemed specific to streamed tool-call arguments. Normal assistant text streaming did not show the same behavior; the slowdown was most visible when the model was gradually building a tool call such as `write_file` or `edit_file`, where the arguments grow chunk by chunk.
I asked Codex/ECA to benchmark this path and identify the hotspot. After applying this change, I tested the same workflow locally and the CPU load is now negligible. Tool-call streaming now appears to keep up with the LLM instead of the CPU becoming the bottleneck.
AI-generated technical summary
Problem
While streaming tool-call arguments, `:on-prepare-tool-call` recomputed the full tool list for every streamed argument delta: `(f.tools/all-tools chat-id agent @db* config)`. This is expensive because `f.tools/all-tools` rebuilds native tools, MCP tools, schemas, dynamic descriptions, disabled-tool filtering, approval filtering, and subagent filtering.
For large tool calls such as `write_file`, providers may stream the file content as tool-call argument JSON. That means this callback can run hundreds or thousands of times for a single tool call.
Root cause
The prompt flow already computes `all-tools` once before sending the request to the provider. The model generates tool calls based on that prompt-turn tool snapshot. However, the streamed prepare callback was rebuilding the tool list again for every argument chunk.
Git history suggests this was likely introduced during a refactor rather than being intentional. Earlier code resolved streamed tool-call prepare events against the existing prompt-turn `all-tools` binding.
Fix
Reuse the prompt’s existing `all-tools` value inside `:on-prepare-tool-call` instead of recomputing it per streamed delta. This makes streamed tool-call preparation resolve against the same tool list that was sent to the model for that prompt turn.
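As a rough sketch of the shape of the change (the `let` structure and the `resolve-tool` helper are assumptions for illustration; only `f.tools/all-tools` and `:on-prepare-tool-call` come from this PR):

```clojure
;; Before (sketch): the callback recomputed the full tool list on every
;; streamed argument delta, rebuilding native/MCP tools each time.
;; {:on-prepare-tool-call
;;  (fn [tool-call]
;;    (resolve-tool tool-call (f.tools/all-tools chat-id agent @db* config)))}

;; After (sketch): compute the tool list once per prompt turn and let the
;; callback close over that snapshot instead.
(let [all-tools (f.tools/all-tools chat-id agent @db* config)]
  {:on-prepare-tool-call
   (fn [tool-call]
     (resolve-tool tool-call all-tools))})
```

The key point is that the expensive `all-tools` call moves out of the per-delta callback into the once-per-prompt setup path, so each streamed chunk only does a cheap lookup against an already-built snapshot.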
Functional impact
This should preserve intended behavior.
Tool-call prepare events now use the prompt-turn tool snapshot, which is consistent with provider semantics: the model can only call tools that were included in the request at prompt start.
The only theoretical behavior change is that tool-list changes made during an active streaming response are not reflected in `toolCallPrepare` metadata until the next prompt turn. That seems preferable to resolving different chunks of the same streamed tool call against potentially different tool lists.
Benchmark evidence
A dev benchmark was added to simulate streamed tool-call argument chunks without involving a live LLM or editor.
Command:
Representative results:
A larger cached-path run processed 5000 streamed chunks with JSON serialization in ~106 ms.
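For context, a dev benchmark for this path can be sketched roughly as follows (all names here, including `bench-streamed-prepare` and `prepare-tool-call!`, are hypothetical stand-ins, not the actual benchmark code added by this PR):

```clojure
;; Hypothetical sketch: feed synthetic streamed argument chunks through
;; a prepare callback and report total elapsed time in milliseconds.
(defn bench-streamed-prepare
  [prepare-tool-call! n-chunks]
  (let [chunk "{\"path\":\"a.txt\",\"content\":\"x\"}" ; fake argument JSON delta
        start (System/nanoTime)]
    (dotimes [_ n-chunks]
      (prepare-tool-call! {:name "write_file" :arguments-delta chunk}))
    (/ (- (System/nanoTime) start) 1e6)))
```

Run with a cached prompt-turn tool list, such a loop should scale linearly in the number of chunks, which is consistent with the ~106 ms figure for 5000 chunks reported above.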
Test plan
Manual validation: