Add more tools (based on searchfox-py) to the LLM review agent, and make it patch stack aware to a certain extent#6051
Conversation
33dbdae to
798947f
Compare
|
padenot/searchfox-cli@v0.10.12...v0.13.0 are the changes in |
9b75539 to
44f260f
Compare
expand_context now shows the file as it appears after the full patch stack is applied. A new patch_apply module provides reusable primitives: - strip_diff_prefix, find_patched_file, apply_patched_file: diff utilities with tenacity retries on both fetches for transient failures - get_file_after_stack(patch, path, fetch): applies patch.patch_stack in order on top of whatever base the caller provides via fetch() expand_context calls get_file_after_stack with patch.get_old_file as the fetch function. The Patch base class gains patch_stack (default: [patch_set]), get_base_revision(), and patch_stack_warning(); phabricator.py overrides all three. Non-linear stacks fall back to the current patch only and set a warning that expand_context surfaces prominently so the agent can flag it in the review. expand_context also gains optional start_line / end_line — omitting either returns from the beginning or to the end of the file. Only check direct dependency updates Fixes mozilla#6054 Version 0.0.636 Use itertools.batched where possible Process try pushes in batches
Expose the searchfox Python API (searchfox-cli) as LangChain tools wired into the code review agent. expand_context switches its fetch function from patch.get_old_file to searchfox at the patch's base revision, using get_file_after_stack from patch_apply. - expand_context: searchfox at base revision + full patch stack applied - search_text: free-text / regex search with file-type and path filters - get_blame: per-line commit history - get_field_layout: C++ class/struct memory layout - find_definition: look up a function, method, class, or struct - search_identifier: exact identifier search with file-type filters - calls_from / calls_to / calls_between: call graph traversal with depth - check_can_gc: SpiderMonkey GC hazard analysis for C++ functions All tools support langs (list) and tests filters where applicable. No additional tests for the tool wrappers: the tools are thin call-and-format wrappers over searchfox-py, which is tested independently in its own repo. Closes mozilla#5284 diff --git a/bugbug/tools/code_review/agent.py b/bugbug/tools/code_review/agent.py
44f260f to
3917e53
Compare
| tools = [ | ||
| expand_context, | ||
| search_text, | ||
| get_blame, | ||
| get_field_layout, | ||
| find_definition, | ||
| search_identifier, | ||
| calls_from, | ||
| calls_to, | ||
| calls_between, | ||
| check_can_gc, | ||
| ] |
There was a problem hiding this comment.
It would be preferable to have this as a list somewhere and then use it here and import it above, so we don't have to remember when we add a new searchfox tool that we also need to add it here
| return f"{prefix}: {message}" | ||
|
|
||
|
|
||
| logger = getLogger(__name__) |
There was a problem hiding this comment.
Nit: move this back above so it doesn't show as diff (unless it's necessary for linting)
| LangStr = Literal[ | ||
| "cpp", "c", "js", "webidl", "java", "kotlin", "rust", "python", "html", "css" | ||
| ] |
There was a problem hiding this comment.
Nit: we could get a list of langs from bugbug/repository.py (SOURCE_CODE_TYPES_TO_EXT.keys())
| def _get_client() -> AsyncSearchfoxClient: | ||
| global _client | ||
| if _client is None: | ||
| _client = AsyncSearchfoxClient() | ||
| return _client |
There was a problem hiding this comment.
Nit: you could use cache from functools to avoid managing it yourself
| except Exception: | ||
| # searchfox raises plain Exception; fall back to tip | ||
| pass | ||
| return await _retry(client.get_file)(path) |
There was a problem hiding this comment.
We could use the old cose as fall back and get the before code from Phab
| check_can_gc, | ||
| ] | ||
| if function_search: | ||
| tools.append(create_find_function_definition_tool(function_search)) |
There was a problem hiding this comment.
We don't need this anymore, do we?
| from requests import HTTPError | ||
| from searchfox import AsyncSearchfoxClient | ||
|
|
||
| from bugbug.code_search.function_search import FunctionSearch |
There was a problem hiding this comment.
We had different types of function search tools implementing this FunctionSearch interface, your approach is to instead just use searchfox directly.
Should we implement yours in the FunctionSearch interface too?
Currently we don't really need the abstraction, but we might want it if we add support for GitHub where we don't have Searchfox. WDYT?
The two patches can be reviewed independently but are close in location so I'm putting one branch for both.