Skip to content

Add more tools (based on searchfox-py) to the LLM review agent, and make it patch stack aware to a certain extent#6051

Open
padenot wants to merge 2 commits into
mozilla:masterfrom
padenot:searchfox-more-tools
Open

Add more tools (based on searchfox-py) to the LLM review agent, and make it patch stack aware to a certain extent#6051
padenot wants to merge 2 commits into
mozilla:masterfrom
padenot:searchfox-more-tools

Conversation

@padenot
Copy link
Copy Markdown
Collaborator

@padenot padenot commented May 25, 2026

The two patches can be reviewed independently but are close in location so I'm putting one branch for both.

@padenot padenot force-pushed the searchfox-more-tools branch 2 times, most recently from 33dbdae to 798947f Compare May 25, 2026 16:12
@padenot padenot requested a review from suhaibmujahid May 25, 2026 16:14
@padenot
Copy link
Copy Markdown
Collaborator Author

padenot commented May 25, 2026

padenot/searchfox-cli@v0.10.12...v0.13.0 are the changes in searchfox-py for this (API tightening, adding async, etc.).

@padenot padenot force-pushed the searchfox-more-tools branch 5 times, most recently from 9b75539 to 44f260f Compare May 26, 2026 14:06
padenot added 2 commits May 26, 2026 17:02
expand_context now shows the file as it appears after the full patch stack
is applied. A new patch_apply module provides reusable primitives:

- strip_diff_prefix, find_patched_file, apply_patched_file: diff utilities
  with tenacity retries on both fetches for transient failures
- get_file_after_stack(patch, path, fetch): applies patch.patch_stack in
  order on top of whatever base the caller provides via fetch()

expand_context calls get_file_after_stack with patch.get_old_file as the
fetch function. The Patch base class gains patch_stack (default: [patch_set]),
get_base_revision(), and patch_stack_warning(); phabricator.py overrides all
three. Non-linear stacks fall back to the current patch only and set a warning
that expand_context surfaces prominently so the agent can flag it in the review.

expand_context also gains optional start_line / end_line — omitting either
returns from the beginning or to the end of the file.

Only check direct dependency updates

Fixes mozilla#6054

Version 0.0.636

Use itertools.batched where possible

Process try pushes in batches
Expose the searchfox Python API (searchfox-cli) as LangChain tools wired
into the code review agent. expand_context switches its fetch function
from patch.get_old_file to searchfox at the patch's base revision, using
get_file_after_stack from patch_apply.

- expand_context: searchfox at base revision + full patch stack applied
- search_text: free-text / regex search with file-type and path filters
- get_blame: per-line commit history
- get_field_layout: C++ class/struct memory layout
- find_definition: look up a function, method, class, or struct
- search_identifier: exact identifier search with file-type filters
- calls_from / calls_to / calls_between: call graph traversal with depth
- check_can_gc: SpiderMonkey GC hazard analysis for C++ functions

All tools support langs (list) and tests filters where applicable.

No additional tests for the tool wrappers: the tools are thin call-and-format
wrappers over searchfox-py, which is tested independently in its own repo.

Closes mozilla#5284

diff --git a/bugbug/tools/code_review/agent.py b/bugbug/tools/code_review/agent.py
@padenot padenot force-pushed the searchfox-more-tools branch from 44f260f to 3917e53 Compare May 26, 2026 17:06
Comment on lines +106 to +117
tools = [
expand_context,
search_text,
get_blame,
get_field_layout,
find_definition,
search_identifier,
calls_from,
calls_to,
calls_between,
check_can_gc,
]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to have this as a list somewhere and then use it here and import it above, so we don't have to remember when we add a new searchfox tool that we also need to add it here

return f"{prefix}: {message}"


logger = getLogger(__name__)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move this back above so it doesn't show as diff (unless it's necessary for linting)

Comment on lines +36 to +38
LangStr = Literal[
"cpp", "c", "js", "webidl", "java", "kotlin", "rust", "python", "html", "css"
]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could get a list of langs from bugbug/repository.py (SOURCE_CODE_TYPES_TO_EXT.keys())

Comment on lines +49 to +53
def _get_client() -> AsyncSearchfoxClient:
global _client
if _client is None:
_client = AsyncSearchfoxClient()
return _client
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you could use cache from functools to avoid managing it yourself

except Exception:
# searchfox raises plain Exception; fall back to tip
pass
return await _retry(client.get_file)(path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the old cose as fall back and get the before code from Phab

check_can_gc,
]
if function_search:
tools.append(create_find_function_definition_tool(function_search))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this anymore, do we?

from requests import HTTPError
from searchfox import AsyncSearchfoxClient

from bugbug.code_search.function_search import FunctionSearch
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had different types of function search tools implementing this FunctionSearch interface, your approach is to instead just use searchfox directly.
Should we implement yours in the FunctionSearch interface too?

Currently we don't really need the abstraction, but we might want it if we add support for GitHub where we don't have Searchfox. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants