Build agent image in cmd_post_process #96
moodmosaic merged 3 commits into protocol-security:master
Pin the PR's contract structurally so a future refactor cannot silently drop one of the call sites or change the SWARM_AGENTS union.

- Section 38 sources compute_swarm_agents from launch.sh and exercises it on eight configs: default driver, codex-only, mixed groups, dedup across groups, the codex-agents + claude-code-pp scenario from the PR description, pp matching the agent driver, pp inheriting the top-level driver, and an empty per-group driver falling back to the default.
- Section 39 asserts build_image has exactly two call sites (one each in cmd_start and cmd_post_process) and that its body still threads compute_swarm_agents, SWARM_AGENTS, CLAUDE_CODE_VERSION, and CODEX_CLI_VERSION as build-args.
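The call-site assertion described for Section 39 could look roughly like the sketch below. It is not the project's actual test: the temporary file is a toy stand-in for launch.sh, `count_call_sites` is a hypothetical helper, and the pattern assumes that calls appear as a bare `build_image` at the start of a (possibly indented) line while the definition is written `build_image() {`.

```shell
#!/bin/sh
# Toy stand-in for launch.sh (an assumption for illustration only).
script=$(mktemp)
cat > "$script" <<'EOF'
build_image() {
    echo "building"
}
cmd_start() {
    build_image
}
cmd_post_process() {
    build_image
}
EOF

# Hypothetical helper: count lines that invoke build_image.
# The definition line is skipped because the name is followed by "(",
# not by whitespace or end of line.
count_call_sites() {
    grep -cE '^[[:space:]]*build_image([[:space:]]|$)' "$1"
}

calls=$(count_call_sites "$script")
rm -f "$script"

if [ "$calls" -eq 2 ]; then
    echo "ok: build_image has 2 call sites"
else
    echo "FAIL: expected 2 call sites, found $calls"
fi
```

A count-based check like this is deliberately brittle: it fails if a refactor drops a call site or adds a third one, which is the point of pinning the contract.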
One ask going forward: PRs must ship with tests. In the AI era there's no excuse -- (trustworthy) tests can be cheap to author, and untested changes only slow review (the maintainer has to reproduce the bug by hand and build the mental model from scratch). A trustworthy test-suite is what keeps this project maintainable long-term and the maintainer's life easier. I added tests on top of this branch this round so we don't block the fix; please land tests-included next time. 🙏
moodmosaic left a comment
Thanks @BowTiedRadone! 👍
cmd_post_process never called docker build, so a driver-set switch silently left the post container on a stale image and exited 127; factoring out compute_swarm_agents / build_image is the right shape and gets cache invalidation for free.
LGTM!
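For context on the 127 mentioned above: a POSIX shell returns exit status 127 when the command it is asked to run cannot be found, which is what happens when a container tries to start an agent CLI that was never installed in the image. A minimal illustration, using a made-up command name rather than the real agent binary:

```shell
#!/bin/sh
# "no_such_agent_cli_for_demo" is a hypothetical name chosen so that
# the lookup is guaranteed to fail.
status=0
sh -c 'no_such_agent_cli_for_demo' 2>/dev/null || status=$?
echo "exit status: $status"   # prints "exit status: 127"
```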
@moodmosaic Thanks! Sure thing 🙏
Summary
Fixes an edge case that showed up when trying to use a separate claude-code post-processor after the image was built for a codex-only config. The Docker layer cache was never invalidated -- the post-process path never called `docker build` at all -- so the post container inherited a claude-less image and the harness exited 127 on first session, needing manual `docker build` intervention to recover.

`cmd_post_process` now builds the image from the active config before launching, mirroring `cmd_start`. Build args are derived from the same config the container will use, so when the driver set changes the layer cache invalidates correctly and the right install layer re-runs; when it hasn't changed, the build is a no-op.

Changes
- `launch.sh`: factor `compute_swarm_agents` (driver set from a config) and `build_image` (build with derived args) out of `cmd_start`, and call `build_image` from `cmd_post_process`.
- `launch.sh`: drive-by, switch the submodule mirror cleanup in `cmd_start` to `rm_docker_dir` so it works when a prior container left UID-mismatched files behind.

Self-review checklist
Test plan
Using two configs with identical prompts, with the agent drivers as the only difference (`swarm-codex.json` and `swarm-claude-code.json`):
- `swarm-codex.json`, then run post-process against `swarm-claude-code.json` -- image rebuilds with `claude-code` and the post container reaches a real session.
- `launch.sh start` against an unchanged config -- behavior identical to before.
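The driver-set derivation that this PR relies on can be sketched as follows. This is a minimal illustration, not the `compute_swarm_agents` in launch.sh: `union_drivers` is a hypothetical helper that takes already-extracted driver names as arguments instead of parsing a config, and the comma-separated format of the `SWARM_AGENTS` value is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch; the real compute_swarm_agents reads a config file.
# Usage: union_drivers DEFAULT_DRIVER [DRIVER ...]
# Prints the deduplicated, sorted, comma-separated set of drivers.
union_drivers() {
    ud_default=$1
    shift
    # With no per-entry drivers at all, the set is just the default.
    if [ "$#" -eq 0 ]; then
        set -- ""
    fi
    for ud_driver in "$@"; do
        # An empty per-entry driver falls back to the default.
        printf '%s\n' "${ud_driver:-$ud_default}"
    done | LC_ALL=C sort -u | paste -sd, -
}

# Default claude-code, two codex groups, one group with an empty driver.
agents=$(union_drivers claude-code codex codex "")
echo "$agents"   # prints "claude-code,codex"

# Dry run only: print the build command instead of invoking Docker.
echo "docker build --build-arg SWARM_AGENTS=$agents ."
```

Because the value is sorted and deduplicated, the same set of drivers always yields the same build-arg string, so Docker can reuse cached layers when nothing changed and rebuilds the layers that consume the arg when the set does change.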