Skip to content

fix(crafter): use deterministic filename for AI config collector#2943

Draft
waveywaves wants to merge 1 commit intochainloop-dev:mainfrom
waveywaves:fix/ai-config-deterministic-filename
Draft

fix(crafter): use deterministic filename for AI config collector#2943
waveywaves wants to merge 1 commit intochainloop-dev:mainfrom
waveywaves:fix/ai-config-deterministic-filename

Conversation

@waveywaves
Copy link
Copy Markdown
Contributor

@waveywaves waveywaves commented Mar 27, 2026

Summary

  • Replace os.CreateTemp (random suffix) with a deterministic filename derived from the full ConfigHash, eliminating duplicate CAS uploads on attestation retries
  • Write temp files into a private os.MkdirTemp directory instead of the shared os.TempDir() to prevent symlink attacks
  • Extract aiConfigTempFilePath() helper with documentation explaining the root cause
  • Add unit tests for filename determinism invariants

Root Cause Analysis

The reviewer questioned whether the filename is actually the cause of duplicates since Materials[m.Name] already uses a deterministic key. The answer is yes — the map key is deterministic, but the Artifact struct inside the material is not.

Here is the chain of causation:

  1. uploadAgentConfig() calls os.CreateTemp("", "ai-agent-config-claude-*.json"), which produces a random filename like /tmp/ai-agent-config-claude-3847291.json
  2. AddMaterialContractFree() calls addMaterial(), which calls uploadAndCraft()
  3. uploadAndCraft() calls fileStats(artifactPath), which does os.Stat(filepath).Name() to get the basename of the temp file
  4. That basename becomes Artifact.Name in the material proto (line 136 of materials.go)
  5. fileStats() also computes sha256(file_contents) which becomes Artifact.Digest

Because the JSON payload is identical across retries, Artifact.Digest stays the same. But Artifact.Name changes every time (random temp suffix), so the CAS sees a different resource name on each retry. The map key Materials["ai-agent-config-claude"] is overwritten (good), but each retry still triggers a new CAS upload because the artifact metadata differs.

The fix makes the filename deterministic by deriving it from ConfigHash, so Artifact.Name is stable across retries.

Changes

  • pkg/attestation/crafter/collector_aiagentconfig.go:

    • Add aiConfigTempFilePath() helper with detailed doc comment explaining the root cause
    • Replace os.CreateTemp + manual write/close with os.MkdirTemp (private dir) + os.WriteFile (deterministic filename)
    • Use full ConfigHash instead of truncated [:12] — no reason to truncate a sha256 hex string
    • Remove now-unnecessary defer os.Remove(tmpFile.Name()) in favor of defer os.RemoveAll(tmpDir)
  • pkg/attestation/crafter/collector_aiagentconfig_test.go (new):

    • Test that same (agent, hash) inputs produce identical file paths
    • Test that different config hashes produce different file paths
    • Test that different agent names produce different file paths
    • Test that the full config hash appears in the path (not truncated)

Test Plan

  • go build ./pkg/attestation/crafter/ passes
  • go test ./pkg/attestation/crafter/... — all 8 sub-packages pass
  • New TestAIConfigTempFilePath_Deterministic tests verify:
    • Same config input produces same filename across calls
    • Different config input produces different filenames
    • Full hash is embedded (no truncation)

Fixes #2907

@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch 2 times, most recently from 529538f to 486591a Compare March 27, 2026 19:21
Replace os.CreateTemp (which generates a random suffix) with a
deterministic filename derived from the config's content hash.

Root cause: although AddMaterialContractFree uses a deterministic map
key (Materials[m.Name]), the Artifact struct embedded in each material
gets its Name and Digest from fileStats(), which calls
os.Stat(filepath).Name(). A random temp filename therefore produces a
different Artifact.Name on every call — even when the JSON payload is
byte-for-byte identical — causing the CAS to treat each retry as a new,
distinct resource.

Changes:
- Extract aiConfigTempFilePath() helper for deterministic path generation
- Use full ConfigHash (no truncation) to avoid collision risk
- Write into a private os.MkdirTemp directory (symlink-safe) instead of
  the shared os.TempDir()
- Add unit tests proving same-input-same-output and different-input-
  different-output invariants

Fixes: chainloop-dev#2907

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vibhav Bobade <vibhav.bobde@gmail.com>
@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch from 486591a to cb22854 Compare March 28, 2026 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AI config collector creates duplicate uploads on retry

1 participant