Commit dfbbef8

Restructure roadmap: protocol-tightening before diagnostics and experiments
- Tier 0 reordered: unify attention → adversarial testing → formal security → real-model corridor attacks → KV provenance → score witnessing → W_o conditioning → payload sizing → RMSNorm contraction (supporting).
- New items: #5 real-model corridor attacks, #8 W_o conditioning (promoted from Tier 5), #9 deep-audit payload sizing.
- Score witnessing deduplicated: removed Tier 5 #67, canonical item is #7.
- Deterministic inference moved to Tier 5 #68 as side experiment, not mainline plan. Updated to note: verifier already recomputes attention, so τ=0 + arithmetic spec closes the gap without score witnessing. But τ=0 alone doesn't solve prefix anchoring; KV provenance is still needed.
- Score witnessing description corrected: scores checked against canonical QK^T from shell-verified Q and committed K, not "via Freivalds."
- Cross-references updated in research/adversarial-methodology.md.
1 parent 88f736a commit dfbbef8

2 files changed

Lines changed: 121 additions & 122 deletions

research/adversarial-methodology.md

Lines changed: 7 additions & 7 deletions
@@ -25,7 +25,7 @@ Provider precomputes or caches honest responses to avoid real-time computation.
 
 | Attack | Why it matters |
 |--------|---------------|
-| Replay honest receipt for known prompt | Without freshness binding (#16), cached receipts verify indefinitely |
+| Replay honest receipt for known prompt | Without freshness binding (#23), cached receipts verify indefinitely |
 | Prefix-sharing forgery | Precompute receipt for common prefix, graft onto new suffix |
 | Selective honest/dishonest | Honest for audited fraction, dishonest for the rest — directly attacks sampling rate |

@@ -104,7 +104,7 @@ Replace "does this attack get caught?" with "what fraction of cheating strategie
 - **Security curve**: P(detect) as a function of (cheating fraction, audit rate)
 - **Break-even analysis**: at what cheating fraction does the expected gain exceed the expected penalty?
 - **Minimum audit rate**: for a given target detection probability (e.g., 99%), what audit rate is required?
-- Feed results into #66 (cheating-incentive analysis) and #14 (formal security argument)
+- Feed results into #70 (cheating-incentive analysis) and #4 (formal security argument)
 
 ---
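
The three quantities named in the hunk above (security curve, break-even, minimum audit rate) can be illustrated with a minimal sketch. It is not taken from the repository and rests on simplifying assumptions: each response is audited independently with probability `audit_rate`, the provider cheats independently on a fraction `cheat_fraction` of `n_responses`, and any audited dishonest response is always caught. All function and parameter names are hypothetical.

```python
# Toy model (assumption): independent audits, independent cheating,
# and perfect detection whenever a dishonest response is audited.

def p_detect(cheat_fraction: float, audit_rate: float, n_responses: int) -> float:
    """Security curve: probability that at least one cheat is audited."""
    per_response = cheat_fraction * audit_rate  # P(a given response is cheated AND audited)
    return 1.0 - (1.0 - per_response) ** n_responses


def min_audit_rate(cheat_fraction: float, n_responses: int, target: float = 0.99) -> float:
    """Smallest audit rate reaching `target` detection probability.

    Capped at 1.0; if the cap applies, the target is unreachable in this model.
    """
    if cheat_fraction <= 0.0:
        raise ValueError("cheat_fraction must be positive")
    needed = 1.0 - (1.0 - target) ** (1.0 / n_responses)
    return min(1.0, needed / cheat_fraction)


def expected_net_gain(cheat_fraction: float, audit_rate: float, n_responses: int,
                      gain_per_cheat: float, penalty: float) -> float:
    """Break-even helper: expected gain from cheating minus expected penalty."""
    gain = cheat_fraction * n_responses * gain_per_cheat
    return gain - p_detect(cheat_fraction, audit_rate, n_responses) * penalty


if __name__ == "__main__":
    rate = min_audit_rate(cheat_fraction=0.1, n_responses=1000, target=0.99)
    print(f"audit rate for 99% detection: {rate:.4f}")
    print(f"net gain at that rate: {expected_net_gain(0.1, rate, 1000, 1.0, 500.0):.2f}")
```

Sweeping `cheat_fraction` and finding where `expected_net_gain` crosses zero gives the break-even point under these assumptions. A real analysis would also need to model imperfect detection and adaptive providers, such as the selective honest/dishonest strategy.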

@@ -186,9 +186,9 @@ Given white-box access to the verifier, use optimization (gradient descent or se
 
 ## References
 
-- Roadmap #8: Adversarial testing (in progress)
-- Roadmap #14: Formal security argument
-- Roadmap #9: Fuzz binary parsers
-- Roadmap #66: Cheating-incentive analysis
-- Roadmap #1: Adversarial methodology research (this document)
+- Roadmap #3: Adversarial testing (in progress)
+- Roadmap #4: Formal security argument
+- Roadmap #17: Fuzz binary parsers
+- Roadmap #70: Cheating-incentive analysis
+- Roadmap #16: Adversarial methodology research (this document)
 - `redteam/attack_matrix.md`: current attack coverage inventory
