Commit d90b728
Add full-bridge corridor amplification tests

Extend corridor attack tests with residual connections and RMSNorm
(the production bridge path). Key finding: RMSNorm does NOT dampen
amplification. Results with full bridge:
- Token flip rate: 80-88% at τ=10 across d=16..256 (2 layers)
- At 28 layers: 96% flip rate, residual L∞ divergence = 630K
- Residual L∞ grows ~exponentially with depth
- RMSNorm normalizes magnitude but preserves directional perturbations
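
The last point can be illustrated with a toy sketch. This is not the test code from this commit. It assumes RMSNorm with unit gain (no learned scale), and the names `rms_norm`, `cosine`, `x`, and `delta` are hypothetical. Under that assumption RMSNorm is a positive rescaling of its input, so the angle between a clean vector and a perturbed one is unchanged even though the magnitude is reset to roughly 1.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm with unit gain (assumption): rescale x so its RMS is ~1."""
    return x / np.sqrt(np.mean(x * x) + eps)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 256                                  # hypothetical hidden size
x = rng.standard_normal(d) * 50.0        # large-magnitude residual stream
delta = rng.standard_normal(d) * 5.0     # hypothetical perturbation
x_pert = x + delta

cos_before = cosine(x, x_pert)
cos_after = cosine(rms_norm(x), rms_norm(x_pert))
rms_after = float(np.sqrt(np.mean(rms_norm(x_pert) ** 2)))

print(f"cosine before norm: {cos_before:.6f}")
print(f"cosine after norm:  {cos_after:.6f}")
print(f"RMS after norm:     {rms_after:.6f}")
```

With a learned per-channel gain the angle would generally change, but the normalization still would not remove the perturbation; that case is outside this sketch.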
The attention corridor tolerance is exploitable even with the
production residual+RMSNorm bridge path.

1 parent e455469 commit d90b728
1 file changed
Lines changed: 410 additions & 1 deletion