Failure-First is a defensive AI safety research project that studies how AI systems fail under adversarial pressure. This public repository contains pattern-level findings and methodology descriptions. Operational testing infrastructure, adversarial datasets, and full evaluation traces are maintained in a private repository.
We practice coordinated vulnerability disclosure (CVD) for AI safety vulnerabilities discovered through this research.
We have submitted 10 responsible disclosures to model providers (Nvidia, Alibaba, Zhipu, Google/Gemma, Mistral, and others) covering two vulnerability classes: context-collapse attacks and transcription-loophole injection. Notifications were sent on 2026-04-07. Disclosure timelines follow standard CVD practice: findings are reported to affected parties with a reasonable remediation window before any public discussion of specifics. The process:
- Discovery: Vulnerability pattern identified through systematic evaluation
- Verification: Finding confirmed across multiple test conditions with statistical controls
- Private notification: Affected provider contacted via their security reporting channel
- Remediation window: Minimum 90 days before public discussion of specifics
- Public disclosure: Pattern-level description published (no operational details)
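For concreteness, here is a minimal sketch of how the remediation window translates into dates. It is illustrative only: the `Disclosure` record and its field names are hypothetical, not part of our actual tooling; the 90-day minimum and the 2026-04-07 notification date come from the policy above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Minimum remediation window before public discussion of specifics
# (per the disclosure process above).
MIN_REMEDIATION_DAYS = 90

@dataclass
class Disclosure:
    """One coordinated-disclosure record (illustrative schema only)."""
    provider: str             # who was notified
    vulnerability_class: str  # pattern-level label only; never operational details
    notified_on: date         # date the provider's security channel was contacted
    remediation_days: int = MIN_REMEDIATION_DAYS

    @property
    def earliest_public_discussion(self) -> date:
        """Earliest date pattern-level specifics may be discussed publicly."""
        return self.notified_on + timedelta(days=self.remediation_days)

# Example: a notification sent 2026-04-07 with the default 90-day window.
d = Disclosure(provider="ExampleVendor",
               vulnerability_class="context-collapse",
               notified_on=date(2026, 4, 7))
print(d.earliest_public_discussion)  # 2026-07-06
```

Under the default window, specifics from the 2026-04-07 notifications would not be discussed publicly before 2026-07-06.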
Our vulnerability research follows the principles documented in our research ethics charter:
- Findings serve the defensive research mission
- Operational details are never published
- Affected parties are notified before any public discussion
- Pattern-level descriptions enable defensive improvements without enabling attacks
If you find security issues with this public repository (exposed credentials, vulnerable dependencies, website security flaws):
- Non-sensitive issues: Open a GitHub issue
- Sensitive issues: Email research@failurefirst.org
- GitHub Security Advisories: Use the "Security" tab to report privately
If you discover vulnerabilities in AI systems through independent research and would like to collaborate on disclosure:
DO:
- Follow responsible disclosure practices
- Report to affected vendors before public disclosure
- Document findings at the pattern level for academic discussion
- Contact us via GitHub issue if you want to coordinate
DO NOT:
- Post operational exploits in public issues
- Share working bypass techniques without vendor notification
- Weaponize research findings
In scope:
- Security issues with this GitHub repository or failurefirst.org
- Vulnerabilities in public documentation or site infrastructure
- Dependency security issues
- Collaboration on coordinated disclosure of AI safety vulnerabilities
Out of scope:
- Vulnerabilities in third-party AI systems (report directly to vendors)
- Requests for operational exploit code or adversarial datasets
- Model-specific jailbreak techniques
Response timelines:
- Acknowledgment: Within 3 business days
- Initial assessment: Within 7 business days
- Resolution: Depends on severity and complexity
Contact:
- Non-sensitive issues: Open a GitHub issue
- Sensitive disclosures: research@failurefirst.org
- CVD coordination: Open a GitHub issue noting your institutional affiliation
Last updated: 2026-05-06