Safety Research

bloom Public

bloom - evaluate any behavior immediately 🌸🌱

Python, 1.3k stars, 166 forks

persona_vectors Public

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Python, 418 stars, 100 forks

automated-w2s-research Public

Python, 215 stars, 42 forks

SCONE-bench Public

181 stars, 29 forks

assistant-axis Public

The Assistant Axis is a direction in activation space that captures how "Assistant-like" a model's behavior is. Models can drift away from the Assistant during conversations—sometimes toward bizarr…

Jupyter Notebook, 140 stars, 37 forks

safety-tooling Public

Inference API for many LLMs and other useful tools for empirical research

Python, 121 stars, 37 forks
