Signal

New research reveals challenges and advances in AI safety and jailbreak detection

Recent research highlights both the persistent challenges in AI alignment and promising new methods for detecting, and in some cases exploiting, weaknesses in model safety measures.

Evidence preview
  • The Hard Truth: Transparency alone won't solve the Alignment Problem (via Reddit)
  • ControlProblem on Reddit (via Reddit)