RESEARCH

Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

ArXiv cs.AI · Mon, 08 Jun 2026 04:00:00 GMT

arXiv:2606.06529v1 · Announce Type: new

Abstract: An attacker that strategically chooses when to attack is much harder to catch than one that attacks indiscriminately. AI control is a safety framework for deploying capable but untrusted AI agents under the oversight of a weaker, trusted …
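The effect the abstract describes can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's method or evaluation setup. It assumes that monitor suspicion scores are Gaussian, that the trusted monitor audits a fixed fraction (audit_budget) of honest runs, and that the attacker can perceive a made-up per-task "cover" value that lowers how suspicious an attack looks. The names simulate, cover, attack_shift, and select_quantile are illustrative only. The safety function uses a simplified race model: each attack is independently caught with probability p_catch, and an uncaught attack succeeds with probability p_success.

```python
import random


def audit_threshold(honest_scores, audit_budget):
    """Score at or above which roughly `audit_budget` of honest runs get audited."""
    ranked = sorted(honest_scores, reverse=True)
    k = max(1, int(len(ranked) * audit_budget))
    return ranked[k - 1]


def simulate(n_tasks=20000, attack_shift=2.0, audit_budget=0.02,
             select_quantile=None, seed=0):
    """Return the fraction of attempted attacks whose suspicion reaches the audit threshold.

    Toy model (assumption): honest suspicion ~ N(0, 1); an attack on task i
    scores attack_shift - cover[i] + N(0, 0.5), so good cover hides attacks.
    """
    rng = random.Random(seed)
    honest = [rng.gauss(0.0, 1.0) for _ in range(n_tasks)]
    threshold = audit_threshold(honest, audit_budget)

    # Hypothetical per-task cover the attacker can observe before deciding.
    cover = [rng.gauss(0.0, 1.0) for _ in range(n_tasks)]
    if select_quantile is None:
        chosen = list(range(n_tasks))  # indiscriminate: attack on every task
    else:
        # Selective: attack only on tasks in the top (1 - select_quantile) of cover.
        cutoff = sorted(cover)[int(select_quantile * n_tasks)]
        chosen = [i for i in range(n_tasks) if cover[i] >= cutoff]

    caught = 0
    for i in chosen:
        score = attack_shift - cover[i] + rng.gauss(0.0, 0.5)
        if score >= threshold:
            caught += 1
    return caught / len(chosen)


def safety(p_catch, p_success):
    """P(an attack is caught before any uncaught attack succeeds), simplified race model."""
    denom = p_catch + (1 - p_catch) * p_success
    if denom == 0:
        return 1.0  # nothing is ever caught or succeeds, so no harm occurs
    return p_catch / denom


if __name__ == "__main__":
    for label, q in [("indiscriminate", None), ("selective (top 10% cover)", 0.9)]:
        p = simulate(select_quantile=q)
        print(f"{label}: catch rate {p:.3f}, safety {safety(p, 0.5):.3f}")
```

Under these assumptions the selective attacker's per-attack catch rate is far lower than the indiscriminate attacker's, and because safety in this race model increases with p_catch, measured safety falls as well. Evaluations that only model indiscriminate attackers would therefore tend to overstate safety.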
