
WorkBench Revisited: Workplace Agents Two Years On

ArXiv cs.AI · Mon, 15 Jun 2026 04:00:00 GMT

arXiv:2606.13715v1 · Announce Type: new

Abstract: The best agent on WorkBench in March 2024, GPT-4, completed 43% of tasks and took an unintended harmful action, such as emailing the wrong person, on 26% of them. We revisit the benchmark in June 2026 and find that the best agent t
