
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

ArXiv cs.AI · Wed, 20 May 2026 04:00:00 GMT

arXiv:2605.19099v1

Abstract: We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a peer-model pool (11 models, 7 vendor families), a de
