
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

ArXiv cs.AI · Mon, 18 May 2026 04:00:00 GMT

arXiv:2605.15224v1 · Announce Type: new

Abstract: Large language model-based agents make mistakes, yet critique can often guide the same model toward correct behavior. However, when critique is removed, the model may fail again on the same query, indicating that it has not internal
