
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

ArXiv cs.AI · Sat, 06 Jun 2026 04:00:00 GMT

arXiv:2606.05384v1. Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show
