
When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

ArXiv cs.AI · Fri, 29 May 2026 04:00:00 GMT

arXiv:2605.29025v1 Announce Type: new

Abstract: Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, where the model's organization of the record shapes what policymakers see and which arguments register. Standard evaluation, anchored
