Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

ArXiv cs.AI · Fri, 12 Jun 2026 04:00:00 GMT

arXiv:2606.12702v1 · Announce Type: new

Abstract: Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate the real-world utility of these systems. However, static benchmarks tend to measure correctness rather than user accepta
