Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Abstract

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt–response pair can vary substantially. This raises a basic allocation question: under a fixed budget, how should one distribute evaluation queries across heterogeneous judges and instances to obtain the most accurate score estimates? We formalize this question as budgeted heteroskedastic multi-judge estimation. Given $K$ prompt–response pairs, $J$ judges with known costs, and unknown query–judge variances, the goal is to estimate a bounded score vector while minimizing an $\ell_p$-error. Our first contribution is to analyze the inverse-variance weighted estimator (IVWE) and to derive the oracle allocation that minimizes its error rate. Since this allocation depends on the unknown variances, we then address the practical unknown-variance setting by proposing Est-IVWE, an adaptive algorithm that constructs and leverages optimistically biased variance estimates to stabilize the empirical allocation. We prove that Est-IVWE matches the oracle IVWE rate up to lower-order terms in the budget. Our second and central theoretical contribution is a matching local minimax lower bound, which establishes the instance-optimality of the proposed algorithms. A key technical insight is that Fano-type high-probability arguments are too coarse for this problem: their packing construction loses the local variance structure that governs the optimal allocation. We instead use an Assouad-type in-expectation argument, based on local perturbations, which preserves this structure and yields the sharp allocation-dependent lower bound. Finally, we numerically validate the superiority of our approach over naïve uniform allocation on synthetic and HelpSteer2 datasets.
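
For readers unfamiliar with inverse-variance weighting, the following is a minimal sketch of the textbook estimator for a single prompt–response pair, assuming the per-judge variances are known and the judges' noise is independent. It is not the paper's Est-IVWE algorithm or its oracle allocation; the function name `ivw_estimate` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np

def ivw_estimate(judge_means, judge_vars, judge_counts):
    """Textbook inverse-variance weighted combination for one item.

    Assumptions (illustrative, not taken from the paper):
      judge_means[j]  -- sample mean of judge j's scores for this item
      judge_vars[j]   -- known per-query variance of judge j (> 0)
      judge_counts[j] -- number of queries spent on judge j (>= 0)
    Returns (estimate, variance of the estimate).
    """
    means = np.asarray(judge_means, dtype=float)
    variances = np.asarray(judge_vars, dtype=float)
    counts = np.asarray(judge_counts, dtype=float)

    # Precision contributed by judge j is n_j / sigma_j^2; unqueried judges add 0.
    precision = counts / variances
    queried = counts > 0
    total = precision[queried].sum()
    if total <= 0:
        raise ValueError("at least one judge must be queried")

    # Weighted average of judge means, with weights proportional to precision.
    estimate = (precision[queried] * means[queried]).sum() / total
    return float(estimate), float(1.0 / total)

# Example: 4 queries to a noisy judge and 1 query to a reliable judge
# contribute equal precision (4 / 0.04 = 1 / 0.01 = 100).
est, var = ivw_estimate([0.6, 0.8], [0.04, 0.01], [4, 1])
```

In this example both judges carry the same total precision, so the estimate is the plain average of the two means. It illustrates why an allocation has to trade off each judge's cost against its variance.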

Publication
Forthcoming
Junghyun Lee
PhD Student

PhD student at GSAI, KAIST, jointly advised by Profs. Se-Young Yun and Chulhee Yun. Research focuses on interactive machine learning, “theoretical perspectives” of LLMs, optimization theory, and statistical analyses of large networks with an emphasis on community detection. Broadly interested in mathematical and theoretical AI, as well as related problems in mathematics and statistics.