Instance-Optimal Estimation with Multiple LLM Judges on a Budget

May 25, 2026

Junghyun Lee

Equal contribution

Sanghwa Kim

Equal contribution

Yassir Jedra

Alexandre Proutière

Se-Young Yun

Abstract

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt–response pair can vary substantially. This raises a basic allocation question: under a fixed budget, how should one distribute evaluation queries across heterogeneous judges and instances to obtain the most accurate score estimates? We formalize this question as budgeted heteroskedastic multi-judge estimation. Given $K$ prompt–response pairs, $J$ judges with known costs, and unknown query–judge variances, the goal is to estimate a bounded score vector while minimizing an $\ell_p$-error. Our first contribution is to analyze the inverse-variance weighted estimator (IVWE) and to derive the oracle allocation that minimizes its error rate. Since this allocation depends on the unknown variances, we then address the practical unknown-variance setting by proposing Est-IVWE, an adaptive algorithm that constructs and leverages optimistically biased variance estimates to stabilize the empirical allocation. We prove that Est-IVWE matches the oracle IVWE rate up to lower-order terms in the budget. Our second and central theoretical contribution is a matching local minimax lower bound, which establishes the instance-optimality of the proposed algorithms. A key technical insight is that Fano-type high-probability arguments are too coarse for this problem: their packing construction loses the local variance structure that governs the optimal allocation. We instead use an Assouad-type in-expectation argument, based on local perturbations, which preserves this structure and yields the sharp allocation-dependent lower bound. Finally, we numerically validate the superiority of our approach over naïve uniform allocation on synthetic and HelpSteer2 datasets.
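
For readers unfamiliar with inverse-variance weighting, the following is a minimal sketch of the generic textbook estimator and of one simple budget split. It is not the paper's oracle allocation or Est-IVWE. It assumes the variances are known, treats query counts as continuous, and targets the summed squared error rather than a general $\ell_p$-error. The function names `ivw_estimate` and `known_variance_allocation` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ivw_estimate(means, counts, variances):
    """Combine per-judge sample means for ONE item by inverse-variance weighting.

    means[j]     : sample mean of judge j's scores
    counts[j]    : number of queries sent to judge j
    variances[j] : (assumed known) per-query variance of judge j
    Returns the combined estimate and its variance.
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = counts / variances  # information contributed by each judge
    estimate = np.sum(precision * means) / np.sum(precision)
    return estimate, 1.0 / np.sum(precision)


def known_variance_allocation(variances, costs, budget):
    """Illustrative budget split under known variances (an assumption).

    variances : array of shape (K, J), per-query variance of judge j on item k
    costs     : array of shape (J,), price of one query to judge j
    budget    : total spend available

    With continuous query counts, spending b on judge j for item k yields
    variance costs[j] * variances[k, j] / b, so each item uses the judge with
    the smallest cost-variance product. Minimizing the sum of those variances
    over items then gives a spend proportional to the square root of that
    product (a standard Lagrangian argument).
    """
    variances = np.asarray(variances, dtype=float)
    costs = np.asarray(costs, dtype=float)
    cost_var = variances * costs  # shape (K, J)
    best_judge = np.argmin(cost_var, axis=1)
    v = cost_var[np.arange(cost_var.shape[0]), best_judge]
    spend = budget * np.sqrt(v) / np.sum(np.sqrt(v))
    return best_judge, spend
```

In this toy setting, harder items (larger best cost-variance product) receive more budget, whereas a uniform allocation would spend the same amount on every item and judge regardless of reliability or price.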

Type

Preprint

Publication

arXiv preprint arXiv:2605.23362

Last updated on May 25, 2026

Statistics, Information Theory, LLMs

Authors

Junghyun Lee

PhD Candidate in AI

PhD candidate at KAIST AI, jointly advised by Se-Young Yun and Chulhee Yun. I work on interactive machine learning, theoretical aspects of LLMs, learning/optimization theory, and statistical analysis of large networks.
