LLMs

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and …

Junghyun Lee

• May 25, 2026 • 1 min read

Learning to Reason in LLMs by Expectation Maximization

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive a …

Junghyun Lee

• Dec 23, 2025 • 1 min read

AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners

Self-Taught Reasoners (STaR), synonymously known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline of self-improving reasoning Language Models …

woosung-koh

• May 22, 2025 • 1 min read

