Se Young Yun

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and …

Junghyun Lee

Looking Through the Mirror: Minimax-Optimal Regularized Regrets in Online Learning and Bandits

We revisit regularized regret minimization under full-information and bandit feedback, where a learner optimizes an objective of the form $\langle r, \pi \rangle - \eta^{-1} …

Junghyun Lee

Near-Optimal Clustering in Mixture of Markov Chains

We study the problem of clustering T trajectories of length H, each generated by one of K unknown ergodic Markov chains over a finite state space of size S. The goal is to …

Junghyun Lee

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

We present GL-LowPopArt, a novel Catoni-style estimator for generalized low-rank trace regression. Building on LowPopArt (Jang et al., 2024), it employs a two-stage approach -- …

Junghyun Lee

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

We consider the problem of *regularized* best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized …

Junghyun Lee

A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including …

sanghwa-kim

Preliminary Empirical Study of Low-Rank, Hierarchical Gaussian Linear Bandits

Inspired by recent advances in multi-task bandits, we propose a new problem setting called low-rank, hierarchical Gaussian linear bandits, which combines low-rank structure with …

Junghyun Lee

AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners

Self-Taught Reasoners (STaR), synonymously known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline of self-improving reasoning Language Models …

woosung-koh

Probability-Flow ODE in Infinite-Dimensional Function Spaces

Recent advances in infinite-dimensional diffusion models have demonstrated their effectiveness and scalability in function generation tasks where the underlying structure is …

kunwoo-na

FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL …

woosung-koh