Statistical Problems Related to (LLM) Alignment and Preference Learning
Project #1. Bandit/Statistical Problems Related to Reward-Based RLHF
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
- Accepted to NeurIPS 2024.
- Accepted as an oral presentation to the ICML 2024 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists (ARLET).
- Joint work with Se-Young Yun (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS).
Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
- Accepted to AISTATS 2024.
- Joint work with Se-Young Yun (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS).
Project #2. Bandit/Statistical Problems Related to General Preference Learning
GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression
- Accepted to ICML 2025 (Spotlight).
- Joint work with Kyoungseok Jang (CAU AI), Kwang-Sung Jun (Univ. of Arizona CS), Milan Vojnović (LSE Stat), and Se-Young Yun (KAIST AI).