Theoretical Analyses of Reinforcement Learning from Human Feedback (RLHF) and Related Problems
Logistic and Generalized Linear Bandits, Dueling Bandits, etc.
Project #2. A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
- Accepted to NeurIPS 2024.
- Accepted to the ICML 2024 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists (ARLET) as an oral presentation.
- Joint work with Se-Young Yun (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS).
Project #1. Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
- Accepted to AISTATS 2024.
- Joint work with Se-Young Yun (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS).