1st Korean AI Theory Community Workshop: Bandits

Introduction

This (student-led) workshop is motivated by the desire to build a solid theory community among Korean researchers working in mathematical and theoretical AI, whether based in Korea or abroad. As the first edition, this workshop focuses on bandits. The workshop will feature six talks by leading and emerging Korean researchers in the field, along with lunch, coffee, and networking opportunities. The workshop is open to all researchers and students with an interest in bandits and related topics.

Logistics

The workshop will be held entirely in Korean. The location is KAIST Seoul Campus, Room 1114 (Kim Dong-Myoung Lecture Room). Anyone is welcome to attend, but if you have not registered, you will need to arrange lunch on your own. If you are interested in attending, please fill out this Google Form as soon as possible!

Workshop Schedule (2024.06.20)

09:30 - 10:20   Free Networking, Coffee Break
10:20 - 10:30   Opening Remarks: Junghyun Lee (PhD Student, KAIST AI)

Session #1. (Chair: Se-Young Yun)
10:30 - 11:10   Dabeen Lee (Assistant Professor, KAIST ISysE): Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation [Slides]
11:10 - 11:20   Short Break
11:20 - 12:00   Yeoneung Kim (Assistant Professor, SeoulTech AAI): Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret [Slides]
12:00 - 14:00   Lunchtime + Coffee + Free Discussions (Lunch: Faculty Cafeteria @ KAIST Seoul Campus; reservation confirmed)

Session #2. (Chair: Min-hwan Oh)
14:00 - 14:40   Kwang-Sung Jun (Assistant Professor, Univ. of Arizona CS): Noise-Adaptive Confidence Sets for Linear Bandits [Slides]
14:40 - 14:50   Short Break
14:50 - 15:30   Se-Young Yun (Associate Professor, KAIST AI): Confidence Set Analysis in Preference Feedback [Slides]
15:30 - 16:00   Long (Coffee) Break

Session #3. (Chair: Dabeen Lee)
16:00 - 16:40   Min-hwan Oh (Assistant Professor, SNU DS): Lasso Bandit with Compatibility Condition on Optimal Arm [Slides]
16:40 - 16:50   Short Break
16:50 - 17:30   Kyungjae Lee (Assistant Professor, CAU AI): Optimal Algorithms for Multi-Armed Bandits with Heavy-Tailed Rewards [Slides]
17:30 - 17:45   Closing Remarks: Junghyun Lee (PhD Student, KAIST AI)

Talk Abstracts & Speaker Bios

Talk #1 (Dabeen Lee): Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

Abstract:

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model, and we consider the infinite-horizon average-reward setting. Our first algorithm, UCRL2-MNL, applies to the class of communicating MDPs and achieves an $\tilde{O}(dD\sqrt{T})$ regret, where $d$ is the dimension of the feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm, OVIFH-MNL, is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{O}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$, where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $\Omega(d\sqrt{DT})$ for learning communicating MDPs with diameter at most $D$. Furthermore, we show a regret lower bound of $\Omega(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation, where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting. This is based on joint work with Jaehyun Park (KAIST ISysE).
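
For background, the MNL transition model in this setting is typically written as follows (standard notation assumed here, not necessarily the talk's):

$$ p_{\theta^*}(s' \mid s, a) = \frac{\exp\left(\phi(s, a, s')^\top \theta^*\right)}{\sum_{s'' \in \mathcal{S}_{s,a}} \exp\left(\phi(s, a, s'')^\top \theta^*\right)}, $$

where $\phi$ is a known $d$-dimensional feature map, $\theta^* \in \mathbb{R}^d$ is the unknown parameter, and $\mathcal{S}_{s,a}$ is the set of states reachable from $(s, a)$; the learner estimates $\theta^*$ from observed transitions.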

Speaker Bio:

Dabeen Lee is an Assistant Professor in the Department of Industrial and Systems Engineering (ISE) at KAIST. He obtained his Ph.D. in the Algorithms, Combinatorics, and Optimization (ACO) program at the Tepper School of Business at Carnegie Mellon University. He received the IBS Young Scientist Fellowship (YSF) and won Second Place in the INFORMS Optimization Society Best Student Paper Prize Competition in 2019. His research interests are in designing algorithms and mathematical programming frameworks for broad areas of optimization and machine learning.

Talk #2 (Yeoneung Kim): Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

Abstract:

We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Moreover, we identify nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without the unrealistically restrictive assumptions on parameter sets that are often used in the literature.
This is joint work with Gihun Kim (SNU ECE) and Insoon Yang (SNU ECE), and is based on the recent preprint of the same title.
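
As a rough illustration of the sampling machinery involved, below is a minimal sketch of one preconditioned (unadjusted) Langevin step for approximate posterior sampling. The function names, step size, preconditioner, and toy target are hypothetical placeholders for illustration, not the algorithm's actual design.

```python
import numpy as np

def preconditioned_langevin_step(theta, grad_neg_log_post, P, step_size, rng):
    """One unadjusted Langevin step with a positive-definite preconditioner P.

    theta             : current parameter sample, shape (d,)
    grad_neg_log_post : callable returning the gradient of the negative log-posterior
    P                 : preconditioning matrix, shape (d, d)
    """
    P_inv = np.linalg.inv(P)
    drift = -step_size * P_inv @ grad_neg_log_post(theta)
    # Inject noise with covariance 2 * step_size * P^{-1} via a Cholesky factor.
    noise = np.sqrt(2.0 * step_size) * np.linalg.cholesky(P_inv) @ rng.standard_normal(theta.shape)
    return theta + drift + noise

# Toy usage: sample from a Gaussian posterior N(mu, Sigma) with an identity preconditioner.
rng = np.random.default_rng(0)
mu, Sigma = np.array([1.0, -2.0]), np.diag([0.5, 2.0])
grad = lambda th: np.linalg.solve(Sigma, th - mu)  # gradient of 0.5*(th-mu)^T Sigma^{-1} (th-mu)
theta = np.zeros(2)
for _ in range(5000):
    theta = preconditioned_langevin_step(theta, grad, np.eye(2), 1e-2, rng)
```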

Speaker Bio:

Yeoneung Kim is an Assistant Professor in the Department of Applied Artificial Intelligence at SeoulTech. He received a B.S. in Mathematics from POSTECH and a Ph.D. in Mathematics from the University of Wisconsin-Madison in 2011 and 2019, respectively. Before joining SeoulTech, he was a Researcher at the National Institute for Mathematical Sciences (2013 - 2016), a Staff Engineer at Samsung Electronics (2019 - 2021), a BK21 Postdoctoral Researcher at SNU (2021 - 2022), and an Assistant Professor in the Department of Financial Mathematics at Gachon University (2022 - 2023). His research focuses on optimal control and reinforcement learning.

Talk #3 (Kwang-Sung Jun): Noise-Adaptive Confidence Sets for Linear Bandits

Abstract:

Adapting to an a priori unknown noise level is a very important but challenging problem in sequential decision-making, as efficient exploration typically requires knowledge of the noise level, which is often only loosely specified. We report significant progress in addressing this issue in linear bandits in two respects. First, we propose a novel confidence set that is 'semi-adaptive' to the unknown sub-Gaussian parameter $\sigma^2_*$, in the sense that the confidence width w.r.t. the dimension $d$ scales with $\sigma^2_*$ rather than with the specified sub-Gaussian parameter $\sigma^2_0$, which can be much larger than $\sigma^2_*$. In particular, when the true noise level is zero, this removes a factor of $\sqrt{d}$ from the confidence width and from the regret bound when used for linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has much improved numerical performance over prior art while removing unrealistic assumptions. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on a 'regret equality' from online learning. Our empirical evaluation in Bayesian optimization tasks shows that our algorithms perform better than, or comparably to, existing methods.
This is joint work with Jungtaek Kim (Univ. of Pittsburgh) and is based on the recently accepted ICML 2024 paper of the same title.
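
For contrast, here is a minimal sketch of the standard OFUL-style baseline whose confidence width scales with the specified parameter $\sigma_0$; the function names and the simplified radius are assumptions for illustration only, and the talk's semi-adaptive and variance-adaptive constructions improve on precisely this dependence.

```python
import numpy as np

def oful_round(arms, V, b, sigma0, reg=1.0, delta=0.05):
    """One round of a generic OFUL-style linear bandit (baseline sketch, not the talk's method).

    arms   : array of shape (K, d), candidate feature vectors
    V, b   : running regularized Gram matrix (d, d) and response vector (d,)
    sigma0 : the *specified* sub-Gaussian parameter; the radius below scales with it,
             which is what noise-adaptive confidence sets avoid when the true noise is smaller.
    """
    d = arms.shape[1]
    theta_hat = np.linalg.solve(V, b)  # ridge estimate
    # Simplified self-normalized confidence radius (up to constants and log terms).
    beta = sigma0 * np.sqrt(d * np.log((1 + np.trace(V) / reg) / delta)) + np.sqrt(reg)
    V_inv = np.linalg.inv(V)
    ucb = arms @ theta_hat + beta * np.sqrt(np.einsum('ki,ij,kj->k', arms, V_inv, arms))
    return int(np.argmax(ucb)), theta_hat

# Toy usage: the true noise (0.01) is far smaller than the specified sigma0 (1.0).
rng = np.random.default_rng(1)
arms = np.array([[1.0, 0.0], [0.0, 1.0]])
theta_star, sigma_true, sigma0 = np.array([0.3, 0.7]), 0.01, 1.0
V, b = np.eye(2), np.zeros(2)
for t in range(200):
    k, _ = oful_round(arms, V, b, sigma0)
    reward = arms[k] @ theta_star + sigma_true * rng.standard_normal()
    V += np.outer(arms[k], arms[k])
    b += reward * arms[k]
```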

Speaker Bio:

Kwang-Sung Jun is an Assistant Professor in the Department of Computer Science at the University of Arizona. His research interest is interactive machine learning, especially theoretical aspects of bandit problems, online learning, and confidence bounds.

Talk #4 (Se-Young Yun): Confidence Set Analysis in Preference Feedback

Abstract:

Learning from Human Feedback has recently garnered considerable interest, especially with the advent of ChatGPT, which is aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). In the context of the Bradley-Terry model, feedback is modeled as a logistic bandit problem, a common approach for capturing user preferences, such as click versus no-click in advertisement recommendation systems. Previous research often neglects the dependence on $S$, the norm of the unknown parameter vector, which becomes a significant issue when $S$ is large. We first introduce Regret-to-Confidence-Set Conversion (R2CS), which constructs a convex, likelihood-ratio-based confidence set solely from the existence of an online learning algorithm with a regret guarantee. The resulting confidence sets yield optimal regret for logistic bandits.
This is joint work with Junghyun Lee (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS), and is based on two papers by the authors.
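
As a reminder of the feedback model (standard notation, not necessarily the talk's), the Bradley-Terry model treats a preference between items $a$ and $b$ as a logistic observation of their feature difference:

$$ \mathbb{P}(a \succ b) = \sigma\left((\phi(a) - \phi(b))^\top \theta^*\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, $$

where $\phi$ is a feature map and $\theta^*$ is the unknown parameter whose norm is the $S$ referred to above; estimating $\theta^*$ from such binary feedback is exactly a logistic bandit problem.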

Speaker Bio:

Se-Young Yun is an Associate Professor at the Graduate School of AI, KAIST, in Seoul, South Korea. He received his Ph.D. in Electrical Engineering from KAIST in 2012, where he also completed his B.S. summa cum laude in 2006. Before joining KAIST, he was a researcher at KTH, Inria, Microsoft Research Cambridge, and Los Alamos National Lab. His research focuses on machine learning, particularly on the dynamics of evolving knowledge, enhancing reasoning in small models, and improving instruction robustness. He has been recognized with multiple awards, including the Outstanding Reviewer Award at NIPS 2016 and the Best Paper Award at ACM MobiHoc 2013.

Talk #5 (Min-hwan Oh): Lasso Bandit with Compatibility Condition on Optimal Arm

Abstract:

We study a stochastic sparse linear bandit problem where only a sparse subset of context features affects the expected reward function, i.e., the unknown reward parameter has a sparse structure. In the existing Lasso bandit literature, the compatibility condition together with additional diversity conditions on the context features is imposed to achieve regret bounds that depend only logarithmically on the ambient dimension $d$. In this paper, we demonstrate that, even without the additional diversity assumptions, the compatibility condition on the optimal arm alone is sufficient to derive a regret bound that depends logarithmically on $d$, and our assumption is strictly weaker than those used in the Lasso bandit literature under the single-parameter setting. To our knowledge, the proposed algorithm requires the weakest assumptions among Lasso bandit algorithms under a single-parameter setting that achieve $O(\mathrm{poly} \log dT)$ regret. This is joint work with Harin Lee and Taehyun Hwang from SNU and is based on the recent preprint of the same title.
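
For intuition only, here is a minimal sketch of a greedy Lasso bandit loop: a sparse parameter estimate is re-fit from past observations with an $\ell_1$-penalized regression and the arm with the highest estimated reward is pulled. The names and the regularization schedule below are hypothetical placeholders, not the algorithm or analysis from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, K, T, s0 = 100, 20, 300, 3                      # ambient dim, arms per round, horizon, sparsity
theta_star = np.zeros(d); theta_star[:s0] = 1.0    # sparse unknown parameter

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
for t in range(1, T + 1):
    arms = rng.standard_normal((K, d))             # fresh context features each round
    k = int(np.argmax(arms @ theta_hat))           # greedy choice under the current Lasso estimate
    reward = arms[k] @ theta_star + 0.1 * rng.standard_normal()
    X_hist.append(arms[k]); y_hist.append(reward)
    # Re-fit the sparse estimate; lambda_t is a hypothetical O(sqrt(log(d)/t)) schedule.
    lambda_t = 0.1 * np.sqrt(np.log(d) / t)
    theta_hat = Lasso(alpha=lambda_t, fit_intercept=False, max_iter=5000).fit(
        np.array(X_hist), np.array(y_hist)).coef_
```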

Speaker Bio:

Min-hwan Oh is an Assistant Professor in the Graduate School of Data Science at Seoul National University. His research interests are in sequential decision making under uncertainty including bandit algorithms and reinforcement learning, statistical machine learning, and optimization.

Talk #6 (Kyungjae Lee): Optimal Algorithms for Multi-Armed Bandits with Heavy-Tailed Rewards

Abstract:

In this presentation, we will explore significant advancements in multi-armed bandits with heavy-tailed rewards. The first paper, "Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy-Tailed Rewards (NeurIPS 2020)," proposes an adaptive perturbation method combined with a p-robust estimator, offering both theoretical and empirical improvements over existing methods. The second paper, "Minimax Optimal Bandits for Heavy-Tail Rewards (TNNLS 2024)," introduces a robust estimator and a perturbation-based exploration strategy that achieve minimax optimal regret bounds without prior knowledge of reward distribution bounds. Recently, I extended these approaches to distributional settings, further enhancing their applicability and performance.
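
As a small illustration of the robust-estimation idea behind such algorithms, here is a sketch using median-of-means, one standard robust mean estimator, plugged into a UCB-style index. This is not the specific p-robust or truncation-based estimator from the papers above, and the constants and names are placeholders.

```python
import numpy as np

def median_of_means(samples, n_blocks=5):
    """Median-of-means: a robust mean estimate that only needs finite variance."""
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, min(n_blocks, len(samples)))
    return float(np.median([block.mean() for block in blocks]))

def robust_ucb_index(samples, t, c=1.0):
    """A robust-UCB-style index: robust mean estimate plus an exploration bonus."""
    n = len(samples)
    return median_of_means(samples) + c * np.sqrt(np.log(t + 1) / n)

# Toy usage: two arms with heavy-tailed (Student-t, df=2.5) reward noise.
rng = np.random.default_rng(0)
means = [0.0, 0.5]
history = [[rng.standard_t(2.5) + m] for m in means]   # pull each arm once to initialize
for t in range(2, 2000):
    k = int(np.argmax([robust_ucb_index(h, t) for h in history]))
    history[k].append(rng.standard_t(2.5) + means[k])
```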

Speaker Bio:

Kyungjae Lee is currently an Assistant Professor in the Department of Artificial Intelligence at Chung-Ang University. His current research interests include multi-armed bandits, Bayesian optimization, reinforcement learning, and their applications.

Organizer

Junghyun Lee [jh_lee00@kaist.ac.kr]

I am a PhD student at OSI Lab and OptiML Lab at KAIST AI with broad interests in mathematical and theoretical AI. Some of my recent interests include obtaining tight statistical guarantees for interactive machine learning (RL, bandits), exploring fundamental concepts in statistics and probability theory, algorithmic fairness, deep learning theory from both optimization and statistical perspectives, and essentially any machine learning challenges necessitating mathematical analysis.
If you are interested in talking about anything (research, the workshop, etc.), feel free to contact me! Also, I'm planning to make this into a series of workshops! If you are interested, please let me know!


For Visitors

85 Hoegi-ro, Dongdaemun-gu, Seoul (KAIST Hongneung Campus), Building 1, Room 1114

Parking is free.


Workshop Poster