1st Korean AI Theory Community Workshop: Bandits

Introduction

This (student-led) workshop is motivated by the desire to build a solid theory community among Korean researchers working in mathematical and theoretical AI, whether based in Korea or abroad. As the first edition, this workshop focuses on bandits. The workshop will feature six talks by leading and emerging Korean researchers in the field, along with lunch, coffee, and networking opportunities. The workshop is open to all researchers and students with an interest in bandits and related topics.

Logistics

The workshop will be held entirely in Korean. The location is KAIST Seoul Campus, Room 1114 (Kim Dong-Myoung Lecture Room). Anyone is welcome to attend, but if you have not registered, you will need to arrange lunch on your own. If you are interested in attending, please fill out this Google Form as soon as possible!

Workshop Schedule (2024.06.20)

09:30 - 10:20   Free Networking, Coffee Break
10:20 - 10:30   Opening Remarks: Junghyun Lee (PhD Student, KAIST AI)

Session #1. (Chair: Se-Young Yun)
10:30 - 11:10   Dabeen Lee (Assistant Professor, KAIST ISysE): Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation [Slides]
11:10 - 11:20   Short Break
11:20 - 12:00   Yeoneung Kim (Assistant Professor, SeoulTech AAI): Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret [Slides]
12:00 - 14:00   Lunchtime + Coffee + Free Discussions (Lunch: Faculty Cafeteria @ KAIST Seoul Campus; reservation confirmed)

Session #2. (Chair: Min-hwan Oh)
14:00 - 14:40   Kwang-Sung Jun (Assistant Professor, Univ. of Arizona CS): Noise-Adaptive Confidence Sets for Linear Bandits [Slides]
14:40 - 14:50   Short Break
14:50 - 15:30   Se-Young Yun (Associate Professor, KAIST AI): Confidence Set Analysis in Preference Feedback [Slides]
15:30 - 16:00   Long (Coffee) Break

Session #3. (Chair: Dabeen Lee)
16:00 - 16:40   Min-hwan Oh (Assistant Professor, SNU DS): Lasso Bandit with Compatibility Condition on Optimal Arm [Slides]
16:40 - 16:50   Short Break
16:50 - 17:30   Kyungjae Lee (Assistant Professor, CAU AI): Optimal Algorithms for Multi-Armed Bandits with Heavy-Tailed Rewards [Slides]
17:30 - 17:45   Closing Remarks: Junghyun Lee (PhD Student, KAIST AI)

Talk Abstracts & Speaker Bios

Talk #1 (Dabeen Lee): Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

Abstract:

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model, and we consider the infinite-horizon average-reward setting. Our first algorithm, UCRL2-MNL, applies to the class of communicating MDPs and achieves an $\tilde{O}(dD\sqrt{T})$ regret, where $d$ is the dimension of the feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm, OVIFH-MNL, is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{O}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$, where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $\Omega(d\sqrt{DT})$ for learning communicating MDPs with diameter at most $D$. Furthermore, we show a regret lower bound of $\Omega(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation, where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting. This is based on joint work with Jaehyun Park (KAIST ISysE).
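
For background, the MNL transition model in this setting is typically written as follows (standard notation assumed here, not necessarily the talk's):

$$ p_{\theta^*}(s' \mid s, a) = \frac{\exp\left(\phi(s, a, s')^\top \theta^*\right)}{\sum_{s'' \in \mathcal{S}_{s,a}} \exp\left(\phi(s, a, s'')^\top \theta^*\right)}, $$

where $\phi$ is a known $d$-dimensional feature map, $\theta^* \in \mathbb{R}^d$ is the unknown parameter, and $\mathcal{S}_{s,a}$ is the set of states reachable from $(s, a)$; the learner estimates $\theta^*$ from observed transitions.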

Speaker Bio:

Dabeen Lee is an Assistant Professor in the Department of Industrial and Systems Engineering (ISE) at KAIST. He obtained his Ph.D. in the Algorithms, Combinatorics, and Optimization (ACO) program at the Tepper School of Business at Carnegie Mellon University. He received the IBS Young Scientist Fellowship (YSF) and won Second Place in the INFORMS Optimization Society Best Student Paper Prize Competition in 2019. His research interests are in designing algorithms and mathematical programming frameworks for broad areas of optimization and machine learning.

Talk #2 (Yeoneung Kim): Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

Abstract:

We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Moreover, we identify nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without the unrealistically restrictive assumptions on parameter sets that are often used in the literature.
This is joint work with Gihun Kim (SNU ECE) and Insoon Yang (SNU ECE), and is based on the recent preprint of the same title.
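
As a rough illustration of the sampling machinery involved, below is a minimal sketch of one preconditioned (unadjusted) Langevin step for approximate posterior sampling. The function names, step size, preconditioner, and toy target are hypothetical placeholders for illustration, not the algorithm's actual design.

```python
import numpy as np

def preconditioned_langevin_step(theta, grad_neg_log_post, P, step_size, rng):
    """One unadjusted Langevin step with a positive-definite preconditioner P.

    theta             : current parameter sample, shape (d,)
    grad_neg_log_post : callable returning the gradient of the negative log-posterior
    P                 : preconditioning matrix, shape (d, d)
    """
    P_inv = np.linalg.inv(P)
    drift = -step_size * P_inv @ grad_neg_log_post(theta)
    # Inject noise with covariance 2 * step_size * P^{-1} via a Cholesky factor.
    noise = np.sqrt(2.0 * step_size) * np.linalg.cholesky(P_inv) @ rng.standard_normal(theta.shape)
    return theta + drift + noise

# Toy usage: sample from a Gaussian posterior N(mu, Sigma) with an identity preconditioner.
rng = np.random.default_rng(0)
mu, Sigma = np.array([1.0, -2.0]), np.diag([0.5, 2.0])
grad = lambda th: np.linalg.solve(Sigma, th - mu)  # gradient of 0.5*(th-mu)^T Sigma^{-1} (th-mu)
theta = np.zeros(2)
for _ in range(5000):
    theta = preconditioned_langevin_step(theta, grad, np.eye(2), 1e-2, rng)
```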

Speaker Bio:

Yeoneung Kim is an Assistant Professor in the Department of Applied Artificial Intelligence at SeoulTech. He received a B.S. in Mathematics from POSTECH and a Ph.D. in Mathematics from the University of Wisconsin-Madison in 2011 and 2019, respectively. Before joining SeoulTech, he was a Researcher at the National Institute for Mathematical Sciences (2013 - 2016), a Staff Engineer at Samsung Electronics (2019 - 2021), a BK21 Postdoctoral Researcher at SNU (2021 - 2022), and an Assistant Professor in the Department of Financial Mathematics at Gachon University (2022 - 2023). His research focuses on optimal control and reinforcement learning.

Talk #3 (Kwang-Sung Jun): Noise-Adaptive Confidence Sets for Linear Bandits

Abstract:

Adapting to an a priori unknown noise level is a very important but challenging problem in sequential decision-making, as efficient exploration typically requires knowledge of the noise level, which is often only loosely specified. We report significant progress in addressing this issue in linear bandits in two respects. First, we propose a novel confidence set that is 'semi-adaptive' to the unknown sub-Gaussian parameter $\sigma^2_*$, in the sense that the confidence width w.r.t. the dimension $d$ scales with $\sigma^2_*$ rather than with the specified sub-Gaussian parameter $\sigma^2_0$, which can be much larger than $\sigma^2_*$. In particular, when the true noise level is zero, this removes a factor of $\sqrt{d}$ from the confidence width and from the regret bound when used for linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has much improved numerical performance over prior art while removing unrealistic assumptions. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on a 'regret equality' from online learning. Our empirical evaluation in Bayesian optimization tasks shows that our algorithms perform better than, or comparably to, existing methods.
This is joint work with Jungtaek Kim (Univ. of Pittsburgh) and is based on the recently accepted ICML 2024 paper of the same title.
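
For contrast, here is a minimal sketch of the standard OFUL-style baseline whose confidence width scales with the specified parameter $\sigma_0$; the function names and the simplified radius are assumptions for illustration only, and the talk's semi-adaptive and variance-adaptive constructions improve on precisely this dependence.

```python
import numpy as np

def oful_round(arms, V, b, sigma0, reg=1.0, delta=0.05):
    """One round of a generic OFUL-style linear bandit (baseline sketch, not the talk's method).

    arms   : array of shape (K, d), candidate feature vectors
    V, b   : running regularized Gram matrix (d, d) and response vector (d,)
    sigma0 : the *specified* sub-Gaussian parameter; the radius below scales with it,
             which is what noise-adaptive confidence sets avoid when the true noise is smaller.
    """
    d = arms.shape[1]
    theta_hat = np.linalg.solve(V, b)  # ridge estimate
    # Simplified self-normalized confidence radius (up to constants and log terms).
    beta = sigma0 * np.sqrt(d * np.log((1 + np.trace(V) / reg) / delta)) + np.sqrt(reg)
    V_inv = np.linalg.inv(V)
    ucb = arms @ theta_hat + beta * np.sqrt(np.einsum('ki,ij,kj->k', arms, V_inv, arms))
    return int(np.argmax(ucb)), theta_hat

# Toy usage: the true noise (0.01) is far smaller than the specified sigma0 (1.0).
rng = np.random.default_rng(1)
arms = np.array([[1.0, 0.0], [0.0, 1.0]])
theta_star, sigma_true, sigma0 = np.array([0.3, 0.7]), 0.01, 1.0
V, b = np.eye(2), np.zeros(2)
for t in range(200):
    k, _ = oful_round(arms, V, b, sigma0)
    reward = arms[k] @ theta_star + sigma_true * rng.standard_normal()
    V += np.outer(arms[k], arms[k])
    b += reward * arms[k]
```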

Speaker Bio:

Kwang-Sung Jun is an Assistant Professor in the Department of Computer Science at the University of Arizona. His research interest is interactive machine learning, especially theoretical aspects of bandit problems, online learning, and confidence bounds.

Talk #4 (Se-Young Yun): Confidence Set Analysis in Preference Feedback

Abstract:

Learning from Human Feedback has recently garnered considerable interest, especially with the advent of ChatGPT, which is aligned with human preferences through Reinforcement Learning from Human Feedback (RLHF). In the context of the Bradley-Terry model, feedback is modeled as a logistic bandit problem, a common approach for capturing user preferences, such as click versus no-click in advertisement recommendation systems. Previous research often neglects the dependence on $S$, the norm of the unknown parameter vector, which becomes a significant issue when $S$ is large. We first introduce Regret-to-Confidence-Set Conversion (R2CS), which constructs a convex, likelihood-ratio-based confidence set solely from the existence of an online learning algorithm with a regret guarantee. The resulting confidence sets yield optimal regret for logistic bandits.
This is joint work with Junghyun Lee (KAIST AI) and Kwang-Sung Jun (Univ. of Arizona CS), and is based on two papers by the authors.
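
As a reminder of the feedback model (standard notation, not necessarily the talk's), the Bradley-Terry model treats a preference between items $a$ and $b$ as a logistic observation of their feature difference:

$$ \mathbb{P}(a \succ b) = \sigma\left((\phi(a) - \phi(b))^\top \theta^*\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, $$

where $\phi$ is a feature map and $\theta^*$ is the unknown parameter whose norm is the $S$ referred to above; estimating $\theta^*$ from such binary feedback is exactly a logistic bandit problem.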

Speaker Bio:

Se-Young Yun is an Associate Professor at the Graduate School of AI, KAIST, in Seoul, South Korea. He received his Ph.D. in Electrical Engineering from KAIST in 2012, where he also completed his B.S. summa cum laude in 2006. Before joining KAIST, he was a researcher at KTH, Inria, Microsoft Research Cambridge, and Los Alamos National Lab. His research focuses on machine learning, particularly on the dynamics of evolving knowledge, enhancing reasoning in small models, and improving instruction robustness. He has been recognized with multiple awards, including the Outstanding Reviewer Award at NIPS 2016 and the Best Paper Award at ACM MobiHoc 2013.

Talk #5 (Min-hwan Oh): Lasso Bandit with Compatibility Condition on Optimal Arm

Abstract:

We study a stochastic sparse linear bandit problem where only a sparse subset of context features affects the expected reward function, i.e., the unknown reward parameter has a sparse structure. In the existing Lasso bandit literature, the compatibility condition together with additional diversity conditions on the context features is imposed to achieve regret bounds that depend only logarithmically on the ambient dimension $d$. In this paper, we demonstrate that, even without the additional diversity assumptions, the compatibility condition on the optimal arm alone is sufficient to derive a regret bound that depends logarithmically on $d$, and our assumption is strictly weaker than those used in the Lasso bandit literature under the single-parameter setting. To our knowledge, the proposed algorithm requires the weakest assumptions among Lasso bandit algorithms under a single-parameter setting that achieve $O(\mathrm{poly} \log dT)$ regret. This is joint work with Harin Lee and Taehyun Hwang from SNU and is based on the recent preprint of the same title.
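
For intuition only, here is a minimal sketch of a greedy Lasso bandit loop: a sparse parameter estimate is re-fit from past observations with an $\ell_1$-penalized regression and the arm with the highest estimated reward is pulled. The names and the regularization schedule below are hypothetical placeholders, not the algorithm or analysis from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, K, T, s0 = 100, 20, 300, 3                      # ambient dim, arms per round, horizon, sparsity
theta_star = np.zeros(d); theta_star[:s0] = 1.0    # sparse unknown parameter

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
for t in range(1, T + 1):
    arms = rng.standard_normal((K, d))             # fresh context features each round
    k = int(np.argmax(arms @ theta_hat))           # greedy choice under the current Lasso estimate
    reward = arms[k] @ theta_star + 0.1 * rng.standard_normal()
    X_hist.append(arms[k]); y_hist.append(reward)
    # Re-fit the sparse estimate; lambda_t is a hypothetical O(sqrt(log(d)/t)) schedule.
    lambda_t = 0.1 * np.sqrt(np.log(d) / t)
    theta_hat = Lasso(alpha=lambda_t, fit_intercept=False, max_iter=5000).fit(
        np.array(X_hist), np.array(y_hist)).coef_
```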

Speaker Bio:

Min-hwan Oh is an Assistant Professor in the Graduate School of Data Science at Seoul National University. His research interests are in sequential decision making under uncertainty including bandit algorithms and reinforcement learning, statistical machine learning, and optimization.

Talk #6 (Kyungjae Lee): Optimal Algorithms for Multi-Armed Bandits with Heavy-Tailed Rewards

Abstract:

In this presentation, we will explore significant advancements in multi-armed bandits with heavy-tailed rewards. The first paper, "Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy-Tailed Rewards (NeurIPS 2020)," proposes an adaptive perturbation method combined with a p-robust estimator, offering both theoretical and empirical improvements over existing methods. The second paper, "Minimax Optimal Bandits for Heavy-Tail Rewards (TNNLS 2024)," introduces a robust estimator and a perturbation-based exploration strategy that achieve minimax optimal regret bounds without prior knowledge of reward distribution bounds. Recently, I extended these approaches to distributional settings, further enhancing their applicability and performance.
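
As a small illustration of the robust-estimation idea behind such algorithms, here is a sketch using median-of-means, one standard robust mean estimator, plugged into a UCB-style index. This is not the specific p-robust or truncation-based estimator from the papers above, and the constants and names are placeholders.

```python
import numpy as np

def median_of_means(samples, n_blocks=5):
    """Median-of-means: a robust mean estimate that only needs finite variance."""
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, min(n_blocks, len(samples)))
    return float(np.median([block.mean() for block in blocks]))

def robust_ucb_index(samples, t, c=1.0):
    """A robust-UCB-style index: robust mean estimate plus an exploration bonus."""
    n = len(samples)
    return median_of_means(samples) + c * np.sqrt(np.log(t + 1) / n)

# Toy usage: two arms with heavy-tailed (Student-t, df=2.5) reward noise.
rng = np.random.default_rng(0)
means = [0.0, 0.5]
history = [[rng.standard_t(2.5) + m] for m in means]   # pull each arm once to initialize
for t in range(2, 2000):
    k = int(np.argmax([robust_ucb_index(h, t) for h in history]))
    history[k].append(rng.standard_t(2.5) + means[k])
```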

Speaker Bio:

Kyungjae Lee is currently an Assistant Professor in the Department of Artificial Intelligence at Chung-Ang University. His current research interests include multi-armed bandits, Bayesian optimization, reinforcement learning, and their applications.

Organizer

Junghyun Lee [jh_lee00@kaist.ac.kr]

I am a PhD student at OSI Lab and OptiML Lab at KAIST AI with broad interests in mathematical and theoretical AI. Some of my recent interests include obtaining tight statistical guarantees for interactive machine learning (RL, bandits), exploring fundamental concepts in statistics and probability theory, algorithmic fairness, deep learning theory from both optimization and statistical perspectives, and essentially any machine learning challenges necessitating mathematical analysis.
If you are interested in talking about anything (research, the workshop, etc.), feel free to contact me! Also, I'm planning to make this into a series of workshops! If you are interested, please let me know!


For Visitors

85 Hoegi-ro, Dongdaemun-gu, Seoul (KAIST Hongneung Campus), Building 1, Room 1114

Parking is free.


Workshop Poster