Cumulative Distribution Regret Minimization with Max-Quantile Threshold in Multi-Armed Bandit

May 22, 2026
Jaeyoung Cha (equal contribution), Junghyun Lee (equal contribution), Chulhee Yun
Abstract
We study a new risk-averse bandit setting motivated by semiconductor manufacturing, where the quality of a recipe is judged not by its mean performance but by its weakest outcomes. We formalize this via cumulative distribution regret with a max-quantile threshold, which measures the cumulative excess defective ratio relative to the arm attaining the best τ-quantile. We develop two UCB-type algorithms, C-UCB and Q-UCB, whose regret bounds depend on distinct problem-dependent gaps arising from CDF and quantile separations.
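To make the regret notion concrete, here is a minimal sketch of one possible reading of the abstract, not the paper's exact definition. It assumes Gaussian arms with continuous CDFs, takes the threshold to be the largest τ-quantile across arms, and charges each pull the arm's CDF value at that threshold (its defective ratio) minus the smallest such value among the arms. The function names `max_quantile_threshold` and `cd_regret`, and the three example arms, are hypothetical.

```python
from statistics import NormalDist


def max_quantile_threshold(arms, tau):
    """Largest tau-quantile over all arms (assumed form of the threshold)."""
    return max(arm.inv_cdf(tau) for arm in arms)


def cd_regret(arms, pulls, tau):
    """Cumulative excess defective ratio of a pull sequence.

    Assumption: the defective ratio of an arm is its CDF evaluated at the
    max-quantile threshold, and regret is measured against the arm with
    the smallest defective ratio at that threshold.
    """
    theta = max_quantile_threshold(arms, tau)
    best_ratio = min(arm.cdf(theta) for arm in arms)
    return sum(arms[a].cdf(theta) - best_ratio for a in pulls)


# Hypothetical arms: arm 1 has the highest mean, arm 2 the best 0.1-quantile.
arms = [NormalDist(1.0, 1.0), NormalDist(1.2, 2.0), NormalDist(0.8, 0.5)]
tau = 0.1

theta = max_quantile_threshold(arms, tau)
best_arm = max(range(len(arms)), key=lambda k: arms[k].inv_cdf(tau))

regret_best_only = cd_regret(arms, [best_arm] * 10, tau)
regret_mean_greedy = cd_regret(arms, [1] * 10, tau)
```

In this toy instance the arm with the highest mean is not the arm with the best τ-quantile, so a mean-greedy policy accumulates positive regret under this reading while always pulling the quantile-best arm accumulates essentially none.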
Type
Publication
Korea Computer Congress
Junghyun Lee
Authors
PhD Candidate in Artificial Intelligence
PhD candidate at KAIST AI, jointly advised by Se-Young Yun and Chulhee Yun. I work on interactive machine learning, theoretical aspects of LLMs, learning/optimization theory, and statistical analysis of large networks.