Cumulative Distribution Regret Minimization with Max-Quantile Threshold in Multi-Armed Bandit

May 22, 2026
Jaeyoung Cha (equal contribution), Junghyun Lee (equal contribution), Chulhee Yun
Abstract
We study a new risk-averse bandit setting motivated by semiconductor manufacturing, where the quality of a recipe is judged not by its mean performance but by its weakest outcomes. We formalize this via cumulative distribution regret with a max-quantile threshold, which measures the cumulative excess defective ratio relative to the arm attaining the best τ-quantile. We develop two UCB-type algorithms, C-UCB and Q-UCB, whose regret bounds depend on distinct problem-dependent gaps arising from CDF and quantile separations.
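As a rough illustration of the regret notion described in the abstract, the following minimal sketch computes a cumulative excess defective ratio for a fixed sequence of pulls. It rests on assumptions not stated on this page: arms are modeled as Gaussian distributions, higher outcomes are better, an outcome counts as defective when it falls below the threshold, and the threshold is the largest τ-quantile across arms. The function names are hypothetical, and this is one possible reading of the setting, not the paper's formal definition or either of its algorithms.

```python
from statistics import NormalDist


def max_quantile_threshold(arms, tau):
    """Largest tau-quantile across arms (assumed meaning of the threshold)."""
    return max(arm.inv_cdf(tau) for arm in arms)


def cumulative_distribution_regret(arms, pulls, tau):
    """Sum over pulls of the excess defective ratio versus the best arm.

    Assumption: the defective ratio of an arm is its CDF evaluated at the
    max-quantile threshold, i.e. the probability of an outcome below it.
    """
    quantiles = [arm.inv_cdf(tau) for arm in arms]
    threshold = max(quantiles)
    # The reference arm is the one attaining the best tau-quantile.
    best = quantiles.index(threshold)
    ratios = [arm.cdf(threshold) for arm in arms]
    return sum(ratios[k] - ratios[best] for k in pulls)


# Hypothetical example: arm 1 has the higher mean, but arm 0 has the
# better 0.1-quantile because its outcomes vary far less.
arms = [NormalDist(mu=1.0, sigma=0.1), NormalDist(mu=1.2, sigma=1.0)]
tau = 0.1

regret_best_only = cumulative_distribution_regret(arms, [0, 0, 0], tau)
regret_mixed = cumulative_distribution_regret(arms, [0, 1, 1], tau)
print(regret_best_only, regret_mixed)
```

In this toy instance, pulling only arm 0 incurs no regret, while each pull of arm 1 adds a positive amount even though arm 1 has the larger mean. This mirrors the abstract's point that quality is judged by the weakest outcomes, not by average performance.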
Publication: Korea Computer Congress
Junghyun Lee
Authors
PhD Candidate in AI
PhD candidate at KAIST AI, jointly advised by Se-Young Yun and Chulhee Yun. I work on interactive machine learning, theoretical aspects of LLMs, learning/optimization theory, and statistical analysis of large networks.