Cumulative Distribution Regret Minimization with Max-Quantile Threshold in Multi-Armed Bandit
May 22, 2026
Jaeyoung Cha (equal contribution), Junghyun Lee (equal contribution), Chulhee Yun
Abstract
We study a new risk-averse bandit setting motivated by semiconductor manufacturing, where the quality of a recipe is judged not by its mean performance but by its weakest outcomes. We formalize this via cumulative distribution regret with a max-quantile threshold, which measures the cumulative excess defective ratio relative to the arm attaining the best τ-quantile. We develop two UCB-type algorithms, C-UCB and Q-UCB, whose regret bounds depend on distinct problem-dependent gaps arising from CDF and quantile separations.
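The abstract does not spell out the regret formula, so the following is only a minimal sketch under one assumed reading of it: each arm k has outcome CDF F_k, the threshold is q* = max_k Q_k(τ) (the best τ-quantile, attained by arm k*), and each round adds the excess defective ratio F_{A_t}(q*) - F_{k*}(q*). The function names, the Gaussian arms, and the pull sequence are hypothetical and chosen purely for illustration; this is not the authors' definition or code, and it does not implement C-UCB or Q-UCB.

```python
from statistics import NormalDist


def best_quantile_arm(arms, tau):
    # Hypothetical helper: index k* of the arm with the largest tau-quantile, and q* itself.
    quantiles = [arm.inv_cdf(tau) for arm in arms]
    k_star = max(range(len(arms)), key=lambda k: quantiles[k])
    return k_star, quantiles[k_star]


def cumulative_distribution_regret(arms, tau, pulls):
    # Assumed form: sum over rounds of F_{A_t}(q*) - F_{k*}(q*), i.e. the excess
    # probability that the pulled arm's outcome falls below the max-quantile threshold.
    k_star, q_star = best_quantile_arm(arms, tau)
    base = arms[k_star].cdf(q_star)  # equals tau (up to rounding) for continuous arms
    return sum(arms[k].cdf(q_star) - base for k in pulls)


if __name__ == "__main__":
    # Hypothetical recipes: arm 2 has the highest mean but the heaviest lower tail.
    arms = [NormalDist(0.0, 1.0), NormalDist(0.5, 1.0), NormalDist(1.0, 3.0)]
    tau = 0.1
    print(best_quantile_arm(arms, tau))
    print(cumulative_distribution_regret(arms, tau, pulls=[2, 2, 0, 1]))
```

In this toy instance arm 1 attains the best 0.1-quantile even though arm 2 has the larger mean, so every pull of arm 2 adds positive regret while pulls of arm 1 add none. This mirrors the risk-averse motivation of judging a recipe by its weakest outcomes rather than its average.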
Publication: Korea Computer Congress
