Cumulative Distribution Regret Minimization with Max-Quantile Threshold in Multi-Armed Bandit

May 22, 2026

Jaeyoung Cha (equal contribution), Junghyun Lee (equal contribution), Chulhee Yun


Abstract

We study a new risk-averse bandit setting motivated by semiconductor manufacturing, where the quality of a recipe is judged not by its mean performance but by its weakest outcomes. We formalize this via cumulative distribution regret with a max-quantile threshold, which measures the cumulative excess defective ratio relative to the arm attaining the best τ-quantile. We develop two UCB-type algorithms, C-UCB and Q-UCB, whose regret bounds depend on distinct problem-dependent gaps arising from CDF and quantile separations.
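The abstract describes the regret only in words, so the Python sketch below encodes one natural reading of it, purely as an illustration. It assumes that the threshold is the largest τ-quantile across arms, that an outcome at or below that threshold counts as "defective", and that the per-round regret is the pulled arm's defective ratio minus that of the arm attaining the max quantile. Sample lists stand in for the unknown arm distributions. The function names (`cdf`, `lower_quantile`, `cumulative_cdf_regret`) are hypothetical, and this is neither C-UCB nor Q-UCB; it only evaluates a given pull sequence.

```python
from bisect import bisect_right


def cdf(samples, x):
    """Empirical CDF: fraction of samples that are <= x."""
    xs = sorted(samples)
    return bisect_right(xs, x) / len(xs)


def lower_quantile(samples, tau):
    """Smallest sample x whose empirical CDF is >= tau (assumes 0 < tau <= 1)."""
    xs = sorted(samples)
    n = len(xs)
    for i, x in enumerate(xs):
        if (i + 1) / n >= tau:
            return x
    return xs[-1]


def cumulative_cdf_regret(arm_samples, pulls, tau):
    """Assumed regret: sum over rounds of F_{A_t}(theta) - F_{a*}(theta).

    arm_samples: one list of samples per arm (stand-ins for the true laws).
    pulls: arm indices A_1, ..., A_T chosen by some policy.
    Returns (theta, per-arm gaps, cumulative regret).
    """
    quantiles = [lower_quantile(s, tau) for s in arm_samples]
    # Arm attaining the best (largest) tau-quantile; its quantile is the threshold.
    best = max(range(len(arm_samples)), key=lambda a: quantiles[a])
    theta = quantiles[best]
    # Assumed "defective ratio": probability of an outcome at or below theta.
    ratios = [cdf(s, theta) for s in arm_samples]
    gaps = [r - ratios[best] for r in ratios]
    return theta, gaps, sum(gaps[a] for a in pulls)


arms = [list(range(1, 11)), list(range(3, 13))]
theta, gaps, regret = cumulative_cdf_regret(arms, [0, 1, 1, 0, 1], tau=0.2)
print(theta, gaps, regret)
```

Here arm 1 has the larger 0.2-quantile (4 versus 2), so θ = 4. Arm 0 places 40% of its mass at or below 4 against 20% for arm 1, giving it a CDF gap of about 0.2 per pull. A learning algorithm would have to estimate these quantities from its own observations rather than read them off full sample lists.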

Type: Report

Publication: Korea Computer Congress

Last updated on May 22, 2026

Tags: Bandits, Statistics

Authors

Junghyun Lee

PhD Candidate in AI

PhD candidate at KAIST AI, jointly advised by Se-Young Yun and Chulhee Yun. I work on interactive machine learning, theoretical aspects of LLMs, learning/optimization theory, and statistical analysis of large networks.
