Chulhee Yun

Cumulative Distribution Regret Minimization with Max-Quantile Threshold in Multi-Armed Bandit

We study a new risk-averse bandit setting motivated by semiconductor manufacturing, where the quality of a recipe is judged not by its mean performance but by its weakest outcomes. …

jaeyoung-cha

Looking Through the Mirror: Minimax-Optimal Regularized Regrets in Online Learning and Bandits

We revisit regularized regret minimization under full-information and bandit feedback, where a learner optimizes an objective of the form $\langle r, \pi \rangle - \eta^{-1} …

Junghyun Lee

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

We consider the problem of *regularized* best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized …

Junghyun Lee

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults

Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains …

prin-phunyaphibarn

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

Proposes a framework for performing fair PCA in the memory-limited streaming setting. Sample complexity results and empirical discussions show the superiority of our approach compared …

Junghyun Lee