Looking Through the Mirror: Minimax-Optimal Regularized Regrets in Online Learning and Bandits

Junghyun Lee, Yujun Kim, Chulhee Yun, Se-Young Yun

May, 2026

Abstract

We revisit regularized regret minimization under full-information and bandit feedback, where a learner optimizes an objective of the form $\langle r, \pi \rangle - \eta^{-1} \psi(\pi)$ for adversarially chosen rewards $r$, policies $\pi \in \Delta_K$, and a known strongly-convex regularizer $\psi(\cdot)$. We establish the minimax regularized regret for a broad class of regularizers, specifically those that are relatively strongly-convex and smooth with exponent $p \geq 1$, Legendre, and coordinate-wise separable. For large $\eta$, the problem approaches unregularized online learning, recovering the standard $\Theta(\sqrt{T})$ scaling. For small $\eta$, the strong convexity of the regularizer dominates, and we identify the regimes that attain $\Theta(\eta \log T)$ minimax regret under both feedback models. Notably, while prior logarithmic regret lower bounds are largely restricted to specific curved loss functions (e.g., squared loss), our lower bounds hold for any such regularizer against purely linear adversarial rewards. A key technical contribution is our proof strategy, which leverages a Bregman geometric perspective of regularized regret over the quotient space $\mathbb{R}^K / \mathrm{span}({\mathbf{1}})$ to resolve the translation invariance of the inverse mirror map. By combining this dual geometric formulation with the van Trees inequality, we reduce the adversarial online learning to Bayesian estimation, offering a unified framework for establishing logarithmic regret lower bounds.

Type

Preprint

Publication

Forthcoming

Looking Through the Mirror: Minimax-Optimal Regularized Regrets in Online Learning and Bandits

Abstract

Junghyun Lee

PhD Student