Ranking Theory

Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

Event: Weekly OptiML Lab Group Meeting
Short summary: In this seminar, I will talk about my own paper “Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion” (Lee et al.

Jun 18, 2024

Introduction to Reinforcement Learning with Human Feedback (RLHF): A Theoretically Biased Overview

Event: Weekly OptiML Lab Group Meeting
Short summary: In this talk, I will first (somewhat rigorously) introduce the framework of reinforcement learning with human feedback (RLHF). Then I will go over three recent breakthroughs in the analysis and improvement of RLHF.

Nov 30, 2023

Introduction to Reinforcement Learning with Human Feedback (RLHF): A Theoretically Biased Overview

Event: Weekly OSI Lab Seminar
Short summary: In this talk, I will first (somewhat rigorously) introduce the framework of reinforcement learning with human feedback (RLHF). Then I will go over three recent breakthroughs in the analysis and improvement of RLHF.

Nov 30, 2023