RLHF

Regularized Online RLHF with Generalized Bilinear Preferences

We consider the problem of contextual online RLHF with general preferences, where the goal is to identify the Nash Equilibrium. We …

Junghyun Lee, Minju Hong, Kwang-Sung Jun, Chulhee Yun, Se-Young Yun
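The equilibrium target named in the abstract can be shown on a toy problem. This is only an illustrative sketch under simplifying assumptions, not the authors' algorithm. It drops the context and the regularization. It fixes a finite set of candidate responses with a known preference matrix P, where P[i][j] is the probability that response i is preferred to response j. It then looks for the Nash equilibrium policy pi of the resulting symmetric zero-sum game, meaning P(pi ≻ j) ≥ 1/2 for every response j. The solver is Hedge (multiplicative-weights) self-play, a standard method swapped in here for illustration. The function names `preference_game_nash` and `duality_gap`, and the step size `eta`, are hypothetical.

```python
import math


def preference_game_nash(P, eta=0.05, steps=20000):
    """Approximate the Nash equilibrium of the symmetric zero-sum game
    defined by a preference matrix P (P[i][j] = Prob(i preferred to j)).

    Illustrative assumption: no context, no regularization, finite responses.
    Uses Hedge self-play and returns the time-averaged policy.
    """
    n = len(P)
    # Antisymmetric payoff matrix: A[i][j] = P(i > j) - 1/2.
    A = [[P[i][j] - 0.5 for j in range(n)] for i in range(n)]
    cum = [0.0] * n  # cumulative expected payoff of each pure response
    total = [0.0] * n  # running sum of the policies played so far
    for _ in range(steps):
        # Softmax of eta * cumulative payoff, shifted by the max for stability.
        m = max(cum)
        w = [math.exp(eta * (c - m)) for c in cum]
        z = sum(w)
        pi = [x / z for x in w]
        for i in range(n):
            total[i] += pi[i]
        # Expected payoff of each pure response against the current policy.
        for i in range(n):
            cum[i] += sum(A[i][j] * pi[j] for j in range(n))
    return [t / steps for t in total]


def duality_gap(P, pi):
    """Largest margin max_j P(j > pi) - 1/2; it is zero at a Nash equilibrium."""
    n = len(P)
    return max(sum((P[j][i] - 0.5) * pi[i] for i in range(n)) for j in range(n))


if __name__ == "__main__":
    # Rock-paper-scissors-like preferences: no response beats all others.
    rps = [[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]
    pi = preference_game_nash(rps)
    print(pi, duality_gap(rps, pi))
```

With the cyclic preferences above the equilibrium is the uniform policy. When one response is preferred to every other (a Condorcet winner), the averaged policy concentrates on it instead. Returning the average of the iterates rather than the last one is deliberate: in cyclic games the last iterate of Hedge can oscillate, while the average carries the no-regret guarantee.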