RL

Learning to Reason in LLMs by Expectation Maximization

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a …

Junghyun Lee, Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Ryan A. Rossi, Sunav Choudhary, Alexa Siu

Nearly Optimal Latent State Decoding in Block MDPs

First theoretical analysis of model estimation and reward-free RL of block MDP, without resorting to function approximation frameworks. Lower bounds and algorithms with near-optimal upper bound are provided.

Yassir Jedra, Junghyun Lee, Alexandre Proutière, Se-Young Yun

Nearly Optimal Latent State Decoding in Block MDPs