RL

Learning to Reason in LLMs by Expectation Maximization

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive a …

Junghyun Lee

• Dec 23, 2025 • 1 min read

Nearly Optimal Latent State Decoding in Block MDPs

First theoretical analysis of model estimation and reward-free RL in block MDPs, without resorting to function approximation frameworks. Lower bounds and algorithms with …

yassir-jedra

• Apr 27, 2023 • 1 min read
