Empirical Analyses of Corruption in the Clustering of Block MDPs

Jan 19, 2024

Junghyun Lee

Se-Young Yun



Abstract

Reinforcement learning (RL) has been shown to be effective when it exploits low-dimensional representations of an environment. This approach is formalized by imposing structural assumptions on the Markov Decision Process (MDP). Herein, we focus on the Block MDP (BMDP), in which contexts are clustered and observable transitions are governed by latent state transitions. A recent study proved a lower bound on the clustering error of BMDPs and proposed a two-step algorithm with a performance guarantee that nearly matches the lower bound. In this paper, we empirically validate the results of that study by implementing and simulating the algorithm in a synthetic, non-regular BMDP environment. Perhaps the most surprising finding is that random trajectory corruption, up to a certain level, actually aids clustering performance, which resembles the implicit regularization phenomenon studied for label noise SGD in the deep learning theory community.
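
To make the setting concrete, the following is a minimal sketch of how a trajectory in a synthetic BMDP might be simulated and then randomly corrupted. It is not the implementation used in the paper. Several simplifications are assumptions made here for illustration: a single action (so the BMDP reduces to a block Markov chain), uniform emission of contexts within each block, and a corruption model that independently replaces each observed context with a uniformly random one. The names `sample_bmdp_trajectory`, `corrupt_trajectory`, and `transition_counts` are hypothetical.

```python
import random


def sample_bmdp_trajectory(latent_P, blocks, length, rng):
    """Sample observed contexts from a single-action BMDP.

    Assumption: contexts are emitted uniformly from the block of the
    current latent state; the paper's emission model may differ.
    """
    n_states = len(latent_P)
    state = rng.randrange(n_states)
    trajectory = []
    for _ in range(length):
        # Emit a context from the block belonging to the latent state.
        trajectory.append(rng.choice(blocks[state]))
        # Move the latent state using its row of the transition matrix.
        state = rng.choices(range(n_states), weights=latent_P[state])[0]
    return trajectory


def corrupt_trajectory(trajectory, n_contexts, rate, rng):
    """Independently replace each context with a uniform random one.

    Assumption: this is one simple corruption model chosen for
    illustration, not necessarily the one studied in the paper.
    """
    return [
        rng.randrange(n_contexts) if rng.random() < rate else context
        for context in trajectory
    ]


def transition_counts(trajectory, n_contexts):
    """Count observed context-to-context transitions, the kind of
    statistic a clustering procedure could take as input."""
    counts = [[0] * n_contexts for _ in range(n_contexts)]
    for current, following in zip(trajectory, trajectory[1:]):
        counts[current][following] += 1
    return counts


rng = random.Random(0)
latent_P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical 2-state latent chain
blocks = [[0, 1, 2], [3, 4, 5]]      # contexts grouped by latent state
clean = sample_bmdp_trajectory(latent_P, blocks, 1000, rng)
noisy = corrupt_trajectory(clean, n_contexts=6, rate=0.1, rng=rng)
counts = transition_counts(noisy, n_contexts=6)
```

Sweeping `rate` over a grid and comparing the clustering error obtained from `counts` at each level is one way such an experiment could be organized.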

Type

Report

Publication

KIISE Transactions on Computing Practices

Last updated on Jan 19, 2024

Authors

Junghyun Lee

PhD Candidate in AI

PhD candidate at KAIST AI, jointly advised by Se-Young Yun and Chulhee Yun. I work on interactive machine learning, theoretical aspects of LLMs, learning/optimization theory, and statistical analysis of large networks.
