On the Estimation of Linear Softmax Parametrized Markov Chains
In reinforcement learning and deep learning, softmax parameterization is commonly used to represent discrete probability distributions. In this work, we study three possible softmax …
We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. …
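The general shape of a likelihood ratio-based confidence set can be sketched for one concrete GLM, logistic regression. This is a minimal illustration under stated assumptions, not the construction from the work above: the radius `beta` is a hypothetical placeholder (the actual radius is not given in the snippet), and the helper names `logistic_nll`, `fit_mle`, and `in_lr_confidence_set` are illustrative. The set is {θ : L(θ) − L(θ̂) ≤ β}, where L is the negative log-likelihood; because L is convex in θ for a logistic GLM, this set is a sublevel set of a convex function and is therefore convex.

```python
import numpy as np

def logistic_nll(theta, X, y):
    """Negative log-likelihood of a logistic GLM with labels y in {0, 1}."""
    z = X @ theta
    # sum_t [log(1 + exp(z_t)) - y_t * z_t], computed stably via logaddexp
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def fit_mle(X, y, iters=50, reg=1e-6):
    """Newton's method for the (very lightly ridge-regularized) logistic MLE."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (p - y) + reg * theta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + reg * np.eye(X.shape[1])
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def in_lr_confidence_set(theta, theta_hat, X, y, beta):
    """Membership test for {theta : L(theta) - L(theta_hat) <= beta}.

    beta is a hypothetical radius chosen by the caller; this sketch does not derive it.
    """
    return logistic_nll(theta, X, y) - logistic_nll(theta_hat, X, y) <= beta

# Toy usage with synthetic data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
theta_star = np.array([1.0, -2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X @ theta_star)))).astype(float)
theta_hat = fit_mle(X, y)
beta = 5.0  # hypothetical radius
print(in_lr_confidence_set(theta_star, theta_hat, X, y, beta))
```

The membership test only needs two likelihood evaluations, which is one reason likelihood ratio sets are convenient: no explicit ellipsoid or design-matrix inverse is required.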
The logistic bandit is a ubiquitous framework for modeling users' choices, e.g., click vs. no click in advertisement recommender systems. We observe that prior works overlook or …
We show that a simple trick of randomly corrupting the trajectories in Block MDPs allows us to use the clustering algorithm proposed by Jedra et al. (2023) for general …
Linear softmax parametrization (LSP) of a discrete probability distribution is ubiquitous in many areas, such as deep learning, RL, NLP, and social choice models. Instead of trying …
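To make the linear softmax parametrization concrete, here is a minimal sketch assuming the standard form p_θ(i) ∝ exp(⟨θ, φ_i⟩), where each outcome i has a known feature vector φ_i and θ is the unknown parameter. The function names, the toy features, and the choice of plain gradient descent for the maximum-likelihood fit are illustrative assumptions, not the estimator studied in the work above. The gradient of the average negative log-likelihood is E_{p_θ}[φ] minus the empirical mean of φ, so the MLE is the θ that matches these two moments.

```python
import numpy as np

def lsp_distribution(theta, features):
    """Linear softmax parametrization: p_theta(i) proportional to exp(<theta, phi_i>).

    features: (K, d) array whose i-th row is the feature vector phi_i.
    theta:    (d,) parameter vector.
    """
    logits = features @ theta
    logits = logits - logits.max()  # softmax is shift-invariant; this avoids overflow
    w = np.exp(logits)
    return w / w.sum()

def lsp_nll_grad(theta, features, samples):
    """Gradient of the average NLL: E_{p_theta}[phi] minus the empirical mean of phi."""
    p = lsp_distribution(theta, features)
    return p @ features - features[samples].mean(axis=0)

def fit_lsp(features, samples, lr=1.0, iters=500):
    """Maximum-likelihood fit by plain gradient descent (the NLL is convex in theta)."""
    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        theta = theta - lr * lsp_nll_grad(theta, features, samples)
    return theta

# Toy usage: K = 4 outcomes with 2-dimensional features (illustrative values).
rng = np.random.default_rng(0)
phi = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
theta_true = np.array([0.5, -0.3])
samples = rng.choice(len(phi), size=2000, p=lsp_distribution(theta_true, phi))
theta_hat = fit_lsp(phi, samples)
print(theta_true, theta_hat)
```

These toy features are chosen so that no nonzero θ produces a constant logit shift; otherwise θ would only be identifiable up to that direction, since softmax ignores constant shifts.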
Proposes a framework for performing fair PCA in memory-limited, streaming settings. Sample complexity results and empirical results show the superiority of our approach compared …
A novel problem setting where heterogeneous multi-agent bandits collaborate over a network to minimize their group regret. To deal with the high communication complexity of the …
First theoretical analysis of model estimation and reward-free RL in block MDPs, without resorting to function approximation frameworks. Lower bounds and algorithms with …
We empirically validate the clustering algorithm proposed in (Jedra et al., 2022).
Inspired by (Wang et al., ICLR'22), we provide a preliminary statistical analysis of the stochastic gradient noise (SGN) of GIN and GCN on the Cora node classification task.