Near-Optimal Clustering in Mixture of Markov Chains
We study the problem of clustering T trajectories of length H, each generated by one of K unknown ergodic Markov chains over a finite state space of size S. The goal is to …
We study the problem of clustering T trajectories of length H, each generated by one of K unknown ergodic Markov chains over a finite state space of size S. The goal is to …
We present GL-LowPopArt, a novel Catoni-style estimator for generalized low-rank trace regression. Building on LowPopArt (Jang et al., 2024), it employs a two-stage approach -- …
Active learning improves training efficiency by selectively querying the most informative samples for labeling. While it naturally fits classification tasks–where informative …
Self-Taught Reasoners (STaR), synonymously known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline of self-improving reasoning Language Models …
Recent advances in infinite-dimensional diffusion models have demonstrated their effectiveness and scalability in function generation tasks where the underlying structure is …
Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL …
We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. …
Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains …
Proposes a new active learning approach by proposing a new uncertainty measure called the least disagree metric, as well as its efficient estimator, which is proven to be …
Logistic bandit is a ubiquitous framework of modeling users' choices, e.g., click vs. no click for advertisement recommender system. We observe that the prior works overlook or …