Cramming Contextual Bandits for On-policy Statistical Evaluation

We introduce the `cram' method, a general statistical framework for evaluating the final policy learned by a multi-armed contextual bandit algorithm using the dataset generated by that same algorithm. This on-policy evaluation methodology differs from most existing methods, which focus on off-policy performance evaluation of contextual bandit algorithms. Cramming uses the entire bandit sequence in a single pass over the data, making evaluation both statistically and computationally efficient. We prove that if a bandit algorithm satisfies a certain stability condition, the resulting crammed evaluation estimator is consistent and asymptotically normal under mild regularity conditions. Furthermore, we show that this stability condition holds for commonly used linear contextual bandit algorithms, including $\epsilon$-greedy, Thompson Sampling, and Upper Confidence Bound algorithms. Using both synthetic and publicly available datasets, we compare the empirical performance of cramming with that of state-of-the-art methods. The results demonstrate that the proposed cram method reduces the evaluation standard error by approximately 40\% relative to off-policy evaluation methods while preserving unbiasedness and valid confidence interval coverage.
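
To make the role of asymptotic normality concrete, the following is a generic sketch rather than the paper's own statement. The symbols are notation assumed here: $V$ is the value of the final learned policy, $\hat{V}_T$ is the crammed estimate after $T$ rounds, $\widehat{\mathrm{se}}_T$ is a consistent estimate of its standard error, and $z_{1-\alpha/2}$ is the standard normal quantile. Under these assumptions, a Wald-type confidence interval with asymptotic coverage $1-\alpha$ takes the familiar form:

\[
\frac{\hat{V}_T - V}{\widehat{\mathrm{se}}_T} \;\xrightarrow{d}\; N(0,1)
\quad\Longrightarrow\quad
\Pr\!\left( V \in \left[\, \hat{V}_T - z_{1-\alpha/2}\,\widehat{\mathrm{se}}_T,\;\; \hat{V}_T + z_{1-\alpha/2}\,\widehat{\mathrm{se}}_T \,\right] \right) \;\to\; 1-\alpha .
\]

In this form, a smaller standard error translates directly into a proportionally narrower interval at the same nominal coverage.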

Imai, Kosuke and Michael Lingzhi Li. (2023). ``Experimental Evaluation of Individualized Treatment Rules.'' Journal of the American Statistical Association, Vol. 118, No. 541, pp. 242-256.

Li, Michael Lingzhi and Kosuke Imai. ``Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules.'' Journal of Causal Inference, Forthcoming.

Imai, Kosuke and Michael Lingzhi Li. (2025). ``Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments.'' Journal of Business & Economic Statistics, Vol. 43, No. 1, pp. 256-268.

Li, Michael Lingzhi and Kosuke Imai. ``evalITR: Evaluating Individualized Treatment Rules.'' Available through The Comprehensive R Archive Network and GitHub.
