|
|
We introduce the `cram' method as a
general statistical framework for evaluating the final learned
policy from a multi-armed contextual bandit algorithm, using the
dataset generated by the same bandit algorithm. The proposed
on-policy evaluation methodology differs from most existing methods
that focus on off-policy performance evaluation of contextual bandit
algorithms. Cramming uses the entire bandit sequence in a single
pass over the data, making evaluation both statistically and
computationally efficient. We prove that if a bandit
algorithm satisfies a certain stability condition, the resulting
crammed evaluation estimator is consistent and asymptotically normal
under mild regularity conditions. Furthermore, we show that this
stability condition holds for commonly used linear contextual bandit
algorithms, including $\epsilon$-greedy, Thompson Sampling, and
Upper Confidence Bound algorithms. Using both synthetic and publicly
available datasets, we compare the empirical performance of cramming
with state-of-the-art methods. The results demonstrate that the
proposed cram method reduces the evaluation standard error by
approximately 40\% relative to off-policy evaluation methods while
preserving unbiasedness and valid confidence interval
coverage. |
Imai, Kosuke and Michael Lingzhi
Li. (2023). ``Experimental
Evaluation of Individualized Treatment Rules.''
Journal of the American Statistical Association,
Vol. 118, No. 541, pp. 242-256. |
Li, Michael Lingzhi and Kosuke Imai. ``Neyman Meets Causal Machine Learning:
Experimental Evaluation of Individualized Treatment
Rules.'' Journal of Causal Inference,
Forthcoming. |
Imai, Kosuke and Michael Lingzhi
Li. (2025). ``Statistical Inference
for Heterogeneous Treatment Effects Discovered by Generic Machine
Learning in Randomized Experiments.'' Journal of
Business & Economic Statistics, Vol. 43, No. 1,
pp. 256-268. |
Li, Michael Lingzhi and Kosuke
Imai. ``Statistical Performance
Guarantee for Subgroup Identification with Generic Machine
Learning.'' |
Li, Michael Lingzhi and Kosuke Imai. ``evalITR:
Evaluating Individualized Treatment Rules.'' Available
through the Comprehensive R
Archive Network and GitHub. |