|
|
The increasing availability of
individual-level data has led to numerous applications of
individualized (or personalized) treatment rules (ITRs). Policy makers
often wish to empirically evaluate ITRs and compare their relative
performance before implementing them in a target population. We
propose a new evaluation metric, the population average prescriptive
effect (PAPE). The PAPE compares the performance of an ITR with that of
a non-individualized treatment rule that randomly treats the same
proportion of units. Averaging the PAPE over a range of budget
constraints yields our second evaluation metric, the area under the
prescriptive effect curve (AUPEC). The AUPEC represents an overall
performance measure for evaluation, much as the area under the receiver
operating characteristic curve (AUROC) does for classification,
and is a generalization of the QINI coefficient used in uplift
modeling. We use Neyman’s repeated sampling framework to estimate the
PAPE and AUPEC and derive their exact finite-sample variances based on
random sampling of units and random assignment of treatment. We extend
our methodology to a common setting, in which the same experimental
data are used to both estimate and evaluate ITRs. In this case, our
variance calculation incorporates the additional uncertainty due to
random splits of data used for cross-validation. The proposed
evaluation metrics can be estimated without requiring modeling
assumptions, asymptotic approximations, or resampling methods. As a
result, they are applicable to any ITR, including those based on complex
machine learning algorithms. An open-source
software package is available for implementing the
proposed methodology. |
Li, Michael Lingzhi and Kosuke Imai. ``evalITR:
Evaluating Individualized Treatment Rules.'' Available
through the Comprehensive R
Archive Network and GitHub. |
Li, Michael Lingzhi and Kosuke
Imai. (2024). ``Neyman Meets Causal
Machine Learning: Experimental Evaluation of Individualized
Treatment Rules.'' Journal of Causal
Inference, Vol. 12, No. 1, pp. 1-20. Special Issue on Neyman
(1923) and its influences on causal inference. |
Imai, Kosuke and Michael Lingzhi Li. (2025). ``Statistical Inference for Heterogeneous
Treatment Effects Discovered by Generic Machine Learning in
Randomized Experiments.'' Journal of Business &
Economic Statistics, Vol. 43, No. 1,
pp. 256-268. |
Li, Michael Lingzhi and Kosuke
Imai. ``Statistical Performance
Guarantee for Subgroup Identification with Generic Machine
Learning.'' |
Jia, Zeyang, Kosuke Imai, and Michael
Lingzhi Li. ``Cramming Contextual
Bandits for On-policy Statistical
Evaluation.'' |