McCartan, Cory, Robin Fisher, Jacob Goldin, Daniel E. Ho, and Kosuke Imai. ``Estimating Racial Disparities When Race is Not Observed.'' Journal of the American Statistical Association, Forthcoming.


  Abstract

The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology.
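The two probabilistic steps described in the abstract can be illustrated numerically. The Python sketch below assumes the standard BISG setup, in which surname S and residence location G are conditionally independent given race R, so that P(R = r | S = s, G = g) is proportional to P(S = s | R = r) P(R = r | G = g). Within a single location, it then applies the identification assumption stated above: surname is independent of the outcome Y given race and location, so each surname's outcome rate is a BISG-weighted mixture of the race-specific rates. All probabilities, outcome rates, and names (`bisg`, `p_s_given_r`, and so on) are hypothetical, and no covariates are included. The least-squares step is a simplified method-of-moments illustration of why variation across surnames identifies the race-specific rates. It is not the BIRDiE estimator, which the paper fits as a Bayesian model on individual records. For contrast, the sketch also computes a BISG-weighted average of outcomes, which in this example understates the gap between groups, consistent with the attenuation the abstract describes.

```python
import numpy as np

# Hypothetical inputs for ONE location g (illustrative numbers, not Census data).
# Rows are surnames s, columns are racial groups r.
# p_s_given_r[s, r] = P(S = s | R = r); each column sums to 1.
p_s_given_r = np.array([
    [0.50, 0.10, 0.05],
    [0.30, 0.20, 0.15],
    [0.15, 0.30, 0.30],
    [0.05, 0.40, 0.50],
])
# p_r_given_g[r] = P(R = r | G = g); sums to 1.
p_r_given_g = np.array([0.6, 0.3, 0.1])


def bisg(p_s_given_r, p_r_given_g):
    """BISG posterior P(R = r | S = s, G = g), one row per surname.

    Assumes S and G are conditionally independent given R, so the
    posterior is proportional to P(S | R) * P(R | G).
    """
    unnorm = p_s_given_r * p_r_given_g  # broadcasts P(R | G) across rows
    return unnorm / unnorm.sum(axis=1, keepdims=True)


post = bisg(p_s_given_r, p_r_given_g)

# Hypothetical race-specific outcome rates P(Y = 1 | R = r, G = g).
true_rates = np.array([0.7, 0.3, 0.5])

# Identification assumption: Y independent of S given R and G, so each
# surname's outcome rate is a BISG-weighted mixture of the race-specific rates.
surname_rates = post @ true_rates

# Simplified method-of-moments recovery (NOT the BIRDiE estimator):
# solve the linear system post @ rates = surname_rates across surnames.
recovered, *_ = np.linalg.lstsq(post, surname_rates, rcond=None)
print(np.round(recovered, 3))  # → [0.7 0.3 0.5]

# For contrast: a BISG-weighted average of outcomes, using surname shares
# P(S = s | G = g) as population weights. Here its gap between groups 0 and 1
# is roughly a quarter of the true gap of 0.4.
weights = p_s_given_r @ p_r_given_g           # P(S = s | G = g)
w = weights[:, None] * post                   # P(S = s, R = r | G = g)
weighted_est = (w * surname_rates[:, None]).sum(axis=0) / w.sum(axis=0)
```

Recovery works here because the surname-level rates are exact and the BISG matrix has full column rank (at least as many distinct surnames as racial groups). In real data the surname-level rates are noisy and many surnames are rare, so a direct linear solve like this one would be unstable.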

  Related Papers

Imai, Kosuke and Kabir Khanna. (2016). ``Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records.'' Political Analysis, Vol. 24, No. 2 (Spring), pp. 263-272.
Imai, Kosuke, Santiago Olivella, and Evan T.R. Rosenman. (2022). ``Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements.'' Science Advances, Vol. 8, No. 49, pp. 1-10.
Rosenman, Evan T.R., Santiago Olivella, and Kosuke Imai. (2023). ``Race and ethnicity data for first, middle, and last names.'' Scientific Data, Vol. 10, No. 299, pp. 1-11.

  Software

McCartan, Cory. ``BIRDiE: Estimating disparities when race is not observed.'' Available through GitHub.
Khanna, Kabir, Brandon Bertelsen, Santiago Olivella, Evan Rosenman, and Kosuke Imai. ``wru: Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation.'' Available through the Comprehensive R Archive Network (CRAN) and GitHub.

© Kosuke Imai
 Last modified: Mon Jun 23 21:51:14 BST 2025