|
|
The estimation of racial disparities in
various fields is often hampered by the lack of
individual-level racial information. In many cases, the law
prohibits the collection of such information to prevent direct
racial discrimination. As a result, analysts have frequently
adopted Bayesian Improved Surname Geocoding (BISG) and its
variants, which combine individual names and addresses with
Census data to predict race. Unfortunately, the residuals of
BISG are often correlated with the outcomes of interest,
generally attenuating estimates of racial disparities. To
correct this bias, we propose an alternative identification
strategy under the assumption that surname is conditionally
independent of the outcome given (unobserved) race, residence
location, and other observed characteristics. We introduce a
new class of models, Bayesian Instrumental Regression for
Disparity Estimation (BIRDiE), that take BISG probabilities as
inputs and produce racial disparity estimates by using surnames
as an instrumental variable for race. Our estimation method is
scalable, making it possible to analyze large-scale
administrative data. We also show how to address potential
violations of the key identification assumptions. A validation
study based on the North Carolina voter file shows that
BISG+BIRDiE reduces error by up to 84% when estimating racial
differences in party registration. Finally, we apply the
proposed methodology to estimate racial differences in who
benefits from the home mortgage interest deduction using
individual-level tax data from the U.S. Internal Revenue
Service. Open-source software implementing the
proposed methodology is available. |
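The BISG step that supplies BIRDiE's inputs is, in its standard form, an application of Bayes' rule: assuming surname and residence location are conditionally independent given race, P(race | surname, location) is proportional to P(race | surname) × P(location | race). The sketch below illustrates only that calculation. The function name and all probabilities are hypothetical, invented for illustration; this is not the wru or BIRDiE implementation, and it does not cover BIRDiE's use of surname as an instrumental variable.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, location) for each racial category.

    Assumes surname and location are conditionally independent given race,
    so the posterior is proportional to
    P(race | surname) * P(location | race).
    """
    unnormalized = [s * g for s, g in zip(p_race_given_surname, p_geo_given_race)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Made-up numbers for three hypothetical categories (not Census figures).
p_race_given_surname = [0.6, 0.3, 0.1]      # from a surname table
p_geo_given_race = [0.001, 0.004, 0.002]    # share of each group living in the area
posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
```

With these numbers the location evidence shifts the most probable category from the first to the second, which is the kind of updating BISG relies on.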
Imai, Kosuke and Kabir
Khanna. (2016). ``Improving
Ecological Inference by Predicting Individual Ethnicity from Voter
Registration Records.'' Political Analysis,
Vol. 24, No. 2 (Spring), pp. 263-272.
|
Imai, Kosuke, Santiago Olivella, and Evan
T. Rosenman. (2022). ``Addressing
Census data problems in race imputation via fully Bayesian
Improved Surname Geocoding and name supplements.''
Science Advances, Vol. 8, No. 49,
pp. 1-10. |
Rosenman, Evan, Santiago Olivella, and
Kosuke Imai. (2023). ``Race and ethnicity
data for first, middle, and last names.''
Scientific Data, Vol. 10, No. 299,
pp. 1-11. |
McCartan, Cory. ``BIRDiE: Estimating
disparities when race is not observed.'' Available
through GitHub.
|
Khanna, Kabir, Brandon Bertelsen, Santiago
Olivella, Evan Rosenman, and Kosuke Imai. ``wru: Who Are You?
Bayesian Prediction of Racial Category Using Surname and
Geolocation.'' Available through The Comprehensive R
Archive Network and GitHub. |