Prediction of an individual's race and
ethnicity plays an important role in social science and public
health research. Examples include studies of racial disparity in
health and voting. Recently, Bayesian Improved Surname Geocoding
(BISG), which uses Bayes' rule to combine information from Census
surname files with the geocoding of an individual's residence, has
emerged as a leading methodology for this prediction
task. Unfortunately, BISG suffers from two Census data problems that
contribute to unsatisfactory predictive performance for
minorities. First, the decennial Census often contains zero counts
for minority racial groups in the Census blocks where some members
of those groups reside. Second, because the Census surname files
only include frequent names, many surnames -- especially those of
minorities -- are missing from the list. To address the zero counts
problem, we introduce a fully Bayesian Improved Surname Geocoding
(fBISG) methodology that accounts for potential measurement error in
Census counts by extending the naïve Bayesian inference of the BISG
methodology to full posterior inference. To address the missing
surname problem, we supplement the Census surname data with
additional data on last, first, and middle names taken from the
voter files of six Southern states where self-reported race is
available. Our empirical validation shows that the fBISG methodology
and name supplements significantly improve the accuracy of race
imputation across all racial groups, and especially for Asians. The
proposed methodology, together with additional name data, is
available via the open-source software
package wru. |
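The Bayes' rule step that BISG relies on can be illustrated with a minimal sketch. It assumes the standard BISG form, in which surname and residence location are conditionally independent given race, so that P(race | surname, location) is proportional to P(race | surname) × P(location | race). The function name `bisg_posterior` and all of the numbers are hypothetical, chosen only for illustration; this is not the wru interface and does not implement fBISG.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location information with Bayes' rule.

    Assumes surname and location are conditionally independent given race:
        P(race | surname, geo) is proportional to
        P(race | surname) * P(geo | race)
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every racial group has zero probability")
    # Normalize so the posterior probabilities sum to one.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs for illustration only (not Census figures).
# The zero for "asian" mimics a Census block with a zero count.
p_race_given_surname = {"white": 0.60, "black": 0.30, "asian": 0.10}
p_geo_given_race = {"white": 0.002, "black": 0.004, "asian": 0.0}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this toy input, the zero location probability forces the posterior for that group to exactly zero regardless of the surname evidence, which is the zero counts problem described in the abstract.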
Khanna, Kabir, Kosuke Imai, Santiago
Olivella, and Evan Rosenman. ``wru: Who Are You?
Bayesian Prediction of Racial Category Using Surname and
Geolocation.'' Available through the Comprehensive R
Archive Network and GitHub. |
Imai, Kosuke and Kabir
Khanna. (2016). ``Improving
Ecological Inference by Predicting Individual Ethnicity from Voter
Registration Record.'' Political Analysis,
Vol. 24, No. 2 (Spring), pp. 263-272.
|
Rosenman, Evan T.R., Santiago Olivella,
and Kosuke Imai. ``Race and ethnicity
data for first, middle, and last names.''
Scientific Data, Forthcoming. |
McCartan, Cory, Robin Fisher, Jacob
Goldin, Daniel E. Ho, and Kosuke Imai. ``Estimating Racial Disparities When Race is Not
Observed.'' Journal of the American Statistical
Association, Forthcoming. |