Research on Computational Social Science

 

  Overview

Over the last two decades, the amount and variety of data available to social scientists have dramatically increased. While in the 1990s most researchers were analyzing a handful of national surveys and government data, today's quantitative social scientists conduct their own randomized experiments and surveys and analyze a diverse array of large-scale data sets, ranging from textual to spatial data. This trend demands new statistical methodologies that enable social scientists to overcome the resulting data-analytic and computational challenges.

I have developed fast and reliable computational methods for popular Bayesian models such as the multinomial probit and ecological inference models. I have also worked on the development of computational methods for large-scale data sets in social science research. These include the fast and scalable estimation of various ideal point models for massive data, a dynamic clustering method for large-scale product-level trade data, a dynamic regression model for networks, analyses of textual and video data, simulation and enumeration methods for redistricting, and a method for record linkage with large-scale administrative data.

  Manuscripts and Publications

Algorithm-assisted human decision-making and policy learning:
Imai, Kosuke, Zhichao Jiang, D. James Greiner, Ryan Halen, and Sooahn Shin. (2023). ``Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment.'' (with discussion) Journal of the Royal Statistical Society, Series A (Statistics in Society), Vol. 186, No. 2 (April), pp. 167-189. Read before the Royal Statistical Society.
Ben-Michael, Eli, D. James Greiner, Kosuke Imai, and Zhichao Jiang. ``Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment.'' Journal of the American Statistical Association, Forthcoming.
Ben-Michael, Eli, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, and Sooahn Shin. ``Does AI help humans make better decisions? A methodological framework for experimental evaluation.''
Imai, Kosuke and Zhichao Jiang. (2023). ``Principal Fairness for Human and Algorithmic Decision-Making.'' Statistical Science, Vol. 38, No. 2 (July), pp. 317-328.
Ben-Michael, Eli, Kosuke Imai, and Zhichao Jiang. (2024). ``Policy Learning with Asymmetric Counterfactual Utilities.'' Journal of the American Statistical Association, Vol. 119, No. 548, pp. 3045-3058.
Zhang, Yi, Eli Ben-Michael, and Kosuke Imai. ``Safe Policy Learning under Regression Discontinuity Designs.''
Jia, Zeyang, Eli Ben-Michael, and Kosuke Imai. ``Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War.'' Journal of the Royal Statistical Society, Series A (Statistics in Society), Forthcoming.
Koch, Benedikt and Kosuke Imai. ``Statistical Decision Theory with Counterfactual Loss.''
Heterogeneous treatment effects:
Imai, Kosuke, and Aaron Strauss. (2011). ``Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with Application to the Optimal Planning of the Get-out-the-vote Campaign.'' Political Analysis, Vol. 19, No. 1 (Winter), pp. 1-19. (lead article) Winner of Political Analysis Editors' Choice Award.
Imai, Kosuke and Marc Ratkovic. (2013). ``Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.'' Annals of Applied Statistics, Vol. 7, No. 1 (March), pp. 443-470. Winner of the Tom Ten Have Memorial Award.
Imai, Kosuke and Michael Lingzhi Li. (2023). ``Experimental Evaluation of Individualized Treatment Rules.'' Journal of the American Statistical Association, Vol. 118, No. 541, pp. 242-256.
Li, Michael Lingzhi and Kosuke Imai. (2024). ``Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules.'' Journal of Causal Inference, Vol. 12, No. 1, pp. 1-20. Special Issue on Neyman (1923) and its influences on causal inference.
Imai, Kosuke and Michael Lingzhi Li. (2025). ``Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments.'' Journal of Business & Economic Statistics, Forthcoming.
Li, Michael Lingzhi and Kosuke Imai. ``Statistical Performance Guarantee for Subgroup Identification with Generic Machine Learning.''
Jia, Zeyang, Kosuke Imai, and Michael Lingzhi Li. ``Cramming Contextual Bandits for On-policy Statistical Evaluation.''
Zhang, Yi and Kosuke Imai. ``Individualized Policy Evaluation and Learning under Clustered Network Interference.''
Zhang, Yi, Melody Huang, and Kosuke Imai. ``Minimax Regret Estimation for Generalizing Heterogeneous Treatment Effects with Multisite Data.''
Zhou, Lingxiao, Kosuke Imai, Jason Lyall, and Georgia Papadogeorgou. ``Estimating Heterogeneous Treatment Effects for Spatio-Temporal Causal Inference: How Economic Assistance Moderates the Effects of Airstrikes on Insurgent Violence.''
High-dimensional treatments:
Egami, Naoki, and Kosuke Imai. (2019). ``Causal Interaction in Factorial Experiments: Application to Conjoint Analysis.'' Journal of the American Statistical Association, Vol. 114, No. 526 (June), pp. 529-540.
de la Cuesta, Brandon, Naoki Egami, and Kosuke Imai. (2022). ``Experimental Design and Statistical Inference for Conjoint Analysis: The Essential Role of Population Distribution.'' Political Analysis, Vol. 30, No. 1 (January), pp. 19-45.
Goplerud, Max, Kosuke Imai, and Nicole E. Pashley. (2025). ``Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis.'' Annals of Applied Statistics, Vol. 19, No. 2 (June), pp. 866-888.
Ham, Dae Woong, Kosuke Imai, and Lucas Janson. (2024). ``Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis.'' Political Analysis, Vol. 32, No. 3 (July), pp. 329-344.
High-dimensional propensity score:
Ning, Yang, Sida Peng, and Kosuke Imai. (2020). ``Robust Estimation of Causal Effects via High-Dimensional Covariate Balancing Propensity Score.'' Biometrika, Vol. 107, No. 3 (September), pp. 533–554.
Clustering and scaling methods for large-scale data:
Imai, Kosuke, James Lo, and Jonathan Olmsted. (2016). ``Fast Estimation of Ideal Points with Massive Data.'' American Political Science Review, Vol. 110, No. 4 (December), pp. 631-656.
Kim, In Song, Steven Liao, and Kosuke Imai. (2020). ``Measuring Trade Profile with Granular Product-level Trade Data.'' American Journal of Political Science, Vol. 64, No. 1 (January), pp. 102-117.
Olivella, Santiago, Tyler Pratt, and Kosuke Imai. (2022). ``Dynamic Stochastic Blockmodel Regression for Network Data: Application to International Conflicts.'' Journal of the American Statistical Association, Vol. 117, No. 539, pp. 1068-1081.
Lo, Adeline, Santiago Olivella, and Kosuke Imai. ``A Statistical Model of Bipartite Networks: Application to Cosponsorship in the United States Senate.''
Analysis of unstructured data: texts, video, and maps:
Imai, Kosuke and Kentaro Nakamura. ``Gen-AI Powered Inference.''
Imai, Kosuke and Kentaro Nakamura. ``Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments.''
McCartan, Cory, Jacob Brown, and Kosuke Imai. (2024). ``Measuring and Modeling Neighborhoods.'' American Political Science Review, Vol. 118, No. 4 (November), pp. 1966-1985.
Breuer, Adam, Bryce J. Dietrich, Michael H. Crespin, Matthew Butler, J.A. Pyrse, and Kosuke Imai. ``Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012.'' Scientific Data, Forthcoming.
Tarr, Alexander, June Hwang, and Kosuke Imai. (2023). ``Automated Coding of Political Campaign Advertisement Videos: An Empirical Validation Study.'' Political Analysis, Vol. 31, No. 4 (October), pp. 554-574.
Eshima, Shusei, Kosuke Imai, and Tomoya Sasaki. (2024). ``Keyword-Assisted Topic Models.'' American Journal of Political Science, Vol. 68, No. 2 (April), pp. 730-750.
Algorithms for legislative redistricting and applications:
Miyazaki, Sho, Kento Yamada, and Kosuke Imai. ``Estimating the Partisan Bias of Japanese Legislative Redistricting Plans Using a Simulation Algorithm.''
McCartan, Cory, Christopher Kenny, Tyler Simko, Emma Ebowe, Michael Zhao, and Kosuke Imai. ``Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors.''
Kenny, Christopher T., Cory McCartan, Tyler Simko, Shiro Kuriwaki, and Kosuke Imai. (2023). ``Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition.'' Proceedings of the National Academy of Sciences, Vol. 120, No. 25, e2217322120.
McCartan, Cory, Christopher T. Kenny, Tyler Simko, George Garcia III, Kevin Wang, Melissa Wu, Shiro Kuriwaki, and Kosuke Imai. (2022). ``Simulated redistricting plans for the analysis and evaluation of redistricting in the United States: 50stateSimulations.'' Scientific Data, Vol. 9, No. 689, pp. 1-10.
Kenny, Christopher T., Shiro Kuriwaki, Cory McCartan, Evan T.R. Rosenman, Tyler Simko, and Kosuke Imai. (2023). ``Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System.'' Harvard Data Science Review, Special Issue 2: Differential Privacy for the 2020 U.S. Census (January), pp. 1-16.
McCartan, Cory and Kosuke Imai. (2023). ``Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans.'' Annals of Applied Statistics, Vol. 17, No. 4 (December), pp. 3300-3323.
Fifield, Benjamin, Michael Higgins, Kosuke Imai, and Alexander Tarr. (2020). ``Automated Redistricting Simulation Using Markov Chain Monte Carlo.'' Journal of Computational and Graphical Statistics, Vol. 29, No. 4, pp. 715-728.
Fifield, Benjamin, Kosuke Imai, Jun Kawahara, and Christopher T. Kenny. (2020). ``The Essential Role of Empirical Validation in Legislative Redistricting Simulation.'' Statistics and Public Policy, Vol. 7, No. 1, pp. 52-68.
Census and differential privacy:
Kenny, Christopher T., Shiro Kuriwaki, Cory McCartan, Evan T.R. Rosenman, Tyler Simko, and Kosuke Imai. (2021). ``The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census.'' Science Advances, Vol. 7, No. 7 (October), pp. 1-17.
McCartan, Cory, Tyler Simko, and Kosuke Imai. (2023). ``Researchers need better access to US Census data.'' Science, Vol. 380, No. 6648, pp. 902-903.
McCartan, Cory, Tyler Simko, and Kosuke Imai. (2023). ``Making Differential Privacy Work for Census Data Users.'' Harvard Data Science Review, Vol. 5, No. 4 (Fall).
Kenny, Christopher, Shiro Kuriwaki, Cory McCartan, Tyler Simko, and Kosuke Imai. (2024). ``Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods.'' Science Advances, Vol. 10, No. 18 (May), pp. 1-13.
Record linkage methods:
Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. (2019). ``Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records.'' American Political Science Review, Vol. 113, No. 2 (May), pp. 353-371.
Enamorado, Ted, and Kosuke Imai. (2019). ``Validating Self-reported Turnout by Linking Public Opinion Surveys with Administrative Records.'' Public Opinion Quarterly, Vol. 83, No. 4 (Winter), pp. 723–748.
Multinomial probit models:
Imai, Kosuke, and David A. van Dyk. (2005). ``A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation.'' Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334.
Imai, Kosuke, and David A. van Dyk. (2005). ``MNP: R Package for Fitting the Multinomial Probit Model.'' Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32. abstract reprinted in Journal of Computational and Graphical Statistics, (2005) Vol. 14, No. 3 (September), p. 747.
Ecological inference and racial prediction models:
Imai, Kosuke, and Gary King. (2004). ``Did Illegal Overseas Absentee Ballots Decide the 2000 U.S. Presidential Election?'' Perspectives on Politics, Vol. 2, No. 3 (September), pp. 537-549. Our analysis is a part of The New York Times article, ``How Bush Took Florida: Mining the Overseas Absentee Vote'' by David Barstow and Don van Natta Jr. July 15, 2001, Page 1, Column 1.
Imai, Kosuke, Ying Lu, and Aaron Strauss. (2008). ``Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach.'' Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
Imai, Kosuke, Ying Lu, and Aaron Strauss. (2011). ``eco: R Package for Ecological Inference in 2 x 2 Tables.'' Journal of Statistical Software, Vol. 42, No. 5 (Special Volume on Political Methodology), pp. 1-23.
Imai, Kosuke and Kabir Khanna. (2016). ``Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Record.'' Political Analysis, Vol. 24, No. 2 (Spring), pp. 263-272.
Imai, Kosuke, Santiago Olivella, and Evan T.R. Rosenman. (2022). ``Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements.'' Science Advances, Vol. 8, Issue 49, pp. 1-10.
Rosenman, Evan T.R., Santiago Olivella, and Kosuke Imai. (2023). ``Race and ethnicity data for first, middle, and last names.'' Scientific Data, Vol. 10, No. 299, pp. 1-11.
McCartan, Cory, Robin Fisher, Jacob Goldin, Daniel E. Ho, and Kosuke Imai. ``Estimating Racial Disparities When Race is Not Observed.'' Journal of the American Statistical Association, Forthcoming.

     Statistical Software

Imai, Kosuke, Ying Lu, and Aaron Strauss. ``eco: R Package for Ecological Inference in 2 x 2 Tables.'' available through The Comprehensive R Archive Network. 2004-2009.
Imai, Kosuke, and David A. van Dyk. ``MNP: R Package for Fitting the Multinomial Probit Model.'' available through The Comprehensive R Archive Network. 2004-2008.
Khanna, Kabir, and Kosuke Imai. ``wru: Who Are You? Bayesian Predictions of Racial Category Using Surname and Geolocation.'' available through GitHub. 2015.
Fifield, Benjamin, Christopher T. Kenny, Cory McCartan, Alexander Tarr, and Kosuke Imai. ``redist: Computational Algorithms for Redistricting Simulation.'' available through The Comprehensive R Archive Network and GitHub.
Imai, Kosuke, James Lo, and Jonathan Olmsted. ``emIRT: EM Algorithms for Estimating Item Response Theory Models.'' available through The Comprehensive R Archive Network and GitHub. 2015.

© Kosuke Imai
 Last modified: Tue Jul 8 22:38:11 BST 2025