Although it is widely known that
self-reported turnout rates obtained from public opinion surveys
tend to substantially overestimate actual turnout rates,
scholars sharply disagree on what causes this bias. Some blame
overreporting due to social desirability, whereas others attribute
it to non-response bias and inaccurate turnout validation.
While we can validate self-reported turnout by directly linking
surveys with administrative records, most existing studies rely on
proprietary merging algorithms with little scientific transparency
and report conflicting results. To shed a light on this debate, we
apply a probabilistic record linkage model, implemented via the
open-source software package
fastLink, to merge two
major election studies -- the American National Election Studies and
the Cooperative Congressional Election Study -- with a national
voter file of over 180 million records. For both studies,
fastLink
successfully produces validated turnout rates close to the actual
turnout rates, leading to public-use validated turnout data for the
two studies. Using these merged data sets, we find that the bias of
self-reported turnout originates primarily from overreporting rather
than non-response. Our findings suggest that respondents who are more
educated and more interested in politics are more likely to overreport turnout.
Finally, we show that
fastLink performs as well
as a proprietary algorithm.
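
To make the record-linkage step concrete, the following is a minimal,
self-contained sketch of the Fellegi-Sunter style probabilistic model that
underlies fastLink, written in Python purely for illustration. The example
records, comparison fields, agreement probabilities (m and u), and the
match-weight threshold are all hypothetical; fastLink itself is an R package
that estimates these quantities with an EM algorithm and scales to files with
many millions of records, none of which this toy attempts to reproduce.

\begin{verbatim}
# A toy illustration of Fellegi-Sunter style probabilistic record linkage.
# Not the fastLink implementation: all records, fields, and probabilities
# below are hypothetical.
from math import log

survey = [
    {"id": "s1", "first": "maria", "last": "lopez", "byear": 1984, "zip": "48104"},
    {"id": "s2", "first": "john",  "last": "smith", "byear": 1969, "zip": "60614"},
]
voter_file = [
    {"id": "v1", "first": "maria", "last": "lopez", "byear": 1984, "zip": "48103"},
    {"id": "v2", "first": "jon",   "last": "smith", "byear": 1969, "zip": "60614"},
    {"id": "v3", "first": "ann",   "last": "chen",  "byear": 1990, "zip": "94110"},
]

FIELDS = ["first", "last", "byear", "zip"]
# Assumed agreement probabilities: m = P(agree | match), u = P(agree | non-match).
# fastLink estimates these from the data via an EM algorithm; here they are fixed.
m = {"first": 0.95, "last": 0.97, "byear": 0.98, "zip": 0.90}
u = {"first": 0.01, "last": 0.01, "byear": 0.05, "zip": 0.02}

def agreement(a, b):
    # Binary agreement vector over the comparison fields.
    return {f: int(a[f] == b[f]) for f in FIELDS}

def match_weight(gamma):
    # Log likelihood ratio of match vs. non-match for an agreement pattern.
    return sum(log(m[f] / u[f]) if gamma[f] else log((1 - m[f]) / (1 - u[f]))
               for f in FIELDS)

THRESHOLD = 2.0  # assumed cutoff; in practice chosen to control false matches
for s in survey:
    weight, best = max(((match_weight(agreement(s, v)), v) for v in voter_file),
                       key=lambda t: t[0])
    status = "matched" if weight > THRESHOLD else "unmatched"
    print(s["id"], "->", best["id"], round(weight, 2), status)
\end{verbatim}

In the merge described above, the analogous comparison would be between survey
respondents' identifying fields and the national voter file, with the
estimated match probabilities determining each respondent's validated turnout
status.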