In recent years, fully automated content
analysis based on probabilistic topic models has become popular
among social scientists because of its scalability. However,
researchers find that these models often fail to measure specific
concepts of substantive interest by inadvertently creating multiple
topics with similar content and combining distinct themes into a
single topic. In this article, we empirically demonstrate that
providing a small number of keywords can substantially enhance the
measurement performance of topic models. An important advantage of
the proposed keyword-assisted topic model (keyATM) is that the
specification of keywords requires researchers to label topics prior
to fitting a model to the data. This contrasts with the widespread
practice of post hoc topic interpretation and adjustment, which
compromises the objectivity of empirical findings. In our
application, we find that keyATM provides more interpretable
results, has better document classification performance, and is less
sensitive to the number of topics than standard topic models. An
open-source
software package is available for implementing the
proposed methodology.