Publications
Peer-Reviewed Articles
2025
Measuring Scalar Constructs in Social Science With LLMs
Many constructs that characterize language, like its complexity or emotionality, have a naturally continuous semantic structure; a public speech is not just “simple” or “complex”, but exists on a continuum between extremes. Although large language models (LLMs) are an attractive tool for measuring scalar constructs, their idiosyncratic treatment of numerical outputs raises questions about how best to apply them. We address these questions with a comprehensive evaluation of LLM-based approaches to scalar construct measurement in social science. Using multiple datasets sourced from the political science literature, we evaluate four approaches: unweighted direct pointwise scoring, aggregation of pairwise comparisons, token-probability-weighted pointwise scoring, and finetuning. Our study finds that pairwise comparisons made by LLMs produce better measurements than simply prompting the LLM to output scores directly, which suffers from bunching around arbitrary numbers. Taking the mean of candidate scores weighted by their token probabilities improves the measurements further, beyond both of these approaches. Finally, finetuning smaller models with as few as 1,000 training pairs can match or exceed the performance of prompted LLMs.
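A minimal sketch of the token-probability-weighted pointwise scoring idea, assuming an OpenAI-style chat API that exposes per-token log probabilities; the model name, prompt, and 1–7 scale are illustrative placeholders rather than the paper's exact setup:

```python
import math
from openai import OpenAI

client = OpenAI()

def weighted_score(text: str, scale=range(1, 8)) -> float:
    """Probability-weighted pointwise score on a 1-7 scale (illustrative)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{
            "role": "user",
            "content": "Rate the complexity of the following speech on a scale "
                       f"from 1 to 7. Answer with a single number.\n\n{text}",
        }],
        max_tokens=1,
        logprobs=True,
        top_logprobs=10,
    )
    # Collect the probability mass that falls on valid scale points.
    top = resp.choices[0].logprobs.content[0].top_logprobs
    probs = {}
    for cand in top:
        tok = cand.token.strip()
        if tok.isdigit() and int(tok) in scale:
            probs[int(tok)] = probs.get(int(tok), 0.0) + math.exp(cand.logprob)
    if not probs:
        return float("nan")  # no valid score token among the top candidates
    # Renormalize and take the expected score instead of the single argmax token.
    total = sum(probs.values())
    return sum(score * p for score, p in probs.items()) / total
```

Compared with keeping only the single most likely score token, this expectation smooths over the bunching at particular numbers that the paper documents for direct scoring.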
Detecting Group Mentions in Political Rhetoric: A Supervised Learning Approach
Politicians appeal to social groups to court their electoral support. However, quantifying which groups politicians refer to, claim to represent, or address in their public communication presents researchers with challenges. We propose a supervised learning approach for extracting group mentions from political texts. We first collect human annotations to determine the passages of a text that refer to social groups. We then fine-tune a transformer language model for contextualized supervised classification at the word level. Applied to unlabeled texts, our approach enables researchers to automatically detect and extract word spans that contain group mentions. We illustrate our approach in two applications, generating new empirical insights into how British parties use social groups in their rhetoric. Our method allows for detecting and extracting mentions of social groups from various sources of texts, creating new possibilities for empirical research in political science.
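A minimal sketch of the word-level detection step, assuming a transformer fine-tuned for token classification with group-mention labels; the checkpoint name and label scheme are hypothetical, not the authors' released model:

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint for token classification with
# group-mention labels; "simple" aggregation merges sub-word tokens
# back into word-level spans.
tagger = pipeline(
    "token-classification",
    model="my-org/group-mention-tagger",  # placeholder model name
    aggregation_strategy="simple",
)

sentence = "We will cut taxes for hard-working families and small business owners."
for span in tagger(sentence):
    # Each entry holds the predicted label, character offsets, and the
    # extracted word span containing a group mention.
    print(span["entity_group"], span["start"], span["end"], repr(span["word"]))
```

Merging sub-word tokens back into word spans is what allows extracting contiguous group mentions rather than isolated word pieces.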
Measuring and Understanding Parties' Anti-elite Strategies
This article presents a new measure and analysis of parties’ anti-elite appeals. To measure these appeals, we apply crowd-sourced coding, supervised machine learning, and novel cross-lingual transfer learning techniques to parties’ Twitter posts. Our dataset records quarterly estimates of parties’ anti-elite strategies for 20 countries between 2008 and 2021. Based on these indicators, we analyze whether parties’ anti-elite rhetoric reflects the potential costs and benefits of this electoral strategy. We find that mainstream parties use anti-elite rhetoric less frequently when they are more likely to be included in the next governing coalition. When challenger parties do well in the polls, they become more anti-elitist. Our article not only contributes to the literature on democratic competition by introducing and applying a new measure of anti-elite strategies, but it also outlines a novel, modular, and scalable procedure for measuring party appeals using social media posts.
2023
Going Cross-Lingual: A Guide to Multilingual Text Analysis
Text-as-data methods have revolutionized the study of political behavior and communication, and the increasing availability of multilingual text collections promises exciting new applications of these methods in comparative research. To encourage researchers to seize these opportunities, we provide a guide to multilingual quantitative text analysis. Responding to the unique challenges researchers face in multilingual analysis, we give a systematic overview of multilingual text analysis methods developed for political and communication science research. To structure this overview, we distinguish between separate analysis, input alignment, and anchoring approaches to cross-lingual text analysis. We then compare these approaches’ resource intensiveness and discuss the strategies they offer for approaching measurement equivalence. We argue that to ensure valid measurement across languages and contexts, researchers should reflect on these aspects when choosing between approaches. We conclude with an outlook on future directions for method development and potential fields of application. Overall, our contribution helps political and communication scientists navigate the field of multilingual text analysis and provides impetus for its wider adoption and further development.
Cross-Lingual Classification of Political Texts Using Multilingual Sentence Embeddings
Established approaches to analyzing multilingual text corpora require either a duplication of analysts' efforts or high-quality machine translation (MT). In this paper, I argue that multilingual sentence embedding (MSE) is an attractive alternative approach to language-independent text representation. To support this argument, I evaluate MSE for cross-lingual supervised text classification. Specifically, I assess how reliably MSE-based classifiers detect manifesto sentences' topics and positions compared to classifiers trained on bag-of-words representations of machine-translated texts, and how this depends on the amount of training data. These analyses show that when training data are relatively scarce (e.g., 20K or fewer labeled sentences), MSE-based classifiers can be more reliable and are at least no less reliable than their MT-based counterparts. Furthermore, I examine how reliably MSE-based classifiers label sentences written in languages *not* in the training data, focusing on the task of discriminating sentences that discuss the issue of immigration from those that do not. This analysis shows that compared to the within-language classification benchmark, such "cross-lingual transfer" tends to result in smaller reliability losses when relying on the MSE instead of the MT approach. This study thus presents an important addition to the cross-lingual text analysis toolkit.
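A minimal sketch of the MSE-based workflow, assuming a multilingual sentence encoder from sentence-transformers and a logistic regression classifier; the encoder, toy sentences, and labels are illustrative and not the paper's exact setup:

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Multilingual sentence encoder mapping sentences from different languages
# into a shared embedding space (illustrative model choice).
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Toy labeled sentences in one language (1 = discusses immigration, 0 = does not).
train_sents = ["Wir müssen die Zuwanderung begrenzen.", "Wir senken die Steuern."]
train_labels = [1, 0]

clf = LogisticRegression(max_iter=1000).fit(encoder.encode(train_sents), train_labels)

# Cross-lingual transfer: classify sentences in a language not seen in training.
test_sents = ["We will reduce net migration.", "We will invest in public transport."]
print(clf.predict(encoder.encode(test_sents)))
```

Because the encoder places semantically similar sentences near each other regardless of language, the classifier trained in one language can be applied to others without translation.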
Working Papers
2026
Computational Emotion Analysis With Multimodal LLMs: Current Evidence on an Emerging Methodological Opportunity
Research increasingly leverages audio-visual materials to analyze emotions in political communication. Multimodal large language models (mLLMs) promise to enable such analyses through in-context learning. However, we lack systematic evidence on whether these models can reliably measure emotions in real-world political settings. This paper evaluates leading mLLMs for video-based emotional arousal measurement using two complementary human-labeled video datasets: recordings created under laboratory conditions and real-world parliamentary debates. I find a critical lab-vs-field performance gap. In videos created under laboratory conditions, mLLMs' arousal scores approach human-level reliability with little to no demographic bias. In parliamentary debate recordings, however, all examined models' arousal scores correlate at best moderately with average human ratings and exhibit systematic bias by speaker gender and age. Neither relying on leading closed-source mLLMs nor applying computational noise mitigation strategies changes this finding. Further, mLLMs underperform even in sentiment analysis when using video recordings instead of text transcripts of the same speeches. These findings reveal important limitations of current mLLMs for real-world political video analysis and establish a rigorous evaluation framework for tracking future developments.
under review
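A minimal sketch of what video-based arousal scoring with an mLLM can look like: sample a few frames, send them with an instruction, and parse a numeric score. The API, model name, prompt wording, scale, and frame-sampling strategy are assumptions for illustration, not the paper's exact pipeline:

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()

def sample_frames(path: str, n: int = 4) -> list[str]:
    """Draw n evenly spaced frames from a video and return them base64-encoded."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / n))
        ok, frame = cap.read()
        if ok:
            _, buf = cv2.imencode(".jpg", frame)
            frames.append(base64.b64encode(buf.tobytes()).decode())
    cap.release()
    return frames

def arousal_score(path: str) -> float:
    """Prompt a multimodal model with sampled frames and parse a numeric rating."""
    content = [{"type": "text",
                "text": "Rate the speaker's emotional arousal from 1 (calm) to 9 "
                        "(highly aroused). Answer with a single number."}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                for f in sample_frames(path)]
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": content}],
    )
    return float(resp.choices[0].message.content.strip())
```

Scores produced this way can then be correlated with average human ratings and broken down by speaker demographics, which is the kind of reliability and bias check the paper performs.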
2025
Trading Off Policy, Stability, and Exclusion of the Radical Right: Experimental Evidence on Voters' Government Preferences
The electoral success of radical right parties in Europe presents significant challenges for government formation in representative democracies. This study investigates how voters evaluate governments that either include or exclude radical right parties, focusing on the difficult trade-offs voters may need to make between policy congruence and government stability. We present results of a multi-country survey experiment and assess voters' willingness to accept policy losses and the formation of minority governments to keep radical right parties out of government. Our experimental evidence shows that principled opponents, defined as citizens who dislike radical right parties and want them to be treated differently than other parties, are indeed willing to make policy sacrifices and support the formation of minority governments. Our results have important implications for our understanding of political representation, voters' perceptions of coalition negotiation strategies, and the role of populist radical right parties in Europe.
revise and resubmit
Validating Open-Source Machine Translation for Quantitative Text Analysis
As more and more scholars apply computational text analysis methods to multilingual corpora, machine translation has become an indispensable tool. However, relying on commercial services for machine translation, such as Google Translate or DeepL, limits reproducibility and can be expensive. This paper assesses the viability of a reproducible and affordable alternative: free and open-source machine translation models. We ask whether researchers who use an open-source model instead of a commercial service for machine translation would obtain substantially different measurements from their multilingual corpora. We address this question by replicating and extending an influential study by de Vries et al. (2018) on the use of machine translation in cross-lingual topic modeling, and an original study of its use in supervised text classification with Transformer-based classifiers. We find only minor differences between the measurements generated by these methods when applied to corpora translated with open-source models and commercial services, respectively. We conclude that "free" machine translation is a very valuable addition to researchers' multilingual text analysis toolkit. Our study adds to a growing body of work on multilingual text analysis methods and has direct practical implications for applied researchers.
conditionally accepted
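A minimal sketch of the kind of open-source translation step the paper evaluates, here using a freely available OPUS-MT model through the transformers pipeline; the model choice and example sentences are illustrative:

```python
from transformers import pipeline

# Freely available OPUS-MT model for German-to-English translation
# (illustrative choice of language pair and model).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

docs = [
    "Wir fordern eine gerechtere Verteilung des Wohlstands.",
    "Der Klimaschutz muss oberste Priorität haben.",
]
translated = [out["translation_text"] for out in translator(docs)]
print(translated)
# The translated corpus can then feed the usual monolingual pipeline,
# e.g. topic models or supervised classifiers.
```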
2023
Do We Still Need BERT in the Age of GPT? Comparing the Benefits of Domain-Adaptation and In-Context-Learning Approaches to Using LLMs for Political Science Research
Given the rapid development of large language models (LLMs), we argue that researchers using LLMs must make three critical decisions: model selection, domain-adaptation strategy, and prompt design. To help provide guidance on these choices, we establish a set of benchmarks covering a wide range of natural language processing (NLP) tasks pursued in political science research. We use these benchmarks to compare two common approaches to the classification of political text: domain-adapting smaller LLMs such as BERT to one's own data with varying levels of unsupervised pre-training and supervised fine-tuning, and querying larger LLMs such as GPT-3 without additional training. Preliminary results indicate that when labeled data is available, the fine-tuning-focused approach remains the superior technique for text classification.
stale
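A minimal sketch contrasting the two routes the paper compares, with a zero-shot classification pipeline standing in for querying a larger generative LLM such as GPT-3; the toy texts, labels, and model choices are placeholders, not the benchmark itself:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, pipeline)

# Toy labeled political texts (1 = immigration topic, 0 = other).
texts = ["We must secure our borders.", "We will expand public childcare."]
labels = [1, 0]

# (a) Domain-adaptation route: fine-tune a smaller encoder such as BERT.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ds = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda x: tok(x["text"], truncation=True, padding="max_length", max_length=64))
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
Trainer(model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1),
        train_dataset=ds).train()

# (b) In-context route: query a model without task-specific training
# (a zero-shot classification pipeline stands in for a large generative LLM here).
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(zero_shot("We must secure our borders.", candidate_labels=["immigration", "other"]))
```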