GROUPY
Zero-shot multitask models for analyzing group-based rhetoric in political texts, trained on synthetic labeled data
GROUPY develops a suite of automated methods for identifying and categorizing social group mentions in political texts. Political competition, representation, and polarization are deeply rooted in divisions along group lines. Yet despite the importance of group-based rhetoric for understanding these phenomena and the formation of public opinion, existing computational methods for measuring it often fall short in scalability, reliability, and generalization.
GROUPY closes this gap by developing a suite of zero-shot multitask models for analyzing group-based rhetoric in political texts. It adopts a comprehensive conceptual framework focusing on three central dimensions of group-based political rhetoric: (i) the mention of a social group, (ii) the mentioned group’s defining attributes, and (iii) the speaker’s expressed stance towards the group they mention. The project will combine controlled generation of synthetic labeled data with large language models (LLMs) and fine-tuning of transformer models to create a suite of classifiers that identify social group mentions in political texts, categorize their defining attributes, and score authors’ stances towards the groups they mention. In particular, the project will mine established social surveys such as the European Social Survey (ESS) and the European Values Study (EVS) for socio-culturally salient social group names and categories, which will then be used to generate synthetic group mentions and sentences. GROUPY’s models will be fine-tuned on these synthetic labeled data to enable reliable zero-shot, multidimensional measurement across multiple domains of political communication.
By releasing the models, code, and data it produces, GROUPY will accelerate the computational study of group-based political rhetoric in the social sciences.