A methodological and conceptual contribution to the study of explanations in political text
PhD dissertation defense
Explanations are a fundamental part of the social world.
Overarching theories across paradigms consider why attributions that “A causes B” matter for our understanding of the social world (Stone 1989; Shiller 2019; Entman 1993).
Selective attribution helps explain how people assign causes to social issues and decide how they should be solved (Sahar 2014; Rudolph 2006; McCabe 2016).
In politics, Partisan Motivated Reasoning is the mechanism by which we tend to credit our own party for positive outcomes while blaming bad outcomes on opponents.
However, attribution is not just a psychological process, but also a discursive practice:
Politicians’ use of justifications can alter accountability (Hansson 2015; Grose, Malhotra, and Parks Van Houweling 2015)
The media use causal stories to allocate responsibility, which affects citizens’ blame attributions (Kim 2015; Rozenas and Stukal 2019; Iyengar 1989)
Elites’ attributions of responsibility influence public attitudes and choices (Hobolt and Tilley 2014; Bisgaard 2015)
Efforts to understand the mechanisms behind selective attribution have focused on improving experimental design (Graham and Singh 2023; Tappin, Pennycook, and Rand 2021).
However, questions of external validity persist.
When analyzing attributions as they are made in political discourse, content analysis relies on domain-specific coding.
| Citation | Coding Scheme |
|---|---|
| Sotirovic (2003) | Individual vs. Societal |
| Rozenas and Stukal (2019) | National vs. Foreign |
| Hameleers, Bos, and Vreese (2017) | Blame vs. Claim |
| Tilley and Hobolt (2011) | National vs. Regional |
| Hameleers, Bos, and Vreese (2017) | Us vs. Them |
Together, these two approaches (experimental designs and domain-specific coding) have limited our understanding of the broader structure of causal attribution.
We need a general, scalable method to answer the question:
“who is claiming what and about whom?”
Text-as-data methods are well suited to capture discursive objects in naturally occurring text data.
Causal Language Modeling (CLM) offers a systematic approach to identifying and analyzing causal claims.
Methodological:
How can causal attributions be used to empirically study political discourse?
Conceptual:
How are causal attributions employed in media framing?
Causal language is very hard to model.
CLM has not been tested on political text data.
Political causal language is rhetorical, strategic, often contested, and highly contextual.
Consensus about basic causal stories is fragmented.
The task is not to determine whether statements are true, but to parse them as they are expressed.
Break down causal attributions into their linguistic components and extract structured data from unstructured text.
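As a minimal sketch of what such a structured record could look like (an illustrative schema, not the dissertation's actual data format):

```python
# Illustrative only: a hypothetical record structure for one sentence,
# not the project's actual annotation schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CausalAttribution:
    sentence: str
    is_causal: bool
    cause: Optional[str] = None   # text span expressing the cause
    effect: Optional[str] = None  # text span expressing the effect

example = CausalAttribution(
    sentence="20 killed in Kharkiv strike by Russia.",
    is_causal=True,
    cause="Kharkiv strike by Russia",
    effect="20 killed",
)
```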
Causal Language Modeling:
Define what causal language looks like in text
Annotation codebook
Corpus creation
Train a model to capture these expressions
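A minimal sketch of the training step, assuming a Hugging Face sequence-classification setup with a dataset containing `text` and binary `label` columns (file names, base model, and hyperparameters are illustrative, not the dissertation's exact configuration):

```python
# Minimal fine-tuning sketch; paths, column names, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Expects CSV files with a "text" column and a binary "label" column (1 = causal).
ds = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128), batched=True)

args = TrainingArguments(output_dir="causal-clf", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds["train"],
        eval_dataset=ds["validation"], tokenizer=tokenizer).train()
```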
First corpus of political text annotated for causality (Corral et al. 2024).
Annotated ~18,000 sentences of the UNGD corpus (Jankin, Baturo, and Dasandi 2017).
Larger contexts promise better performance.
This raises a memorization vs. generalization question: do models rely on linguistic cues or on causal mechanisms (Kıcıman et al. 2023; Hobbhahn, Lieberum, and Seiler 2022; Takayanagi et al. 2024)?
Designed an evaluation to determine whether models can parse causal expressions: Linguistic Causal Disambiguation.
Used adversarial testing to begin understanding the mechanism models rely on when performing a “causal classification” task through token generation.
Causal Language Modeling Framework for the study of attribution in political discourse.
Transformer-based models provide a balance between transferability, scalability, and accuracy with modest training data needs.
Provide a probabilistic output well suited for downstream analysis.
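A sketch of what that probabilistic output looks like at inference time (the checkpoint path is a placeholder, and label index 1 is assumed to be the causal class):

```python
# Sketch: sentence-level P(causal) from a fine-tuned classifier; the
# checkpoint path is a placeholder and label 1 is assumed to mean "causal".
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

path = "path/to/finetuned-causal-classifier"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForSequenceClassification.from_pretrained(path)

sentence = "But to pay for it, we had to take on debt, precipitated by massive reduction in Government revenue."
with torch.no_grad():
    logits = model(**tokenizer(sentence, return_tensors="pt")).logits
p_causal = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(causal) = {p_causal:.3f}")  # continuous score for downstream analysis
```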
Causal attributions are seldom operationalized as frames, despite being central to the theory (Entman 1993).
Using causal attributions as frames, I seek to answer the question:
How are causal attributions employed in media framing?
To answer this question, I propose non-attribution as a unique strategy.
Events that have a known cause and a known outcome, such as:
| Event 1 | \(\rightarrow\) | Event 2 |
|---|---|---|
| Russian strike | causes | 20 people are not alive anymore |
| Structure | Type | Example |
|---|---|---|
| cause event \(\rightarrow\) effect event | explicit causal | 20 killed in Kharkiv strike by Russia. |
| \(\otimes\) \(\rightarrow\) effect event | cause implicit | 20 killed in Kharkiv. |
| event - event | outcome implicit | 20 dead in strike. |
| Region | Source | Count (Proportion) |
|---|---|---|
| ME | AJ | 1994 (0.29) |
| ME | BBC | 2781 (0.41) |
| ME | CNN | 2017 (0.30) |
| EE | AJ | 3603 (0.39) |
| EE | BBC | 2625 (0.29) |
| EE | CNN | 2962 (0.32) |
| Ownership | Regime | Citation | Finding |
|---|---|---|---|
| State | Autocratic | Rozenas and Stukal (2019) | Selective attribution serves to explain social material issues. |
| Private | Democratic | Iyengar (1987) | Selective attribution serves as partisan filtering of events. |
| PSM | Democratic | | Neutrality performance under soft political power? |
Methodological: How can causal attributions be used to empirically study political discourse?
Theoretical: How are causal attributions strategically used in media framing?
By advancing a methodological question, I show how we can extend theories of framing and attribution.
Create a corpus and annotation scheme to detect causal attributions in political text.
Develop an evaluation framework for causal language extraction for LMs.
Introduce a supervised, NLP-based framework to identify and measure causal attributions in political texts.
Analyze how attributions are used in media and present non-attribution as a framing strategy for neutrality performance.
Paulina García-Corral
corral@hertie-school.org
| Dataset | Citation | Description |
|---|---|---|
| BeCAUSE 2.0 | Dunietz (n.d.) | 5,380 based on construction grammar; newspaper articles, PDT, and US Congress. |
| Parallel Wikipedia Corpus | Hidey and McKeown (2016) | 265,627 from Wikipedia, based on alternative lexicalizations. |
| Causal-TimeBank | Mirza et al. (2014) | TempEval-3 task + causal signals and links. |
| EventStoryLine Corpus | Caselli and Vossen (2017) | 258 documents annotated for the temporal and causal language of stories. |
| Causal News Corpus | Tan et al. (2022) | 3,559 events from protest news. |
| Unicausal | Tan, Zuo, and Ng (2023) | 58,720 sentences from different datasets. |
| Chemical Induced Disease | Gu, Qian, and Zhou (2016) | Relation extraction for drugs and adverse effects. |
| FinCausal | Mariko et al. (n.d.) | Financial documents for causal extraction. |
| PubMed Corpus | Yu, Li, and Wang (n.d.) | 3,000 research conclusion sentences for correlational vs. causal language. |
| e-CARE | Du et al. (2022) | Question-answering task dataset. |
Corpus composition
| Corpus | N | Mean tokens (word-level) | Std |
|---|---|---|---|
| UNGD | 8,872 | 2,702.87 | 1,357.05 |
| UK Press | 429 | 787.78 | 477.58 |
Finetuning
| Split | Total | Not Causal | Causal |
|---|---|---|---|
| train | 12,446 | 8,897 | 3,549 |
| val | 2,667 | 1,906 | 761 |
| test | 2,667 | 1,906 | 760 |
| Metric | BERT | RoBERTa | DistilBERT | UniCausal |
|---|---|---|---|---|
| Acc | 0.832 | 0.836 | 0.832 | 0.715 |
| Prec | 0.671 | 0.686 | 0.696 | 0.500 |
| Recall | 0.805 | 0.783 | 0.730 | 0.612 |
| MCC | 0.617 | 0.617 | 0.594 | 0.550 |
| F1 | 0.732 | 0.731 | 0.712 | 0.349 |
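For reference, a sketch of how these metrics can be computed from test-set predictions with scikit-learn (`y_true` and `y_pred` below are tiny illustrative vectors, not the reported results):

```python
# Illustrative metric computation; y_true / y_pred are placeholders.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Acc   ", accuracy_score(y_true, y_pred))
print("Prec  ", precision_score(y_true, y_pred))
print("Recall", recall_score(y_true, y_pred))
print("MCC   ", matthews_corrcoef(y_true, y_pred))
print("F1    ", f1_score(y_true, y_pred))
```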
| | Full Corpus (TN) | Full Corpus (TP) | Error Subset (FN) | Error Subset (FP) |
|---|---|---|---|---|
| Mean conf | 4.63 | 4.13 | 4.20 | 4.30 |
| Maj label | 0.79 | 0.10 | 0.71 | 0.26 |
| Mean len | 26.13 | 31.73 | 30.10 | 31.35 |
| Causal conn | 0.37 | 0.50 | 0.44 | 0.48 |
Explicit causal sentences only: Implicit structures are also rhetorical and possibly intentional.
Single sentence causal structures: Causal attributions also exist across sentences.
Corpus external validity limited by its UN debates origin: elite rhetoric, prepared speeches, and an audience of other elites.
| Citation | Model | Findings |
|---|---|---|
| Takayanagi et al. (2024) | ChatGPT | Baseline proficiency, but can be outperformed by smaller LMs |
| Hobbhahn, Lieberum, and Seiler (2022) | GPT-3 | Importance of prompting |
| Gao et al. (2023) | ChatGPT | Can provide causal explanations but reasoning is unclear |
| Kıcıman et al. (2023) | GPT-3 | Outperforms in causal reasoning tasks |
| Ho et al. (2022) | GPT-3 | Low rating for cause-effect pair matching |
| Jin et al. (2024) | Llama, Alpaca, GPT | High performance for causal stories |
| Data | N | Source |
|---|---|---|
| PolitiCAUSE | 527 | mostly UNGD |
| Fake news | 50 | multiple fake-news detection databases |
| Post-training events | 50 | LexisNexis news databases |
System: You are a causal language model that performs causal sequence classification and causal span detection. You will classify a text as causal or not causal, and if it’s causal you will extract the causes and effects. The output should be a json with label 1 or 0, cause, and effect value such as {"label": , "cause": , "effect": }.
User: But to pay for it, we had to take on debt, precipitated by massive reduction in Government revenue.
Assistant: {"label": 1, "cause": "massive reduction in Government revenue", "effect": "had to take on debt"}
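Since the reply is generated as free text, a small parser is needed to recover the JSON; a sketch (the helper name and fallback behavior are illustrative, not the dissertation's code):

```python
# Sketch of parsing the assistant reply; helper name and fallbacks are illustrative.
import json
import re

def parse_causal_output(raw: str) -> dict:
    """Extract the first JSON object from a model reply and keep only the expected keys."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return {"label": None, "cause": None, "effect": None}
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"label": None, "cause": None, "effect": None}
    return {"label": obj.get("label"), "cause": obj.get("cause"), "effect": obj.get("effect")}

reply = '{"label": 1, "cause": "massive reduction in Government revenue", "effect": "had to take on debt"}'
print(parse_causal_output(reply))
```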
| Parameter | Value |
|---|---|
| Temperature | 0.0 |
| Top p | 1.0 |
| Top k | 1.0 |
| Frequency penalty | 0.0 |
| Presence penalty | 0.0 |
| Repetition penalty | 0.0 |
| Max body tokens | 200 |
All OpenAI models were run using the OpenAI Batch API.
Llama and Gemma models were accessed via the Transformers library from Hugging Face, and inference was run using the Together AI API.
Gemini-1.5 was run using Google’s AI Studio API.
For Google models, the HarmBlockThreshold in the safety settings parameter was set to None for the first two experiments, and set to default for the Fake news and post-training events set of sentences in LCD evaluation.
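As a sketch, a single (non-batch) OpenAI request with the decoding settings above would look roughly as follows; the model name is a placeholder, and top-k and repetition penalty apply only to the Together AI / Transformers runs, since the OpenAI API does not expose them:

```python
# Sketch of one zero-shot request with the decoding settings above;
# the model name is a placeholder, SYSTEM_PROMPT is the prompt shown earlier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
SYSTEM_PROMPT = "You are a causal language model that performs causal sequence classification..."
sentence = "But to pay for it, we had to take on debt, precipitated by massive reduction in Government revenue."

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "system", "content": SYSTEM_PROMPT},
              {"role": "user", "content": sentence}],
    temperature=0.0,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_tokens=200,
)
print(response.choices[0].message.content)
```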
Models with more parameters exhibit better performance overall:
Recall and precision patterns suggest that smaller models rely on more explicit causal markers, while larger models can “infer from context”.
Scale contributes not only to broader generalization, but also to more faithful alignment with syntactic and semantic cues in discourse.
Smaller models appear more susceptible to heuristic pattern matching and are prone to overgeneralizing causal signals from pretraining data.
PolitiCAUSE contains extensive but noisy data, so only the highest-quality, human-verified sentences were used to ensure reliable evaluation and avoid cases where human annotations might be wrong.
The prompt follows established structures from prior work to maximize stability in zero-shot settings, acknowledging that alternative prompting strategies (e.g., Chain of Thought) could also work.
Although paraphrasing can also show correct causal structure identification, the experiment required outputs in a consistent, easily parseable format for downstream applications.
Identifying claims relies on linguistic cues separate from truthfulness, and the study assumes models should do the same; observed safety behaviors and genre effects suggest that even with similar training data, LLMs still conflate real vs. fake content.
| Region | Source | Count (Proportion) |
|---|---|---|
| ME | AJ | 1251 (0.48) |
| ME | BBC | 792 (0.30) |
| ME | CNN | 567 (0.22) |
| EE | AJ | 1018 (0.43) |
| EE | BBC | 784 (0.33) |
| EE | CNN | 581 (0.24) |
| Set | Count (Proportion) |
|---|---|
| Train | 862 (0.70) |
| Test | 191 (0.15) |
| Validation | 185 (0.15) |
| Total | 1238 (1.00) |
🤗 pgarco/CausalPalUkr_Train_Test
🤗 pgarco/CausalPalUkr_Val
🤗 pgarco/CausalPalUkr_token
🤗 pgarco/CausalPalUkr_seq
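A sketch of loading the released splits from the Hugging Face Hub (split and column names are assumptions; check the dataset cards):

```python
# Sketch: loading the released corpora from the Hub; split/column names are assumptions.
from datasets import load_dataset

seq_data = load_dataset("pgarco/CausalPalUkr_seq")
token_data = load_dataset("pgarco/CausalPalUkr_token")
print(seq_data)
print(token_data)
```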
| Class | Precision | Recall | Accuracy | F1-score |
|---|---|---|---|---|
| CAUSAL | 0.891 | 0.855 | | 0.872 |
| NOT CAUSAL | 0.727 | 0.787 | | 0.756 |
| weighted average | 0.837 | 0.832 | 0.832 | 0.834 |
Selective attribution works under the assumption that framing is a competition between actors (Iyengar 1987; Rozenas and Stukal 2019; Bisgaard 2015).
Research on media bias has also shown that omission can affect responsibility attribution (Gentzkow and Shapiro 2006).
| Causal with responsibility | Causal without responsibility |
|---|---|
| Officials report incident in Kharkiv. | Officials report incident in Kharkiv kills 20. |
| Agentless with overall ambiguity | Agentless with precise outcome |
|---|---|
| Officials report incident in Kharkiv. | Officials report incident in Kharkiv kills 20. |
| Non-attribution by suppression | Non-attribution by predicate change |
|---|---|
| 20 killed in Kharkiv strike. | Officials in Kharkiv report 20 dead. |
| Region | Source | Count (Proportion) |
|---|---|---|
| ME | AJ | 1994 (0.29) |
| ME | BBC | 2781 (0.41) |
| ME | CNN | 2017 (0.30) |
| EE | AJ | 3603 (0.39) |
| EE | BBC | 2625 (0.29) |
| EE | CNN | 2962 (0.32) |
Infrequently used tokens produce very wide confidence intervals.
News events are neither matched nor weighted.
NER might be more robust than the “actor = subject” analysis (a sketch of both strategies follows below).
The mechanism might be a function of maximizing content production while minimizing ideological inconsistency.
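A sketch of the two actor-extraction strategies compared above, using spaCy (the model name and example sentence are illustrative):

```python
# Sketch: grammatical-subject actors vs. NER actors; model and sentence are illustrative.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Officials in Kharkiv report 20 dead after a strike.")

subject_actors = [tok.text for tok in doc if tok.dep_ in ("nsubj", "nsubjpass")]
ner_actors = [(ent.text, ent.label_) for ent in doc.ents]

print("actor = subject:", subject_actors)   # e.g. ['Officials']
print("actor = named entity:", ner_actors)  # e.g. [('Kharkiv', 'GPE'), ('20', 'CARDINAL')]
```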