Political Tweet Sentiment & Framing
Overview
This project explored political communication on Twitter through an applied NLP pipeline for lexical analysis, sentiment profiling and emotion-based framing.
The work combined tweet collection, text preprocessing, word-frequency analysis, n-gram extraction, sentiment analysis and emotion lexicon matching to compare communication patterns across political actors and time periods.
The project should be read as an exploratory text-analytics prototype, not as a modern transformer-based sentiment system.
Problem
Political messages are not only informative; they also frame issues through specific vocabularies, emotional tones and repeated expressions.
The goal was to build a reproducible workflow for extracting these patterns from Twitter data and comparing political discourse through interpretable textual indicators.
Example 1 — Lexical and N-Gram Analysis
The pipeline tokenized tweet text, removed punctuation and stopwords, and counted the most frequent words, bigrams and trigrams.
tweet text
↓
tokenization
↓
cleaning and stopword removal
↓
word frequencies, bigrams and trigrams
This was used to identify recurrent terms and expressions associated with specific political actors or thematic periods.
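The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's code: the prototype used NLTK for tokenization, stopword removal and n-gram construction, while the `STOPWORDS` set, the `tokenize`, `ngrams` and `ngram_counts` helpers and the sample tweets below are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); the project relied on
# NLTK's Italian and English stopword lists instead.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "for", "we", "our", "in"}


def tokenize(text):
    """Lowercase the text and return runs of word characters."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(tweets, n):
    """Count n-grams per tweet so that no n-gram spans two tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = [t for t in tokenize(tweet) if t not in STOPWORDS]
        counts.update(ngrams(tokens, n))
    return counts


# Invented sample tweets, for illustration only.
tweets = [
    "We will cut taxes for working families",
    "Cut taxes and protect working families",
]
unigrams = ngram_counts(tweets, 1)
bigrams = ngram_counts(tweets, 2)
print(bigrams.most_common(2))
```

Because stopwords are removed before the n-grams are built, two words separated by a stopword in the original tweet can end up in the same bigram. Whether that is desirable depends on the comparison being made.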
Example 2 — Sentiment and Emotion Profiling
The project also used sentiment and emotion lexicons to summarize the emotional profile of political communication.
In particular, the workflow explored polarity scores and NRC-style emotion categories such as anger, fear, trust, joy and anticipation.
These outputs were used as descriptive indicators, not as definitive measures of political intent.
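Lexicon-based emotion matching can be sketched as a lookup from tokens to emotion categories. The `TOY_LEXICON` entries and the `emotion_profile` helper below are hypothetical and exist only for illustration; the actual NRC Emotion Lexicon is a separate, much larger resource with its own file format and licence.

```python
from collections import Counter

# Hypothetical word-to-emotion entries, NOT taken from the NRC lexicon.
TOY_LEXICON = {
    "threat": {"fear", "anger"},
    "crisis": {"fear"},
    "hope": {"joy", "anticipation", "trust"},
    "together": {"trust"},
    "betrayal": {"anger"},
}


def emotion_profile(tokens, lexicon=TOY_LEXICON):
    """Return each emotion's share of all emotion matches in the tokens."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no lexicon word matched
    return {emotion: n / total for emotion, n in counts.items()}


profile = emotion_profile("together we face the crisis with hope".split())
print(profile)
```

Normalizing by the number of matches makes profiles comparable across accounts with different tweet volumes, but it also hides how many tokens matched at all, so a coverage figure is worth reporting alongside the shares.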
Technologies and Methods Used
- Python for tweet collection, preprocessing and exploratory analysis.
- R for text mining, visualization and exploratory sentiment workflows.
- GetOldTweets3 for historical tweet collection.
- pandas / NumPy for tabular data handling and intermediate structures.
- NLTK for tokenization, stopword removal and n-gram construction.
- TextBlob for exploratory polarity and subjectivity scoring.
- NRC Emotion Lexicon for emotion-category matching.
- tidytext / dplyr / ggplot2 for R-based text processing and visualization.
- matplotlib / seaborn for Python-based exploratory plots.
Implemented Elements
- Collection of tweet data for selected political accounts and periods.
- Cleaning of raw tweet text: removal of links, punctuation and noisy tokens.
- Italian and English stopword filtering.
- Word-frequency analysis for political discourse comparison.
- Bigram and trigram extraction.
- Exploratory polarity and subjectivity scoring.
- Emotion profiling using lexicon-based matching.
- Bar plots and radar-style visualizations of lexical and emotional patterns.
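As an illustration of the cleaning step listed above, the sketch below strips a leading retweet marker, links, mentions and punctuation with regular expressions. The patterns and the `clean_tweet` helper are assumptions for the example and may differ from the rules the project applied.

```python
import re

# Illustrative patterns (assumptions, not the project's exact rules).
RT_RE = re.compile(r"^\s*RT\b")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def clean_tweet(text):
    """Strip retweet marker, links, mentions and punctuation, then lowercase."""
    text = RT_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)      # remove links before punctuation
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())  # also drops '#', keeps the tag word
    return " ".join(text.split())     # collapse repeated whitespace


cleaned = clean_tweet("RT @user: Più lavoro, meno tasse! #lavoro https://t.co/abc123")
print(cleaned)  # più lavoro meno tasse lavoro
```

Links are removed before punctuation so that a URL is dropped as a whole rather than broken into stray tokens. Accented letters survive because `\w` is Unicode-aware for Python strings.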
Evaluation Status
The project produced descriptive NLP indicators rather than a supervised classification model.
The outputs are therefore useful for exploration and comparison, but they should not be interpreted as externally validated sentiment labels.
- Already present: lexical frequencies, n-grams, polarity scores and emotion profiles.
- Already present: visual comparison of political-text indicators.
- To be added: manually annotated validation set for sentiment or framing categories.
- To be added: robustness checks across time periods, topics and political actors.
- To be added: comparison with transformer-based language models for Italian text.
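Once a manually annotated set exists, one common way to compare lexicon-derived labels with human labels is chance-corrected agreement. The sketch below computes Cohen's kappa in plain Python; the `cohen_kappa` helper and the label lists are invented for illustration and are not project results.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels, for illustration only.
human = ["pos", "neg", "neg", "neu", "pos", "neg"]
lexicon = ["pos", "neg", "pos", "neu", "pos", "neu"]
kappa = cohen_kappa(human, lexicon)
print(round(kappa, 2))  # 0.52
```

Raw accuracy here would be 4 out of 6, but kappa discounts the agreement that the two label distributions would produce by chance, which matters when one sentiment class dominates.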
Methodological Note
Lexicon-based sentiment analysis is interpretable and useful for exploratory work, but it has important limitations.
It can misread irony, sarcasm, negation, political slang, multilingual content and context-dependent meaning. For this reason, the results should be treated as descriptive signals rather than as ground-truth psychological or political measures.
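The negation and sarcasm problems can be made concrete with a deliberately naive scorer. The `TOY_POLARITY` entries and the `naive_polarity` helper are hypothetical; this is not TextBlob's algorithm, which applies its own rules, but it shows why pure word matching is fragile.

```python
# Hypothetical polarity entries, for illustration only.
TOY_POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "disaster": -1.0}


def naive_polarity(text):
    """Average the polarity of matched words, ignoring all context."""
    tokens = text.lower().split()
    scores = [TOY_POLARITY[t] for t in tokens if t in TOY_POLARITY]
    return sum(scores) / len(scores) if scores else 0.0


plain = naive_polarity("this reform is good")
negated = naive_polarity("this reform is not good")
sarcastic = naive_polarity("great job ruining the economy")
print(plain, negated, sarcastic)  # 1.0 1.0 1.0
```

All three sentences receive the same positive score, although only the first is plainly positive. The scorer never looks at "not" or at the surrounding context.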
Modern Extension
A modern version of this project would preserve the interpretable descriptive layer while adding contextual language models and stronger evaluation.
- Use transformer-based models for Italian sentiment and stance detection.
- Compare lexicon-based sentiment with supervised or zero-shot classifiers.
- Add topic modeling or embedding-based clustering to identify recurrent frames.
- Evaluate sentiment and framing outputs against human annotations.
- Track discourse dynamics over time and around political events.
Resources
Technical report in preparation.
Code available upon request.
Technical Context
- Mohammad & Turney, NRC Emotion Lexicon — relevant because the project used lexicon-based emotion categories for exploratory profiling.
- TextBlob — relevant because the prototype used polarity and subjectivity scores for exploratory sentiment analysis.
- Silge & Robinson, tidytext — relevant to the R-based text mining and visualization workflow.