Political Tweet Sentiment & Framing

NLP & Text Representation | 2018 | Exploratory | Applied NLP

Overview

This project explored political communication on Twitter through an applied NLP pipeline for lexical analysis, sentiment profiling and emotion-based framing.

The work combined tweet collection, text preprocessing, word-frequency analysis, n-gram extraction, sentiment analysis and emotion lexicon matching to compare communication patterns across political actors and time periods.

The project should be read as an exploratory text-analytics prototype, not as a modern transformer-based sentiment system.

Problem

Political messages are not only informative; they also frame issues through specific vocabularies, emotional tones and repeated expressions.

The goal was to build a reproducible workflow for extracting these patterns from Twitter data and comparing political discourse through interpretable textual indicators.

Example 1 — Lexical and N-Gram Analysis

The pipeline tokenized tweet text, removed punctuation and stopwords, and computed frequent words, bigrams and trigrams.

tweet text
    ↓
tokenization
    ↓
cleaning and stopword removal
    ↓
word frequencies, bigrams and trigrams

This was used to identify recurrent terms and expressions associated with specific political actors or thematic periods.
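
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: it uses only the standard library rather than NLTK, a simple regex tokenizer, and a tiny placeholder stopword list. The names `tokenize` and `ngram_counts` and the sample tweets are invented for the example.

```python
import re
from collections import Counter

# Tiny placeholder stopword list (assumption). The project filtered
# Italian and English stopwords with NLTK; this set only keeps the demo short.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "we", "for", "our"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens only, drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def ngram_counts(texts, n):
    """Count n-grams inside each text, never across tweet boundaries."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Zipping n shifted copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Made-up sample tweets, for illustration only.
tweets = [
    "We need a fair tax reform for the country!",
    "The tax reform is a priority.",
    "Fair tax reform, fair wages.",
]

unigrams = ngram_counts(tweets, 1)
bigrams = ngram_counts(tweets, 2)
trigrams = ngram_counts(tweets, 3)
print(bigrams.most_common(3))
```

Because stopwords are removed before the n-grams are built, a bigram can join two words that were separated by a stopword in the original tweet. Whether that is desirable depends on the comparison being made.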

Example 2 — Sentiment and Emotion Profiling

The project also used sentiment and emotion lexicons to summarize the emotional profile of political communication.

In particular, the workflow explored polarity scores and NRC-style emotion categories such as anger, fear, trust, joy and anticipation.

These outputs were used as descriptive indicators, not as definitive measures of political intent.
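
Lexicon-based emotion matching of this kind can be sketched as follows. The word-to-emotion entries below are invented for illustration and are not taken from the NRC Emotion Lexicon, and the function name `emotion_profile` is hypothetical. The sketch assumes tokens have already been cleaned and lowercased.

```python
from collections import Counter

# Toy word-to-emotion mapping in the style of an emotion lexicon.
# These entries are invented for the example (assumption), not real
# NRC Emotion Lexicon data.
TOY_LEXICON = {
    "threat": {"anger", "fear"},
    "crisis": {"fear"},
    "promise": {"trust", "anticipation", "joy"},
    "together": {"trust"},
    "victory": {"joy", "anticipation"},
}

def emotion_profile(tokens, lexicon=TOY_LEXICON):
    """Return each emotion's share of all lexicon hits in a token list."""
    hits = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            hits[emotion] += 1
    total = sum(hits.values())
    if total == 0:
        return {}  # no lexicon word matched
    return {emotion: count / total for emotion, count in hits.items()}

tokens = ["crisis", "threat", "together", "budget", "promise"]
profile = emotion_profile(tokens)
for emotion, share in sorted(profile.items()):
    print(f"{emotion}: {share:.2f}")
```

Normalizing by the number of hits makes profiles comparable across accounts with different tweet volumes, but it hides how many tokens matched the lexicon at all, so the match rate is worth reporting alongside the profile.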

Technologies and Methods Used

Python for tweet collection, preprocessing and exploratory analysis.
R for text mining, visualization and exploratory sentiment workflows.
GetOldTweets3 for historical tweet collection.
pandas / NumPy for tabular data handling and intermediate structures.
NLTK for tokenization, stopword removal and n-gram construction.
TextBlob for exploratory polarity and subjectivity scoring.
NRC Emotion Lexicon for emotion-category matching.
tidytext / dplyr / ggplot2 for R-based text processing and visualization.
matplotlib / seaborn for Python-based exploratory plots.

Implemented Elements

Collection of tweet data for selected political accounts and periods.
Cleaning of raw tweet text: removal of links, punctuation and noisy tokens.
Italian and English stopword filtering.
Word-frequency analysis for political discourse comparison.
Bigram and trigram extraction.
Exploratory polarity and subjectivity scoring.
Emotion profiling using lexicon-based matching.
Bar plots and radar-style visualizations of lexical and emotional patterns.
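
As an illustration of the cleaning step listed above, the following sketch strips links, mentions, punctuation and digits with regular expressions. The exact rules used in the project are not documented here, so the patterns and the function name `clean_tweet` are assumptions.

```python
import re

# Patterns are assumptions about what counts as noise in a tweet.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[\W\d_]+")  # punctuation, digits, underscores

def clean_tweet(text):
    """Remove links and mentions, keep hashtag words, strip punctuation."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # accented letters are preserved
    return " ".join(text.lower().split())

# Made-up example tweet.
raw = "Più lavoro, meno tasse! @user #Italia2018 https://t.co/abc123"
print(clean_tweet(raw))
```

Links are removed before punctuation so that URL fragments do not survive as stray tokens.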

Evaluation Status

The project produced descriptive NLP indicators rather than a supervised classification model.

The outputs are therefore useful for exploration and comparison, but they should not be interpreted as externally validated sentiment labels.

Already present: lexical frequencies, n-grams, polarity scores and emotion profiles.
Already present: visual comparison of political-text indicators.
To be added: manually annotated validation set for sentiment or framing categories.
To be added: robustness checks across time periods, topics and political actors.
To be added: comparison with transformer-based language models for Italian text.
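
One way the planned validation against manual annotations could be scored is with chance-corrected agreement (Cohen's kappa) between human labels and lexicon-derived labels. The sketch below is only an illustration: the labels are placeholders invented for the example, not project results, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labeled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Placeholder labels (assumption), not data from the project.
human = ["pos", "neg", "neg", "neu", "pos", "neg"]
lexicon = ["pos", "neg", "pos", "neu", "pos", "neu"]
kappa = cohen_kappa(human, lexicon)
print(f"kappa = {kappa:.2f}")
```

Kappa is preferable to raw accuracy here because sentiment classes in political tweets are often imbalanced, which inflates agreement that arises by chance.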

Methodological Note

Lexicon-based sentiment analysis is interpretable and useful for exploratory work, but it has important limitations.

It can misread irony, sarcasm, negation, political slang, multilingual content and context-dependent meaning, because it largely scores words independently of their context. For this reason, the results should be treated as descriptive signals rather than as ground-truth psychological or political measures.
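
The negation problem can be made concrete with a small sketch. The polarity values and negator list below are invented for illustration, and the function names are hypothetical. The second function shows a common one-word-lookback patch, which is not necessarily what the prototype did.

```python
# Toy polarity lexicon and negators (invented values, for illustration only).
TOY_POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "disaster": -1.0}
NEGATORS = {"not", "never", "no"}

def naive_polarity(tokens):
    """Mean polarity of matched words, ignoring context entirely."""
    scores = [TOY_POLARITY[t] for t in tokens if t in TOY_POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

def negation_aware_polarity(tokens):
    """Same score, but flip a word's sign if the previous token is a negator."""
    scores = []
    for i, token in enumerate(tokens):
        if token in TOY_POLARITY:
            score = TOY_POLARITY[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0

tokens = "this reform is not good".split()
naive = naive_polarity(tokens)            # counts "good" as positive
patched = negation_aware_polarity(tokens) # flips "good" after "not"
print(naive, patched)
```

A lookback rule of this kind handles only adjacent negation. It does nothing for irony or sarcasm, where the literal words and the intended meaning diverge.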

Modern Extension

A modern version of this project would preserve the interpretable descriptive layer while adding contextual language models and stronger evaluation.

Use transformer-based models for Italian sentiment and stance detection.
Compare lexicon-based sentiment with supervised or zero-shot classifiers.
Add topic modeling or embedding-based clustering to identify recurrent frames.
Evaluate sentiment and framing outputs against human annotations.
Track discourse dynamics over time and around political events.
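
As a sketch of the last item, per-tweet scores could be aggregated by month to follow an indicator over time. The dates and scores below are placeholders invented for the example, and `monthly_mean` is a hypothetical helper; any per-tweet indicator (polarity, emotion share) could take the place of the score.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_mean(records):
    """Average a per-tweet score by (year, month), in chronological order."""
    buckets = defaultdict(list)
    for day, score in records:
        buckets[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}

# Placeholder (date, score) pairs (assumption), not project data.
records = [
    (date(2018, 1, 5), 0.25),
    (date(2018, 1, 20), 0.75),
    (date(2018, 2, 3), -0.5),
]
trend = monthly_mean(records)
for (year, month), value in trend.items():
    print(f"{year}-{month:02d}: {value:+.2f}")
```

Months with very few tweets produce unstable averages, so reporting the tweet count per bucket next to the mean helps interpretation.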

Resources

Technical report in preparation.

Code available upon request.

Technical Context

Mohammad & Turney, NRC Emotion Lexicon — relevant because the project used lexicon-based emotion categories for exploratory profiling.
TextBlob — relevant because the prototype used polarity and subjectivity scores for exploratory sentiment analysis.
Silge & Robinson, tidytext — relevant to the R-based text mining and visualization workflow.