Ad-Tech Analytics & Click Dynamics

Applied Data Science 2022-2023 Professional Project

Overview

This summary covers applied data-science work on advertising performance data: cloud extraction, campaign analytics, keyword-level modeling, seasonality analysis and decision support for budget allocation.

The work treated advertising data as a high-frequency panel where queries, keywords, channels, devices, markets and time interact. The goal was not to predict exact outcomes deterministically, but to build interpretable metrics and models for campaign timing, keyword evaluation and revenue-risk decisions.

The public version is intentionally anonymized. It describes methods and system design without exposing company data, account identifiers, credentials, partner names or raw operational files.

Business Problem

Advertising platforms generate large volumes of noisy performance data. Useful decisions require more than raw clicks or revenue: teams need to understand where performance is persistent, where it is seasonal, where it is unstable and which keyword or channel states deserve additional budget.

The project connected campaign metrics to operational questions:

Which keywords, queries or channels show favorable revenue-per-click behavior?
How do click and revenue metrics vary by hour, weekday, market and device?
Which clusters of keyword value are stable, and which tend to transition into weaker states?
How can historical and planner-derived features support campaign and budget decisions?

Data Infrastructure

The workflow combined cloud data access with local analytical modeling. Advertising tables were queried from a cloud warehouse and converted into analysis-ready pandas datasets.

AWS Athena / S3 advertising tables

↓

SQL extraction and filtering

↓

Python boto3 connector

↓

pandas cleaning and aggregation

↓

KPI modeling and decision-support outputs
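The extraction step can be illustrated with a minimal sketch. It assumes a client object that behaves like boto3.client("athena") and uses only the start_query_execution, get_query_execution and get_query_results calls. The function name, polling interval and return shape are illustrative assumptions, not the project's actual connector.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query through an Athena-style client and return rows as dicts.

    `client` is assumed to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Page through results; the first row of the first page holds column names.
    header, rows, token = None, [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if header is None:
                header = values
            else:
                rows.append(dict(zip(header, values)))
        token = page.get("NextToken")
        if not token:
            break
    return rows
```

The returned list of dicts can be passed to pandas.DataFrame(rows). Cell values arrive as strings, so numeric columns need an explicit cast before aggregation.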

Data and Metrics

The data were organized around query, keyword, channel, device, market and time dimensions. Core quantitative fields included searches, impressions, clicks, cost and revenue.

From these fields, the workflow constructed advertising KPIs such as:

CTR: clicks divided by searches or impressions, depending on the reporting layer.
RPC: revenue per click.
RPM: revenue per thousand impressions or searches.
CPC: cost per click.
Gross revenue and cost-revenue indicators for campaign evaluation.
Device, market and geo splits for segment-level interpretation.
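The KPI definitions above can be sketched in pandas as follows. The column names and sample rows are assumptions made for illustration, and impressions are used as the CTR and RPM denominator; a search-based reporting layer would substitute searches.

```python
import numpy as np
import pandas as pd


def add_kpis(df):
    """Append CTR, RPC, RPM and CPC columns; zero denominators become NaN."""
    out = df.copy()
    impressions = out["impressions"].replace(0, np.nan)
    clicks = out["clicks"].replace(0, np.nan)
    out["ctr"] = out["clicks"] / impressions
    out["rpc"] = out["revenue"] / clicks
    out["rpm"] = 1000 * out["revenue"] / impressions
    out["cpc"] = out["cost"] / clicks
    return out


# Illustrative rows only, not project data.
sample = pd.DataFrame({
    "keyword": ["a", "b"],
    "impressions": [1000, 0],
    "clicks": [50, 0],
    "cost": [10.0, 0.0],
    "revenue": [25.0, 0.0],
})
kpis = add_kpis(sample)
```

When rolling up across devices, markets or hours, summing the raw fields first and computing the ratios afterwards avoids averaging ratios with unequal denominators.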

Seasonality and Panel Structure

High-frequency advertising data were reshaped into time-series and panel structures to study trend, seasonality and stochastic residual dynamics.

The analysis included hourly and daily aggregations, market splits, device filters and seasonal decomposition logic. This made it possible to inspect how revenue and click behavior changed across hours of day, weekdays, months and campaign contexts.

The methodological idea was to separate recurring deterministic structure from more volatile residual behavior, so that campaign decisions were not based on raw short-term fluctuations alone.
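A simplified way to illustrate this separation is an additive split of an hourly series into a rolling-mean trend, an hour-of-day profile and a residual. The sketch below uses pandas only and is a stand-in for a full decomposition routine; the function name, the 24-hour window and the synthetic series are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def hourly_profile_decompose(series):
    """Split an hourly series into trend, hour-of-day seasonal and residual parts."""
    # Trend: roughly centered 24-hour rolling mean (edges stay NaN).
    trend = series.rolling(window=24, center=True, min_periods=24).mean()
    detrended = series - trend
    # Seasonal: mean detrended value per hour of day, re-centered to average zero.
    hours = series.index.hour.to_numpy()
    by_hour = detrended.groupby(hours).mean()
    by_hour = by_hour - by_hour.mean()
    seasonal = pd.Series(by_hour.reindex(hours).to_numpy(), index=series.index)
    return pd.DataFrame({
        "observed": series,
        "trend": trend,
        "seasonal": seasonal,
        "residual": series - trend - seasonal,
    })


# Synthetic week of hourly values: flat level plus a daily cycle (not project data).
n_hours = 24 * 7
index = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
values = 100 + 10 * np.sin(2 * np.pi * index.hour.to_numpy() / 24)
parts = hourly_profile_decompose(pd.Series(values, index=index))
```

On real campaign data the residual column is the part worth inspecting before acting on a short-term swing, since the trend and seasonal columns capture the recurring structure.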

Keyword Value Clustering

One component modeled keyword value as a set of empirical performance states. Instead of assigning each keyword to one fixed channel or treating every query independently, keywords were grouped into value clusters based on revenue-per-click behavior and related performance metrics.

The operational goal was to support channel assignment and budget decisions by identifying favorable, intermediate and weak keyword states.

Keyword-level performance histories were cleaned and aggregated.
Rolling windows were compared to evaluate short-term predictive stability.
Cluster assignments were used as interpretable states rather than black-box scores.
The framework supported analysis of persistence and movement between value states.
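The clustering method is not specified here, so the following sketch uses a deliberately simple assumption: within each period, keywords are ranked by revenue per click and cut into three equal bands. The labels, column names and min_clicks filter are hypothetical choices for illustration.

```python
import pandas as pd

STATE_LABELS = ["weak", "intermediate", "favorable"]


def assign_value_states(history, min_clicks=1):
    """Label each keyword in each period by its RPC tercile within that period."""
    agg = history.groupby(["period", "keyword"], as_index=False)[["clicks", "revenue"]].sum()
    # Drop keyword-periods with too few clicks for a meaningful RPC.
    agg = agg[agg["clicks"] >= min_clicks].copy()
    agg["rpc"] = agg["revenue"] / agg["clicks"]
    # Percentile rank within the period, then cut into three equal bands.
    pct = agg.groupby("period")["rpc"].rank(pct=True)
    agg["state"] = pd.cut(pct, bins=[0, 1 / 3, 2 / 3, 1], labels=STATE_LABELS)
    return agg


# Illustrative history only, not project data.
history = pd.DataFrame({
    "period": ["w1"] * 3 + ["w2"] * 3,
    "keyword": ["a", "b", "c"] * 2,
    "clicks": [10, 10, 10, 10, 10, 10],
    "revenue": [1.0, 5.0, 10.0, 8.0, 5.0, 1.0],
})
states = assign_value_states(history)
```

Ranking within each period keeps the states relative to the contemporaneous keyword pool, so a market-wide shift in revenue per click does not move every keyword into the same state at once.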

Markov Transition Modeling

The project explored a finite-state Markov-style transition model for keyword value clusters. The model estimated how often keywords moved between value states over time, making it possible to study persistence, volatility and jump probabilities.

This was not framed as exact value prediction. It was a state-dynamics tool: given a keyword's recent performance state, the model helped reason about the probability of remaining stable, improving or moving into a weaker cluster.
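A transition matrix of this kind can be estimated by counting one-step moves between states and normalizing each row by its total. The sketch below assumes three hypothetical state labels and a few made-up state sequences, one per keyword.

```python
import numpy as np

STATES = ["weak", "intermediate", "favorable"]


def estimate_transition_matrix(sequences, states=STATES):
    """Row-normalized counts of one-step moves between states."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for current, nxt in zip(seq[:-1], seq[1:]):
            counts[index[current], index[nxt]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros rather than guessed.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


# Made-up state histories, one list per keyword (not project data).
sequences = [
    ["favorable", "favorable", "intermediate", "weak"],
    ["intermediate", "favorable", "favorable", "favorable"],
    ["weak", "weak", "intermediate", "intermediate"],
]
P = estimate_transition_matrix(sequences)
```

Entry P[i, j] is the estimated probability of moving from state i to state j in one period. Under the additional assumption that transition probabilities do not change over time, np.linalg.matrix_power(P, k) gives k-period probabilities.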

Revenue and Click Performance Modeling

Additional modeling work connected keyword-level performance to planner and campaign features such as search volume, bid information, competition index, estimated clicks, estimated CTR and estimated CPC.

The modeling layer included regression-style experiments, tree-based models and error diagnostics for revenue or click-performance targets. These models were used as exploratory decision-support tools rather than fully automated budgeting systems.
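As one hedged example of a regression-style experiment with error diagnostics, the sketch below fits ordinary least squares with NumPy and reports MAE and RMSE on a later holdout slice. The three feature columns stand in for planner features, the data are synthetic, and the 150/50 split is an arbitrary choice for illustration; a tree-based model would slot into the same fit, predict and report pattern.

```python
import numpy as np


def fit_ols(features, target):
    """Least-squares coefficients with the intercept as the first entry."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_ols(coef, features):
    """Apply fitted coefficients to new feature rows."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def error_report(actual, predicted):
    """Mean absolute error and root mean squared error."""
    err = np.asarray(actual) - np.asarray(predicted)
    return {"mae": float(np.mean(np.abs(err))), "rmse": float(np.sqrt(np.mean(err ** 2)))}


# Synthetic data: columns stand in for planner features (not project data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 0.2 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.05, size=200)

split = 150  # earlier rows train, later rows test (time-ordered holdout)
coef = fit_ols(X[:split], y[:split])
report = error_report(y[split:], predict_ols(coef, X[split:]))
```

Holding out the most recent rows rather than a random sample keeps the evaluation closer to how the model would be used, which matters when the underlying data drift over time.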

A/B Testing and Experimentation

Additional analyses included A/B testing workflows for landing-page or creative variants. These analyses connected click behavior and conversion metrics to product or campaign decisions while keeping statistical uncertainty explicit.
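One common way to keep uncertainty explicit in such a comparison is a two-proportion z-test with a confidence interval on the difference in conversion rates. The sketch below is an assumption about how this could be done, using only the standard library; the counts in the example are made up.

```python
from math import erf, sqrt


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare conversion rates of variants A and B.

    Returns the rate difference (B minus A), z statistic, two-sided
    p-value and an approximate 95% confidence interval by default.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the test statistic (null: equal rates).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = diff / se_pooled
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    # Unpooled standard error for the interval on the difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return {
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }


# Made-up counts: 100 of 1000 conversions for A, 130 of 1000 for B.
result = two_proportion_ztest(100, 1000, 130, 1000)
```

Reporting the interval alongside the p-value shows stakeholders the plausible range of the uplift, not only whether it crosses a significance threshold.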

Business Output

The project produced analytical outputs for interpreting campaign performance and supporting operational decisions.

Seasonality diagnostics for campaign timing.
Keyword and query-level performance summaries.
Cluster-based views of favorable and risky keyword states.
Transition matrices for state persistence and movement.
Revenue and click-performance modeling experiments.
Reports and visualizations for business stakeholders.

Evaluation Limits

Advertising systems are non-stationary: platform rules, competition, bidding behavior, traffic mix and user behavior can change over time. The project was therefore treated as decision support rather than a fully autonomous prediction engine.

Non-stationarity: models fitted on historical performance may degrade when platform or market conditions change.
Selection effects: observed keyword histories reflect previous campaign decisions.
Missing histories: sparse or new queries can be difficult to model reliably.
Attribution limits: revenue and click metrics depend on platform reporting definitions.
Operational constraints: models must be interpreted with budget, channel and business constraints in mind.

Technologies and Methods Used

Python for data processing, modeling and automation.
pandas / NumPy for high-frequency panel data preparation and aggregation.
AWS Athena / S3 for cloud data extraction and analytical storage.
boto3 for programmatic query execution and result retrieval.
SQL for cloud-side filtering, joins and reporting-table construction.
scikit-learn for regression, tree-based modeling and evaluation experiments.
statsmodels for time-series decomposition and seasonality analysis.
Matplotlib / Seaborn for visual diagnostics and stakeholder reporting.
A/B testing for comparing campaign or landing-page variants.
Finite-state transition modeling for keyword value dynamics.

Resources

Internal documents, data and code are not public.

An anonymized technical note can be prepared upon request.