Partial Cointegration & Statistical Arbitrage
Overview
This project explored a statistical arbitrage strategy based on partial cointegration, a framework in which the spread between two financial assets can contain both a mean-reverting component and a persistent stochastic trend.
The goal was to test whether partial cointegration could generate more robust pairs-trading signals than classical cointegration approaches based on raw spread residuals.
The project was implemented in R and should be interpreted as an applied quantitative finance prototype rather than as a production trading system.
Problem
Classical pairs trading often relies on the idea that the spread between two assets is stationary and mean-reverting. In practice, this assumption can be too restrictive.
Many financial spreads are only partially mean-reverting: one component may revert toward equilibrium, while another component may behave like a persistent random walk.
Partial cointegration addresses this by decomposing the observed spread into a tradable mean-reverting component and a non-stationary component.
Model Idea
Given two asset price series y_t and x_t, with intercept α and hedge ratio β, the spread can be written as:
spread_t = y_t − α − β x_t
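A common way to estimate α and β is an ordinary least squares regression of y_t on x_t, which is the first step of the Engle–Granger procedure. The sketch below assumes two aligned numeric price vectors. The helper name `ols_spread` is hypothetical and is not part of the project code.

```r
# Hypothetical helper: estimate alpha and beta by OLS and return the spread.
# Assumes y and x are aligned numeric price vectors of equal length.
ols_spread <- function(y, x) {
  stopifnot(length(y) == length(x))
  fit <- lm(y ~ x)                       # y_t = alpha + beta * x_t + error
  alpha <- unname(coef(fit)[1])
  beta  <- unname(coef(fit)[2])
  list(alpha = alpha, beta = beta,
       spread = y - alpha - beta * x)    # spread_t = y_t - alpha - beta * x_t
}

# Usage on synthetic data that is cointegrated by construction
set.seed(1)
x <- 100 + cumsum(rnorm(500))            # random-walk "price"
y <- 5 + 1.5 * x + rnorm(500)
est <- ols_spread(y, x)
```

With an intercept in the regression, the estimated spread has zero sample mean by construction. Any persistent drift therefore appears as slow wandering of the spread rather than as a level offset.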
In a partial cointegration framework, the spread is decomposed as:
spread_t = M_t + R_t
where M_t is the mean-reverting component and R_t is the persistent random-walk component.
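Following Clegg & Krauss (2018), the two components are usually given simple dynamics: an AR(1) process for the mean-reverting part and a random walk for the persistent part. The specification below is a sketch of that model. Here ρ is the mean-reversion coefficient, and σ_M and σ_R are the innovation scales.

```latex
\begin{aligned}
\text{spread}_t &= y_t - \alpha - \beta x_t = M_t + R_t,\\
M_t &= \rho\, M_{t-1} + \varepsilon_{M,t}, \qquad -1 < \rho < 1,\\
R_t &= R_{t-1} + \varepsilon_{R,t},\\
\varepsilon_{M,t} &\sim \mathcal{N}(0, \sigma_M^2), \qquad \varepsilon_{R,t} \sim \mathcal{N}(0, \sigma_R^2).
\end{aligned}
```

Two limiting cases follow. When σ_R = 0, R_t is constant and the spread is classically cointegrated. When σ_M = 0, the spread is a pure random walk and offers no mean-reverting signal.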
The trading idea is to base entry and exit decisions on the mean-reverting part of the spread, rather than on the raw spread alone.
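A pair that is partially cointegrated by construction shows why trading on M_t differs from trading on the raw spread. The simulator name and all parameter values below are illustrative assumptions.

```r
# Hypothetical simulator for a partially cointegrated pair (illustrative parameters).
simulate_pci <- function(n, alpha = 2, beta = 0.8, rho = 0.9,
                         sigma_M = 1, sigma_R = 0.2, seed = 42) {
  set.seed(seed)
  x <- 50 + cumsum(rnorm(n))                       # driving asset: a random walk
  eps_M <- rnorm(n, sd = sigma_M)
  # AR(1) via a recursive filter: M_t = rho * M_{t-1} + eps_M,t, starting from 0
  M <- as.numeric(stats::filter(eps_M, rho, method = "recursive"))
  R <- cumsum(rnorm(n, sd = sigma_R))              # random walk: R_t = R_{t-1} + eps_R,t
  y <- alpha + beta * x + M + R                    # so spread_t = M_t + R_t
  data.frame(x = x, y = y, M = M, R = R)
}

sim <- simulate_pci(1000)
spread <- sim$y - 2 - 0.8 * sim$x                  # raw spread with the true alpha, beta
```

In data like this, a unit-root test on the raw spread may fail to reject non-stationarity because R_t wanders. Even so, M_t still carries a mean-reverting component that could be traded.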
Dataset and Universe
- S&P 500 equity universe.
- Daily adjusted price series.
- Candidate equity pairs selected through correlation screening.
- Classical cointegration diagnostics used as a benchmark.
- Partial cointegration estimated for selected pairs.
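The source does not state which statistic drives the correlation screen. This sketch assumes the correlation of daily log returns, ranks every pair, and keeps the top k. The names `screen_pairs` and `k` are hypothetical. High return correlation does not imply cointegration, so the screen only narrows the candidate set.

```r
# Hypothetical pair screen: rank all asset pairs by correlation of daily log returns.
# `prices` is assumed to be a numeric matrix (rows = dates, columns = tickers).
screen_pairs <- function(prices, k = 10) {
  rets <- diff(log(prices))                        # daily log returns (row differences)
  C <- cor(rets, use = "pairwise.complete.obs")    # pairwise correlation matrix
  idx <- which(upper.tri(C), arr.ind = TRUE)       # each unordered pair once
  pairs <- data.frame(asset1 = colnames(C)[idx[, 1]],
                      asset2 = colnames(C)[idx[, 2]],
                      corr = C[idx])
  pairs <- pairs[order(-pairs$corr), ]
  rownames(pairs) <- NULL
  head(pairs, k)
}

# Usage on synthetic prices: AAA and BBB share a common factor, CCC does not
set.seed(7)
common <- cumsum(rnorm(250, sd = 0.01))
prices <- exp(cbind(
  AAA = log(100) + common + cumsum(rnorm(250, sd = 0.002)),
  BBB = log(50)  + common + cumsum(rnorm(250, sd = 0.002)),
  CCC = log(80)  + cumsum(rnorm(250, sd = 0.01))
))
top <- screen_pairs(prices, k = 2)
```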
Example — Trading Signal
A standard pairs-trading signal standardizes the spread by its estimated mean μ and standard deviation σ:
z_t = (spread_t − μ) / σ
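When μ and σ are computed over the full sample, z_t depends on data observed after t. This is one source of the look-ahead bias listed under Evaluation Limits. The sketch below is a trailing-window alternative. The function name and the 60-day default window are assumptions.

```r
# Hypothetical rolling z-score: mu and sigma use only the trailing `window`
# observations ending at t, so z_t does not depend on future spread values.
rolling_zscore <- function(spread, window = 60) {
  n <- length(spread)
  z <- rep(NA_real_, n)                  # undefined until a full window is available
  if (n < window) return(z)
  for (i in seq(window, n)) {
    w <- spread[(i - window + 1):i]
    z[i] <- (spread[i] - mean(w)) / sd(w)
  }
  z
}

rolling_zscore(c(1, 2, 3, 4, 10), window = 3)
```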
Positions are opened when the spread moves sufficiently far from its estimated equilibrium and closed when the spread reverts.
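The entry and exit logic can be written as a small state machine. The thresholds (enter at |z| ≥ 2, exit at |z| ≤ 0.5) and the sign convention are assumptions for illustration. The source does not state the values it used.

```r
# Hypothetical threshold rule: enter when |z| >= entry, exit when |z| <= exit.
# Assumed convention: +1 = long the spread (long y, short beta * x), -1 = short it.
positions_from_z <- function(z, entry = 2, exit = 0.5) {
  pos <- numeric(length(z))
  current <- 0
  for (i in seq_along(z)) {
    if (!is.na(z[i])) {
      if (current == 0 && z[i] >= entry) {
        current <- -1                    # spread rich: short the spread
      } else if (current == 0 && z[i] <= -entry) {
        current <- 1                     # spread cheap: long the spread
      } else if (current != 0 && abs(z[i]) <= exit) {
        current <- 0                     # spread has reverted: flatten
      }
    }                                    # NA z: keep the current position
    pos[i] <- current
  }
  pos
}

positions_from_z(c(0, 1, 2.5, 1.0, 0.3, -2.1, -1, 0.4))
```

The gap between the entry and exit thresholds adds hysteresis, so a spread hovering near a single threshold does not trigger repeated trades.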
The partial cointegration approach tests whether the latent mean-reverting component produces cleaner and more stable trading signals than the raw spread or a classical Engle–Granger residual.
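In the state-space form of the model, the latent M_t can be tracked recursively with a Kalman filter. The sketch below assumes ρ, σ_M and σ_R are already known; in practice they would be estimated, for example by maximum likelihood. The initial state (M_0 = 0, R_0 equal to the first observed spread) and its covariance are also assumptions. The function name is hypothetical.

```r
# Minimal Kalman filter for W_t = M_t + R_t with no measurement noise:
#   M_t = rho * M_{t-1} + eps_M,   R_t = R_{t-1} + eps_R.
# Parameters are assumed known; the initial state and covariance are rough assumptions.
kalman_pci <- function(W, rho, sigma_M, sigma_R) {
  n <- length(W)
  Fmat <- diag(c(rho, 1))                    # state transition
  Q <- diag(c(sigma_M^2, sigma_R^2))         # state innovation covariance
  H <- matrix(c(1, 1), nrow = 1)             # observation: W_t = M_t + R_t
  v0 <- sigma_M^2 / (1 - rho^2)              # stationary variance of M (needs |rho| < 1)
  x <- c(0, W[1])                            # initial guess: M_0 = 0, R_0 = W_1
  P <- diag(c(v0, v0))
  out <- matrix(NA_real_, n, 2, dimnames = list(NULL, c("M", "R")))
  for (i in seq_len(n)) {
    x <- as.vector(Fmat %*% x)               # predict state
    P <- Fmat %*% P %*% t(Fmat) + Q          # predict covariance
    S <- as.numeric(H %*% P %*% t(H))        # innovation variance
    K <- (P %*% t(H)) / S                    # Kalman gain (2 x 1)
    x <- x + as.vector(K) * (W[i] - sum(x))  # update with innovation W_t - (M + R)
    P <- P - K %*% H %*% P                   # update covariance
    out[i, ] <- x
  }
  out
}

# Usage on a simulated spread with known components
set.seed(3)
n <- 500
M <- as.numeric(stats::filter(rnorm(n), 0.8, method = "recursive"))
R <- cumsum(rnorm(n, sd = 0.3))
W <- M + R
est <- kalman_pci(W, rho = 0.8, sigma_M = 1, sigma_R = 0.3)
```

The observation equation has no noise term, so each filtered pair (M_t, R_t) adds up exactly to the observed spread. The filter only decides how to split that spread between the two components. The filtered M_t can then replace the raw spread in the z-score.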
Technologies and Methods Used
- R for data handling, econometric modeling and strategy prototyping.
- Time-series analysis for modeling price dynamics and spreads.
- Correlation screening for selecting candidate equity pairs.
- Classical cointegration testing as a benchmark diagnostic.
- Partial cointegration for decomposing spreads into mean-reverting and persistent components.
- State-space modeling for representing latent spread dynamics.
- Kalman filtering for recursive estimation of latent components.
- Z-score signal construction for entry and exit rules.
- Backtesting logic for evaluating simulated trading performance.
- Risk-return diagnostics including cumulative return, Sharpe ratio and drawdown.
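The three diagnostics above can be computed from a vector of daily strategy returns. This sketch assumes simple (not log) daily returns, 252 trading days per year, and a zero risk-free rate. The function name is hypothetical.

```r
# Hypothetical performance summary for a vector of daily strategy returns.
perf_summary <- function(ret, periods_per_year = 252) {
  equity <- cumprod(1 + ret)                       # growth of 1 unit of capital
  cum_return <- tail(equity, 1) - 1
  sharpe <- mean(ret) / sd(ret) * sqrt(periods_per_year)  # annualized, rf = 0
  peak <- cummax(c(1, equity))[-1]                 # running peak, including initial capital
  drawdown <- equity / peak - 1                    # fractional distance below the peak
  c(cum_return = cum_return, sharpe = sharpe, max_drawdown = min(drawdown))
}

perf_summary(c(0.01, -0.02, 0.015, 0.005))
```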
Implemented Elements
- Import and preparation of daily equity price data.
- Candidate pair selection through correlation filtering.
- Classical Engle–Granger cointegration tests as a benchmark.
- Partial cointegration estimation in R.
- Construction of standardized spread-based trading signals.
- Simulation of entry and exit rules for selected equity pairs.
- Comparison between classical cointegration and partial cointegration signals.
- Basic performance evaluation through return and risk diagnostics.
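The simulation step can be sketched as follows, under two assumptions. First, positions are sized in units of the spread (long one unit of y, short β units of x). Second, a signal observed at the close of day t is held from day t+1 onward, which keeps same-day information out of the P&L. The function name is hypothetical.

```r
# Hypothetical backtest: daily P&L of a spread position, in price units.
# pos[t] is the signal at the close of day t; it earns the spread change from t to t+1.
spread_pnl <- function(spread, pos) {
  stopifnot(length(spread) == length(pos))
  held <- c(0, head(pos, -1))            # lag positions by one day (no look-ahead)
  d_spread <- c(0, diff(spread))         # spread change over each day
  held * d_spread                        # daily P&L per unit of spread
}

pnl <- spread_pnl(spread = c(10, 12, 11, 11.5), pos = c(0, -1, -1, 0))
cumsum(pnl)
```

The output is in price units. Turning it into returns for Sharpe or drawdown calculations requires a further assumption about the capital allocated to each unit of spread.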
Preliminary Results
The prototype compared signals based on classical cointegration with signals derived from the partial cointegration framework.
Preliminary experiments suggested that the partial cointegration approach could produce smoother signals and reduce some false entries relative to raw-spread or classical cointegration strategies.
These results should be interpreted cautiously. A robust trading evaluation requires strict out-of-sample testing, realistic transaction costs, liquidity constraints and controls for data-snooping bias.
Figures
Figures in preparation.
R Implementation Sketch
The implementation was developed in R using a workflow based on pair selection, cointegration diagnostics, partial cointegration estimation and signal construction.
CI <- egcm(X, Y) PCI <- fit.pci(X, Y) spread <- Y - alpha - beta * X z <- (spread - mean(spread)) / sd(spread)
This sketch illustrates the logic of the workflow rather than the full implementation.
Evaluation Limits
The project included internal performance diagnostics, but it should not be interpreted as a fully validated trading system.
- Already present: signal construction, trading simulation and basic risk-return diagnostics.
- To be strengthened: strict train/test or walk-forward validation.
- To be strengthened: transaction cost, slippage and liquidity sensitivity.
- To be strengthened: robustness across different market regimes and estimation windows.
- To be strengthened: controls for look-ahead bias, survivorship bias and data-snooping.
- To be strengthened: comparison against additional baselines such as distance-based pairs trading.
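One way to implement the walk-forward item is to generate index sets in which each training window is followed by a non-overlapping test window, with the origin advancing one test window per fold. The window lengths below are placeholders, not values from the project.

```r
# Hypothetical walk-forward splitter: fixed-length training window followed by a
# test window; the origin advances by `step` observations each fold.
walk_forward_splits <- function(n, train = 500, test = 125, step = test) {
  if (n < train + test) return(list())   # not enough data for a single fold
  starts <- seq(1, n - train - test + 1, by = step)
  lapply(starts, function(s) list(
    train = s:(s + train - 1),
    test  = (s + train):(s + train + test - 1)
  ))
}

folds <- walk_forward_splits(10, train = 4, test = 2)
```

Hedge ratios, model parameters and z-score statistics would be estimated only on each `train` index set and then applied unchanged to the matching `test` set.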
Methodological Note
The main value of the project is methodological: it treats financial spreads as latent dynamic objects rather than as simple residual series.
This connects naturally to broader themes in quantitative finance and financial risk modeling: observed prices are noisy signals of underlying latent processes, and trading or risk decisions depend on how those latent processes are estimated.
Modern Extension
A modern version of this project would keep the partial cointegration framework but evaluate it with a more rigorous experimental design.
- Use rolling or walk-forward estimation windows.
- Compare partial cointegration against classical cointegration and distance-based pairs trading.
- Include realistic transaction costs, shorting constraints and liquidity filters.
- Run sensitivity analysis over thresholds, window lengths and pair-selection rules.
- Evaluate performance stability across market regimes.
- Track turnover, exposure, drawdown duration and tail risk, not only average return.
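A first step toward realistic costs is to charge a proportional cost on every change in position, which also gives the turnover series mentioned above. The flat `cost_per_unit` is a simplifying assumption. Realistic values depend on bid-ask spreads, commissions, borrow fees and liquidity.

```r
# Hypothetical cost model: a proportional cost charged on every change in position.
# cost_per_unit is assumed to be in the same units as gross_pnl.
net_pnl <- function(gross_pnl, pos, cost_per_unit) {
  stopifnot(length(gross_pnl) == length(pos))
  turnover <- abs(diff(c(0, pos)))       # units traded each day (entry, exit or flip)
  gross_pnl - cost_per_unit * turnover
}

net_pnl(gross_pnl = rep(0, 5), pos = c(0, -1, -1, 1, 0), cost_per_unit = 0.1)
```

Re-running a backtest over a grid of `cost_per_unit` values gives a simple view of how sensitive the strategy's net performance is to trading costs.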
Resources
Technical report in preparation.
Technical Note: Partial Cointegration and Kalman Representation
Code available upon request.
Technical Context
- Engle & Granger (1987), Co-integration and Error Correction — relevant as the classical benchmark for cointegration-based spread modeling.
- Clegg & Krauss (2018), Pairs Trading with Partial Cointegration — relevant to the partial cointegration framework for statistical arbitrage.
- Durbin & Koopman (2012), Time Series Analysis by State Space Methods — relevant to the state-space and Kalman filtering representation.