Historical Concessions & Geographic RDD
Overview
This project studied whether historical concession boundaries were associated with persistent differences in local outcomes, using a geographic regression-discontinuity design and spatially matched control areas.
The work combined concession boundary data, village-level data, geographic controls, conflict outcomes, public-goods indicators and night-light measures to compare areas inside historical concessions with nearby areas outside the same boundaries.
The project should be read as an applied causal-inference and political-economy prototype, not as a simple descriptive comparison.
Problem
Historical concessions may shape local development through infrastructure, settlement patterns, institutional exposure and conflict dynamics. The empirical challenge is that concession areas were not randomly assigned and may differ geographically from nearby untreated areas.
The goal was to construct a credible comparison between treated villages inside concession boundaries and nearby control villages outside those boundaries, while accounting for geographic confounders.
Identification Strategy
The core design was geographic: compare units near concession boundaries, where inside and outside villages are more likely to share similar geography than units farther away.
Comparison: villages inside a concession boundary vs. nearby villages outside that boundary.
The analysis used distance buffers, geographic matching and boundary-based comparisons to reduce imbalance between treated and control locations.
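One common way to write a boundary comparison of this kind is shown below. It is a sketch under stated assumptions, not necessarily the specification estimated in the project: the symbols are illustrative, with y_i a village outcome, D_i an indicator for lying inside a concession, f(.) a low-order polynomial in coordinates, X_i the other geographic controls, dist_i the distance to the boundary and h the chosen distance threshold.

```latex
y_i = \alpha + \tau D_i + f(\mathrm{lat}_i, \mathrm{lon}_i) + X_i'\beta + \varepsilon_i,
\qquad \text{for villages with } \mathrm{dist}_i \le h
```

Under this notation, \tau is the inside-versus-outside difference among villages within h of the boundary, conditional on the coordinate and geographic controls.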
Data Construction
The workflow joined multiple spatial and tabular datasets, including concession indicators, village identifiers, coordinates and local-outcome variables. Boundary and GIS-derived inputs were treated as source data; the main implementation work focused on coordinate-based matching, dataset construction and econometric estimation.
- Concession boundary indicators for treated locations.
- Nearby buffer groups at alternative distance thresholds.
- Latitude, longitude, altitude and distance-to-city controls.
- Conflict-event data matched to village locations.
- Public-goods variables such as schools, health facilities and roads.
- Night-light intensity as a proxy for local economic activity.
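The buffer-group step can be illustrated with a short Python sketch. This is not the original MATLAB/Stata code: the thresholds (5, 10 and 20 km), the field names and the village records are hypothetical, since the actual values are not stated here.

```python
from bisect import bisect_left

# Hypothetical distance thresholds in kilometres, sorted ascending.
THRESHOLDS_KM = [5.0, 10.0, 20.0]


def buffer_group(distance_km, thresholds=THRESHOLDS_KM):
    """Return the narrowest threshold whose buffer contains the village.

    Returns None when the village lies beyond the widest buffer.
    """
    if distance_km < 0:
        raise ValueError("distance to the boundary must be non-negative")
    idx = bisect_left(thresholds, distance_km)
    return thresholds[idx] if idx < len(thresholds) else None


# Hypothetical village records: distance to the nearest concession boundary.
villages = [
    {"village_id": "v1", "inside": 1, "dist_km": 2.5},
    {"village_id": "v2", "inside": 0, "dist_km": 7.0},
    {"village_id": "v3", "inside": 0, "dist_km": 25.0},
]

for v in villages:
    v["buffer_km"] = buffer_group(v["dist_km"])
```

A village exactly on a threshold is assigned to that threshold's buffer; villages with `buffer_km` equal to None would be dropped from every boundary sample.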
Geographic Balance and Matching
The project included balance checks between treated and nearby control units. Matched datasets were constructed at multiple distance thresholds, using latitude-longitude information and distance-based rules to connect treated locations with comparable nearby controls.
Balance diagnostics compared geographic and baseline variables across inside and outside areas before interpreting outcome regressions. The matching and data-construction logic was developed across MATLAB and Stata, while the final regression workflow was implemented in Stata.
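The matching and balance logic described above can be sketched in Python as follows. The project itself used MATLAB and Stata, so this is an illustration under assumptions: the haversine distance, the 10 km cutoff, matching with replacement, the standardized mean difference as the balance statistic, and all records and field names are hypothetical choices.

```python
import math
from statistics import mean, variance

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for short distances


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_nearest(treated, controls, max_km):
    """Pair each treated village with its closest control within max_km.

    Controls may be reused (matching with replacement).
    """
    pairs = []
    for t in treated:
        dist, best = min(
            ((haversine_km(t["lat"], t["lon"], c["lat"], c["lon"]), c) for c in controls),
            key=lambda item: item[0],
        )
        if dist <= max_km:
            pairs.append((t["id"], best["id"], round(dist, 2)))
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    if pooled_sd == 0:
        raise ValueError("no variation in the balance variable")
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical villages with coordinates and altitude in metres.
treated = [
    {"id": "t1", "lat": 0.0, "lon": 30.00, "alt": 1200},
    {"id": "t2", "lat": 0.5, "lon": 30.00, "alt": 1300},
]
controls = [
    {"id": "c1", "lat": 0.0, "lon": 30.05, "alt": 1210},
    {"id": "c2", "lat": 0.5, "lon": 30.02, "alt": 1280},
    {"id": "c3", "lat": 3.0, "lon": 33.00, "alt": 900},
]

pairs = match_nearest(treated, controls, max_km=10.0)
matched_alt = {c["id"]: c["alt"] for c in controls}
smd_altitude = standardized_mean_difference(
    [t["alt"] for t in treated],
    [matched_alt[c_id] for _, c_id, _ in pairs],
)
```

A standardized difference close to zero on altitude, coordinates and other baseline variables is what the balance checks look for before outcome regressions are interpreted.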
Outcome Models
The analysis considered several outcome families:
- Conflict: counts of conflict events around treated and control villages.
- Public goods: schools, health facilities and road density.
- Economic activity: night-light intensity.
- Population and geography: population density, altitude, distance and river-related controls.
Regression specifications included geographic controls such as latitude and longitude. For overdispersed count outcomes such as conflict events, alternative count-data models, including the negative binomial, were considered.
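A quick way to see why a count outcome might call for a negative binomial rather than a Poisson model is to compare its variance with its mean. The Python sketch below is a crude unconditional check, not the diagnostic run in the project: it ignores covariates, the counts are invented, and the function name is hypothetical. It uses the standard NB2 relation Var(y) = mu + alpha * mu^2 to back out a method-of-moments alpha.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize overdispersion in a list of non-negative event counts."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    if m == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    return {
        "mean": m,
        "variance": v,
        # Under a Poisson model the variance equals the mean, so this ratio is near 1.
        "variance_to_mean": v / m,
        # Method-of-moments NB2 dispersion, truncated at zero.
        "alpha_mom": max(0.0, (v - m) / m**2),
    }


# Hypothetical conflict-event counts per village.
conflict_counts = [0, 0, 0, 1, 1, 2, 3, 8, 15]
summary = dispersion_summary(conflict_counts)
```

A variance-to-mean ratio well above 1, as in this invented sample, is the pattern that motivates moving from Poisson to a negative binomial specification.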
Spatial Inference
Because observations are spatially located, residuals from neighboring villages may be correlated, which can make conventional standard errors unreliable.
The workflow therefore explored Conley-style spatial HAC corrections as a route to spatially robust inference, treated as a robustness direction within the econometric design.
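To make the idea concrete, the Python sketch below computes a Conley-style standard error for the slope in a regression of an outcome on a constant and a treatment indicator. It is a simplified illustration, not the project's Stata implementation: it assumes planar coordinates in kilometres, a Bartlett-type weight that declines linearly to zero at a hypothetical cutoff, no small-sample correction, and invented data. The variance is truncated at zero because these distance weights do not guarantee a positive estimate.

```python
import math


def ols_slope_conley(y, d, coords_km, cutoff_km):
    """OLS of y on a constant and d, with a Conley-style SE for the slope.

    Pairs of observations closer than cutoff_km receive weight 1 - dist/cutoff;
    pairs farther apart receive weight zero.
    """
    if cutoff_km <= 0:
        raise ValueError("cutoff_km must be positive")
    n = len(y)
    ybar = sum(y) / n
    dbar = sum(d) / n
    dc = [di - dbar for di in d]  # regressor with the constant partialled out
    sxx = sum(v * v for v in dc)
    slope = sum(dci * (yi - ybar) for dci, yi in zip(dc, y)) / sxx
    intercept = ybar - slope * dbar
    resid = [yi - intercept - slope * di for yi, di in zip(y, d)]
    score = [dci * ei for dci, ei in zip(dc, resid)]

    # Sandwich "meat": weighted sum of score cross-products over nearby pairs.
    meat = 0.0
    for i in range(n):
        for j in range(n):
            dist = math.dist(coords_km[i], coords_km[j])
            weight = max(0.0, 1.0 - dist / cutoff_km)
            meat += weight * score[i] * score[j]

    var_slope = meat / sxx**2
    return slope, math.sqrt(max(var_slope, 0.0))


# Hypothetical data: three control and three treated villages along a line.
y = [1.0, 2.0, 2.0, 4.0, 5.0, 7.0]
d = [0, 0, 0, 1, 1, 1]
coords_km = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]

slope, se_conley = ols_slope_conley(y, d, coords_km, cutoff_km=5.0)
```

With a cutoff smaller than the shortest distance between any two villages, only each observation's own term survives and the formula reduces to the usual heteroskedasticity-robust (HC0) standard error.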
Implemented Elements
- Construction of treated and control datasets around concession boundaries.
- Cleaning and merging of village-level, conflict and public-goods data.
- Creation of buffer groups at different distance thresholds.
- Coordinate-based matching of treated and nearby control villages.
- Geographic balance checks using treated and nearby control units.
- Regression-discontinuity-style specifications with coordinate controls.
- Conflict regressions and count-model diagnostics.
- Public-goods regressions for health facilities, schools and roads.
- Night-light regressions as a proxy for local economic activity.
- Exploration of spatial-HAC and Conley-style inference.
Outputs
The project produced cleaned analysis datasets, balance checks and regression tables comparing concession and nearby non-concession areas across several outcomes.
The value of the project lies in the research design: moving from historical geographic boundaries to a structured causal comparison with explicit attention to matching, balance and spatial confounding.
Evaluation Limits
The design is informative but not automatic proof of causality. Historical concession placement, omitted geographic variables and spatial dependence all require caution.
- Boundary validity: treated and control areas must be comparable near the concession boundary.
- Spatial dependence: nearby observations may have correlated shocks and outcomes.
- Historical selection: concession placement may reflect unobserved economic or geographic advantages.
- Measurement: conflict, public-goods and night-light data can contain location and reporting error.
- Bandwidth choice: estimates can change across distance thresholds.
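The bandwidth concern in particular is easy to illustrate. The Python sketch below recomputes a raw inside-minus-outside difference in means at several distance thresholds; the thresholds, field names and village records are hypothetical, and a real check would re-estimate the full regression rather than a simple difference.

```python
from statistics import mean


def diff_in_means_by_bandwidth(rows, bandwidths_km):
    """Inside-minus-outside mean outcome among villages within each bandwidth.

    Returns {bandwidth: (difference or None, number of villages used)}.
    """
    results = {}
    for h in bandwidths_km:
        inside = [r["y"] for r in rows if r["inside"] == 1 and r["dist_km"] <= h]
        outside = [r["y"] for r in rows if r["inside"] == 0 and r["dist_km"] <= h]
        n_used = len(inside) + len(outside)
        if inside and outside:
            results[h] = (mean(inside) - mean(outside), n_used)
        else:
            results[h] = (None, n_used)  # no comparison possible at this bandwidth
    return results


# Hypothetical villages: treatment status, distance to boundary, outcome.
rows = [
    {"inside": 1, "dist_km": 2.0, "y": 10},
    {"inside": 1, "dist_km": 8.0, "y": 14},
    {"inside": 1, "dist_km": 15.0, "y": 20},
    {"inside": 0, "dist_km": 3.0, "y": 6},
    {"inside": 0, "dist_km": 9.0, "y": 8},
    {"inside": 0, "dist_km": 18.0, "y": 2},
]

sensitivity = diff_in_means_by_bandwidth(rows, [5.0, 10.0, 20.0])
```

In this invented sample the estimated gap widens as the bandwidth grows, which is the kind of instability that makes reporting results across several thresholds worthwhile.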
Modern Extension
A modern version of the project would formalize the design in a reproducible geospatial pipeline and strengthen the robustness framework.
- Automate nearest-neighbor matching and duplicate-resolution rules.
- Use modern geographic RDD tooling and bandwidth sensitivity analysis.
- Add Conley or spatial-clustered standard errors consistently.
- Run placebo boundaries and pre-treatment balance checks.
- Visualize concession boundaries, buffers and matched villages on maps.
- Package the workflow as a reproducible MATLAB/Stata or Python geospatial project.
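As an example of what an automated duplicate-resolution rule could look like, the Python sketch below takes candidate treated-control pairs and keeps a one-to-one subset by accepting the closest pairs first. The greedy rule, the tie-breaking on identifiers and the sample pairs are all assumptions made for illustration; the original workflow's rules are not specified here.

```python
def resolve_duplicates(candidate_pairs):
    """Greedy one-to-one matching from (treated_id, control_id, distance_km) tuples.

    Pairs are accepted in order of increasing distance, and each village is
    used at most once. Ties are broken by identifier so the result is deterministic.
    """
    used_treated, used_controls = set(), set()
    kept = []
    for t_id, c_id, dist in sorted(candidate_pairs, key=lambda p: (p[2], p[0], p[1])):
        if t_id in used_treated or c_id in used_controls:
            continue  # one side of this pair is already matched
        kept.append((t_id, c_id, dist))
        used_treated.add(t_id)
        used_controls.add(c_id)
    return kept


# Hypothetical candidates: both treated villages are closest to control c1.
candidates = [
    ("t1", "c1", 1.0),
    ("t2", "c1", 0.5),
    ("t1", "c2", 2.0),
    ("t2", "c2", 3.0),
]

one_to_one = resolve_duplicates(candidates)
```

Here t2 keeps c1 because that pair is closest, and t1 falls back to c2. A greedy rule is simple to audit but does not minimize total matched distance, so an optimal assignment method would be a natural alternative.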
Technologies and Methods Used
- MATLAB for coordinate-based matching and distance logic.
- Stata for data preparation, merging, regression analysis and output tables.
- Geographic regression discontinuity for boundary-based causal comparison.
- Geospatial matching for nearby treated-control comparisons.
- Negative binomial models for overdispersed count outcomes.
- Spatial-HAC / Conley-style inference as a robustness direction.
- GIS-derived variables including coordinates, buffers and distance measures.
Resources
Code and raw data are not public.
An anonymized technical note can be prepared upon request.