Downloads
Each dataset is available as a labeled Stata .dta file and as its source .csv file.
⇩ Download all data (ZIP)
stata_codebook.do
| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| industrial_park_district_panel | district-year (woreda x year) | 2,224 × 34 | industrial_park_district_panel.dta | industrial_park_district_panel.csv |
| industrial_park_household_rcs | household-round (repeated cross-section) | 13,200 × 13 | industrial_park_household_rcs.dta | industrial_park_household_rcs.csv |
| industrial_park_individual_rcs | individual-round (repeated cross-section) | 17,900 × 22 | industrial_park_individual_rcs.dta | industrial_park_individual_rcs.csv |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
use "${BASE}industrial_park_district_panel.dta", clear
describe
notes
Python
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
df = pd.read_stata(BASE + "industrial_park_district_panel.dta")
# load every dataset at once
files = ["industrial_park_district_panel", "industrial_park_household_rcs", "industrial_park_individual_rcs"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "industrial_park_district_panel.dta", "industrial_park_district_panel.dta")
df, meta = pyreadstat.read_dta("industrial_park_district_panel.dta")
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
df <- read_dta(paste0(BASE, "industrial_park_district_panel.dta"))
Overview & sources
Companion data for a beginner-friendly, end-to-end staggered difference-in-differences (DiD) tutorial that asks whether Ethiopia's industrial parks raised local economic activity, household living standards, and women's economic agency — and for whom. The tutorial replicates Huang, Wang & Xu (2026) on fully synthetic data spanning three grains: a satellite district-year panel of 139 woredas (2005–2020), and two Ethiopia DHS repeated cross-sections of households and individuals across five survey rounds. The estimand throughout is the average treatment effect on the treated (ATT) — the effect on the 17 park-hosting woredas relative to 122 propensity-score-matched never-treated controls — identified under parallel trends in an explicitly observational setting. The analysis runs a static two-way fixed-effects DiD and an event study with pyfixest, cross-checks them against the Sun-Abraham, Borusyak/Gardner and Callaway-Sant'Anna estimators plus a Goodman-Bacon decomposition, and runs survey-weighted repeated-cross-section DiD with Conley spatial standard errors.
industrial_park_district_panel is a balanced annual satellite panel (one row per woreda × year, 139 woredas × 16 years, 2005–2020) carrying the activity outcomes, the staggered treatment timing, geography, and 2007 baseline covariates. industrial_park_household_rcs and industrial_park_individual_rcs are DHS repeated cross-sections (different respondents each of five rounds: 2000, 2005, 2011, 2016, 2019), so they carry no within-respondent panel key and admit only coarse event phases and survey-weighted regressions, never unit fixed effects.
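The balance claim (139 woredas × 16 years = 2,224 rows) is easy to verify after loading. The sketch below is a minimal illustration on hypothetical toy rows with three made-up district IDs rather than the real file; `is_balanced` is an illustrative helper, not part of the tutorial's code.

```python
from collections import Counter

def is_balanced(rows, years):
    """Return True if every unit appears exactly once in every year."""
    counts = Counter(rows)                       # (district_id, year) -> occurrences
    units = {unit for unit, _ in rows}
    every_cell_once = all(counts[(u, y)] == 1 for u in units for y in years)
    return every_cell_once and len(rows) == len(units) * len(years)

years = range(2005, 2021)                        # 16 annual observations, 2005-2020

# Hypothetical toy panel: three made-up woredas, one row per woreda x year.
toy_rows = [(d, y) for d in ("ET_D001", "ET_D002", "ET_D003") for y in years]

print(len(toy_rows), is_balanced(toy_rows, years))

# Dropping a single row breaks the balance.
print(is_balanced(toy_rows[:-1], years))
```

With pandas, the same check on the real file would amount to comparing `df.groupby("district_id")["year"].nunique()` against 16, assuming the column names listed in the variable dictionary.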
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Huang, Wang & Xu (2026) | Replicated study; calibration targets (coefficient magnitudes, signs, significance) and the real park geography shown for context | Huang, G., Wang, M., & Xu, H. (2026). The socioeconomic impacts of industrial parks in Ethiopia. Journal of Urban Economics. https://doi.org/10.1016/j.jue.2026.103867 |
| Synthetic (this study) | All values — simulated by a calibrated data-generating process so re-running the paper's regressions reproduces its findings (open & reproducible) | Mendez, C. (2026). See the post's Python script.py for the full DGP. |
| Ethiopia DHS (calibration reference) | Survey design and outcome definitions the household/individual cross-sections imitate (the real DHS micro-data are confidential and not used) | Ethiopia Demographic and Health Surveys (DHS), 2000–2019 — The DHS Program, ICF / Ethiopian Public Health Institute. https://dhsprogram.com/ |
| Method references | Estimators and concepts | Callaway & Sant'Anna (2021); Sun & Abraham (2021); Borusyak, Jaravel & Spiess (2024); Goodman-Bacon (2021); Conley (1999). |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences [Data set]. https://carlos-mendez.org/post/python_did_industrial_park/
Huang, G., Wang, M., & Xu, H. (2026). The socioeconomic impacts of industrial parks in Ethiopia. Journal of Urban Economics. https://doi.org/10.1016/j.jue.2026.103867
BibTeX
@misc{mendez2026pythondidindustrialpark,
author = {Mendez, Carlos},
title = {Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences},
year = {2026},
howpublished = {\url{https://carlos-mendez.org/post/python_did_industrial_park/}},
note = {Data set}
}
@article{huang2026industrial,
author = {Huang, Gordon and Wang, Mengjie and Xu, Hangtian},
title = {The socioeconomic impacts of industrial parks in Ethiopia},
journal = {Journal of Urban Economics},
year = {2026},
doi = {10.1016/j.jue.2026.103867}
}
Variable explorer
All 56 variables across the three files.
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| age | continuous | Respondent age | Age of the individual respondent (years); a demographic control. | years | industrial_park_individual_rcs | Synthetic (this study) |
| age_head | continuous | Age of household head | Age of the household head (years); a demographic control. | years | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| age_sq | continuous | Respondent age squared | Square of respondent age; a demographic control for nonlinear age effects. | years^2 | industrial_park_individual_rcs | Synthetic (this study) |
| china_aid | dummy | Chinese-financed park (1=yes) | 1 if the park involves Chinese financing/aid, else 0 (context indicator). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| decision_power | dummy | Women's decision-making power (1=yes) | 1 if the woman participates in household decision-making, else 0 (empowerment outcome, women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dist_addis_km | continuous | Distance to Addis Ababa (km) | Road/great-circle distance from the woreda to the capital, Addis Ababa. | km | industrial_park_district_panel | Synthetic (this study) |
| dist_nearest_city_km | continuous | Distance to nearest city (km) | Distance from the woreda to the nearest city; the steepest effect moderator. | km | industrial_park_district_panel | Synthetic (this study) |
| dist_state_capital_km | continuous | Distance to regional-state capital (km) | Distance from the woreda to its regional-state capital. | km | industrial_park_district_panel | Synthetic (this study) |
| district_id | identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | string | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| district_name | identifier | Woreda (district) name | Human-readable name for the woreda (real Ethiopian district names used as labels). | string | industrial_park_district_panel | Synthetic (this study) |
| durable_goods_pc | continuous | Durable goods per capita (standardized) | Standardized count of household durable goods per capita; a living-standards outcome. | standardized | industrial_park_household_rcs | Synthetic (this study) |
| dv_accept | dummy | Accepts domestic violence (composite, 1=yes) | 1 if the respondent justifies wife-beating in any of the five DHS scenarios, else 0 (gender-norms outcome). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_arguing | dummy | Justifies beating: arguing with husband | DHS sub-item: 1 if wife-beating is justified for arguing with the husband, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_food | dummy | Justifies beating: burning the food | DHS sub-item: 1 if wife-beating is justified for burning the food, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_goingout | dummy | Justifies beating: going out without telling | DHS sub-item: 1 if wife-beating is justified for going out without telling the husband, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_kids | dummy | Justifies beating: neglecting the children | DHS sub-item: 1 if wife-beating is justified for neglecting the children, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_sex | dummy | Justifies beating: refusing sex | DHS sub-item: 1 if wife-beating is justified for refusing to have sex, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| elevation | continuous | Elevation (m) | Woreda mean elevation above sea level. | metres | industrial_park_district_panel | Synthetic (this study) |
| employment_rate_2007 | continuous | Employment rate, 2007 baseline | Share of the woreda's working-age population employed at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| event_phase | continuous | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | phase (k) | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| event_time | continuous | Years relative to park opening | Year minus the woreda's open_year (event time k); the event-study time axis. | years (k) | industrial_park_district_panel | Synthetic (this study) |
| hh_id | identifier | Household identifier (per-round) | Synthetic per-round household identifier; NOT a panel key (each round samples different households). | string | industrial_park_household_rcs | Synthetic (this study) |
| hh_size | continuous | Household size | Number of members in the household; a demographic control. | persons | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| housing_quality | dummy | Housing-quality indicator (1=meets all) | 1 if the household has electricity, piped water, a toilet, and a finished floor, else 0. | 0/1 | industrial_park_household_rcs | Synthetic (this study) |
| ihs_light | continuous | IHS nighttime light (headline activity outcome) | Inverse hyperbolic sine of nighttime luminosity; a log-like transform that handles zeros. | asinh(DN) | industrial_park_district_panel | Synthetic (this study) |
| impervious_ratio | continuous | Built-up (impervious) land share | Share of the woreda's land that is built-up/impervious surface; observed only every five years. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| ind_id | identifier | Individual identifier (per-round) | Synthetic per-round individual identifier; NOT a panel key (each round samples different individuals). | string | industrial_park_individual_rcs | Synthetic (this study) |
| labor_intensive_park | dummy | Labor-intensive park (1=yes) | 1 if the woreda's park is labor-intensive (textiles/garments), else 0 (park-type context). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| latitude | continuous | Woreda latitude | Woreda centroid latitude (decimal degrees). | degrees | industrial_park_district_panel | Synthetic (this study) |
| light_intensity | continuous | Raw nighttime light intensity | Untransformed mean nighttime luminosity of the woreda (VIIRS-like, calibrated). | DN (light units) | industrial_park_district_panel | Synthetic (this study) |
| light_positive | dummy | Woreda emits any light (1=positive) | 1 if the woreda has positive nighttime luminosity, else 0. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| log_pop_density_2007 | continuous | Log population density, 2007 baseline | Natural log of population per km^2 at the 2007 baseline. | log persons/km^2 | industrial_park_district_panel | Synthetic (this study) |
| longitude | continuous | Woreda longitude | Woreda centroid longitude (decimal degrees). | degrees | industrial_park_district_panel | Synthetic (this study) |
| nearby | dummy | Control woreda near an open park (<=10 km) | 1 for a control woreda within 10 km of an operational park (spillover/SUTVA test), else 0. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| nonag_employment | dummy | Non-agricultural employment (1=yes) | 1 if the individual works in a non-agricultural job, else 0; the headline employment outcome. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| open_year | year | Park opening year (treated woredas) | Calendar year the woreda's park opened; missing for never-treated controls. | year | industrial_park_district_panel | Synthetic (this study) |
| paved_road_density | continuous | Paved-road density | Density of paved roads in the woreda; the significant road moderator. | km/km^2 (density) | industrial_park_district_panel | Synthetic (this study) |
| population_2007 | continuous | Population, 2007 baseline | Woreda population at the 2007 baseline. | persons | industrial_park_district_panel | Synthetic (this study) |
| post | dummy | Post-2017 indicator (naive 2x2) | 1 for years at/after the median opening year (2017), used to collapse the design into the naive 2x2. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| primary_road_density | continuous | Primary-road density | Density of primary roads in the woreda; a road-access moderator. | km/km^2 (density) | industrial_park_district_panel | Synthetic (this study) |
| public_park | dummy | Publicly developed park (1=yes) | 1 if the park is publicly (government) developed, else 0 (park-type context). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| region | identifier | Region (regional state) name | Ethiopian regional state the woreda belongs to. | string | industrial_park_district_panel | Synthetic (this study) |
| region_id | identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | code | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| savings_account | dummy | Owns a savings account (1=yes) | 1 if the woman owns a savings/bank account, else 0 (financial-inclusion outcome, women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| sex | dummy | Sex (1=female, 0=male) | Respondent sex; the heterogeneity split for the employment/empowerment climax (1 = women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| share_amharic_2007 | continuous | Amharic-speaking share, 2007 baseline | Share of the woreda population speaking Amharic at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| share_christian_2007 | continuous | Christian population share, 2007 baseline | Share of the woreda population that is Christian at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| slope | continuous | Terrain slope | Woreda mean terrain slope. | degrees | industrial_park_district_panel | Synthetic (this study) |
| survey_round | year | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | year | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| survey_weight | continuous | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | weight | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| transport_project | dummy | Linked transport project (1=yes) | 1 if the woreda has an associated transport project, else 0 (context indicator). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| treated | dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 0/1 | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| treatment | dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 0/1 | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| urbanization_rate_2007 | continuous | Urbanization rate, 2007 baseline | Share of the woreda's population in urban areas at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| wealth_index | continuous | DHS wealth index (standardized) | Composite standardized household wealth index; effects read in standard deviations. | SD (z-score) | industrial_park_household_rcs | Synthetic (this study) |
| year | year | Calendar year | Annual time index of the satellite panel. | year | industrial_park_district_panel | Synthetic (this study) |
Cross-file variable index
Which file each variable appears in (● = present).
| Variable | industrial_park_district_panel | industrial_park_household_rcs | industrial_park_individual_rcs |
|---|---|---|---|
| age | | | ● |
| age_head | | ● | ● |
| age_sq | | | ● |
| china_aid | ● | | |
| decision_power | | | ● |
| dist_addis_km | ● | | |
| dist_nearest_city_km | ● | | |
| dist_state_capital_km | ● | | |
| district_id | ● | ● | ● |
| district_name | ● | | |
| durable_goods_pc | | ● | |
| dv_accept | | | ● |
| dv_arguing | | | ● |
| dv_food | | | ● |
| dv_goingout | | | ● |
| dv_kids | | | ● |
| dv_sex | | | ● |
| elevation | ● | | |
| employment_rate_2007 | ● | | |
| event_phase | | ● | ● |
| event_time | ● | | |
| hh_id | | ● | |
| hh_size | | ● | ● |
| housing_quality | | ● | |
| ihs_light | ● | | |
| impervious_ratio | ● | | |
| ind_id | | | ● |
| labor_intensive_park | ● | | |
| latitude | ● | | |
| light_intensity | ● | | |
| light_positive | ● | | |
| log_pop_density_2007 | ● | | |
| longitude | ● | | |
| nearby | ● | | |
| nonag_employment | | | ● |
| open_year | ● | | |
| paved_road_density | ● | | |
| population_2007 | ● | | |
| post | ● | | |
| primary_road_density | ● | | |
| public_park | ● | | |
| region | ● | | |
| region_id | ● | ● | ● |
| savings_account | | | ● |
| sex | | | ● |
| share_amharic_2007 | ● | | |
| share_christian_2007 | ● | | |
| slope | ● | | |
| survey_round | | ● | ● |
| survey_weight | | ● | ● |
| transport_project | ● | | |
| treated | ● | ● | ● |
| treatment | ● | ● | ● |
| urbanization_rate_2007 | ● | | |
| wealth_index | | ● | |
| year | ● | | |
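Because `district_id` appears in all three files, district-level covariates can be attached to survey rows through it. The following is a minimal sketch on hypothetical toy records: the values are invented and only the column names come from the tables above. It also assumes `dist_nearest_city_km` is constant within a woreda.

```python
# Hypothetical toy rows; values are invented, only the column names follow the files.
district_panel = [
    {"district_id": "ET_D001", "year": 2005, "dist_nearest_city_km": 12.5},
    {"district_id": "ET_D001", "year": 2006, "dist_nearest_city_km": 12.5},
    {"district_id": "ET_D002", "year": 2005, "dist_nearest_city_km": 80.0},
]
households = [
    {"hh_id": "HH_000000", "survey_round": 2011, "district_id": "ET_D002"},
    {"hh_id": "HH_000001", "survey_round": 2016, "district_id": "ET_D001"},
]

# Assumed time-invariant covariate, so one value per district suffices.
city_distance = {row["district_id"]: row["dist_nearest_city_km"] for row in district_panel}

# Left-join style merge: keep every household, look up its district's covariate.
merged = [
    {**hh, "dist_nearest_city_km": city_distance.get(hh["district_id"])}
    for hh in households
]

for row in merged:
    print(row["hh_id"], row["district_id"], row["dist_nearest_city_km"])
```

The join key is the district, not the respondent: `hh_id` and `ind_id` are per-round identifiers, so linking survey rows across rounds by them would not be meaningful.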
Construction & formulas
All estimators target the ATT — the average park effect on the treated woredas, identified under parallel trends.
- Static TWFE DiD (satellite panel): `Y_dt = β·D_dt + α_d + γ_{r(d),t} + ε_dt`, where `α_d` is a woreda fixed effect, `γ_{r(d),t}` a region×year fixed effect, and `β` the ATT. The "with-trends" spec adds `t_*` interactions of centred time (year − 2012) with 2007 baseline characteristics, letting each woreda follow its own linear trend as a function of those characteristics.
- Event study: `Y_dt = Σ_{k≠−1} δ_k·1[t−g = k] + α_d + γ_{r(d),t} + ε_dt`, with one coefficient per event time `k` (years since the opening year `g`), normalized to `k = −1`.
- Repeated-cross-section DiD (DHS household/individual): there is no respondent panel, so the effect is identified off district×round means: `Y ~ treatment | district_id + region_id^survey_round`, DHS survey-weighted and district-clustered, with only coarse event phases.
- Goodman-Bacon: decomposes TWFE into weighted 2×2 comparisons (treated-vs-never, earlier-vs-later, forbidden later-vs-earlier).
- Conley spatial-HAC SE: hardens the panel SE for spatial (nearby woredas in a year) and serial dependence; the point estimate is unchanged.
- IHS transform: `ihs_light = asinh(light_intensity)`, a log-like transform that admits zeros; coefficients read approximately as proportional changes.
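To make the IHS transform concrete, the sketch below evaluates `asinh` at zero, at a large value, and at a small value using only Python's standard library. The inputs are arbitrary illustrations, not values from the files, and `ihs` is an illustrative wrapper name.

```python
import math

def ihs(x):
    """Inverse hyperbolic sine: asinh(x) = ln(x + sqrt(x^2 + 1))."""
    return math.asinh(x)

# Defined at zero, where log(x) is not.
print(ihs(0.0))

# For large x, asinh(x) is close to ln(2x) = ln(x) + ln(2), so differences in
# IHS values behave like differences in logs.
x = 100.0
print(ihs(x), math.log(2 * x))

# For small x, asinh(x) is close to x itself, so the log-like reading weakens.
print(ihs(0.01))
```

The proportional-change reading is therefore an approximation that is most accurate at larger luminosity values and weaker near zero.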
Synthetic data-generating process. The three CSVs are 100% synthetic, calibrated so re-running the paper's regressions reproduces its findings (signs, significance stars, approximate magnitudes). A deliberate bright-base device keeps treated park-cities intrinsically much brighter than rural controls (a level the district fixed effect absorbs); spatial and serial shocks are injected so standard errors behave realistically without moving the point estimates.
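The static TWFE regression can be illustrated without an estimation library. The sketch below is a simplified stand-in, not the tutorial's pyfixest code or its DGP: it builds a hypothetical noise-free toy panel with a known effect (`TRUE_BETA`, an invented value), uses plain year fixed effects instead of region×year, and recovers β by two-way demeaning, which coincides with the dummy-variable regression in a balanced panel.

```python
from statistics import mean

TRUE_BETA = 0.5                                           # invented effect size for the toy example
YEARS = list(range(2005, 2021))
OPEN_YEAR = {"A": 2014, "B": 2017, "C": None, "D": None}  # staggered openings; C and D never treated
UNIT_EFFECT = {"A": 2.0, "B": 1.5, "C": 0.2, "D": 0.1}    # treated units start brighter than controls

def build_panel():
    """Noise-free outcomes: y = unit effect + year effect + TRUE_BETA * treatment."""
    panel = {}
    for unit, opened in OPEN_YEAR.items():
        for year in YEARS:
            d = 1.0 if opened is not None and year >= opened else 0.0
            y = UNIT_EFFECT[unit] + 0.01 * (year - 2005) + TRUE_BETA * d
            panel[(unit, year)] = (y, d)
    return panel

def two_way_demean(values):
    """Subtract unit and year means, add back the grand mean (balanced panels only)."""
    units = {u for u, _ in values}
    unit_mean = {u: mean(values[(u, t)] for t in YEARS) for u in units}
    year_mean = {t: mean(values[(u, t)] for u in units) for t in YEARS}
    grand = mean(values.values())
    return {k: v - unit_mean[k[0]] - year_mean[k[1]] + grand for k, v in values.items()}

panel = build_panel()
y_tilde = two_way_demean({k: v[0] for k, v in panel.items()})
d_tilde = two_way_demean({k: v[1] for k, v in panel.items()})

# OLS slope of demeaned y on demeaned d is the TWFE coefficient.
beta_hat = sum(d_tilde[k] * y_tilde[k] for k in panel) / sum(d ** 2 for d in d_tilde.values())
print(round(beta_hat, 6))
```

The large level gap between treated and control units is absorbed by the unit means, which mirrors the role of the district fixed effect in the bright-base device. Because the toy effect is homogeneous, TWFE recovers it here; with heterogeneous effects under staggered timing, later-vs-earlier comparisons can contaminate the estimate, which is what the Goodman-Bacon decomposition inspects.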
The datasets
Each dataset has its own variable dictionary and a statistics table with data coverage.
Variable dictionary: industrial_park_district_panel
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
district_id identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
district_name identifier | Woreda (district) name | Human-readable name for the woreda (real Ethiopian district names used as labels). | Assigned from a name scaffold; calibrated to the paper's park geography. | string | Synthetic (this study) | district panel |
region identifier | Region (regional state) name | Ethiopian regional state the woreda belongs to. | Assigned per woreda (e.g. Oromia, Addis Ababa, Amhara, Tigray, Sidama). | string | Synthetic (this study) | district panel |
region_id identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
treated dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
open_year year | Park opening year (treated woredas) | Calendar year the woreda's park opened; missing for never-treated controls. | Staggered rollout: 2008 anchor, then 2014-2020 build-out (2-3 woredas/year). | year | Synthetic (this study) | district panel (treated only) |
treatment dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
nearby dummy | Control woreda near an open park (<=10 km) | 1 for a control woreda within 10 km of an operational park (spillover/SUTVA test), else 0. | 1 if a never-treated woreda lies within 10 km of any open park in that year. | 0/1 | Synthetic (this study) | district panel |
event_time continuous | Years relative to park opening | Year minus the woreda's open_year (event time k); the event-study time axis. | year - open_year for treated woredas. | years (k) | Synthetic (this study) | district panel (treated) |
year year | Calendar year | Annual time index of the satellite panel. | 2005-2020, balanced for all 139 woredas. | year | Synthetic (this study) | district panel |
post dummy | Post-2017 indicator (naive 2x2) | 1 for years at/after the median opening year (2017), used to collapse the design into the naive 2x2. | 1[year >= 2017]. | 0/1 | Synthetic (this study) | district panel |
light_intensity continuous | Raw nighttime light intensity | Untransformed mean nighttime luminosity of the woreda (VIIRS-like, calibrated). | Simulated; treated park-cities carry an intrinsically bright base (the bright-base device). | DN (light units) | Synthetic (this study) | district panel |
ihs_light continuous | IHS nighttime light (headline activity outcome) | Inverse hyperbolic sine of nighttime luminosity; a log-like transform that handles zeros. | asinh(light_intensity); coefficients read approximately as proportional changes. | asinh(DN) | Synthetic (this study) | district panel |
light_positive dummy | Woreda emits any light (1=positive) | 1 if the woreda has positive nighttime luminosity, else 0. | 1[light_intensity > 0]. | 0/1 | Synthetic (this study) | district panel |
impervious_ratio continuous | Built-up (impervious) land share | Share of the woreda's land that is built-up/impervious surface; observed only every five years. | Simulated impervious-surface ratio calibrated to the GISD30-style product. | 0-1 (share) | Synthetic (this study) | district panel (5-yearly) |
longitude continuous | Woreda longitude | Woreda centroid longitude (decimal degrees). | Assigned per woreda, calibrated to the paper's park geography. | degrees | Synthetic (this study) | district panel |
latitude continuous | Woreda latitude | Woreda centroid latitude (decimal degrees). | Assigned per woreda, calibrated to the paper's park geography. | degrees | Synthetic (this study) | district panel |
elevation continuous | Elevation (m) | Woreda mean elevation above sea level. | Assigned per woreda (time-invariant). | metres | Synthetic (this study) | district panel |
slope continuous | Terrain slope | Woreda mean terrain slope. | Assigned per woreda (time-invariant). | degrees | Synthetic (this study) | district panel |
dist_addis_km continuous | Distance to Addis Ababa (km) | Road/great-circle distance from the woreda to the capital, Addis Ababa. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
dist_state_capital_km continuous | Distance to regional-state capital (km) | Distance from the woreda to its regional-state capital. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
dist_nearest_city_km continuous | Distance to nearest city (km) | Distance from the woreda to the nearest city; the steepest effect moderator. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
urbanization_rate_2007 continuous | Urbanization rate, 2007 baseline | Share of the woreda's population in urban areas at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
employment_rate_2007 continuous | Employment rate, 2007 baseline | Share of the woreda's working-age population employed at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
log_pop_density_2007 continuous | Log population density, 2007 baseline | Natural log of population per km^2 at the 2007 baseline. | log of 2007 population density; interacted with centred time for the trends. | log persons/km^2 | Synthetic (this study) | district panel |
population_2007 continuous | Population, 2007 baseline | Woreda population at the 2007 baseline. | 2007 baseline population count. | persons | Synthetic (this study) | district panel |
primary_road_density continuous | Primary-road density | Density of primary roads in the woreda; a road-access moderator. | Assigned per woreda; positive interaction with treatment (amplifies the effect). | km/km^2 (density) | Synthetic (this study) | district panel |
paved_road_density continuous | Paved-road density | Density of paved roads in the woreda; the significant road moderator. | Assigned per woreda; positive interaction with treatment (amplifies the effect). | km/km^2 (density) | Synthetic (this study) | district panel |
share_christian_2007 continuous | Christian population share, 2007 baseline | Share of the woreda population that is Christian at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
share_amharic_2007 continuous | Amharic-speaking share, 2007 baseline | Share of the woreda population speaking Amharic at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
labor_intensive_park dummy | Labor-intensive park (1=yes) | 1 if the woreda's park is labor-intensive (textiles/garments), else 0 (park-type context). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
public_park dummy | Publicly developed park (1=yes) | 1 if the park is publicly (government) developed, else 0 (park-type context). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
china_aid dummy | Chinese-financed park (1=yes) | 1 if the park involves Chinese financing/aid, else 0 (context indicator). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
transport_project dummy | Linked transport project (1=yes) | 1 if the woreda has an associated transport project, else 0 (context indicator). | Assigned per woreda. | 0/1 | Synthetic (this study) | district panel |
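The construction rules for `treatment`, `event_time`, and `post` in the dictionary above can be sketched as plain functions. This is an illustrative reading of the stated formulas on hypothetical inputs, not the post's DGP script; `None` stands in for the missing `open_year` of never-treated controls.

```python
def treatment(treated, year, open_year):
    """1[treated == 1 and year >= open_year]; controls have no open_year."""
    return int(treated == 1 and open_year is not None and year >= open_year)

def event_time(year, open_year):
    """year - open_year for treated woredas; None for never-treated controls."""
    return None if open_year is None else year - open_year

def post(year, cutoff=2017):
    """1[year >= 2017], the naive 2x2 period split."""
    return int(year >= cutoff)

# Hypothetical woreda whose park opens in 2016, next to a never-treated control.
for year in (2014, 2015, 2016, 2017):
    print(
        year,
        treatment(1, year, 2016), event_time(year, 2016), post(year),
        treatment(0, year, None), event_time(year, None),
    )
```

The statistics reported for the district panel show `event_time` only between -5 and 5, so the released variable appears to be restricted to a window; this sketch does not attempt to reproduce that step.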
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
district_id | 100% | 2,224 | 139 | – | – | – | – | – |
district_name | 100% | 2,224 | 139 | – | – | – | – | – |
region | 100% | 2,224 | 12 | – | – | – | – | – |
region_id | 100% | 2,224 | 12 | – | – | – | – | – |
treated | 100% | 2,224 | 2 | 0 | 0.122 | 0 | 1.00 | 0.328 |
open_year | 12% | 272 | 8 | 2008 | 2016.4 | 2017 | 2020 | 2.79 |
treatment | 100% | 2,224 | 2 | 0 | 0.035 | 0 | 1.00 | 0.184 |
nearby | 100% | 2,224 | 2 | 0 | 0.009 | 0 | 1.00 | 0.094 |
event_time | 12% | 272 | 11 | -5.00 | -2.30 | -4.00 | 5.00 | 3.26 |
year | 100% | 2,224 | 16 | 2005 | 2012.5 | 2012 | 2020 | 4.61 |
post | 100% | 2,224 | 2 | 0 | 0.035 | 0 | 1.00 | 0.184 |
light_intensity | 100% | 2,224 | 997 | 0 | 0.668 | 1.00e-04 | 10.62 | 1.67 |
ihs_light | 100% | 2,224 | 998 | 0 | 0.352 | 1.00e-04 | 3.06 | 0.715 |
light_positive | 100% | 2,224 | 2 | 0 | 0.585 | 1.00 | 1.00 | 0.493 |
impervious_ratio | 25% | 556 | 548 | 0 | 0.032 | 0.031 | 0.085 | 0.014 |
longitude | 100% | 2,224 | 139 | 35.67 | 38.65 | 38.65 | 42.42 | 1.33 |
latitude | 100% | 2,224 | 139 | 2.49 | 8.90 | 8.97 | 13.55 | 1.80 |
elevation | 100% | 2,224 | 139 | 88.20 | 1,881.5 | 1,831.4 | 3,711.2 | 612.8 |
slope | 100% | 2,224 | 138 | 0 | 5.99 | 6.30 | 16.26 | 3.28 |
dist_addis_km | 100% | 2,224 | 132 | 0 | 212.6 | 213.5 | 545.7 | 121.8 |
dist_state_capital_km | 100% | 2,224 | 130 | 0 | 137.1 | 129.6 | 340.4 | 85.01 |
dist_nearest_city_km | 100% | 2,224 | 124 | 0 | 61.81 | 57.40 | 164.3 | 41.04 |
urbanization_rate_2007 | 100% | 2,224 | 102 | 0 | 0.233 | 0.229 | 0.979 | 0.209 |
employment_rate_2007 | 100% | 2,224 | 135 | 0.363 | 0.677 | 0.692 | 0.989 | 0.111 |
log_pop_density_2007 | 100% | 2,224 | 139 | 0.493 | 5.10 | 5.12 | 9.71 | 1.56 |
population_2007 | 100% | 2,224 | 139 | 2,619.0 | 1,030,765 | 268,727 | 26,334,578 | 2,927,471 |
primary_road_density | 100% | 2,224 | 135 | 0.020 | 0.844 | 0.553 | 4.13 | 0.825 |
paved_road_density | 100% | 2,224 | 136 | 0.003 | 0.730 | 0.441 | 3.92 | 0.744 |
share_christian_2007 | 100% | 2,224 | 137 | 0.043 | 0.635 | 0.672 | 0.990 | 0.215 |
share_amharic_2007 | 100% | 2,224 | 136 | 0.023 | 0.413 | 0.407 | 0.891 | 0.226 |
labor_intensive_park | 12% | 272 | 2 | 0 | 0.647 | 1.00 | 1.00 | 0.479 |
public_park | 12% | 272 | 2 | 0 | 0.941 | 1.00 | 1.00 | 0.236 |
china_aid | 100% | 2,224 | 2 | 0 | 0.115 | 0 | 1.00 | 0.319 |
transport_project | 100% | 2,224 | 2 | 0 | 0.101 | 0 | 1.00 | 0.301 |
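The dictionary gives the construction of the treatment switch as 1[treated == 1 and year >= open_year]. A minimal sketch of that rule on a toy panel follows; the rows and the `treatment_check` column name are hypothetical, and the sketch assumes `open_year` is missing for never-treated woredas, as its 12% coverage suggests.

```python
import numpy as np
import pandas as pd

# Toy district-year panel (hypothetical values, not the released data).
panel = pd.DataFrame({
    "district_id": ["ET_D001"] * 4 + ["ET_D002"] * 4,
    "year": [2015, 2016, 2017, 2018] * 2,
    "treated": [1] * 4 + [0] * 4,
    # Assumption: open_year is missing for never-treated districts.
    "open_year": [2017.0] * 4 + [np.nan] * 4,
})

# treatment = 1[treated == 1 and year >= open_year].
# A comparison against NaN evaluates to False, so never-treated rows get 0.
panel["treatment_check"] = (
    (panel["treated"] == 1) & (panel["year"] >= panel["open_year"])
).astype(int)

print(panel)
```

On the real panel, comparing such a rebuilt column with the shipped `treatment` column is a quick consistency check before estimating anything.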
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
hh_id identifier | Household identifier (per-round) | Synthetic per-round household identifier; NOT a panel key (each round samples different households). | Sequential codes HH_000000 .. across all rounds. | string | Synthetic (this study) | household RCS |
survey_round year | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | One of five rounds: 2000, 2005, 2011, 2016, 2019. | year | Synthetic (this study) | household & individual RCS |
district_id identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
region_id identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
treated dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
treatment dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
event_phase continuous | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | Survey round position minus the opening round, for treated districts. | phase (k) | Synthetic (this study) | household & individual RCS |
durable_goods_pc continuous | Durable goods per capita (standardized) | Standardized count of household durable goods per capita; a living-standards outcome. | Standardized score (mean ~0); ATT reads against a near-zero mean. | standardized | Synthetic (this study) | household RCS |
housing_quality dummy | Housing-quality indicator (1=meets all) | 1 if the household has electricity, piped water, a toilet, and a finished floor, else 0. | Composite 0/1 indicator over the four housing amenities. | 0/1 | Synthetic (this study) | household RCS |
wealth_index continuous | DHS wealth index (standardized) | Composite standardized household wealth index; effects read in standard deviations. | Standardized composite (mean ~0, SD ~1). | SD (z-score) | Synthetic (this study) | household RCS |
hh_size continuous | Household size | Number of members in the household; a demographic control. | Integer count, 1-12. | persons | Synthetic (this study) | household & individual RCS |
age_head continuous | Age of household head | Age of the household head (years); a demographic control. | Integer years, 18-90. | years | Synthetic (this study) | household & individual RCS |
survey_weight continuous | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | Calibrated to a DHS-style design (mean ~1). | weight | Synthetic (this study) | household & individual RCS |
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
hh_id | 100% | 13,200 | 13,200 | – | – | – | – | – |
survey_round | 100% | 13,200 | 5 | 2000 | 2011.9 | 2011 | 2019 | 6.36 |
district_id | 100% | 13,200 | 139 | – | – | – | – | – |
region_id | 100% | 13,200 | 12 | – | – | – | – | – |
treated | 100% | 13,200 | 2 | 0 | 0.189 | 0 | 1.00 | 0.392 |
treatment | 100% | 13,200 | 2 | 0 | 0.077 | 0 | 1.00 | 0.266 |
event_phase | 19% | 2,496 | 7 | -4.00 | -1.02 | -1.00 | 2.00 | 1.43 |
durable_goods_pc | 92% | 12,207 | 12,207 | -1.67 | 0.308 | 0.308 | 2.17 | 0.487 |
housing_quality | 92% | 12,206 | 2 | 0 | 0.307 | 0 | 1.00 | 0.461 |
wealth_index | 73% | 9,688 | 9,688 | -3.85 | -4.57e-04 | -0.011 | 3.41 | 1.02 |
hh_size | 100% | 13,200 | 12 | 1.00 | 4.95 | 5.00 | 12.00 | 1.81 |
age_head | 100% | 13,200 | 70 | 18.00 | 42.82 | 43.00 | 90.00 | 11.85 |
survey_weight | 100% | 13,200 | 8,249 | 0.135 | 0.999 | 0.956 | 2.81 | 0.353 |
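The means in this table are descriptive, while the dictionary says `survey_weight` is used to weight all RCS regressions. A minimal sketch of a survey-weighted group mean follows. The toy rows and the `weighted_mean` helper are hypothetical, and dropping rows with a missing outcome before weighting is an assumption about how the partial coverage (for example 92% for `housing_quality`) should be handled.

```python
import numpy as np
import pandas as pd

# Toy household rows (hypothetical values, not the released data).
hh = pd.DataFrame({
    "treated": [1, 1, 1, 0, 0, 0],
    "housing_quality": [1.0, 0.0, np.nan, 0.0, 1.0, 0.0],
    "survey_weight": [2.0, 1.0, 1.5, 1.0, 1.0, 2.0],
})

def weighted_mean(group, value="housing_quality", weight="survey_weight"):
    """Survey-weighted mean over rows where the outcome is observed."""
    observed = group.dropna(subset=[value])
    return np.average(observed[value], weights=observed[weight])

# One weighted mean per treatment group (hypothetical helper, see above).
weighted = {
    int(key): weighted_mean(group) for key, group in hh.groupby("treated")
}
print(weighted)
```

Because the weights average close to 1, weighted and unweighted means will often be similar, but they need not coincide within subgroups.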
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
ind_id identifier | Individual identifier (per-round) | Synthetic per-round individual identifier; NOT a panel key (each round samples different individuals). | Sequential codes IND_000000 .. across all rounds. | string | Synthetic (this study) | individual RCS |
survey_round year | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | One of five rounds: 2000, 2005, 2011, 2016, 2019. | year | Synthetic (this study) | household & individual RCS |
district_id identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
region_id identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
treated dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
treatment dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
event_phase continuous | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | Survey round position minus the opening round, for treated districts. | phase (k) | Synthetic (this study) | household & individual RCS |
sex dummy | Sex (1=female, 0=male) | Respondent sex; the heterogeneity split for the employment/empowerment climax (1 = women). | 1 for women, 0 for men. | 0/1 | Synthetic (this study) | individual RCS |
age continuous | Respondent age | Age of the individual respondent (years); a demographic control. | Integer years, 15-62. | years | Synthetic (this study) | individual RCS |
age_sq continuous | Respondent age squared | Square of respondent age; a demographic control for nonlinear age effects. | age^2. | years^2 | Synthetic (this study) | individual RCS |
nonag_employment dummy | Non-agricultural employment (1=yes) | 1 if the individual works in a non-agricultural job, else 0; the headline employment outcome. | 0/1 indicator; the average effect is null while the female effect is significant. | 0/1 | Synthetic (this study) | individual RCS |
decision_power dummy | Women's decision-making power (1=yes) | 1 if the woman participates in household decision-making, else 0 (empowerment outcome, women). | 0/1 indicator. | 0/1 | Synthetic (this study) | individual RCS (women) |
savings_account dummy | Owns a savings account (1=yes) | 1 if the woman owns a savings/bank account, else 0 (financial-inclusion outcome, women). | 0/1 indicator. | 0/1 | Synthetic (this study) | individual RCS (women) |
dv_accept dummy | Accepts domestic violence (composite, 1=yes) | 1 if the respondent justifies wife-beating in any of the five DHS scenarios, else 0 (gender-norms outcome). | Composite over the five dv_* sub-items; treatment lowers it. | 0/1 | Synthetic (this study) | individual RCS |
dv_goingout dummy | Justifies beating: going out without telling | DHS sub-item — 1 if wife-beating is justified for going out without telling the husband, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
dv_kids dummy | Justifies beating: neglecting the children | DHS sub-item — 1 if wife-beating is justified for neglecting the children, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
dv_arguing dummy | Justifies beating: arguing with husband | DHS sub-item — 1 if wife-beating is justified for arguing with the husband, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
dv_sex dummy | Justifies beating: refusing sex | DHS sub-item — 1 if wife-beating is justified for refusing to have sex, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
dv_food dummy | Justifies beating: burning the food | DHS sub-item — 1 if wife-beating is justified for burning the food, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
hh_size continuous | Household size | Number of members in the household; a demographic control. | Integer count, 1-12. | persons | Synthetic (this study) | household & individual RCS |
age_head continuous | Age of household head | Age of the household head (years); a demographic control. | Integer years, 18-90. | years | Synthetic (this study) | household & individual RCS |
survey_weight continuous | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | Calibrated to a DHS-style design (mean ~1). | weight | Synthetic (this study) | household & individual RCS |
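The dictionary describes `dv_accept` as 1 if the respondent justifies wife-beating in any of the five DHS scenarios. A minimal sketch of that composite follows. The toy rows and the `dv_accept_check` column name are hypothetical, and treating the composite as missing only when all five sub-items are missing is an assumption, not a documented rule of the released file.

```python
import numpy as np
import pandas as pd

DV_ITEMS = ["dv_goingout", "dv_kids", "dv_arguing", "dv_sex", "dv_food"]

# Toy respondents (hypothetical values, not the released data).
ind = pd.DataFrame(
    [
        [0, 0, 0, 0, 0],   # justifies none -> 0
        [0, 1, 0, 0, 0],   # justifies one  -> 1
        [1, 1, 1, 1, 1],   # justifies all  -> 1
        [np.nan] * 5,      # nothing answered -> missing
    ],
    columns=DV_ITEMS,
    dtype="float",
)

any_yes = ind[DV_ITEMS].eq(1).any(axis=1)
none_answered = ind[DV_ITEMS].isna().all(axis=1)

# Assumption: the composite is missing only when every sub-item is missing.
ind["dv_accept_check"] = any_yes.astype(float).mask(none_answered)

print(ind)
```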
Distribution & statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
ind_id | 100% | 17,900 | 17,900 | – | – | – | – | – |
survey_round | 100% | 17,900 | 5 | 2000 | 2011.9 | 2011 | 2019 | 6.36 |
district_id | 100% | 17,900 | 139 | – | – | – | – | – |
region_id | 100% | 17,900 | 12 | – | – | – | – | – |
treated | 100% | 17,900 | 2 | 0 | 0.168 | 0 | 1.00 | 0.374 |
treatment | 100% | 17,900 | 2 | 0 | 0.068 | 0 | 1.00 | 0.252 |
event_phase | 17% | 3,012 | 7 | -4.00 | -1.00 | -1.00 | 2.00 | 1.41 |
sex | 100% | 17,900 | 2 | 0 | 0.656 | 1.00 | 1.00 | 0.475 |
age | 100% | 17,900 | 47 | 15.00 | 30.21 | 30.00 | 62.00 | 8.63 |
age_sq | 100% | 17,900 | 47 | 225.0 | 987.0 | 900.0 | 3,844.0 | 543.3 |
nonag_employment | 96% | 17,219 | 2 | 0 | 0.343 | 0 | 1.00 | 0.475 |
decision_power | 26% | 4,737 | 2 | 0 | 0.871 | 1.00 | 1.00 | 0.335 |
savings_account | 62% | 11,155 | 2 | 0 | 0.063 | 0 | 1.00 | 0.242 |
dv_accept | 62% | 11,109 | 2 | 0 | 0.635 | 1.00 | 1.00 | 0.481 |
dv_goingout | 62% | 11,064 | 2 | 0 | 0.450 | 0 | 1.00 | 0.498 |
dv_kids | 62% | 11,069 | 2 | 0 | 0.498 | 0 | 1.00 | 0.500 |
dv_arguing | 62% | 11,043 | 2 | 0 | 0.438 | 0 | 1.00 | 0.496 |
dv_sex | 60% | 10,818 | 2 | 0 | 0.370 | 0 | 1.00 | 0.483 |
dv_food | 62% | 11,068 | 2 | 0 | 0.428 | 0 | 1.00 | 0.495 |
hh_size | 100% | 17,900 | 12 | 1.00 | 4.96 | 5.00 | 12.00 | 1.81 |
age_head | 100% | 17,900 | 72 | 18.00 | 42.83 | 43.00 | 90.00 | 11.84 |
survey_weight | 100% | 17,900 | 9,616 | 0.126 | 0.997 | 0.958 | 2.92 | 0.349 |
Known limitations & caveats
- Synthetic data. The three CSVs are 100% synthetic and built for teaching. They are calibrated to reproduce the paper's findings, but they are not the paper's real (confidential) inputs: harmonized nighttime lights, the GISD30 impervious-surface product, Ethiopia DHS micro-data, and the official park list. Use them to learn the methods, not to draw conclusions about Ethiopia.
- Tiny treated group. There are only 17 treated woredas against 122 matched controls, so several effects are borderline and the moderators cannot all be estimated precisely at once (for example, the primary-road interaction is correctly signed but insignificant).
- Observational, not randomized. Parks were not randomly placed (they went to denser, more urban, better-connected woredas), so identification rests on parallel trends, not randomization; the fixed effects and trend terms are confounding controls, not precision-only adjustments.
- Documented synthetic gaps. The raw-light coefficient runs high (~1.6 vs the paper's 1.276, a bright-base artifact); light levels are intentionally not PSM-matched (the EDA figure is baseline-normalized); and the decision-power mean sits slightly below the paper's because of a linear-probability clipping ceiling. See the post's Section 13 reproduction audit.
- Repeated cross-sections. The household and individual files have no within-respondent panel key, so they admit only coarse event phases and survey-weighted regressions, never unit fixed effects.
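
To make the last caveat concrete, here is a minimal sketch of a survey-weighted difference-in-differences on a toy repeated cross-section, where each row is a different respondent and identification comes from group-by-period cells rather than unit fixed effects. All values are hypothetical. The single `post` indicator is a simplification of the staggered openings, and the sketch omits the district and round fixed effects and district-clustered standard errors that the dictionary implies the actual regressions use.

```python
import numpy as np

# Toy repeated cross-section (hypothetical values, not the released data).
# Each row is a different respondent; there is no person-level panel key.
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([0.2, 0.4, 0.9, 0.7, 0.3, 0.1, 0.4, 0.2])
w = np.array([1.0, 2.0, 1.0, 1.0, 1.5, 0.5, 1.0, 3.0])  # survey weights

# Design matrix: intercept, group, period, and the DiD interaction.
X = np.column_stack([np.ones_like(y), treated, post, treated * post])

# Weighted least squares: scale rows by sqrt(weight), then solve OLS.
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
did_wls = beta[3]

def cell_mean(g, p):
    """Survey-weighted outcome mean in one group-by-period cell."""
    m = (treated == g) & (post == p)
    return np.average(y[m], weights=w[m])

# In this saturated 2x2 model the interaction equals the DiD of cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(did_wls, did_means)
```

The two numbers agree because the model is saturated in group and period; once controls or fixed effects are added, the regression coefficient no longer reduces to four cell means.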