Interactive data dictionary

Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences

Companion data for a staggered DiD tutorial in Python, on synthetic data calibrated to Huang, Wang & Xu (2026).

3 datasets · 139 woredas · 17 treated / 122 control · 2005–2020 panel years

Downloads

Each dataset is available both as a labeled Stata .dta file and as its source CSV file.

Download all data (ZIP) · stata_codebook.do

| Dataset | Grain | Rows × columns | Stata | Source |
| --- | --- | --- | --- | --- |
| industrial_park_district_panel | district-year (woreda x year) | 2,224 × 34 | industrial_park_district_panel.dta | industrial_park_district_panel.csv |
| industrial_park_household_rcs | household-round (repeated cross-section) | 13,200 × 13 | industrial_park_household_rcs.dta | industrial_park_household_rcs.csv |
| industrial_park_individual_rcs | individual-round (repeated cross-section) | 17,900 × 22 | industrial_park_individual_rcs.dta | industrial_park_individual_rcs.csv |

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
use "${BASE}industrial_park_district_panel.dta", clear
describe
notes

Python

!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
df = pd.read_stata(BASE + "industrial_park_district_panel.dta")

# load every dataset at once
files = ["industrial_park_district_panel", "industrial_park_household_rcs", "industrial_park_individual_rcs"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "industrial_park_district_panel.dta", "industrial_park_district_panel.dta")
df, meta = pyreadstat.read_dta("industrial_park_district_panel.dta")

To run this snippet, paste it into an empty Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_industrial_park/data/"
df <- read_dta(paste0(BASE, "industrial_park_district_panel.dta"))

Overview & sources

Companion data for a beginner-friendly, end-to-end staggered difference-in-differences (DiD) tutorial that asks whether Ethiopia's industrial parks raised local economic activity, household living standards, and women's economic agency — and for whom. The tutorial replicates Huang, Wang & Xu (2026) on fully synthetic data spanning three grains: a satellite district-year panel of 139 woredas (2005–2020), and two Ethiopia DHS repeated cross-sections of households and individuals across five survey rounds. The estimand throughout is the average treatment effect on the treated (ATT) — the effect on the 17 park-hosting woredas relative to 122 propensity-score-matched never-treated controls — identified under parallel trends in an explicitly observational setting. The analysis runs a static two-way fixed-effects DiD and an event study with pyfixest, cross-checks them against the Sun-Abraham, Borusyak/Gardner and Callaway-Sant'Anna estimators plus a Goodman-Bacon decomposition, and runs survey-weighted repeated-cross-section DiD with Conley spatial standard errors.

Three files at three grains. industrial_park_district_panel is a balanced annual satellite panel (one row per woreda × year; 139 woredas × 16 years = 2,224 rows, 2005–2020) carrying the activity outcomes, the staggered treatment timing, geography, and 2007 baseline covariates. industrial_park_household_rcs and industrial_park_individual_rcs are DHS-style repeated cross-sections: each of the five rounds (2000, 2005, 2011, 2016, 2019) samples different respondents, so the files carry no within-respondent panel key and admit only coarse event phases and survey-weighted regressions, never respondent-level (unit) fixed effects.
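
The balanced-panel property can be checked mechanically. The sketch below is a minimal illustration in plain Python on a toy stand-in with the same shape as the district panel; the rows and the helper name `is_balanced` are hypothetical and are not part of the tutorial's script. A panel is balanced when every unit appears exactly once in every period, which is why 139 woredas × 16 years gives 2,224 rows.

```python
from itertools import product

def is_balanced(rows, unit_key, time_key):
    """Return True if every unit appears exactly once in every period."""
    units = {r[unit_key] for r in rows}
    periods = {r[time_key] for r in rows}
    seen = [(r[unit_key], r[time_key]) for r in rows]
    # No duplicate unit-period pairs, and the full unit x period grid is covered.
    return len(seen) == len(set(seen)) == len(units) * len(periods)

# Toy stand-in shaped like the district panel (identifiers only, no outcomes).
district_ids = [f"ET_D{i:03d}" for i in range(1, 140)]   # ET_D001 .. ET_D139
years = range(2005, 2021)                                 # 2005 .. 2020
panel = [{"district_id": d, "year": y} for d, y in product(district_ids, years)]

n_rows = len(panel)
balanced = is_balanced(panel, "district_id", "year")

# Dropping a single woreda-year breaks the balance.
unbalanced = is_balanced(panel[1:], "district_id", "year")
print(n_rows, balanced, unbalanced)
```

The same check would fail by design on the two repeated cross-sections, where hh_id and ind_id are per-round identifiers rather than panel keys.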

Data sources

| Source | Provides | Reference / URL |
| --- | --- | --- |
| Huang, Wang & Xu (2026) | Replicated study; calibration targets (coefficient magnitudes, signs, significance) and the real park geography shown for context | Huang, G., Wang, M., & Xu, H. (2026). The socioeconomic impacts of industrial parks in Ethiopia. Journal of Urban Economics. https://doi.org/10.1016/j.jue.2026.103867 |
| Synthetic (this study) | All values, simulated by a calibrated data-generating process so re-running the paper's regressions reproduces its findings (open & reproducible) | Mendez, C. (2026). See the post's Python script.py for the full DGP. |
| Ethiopia DHS (calibration reference) | Survey design and outcome definitions the household/individual cross-sections imitate (the real DHS micro-data are confidential and not used) | Ethiopia Demographic and Health Surveys (DHS), 2000–2019. The DHS Program, ICF / Ethiopian Public Health Institute. https://dhsprogram.com/ |
| Method references | Estimators and concepts | Callaway & Sant'Anna (2021); Sun & Abraham (2021); Borusyak, Jaravel & Spiess (2024); Goodman-Bacon (2021); Conley (1999). |

Cite this data

Please cite this dataset as follows.

APA

Mendez, C. (2026). Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences [Data set]. https://carlos-mendez.org/post/python_did_industrial_park/

Huang, G., Wang, M., & Xu, H. (2026). The socioeconomic impacts of industrial parks in Ethiopia. Journal of Urban Economics. https://doi.org/10.1016/j.jue.2026.103867

BibTeX

@misc{mendez2026pythondidindustrialpark,
  author       = {Mendez, Carlos},
  title        = {Do Industrial Parks Work? Evaluating Place-Based Policy in Ethiopia with Difference-in-Differences},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_did_industrial_park/}},
  note         = {Data set}
}

@article{huang2026industrial,
  author  = {Huang, Gordon and Wang, Mengjie and Xu, Hangtian},
  title   = {The socioeconomic impacts of industrial parks in Ethiopia},
  journal = {Journal of Urban Economics},
  year    = {2026},
  doi     = {10.1016/j.jue.2026.103867}
}

Variable explorer: all 56 variables

| Variable | Type | Distribution | Label | Definition | Units | In files | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| age | continuous | min 15, median 30, max 62 | Respondent age | Age of the individual respondent (years); a demographic control. | years | industrial_park_individual_rcs | Synthetic (this study) |
| age_head | continuous | min 18, median 43, max 90 | Age of household head | Age of the household head (years); a demographic control. | years | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| age_sq | continuous | min 225, median 900, max 3.84e+03 | Respondent age squared | Square of respondent age; a demographic control for nonlinear age effects. | years^2 | industrial_park_individual_rcs | Synthetic (this study) |
| china_aid | dummy | share coded 1 = 0.115 | Chinese-financed park (1=yes) | 1 if the park involves Chinese financing/aid, else 0 (context indicator). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| decision_power | dummy | share coded 1 = 0.871 | Women's decision-making power (1=yes) | 1 if the woman participates in household decision-making, else 0 (empowerment outcome, women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dist_addis_km | continuous | min 0, median 213, max 546 | Distance to Addis Ababa (km) | Road/great-circle distance from the woreda to the capital, Addis Ababa. | km | industrial_park_district_panel | Synthetic (this study) |
| dist_nearest_city_km | continuous | min 0, median 57.4, max 164 | Distance to nearest city (km) | Distance from the woreda to the nearest city; the steepest effect moderator. | km | industrial_park_district_panel | Synthetic (this study) |
| dist_state_capital_km | continuous | min 0, median 130, max 340 | Distance to regional-state capital (km) | Distance from the woreda to its regional-state capital. | km | industrial_park_district_panel | Synthetic (this study) |
| district_id | identifier | | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | string | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| district_name | identifier | | Woreda (district) name | Human-readable name for the woreda (real Ethiopian district names used as labels). | string | industrial_park_district_panel | Synthetic (this study) |
| durable_goods_pc | continuous | min -1.67, median 0.308, max 2.17 | Durable goods per capita (standardized) | Standardized count of household durable goods per capita; a living-standards outcome. | standardized | industrial_park_household_rcs | Synthetic (this study) |
| dv_accept | dummy | share coded 1 = 0.635 | Accepts domestic violence (composite, 1=yes) | 1 if the respondent justifies wife-beating in any of the five DHS scenarios, else 0 (gender-norms outcome). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_arguing | dummy | share coded 1 = 0.438 | Justifies beating: arguing with husband | DHS sub-item: 1 if wife-beating is justified for arguing with the husband, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_food | dummy | share coded 1 = 0.428 | Justifies beating: burning the food | DHS sub-item: 1 if wife-beating is justified for burning the food, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_goingout | dummy | share coded 1 = 0.450 | Justifies beating: going out without telling | DHS sub-item: 1 if wife-beating is justified for going out without telling the husband, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_kids | dummy | share coded 1 = 0.498 | Justifies beating: neglecting the children | DHS sub-item: 1 if wife-beating is justified for neglecting the children, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| dv_sex | dummy | share coded 1 = 0.370 | Justifies beating: refusing sex | DHS sub-item: 1 if wife-beating is justified for refusing to have sex, else 0. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| elevation | continuous | min 88.2, median 1.83e+03, max 3.71e+03 | Elevation (m) | Woreda mean elevation above sea level. | metres | industrial_park_district_panel | Synthetic (this study) |
| employment_rate_2007 | continuous | min 0.363, median 0.692, max 0.989 | Employment rate, 2007 baseline | Share of the woreda's working-age population employed at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| event_phase | continuous | min -4, median -1, max 2 | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | phase (k) | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| event_time | continuous | min -5, median -4, max 5 | Years relative to park opening | Year minus the woreda's open_year (event time k); the event-study time axis. | years (k) | industrial_park_district_panel | Synthetic (this study) |
| hh_id | identifier | | Household identifier (per-round) | Synthetic per-round household identifier; NOT a panel key (each round samples different households). | string | industrial_park_household_rcs | Synthetic (this study) |
| hh_size | continuous | min 1, median 5, max 12 | Household size | Number of members in the household; a demographic control. | persons | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| housing_quality | dummy | share coded 1 = 0.307 | Housing-quality indicator (1=meets all) | 1 if the household has electricity, piped water, a toilet, and a finished floor, else 0. | 0/1 | industrial_park_household_rcs | Synthetic (this study) |
| ihs_light | continuous | min 0, median 0.0001, max 3.06 | IHS nighttime light (headline activity outcome) | Inverse hyperbolic sine of nighttime luminosity; a log-like transform that handles zeros. | asinh(DN) | industrial_park_district_panel | Synthetic (this study) |
| impervious_ratio | continuous | min 0, median 0.0313, max 0.0848 | Built-up (impervious) land share | Share of the woreda's land that is built-up/impervious surface; observed only every five years. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| ind_id | identifier | | Individual identifier (per-round) | Synthetic per-round individual identifier; NOT a panel key (each round samples different individuals). | string | industrial_park_individual_rcs | Synthetic (this study) |
| labor_intensive_park | dummy | share coded 1 = 0.647 | Labor-intensive park (1=yes) | 1 if the woreda's park is labor-intensive (textiles/garments), else 0 (park-type context). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| latitude | continuous | min 2.49, median 8.97, max 13.5 | Woreda latitude | Woreda centroid latitude (decimal degrees). | degrees | industrial_park_district_panel | Synthetic (this study) |
| light_intensity | continuous | min 0, median 0.0001, max 10.6 | Raw nighttime light intensity | Untransformed mean nighttime luminosity of the woreda (VIIRS-like, calibrated). | DN (light units) | industrial_park_district_panel | Synthetic (this study) |
| light_positive | dummy | share coded 1 = 0.585 | Woreda emits any light (1=positive) | 1 if the woreda has positive nighttime luminosity, else 0. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| log_pop_density_2007 | continuous | min 0.493, median 5.12, max 9.71 | Log population density, 2007 baseline | Natural log of population per km^2 at the 2007 baseline. | log persons/km^2 | industrial_park_district_panel | Synthetic (this study) |
| longitude | continuous | min 35.7, median 38.7, max 42.4 | Woreda longitude | Woreda centroid longitude (decimal degrees). | degrees | industrial_park_district_panel | Synthetic (this study) |
| nearby | dummy | share coded 1 = 0.009 | Control woreda near an open park (<=10 km) | 1 for a control woreda within 10 km of an operational park (spillover/SUTVA test), else 0. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| nonag_employment | dummy | share coded 1 = 0.343 | Non-agricultural employment (1=yes) | 1 if the individual works in a non-agricultural job, else 0; the headline employment outcome. | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| open_year | year | | Park opening year (treated woredas) | Calendar year the woreda's park opened; missing for never-treated controls. | year | industrial_park_district_panel | Synthetic (this study) |
| paved_road_density | continuous | min 0.0034, median 0.441, max 3.92 | Paved-road density | Density of paved roads in the woreda; the significant road moderator. | km/km^2 (density) | industrial_park_district_panel | Synthetic (this study) |
| population_2007 | continuous | min 2.62e+03, median 2.69e+05, max 2.63e+07 | Population, 2007 baseline | Woreda population at the 2007 baseline. | persons | industrial_park_district_panel | Synthetic (this study) |
| post | dummy | share coded 1 = 0.035 | Post-2017 indicator (naive 2x2) | 1 for years at/after the median opening year (2017), used to collapse the design into the naive 2x2. | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| primary_road_density | continuous | min 0.02, median 0.553, max 4.13 | Primary-road density | Density of primary roads in the woreda; a road-access moderator. | km/km^2 (density) | industrial_park_district_panel | Synthetic (this study) |
| public_park | dummy | share coded 1 = 0.941 | Publicly developed park (1=yes) | 1 if the park is publicly (government) developed, else 0 (park-type context). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| region | identifier | | Region (regional state) name | Ethiopian regional state the woreda belongs to. | string | industrial_park_district_panel | Synthetic (this study) |
| region_id | identifier | | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | code | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| savings_account | dummy | share coded 1 = 0.063 | Owns a savings account (1=yes) | 1 if the woman owns a savings/bank account, else 0 (financial-inclusion outcome, women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| sex | dummy | share coded 1 = 0.656 | Sex (1=female, 0=male) | Respondent sex; the heterogeneity split for the employment/empowerment climax (1 = women). | 0/1 | industrial_park_individual_rcs | Synthetic (this study) |
| share_amharic_2007 | continuous | min 0.0227, median 0.407, max 0.891 | Amharic-speaking share, 2007 baseline | Share of the woreda population speaking Amharic at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| share_christian_2007 | continuous | min 0.0432, median 0.672, max 0.99 | Christian population share, 2007 baseline | Share of the woreda population that is Christian at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| slope | continuous | min 0, median 6.3, max 16.3 | Terrain slope | Woreda mean terrain slope. | degrees | industrial_park_district_panel | Synthetic (this study) |
| survey_round | year | | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | year | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| survey_weight | continuous | min 0.135, median 0.956, max 2.81 | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | weight | industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| transport_project | dummy | share coded 1 = 0.101 | Linked transport project (1=yes) | 1 if the woreda has an associated transport project, else 0 (context indicator). | 0/1 | industrial_park_district_panel | Synthetic (this study) |
| treated | dummy | share coded 1 = 0.122 | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 0/1 | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| treatment | dummy | share coded 1 = 0.035 | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 0/1 | industrial_park_district_panel, industrial_park_household_rcs, industrial_park_individual_rcs | Synthetic (this study) |
| urbanization_rate_2007 | continuous | min 0, median 0.229, max 0.979 | Urbanization rate, 2007 baseline | Share of the woreda's population in urban areas at the 2007 baseline. | 0-1 (share) | industrial_park_district_panel | Synthetic (this study) |
| wealth_index | continuous | min -3.85, median -0.0114, max 3.41 | DHS wealth index (standardized) | Composite standardized household wealth index; effects read in standard deviations. | SD (z-score) | industrial_park_household_rcs | Synthetic (this study) |
| year | year | | Calendar year | Annual time index of the satellite panel. | year | industrial_park_district_panel | Synthetic (this study) |

Construction & formulas

All estimators target the ATT — the average park effect on the treated woredas, identified under parallel trends.

Synthetic data-generating process. The three CSVs are 100% synthetic, calibrated so re-running the paper's regressions reproduces its findings (signs, significance stars, approximate magnitudes). A deliberate bright-base device keeps treated park-cities intrinsically much brighter than rural controls (a level the district fixed effect absorbs); spatial and serial shocks are injected so standard errors behave realistically without moving the point estimates.
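
To see why a bright base level is harmless for DiD, consider a minimal sketch in plain Python. It is not the post's DGP (that lives in script.py); the parameter values, the names, and the noise-free two-period setup are all assumptions chosen for illustration. Treated units start much brighter than controls, both groups share a common trend, and treated units gain an extra effect after opening. A naive post-period comparison of levels mixes the base gap with the effect, while the difference-in-differences removes the base gap.

```python
# Hypothetical parameters, chosen only for illustration.
BASE_TREATED = 2.0   # bright base level of park-cities
BASE_CONTROL = 0.2   # dim base level of rural controls
COMMON_TREND = 0.1   # change shared by both groups (parallel trends)
TRUE_ATT = 0.3       # extra post-period change for treated units

def mean(xs):
    return sum(xs) / len(xs)

# Two periods (pre, post), noise-free so the arithmetic stays transparent.
treated_pre = [BASE_TREATED] * 17
treated_post = [BASE_TREATED + COMMON_TREND + TRUE_ATT] * 17
control_pre = [BASE_CONTROL] * 122
control_post = [BASE_CONTROL + COMMON_TREND] * 122

# Naive post-period level comparison: contaminated by the base gap.
naive_gap = mean(treated_post) - mean(control_post)

# 2x2 DiD: differencing over time removes each group's base level.
did = (mean(treated_post) - mean(treated_pre)) - (mean(control_post) - mean(control_pre))

print(round(naive_gap, 3), round(did, 3))
```

In a regression with district fixed effects the same cancellation happens through the fixed effect, which absorbs any time-invariant level difference.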

The datasets

Each dataset below is documented with its full variable dictionary plus a statistics table showing distribution summaries and data coverage.

industrial_park_district_panel: district-year (woreda x year) · 2,224 × 34 · 2005-2020 · 139 woredas (17 treated, 122 matched controls); balanced

Panel key: district_id x year · Used for: static TWFE DiD, event study, modern staggered estimators, heterogeneity, spillover, and Conley-SE robustness on satellite outcomes.

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| district_id | identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
| district_name | identifier | Woreda (district) name | Human-readable name for the woreda (real Ethiopian district names used as labels). | Assigned from a name scaffold; calibrated to the paper's park geography. | string | Synthetic (this study) | district panel |
| region | identifier | Region (regional state) name | Ethiopian regional state the woreda belongs to. | Assigned per woreda (e.g. Oromia, Addis Ababa, Amhara, Tigray, Sidama). | string | Synthetic (this study) | district panel |
| region_id | identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
| treated | dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
| open_year | year | Park opening year (treated woredas) | Calendar year the woreda's park opened; missing for never-treated controls. | Staggered rollout: 2008 anchor, then 2014-2020 build-out (2-3 woredas/year). | year | Synthetic (this study) | district panel (treated only) |
| treatment | dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
| nearby | dummy | Control woreda near an open park (<=10 km) | 1 for a control woreda within 10 km of an operational park (spillover/SUTVA test), else 0. | 1 if a never-treated woreda lies within 10 km of any open park in that year. | 0/1 | Synthetic (this study) | district panel |
| event_time | continuous | Years relative to park opening | Year minus the woreda's open_year (event time k); the event-study time axis. | year - open_year for treated woredas. | years (k) | Synthetic (this study) | district panel (treated) |
| year | year | Calendar year | Annual time index of the satellite panel. | 2005-2020, balanced for all 139 woredas. | year | Synthetic (this study) | district panel |
| post | dummy | Post-2017 indicator (naive 2x2) | 1 for years at/after the median opening year (2017), used to collapse the design into the naive 2x2. | 1[year >= 2017]. | 0/1 | Synthetic (this study) | district panel |
| light_intensity | continuous | Raw nighttime light intensity | Untransformed mean nighttime luminosity of the woreda (VIIRS-like, calibrated). | Simulated; treated park-cities carry an intrinsically bright base (the bright-base device). | DN (light units) | Synthetic (this study) | district panel |
| ihs_light | continuous | IHS nighttime light (headline activity outcome) | Inverse hyperbolic sine of nighttime luminosity; a log-like transform that handles zeros. | asinh(light_intensity); coefficients read approximately as proportional changes. | asinh(DN) | Synthetic (this study) | district panel |
| light_positive | dummy | Woreda emits any light (1=positive) | 1 if the woreda has positive nighttime luminosity, else 0. | 1[light_intensity > 0]. | 0/1 | Synthetic (this study) | district panel |
| impervious_ratio | continuous | Built-up (impervious) land share | Share of the woreda's land that is built-up/impervious surface; observed only every five years. | Simulated impervious-surface ratio calibrated to the GISD30-style product. | 0-1 (share) | Synthetic (this study) | district panel (5-yearly) |
| longitude | continuous | Woreda longitude | Woreda centroid longitude (decimal degrees). | Assigned per woreda, calibrated to the paper's park geography. | degrees | Synthetic (this study) | district panel |
| latitude | continuous | Woreda latitude | Woreda centroid latitude (decimal degrees). | Assigned per woreda, calibrated to the paper's park geography. | degrees | Synthetic (this study) | district panel |
| elevation | continuous | Elevation (m) | Woreda mean elevation above sea level. | Assigned per woreda (time-invariant). | metres | Synthetic (this study) | district panel |
| slope | continuous | Terrain slope | Woreda mean terrain slope. | Assigned per woreda (time-invariant). | degrees | Synthetic (this study) | district panel |
| dist_addis_km | continuous | Distance to Addis Ababa (km) | Road/great-circle distance from the woreda to the capital, Addis Ababa. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
| dist_state_capital_km | continuous | Distance to regional-state capital (km) | Distance from the woreda to its regional-state capital. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
| dist_nearest_city_km | continuous | Distance to nearest city (km) | Distance from the woreda to the nearest city; the steepest effect moderator. | Computed from woreda coordinates; a heterogeneity moderator. | km | Synthetic (this study) | district panel |
| urbanization_rate_2007 | continuous | Urbanization rate, 2007 baseline | Share of the woreda's population in urban areas at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
| employment_rate_2007 | continuous | Employment rate, 2007 baseline | Share of the woreda's working-age population employed at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
| log_pop_density_2007 | continuous | Log population density, 2007 baseline | Natural log of population per km^2 at the 2007 baseline. | log of 2007 population density; interacted with centred time for the trends. | log persons/km^2 | Synthetic (this study) | district panel |
| population_2007 | continuous | Population, 2007 baseline | Woreda population at the 2007 baseline. | 2007 baseline population count. | persons | Synthetic (this study) | district panel |
| primary_road_density | continuous | Primary-road density | Density of primary roads in the woreda; a road-access moderator. | Assigned per woreda; positive interaction with treatment (amplifies the effect). | km/km^2 (density) | Synthetic (this study) | district panel |
| paved_road_density | continuous | Paved-road density | Density of paved roads in the woreda; the significant road moderator. | Assigned per woreda; positive interaction with treatment (amplifies the effect). | km/km^2 (density) | Synthetic (this study) | district panel |
| share_christian_2007 | continuous | Christian population share, 2007 baseline | Share of the woreda population that is Christian at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
| share_amharic_2007 | continuous | Amharic-speaking share, 2007 baseline | Share of the woreda population speaking Amharic at the 2007 baseline. | 2007 baseline value; interacted with centred time for the unit-specific trends. | 0-1 (share) | Synthetic (this study) | district panel |
| labor_intensive_park | dummy | Labor-intensive park (1=yes) | 1 if the woreda's park is labor-intensive (textiles/garments), else 0 (park-type context). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
| public_park | dummy | Publicly developed park (1=yes) | 1 if the park is publicly (government) developed, else 0 (park-type context). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
| china_aid | dummy | Chinese-financed park (1=yes) | 1 if the park involves Chinese financing/aid, else 0 (context indicator). | Assigned per treated woreda; 0 for controls. | 0/1 | Synthetic (this study) | district panel |
| transport_project | dummy | Linked transport project (1=yes) | 1 if the woreda has an associated transport project, else 0 (context indicator). | Assigned per woreda. | 0/1 | Synthetic (this study) | district panel |
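
Several Construction entries in this dictionary are simple formulas. As a minimal sketch (plain Python, toy rows with made-up values; the helper `add_constructed` is hypothetical and is not the post's script), the following derives treatment, event_time, ihs_light, and light_positive from treated, open_year, year, and light_intensity as the dictionary states them. The shipped event_time column spans only -5 to 5, so it may be windowed in a way this sketch does not reproduce.

```python
import math

def add_constructed(row):
    """Add the formula-based columns to one district-year row (a dict)."""
    out = dict(row)
    is_treated = row["treated"] == 1 and row["open_year"] is not None
    # treatment = 1[treated == 1 and year >= open_year]
    out["treatment"] = int(is_treated and row["year"] >= row["open_year"])
    # event_time = year - open_year, defined for treated woredas only
    out["event_time"] = row["year"] - row["open_year"] if is_treated else None
    # ihs_light = asinh(light_intensity); defined at zero, unlike log
    out["ihs_light"] = math.asinh(row["light_intensity"])
    # light_positive = 1[light_intensity > 0]
    out["light_positive"] = int(row["light_intensity"] > 0)
    return out

# Toy rows (hypothetical values): one treated woreda before/after opening, one control.
rows = [
    {"district_id": "ET_D001", "year": 2015, "treated": 1, "open_year": 2017, "light_intensity": 1.5},
    {"district_id": "ET_D001", "year": 2018, "treated": 1, "open_year": 2017, "light_intensity": 2.5},
    {"district_id": "ET_D002", "year": 2018, "treated": 0, "open_year": None, "light_intensity": 0.0},
]
built = [add_constructed(r) for r in rows]
for r in built:
    print(r["district_id"], r["year"], r["treatment"], r["event_time"],
          round(r["ihs_light"], 3), r["light_positive"])
```

For large values asinh(x) is close to ln(2x), which is why coefficients on ihs_light read approximately as proportional changes; the approximation is weaker for luminosity values near zero.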

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| district_id | | 100% | 2,224 | 139 | | | | | |
| district_name | | 100% | 2,224 | 139 | | | | | |
| region | | 100% | 2,224 | 12 | | | | | |
| region_id | | 100% | 2,224 | 12 | | | | | |
| treated | share coded 1 = 0.122 | 100% | 2,224 | 2 | 0 | 0.122 | 0 | 1.00 | 0.328 |
| open_year | | 12% | 272 | 8 | 2008 | 2016.4 | 2017 | 2020 | 2.79 |
| treatment | share coded 1 = 0.035 | 100% | 2,224 | 2 | 0 | 0.035 | 0 | 1.00 | 0.184 |
| nearby | share coded 1 = 0.009 | 100% | 2,224 | 2 | 0 | 0.009 | 0 | 1.00 | 0.094 |
| event_time | min -5, median -4, max 5 | 12% | 272 | 11 | -5.00 | -2.30 | -4.00 | 5.00 | 3.26 |
| year | | 100% | 2,224 | 16 | 2005 | 2012.5 | 2012 | 2020 | 4.61 |
| post | share coded 1 = 0.035 | 100% | 2,224 | 2 | 0 | 0.035 | 0 | 1.00 | 0.184 |
| light_intensity | min 0, median 0.0001, max 10.6 | 100% | 2,224 | 997 | 0 | 0.668 | 1.00e-04 | 10.62 | 1.67 |
| ihs_light | min 0, median 0.0001, max 3.06 | 100% | 2,224 | 998 | 0 | 0.352 | 1.00e-04 | 3.06 | 0.715 |
| light_positive | share coded 1 = 0.585 | 100% | 2,224 | 2 | 0 | 0.585 | 1.00 | 1.00 | 0.493 |
| impervious_ratio | min 0, median 0.0313, max 0.0848 | 25% | 556 | 548 | 0 | 0.032 | 0.031 | 0.085 | 0.014 |
| longitude | min 35.7, median 38.7, max 42.4 | 100% | 2,224 | 139 | 35.67 | 38.65 | 38.65 | 42.42 | 1.33 |
| latitude | min 2.49, median 8.97, max 13.5 | 100% | 2,224 | 139 | 2.49 | 8.90 | 8.97 | 13.55 | 1.80 |
| elevation | min 88.2, median 1.83e+03, max 3.71e+03 | 100% | 2,224 | 139 | 88.20 | 1,881.5 | 1,831.4 | 3,711.2 | 612.8 |
| slope | min 0, median 6.3, max 16.3 | 100% | 2,224 | 138 | 0 | 5.99 | 6.30 | 16.26 | 3.28 |
| dist_addis_km | min 0, median 213, max 546 | 100% | 2,224 | 132 | 0 | 212.6 | 213.5 | 545.7 | 121.8 |
| dist_state_capital_km | min 0, median 130, max 340 | 100% | 2,224 | 130 | 0 | 137.1 | 129.6 | 340.4 | 85.01 |
| dist_nearest_city_km | min 0, median 57.4, max 164 | 100% | 2,224 | 124 | 0 | 61.81 | 57.40 | 164.3 | 41.04 |
| urbanization_rate_2007 | min 0, median 0.229, max 0.979 | 100% | 2,224 | 102 | 0 | 0.233 | 0.229 | 0.979 | 0.209 |
| employment_rate_2007 | min 0.363, median 0.692, max 0.989 | 100% | 2,224 | 135 | 0.363 | 0.677 | 0.692 | 0.989 | 0.111 |
| log_pop_density_2007 | min 0.493, median 5.12, max 9.71 | 100% | 2,224 | 139 | 0.493 | 5.10 | 5.12 | 9.71 | 1.56 |
| population_2007 | min 2.62e+03, median 2.69e+05, max 2.63e+07 | 100% | 2,224 | 139 | 2,619.0 | 1,030,765 | 268,727 | 26,334,578 | 2,927,471 |
| primary_road_density | min 0.02, median 0.553, max 4.13 | 100% | 2,224 | 135 | 0.020 | 0.844 | 0.553 | 4.13 | 0.825 |
| paved_road_density | min 0.0034, median 0.441, max 3.92 | 100% | 2,224 | 136 | 0.003 | 0.730 | 0.441 | 3.92 | 0.744 |
| share_christian_2007 | min 0.0432, median 0.672, max 0.99 | 100% | 2,224 | 137 | 0.043 | 0.635 | 0.672 | 0.990 | 0.215 |
| share_amharic_2007 | min 0.0227, median 0.407, max 0.891 | 100% | 2,224 | 136 | 0.023 | 0.413 | 0.407 | 0.891 | 0.226 |
| labor_intensive_park | share coded 1 = 0.647 | 12% | 272 | 2 | 0 | 0.647 | 1.00 | 1.00 | 0.479 |
| public_park | share coded 1 = 0.941 | 12% | 272 | 2 | 0 | 0.941 | 1.00 | 1.00 | 0.236 |
| china_aid | share coded 1 = 0.115 | 100% | 2,224 | 2 | 0 | 0.115 | 0 | 1.00 | 0.319 |
| transport_project | share coded 1 = 0.101 | 100% | 2,224 | 2 | 0 | 0.101 | 0 | 1.00 | 0.301 |
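
The nearby flag in the district panel marks control woredas within 10 km of an open park. The sketch below shows one way such a flag could be computed, assuming centroid-to-centroid great-circle (haversine) distance; the source does not confirm which distance the post's script uses, and the helper names and coordinates here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_flag(control, parks, year, radius_km=10.0):
    """1 if the control woreda lies within radius_km of any park open by `year`, else 0."""
    return int(any(
        p["open_year"] <= year
        and haversine_km(control["latitude"], control["longitude"],
                         p["latitude"], p["longitude"]) <= radius_km
        for p in parks
    ))

# Hypothetical coordinates: one park woreda and two control woredas.
parks = [{"latitude": 9.00, "longitude": 38.75, "open_year": 2017}]
close_control = {"latitude": 9.05, "longitude": 38.75}   # roughly 5.6 km north
far_control = {"latitude": 9.50, "longitude": 38.75}     # roughly 56 km north

flags = [
    nearby_flag(close_control, parks, 2016),  # park not yet open
    nearby_flag(close_control, parks, 2018),  # open and within 10 km
    nearby_flag(far_control, parks, 2018),    # open but too far
]
print(flags)
```

Because the flag depends on which parks are open in a given year, it is time-varying for a fixed control woreda, which matches the "in that year" wording of the Construction entry.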

industrial_park_household_rcs: household-round (repeated cross-section) · 13,200 × 13 · rounds 2000, 2005, 2011, 2016, 2019 · 13,200 households across 5 DHS rounds; 139 districts

Row key: hh_id (a per-round identifier, not a panel key; effects are identified off district_id x survey_round variation) · Used for: survey-weighted repeated-cross-section DiD on household durables, housing quality, and wealth (Table 5).

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| hh_id | identifier | Household identifier (per-round) | Synthetic per-round household identifier; NOT a panel key (each round samples different households). | Sequential codes HH_000000 .. across all rounds. | string | Synthetic (this study) | household RCS |
| survey_round | year | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | One of five rounds: 2000, 2005, 2011, 2016, 2019. | year | Synthetic (this study) | household & individual RCS |
| district_id | identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
| region_id | identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
| treated | dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
| treatment | dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
| event_phase | continuous | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | Survey round position minus the opening round, for treated districts. | phase (k) | Synthetic (this study) | household & individual RCS |
| durable_goods_pc | continuous | Durable goods per capita (standardized) | Standardized count of household durable goods per capita; a living-standards outcome. | Standardized score (mean ~0); ATT reads against a near-zero mean. | standardized | Synthetic (this study) | household RCS |
| housing_quality | dummy | Housing-quality indicator (1=meets all) | 1 if the household has electricity, piped water, a toilet, and a finished floor, else 0. | Composite 0/1 indicator over the four housing amenities. | 0/1 | Synthetic (this study) | household RCS |
| wealth_index | continuous | DHS wealth index (standardized) | Composite standardized household wealth index; effects read in standard deviations. | Standardized composite (mean ~0, SD ~1). | SD (z-score) | Synthetic (this study) | household RCS |
| hh_size | continuous | Household size | Number of members in the household; a demographic control. | Integer count, 1-12. | persons | Synthetic (this study) | household & individual RCS |
| age_head | continuous | Age of household head | Age of the household head (years); a demographic control. | Integer years, 18-90. | years | Synthetic (this study) | household & individual RCS |
| survey_weight | continuous | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | Calibrated to a DHS-style design (mean ~1). | weight | Synthetic (this study) | household & individual RCS |
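
The `treatment` and `event_phase` constructions in the dictionary can be sketched in a few lines. The following is a minimal illustration, not the generator used for the synthetic data. It assumes that `open_year` is available per district (it is not a column of the RCS files listed here) and that the "opening round" is the first survey round at or after `open_year`; the dictionary does not spell out that rule, and the function names are hypothetical.

```python
from bisect import bisect_left

ROUNDS = [2000, 2005, 2011, 2016, 2019]  # the five DHS rounds in the dictionary

def treatment_switch(treated, survey_round, open_year):
    """1[treated == 1 and survey_round >= open_year], else 0."""
    return int(treated == 1 and open_year is not None and survey_round >= open_year)

def event_phase(treated, survey_round, open_year):
    """Round position minus opening-round position; None for never-treated.

    ASSUMPTION: the opening round is the first survey round at or after
    open_year. The dictionary does not state this rule explicitly.
    """
    if treated != 1 or open_year is None:
        return None
    opening_pos = bisect_left(ROUNDS, open_year)
    return ROUNDS.index(survey_round) - opening_pos

# Hypothetical park opening in 2014: the opening round is 2016 (position 3).
for r in ROUNDS:
    print(r, treatment_switch(1, r, 2014), event_phase(1, r, 2014))
```

Under this assumption the two variables agree by construction: `treatment` is 1 exactly when `event_phase` is 0 or greater, and the last pre-opening round sits at k = -1, the reference phase named in the dictionary.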

Distribution & statistics

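| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| hh_id | | 100% | 13,200 | 13,200 | | | | | |
| survey_round | | 100% | 13,200 | 5 | 2000 | 2011.9 | 2011 | 2019 | 6.36 |
| district_id | | 100% | 13,200 | 139 | | | | | |
| region_id | | 100% | 13,200 | 12 | | | | | |
| treated | share coded 1 = 0.189 | 100% | 13,200 | 2 | 0 | 0.189 | 0 | 1.00 | 0.392 |
| treatment | share coded 1 = 0.077 | 100% | 13,200 | 2 | 0 | 0.077 | 0 | 1.00 | 0.266 |
| event_phase | min -4, median -1, max 2 | 19% | 2,496 | 7 | -4.00 | -1.02 | -1.00 | 2.00 | 1.43 |
| durable_goods_pc | min -1.67, median 0.308, max 2.17 | 92% | 12,207 | 12,207 | -1.67 | 0.308 | 0.308 | 2.17 | 0.487 |
| housing_quality | share coded 1 = 0.307 | 92% | 12,206 | 2 | 0 | 0.307 | 0 | 1.00 | 0.461 |
| wealth_index | min -3.85, median -0.0114, max 3.41 | 73% | 9,688 | 9,688 | -3.85 | -4.57e-04 | -0.011 | 3.41 | 1.02 |
| hh_size | min 1, median 5, max 12 | 100% | 13,200 | 12 | 1.00 | 4.95 | 5.00 | 12.00 | 1.81 |
| age_head | min 18, median 43, max 90 | 100% | 13,200 | 70 | 18.00 | 42.82 | 43.00 | 90.00 | 11.85 |
| survey_weight | min 0.135, median 0.956, max 2.81 | 100% | 13,200 | 8,249 | 0.135 | 0.999 | 0.956 | 2.81 | 0.353 |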
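
The Coverage, N, Distinct and "share coded 1" figures above can be recomputed from a loaded file. The sketch below assumes that Coverage is the share of non-missing rows and N the non-missing count, which is an inference from the numbers (for example, 2,496 of 13,200 rows is about 19%) and not a documented definition. The toy frame and the `summarize` helper are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Toy stand-in for the household file; the values are invented.
df = pd.DataFrame({
    "housing_quality": [1, 0, None, 0, 1, 0, 0, None, 0, 0],
    "event_phase": [None, -1, None, None, 0, None, None, 1, None, None],
})

def summarize(series):
    """Coverage, N, distinct count and mean over the non-missing values."""
    non_missing = series.dropna()
    return {
        "coverage": len(non_missing) / len(series),
        "n": len(non_missing),
        "distinct": non_missing.nunique(),
        "mean": non_missing.mean(),  # for a 0/1 dummy this is the share coded 1
    }

summary = {col: summarize(df[col]) for col in df.columns}
print(pd.DataFrame(summary).T)
```

Read this way, the low coverage of event_phase is expected: the dictionary defines it for treated districts only, and its 19% coverage lines up with the treated share of 0.189.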

individual-round (repeated cross-section)  17,900 × 22 · rounds 2000, 2005, 2011, 2016, 2019 · 17,900 individuals across 5 DHS rounds; 139 districts

Panel key: none (ind_id is a per-round row identifier; effects are identified off district_id x survey_round) · Survey-weighted RCS DiD on non-agricultural employment and (for women) decision power, savings, and acceptance of domestic violence (Tables 6-7); this is the study's heterogeneity climax.
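
The heterogeneity split by sex can be sketched by computing the same weighted comparison separately for women and men. This is an illustrative sketch on invented rows, not the tutorial's regression specification. The `post` column is a hypothetical helper marking rounds at or after the park opening (it is not a column in the file), and `did_for` is a made-up name.

```python
import pandas as pd

# Toy individual rows (invented values). `post` is a hypothetical helper
# column marking rounds at or after the park opening; it is not in the file.
df = pd.DataFrame({
    "sex":              [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "treated":          [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "post":             [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "nonag_employment": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1],
    "survey_weight":    [1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 1.0,
                         1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
})

# Weighted cell means: sum(w * y) / sum(w) within each sex x treated x post cell.
df["wy"] = df["survey_weight"] * df["nonag_employment"]
sums = df.groupby(["sex", "treated", "post"])[["wy", "survey_weight"]].sum()
cell_mean = sums["wy"] / sums["survey_weight"]

def did_for(sex):
    """2x2 DiD within one sex subgroup, from the weighted cell means."""
    change_treated = cell_mean.loc[(sex, 1, 1)] - cell_mean.loc[(sex, 1, 0)]
    change_control = cell_mean.loc[(sex, 0, 1)] - cell_mean.loc[(sex, 0, 0)]
    return change_treated - change_control

effect_women = did_for(1)  # sex == 1 is female in the dictionary
effect_men = did_for(0)
print({"women": effect_women, "men": effect_men})
```

The toy numbers are built so that the women's estimate is positive and the men's is zero, which only mimics the qualitative pattern described for nonag_employment; the magnitudes carry no meaning.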

Variable dictionary

| Variable | Type | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|---|
| ind_id | identifier | Individual identifier (per-round) | Synthetic per-round individual identifier; NOT a panel key (each round samples different individuals). | Sequential codes IND_000000 .. across all rounds. | string | Synthetic (this study) | individual RCS |
| survey_round | year | DHS survey round (year) | Calendar year of the DHS round the respondent belongs to. | One of five rounds: 2000, 2005, 2011, 2016, 2019. | year | Synthetic (this study) | household & individual RCS |
| district_id | identifier | Woreda (district) identifier | Synthetic woreda identifier; the panel's unit of analysis and DiD fixed-effect / cluster key. | Sequential synthetic codes ET_D001 .. ET_D139. | string | Synthetic (this study) | all files |
| region_id | identifier | Region numeric code | Integer code for the regional state; used in region x year and region x round fixed effects. | 1..12, one per region. | code | Synthetic (this study) | all files |
| treated | dummy | Ever-treated woreda (1=hosts a park) | 1 if the woreda ever receives an industrial park (group indicator), else 0 (never-treated control). | 1 for the 17 park-hosting woredas, 0 for the 122 matched controls. | 0/1 | Synthetic (this study) | all files |
| treatment | dummy | Treatment switch (1 once the park is open) | Time-varying DiD indicator: 1 for a treated woreda in years at/after its open_year, else 0. | 1[treated == 1 and year >= open_year]. | 0/1 | Synthetic (this study) | all files |
| event_phase | continuous | Coarse event phase (rounds since opening) | Round position relative to the district's park opening; the RCS event-study axis (k = -1 reference). | Survey round position minus the opening round, for treated districts. | phase (k) | Synthetic (this study) | household & individual RCS |
| sex | dummy | Sex (1=female, 0=male) | Respondent sex; the heterogeneity split for the employment/empowerment climax (1 = women). | 1 for women, 0 for men. | 0/1 | Synthetic (this study) | individual RCS |
| age | continuous | Respondent age | Age of the individual respondent (years); a demographic control. | Integer years, 15-62. | years | Synthetic (this study) | individual RCS |
| age_sq | continuous | Respondent age squared | Square of respondent age; a demographic control for nonlinear age effects. | age^2. | years^2 | Synthetic (this study) | individual RCS |
| nonag_employment | dummy | Non-agricultural employment (1=yes) | 1 if the individual works in a non-agricultural job, else 0; the headline employment outcome. | 0/1 indicator; the average effect is null while the female effect is significant. | 0/1 | Synthetic (this study) | individual RCS |
| decision_power | dummy | Women's decision-making power (1=yes) | 1 if the woman participates in household decision-making, else 0 (empowerment outcome, women). | 0/1 indicator. | 0/1 | Synthetic (this study) | individual RCS (women) |
| savings_account | dummy | Owns a savings account (1=yes) | 1 if the woman owns a savings/bank account, else 0 (financial-inclusion outcome, women). | 0/1 indicator. | 0/1 | Synthetic (this study) | individual RCS (women) |
| dv_accept | dummy | Accepts domestic violence (composite, 1=yes) | 1 if the respondent justifies wife-beating in any of the five DHS scenarios, else 0 (gender-norms outcome). | Composite over the five dv_* sub-items; treatment lowers it. | 0/1 | Synthetic (this study) | individual RCS |
| dv_goingout | dummy | Justifies beating: going out without telling | DHS sub-item: 1 if wife-beating is justified for going out without telling the husband, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
| dv_kids | dummy | Justifies beating: neglecting the children | DHS sub-item: 1 if wife-beating is justified for neglecting the children, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
| dv_arguing | dummy | Justifies beating: arguing with husband | DHS sub-item: 1 if wife-beating is justified for arguing with the husband, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
| dv_sex | dummy | Justifies beating: refusing sex | DHS sub-item: 1 if wife-beating is justified for refusing to have sex, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
| dv_food | dummy | Justifies beating: burning the food | DHS sub-item: 1 if wife-beating is justified for burning the food, else 0. | 0/1 DHS attitude sub-item composing dv_accept. | 0/1 | Synthetic (this study) | individual RCS |
| hh_size | continuous | Household size | Number of members in the household; a demographic control. | Integer count, 1-12. | persons | Synthetic (this study) | household & individual RCS |
| age_head | continuous | Age of household head | Age of the household head (years); a demographic control. | Integer years, 18-90. | years | Synthetic (this study) | household & individual RCS |
| survey_weight | continuous | DHS sampling weight | Respondent sampling weight for the complex DHS design; used to weight all RCS regressions. | Calibrated to a DHS-style design (mean ~1). | weight | Synthetic (this study) | household & individual RCS |
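
The dv_accept composite ("any of the five DHS scenarios") can be rebuilt from its sub-items as a consistency check. The sketch below runs on invented rows. How missing answers are handled is an assumption: here the composite is treated as missing only when all five sub-items are missing, which the dictionary does not confirm.

```python
import pandas as pd

DV_ITEMS = ["dv_goingout", "dv_kids", "dv_arguing", "dv_sex", "dv_food"]

# Toy rows (invented); None marks a missing answer.
df = pd.DataFrame({
    "dv_goingout": [0, 1, 0, None, None],
    "dv_kids":     [0, 0, 0, None, 0],
    "dv_arguing":  [0, 0, 1, None, 0],
    "dv_sex":      [0, 0, None, None, 0],
    "dv_food":     [0, 0, 0, None, 0],
})

items = df[DV_ITEMS]
any_yes = items.eq(1).any(axis=1)        # at least one scenario justified
all_missing = items.isna().all(axis=1)   # respondent answered none of the five

# ASSUMPTION: the composite is missing only when every sub-item is missing.
df["dv_accept"] = any_yes.astype(float).mask(all_missing)
print(df["dv_accept"].tolist())
```

On the real file, comparing this rebuilt column with the shipped dv_accept would show whether the missing-value assumption matches how the composite was actually generated.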

Distribution & statistics

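| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| ind_id | | 100% | 17,900 | 17,900 | | | | | |
| survey_round | | 100% | 17,900 | 5 | 2000 | 2011.9 | 2011 | 2019 | 6.36 |
| district_id | | 100% | 17,900 | 139 | | | | | |
| region_id | | 100% | 17,900 | 12 | | | | | |
| treated | share coded 1 = 0.168 | 100% | 17,900 | 2 | 0 | 0.168 | 0 | 1.00 | 0.374 |
| treatment | share coded 1 = 0.068 | 100% | 17,900 | 2 | 0 | 0.068 | 0 | 1.00 | 0.252 |
| event_phase | min -4, median -1, max 2 | 17% | 3,012 | 7 | -4.00 | -1.00 | -1.00 | 2.00 | 1.41 |
| sex | share coded 1 = 0.656 | 100% | 17,900 | 2 | 0 | 0.656 | 1.00 | 1.00 | 0.475 |
| age | min 15, median 30, max 62 | 100% | 17,900 | 47 | 15.00 | 30.21 | 30.00 | 62.00 | 8.63 |
| age_sq | min 225, median 900, max 3.84e+03 | 100% | 17,900 | 47 | 225.0 | 987.0 | 900.0 | 3,844.0 | 543.3 |
| nonag_employment | share coded 1 = 0.343 | 96% | 17,219 | 2 | 0 | 0.343 | 0 | 1.00 | 0.475 |
| decision_power | share coded 1 = 0.871 | 26% | 4,737 | 2 | 0 | 0.871 | 1.00 | 1.00 | 0.335 |
| savings_account | share coded 1 = 0.063 | 62% | 11,155 | 2 | 0 | 0.063 | 0 | 1.00 | 0.242 |
| dv_accept | share coded 1 = 0.635 | 62% | 11,109 | 2 | 0 | 0.635 | 1.00 | 1.00 | 0.481 |
| dv_goingout | share coded 1 = 0.450 | 62% | 11,064 | 2 | 0 | 0.450 | 0 | 1.00 | 0.498 |
| dv_kids | share coded 1 = 0.498 | 62% | 11,069 | 2 | 0 | 0.498 | 0 | 1.00 | 0.500 |
| dv_arguing | share coded 1 = 0.438 | 62% | 11,043 | 2 | 0 | 0.438 | 0 | 1.00 | 0.496 |
| dv_sex | share coded 1 = 0.370 | 60% | 10,818 | 2 | 0 | 0.370 | 0 | 1.00 | 0.483 |
| dv_food | share coded 1 = 0.428 | 62% | 11,068 | 2 | 0 | 0.428 | 0 | 1.00 | 0.495 |
| hh_size | min 1, median 5, max 12 | 100% | 17,900 | 12 | 1.00 | 4.96 | 5.00 | 12.00 | 1.81 |
| age_head | min 18, median 43, max 90 | 100% | 17,900 | 72 | 18.00 | 42.83 | 43.00 | 90.00 | 11.84 |
| survey_weight | min 0.126, median 0.958, max 2.92 | 100% | 17,900 | 9,616 | 0.126 | 0.997 | 0.958 | 2.92 | 0.349 |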

Known limitations & caveats