Downloads
Each dataset is available as a labeled Stata .dta and its source file.
Download all data (ZIP); Stata codebook script: stata_codebook.do
| Dataset | Grain | Rows × columns | Stata | Source |
|---|---|---|---|---|
| aceh_tsunami_district_panel | district-year | 1,750 × 30 | aceh_tsunami_district_panel.dta | aceh_tsunami_district_panel.csv |
| aceh_tsunami_subdistrict_panel | kecamatan-year | 3,864 × 19 | aceh_tsunami_subdistrict_panel.dta | aceh_tsunami_subdistrict_panel.csv |
Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.
Load directly in code
Every file loads straight from GitHub (raw URLs). Swap the file name to load any dataset.
Stata
```stata
* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_sc_tsunami/data/"
use "${BASE}aceh_tsunami_district_panel.dta", clear
describe
notes
```
Python
```python
# Colab/Jupyter only: the leading "!" runs a shell command that installs pyreadstat
!pip install -q pyreadstat
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_sc_tsunami/data/"
df = pd.read_stata(BASE + "aceh_tsunami_district_panel.dta")
# load every dataset at once
files = ["aceh_tsunami_district_panel", "aceh_tsunami_subdistrict_panel"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}
# pyreadstat (richest metadata) reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "aceh_tsunami_district_panel.dta", "aceh_tsunami_district_panel.dta")
df, meta = pyreadstat.read_dta("aceh_tsunami_district_panel.dta")
```
Copy and paste this snippet into Google Colab: https://colab.research.google.com/notebooks/empty.ipynb
R
```r
# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_did_sc_tsunami/data/"
df <- read_dta(paste0(BASE, "aceh_tsunami_district_panel.dta"))
```
Overview & sources
Companion data for a hands-on Python tutorial that evaluates the long-run economic impact of the 2004 Indian Ocean tsunami on the Indonesian province of Aceh, replicating Heger & Neumayer (2019) on fully synthetic, calibrated panels. The post treats coastal inundation as a quasi-natural experiment and estimates a dynamic four-period difference-in-differences (DiD) model with pyfixest, an event study with diff-diff, a night-lights dose-response, a synthetic control with mlsynth, and Conley spatial-HAC standard errors, with the spatial autocorrelation that motivates them checked by Moran's I. Flooded districts lost about 7.9% of output in 2005 but grew 6.3 percentage points per year faster than controls during 2006–08, and the synthetic control places flooded Aceh 18.3% above its no-tsunami counterfactual by 2012. The data-generating process is calibrated so that re-running the paper's analyses reproduces its findings (signs, significance, approximate magnitudes); the data are for teaching the methods, not for drawing new conclusions about Aceh.
aceh_tsunami_district_panel is an annual district panel (one row per district × year) of 125 Sumatran districts over 1999–2012 carrying district GDP growth, covariates, treatment indicators, and centroids for spatial inference. aceh_tsunami_subdistrict_panel is a finer annual panel of 276 Aceh sub-districts (kecamatans) over the same years, carrying satellite night-lights and continuous flood intensity for the dose-response analysis.
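Because each panel has one row per unit × year, the stated shapes imply balanced panels (125 × 14 = 1,750 rows and 276 × 14 = 3,864 rows). A minimal standard-library sketch for checking this after loading; the helper `check_balanced_panel` and the toy rows are hypothetical, and the function expects a list of dicts such as `df.to_dict("records")` returns in pandas.

```python
from collections import Counter


def check_balanced_panel(rows, unit_key, time_key):
    """Return (n_units, n_periods, is_balanced) for rows given as a list of dicts.

    Balanced means every unit appears exactly once in every period, so the number
    of distinct (unit, period) pairs equals n_units * n_periods with no duplicates.
    """
    pairs = Counter((r[unit_key], r[time_key]) for r in rows)
    units = {u for u, _ in pairs}
    periods = {t for _, t in pairs}
    no_duplicates = all(count == 1 for count in pairs.values())
    complete = len(pairs) == len(units) * len(periods)
    return len(units), len(periods), no_duplicates and complete


# Toy rows: three hypothetical districts observed 1999-2012
toy = [{"district_id": f"D{k}", "year": y} for k in range(3) for y in range(1999, 2013)]
print(check_balanced_panel(toy, "district_id", "year"))  # (3, 14, True)
```

On the district file, `check_balanced_panel(df.to_dict("records"), "district_id", "year")` would be expected to return `(125, 14, True)` if the documented shape holds.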
Data sources
| Source | Provides | Reference / URL |
|---|---|---|
| Heger & Neumayer (2019) | Replicated study; calibration targets (coefficient signs, significance, approximate magnitudes) and the empirical design | Heger, M. P., & Neumayer, E. (2019). The impact of the Indian Ocean tsunami on Aceh's long-term economic growth. Journal of Development Economics, 141, 102365. https://doi.org/10.1016/j.jdeveco.2019.06.008 |
| Synthetic (this study) | All values — simulated via a calibrated data-generating process (open & reproducible) | Mendez, C. (2026). See the post's Python script script.py and reference/generate_synthetic_data.py for the full DGP. |
| Real-world analogues (proxied, not used directly) | The constructs the synthetic series imitate: district GDP (INDO-DAPOER / SUSENAS), night-lights (DMSP-OLS), inundation maps (DLR/ZKI, Dartmouth Flood Observatory) | World Bank INDO-DAPOER (https://datacatalog.worldbank.org/); NOAA DMSP-OLS Nighttime Lights (https://www.ncei.noaa.gov/); DLR/ZKI & Dartmouth Flood Observatory inundation maps. |
| Method references | Estimators and concepts | Abadie, Diamond & Hainmueller (2010, synthetic control); Conley (1999, spatial-HAC standard errors). |
Cite this data
Please cite this dataset as follows.
APA
Mendez, C. (2026). Bouncing Back Better? Evaluating the Economic Impact of the Aceh Tsunami [Data set]. https://carlos-mendez.org/post/python_did_sc_tsunami/
Heger, M. P., & Neumayer, E. (2019). The impact of the Indian Ocean tsunami on Aceh's long-term economic growth. Journal of Development Economics, 141, 102365. https://doi.org/10.1016/j.jdeveco.2019.06.008
BibTeX
```bibtex
@misc{mendez2026pythondidsctsunami,
  author = {Mendez, Carlos},
  title = {Bouncing Back Better? Evaluating the Economic Impact of the Aceh Tsunami},
  year = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_did_sc_tsunami/}},
  note = {Data set}
}
@article{heger2019impact,
  author = {Heger, Martin Philipp and Neumayer, Eric},
  title = {The impact of the Indian Ocean tsunami on Aceh's long-term economic growth},
  journal = {Journal of Development Economics},
  volume = {141}, pages = {102365}, year = {2019},
  doi = {10.1016/j.jdeveco.2019.06.008}
}
```
Variable explorer (all 41 variables)
| Variable | Type | Label | Definition | Units | In files | Source |
|---|---|---|---|---|---|---|
| area_km2 | continuous | Area (km^2) | Approximate land area of the kecamatan; sets the number of night-light pixels. | km^2 | aceh_tsunami_subdistrict_panel | Simulation |
| avg_luminosity | continuous | Average luminosity (DN 0-63) | Mean Digital Number (brightness) across the kecamatan's pixels; flooded 2004 mean ~5.79, non-flooded ~2.36. | DN (0-63) | aceh_tsunami_subdistrict_panel | Simulation |
| capital_formation_pc_usd | continuous | Capital formation per capita (current USD) | Gross capital formation per capita; reproduces the post-tsunami investment bonanza. | current USD per capita | aceh_tsunami_district_panel | Simulation |
| coastal | dummy | Coastal dummy (1=coastal) | 1 if the district lies on the coast, 0 if inland. | 0/1 | aceh_tsunami_district_panel | Assigned (this study) |
| district_id | identifier | District ID | Unique identifier for the district (Kabupaten/Kota); the panel key. | string | aceh_tsunami_district_panel | Assigned (this study) |
| district_name | identifier | District name | Name of the district. Real names for Aceh's 23 districts; systematic placeholders elsewhere. | string | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Assigned (this study) |
| district_type | identifier | District type | Kota (urban city district) vs Kabupaten (rural regency). | {Kota, Kabupaten} | aceh_tsunami_district_panel | Assigned (this study) |
| doctors_per_1000 | continuous | Doctors per 1,000 | Physicians per 1,000 people; Aceh rises faster after 2005 (synthetic-control predictor). | per 1,000 | aceh_tsunami_district_panel | Simulation |
| electricity_access_pct | continuous | Electricity access (%) | % of households with electricity; Aceh 84% (2004) -> 97% (2012). | % | aceh_tsunami_district_panel | Simulation |
| flood_intensity_quintile | identifier | Flood-intensity quintile (1-5; 0=non-flooded) | Quintile of the flooding-intensity distribution among flooded units; only the top quintile shows a significant effect. | 0-5 | aceh_tsunami_subdistrict_panel | Derived |
| flood_treatment_group | identifier | Treatment-group label | Readable label combining treatment status and region (convenience for selecting control pools). | category | aceh_tsunami_district_panel | Derived |
| flooded | dummy | Flooded / treated dummy (1=flooded) | Treatment indicator: 1 if the unit was flooded by the 2004 tsunami (the DiD 'D' variable). | 0/1 | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Assigned (this study) |
| gdp_const_usd_m | continuous | Real GDP (million constant 2004 USD) | District real GDP excluding oil & gas, constant 2004 USD, millions. | million constant 2004 USD | aceh_tsunami_district_panel | Simulation |
| gdp_growth | continuous | GDP growth rate (log difference) | Annual growth rate of real district GDP (main dependent variable). | proportion/yr | aceh_tsunami_district_panel | Simulation |
| gdp_pc_growth | continuous | GDP per-capita growth rate | Annual growth rate of real GDP per capita; no significant 2005 loss, significant 2006-08 gain. | proportion/yr | aceh_tsunami_district_panel | Derived |
| gdp_pc_usd | continuous | GDP per capita (constant 2004 USD) | Real GDP per capita. | constant 2004 USD | aceh_tsunami_district_panel | Derived |
| hdi | continuous | Human Development Index (0-100) | Human Development Index; Aceh ~69 -> ~73 over the period. | index 0-100 | aceh_tsunami_district_panel | Simulation |
| kecamatan_id | identifier | Sub-district (kecamatan) ID | Unique identifier for the sub-district; the sub-district panel key. | string | aceh_tsunami_subdistrict_panel | Assigned (this study) |
| kecamatan_name | identifier | Sub-district name | Readable name linking the kecamatan to its parent district. | string | aceh_tsunami_subdistrict_panel | Assigned (this study) |
| latitude | continuous | Latitude (decimal degrees, +N) | Unit-centroid latitude; time-invariant. Enables Conley spatial standard errors. | degrees | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Assigned (this study) |
| longitude | continuous | Longitude (decimal degrees, +E) | Unit-centroid longitude; time-invariant. Used with latitude for haversine distances. | degrees | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Assigned (this study) |
| n_pixels | continuous | Pixel count | Number of ~0.86 km^2 night-light grid cells in the kecamatan. | count | aceh_tsunami_subdistrict_panel | Derived |
| neighbour_of_flooded | dummy | Neighbour-of-flooded dummy (1=neighbour) | 1 if a non-flooded district borders a flooded one (placebo-treated in the robustness test). | 0/1 | aceh_tsunami_district_panel | Assigned (this study) |
| nl_growth | continuous | Night-lights growth rate (log difference) | Annual growth rate of log night-lights (main sub-district dependent variable). | proportion/yr | aceh_tsunami_subdistrict_panel | Simulation |
| nl_log | continuous | Log luminosity (log DN-sum) | log(sum of (DN + 0.001)): the transformed regression variable, matching the paper's log night-lights. | log DN-sum | aceh_tsunami_subdistrict_panel | Simulation |
| nl_sum | continuous | Summed luminosity (DN-sum) | Sum of Digital Numbers over all pixels in the kecamatan (the unit-level activity measure). | DN-sum | aceh_tsunami_subdistrict_panel | Derived |
| period | identifier | DiD event-time period | Event-time period for the staggered DiD dummies; baseline 2000-02 is the omitted reference. | category | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Derived |
| pop_growth | continuous | Population growth rate | Annual population growth rate (carries the 2005 death/displacement shock). | proportion/yr | aceh_tsunami_district_panel | Simulation |
| population | continuous | Population (persons) | District population; drives the per-capita denominator. | persons | aceh_tsunami_district_panel | Simulation |
| post | dummy | Post-tsunami dummy (1=2005+) | 1 for years 2005 and later (simple pre/post split). | 0/1 | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Derived |
| poverty_rate | continuous | Poverty rate (%) | Share of population below the poverty line; Aceh improves after 2005 (synthetic-control predictor). | % | aceh_tsunami_district_panel | Simulation |
| province | identifier | Province | Indonesian province the unit belongs to (district panel: 10 Sumatra provinces; sub-district panel: always Aceh). | string | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Assigned (this study) |
| region_group | identifier | Region group | Coarse grouping used to build estimation samples. | {Aceh, North Sumatra, Rest of Sumatra} | aceh_tsunami_district_panel | Derived |
| sanitation_access_pct | continuous | Sanitation access (%) | % of households with sanitation access; Aceh boosted after 2005 (synthetic-control predictor). | % | aceh_tsunami_district_panel | Simulation |
| share_area_flooded | continuous | Share of area flooded | Share of the kecamatan's physical area flooded; tiny mean (~1.2%) gives a large coefficient. | 0-1 | aceh_tsunami_subdistrict_panel | Simulation |
| share_pop_flooded | continuous | Share of population flooded | Share of the kecamatan's population in the flooded area; the headline continuous dose. | 0-1 | aceh_tsunami_subdistrict_panel | Simulation |
| va_agri_share | continuous | Agriculture VA share (% of GDP) | Agriculture value added as % of GDP; Aceh falls 44->32% after 2004. | % of GDP | aceh_tsunami_district_panel | Simulation |
| va_manu_share | continuous | Manufacturing VA share (% of GDP) | Manufacturing value added as % of GDP; Aceh falls ~6->3.5% after 2004. | % of GDP | aceh_tsunami_district_panel | Simulation |
| va_serv_share | continuous | Services VA share (% of GDP) | Services / tertiary value added as % of GDP; Aceh rises ~40->55% after 2004. | % of GDP | aceh_tsunami_district_panel | Simulation |
| water_access_pct | continuous | Water access (%) | % of households with clean-water access; Aceh boosted after 2005 (synthetic-control predictor). | % | aceh_tsunami_district_panel | Simulation |
| year | year | Calendar year | Calendar year of the observation. Panel spans 1999-2012 (levels); growth rates defined 2000-2012. | year | aceh_tsunami_district_panel, aceh_tsunami_subdistrict_panel | Simulation |
Construction & formulas
The disaster is treated as a quasi-natural experiment: coastal geography (not
economic prospects) decided which districts the wave flooded. Treatment is split into event-time
windows measured against the omitted 2000–02 baseline: pre
(2003–04), tsunami (2005), recovery (2006–08), and
postrec (2009–12).
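A minimal sketch of how calendar years map to these windows and to the treated-by-window interactions a DiD regression needs. The window boundaries follow the `period` definition in the variable dictionary; the exact label strings stored in `period` and the `D_x_*` names are assumptions for illustration.

```python
def event_period(year):
    """Map a calendar year to its DiD event-time window (1999 is the levels-only base year)."""
    if year == 1999:
        return "(base year)"  # growth undefined: no prior year
    if 2000 <= year <= 2002:
        return "baseline"  # omitted reference window
    if 2003 <= year <= 2004:
        return "pre"
    if year == 2005:
        return "tsunami"
    if 2006 <= year <= 2008:
        return "recovery"
    if 2009 <= year <= 2012:
        return "postrec"
    raise ValueError(f"year {year} is outside the 1999-2012 panel")


WINDOWS = ("pre", "tsunami", "recovery", "postrec")


def did_dummies(flooded, year):
    """D_i x 1[window] interactions for one row; all zero for controls and baseline years."""
    window = event_period(year)
    return {f"D_x_{w}": int(flooded == 1 and window == w) for w in WINDOWS}


print(event_period(2007))    # recovery
print(did_dummies(1, 2005))  # {'D_x_pre': 0, 'D_x_tsunami': 1, 'D_x_recovery': 0, 'D_x_postrec': 0}
```

In pandas the same mapping can be applied row-wise with `df["year"].map(event_period)`.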
- Dynamic DiD (the headline, on district GDP growth):
  $$\Delta Y_{it} = \beta_1 D_i \mathbf{1}[\text{pre}]_t + \beta_2 D_i \mathbf{1}[2005]_t + \beta_3 D_i \mathbf{1}[\text{recovery}]_t + \beta_4 D_i \mathbf{1}[\text{postrec}]_t + \alpha_i + \gamma_t + \varepsilon_{it}$$
  with district fixed effects $\alpha_i$ and year fixed effects $\gamma_t$. The estimand is the ATT; identification rests on parallel trends.
- Night-lights measure (nl_log): $NL_{ct} = \log\big(\sum_n (DN_{nct} + 0.001)\big)$, the log of total Digital Numbers (DMSP-OLS brightness, 0–63) summed over the pixels $n$ of kecamatan $c$ in year $t$; the small 0.001 offset keeps zeros loggable. nl_growth is its annual log-difference.
- Dose-response: the same DiD with the on/off dummy $D_i$ replaced by the continuous share_pop_flooded or share_area_flooded, or by flood-intensity quintiles.
- Synthetic control: choose donor weights $w \ge 0$ summing to 1 to minimise the pre-2005 mismatch
  $$\min_{w}\ (X_1 - X_0 w)^\top V (X_1 - X_0 w),$$
  where $X_1$ holds the treated unit's pre-2005 predictors, $X_0$ the donors' predictors, and $V$ the predictor weights. The treated-minus-synthetic gap after 2005 is the estimated effect.
- Conley spatial-HAC standard errors: a sandwich estimator that allows for serial error correlation (a district with itself over time) and spatial error correlation (different districts within 100 km in the same year, with a weight that fades linearly to zero at 100 km). Only the standard errors change; the point estimates are unchanged.
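The synthetic-control objective above can be illustrated with a small solver. This is a sketch under stated assumptions, not the mlsynth API or the post's code: it treats the predictor weights V as fixed and diagonal, and it swaps in plain projected gradient descent with a sort-based Euclidean projection onto the simplex (w ≥ 0, weights summing to 1). Full implementations commonly also choose V from the pre-treatment fit. The helper names and toy numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of vector v onto {w : w_j >= 0, sum_j w_j = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # the largest j passing this test fixes the shift theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synth_objective(w, x1, x0, v):
    """(x1 - X0 w)' V (x1 - X0 w) with V = diag(v)."""
    resid = [x1[k] - sum(x0[k][j] * w[j] for j in range(len(w))) for k in range(len(x1))]
    return sum(v[k] * resid[k] ** 2 for k in range(len(x1)))


def synth_weights(x1, x0, v, iters=5000):
    """Projected-gradient sketch of min_w (x1 - X0 w)' V (x1 - X0 w) over the simplex.

    x1: K pre-2005 predictor values for the treated unit
    x0: K rows x J columns, one column per donor
    v:  K diagonal entries of V, taken as given here
    """
    K, J = len(x0), len(x0[0])
    # Step size 1/L; L = 2 * trace(X0' V X0) upper-bounds the gradient's Lipschitz constant.
    L = 2.0 * sum(v[k] * x0[k][j] ** 2 for k in range(K) for j in range(J))
    w = [1.0 / J] * J  # start from equal donor weights
    for _ in range(iters):
        resid = [x1[k] - sum(x0[k][j] * w[j] for j in range(J)) for k in range(K)]
        grad = [-2.0 * sum(x0[k][j] * v[k] * resid[k] for k in range(K)) for j in range(J)]
        w = project_to_simplex([w[j] - grad[j] / L for j in range(J)])
    return w


# Two hypothetical predictors (rows) and two hypothetical donors (columns)
x0 = [[1.0, 3.0],
      [2.0, 6.0]]
x1 = [1.5, 3.0]
w = synth_weights(x1, x0, v=[1.0, 1.0])
print([round(x, 3) for x in w])  # [0.75, 0.25]
```

The projection step is what enforces the convex-combination constraint; if the treated unit lies outside the donors' convex hull, some weights are driven exactly to zero.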
Synthetic data-generating process. GDP and night-lights levels are cumulated from a base year using simulated growth series built as district/kecamatan FE + year FE + a treated×period increment + spatial & serial shocks + Gaussian noise, tuned so a fixed-effects DiD recovers the paper’s coefficients column by column (within about 0.005 on the headline cells). Centroids and continuous flood doses are injected so the spatial standard errors and the dose-response behave like the paper’s without moving the point estimates.
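A stripped-down version of this process, under loud assumptions: growth is only unit fixed effect + year fixed effect + a treated × window increment + Gaussian noise (the spatial and serial shocks are left out), levels start from a hypothetical 100, and the increments are illustrative values that loosely echo the headline estimates rather than the post's calibration. The sketch then recovers the increments with a window-average DiD. In a balanced panel with a single treatment date, that contrast should coincide with the two-way fixed-effects coefficients of the dynamic DiD; with missing rows (such as Subulussalam) it is only an approximation, and the post's own estimates come from pyfixest.

```python
import math
import random

YEARS = list(range(1999, 2013))
WINDOWS = {
    "baseline": range(2000, 2003),  # omitted reference
    "pre": range(2003, 2005),
    "tsunami": range(2005, 2006),
    "recovery": range(2006, 2009),
    "postrec": range(2009, 2013),
}


def window_of(year):
    """Event-time window of a year; None for the 1999 base year."""
    for name, years in WINDOWS.items():
        if year in years:
            return name
    return None


def simulate_panel(n_treated, n_control, theta, noise_sd, seed=0):
    """growth = unit FE + year FE + treated x window increment + Gaussian noise.

    Levels start at a hypothetical 100 in 1999 and are cumulated as
    Y_t = Y_{t-1} * exp(g_t), since growth is a log difference.
    """
    rng = random.Random(seed)
    year_fe = {t: rng.gauss(0.05, 0.01) for t in YEARS}
    rows = []
    for i in range(n_treated + n_control):
        treated = int(i < n_treated)
        unit_fe = rng.gauss(0.0, 0.01)
        level = 100.0
        for t in YEARS:
            g = None  # 1999: no prior year, so growth is undefined
            if t > 1999:
                g = (unit_fe + year_fe[t] + treated * theta.get(window_of(t), 0.0)
                     + rng.gauss(0.0, noise_sd))
                level *= math.exp(g)
            rows.append({"unit": i, "year": t, "flooded": treated, "growth": g, "level": level})
    return rows


def window_did(rows):
    """beta_k = mean treated-minus-control growth gap in window k minus the same gap in baseline."""
    def mean(xs):
        return sum(xs) / len(xs)

    gap = {}
    for t in YEARS[1:]:
        treated = [r["growth"] for r in rows if r["year"] == t and r["flooded"] == 1]
        control = [r["growth"] for r in rows if r["year"] == t and r["flooded"] == 0]
        gap[t] = mean(treated) - mean(control)
    base = mean([gap[t] for t in WINDOWS["baseline"]])
    return {k: mean([gap[t] for t in years]) - base
            for k, years in WINDOWS.items() if k != "baseline"}


true_theta = {"pre": 0.0, "tsunami": -0.079, "recovery": 0.063, "postrec": 0.0}
panel = simulate_panel(n_treated=20, n_control=80, theta=true_theta, noise_sd=0.01, seed=1)
print({k: round(b, 3) for k, b in window_did(panel).items()})
```

Unit and year fixed effects cancel in the treated-minus-control gap and in the subtraction of the baseline gap, which is why only the window increments (plus noise) survive.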
The datasets
Each dataset is documented below with its full variable dictionary and a statistics table with data coverage.
aceh_tsunami_district_panel
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
district_id identifier | District ID | Unique identifier for the district (Kabupaten/Kota); the panel key. | Assigned as <PROVINCE-ABBREV>_D<nn>; Aceh districts carry real names in district_name. | string | Assigned (this study) | 125 districts |
district_name identifier | District name | Name of the district. Real names for Aceh's 23 districts; systematic placeholders elsewhere. | Real Aceh district names hand-coded from the paper's maps; other provinces use 'Province District k'. | string | Assigned (this study) | 125 districts / 276 kecamatans |
province identifier | Province | Indonesian province the unit belongs to (district panel: 10 Sumatra provinces; sub-district panel: always Aceh). | Hand-coded; 10 Sumatra provinces in the district panel, constant 'Aceh' in the sub-district panel. | string | Assigned (this study) | |
region_group identifier | Region group | Coarse grouping used to build estimation samples. | Derived from province: Aceh, North Sumatra, or Rest of Sumatra. | {Aceh, North Sumatra, Rest of Sumatra} | Derived | |
district_type identifier | District type | Kota (urban city district) vs Kabupaten (rural regency). | Hand-coded to reproduce the paper's city/rural sample sizes (Aceh: 5 Kota, 18 Kabupaten). | {Kota, Kabupaten} | Assigned (this study) | |
coastal dummy | Coastal dummy (1=coastal) | 1 if the district lies on the coast, 0 if inland. | Hand-coded; all 10 treated districts are coastal. Used to drop inland controls. | 0/1 | Assigned (this study) | |
flooded dummy | Flooded / treated dummy (1=flooded) | Treatment indicator: 1 if the unit was flooded by the 2004 tsunami (the DiD 'D' variable). | Hand-coded from the inundation maps; district panel: 10 Aceh + 2 North Sumatra island districts; sub-district panel: 68 of 276 kecamatans. | 0/1 | Assigned (this study) | |
neighbour_of_flooded dummy | Neighbour-of-flooded dummy (1=neighbour) | 1 if a non-flooded district borders a flooded one (placebo-treated in the robustness test). | Hand-coded adjacency for the placebo test; flooded districts are dropped in that test. | 0/1 | Assigned (this study) | |
flood_treatment_group identifier | Treatment-group label | Readable label combining treatment status and region (convenience for selecting control pools). | Derived from flooded + region_group. | category | Derived | |
latitude continuous | Latitude (decimal degrees, +N) | Unit-centroid latitude; time-invariant. Enables Conley spatial standard errors. | Real approximate centroids for Aceh's 23 districts; synthetic non-Aceh districts drawn within the province bounding box. Sub-districts: parent centroid + ~20 km jitter. | degrees | Assigned (this study) | |
longitude continuous | Longitude (decimal degrees, +E) | Unit-centroid longitude; time-invariant. Used with latitude for haversine distances. | See latitude. Used for the Conley spatial kernel (≤100 km). | degrees | Assigned (this study) | |
year year | Calendar year | Calendar year of the observation. Panel spans 1999-2012 (levels); growth rates defined 2000-2012. | Annual index. | year | Simulation | |
post dummy | Post-tsunami dummy (1=2005+) | 1 for years 2005 and later (simple pre/post split). | 1 if year >= 2005. | 0/1 | Derived | |
period identifier | DiD event-time period | Event-time period for the staggered DiD dummies; baseline 2000-02 is the omitted reference. | Mapped from year: '(base year)' 1999, baseline 2000-02, pre 2003-04, tsunami 2005, recovery 2006-08, postrec 2009-12. | category | Derived | |
gdp_const_usd_m continuous | Real GDP (million constant 2004 USD) | District real GDP excluding oil & gas, constant 2004 USD, millions. | Cumulated from the 1999 base level using the simulated gdp_growth series. | million constant 2004 USD | Simulation | district panel |
gdp_growth continuous | GDP growth rate (log difference) | Annual growth rate of real district GDP (main dependent variable). | district FE + year FE + treated increment (city/rural) + Aceh-control spillover + spatial & serial shocks + N(0,0.04) noise. | proportion/yr | Simulation | 1621 rows (NaN in 1999; Subulussalam 2003-06) |
population continuous | Population (persons) | District population; drives the per-capita denominator. | Cumulated from the 1999 base using pop_growth (flooded districts lose ~9.6% in 2005). | persons | Simulation | district panel |
pop_growth continuous | Population growth rate | Annual population growth rate (carries the 2005 death/displacement shock). | district FE + year FE + flooded x population increment (death shock 2005) + noise. | proportion/yr | Simulation | NaN in 1999; Subulussalam 2003-06 |
gdp_pc_usd continuous | GDP per capita (constant 2004 USD) | Real GDP per capita. | gdp_const_usd_m * 1e6 / population. | constant 2004 USD | Derived | district panel |
gdp_pc_growth continuous | GDP per-capita growth rate | Annual growth rate of real GDP per capita; no significant 2005 loss, significant 2006-08 gain. | gdp_growth - pop_growth (reproduces the paper's per-capita table by construction). | proportion/yr | Derived | NaN where growth missing |
va_agri_share continuous | Agriculture VA share (% of GDP) | Agriculture value added as % of GDP; Aceh falls 44->32% after 2004. | Province trajectory + district/type offset. | % of GDP | Simulation | |
va_manu_share continuous | Manufacturing VA share (% of GDP) | Manufacturing value added as % of GDP; Aceh falls ~6->3.5% after 2004. | Province trajectory + noise. | % of GDP | Simulation | |
va_serv_share continuous | Services VA share (% of GDP) | Services / tertiary value added as % of GDP; Aceh rises ~40->55% after 2004. | Province trajectory + offset. | % of GDP | Simulation | |
capital_formation_pc_usd continuous | Capital formation per capita (current USD) | Gross capital formation per capita; reproduces the post-tsunami investment bonanza. | Smooth path + reconstruction spike (peak 2006) for Aceh; smooth for donors. | current USD per capita | Simulation | |
poverty_rate continuous | Poverty rate (%) | Share of population below the poverty line; Aceh improves after 2005 (synthetic-control predictor). | District base + downward trend. | % | Simulation | |
doctors_per_1000 continuous | Doctors per 1,000 | Physicians per 1,000 people; Aceh rises faster after 2005 (synthetic-control predictor). | District base + upward trend. | per 1,000 | Simulation | |
water_access_pct continuous | Water access (%) | % of households with clean-water access; Aceh boosted after 2005 (synthetic-control predictor). | District base + upward trend. | % | Simulation | |
sanitation_access_pct continuous | Sanitation access (%) | % of households with sanitation access; Aceh boosted after 2005 (synthetic-control predictor). | District base + upward trend. | % | Simulation | |
electricity_access_pct continuous | Electricity access (%) | % of households with electricity; Aceh 84% (2004) -> 97% (2012). | Aceh boosted path; others smooth upward trend. | % | Simulation | |
hdi continuous | Human Development Index (0-100) | Human Development Index; Aceh ~69 -> ~73 over the period. | Aceh boosted path; others smooth upward trend. | index 0-100 | Simulation |
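The latitude and longitude columns feed a haversine distance and a Conley kernel with a 100 km cutoff. A minimal cross-sectional sketch of that idea for the slope on a single, already-demeaned regressor, assuming a mean Earth radius of 6,371 km and the linear (Bartlett-type) fade described in the construction notes; it omits the serial terms of the panel estimator and is not the post's implementation. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bartlett_weight(dist_km, cutoff_km=100.0):
    """1 at distance 0, falling linearly to 0 at the cutoff (and 0 beyond it)."""
    return max(0.0, 1.0 - dist_km / cutoff_km)


def conley_se_cross_section(x, resid, coords, cutoff_km=100.0):
    """Spatial-HAC standard error for the OLS slope on one demeaned regressor x.

    Var(b) = sum_i sum_j K(d_ij) x_i e_i x_j e_j / (sum_i x_i^2)^2, K = Bartlett weight.
    Cross-section only: the serial (same unit over time) terms are not included.
    """
    n = len(x)
    bread = sum(xi * xi for xi in x)
    meat = 0.0
    for i in range(n):
        for j in range(n):
            d = haversine_km(coords[i][0], coords[i][1], coords[j][0], coords[j][1])
            meat += bartlett_weight(d, cutoff_km) * x[i] * resid[i] * x[j] * resid[j]
    if meat < 0:
        # a distance-based Bartlett kernel is not guaranteed positive semidefinite
        raise ValueError("negative variance estimate for this sample")
    return math.sqrt(meat) / bread


# Three hypothetical units: two about 11 km apart, one several hundred km away
coords = [(5.50, 95.30), (5.60, 95.30), (3.00, 98.00)]
x = [1.0, 1.0, -2.0]     # demeaned regressor (illustrative)
e = [0.02, 0.03, -0.01]  # residuals (illustrative, not from a fitted model)
print(conley_se_cross_section(x, e, coords))
```

When no pair of units lies within the cutoff, only the diagonal terms survive and the result collapses to the heteroskedasticity-robust (HC0) standard error; nearby units with same-signed score terms push it up.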
Summary statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| district_id | 100% | 1,750 | 125 | – | – | – | – | – |
| district_name | 100% | 1,750 | 125 | – | – | – | – | – |
| province | 100% | 1,750 | 10 | – | – | – | – | – |
| region_group | 100% | 1,750 | 3 | – | – | – | – | – |
| district_type | 100% | 1,750 | 2 | – | – | – | – | – |
| coastal | 100% | 1,750 | 2 | 0 | 0.696 | 1.00 | 1.00 | 0.460 |
| flooded | 100% | 1,750 | 2 | 0 | 0.096 | 0 | 1.00 | 0.295 |
| neighbour_of_flooded | 100% | 1,750 | 2 | 0 | 0.064 | 0 | 1.00 | 0.245 |
| flood_treatment_group | 100% | 1,750 | 4 | – | – | – | – | – |
| latitude | 100% | 1,750 | 125 | -5.36 | 0.067 | 0.268 | 5.82 | 3.23 |
| longitude | 100% | 1,750 | 121 | 95.32 | 100.9 | 100.6 | 108.3 | 3.11 |
| year | 100% | 1,750 | 14 | 1999 | 2005.5 | 2005 | 2012 | 4.03 |
| post | 100% | 1,750 | 2 | 0 | 0.571 | 1.00 | 1.00 | 0.495 |
| period | 100% | 1,750 | 6 | – | – | – | – | – |
| gdp_const_usd_m | 100% | 1,750 | 1,749 | 33.09 | 671.2 | 476.3 | 3,748.4 | 594.0 |
| gdp_growth | 93% | 1,621 | 1,561 | -0.168 | 0.052 | 0.052 | 0.292 | 0.066 |
| population | 100% | 1,750 | 1,748 | 59,071 | 484,305 | 407,631 | 3,613,412 | 400,842 |
| pop_growth | 93% | 1,621 | 1,453 | -0.102 | 0.016 | 0.016 | 0.083 | 0.022 |
| gdp_pc_usd | 100% | 1,750 | 1,740 | 389.4 | 1,421.0 | 1,226.2 | 5,885.3 | 791.1 |
| gdp_pc_growth | 93% | 1,621 | 1,554 | -0.204 | 0.036 | 0.037 | 0.296 | 0.070 |
| va_agri_share | 100% | 1,750 | 1,696 | 8.64 | 40.82 | 44.08 | 62.74 | 11.66 |
| va_manu_share | 100% | 1,750 | 1,540 | 1.28 | 5.39 | 5.68 | 9.82 | 1.87 |
| va_serv_share | 100% | 1,750 | 1,718 | 21.63 | 44.21 | 40.83 | 77.51 | 11.80 |
| capital_formation_pc_usd | 100% | 1,750 | 1,649 | 2.00 | 77.65 | 72.60 | 233.7 | 51.44 |
| poverty_rate | 100% | 1,750 | 1,038 | 9.09 | 18.58 | 18.02 | 32.19 | 4.29 |
| doctors_per_1000 | 100% | 1,750 | 337 | 0.152 | 0.321 | 0.318 | 0.550 | 0.081 |
| water_access_pct | 100% | 1,750 | 1,267 | 52.75 | 67.55 | 67.23 | 87.56 | 6.49 |
| sanitation_access_pct | 100% | 1,750 | 1,271 | 42.66 | 58.87 | 58.72 | 79.25 | 7.34 |
| electricity_access_pct | 100% | 1,750 | 1,224 | 74.31 | 87.81 | 87.46 | 99.92 | 5.96 |
| hdi | 100% | 1,750 | 698 | 64.56 | 69.07 | 69.01 | 74.22 | 1.94 |
aceh_tsunami_subdistrict_panel
Variable dictionary
| Variable | Label | Definition | Construction | Units | Source | Coverage |
|---|---|---|---|---|---|---|
kecamatan_id identifier | Sub-district (kecamatan) ID | Unique identifier for the sub-district; the sub-district panel key. | Assigned KEC_001 .. KEC_276. | string | Assigned (this study) | 276 kecamatans |
kecamatan_name identifier | Sub-district name | Readable name linking the kecamatan to its parent district. | Built as <ParentDistrict>_Kec_<nn>. | string | Assigned (this study) | |
district_name identifier | District name | Name of the district. Real names for Aceh's 23 districts; systematic placeholders elsewhere. | Real Aceh district names hand-coded from the paper's maps; other provinces use 'Province District k'. | string | Assigned (this study) | 125 districts / 276 kecamatans |
province identifier | Province | Indonesian province the unit belongs to (district panel: 10 Sumatra provinces; sub-district panel: always Aceh). | Hand-coded; 10 Sumatra provinces in the district panel, constant 'Aceh' in the sub-district panel. | string | Assigned (this study) | |
flooded dummy | Flooded / treated dummy (1=flooded) | Treatment indicator: 1 if the unit was flooded by the 2004 tsunami (the DiD 'D' variable). | Hand-coded from the inundation maps; district panel: 10 Aceh + 2 North Sumatra island districts; sub-district panel: 68 of 276 kecamatans. | 0/1 | Assigned (this study) | |
share_pop_flooded continuous | Share of population flooded | Share of the kecamatan's population in flooded area — the headline continuous dose. | latent dose * 0.62 + noise (flooded only); GRUMP-population analogue. | 0-1 | Simulation | flooded kecamatans only (0 elsewhere) |
share_area_flooded continuous | Share of area flooded | Share of the kecamatan's physical area flooded; tiny mean (~1.2%) gives a large coefficient. | latent dose * 0.00566 + noise (flooded only). | 0-1 | Simulation | flooded kecamatans only (0 elsewhere) |
flood_intensity_quintile identifier | Flood-intensity quintile (1-5; 0=non-flooded) | Quintile of the flooding-intensity distribution among flooded units; only the top quintile shows a significant effect. | qcut of the latent dose into 5 groups among flooded kecamatans; 0 for non-flooded. | 0-5 | Derived | |
area_km2 continuous | Area (km^2) | Approximate land area of the kecamatan; sets the number of night-light pixels. | Drawn lognormal (30-1500 km^2). | km^2 | Simulation | |
n_pixels continuous | Pixel count | Number of ~0.86 km^2 night-light grid cells in the kecamatan. | round(area_km2 / 0.86); 30x30 arc-second pixels ~0.86 km^2 at the equator. | count | Derived | |
latitude continuous | Latitude (decimal degrees, +N) | Unit-centroid latitude; time-invariant. Enables Conley spatial standard errors. | Real approximate centroids for Aceh's 23 districts; synthetic non-Aceh districts drawn within the province bounding box. Sub-districts: parent centroid + ~20 km jitter. | degrees | Assigned (this study) | |
longitude continuous | Longitude (decimal degrees, +E) | Unit-centroid longitude; time-invariant. Used with latitude for haversine distances. | See latitude. Used for the Conley spatial kernel (≤100 km). | degrees | Assigned (this study) | |
year year | Calendar year | Calendar year of the observation. Panel spans 1999-2012 (levels); growth rates defined 2000-2012. | Annual index. | year | Simulation | |
post dummy | Post-tsunami dummy (1=2005+) | 1 for years 2005 and later (simple pre/post split). | 1 if year >= 2005. | 0/1 | Derived | |
period identifier | DiD event-time period | Event-time period for the staggered DiD dummies; baseline 2000-02 is the omitted reference. | Mapped from year: '(base year)' 1999, baseline 2000-02, pre 2003-04, tsunami 2005, recovery 2006-08, postrec 2009-12. | category | Derived | |
avg_luminosity continuous | Average luminosity (DN 0-63) | Mean Digital Number (brightness) across the kecamatan's pixels; flooded 2004 mean ~5.79, non-flooded ~2.36. | nl_sum / n_pixels, top-coded at 63 (DMSP saturation). | DN (0-63) | Simulation | |
nl_sum continuous | Summed luminosity (DN-sum) | Sum of Digital Numbers over all pixels in the kecamatan (the unit-level activity measure). | exp(nl_log) - 0.001. | DN-sum | Derived | |
nl_log continuous | Log luminosity (log DN-sum) | log( sum of (DN + 0.001) ) — the transformed regression variable matching the paper's log night-lights. | Cumulated from the 2004 anchor using the nl_growth series. | log DN-sum | Simulation | |
nl_growth continuous | Night-lights growth rate (log difference) | Annual growth rate of log night-lights (main sub-district dependent variable). | kecamatan FE + year FE + theta(period)*dose^2 (flooded) + N(0,0.005) noise. | proportion/yr | Simulation | NaN in 1999 |
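The night-lights columns chain together as the Construction column describes. A toy sketch with made-up pixel DN values; the pixel-level data are not part of these files, and in the synthetic panel nl_log is cumulated from simulated growth rather than computed from pixels, so this only illustrates the formulas. Function names are hypothetical.

```python
import math

PIXEL_AREA_KM2 = 0.86  # one 30 arc-second cell near the equator


def n_pixels(area_km2):
    """Number of night-light cells covering a kecamatan: round(area_km2 / 0.86)."""
    return round(area_km2 / PIXEL_AREA_KM2)


def nl_log(dn_values):
    """log( sum_n (DN_n + 0.001) ); the 0.001 offset keeps an all-dark unit loggable."""
    return math.log(sum(dn + 0.001 for dn in dn_values))


def avg_luminosity(dn_values):
    """Mean DN across the unit's pixels, top-coded at 63 (DMSP-OLS saturation)."""
    return min(sum(dn_values) / len(dn_values), 63.0)


def nl_growth(log_series):
    """Annual log-difference of nl_log; the first year has no predecessor (None)."""
    return [None] + [curr - prev for prev, curr in zip(log_series, log_series[1:])]


print(n_pixels(217.0))             # 252
print(round(nl_log([0] * 40), 3))  # -3.219, i.e. log(40 * 0.001)
```

The all-dark example shows why the offset matters: without it, a kecamatan with no lit pixels would have log(0), which is undefined.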
Summary statistics
| Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
|---|---|---|---|---|---|---|---|---|
| kecamatan_id | 100% | 3,864 | 276 | – | – | – | – | – |
| kecamatan_name | 100% | 3,864 | 276 | – | – | – | – | – |
| district_name | 100% | 3,864 | 23 | – | – | – | – | – |
| province | 100% | 3,864 | 1 | – | – | – | – | – |
| flooded | 100% | 3,864 | 2 | 0 | 0.246 | 0 | 1.00 | 0.431 |
| share_pop_flooded | 100% | 3,864 | 68 | 0 | 0.050 | 0 | 0.558 | 0.107 |
| share_area_flooded | 100% | 3,864 | 59 | 0 | 4.51e-04 | 0 | 0.005 | 9.96e-04 |
| flood_intensity_quintile | 100% | 3,864 | 6 | – | – | – | – | – |
| area_km2 | 100% | 3,864 | 266 | 31.50 | 217.0 | 172.9 | 966.8 | 146.9 |
| n_pixels | 100% | 3,864 | 202 | 37.00 | 252.3 | 201.0 | 1,124.0 | 170.8 |
| latitude | 100% | 3,864 | 274 | 2.06 | 4.38 | 4.61 | 5.94 | 0.967 |
| longitude | 100% | 3,864 | 276 | 95.06 | 96.76 | 96.89 | 98.36 | 0.895 |
| year | 100% | 3,864 | 14 | 1999 | 2005.5 | 2005 | 2012 | 4.03 |
| post | 100% | 3,864 | 2 | 0 | 0.571 | 1.00 | 1.00 | 0.495 |
| period | 100% | 3,864 | 6 | – | – | – | – | – |
| avg_luminosity | 100% | 3,864 | 2,364 | 0 | 3.39 | 0.844 | 56.32 | 6.26 |
| nl_sum | 100% | 3,864 | 2,427 | 1.00e-04 | 775.4 | 141.8 | 14,553 | 1,674.2 |
| nl_log | 100% | 3,864 | 3,725 | -6.81 | 1.49 | 4.95 | 9.59 | 6.13 |
| nl_growth | 89% | 3,444 | 3,056 | -0.071 | 0.032 | 0.032 | 0.157 | 0.035 |
Known limitations & caveats
- Synthetic data. There is no real data behind this tutorial; the panels are simulated and calibrated to reproduce the paper's findings. They teach the methods — they are not new empirical evidence about Aceh.
- Tiny treatment group. Only 10 flooded Aceh districts are treated, so point estimates are fragile and standard errors wide; the Aceh-only and city (Kota) columns are especially imprecise (2 flooded Kota districts).
- Observational identification. The estimand is the ATT under parallel trends — an assumption, supported by the flat pre-trend and the null neighbour placebo but never proven.
- Clustered treatment needs honest inference. All treated units sit in one corner of Sumatra, and residual growth is spatially autocorrelated (Moran's I = +0.065, p = 0.003). Conley spatial-HAC errors raise the recovery effect's SE by about two-thirds (0.0146 → 0.0244), so its significance drops from a spurious 1% level to an honest 5% level.
- Missing growth rows are intentional. Growth (gdp_growth, gdp_pc_growth, pop_growth, nl_growth) is undefined in 1999 (no prior year) and gdp/pop growth is also missing for Subulussalam over 2003–06 (an administrative boundary change); every estimator simply drops those rows to match the paper's sample sizes.
- Magnitudes can differ slightly. Signs and significance track the paper closely, but synthetic magnitudes can diverge a little; the night-lights quintile and share-of-area scales follow Table 3's (smaller) units rather than the paper's mutually inconsistent Table 4. See Section 11 of the post for the reproduction audit.
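The Moran's I statistic quoted in the limitations summarises how similar residuals are across neighbouring units; under no spatial autocorrelation its expected value is -1/(N - 1). A minimal sketch of the statistic and a one-sided permutation p-value follows. The spatial weights used in the post are not documented on this page, so the line-adjacency matrix here is purely hypothetical and the reported I = +0.065 cannot be reproduced from it.

```python
import random


def morans_i(values, weights):
    """Moran's I = (N / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z = x - mean(x).

    weights: N x N nested list with a zero diagonal; S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)


def permutation_pvalue(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation: reshuffle values across locations."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Six hypothetical units on a line; adjacent units are neighbours (w = 1)
n = 6
line_w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
resid = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # residuals clustered in space
print(round(morans_i(resid, line_w), 3))   # 0.6
```

With only six units the permutation distribution is coarse, so the p-value here is illustrative; real applications use all units and a weight matrix chosen to match the spatial structure being tested.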