Interactive data dictionary

Regional Inequality from Outer Space

Predicting regional GDP from nighttime lights and building inequality indices in Python - a replication of Lessmann & Seidel (2017).

6 datasets · 53 variables · 180 countries · years 1992–2012 · 26,799 rows

Downloads

All data are free to download. Each dataset comes in two formats holding the same data: Stata .dta (with embedded variable and value labels) and plain .csv.

⇩ Download all data (ZIP) · stata_codebook.do

Dataset | Grain | Rows × columns | Stata | CSV
Prediction_Data | region-year | 5,258 × 30 | Prediction_Data.dta | Prediction_Data.csv
Table_2_data | region-year | 5,258 × 8 | Table_2_data.dta | Table_2_data.csv
Table_3_data | country-year | 3,675 × 9 | Table_3_data.dta | Table_3_data.csv
Table_4_data | country-year | 3,675 × 17 | Table_4_data.dta | Table_4_data.csv
Table_B4_data | region-year | 5,258 × 14 | Table_B4_data.dta | Table_B4_data.csv
Figure_5_data | country-year | 3,675 × 5 | Figure_5_data.dta | Figure_5_data.csv

Run stata_codebook.do in Stata once to attach long-form per-variable notes to the .dta files.

Load directly in code

Every file loads straight from GitHub through its raw URL, so no manual download is needed. The one exception is pyreadstat, which reads only local files. Swap the file name to load any of the six datasets.

Stata

* Stata 14+ : `use` reads an https URL directly
global BASE "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_kuznets_dmsp/data/"
use "${BASE}Table_3_data.dta", clear
describe        // variable + value labels
notes           // long-form documentation (after running stata_codebook.do)

Python

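!pip install -q pyreadstat   # notebook (Colab/Jupyter) syntax; in a shell, drop the leading "!"
# Python : pandas reads a .dta URL directly (values; value labels become categoricals)
# variable labels are not attached to the DataFrame; get them from a StataReader:
#   pd.read_stata(url, iterator=True).variable_labels()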
import pandas as pd
BASE = "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_kuznets_dmsp/data/"
df = pd.read_stata(BASE + "Table_3_data.dta")

# load all six datasets at once
files = ["Prediction_Data", "Table_2_data", "Table_3_data",
         "Table_4_data", "Table_B4_data", "Figure_5_data"]
data = {f: pd.read_stata(BASE + f + ".dta") for f in files}

# pyreadstat exposes the richest metadata but reads LOCAL files -> download first
import pyreadstat, urllib.request
urllib.request.urlretrieve(BASE + "Table_3_data.dta", "Table_3_data.dta")
df, meta = pyreadstat.read_dta("Table_3_data.dta")

Paste this snippet into a new Google Colab notebook: https://colab.research.google.com/notebooks/empty.ipynb

R

# R : haven::read_dta auto-downloads an https URL
library(haven)
BASE <- "https://raw.githubusercontent.com/cmg777/starter-academic-v501/master/content/post/python_kuznets_dmsp/data/"
df <- read_dta(paste0(BASE, "Table_3_data.dta"))   # labels via attr(df$var, "label")

Overview & sources

Companion data for the post Regional Inequality from Outer Space, a Python replication of Lessmann & Seidel (2017): predict regional GDP per capita from DMSP-OLS nighttime lights, build five population-weighted inequality indices, and estimate the spatial (regional) Kuznets curve and its determinants across up to 180 countries, 1992–2012.

Panel structure. Region files are keyed by region × year over a 1,504-region / 81-country training frame (1992–2010); country files are keyed by Country_ISO × year over 180 countries (1992–2012). The country-file inequality indices are built from the predicted regional incomes in the region files.
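The panel keys described above can be checked before any merge. The following is a minimal sketch using only the standard library; `duplicate_keys` is a hypothetical helper, and the three toy rows are made up for illustration rather than taken from the files.

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur more than once in `rows`."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Toy rows standing in for a region file (illustrative values only).
region_rows = [
    {"code_Coutry_Region": 101, "Country_ISO": "CHE", "year": 2009},
    {"code_Coutry_Region": 101, "Country_ISO": "CHE", "year": 2010},
    {"code_Coutry_Region": 102, "Country_ISO": "CHE", "year": 2010},
]

# Region x year identifies a row in a region file ...
print(duplicate_keys(region_rows, ["code_Coutry_Region", "year"]))  # []
# ... while Country_ISO x year repeats once per region.
print(duplicate_keys(region_rows, ["Country_ISO", "year"]))  # [('CHE', 2010)]
```

With pandas, the same check could be run on a loaded file by passing `df.to_dict("records")` as `rows`.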

Data sources

Source | Provides | Reference / URL
NOAA/NGDC DMSP-OLS stable lights | Nighttime lights (DMSP-OLS stable lights v4, DN 0-63) | NOAA National Geophysical Data Center. https://www.ngdc.noaa.gov/eog/dmsp.html
Gennaioli et al. (2014) | Observed regional GDP per capita (training target) | Gennaioli, La Porta, Lopez-de-Silanes & Shleifer (2014), J. Economic Growth 19(3).
World Bank WDI | National accounts, determinants (GDP, trade, FDI, rents, etc.) | World Bank, World Development Indicators. https://databank.worldbank.org/source/world-development-indicators
GADM (Global Administrative Areas) | Administrative boundaries, region names, areas, centroids | GADM database of Global Administrative Areas. https://gadm.org
GPW v3 (CIESIN) | Gridded population (region and country totals) | CIESIN, Gridded Population of the World v3. https://sedac.ciesin.columbia.edu
Polity IV (Center for Systemic Peace) | Democracy-autocracy score (Polity2) | Center for Systemic Peace, Polity IV project. https://www.systemicpeace.org/inscrdata.html
GREG (Weidmann et al. 2010) + NOAA/NGDC | Ethnic homelands for the ethnic-inequality light Gini | Weidmann, Rod & Cederman (2010), J. Peace Research 47(4).
Lessmann & Seidel (2017) | Original study replicated here; interpersonal Gini (Giniall) | Lessmann & Seidel (2017), 'Regional inequality, convergence, and its determinants - A view from outer space', European Economic Review 92: 110-132.

Cite this data

Please cite both this dataset/replication and the original study.

APA

Mendez, C. (2026). Regional inequality from outer space: Predicting GDP from nighttime lights and building inequality indices in Python [Data set]. https://carlos-mendez.org/post/python_kuznets_dmsp/

Lessmann, C., & Seidel, A. (2017). Regional inequality, convergence, and its determinants — A view from outer space. European Economic Review, 92, 110–132.

BibTeX

@misc{mendez2026kuznetsdmsp,
  author       = {Mendez, Carlos},
  title        = {Regional Inequality from Outer Space: Predicting GDP from Nighttime Lights and Building Inequality Indices in Python},
  year         = {2026},
  howpublished = {\url{https://carlos-mendez.org/post/python_kuznets_dmsp/}},
  note         = {Data set; replication of Lessmann and Seidel (2017)}
}

@article{lessmann2017regional,
  author  = {Lessmann, Christian and Seidel, Andr\'{e}},
  title   = {Regional inequality, convergence, and its determinants---A view from outer space},
  journal = {European Economic Review},
  volume  = {92},
  pages   = {110--132},
  year    = {2017}
}

Variable explorer: all 53 variables

Each row gives a distribution summary of the variable: min, median and max for continuous variables, and the share coded 1 for dummies.

Variable | Type | Distribution | Label | Definition | Units | In files | Source
Aid | continuous | min -1.22e+09, median 2.65e+08, max 2.6e+10 | Net official development assistance (2011 US$) | Net official development assistance received | US$ (2011) | Table_4_data | World Bank WDI
Arable_land | continuous | min 0.000431, median 0.105, max 0.661 | Arable land (share of land area) | Arable land as a share of land area (FAO definition) | share | Table_4_data | World Bank WDI
COVW_pred_GDP_pc | continuous | min 0.00287, median 0.116, max 0.365 | Pop-weighted coefficient of variation (pred income) | Population-weighted coefficient of variation of predicted regional income | >=0 | Table_3_data | This study (derived)
Country_ISO | identifier | - | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | string | Prediction_Data, Table_2_data, Table_3_data, Table_4_data, Table_B4_data, Figure_5_data | GADM (Global Administrative Areas)
Country_NAME | identifier | - | Country name | Country name (English) | string | Prediction_Data, Table_3_data, Table_4_data, Figure_5_data | GADM (Global Administrative Areas)
FDI_share_of_GDP | continuous | min -0.829, median 0.0248, max 4.31 | FDI openness: net FDI inflows / GDP | Net foreign direct investment inflows as a share of GDP | ratio | Table_4_data | World Bank WDI
GDP_pc_Country | continuous | min 126, median 6.86e+03, max 1.19e+05 | National GDP per capita (2005 PPP US$) | National GDP per capita | US$ (2005 PPP) | Table_3_data, Table_4_data | World Bank WDI
GDP_pc_Region | continuous | min 226, median 8.77e+03, max 1.51e+05 | Observed regional GDP per capita (2005 PPP US$) | Observed regional GDP per capita (training target) | US$ (2005 PPP) | Prediction_Data, Table_2_data | Gennaioli et al. (2014)
GE_0W_pred_GDP_pc | continuous | min 4.12e-06, median 0.00683, max 0.0514 | Pop-weighted mean log deviation GE(alpha=0) | Population-weighted mean log deviation of predicted regional income | >=0 | Table_3_data | This study (derived)
GE_1W_pred_GDP_pc | continuous | min 4.13e-06, median 0.00675, max 0.0577 | Pop-weighted Theil index GE(alpha=1) | Population-weighted Theil index of predicted regional income | >=0 | Table_3_data | This study (derived)
GE_m1W_pred_GDP_pc | continuous | min 4.12e-06, median 0.00697, max 0.0471 | Pop-weighted generalized entropy GE(alpha=-1) | Population-weighted GE(-1) of predicted regional income | >=0 | Table_3_data | This study (derived)
GINIW_Eth_light | continuous | min 0, median 0.2, max 0.83 | Ethnic inequality: pop-weighted light Gini | Population-weighted light-Gini computed across ethnic homelands | 0-1 | Table_4_data | GREG (Weidmann et al. 2010) + NOAA/NGDC
GINIW_pred_GDP_pc | continuous | min 0.00101, median 0.0606, max 0.163 | Pop-weighted regional Gini (predicted income) | Population-weighted Gini of predicted regional income within a country-year | 0-1 | Table_3_data, Table_4_data, Figure_5_data | This study (derived)
Giniall | continuous | min 17.5, median 37.8, max 74.3 | National interpersonal income Gini (0-100) | Household-survey interpersonal income Gini | 0-100 | Figure_5_data | Lessmann & Seidel (2017)
Latitude | continuous | min -54.3, median 38.4, max 70 | Region centroid latitude (degrees) | Latitude of the region polygon centroid | degrees | Table_B4_data | GADM (Global Administrative Areas)
Light_Country | continuous | min 1.21e+04, median 1.73e+06, max 8.33e+07 | Country total nighttime lights (summed DN) | Sum of pixel digital numbers over the whole country | summed DN | Table_2_data | NOAA/NGDC DMSP-OLS stable lights
Light_Region | continuous | min 44, median 5.84e+04, max 7.9e+06 | Regional total nighttime lights (summed DN) | Sum of pixel digital numbers over the region | summed DN | Table_2_data | NOAA/NGDC DMSP-OLS stable lights
Longitude | continuous | min -156, median 23.1, max 163 | Region centroid longitude (degrees) | Longitude of the region polygon centroid | degrees | Table_B4_data | GADM (Global Administrative Areas)
Polity2 | continuous | min -1, median 0.6, max 1 | Polity IV democracy-autocracy score (-1..+1) | Rescaled Polity IV combined democracy-autocracy score | -1..+1 | Table_4_data | Polity IV (Center for Systemic Peace)
Pop_Country | continuous | min 1.19e+06, median 3.85e+07, max 1.33e+09 | Country total population (persons) | Total population of the country | persons | Prediction_Data, Table_2_data, Table_4_data | GPW v3 (CIESIN)
Pop_Region | continuous | min 928, median 9.65e+05, max 2e+08 | Regional total population (persons) | Total population of the region | persons | Prediction_Data, Table_2_data | GPW v3 (CIESIN)
Region_NAME | identifier | - | First-level administrative region name | Name of the first-level admin unit (state/province/canton) | string | Prediction_Data | GADM (Global Administrative Areas)
Resources_rents_share_of_GDP | continuous | min 0, median 3.48, max 100 | Natural-resource rents (% of GDP) | Total natural-resource rents as a share of GDP | % GDP | Table_4_data | World Bank WDI
School_enrollment_secondary | continuous | min 5.16, median 82.8, max 161 | Gross secondary-school enrolment (% gross) | Gross secondary-school enrolment ratio (>100% with over-age pupils) | % gross | Table_4_data | World Bank WDI
Trade_GDP_share | continuous | min 0.00309, median 0.761, max 5.32 | Trade openness (exports+imports)/GDP | Trade as a share of GDP | ratio | Table_4_data | World Bank WDI
area | continuous | min 50, median 1.55e+05, max 1.64e+07 | Country land area (km^2) | Total land area excluding inland water | km^2 | Table_4_data | World Bank WDI
code_Coutry_Region | identifier | - | Numeric region key (orig. spelling 'Coutry' kept) | Numeric identifier for a region (unique within country) | integer | Prediction_Data, Table_B4_data | Authors' replication archive
eap | dummy | share coded 1 = 0.201 | World Bank region dummy: East Asia & Pacific | 1 if the country is in East Asia & Pacific (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
eca | dummy | share coded 1 = 0.468 | World Bank region dummy: Europe & Central Asia | 1 if the country is in Europe & Central Asia (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
fedelupd2 | dummy | share coded 1 = 0.138 | Federal-state dummy (1=federal) | 1 if the country is federally organised | 0/1 | Table_4_data | Authors' replication archive
id_t_j | identifier | - | Country-year key (year+ISO, e.g. 2010CHE) | Concatenated year and ISO code | string | Prediction_Data | Authors' replication archive
lac | dummy | share coded 1 = 0.165 | World Bank region dummy: Latin America & Caribbean | 1 if the country is in Latin America & Caribbean (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
log_GDP_pc_Country | continuous | min 6.07, median 9.26, max 11.5 | Log national GDP per capita | Natural log of national GDP per capita | log US$ | Prediction_Data | World Bank WDI
log_GDP_pc_Region | continuous | min 5.42, median 9.08, max 11.9 | Log observed regional GDP per capita | Natural log of GDP_pc_Region | log US$ | Prediction_Data, Table_B4_data | Gennaioli et al. (2014)
log_Light_ppix_Region | continuous | min -4.61, median 1.25, max 4.14 | Log avg nighttime light per pixel (region) | Natural log of the region mean DMSP-OLS stable-lights digital number | log DN | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
log_N_pix_low_cod_1_ppix | continuous | min -15.2, median -0.523, max -8.34e-05 | Log count of low-coded pixels (DN=0) | Log number of dark (low-coded) pixels in the region | log count | Prediction_Data | NOAA/NGDC DMSP-OLS stable lights
log_N_pix_top_cod_1_ppix | continuous | min -20.7, median -12.4, max 4.2e-05 | Log count of top-coded pixels (DN=63) | Log number of saturated (top-coded) pixels in the region | log count | Prediction_Data | NOAA/NGDC DMSP-OLS stable lights
log_area | continuous | min 9.91, median 13, max 16.6 | Log region area (km^2) | Natural log of the region polygon area | log km^2 | Prediction_Data | GADM (Global Administrative Areas)
log_region | continuous | min 1.39, median 3.18, max 4.34 | Log number of regions in the country | Log count of first-level regions per country | log count | Prediction_Data | GADM (Global Administrative Areas)
log_region_X_log_area | continuous | min 17.1, median 42, max 72.2 | Interaction: log_region x log_area | Product of log_region and log_area | - | Prediction_Data | This study (derived)
mena | dummy | share coded 1 = 0.041 | World Bank region dummy: Middle East & North Africa | 1 if the country is in Middle East & North Africa (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
pred_GDP_pc_Region | continuous | min 360, median 8.32e+03, max 7.06e+04 | Predicted regional GDP per capita (2005 PPP US$) | Model-predicted regional GDP per capita | US$ (2005 PPP) | Table_2_data | This study (derived)
price_gasoline | continuous | min 0.0185, median 0.844, max 2.35 | Gasoline pump price (2005 PPP US$/litre) | Pump price for gasoline | US$/litre | Table_4_data | World Bank WDI
sa | dummy | share coded 1 = 0.044 | World Bank region dummy: South Asia | 1 if the country is in South Asia (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
satyear_1 | dummy | share coded 1 = 0.016 | Satellite/sensor-era dummy 1 (of 7) | 1 for DMSP satellite/sensor configuration era 1 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_2 | dummy | share coded 1 = 0.004 | Satellite/sensor-era dummy 2 (of 7) | 1 for DMSP satellite/sensor configuration era 2 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_3 | dummy | share coded 1 = 0.230 | Satellite/sensor-era dummy 3 (of 7) | 1 for DMSP satellite/sensor configuration era 3 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_4 | dummy | share coded 1 = 0.022 | Satellite/sensor-era dummy 4 (of 7) | 1 for DMSP satellite/sensor configuration era 4 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_5 | dummy | share coded 1 = 0.251 | Satellite/sensor-era dummy 5 (of 7) | 1 for DMSP satellite/sensor configuration era 5 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_6 | dummy | share coded 1 = 0.246 | Satellite/sensor-era dummy 6 (of 7) | 1 for DMSP satellite/sensor configuration era 6 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
satyear_7 | dummy | share coded 1 = 0.032 | Satellite/sensor-era dummy 7 (of 7) | 1 for DMSP satellite/sensor configuration era 7 | 0/1 | Prediction_Data, Table_B4_data | NOAA/NGDC DMSP-OLS stable lights
ssa | dummy | share coded 1 = 0.034 | World Bank region dummy: Sub-Saharan Africa | 1 if the country is in Sub-Saharan Africa (North America = reference) | 0/1 | Prediction_Data | World Bank WDI
year | year | - | Calendar year | Year of observation | year | Prediction_Data, Table_2_data, Table_3_data, Table_4_data, Table_B4_data, Figure_5_data | -

Cross-file variable index

Which file each variable appears in (● = present).
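A presence index of this kind can be rebuilt from the column lists of the files. The sketch below is a minimal illustration: `cross_file_index` is a hypothetical helper, and the two column lists are copied from the Table_2_data and Table_3_data variable dictionaries on this page.

```python
def cross_file_index(columns_by_file):
    """Map each variable name to the list of files that contain it."""
    files = list(columns_by_file)
    variables = sorted({v for cols in columns_by_file.values() for v in cols})
    return {v: [f for f in files if v in columns_by_file[f]] for v in variables}

columns_by_file = {
    "Table_2_data": ["Country_ISO", "year", "pred_GDP_pc_Region", "GDP_pc_Region",
                     "Light_Region", "Light_Country", "Pop_Region", "Pop_Country"],
    "Table_3_data": ["Country_NAME", "Country_ISO", "year", "GDP_pc_Country",
                     "GINIW_pred_GDP_pc", "COVW_pred_GDP_pc", "GE_1W_pred_GDP_pc",
                     "GE_0W_pred_GDP_pc", "GE_m1W_pred_GDP_pc"],
}

index = cross_file_index(columns_by_file)

# Print one row per variable, with a dot where the variable is present.
for name, present_in in index.items():
    marks = "  ".join("●" if f in present_in else "·" for f in columns_by_file)
    print(f"{name:<22}{marks}")
```

With the files loaded into pandas, each column list could be replaced by `list(df.columns)`.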

Construction & formulas

All five inequality indices are computed within each country-year, across that country's regions, on predicted regional income y = pred_GDP_pc_Region, weighted by the regional population share p_i = Pop_Region_i / Pop_Country. Let ybar = sum_i p_i * y_i be the population-weighted mean.
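As an illustration of the definitions above, the sketch below computes the five indices for one country-year with the standard population-weighted formulas. It is an assumption that these textbook forms match the post's implementation exactly; the function name is hypothetical, and the sum of the regional populations stands in for Pop_Country.

```python
import math

def weighted_inequality(y, pop):
    """Population-weighted inequality indices for one country-year.

    y   -- predicted regional incomes (pred_GDP_pc_Region), all positive
    pop -- regional populations (Pop_Region), in the same order as y
    """
    total = sum(pop)                              # stands in for Pop_Country
    p = [n / total for n in pop]                  # population shares p_i
    ybar = sum(pi * yi for pi, yi in zip(p, y))   # population-weighted mean
    r = [yi / ybar for yi in y]                   # relative incomes y_i / ybar
    pairs = list(zip(p, r))

    # Gini = sum_i sum_j p_i p_j |y_i - y_j| / (2 ybar)
    gini = sum(pi * pj * abs(ri - rj) for pi, ri in pairs for pj, rj in pairs) / 2
    # CV = sqrt(sum_i p_i (y_i - ybar)^2) / ybar
    cov = math.sqrt(sum(pi * (ri - 1) ** 2 for pi, ri in pairs))
    # GE(1), Theil = sum_i p_i (y_i / ybar) ln(y_i / ybar)
    ge1 = sum(pi * ri * math.log(ri) for pi, ri in pairs)
    # GE(0), mean log deviation = sum_i p_i ln(ybar / y_i)
    ge0 = -sum(pi * math.log(ri) for pi, ri in pairs)
    # GE(-1) = (1/2) sum_i p_i (ybar / y_i - 1)
    gem1 = 0.5 * sum(pi * (1 / ri - 1) for pi, ri in pairs)

    return {
        "GINIW_pred_GDP_pc": gini,
        "COVW_pred_GDP_pc": cov,
        "GE_1W_pred_GDP_pc": ge1,
        "GE_0W_pred_GDP_pc": ge0,
        "GE_m1W_pred_GDP_pc": gem1,
    }

# Two equally populated regions with incomes 1 and 3 (toy numbers).
example = weighted_inequality(y=[1.0, 3.0], pop=[100, 100])
print(example["GINIW_pred_GDP_pc"], example["COVW_pred_GDP_pc"])  # 0.25 0.5
```

For small dispersion the three GE indices are all close to half the squared coefficient of variation, which is consistent with the medians reported for Table_3_data (CV about 0.116, GE indices about 0.007).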

Other constructed variables are documented in the Construction column of each variable dictionary.

The six datasets

Each dataset section shows the full variable dictionary plus a statistics table with distribution summaries and data coverage.


Prediction_Data · region-year · 5,258 × 30 · 1992-2010 · 1,504 regions in 81 countries

Panel key: code_Coutry_Region x year · Train the light->income prediction model (Table 1).

Variable dictionary

Variable | Type | Label | Definition | Construction | Units | Source | Coverage
Region_NAME | identifier | First-level administrative region name | Name of the first-level admin unit (state/province/canton) | GADM admin-1 name | string | GADM (Global Administrative Areas) | 1992-2010 · 1,504 reg (81 ctry) · region frame
Country_NAME | identifier | Country name | Country name (English) | From GADM country attributes | string | GADM (Global Administrative Areas) | country/region files
Country_ISO | identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files
id_t_j | identifier | Country-year key (year+ISO, e.g. 2010CHE) | Concatenated year and ISO code | year concatenated with Country_ISO | string | Authors' replication archive | region frame (Prediction)
code_Coutry_Region | identifier | Numeric region key (orig. spelling 'Coutry' kept) | Numeric identifier for a region (unique within country) | Region identifier carried verbatim from the authors' archive | integer | Authors' replication archive | 1992-2010 · 1,504 reg (81 ctry) · region frame
year | year | Calendar year | Year of observation | - | year | - | per file (see summary)
Pop_Region | continuous | Regional total population (persons) | Total population of the region | Population density x region area, rounded up (min 1); 5-yr waves interpolated to annual | persons | GPW v3 (CIESIN) | 1992-2010 · 1,504 reg (81 ctry) · region frame
Pop_Country | continuous | Country total population (persons) | Total population of the country | Sum of regional populations | persons | GPW v3 (CIESIN) | region & country frames
GDP_pc_Region | continuous | Observed regional GDP per capita (2005 PPP US$) | Observed regional GDP per capita (training target) | Regional accounts, constant 2005 PPP US$ | US$ (2005 PPP) | Gennaioli et al. (2014) | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_GDP_pc_Region | continuous | Log observed regional GDP per capita | Natural log of GDP_pc_Region | ln(GDP_pc_Region) | log US$ | Gennaioli et al. (2014) | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_Light_ppix_Region | continuous | Log avg nighttime light per pixel (region) | Natural log of the region mean DMSP-OLS stable-lights digital number | ln(mean DN); mean set to 0.01 when 0 so the log is defined; DN ranges 0-63 | log DN | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_GDP_pc_Country | continuous | Log national GDP per capita | Natural log of national GDP per capita | ln(national GDP per capita) | log US$ | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_N_pix_top_cod_1_ppix | continuous | Log count of top-coded pixels (DN=63) | Log number of saturated (top-coded) pixels in the region | ln(count of DN=63 pixels) per region; controls for sensor saturation | log count | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_N_pix_low_cod_1_ppix | continuous | Log count of low-coded pixels (DN=0) | Log number of dark (low-coded) pixels in the region | ln(count of DN=0 pixels) per region; controls for sparse/rural area | log count | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_area | continuous | Log region area (km^2) | Natural log of the region polygon area | ln(region area in km^2) | log km^2 | GADM (Global Administrative Areas) | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_region | continuous | Log number of regions in the country | Log count of first-level regions per country | ln(number of regions in the country) | log count | GADM (Global Administrative Areas) | 1992-2010 · 1,504 reg (81 ctry) · region frame
log_region_X_log_area | continuous | Interaction: log_region x log_area | Product of log_region and log_area | log_region * log_area | - | This study (derived) | 1992-2010 · 1,504 reg (81 ctry) · region frame
eap | dummy | World Bank region dummy: East Asia & Pacific | 1 if the country is in East Asia & Pacific (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
eca | dummy | World Bank region dummy: Europe & Central Asia | 1 if the country is in Europe & Central Asia (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
lac | dummy | World Bank region dummy: Latin America & Caribbean | 1 if the country is in Latin America & Caribbean (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
mena | dummy | World Bank region dummy: Middle East & North Africa | 1 if the country is in Middle East & North Africa (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
sa | dummy | World Bank region dummy: South Asia | 1 if the country is in South Asia (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
ssa | dummy | World Bank region dummy: Sub-Saharan Africa | 1 if the country is in Sub-Saharan Africa (North America = reference) | World Bank regional grouping indicator | 0/1 | World Bank WDI | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_1 | dummy | Satellite/sensor-era dummy 1 (of 7) | 1 for DMSP satellite/sensor configuration era 1 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_2 | dummy | Satellite/sensor-era dummy 2 (of 7) | 1 for DMSP satellite/sensor configuration era 2 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_3 | dummy | Satellite/sensor-era dummy 3 (of 7) | 1 for DMSP satellite/sensor configuration era 3 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_4 | dummy | Satellite/sensor-era dummy 4 (of 7) | 1 for DMSP satellite/sensor configuration era 4 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_5 | dummy | Satellite/sensor-era dummy 5 (of 7) | 1 for DMSP satellite/sensor configuration era 5 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_6 | dummy | Satellite/sensor-era dummy 6 (of 7) | 1 for DMSP satellite/sensor configuration era 6 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
satyear_7 | dummy | Satellite/sensor-era dummy 7 (of 7) | 1 for DMSP satellite/sensor configuration era 7 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame
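Three of the Construction entries above are simple enough to sketch in code: the floored log of mean light, the country-year key, and the interaction term. This is a minimal illustration under stated assumptions: the function names and the inputs `sum_dn`, `n_pixels`, `n_regions` and `area_km2` are hypothetical, and only the 0.01 floor, the year+ISO concatenation and the product form come from the dictionary.

```python
import math

LIGHT_FLOOR = 0.01  # from the Construction column: mean set to 0.01 when it is 0

def log_light_per_pixel(sum_dn, n_pixels):
    """ln(mean DN per pixel), with the mean floored so the log is defined."""
    mean_dn = sum_dn / n_pixels
    if mean_dn == 0:
        mean_dn = LIGHT_FLOOR
    return math.log(mean_dn)

def country_year_key(year, iso):
    """id_t_j-style key: year concatenated with Country_ISO."""
    return f"{year}{iso}"

def region_area_interaction(n_regions, area_km2):
    """log_region_X_log_area-style term: ln(regions) * ln(area)."""
    return math.log(n_regions) * math.log(area_km2)

print(round(log_light_per_pixel(0, 500), 2))  # -4.61
print(country_year_key(2010, "CHE"))          # 2010CHE
```

The floor explains the reported minimum of log_Light_ppix_Region: ln(0.01) rounds to -4.61, and a fully saturated region with mean DN 63 gives ln(63), which rounds to the reported maximum of 4.14.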

Distribution & statistics

Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD
Region_NAME | 100% | 5,258 | 1,483 | - | - | - | - | -
Country_NAME | 100% | 5,258 | 82 | - | - | - | - | -
Country_ISO | 100% | 5,258 | 81 | - | - | - | - | -
id_t_j | 100% | 5,258 | 277 | - | - | - | - | -
code_Coutry_Region | 100% | 5,258 | 1,504 | - | - | - | - | -
year | 100% | 5,258 | 19 | 1992 | 2002.2 | 2000 | 2010 | 5.51
Pop_Region | 100% | 5,258 | 5,258 | 928.4 | 3,705,986 | 964,556 | 199,528,672 | 11,695,457
Pop_Country | 100% | 5,258 | 277 | 1,193,269 | 103,683,791 | 38,461,096 | 1,328,343,680 | 228,114,600
GDP_pc_Region | 100% | 5,258 | 5,207 | 226.3 | 14,371 | 8,770.2 | 150,768 | 13,450
log_GDP_pc_Region | 100% | 5,258 | 5,161 | 5.42 | 9.10 | 9.08 | 11.92 | 1.06
log_Light_ppix_Region | 100% | 5,258 | 5,184 | -4.61 | 0.957 | 1.25 | 4.14 | 1.77
log_GDP_pc_Country | 100% | 5,258 | 277 | 6.07 | 9.28 | 9.26 | 11.45 | 0.939
log_N_pix_top_cod_1_ppix | 100% | 5,258 | 3,820 | -20.75 | -10.53 | -12.37 | 4.20e-05 | 4.31
log_N_pix_low_cod_1_ppix | 100% | 5,258 | 5,135 | -15.16 | -1.55 | -0.523 | -8.34e-05 | 2.83
log_area | 100% | 5,258 | 81 | 9.91 | 13.17 | 13.01 | 16.61 | 1.74
log_region | 100% | 5,258 | 34 | 1.39 | 3.19 | 3.18 | 4.34 | 0.698
log_region_X_log_area | 100% | 5,258 | 83 | 17.15 | 42.57 | 42.05 | 72.16 | 12.98
eap | 100% | 5,258 | 2 | 0 | 0.201 | 0 | 1.00 | 0.401
eca | 100% | 5,258 | 2 | 0 | 0.468 | 0 | 1.00 | 0.499
lac | 100% | 5,258 | 2 | 0 | 0.165 | 0 | 1.00 | 0.372
mena | 100% | 5,258 | 2 | 0 | 0.041 | 0 | 1.00 | 0.198
sa | 100% | 5,258 | 2 | 0 | 0.044 | 0 | 1.00 | 0.204
ssa | 100% | 5,258 | 2 | 0 | 0.034 | 0 | 1.00 | 0.181
satyear_1 | 100% | 5,258 | 2 | 0 | 0.016 | 0 | 1.00 | 0.125
satyear_2 | 100% | 5,258 | 2 | 0 | 0.004 | 0 | 1.00 | 0.062
satyear_3 | 100% | 5,258 | 2 | 0 | 0.230 | 0 | 1.00 | 0.421
satyear_4 | 100% | 5,258 | 2 | 0 | 0.022 | 0 | 1.00 | 0.146
satyear_5 | 100% | 5,258 | 2 | 0 | 0.251 | 0 | 1.00 | 0.434
satyear_6 | 100% | 5,258 | 2 | 0 | 0.246 | 0 | 1.00 | 0.431
satyear_7 | 100% | 5,258 | 2 | 0 | 0.032 | 0 | 1.00 | 0.176

Table_2_data · region-year · 5,258 × 8 · 1992-2010 · same 1,504-region training frame

Panel key: Country_ISO x year (region-year frame) · Validate the inequality indices (Table 2).

Variable dictionary

Variable | Type | Label | Definition | Construction | Units | Source | Coverage
Country_ISO | identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files
year | year | Calendar year | Year of observation | - | year | - | per file (see summary)
pred_GDP_pc_Region | continuous | Predicted regional GDP per capita (2005 PPP US$) | Model-predicted regional GDP per capita | Back-transformed fitted values of the eq.-1 random-effects model | US$ (2005 PPP) | This study (derived) | region frame (Table_2)
GDP_pc_Region | continuous | Observed regional GDP per capita (2005 PPP US$) | Observed regional GDP per capita (training target) | Regional accounts, constant 2005 PPP US$ | US$ (2005 PPP) | Gennaioli et al. (2014) | 1992-2010 · 1,504 reg (81 ctry) · region frame
Light_Region | continuous | Regional total nighttime lights (summed DN) | Sum of pixel digital numbers over the region | Sum of DMSP-OLS stable-lights DN over the region's pixels | summed DN | NOAA/NGDC DMSP-OLS stable lights | region frame (Table_2)
Light_Country | continuous | Country total nighttime lights (summed DN) | Sum of pixel digital numbers over the whole country | Sum of DMSP-OLS stable-lights DN over all country pixels | summed DN | NOAA/NGDC DMSP-OLS stable lights | region frame (Table_2)
Pop_Region | continuous | Regional total population (persons) | Total population of the region | Population density x region area, rounded up (min 1); 5-yr waves interpolated to annual | persons | GPW v3 (CIESIN) | 1992-2010 · 1,504 reg (81 ctry) · region frame
Pop_Country | continuous | Country total population (persons) | Total population of the country | Sum of regional populations | persons | GPW v3 (CIESIN) | region & country frames

Distribution & statistics

Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD
Country_ISO | 100% | 5,258 | 81 | - | - | - | - | -
year | 100% | 5,258 | 19 | 1992 | 2002.2 | 2000 | 2010 | 5.51
pred_GDP_pc_Region | 100% | 5,258 | 5,254 | 360.1 | 13,422 | 8,324.9 | 70,638 | 11,689
GDP_pc_Region | 100% | 5,258 | 5,207 | 226.3 | 14,371 | 8,770.2 | 150,768 | 13,450
Light_Region | 100% | 5,258 | 5,199 | 44.00 | 213,456 | 58,401 | 7,904,552 | 465,243
Light_Country | 100% | 5,258 | 277 | 12,106 | 7,477,449 | 1,733,508 | 83,312,528 | 14,801,072
Pop_Region | 100% | 5,258 | 5,258 | 928.4 | 3,705,986 | 964,556 | 199,528,672 | 11,695,457
Pop_Country | 100% | 5,258 | 277 | 1,193,269 | 103,683,791 | 38,461,096 | 1,328,343,680 | 228,114,600

Table_3_data · country-year · 3,675 × 9 · 1992-2012 · 180 countries

Panel key: Country_ISO x year · Kuznets curve: GDP + five indices (Table 3).

Variable dictionary

Variable | Type | Label | Definition | Construction | Units | Source | Coverage
Country_NAME | identifier | Country name | Country name (English) | From GADM country attributes | string | GADM (Global Administrative Areas) | country/region files
Country_ISO | identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files
year | year | Calendar year | Year of observation | - | year | - | per file (see summary)
GDP_pc_Country | continuous | National GDP per capita (2005 PPP US$) | National GDP per capita | World Bank WDI, constant 2005 PPP US$ | US$ (2005 PPP) | World Bank WDI | 1992-2012 · 180 ctry · country frame
GINIW_pred_GDP_pc | continuous | Pop-weighted regional Gini (predicted income) | Population-weighted Gini of predicted regional income within a country-year | Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year | 0-1 | This study (derived) | 1992-2012 · 180 ctry · country frame
COVW_pred_GDP_pc | continuous | Pop-weighted coefficient of variation (pred income) | Population-weighted coefficient of variation of predicted regional income | pop-weighted SD / pop-weighted mean of pred_GDP_pc_Region, per country-year | >=0 | This study (derived) | 1992-2012 · 180 ctry · country frame
GE_1W_pred_GDP_pc | continuous | Pop-weighted Theil index GE(alpha=1) | Population-weighted Theil index of predicted regional income | Generalized entropy GE(alpha=1) of pred_GDP_pc_Region, pop-weighted, per country-year | >=0 | This study (derived) | 1992-2012 · 180 ctry · country frame
GE_0W_pred_GDP_pc | continuous | Pop-weighted mean log deviation GE(alpha=0) | Population-weighted mean log deviation of predicted regional income | Generalized entropy GE(alpha=0) of pred_GDP_pc_Region, pop-weighted, per country-year | >=0 | This study (derived) | 1992-2012 · 180 ctry · country frame
GE_m1W_pred_GDP_pc | continuous | Pop-weighted generalized entropy GE(alpha=-1) | Population-weighted GE(-1) of predicted regional income | Generalized entropy GE(alpha=-1) of pred_GDP_pc_Region, pop-weighted, per country-year | >=0 | This study (derived) | 1992-2012 · 180 ctry · country frame

Distribution & statistics

Variable | Coverage | N | Distinct | Min | Mean | Median | Max | SD
Country_NAME | 100% | 3,675 | 180 | - | - | - | - | -
Country_ISO | 100% | 3,675 | 180 | - | - | - | - | -
year | 100% | 3,675 | 21 | 1992 | 2002.1 | 2002 | 2012 | 6.03
GDP_pc_Country | 100% | 3,675 | 3,675 | 126.4 | 12,572 | 6,864.4 | 119,068 | 15,364
GINIW_pred_GDP_pc | 100% | 3,675 | 3,674 | 0.001 | 0.064 | 0.061 | 0.163 | 0.033
COVW_pred_GDP_pc | 100% | 3,675 | 3,674 | 0.003 | 0.127 | 0.116 | 0.365 | 0.069
GE_1W_pred_GDP_pc | 100% | 3,675 | 3,675 | 4.13e-06 | 0.010 | 0.007 | 0.058 | 0.010
GE_0W_pred_GDP_pc | 100% | 3,675 | 3,675 | 4.12e-06 | 0.010 | 0.007 | 0.051 | 0.009
GE_m1W_pred_GDP_pc | 100% | 3,675 | 3,675 | 4.12e-06 | 0.010 | 0.007 | 0.047 | 0.009
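To show how a Kuznets-type relationship can be read off data shaped like Table_3_data, the sketch below fits a pooled quadratic of an inequality index on log GDP per capita and locates its turning point. This is an assumed, simplified specification for illustration only; the post's Table 3 model may use a different functional form, fixed effects or other controls. The function names are hypothetical and the data points are synthetic.

```python
def fit_quadratic(x, y):
    """OLS fit of y = b0 + b1*x + b2*x**2 via the 3x3 normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]               # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented matrix
    for col in range(3):                                           # Gauss-Jordan elimination
        pivot = max(range(col, 3), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for row in range(3):
            if row != col:
                factor = a[row][col]
                a[row] = [vr - factor * vc for vr, vc in zip(a[row], a[col])]
    return [a[i][3] for i in range(3)]

def turning_point(b1, b2):
    """Value of x at which b0 + b1*x + b2*x**2 peaks (b2 < 0) or troughs (b2 > 0)."""
    return -b1 / (2 * b2)

# Synthetic inverted-U: inequality rises then falls in log GDP per capita.
log_gdp = [6 + 0.5 * k for k in range(11)]                 # 6.0 ... 11.0
gini = [0.02 + 0.03 * g - 0.002 * g ** 2 for g in log_gdp]

b0, b1, b2 = fit_quadratic(log_gdp, gini)
peak = turning_point(b1, b2)   # close to 7.5 on this synthetic data
```

On real data, `log_gdp` would be the natural log of GDP_pc_Country and `gini` would be GINIW_pred_GDP_pc or any of the other four indices.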

Table_4_data · country-year · 3,675 × 17 · 1992-2012 · 180 countries (determinants sparser)

Panel key: Country_ISO x year · Determinants of regional inequality (Table 4).

Variable dictionary

| Variable | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| Country_NAME identifier | Country name | Country name (English) | From GADM country attributes | string | GADM (Global Administrative Areas) | country/region files |
| Country_ISO identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files |
| year year | Calendar year | Year of observation | - | year | - | per file (see summary) |
| GINIW_pred_GDP_pc continuous | Pop-weighted regional Gini (predicted income) | Population-weighted Gini of predicted regional income within a country-year | Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year | 0-1 | This study (derived) | 1992-2012 · 180 ctry · country frame |
| GDP_pc_Country continuous | National GDP per capita (2005 PPP US$) | National GDP per capita | World Bank WDI, constant 2005 PPP US$ | US$ (2005 PPP) | World Bank WDI | 1992-2012 · 180 ctry · country frame |
| Pop_Country continuous | Country total population (persons) | Total population of the country | Sum of regional populations | persons | GPW v3 (CIESIN) | region & country frames |
| Resources_rents_share_of_GDP continuous | Natural-resource rents (% of GDP) | Total natural-resource rents as a share of GDP | Oil + gas + coal + mineral + forest rents, % of GDP | % GDP | World Bank WDI | 177 ctry · N=3,620 |
| Arable_land continuous | Arable land (share of land area) | Arable land as a share of land area (FAO definition) | Arable land / total land area | share | World Bank WDI | 178 ctry · N=3,603 |
| Trade_GDP_share continuous | Trade openness (exports+imports)/GDP | Trade as a share of GDP | (Exports + imports) / GDP | ratio | World Bank WDI | 176 ctry · N=3,509 |
| FDI_share_of_GDP continuous | FDI openness: net FDI inflows / GDP | Net foreign direct investment inflows as a share of GDP | Net FDI inflows / GDP | ratio | World Bank WDI | 174 ctry · N=3,477 |
| area continuous | Country land area (km^2) | Total land area excluding inland water | World Bank WDI land area | km^2 | World Bank WDI | 1992-2012 · 180 ctry · country frame |
| price_gasoline continuous | Gasoline pump price (2005 PPP US$/litre) | Pump price for gasoline | Pump price, PPP constant 2005 US$/litre; paper's transport cost = area x price_gasoline | US$/litre | World Bank WDI | 162 ctry · N=1,366 |
| Aid continuous | Net official development assistance (2011 US$) | Net official development assistance received | Net ODA received, constant 2011 US$ | US$ (2011) | World Bank WDI | 155 ctry · N=2,964 |
| School_enrollment_secondary continuous | Gross secondary-school enrolment (% gross) | Gross secondary-school enrolment ratio (>100% with over-age pupils) | Secondary enrolment / age-eligible population | % gross | World Bank WDI | 172 ctry · N=2,566 |
| GINIW_Eth_light continuous | Ethnic inequality: pop-weighted light Gini | Population-weighted light-Gini computed across ethnic homelands | Light Gini across ethnic homelands (method of Alesina et al. 2016) | 0-1 | GREG (Weidmann et al. 2010) + NOAA/NGDC | 173 ctry · N=3,528 |
| Polity2 continuous | Polity IV democracy-autocracy score (-1..+1) | Rescaled Polity IV combined democracy-autocracy score | Polity IV combined score rescaled -1 (autocracy) to +1 (democracy) | -1..+1 | Polity IV (Center for Systemic Peace) | 157 ctry · N=3,158 |
| fedelupd2 dummy | Federal-state dummy (1=federal) | 1 if the country is federally organised | Federalism indicator from the authors' archive | 0/1 | Authors' replication archive | 1992-2009 · 154 ctry · N=2,724 |
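
Two constructions in the dictionary above are simple enough to sketch: the paper's transport cost (`area` times `price_gasoline`) and the Polity2 rescaling. The helper names are hypothetical, and dividing the raw Polity IV combined score (which runs from -10 to +10) by 10 is an assumption about how the -1..+1 scale was obtained, not a documented step.

```python
def transport_cost(area_km2, price_gasoline):
    """Transport-cost proxy: land area times gasoline pump price.

    Returns None when either input is missing, since price_gasoline
    is observed for only part of the panel.
    """
    if area_km2 is None or price_gasoline is None:
        return None
    return area_km2 * price_gasoline

def rescale_polity(raw_score):
    """Map a raw Polity IV combined score (-10..+10) to -1..+1 (assumed: divide by 10)."""
    return raw_score / 10

# Hypothetical country-year: 155,360 km^2 and 0.844 US$/litre
print(transport_cost(155_360, 0.844))
print(rescale_polity(6))     # 0.6
print(transport_cost(155_360, None))  # None: price not observed
```

Because the product inherits missing values from `price_gasoline`, any specification that uses the transport cost is restricted to the country-years where the pump price is observed.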

Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Country_NAME |  | 100% | 3,675 | 180 |  |  |  |  |  |
| Country_ISO |  | 100% | 3,675 | 180 |  |  |  |  |  |
| year |  | 100% | 3,675 | 21 | 1992 | 2002.1 | 2002 | 2012 | 6.03 |
| GINIW_pred_GDP_pc | min 0.00101; median 0.0606; max 0.163 | 100% | 3,675 | 3,674 | 0.001 | 0.064 | 0.061 | 0.163 | 0.033 |
| GDP_pc_Country | min 126; median 6.86e+03; max 1.19e+05 | 100% | 3,675 | 3,675 | 126.4 | 12,572 | 6,864.4 | 119,068 | 15,364 |
| Pop_Country | min 2.69e+03; median 7.56e+06; max 1.35e+09 | 100% | 3,675 | 3,675 | 2,690.0 | 34,542,604 | 7,563,883 | 1,353,431,168 | 126,525,234 |
| Resources_rents_share_of_GDP | min 0; median 3.48; max 100 | 99% | 3,620 | 3,454 | 0 | 9.93 | 3.48 | 100.4 | 15.00 |
| Arable_land | min 0.000431; median 0.105; max 0.661 | 98% | 3,603 | 2,492 | 4.31e-04 | 0.148 | 0.105 | 0.661 | 0.138 |
| Trade_GDP_share | min 0.00309; median 0.761; max 5.32 | 95% | 3,509 | 3,509 | 0.003 | 0.841 | 0.761 | 5.32 | 0.469 |
| FDI_share_of_GDP | min -0.829; median 0.0248; max 4.31 | 95% | 3,477 | 3,476 | -0.829 | 0.044 | 0.025 | 4.31 | 0.105 |
| area | min 50; median 1.55e+05; max 1.64e+07 | 100% | 3,675 | 180 | 50.00 | 728,582 | 155,360 | 16,380,084 | 1,922,363 |
| price_gasoline | min 0.0185; median 0.844; max 2.35 | 37% | 1,366 | 834 | 0.019 | 0.882 | 0.844 | 2.35 | 0.417 |
| Aid | min -1.22e+09; median 2.65e+08; max 2.6e+10 | 81% | 2,964 | 2,914 | -1,218,120,000 | 557,102,709 | 265,405,000 | 25,985,650,000 | 977,685,466 |
| School_enrollment_secondary | min 5.16; median 82.8; max 161 | 70% | 2,566 | 2,566 | 5.16 | 74.19 | 82.79 | 160.6 | 31.42 |
| GINIW_Eth_light | min 0; median 0.2; max 0.83 | 96% | 3,528 | 3,178 | 0 | 0.273 | 0.200 | 0.830 | 0.256 |
| Polity2 | min -1; median 0.6; max 1 | 86% | 3,158 | 21 | -1.00 | 0.341 | 0.600 | 1.00 | 0.656 |
| fedelupd2 | share coded 1 = 0.138 | 74% | 2,724 | 2 | 0 | 0.138 | 0 | 1.00 | 0.345 |
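
The Coverage column above appears to be the non-missing count N divided by the 3,675 country-year rows, rounded to a whole percent. The short check below reproduces a few reported percentages under that assumption; `coverage_pct` is an illustrative helper, not part of the published code.

```python
def coverage_pct(n_nonmissing, n_rows=3675):
    """Share of country-year rows with a non-missing value, as a whole percent."""
    return round(100 * n_nonmissing / n_rows)

# (N, reported coverage %) copied from the statistics table above
reported = {
    "price_gasoline": (1366, 37),
    "Aid": (2964, 81),
    "School_enrollment_secondary": (2566, 70),
    "fedelupd2": (2724, 74),
}
for name, (n, pct) in reported.items():
    print(f"{name}: computed {coverage_pct(n)}%, reported {pct}%")
```

With `price_gasoline` observed for only about a third of the rows, a regression that includes every determinant at once would use far fewer than 3,675 observations.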

Table_B4_data · region-year · 5,258 × 14 · 1992-2010 · 1,504 regions in 81 countries

Panel key: code_Coutry_Region × year · Inputs for Conley spatial-HAC standard errors, including region centroid latitude and longitude.

Variable dictionary

| Variable | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| code_Coutry_Region identifier | Numeric region key (orig. spelling 'Coutry' kept) | Numeric identifier for a region (unique within country) | Region identifier carried verbatim from the authors' archive | integer | Authors' replication archive | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| Country_ISO identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files |
| year year | Calendar year | Year of observation | - | year | - | per file (see summary) |
| Latitude continuous | Region centroid latitude (degrees) | Latitude of the region polygon centroid | GADM polygon centroid | degrees | GADM (Global Administrative Areas) | region frame (Table_B4) |
| Longitude continuous | Region centroid longitude (degrees) | Longitude of the region polygon centroid | GADM polygon centroid | degrees | GADM (Global Administrative Areas) | region frame (Table_B4) |
| log_GDP_pc_Region continuous | Log observed regional GDP per capita | Natural log of GDP_pc_Region | ln(GDP_pc_Region) | log US$ | Gennaioli et al. (2014) | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| log_Light_ppix_Region continuous | Log avg nighttime light per pixel (region) | Natural log of the region mean DMSP-OLS stable-lights digital number | ln(mean DN); mean set to 0.01 when 0 so the log is defined; DN ranges 0-63 | log DN | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_1 dummy | Satellite/sensor-era dummy 1 (of 7) | 1 for DMSP satellite/sensor configuration era 1 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_2 dummy | Satellite/sensor-era dummy 2 (of 7) | 1 for DMSP satellite/sensor configuration era 2 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_3 dummy | Satellite/sensor-era dummy 3 (of 7) | 1 for DMSP satellite/sensor configuration era 3 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_4 dummy | Satellite/sensor-era dummy 4 (of 7) | 1 for DMSP satellite/sensor configuration era 4 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_5 dummy | Satellite/sensor-era dummy 5 (of 7) | 1 for DMSP satellite/sensor configuration era 5 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_6 dummy | Satellite/sensor-era dummy 6 (of 7) | 1 for DMSP satellite/sensor configuration era 6 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
| satyear_7 dummy | Satellite/sensor-era dummy 7 (of 7) | 1 for DMSP satellite/sensor configuration era 7 | Sensor-era indicator; DMSP sensors change and age over 1992-2010 | 0/1 | NOAA/NGDC DMSP-OLS stable lights | 1992-2010 · 1,504 reg (81 ctry) · region frame |
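
The construction of `log_Light_ppix_Region` described above (take the log of the region mean DN, after replacing an exact zero with 0.01) can be sketched in a few lines. The function name `log_light` is illustrative; only the 0.01 floor and the 0-63 DN range come from the dictionary.

```python
import math

def log_light(mean_dn, floor=0.01):
    """Natural log of a region's mean DN, with an exact 0 replaced by the floor."""
    value = floor if mean_dn == 0 else mean_dn
    return math.log(value)

# Extremes implied by the dictionary: an unlit region and a fully saturated one
print(round(log_light(0), 2))    # ln(0.01)
print(round(log_light(63), 2))   # ln(63)
```

These two extremes, about -4.61 and 4.14, bracket the range any region can take under this construction.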

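Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| code_Coutry_Region |  | 100% | 5,258 | 1,504 |  |  |  |  |  |
| Country_ISO |  | 100% | 5,258 | 81 |  |  |  |  |  |
| year |  | 100% | 5,258 | 19 | 1992 | 2002.2 | 2000 | 2010 | 5.51 |
| Latitude | min -54.3; median 38.4; max 70 | 100% | 5,258 | 1,504 | -54.33 | 29.81 | 38.45 | 69.95 | 24.33 |
| Longitude | min -156; median 23.1; max 163 | 100% | 5,258 | 1,504 | -156.4 | 22.45 | 23.12 | 163.0 | 67.31 |
| log_GDP_pc_Region | min 5.42; median 9.08; max 11.9 | 100% | 5,258 | 5,161 | 5.42 | 9.10 | 9.08 | 11.92 | 1.06 |
| log_Light_ppix_Region | min -4.61; median 1.25; max 4.14 | 100% | 5,258 | 5,184 | -4.61 | 0.957 | 1.25 | 4.14 | 1.77 |
| satyear_1 | share coded 1 = 0.016 | 100% | 5,258 | 2 | 0 | 0.016 | 0 | 1.00 | 0.125 |
| satyear_2 | share coded 1 = 0.004 | 100% | 5,258 | 2 | 0 | 0.004 | 0 | 1.00 | 0.062 |
| satyear_3 | share coded 1 = 0.230 | 100% | 5,258 | 2 | 0 | 0.230 | 0 | 1.00 | 0.421 |
| satyear_4 | share coded 1 = 0.022 | 100% | 5,258 | 2 | 0 | 0.022 | 0 | 1.00 | 0.146 |
| satyear_5 | share coded 1 = 0.251 | 100% | 5,258 | 2 | 0 | 0.251 | 0 | 1.00 | 0.434 |
| satyear_6 | share coded 1 = 0.246 | 100% | 5,258 | 2 | 0 | 0.246 | 0 | 1.00 | 0.431 |
| satyear_7 | share coded 1 = 0.032 | 100% | 5,258 | 2 | 0 | 0.032 | 0 | 1.00 | 0.176 |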
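
Conley spatial-HAC standard errors weight pairs of regions by how far apart they are, which is why this file carries `Latitude` and `Longitude`. The sketch below shows the distance step only, assuming great-circle (haversine) distance between centroids and a linearly declining Bartlett kernel. The kernel choice and the 500 km cutoff are illustrative assumptions, not values taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bartlett_weight(distance_km, cutoff_km):
    """Weight that falls linearly from 1 at distance 0 to 0 at or beyond the cutoff."""
    return max(0.0, 1.0 - distance_km / cutoff_km)

# Two hypothetical region centroids and a hypothetical 500 km cutoff
d = haversine_km(38.45, 23.12, 40.0, 25.0)
print(round(d, 1), round(bartlett_weight(d, 500.0), 3))
```

In a full estimator these weights would multiply the cross-products of residuals for each pair of regions; that step is omitted here.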

Figure_5_data · country-year · 3,675 × 5 · 1992-2012 · 180 countries (Giniall: 153 countries)

Panel key: Country_ISO × year · Regional vs interpersonal inequality (Figure 5).

Variable dictionary

| Variable | Label | Definition | Construction | Units | Source | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| Country_ISO identifier | Country code (ISO 3166-1 alpha-3) | Three-letter country identifier | Assigned per country | string | GADM (Global Administrative Areas) | all files |
| Country_NAME identifier | Country name | Country name (English) | From GADM country attributes | string | GADM (Global Administrative Areas) | country/region files |
| year year | Calendar year | Year of observation | - | year | - | per file (see summary) |
| GINIW_pred_GDP_pc continuous | Pop-weighted regional Gini (predicted income) | Population-weighted Gini of predicted regional income within a country-year | Gini of pred_GDP_pc_Region across regions, weighted by Pop_Region, per country-year | 0-1 | This study (derived) | 1992-2012 · 180 ctry · country frame |
| Giniall continuous | National interpersonal income Gini (0-100) | Household-survey interpersonal income Gini | Reported household income Gini on a 0-100 scale (note: regional indices are 0-1) | 0-100 | Lessmann & Seidel (2017) | 1992-2012 · 153 ctry · N=1,330 (Figure_5) |
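
The two Gini measures above sit on different scales, so a comparison needs one of them rescaled. The sketch below assumes the standard population-weighted Gini (weighted mean absolute difference divided by twice the weighted mean) and a plain division by 100 for `Giniall`; both helper names and the toy inputs are illustrative, and the authors' own routine may differ in detail.

```python
def weighted_gini(incomes, pops):
    """Population-weighted Gini across regions of one country-year (0-1 scale)."""
    total_pop = sum(pops)
    shares = [p / total_pop for p in pops]                # population shares
    mean = sum(s * y for s, y in zip(shares, incomes))    # pop-weighted mean income
    # Weighted mean absolute difference over all ordered pairs of regions
    mad = sum(si * sj * abs(yi - yj)
              for si, yi in zip(shares, incomes)
              for sj, yj in zip(shares, incomes))
    return mad / (2 * mean)

def gini_to_unit_scale(gini_0_100):
    """Convert a 0-100 Gini such as Giniall to the 0-1 scale of the regional indices."""
    return gini_0_100 / 100

# Toy country-year with two equally populated regions (hypothetical values)
print(weighted_gini([1.0, 3.0], [1.0, 1.0]))   # 0.25
print(gini_to_unit_scale(37.8))
```

Regional Ginis computed this way capture only between-region differences, which is one reason they are much smaller than interpersonal Ginis even after rescaling.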

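Distribution & statistics

| Variable | Distribution | Coverage | N | Distinct | Min | Mean | Median | Max | SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Country_ISO |  | 100% | 3,675 | 180 |  |  |  |  |  |
| Country_NAME |  | 100% | 3,675 | 180 |  |  |  |  |  |
| year |  | 100% | 3,675 | 21 | 1992 | 2002.1 | 2002 | 2012 | 6.03 |
| GINIW_pred_GDP_pc | min 0.00101; median 0.0606; max 0.163 | 100% | 3,675 | 3,674 | 0.001 | 0.064 | 0.061 | 0.163 | 0.033 |
| Giniall | min 17.5; median 37.8; max 74.3 | 36% | 1,330 | 384 | 17.50 | 39.55 | 37.80 | 74.30 | 10.11 |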

Known limitations & caveats