Predicting subnational GDP in Vietnam with remote sensing data: A machine learning approach

Abstract

Official subnational Gross Domestic Product (GDP) data in Vietnam have been available only since 2010, which hinders the analysis of long-term local development dynamics. Using remote sensing data and machine learning methods, we construct a subnational GDP indicator for the 63 Vietnamese provinces from 1992 to 2009. Specifically, we rely on nighttime lights (NTL), agricultural land, and climate datasets and employ six machine learning algorithms to construct the GDP dataset. We compare the accuracy of these algorithms and then compare the subnational GDP predicted by the best-performing algorithm under two nighttime lights datasets. The predictions are consistent across both datasets, and we construct the subnational GDP dataset using the NTL data with the longer temporal coverage. This new dataset allows researchers and policymakers to analyze long-term economic trends at the subnational level in Vietnam, filling a critical gap in historical economic data.

Publication
Letters in Spatial and Resource Sciences

🤖 AI Podcast Summary

QuaRCS-lab · Vietnam Subnational GDP Prediction Using Remote Sensing and Machine Learning

🛰️ Introduction & Context

  • Challenge: Limited subnational GDP data in Vietnam before 2010
  • Need: Long-term data for economic development analysis
  • Solution: Predict GDP using remote sensing & machine learning

🌌 Data Sources Used

  • Official GDP data (2010-2020)
  • Nighttime Lights (NTL): Harmonized DMSP & VIIRS-like datasets
  • Agricultural land data (ESA)
  • Climate data: Temperature & precipitation (CRU)

🧠 Machine Learning Approach

  • Six algorithms compared:
    • Artificial Neural Networks (ANN)
    • Random Forest (RF)
    • Support Vector Machines (SVM)
    • K-Nearest Neighbors (KNN)
    • Ridge Regression
    • eXtreme Gradient Boosting (XGBoost)
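
The comparison step above can be sketched as a cross-validated contest between candidate regressors. The example below is a minimal illustration, not the authors' pipeline: it uses only the Python standard library, covers just two of the six listed algorithms (Ridge Regression and KNN), and runs on synthetic data. The three feature columns, the coefficients that generate the target, the `alpha` and `k` values, and every function name are assumptions introduced for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(X, y, alpha=0.1):
    """Ridge regression via the normal equations; the intercept is not penalized."""
    n, p = len(X), len(X[0])
    Xb = [[1.0] + list(row) for row in X]  # prepend an intercept column
    gram = [[sum(Xb[k][i] * Xb[k][j] for k in range(n)) for j in range(p + 1)]
            for i in range(p + 1)]
    for i in range(1, p + 1):
        gram[i][i] += alpha  # L2 penalty on the slopes only
    rhs = [sum(Xb[k][i] * y[k] for k in range(n)) for i in range(p + 1)]
    w = solve_linear(gram, rhs)
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def fit_knn(X, y, k=3):
    """K-nearest-neighbors regression: average the targets of the k closest rows."""
    def predict(row):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))[:k]
        return sum(y[i] for i in nearest) / k
    return predict


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def cv_rmse(fit, X, y, folds=5):
    """Mean RMSE over strided folds; `fit` maps (X, y) to a predict function."""
    scores = []
    for f in range(folds):
        test = list(range(f, len(X), folds))
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return sum(scores) / folds


# Synthetic stand-ins for (NTL, agricultural land, temperature) and log GDP.
rng = random.Random(0)
X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(120)]
y = [2.0 + 1.5 * a + 0.8 * b - 0.5 * c + rng.gauss(0, 0.05) for a, b, c in X]

scores = {"ridge": cv_rmse(fit_ridge, X, y), "knn": cv_rmse(fit_knn, X, y)}
best = min(scores, key=scores.get)  # candidate with the lowest cross-validated RMSE
print(scores, best)
```

Because the synthetic target is linear in the features, this toy setup favors the linear model by construction; it says nothing about which algorithm performs best on the real provincial data. In practice, a library such as scikit-learn provides most of the listed estimators, while XGBoost ships as a separate package.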

🔦 Key Findings

  • Predictions consistent across different nighttime datasets
  • Ridge Regression chosen for final model
  • Important features: Temperature & Agricultural Land more influential than NTL

🌍 Application & Significance

  • Created GDP data from 1992-2009
  • Enables detailed long-term analysis of regional economic trends
  • Assists policymakers and researchers in addressing regional inequality and growth
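
The backcasting idea behind the 1992-2009 series can be sketched as follows: fit a model on the years with official GDP (2010-2020), then apply it to predictor values from earlier years. This is a deliberately simplified, hypothetical example using a single predictor and a one-dimensional ridge fit. The series values, the `alpha` setting, and the function name `fit_ridge_1d` are assumptions for illustration and do not reproduce the multi-feature model described in the paper.

```python
def fit_ridge_1d(x, y, alpha=1.0):
    """One-predictor ridge: the slope is shrunk by alpha, the intercept is unpenalized."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar


# Hypothetical log nighttime-lights series for one province, 1992-2020.
years = list(range(1992, 2021))
log_ntl = {yr: 0.5 + 0.08 * (yr - 1992) for yr in years}

# Hypothetical official log GDP, available only for the benchmark years 2010-2020.
official = {yr: 3.0 + 1.2 * log_ntl[yr] for yr in range(2010, 2021)}

# Train on the benchmark years only.
train_years = sorted(official)
slope, intercept = fit_ridge_1d(
    [log_ntl[yr] for yr in train_years],
    [official[yr] for yr in train_years],
    alpha=0.01,
)

# Backcast: apply the fitted relationship to the years without official data.
backcast = {yr: intercept + slope * log_ntl[yr] for yr in range(1992, 2010)}
print(backcast[1992], backcast[2009])
```

One caveat worth keeping in mind with any such sketch: the earlier years lie outside the range of the training data, so the backcast is an extrapolation that relies on the fitted relationship holding before 2010.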

⚠️ Limitations

  • Remote sensing measurement/calibration discrepancies
  • Dependence on official GDP benchmarks
  • Interpretability challenges of machine learning methods

🚀 Future Research Directions

  • Explore additional remote sensing datasets
  • Estimate broader socioeconomic indicators
  • Improve models with larger datasets

🎯 Conclusion

  • Machine learning + Remote sensing effectively address subnational data gaps
  • New dataset supports informed economic policy decisions
  • Potentially replicable method for other developing countries
Carlos Mendez
Associate Professor of Development Economics

My research interests focus on the integration of development economics, spatial data science, and econometrics to understand and inform the process of sustainable development across regions.
