Household surveys give a precise estimate of poverty; however, surveys are costly and are fielded infrequently. We demonstrate the importance of jointly using multiple public and private sector data sources to estimate levels and changes in wealth for a large set of countries. We train models using 63,854 survey cluster locations across 59 countries, relying on data from satellites, Facebook Marketing information, and OpenStreetMaps. The model generalizes previous approaches to a wide set of countries. On average, across countries, the model explains 55% (min = 14%; max = 85%) of the variation in levels of wealth at the survey cluster level and 59% (min = 0%; max = 93%) of the variation at the district level, and the model explains 4% (min = 0%; max = 17%) and 6% (min = 0%; max = 26%) of the variation of changes in wealth at the cluster and district levels. Models perform best in lower-income countries and in countries with higher variance in wealth. Features from nighttime lights, OpenStreetMaps, and land cover data are most important in explaining levels of wealth, and features from nighttime lights are most important in explaining changes in wealth.
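As an illustration of what "variation explained" at the cluster and district levels can mean, the following is a minimal sketch on simulated data. It assumes the share of variation explained is measured as the squared Pearson correlation between true and predicted wealth, which is one common choice and may differ from the exact metric used in the paper. All variable names and the data are hypothetical and do not reproduce any result reported above.

```r
# Simulated data for illustration only; not the paper's data or results.
set.seed(1)
n <- 200
df <- data.frame(
  district    = rep(seq_len(20), each = 10),  # 20 hypothetical districts
  wealth_true = rnorm(n)                      # hypothetical survey wealth index
)
# Hypothetical model predictions: truth plus noise.
df$wealth_pred <- df$wealth_true + rnorm(n, sd = 0.8)

# Share of variation explained, taken here as the squared Pearson correlation
# (an assumption; other definitions of R^2 exist).
r2 <- function(truth, pred) cor(truth, pred)^2

# Cluster level: one row per survey cluster.
r2_cluster <- r2(df$wealth_true, df$wealth_pred)

# District level: average truth and prediction within each district first.
dist_df <- aggregate(cbind(wealth_true, wealth_pred) ~ district,
                     data = df, FUN = mean)
r2_district <- r2(dist_df$wealth_true, dist_df$wealth_pred)
```

Averaging within districts before computing the statistic smooths out cluster-level noise, which is one reason district-level figures can differ from cluster-level ones.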
Repository name | URI |
---|---|
Reproducible Research Repository (World Bank) | https://reproducibility.worldbank.org |
Github | https://github.com/dime-worldbank/big-data-poverty-estimation |
The code was reproduced on a computer with the following specifications:
• OS: Windows 11 Home
• Processor: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.3GHz
• Memory available: 15.8 GB
• Software version: R 4.2
Run time: about 2 hours, starting from analysis-ready data
This replication package is divided into two parts:
The first part focuses on preparing the raw data for analysis. Due to data sharing restrictions, users must manually download specific datasets from their respective sources. The package's README file provides detailed instructions, including URLs, necessary access protocols, and guidelines for organizing the data folder, which is crucial for the package's proper operation.
By default, the _main.R script skips the creation of analysis-ready datasets from raw data and instead loads pre-prepared datasets directly. However, users can modify a parameter in _main.R to run the scripts that transform the raw data into analysis-ready datasets. Although _main.R points to these scripts, they must be opened and executed manually by the user, as documented in the README file.
In the second part, the _main.R script continues with the analysis. Users need to adjust the file paths, and the script will then automatically run the necessary analyses, producing the figures and tables shown in the paper.
Some data is confidential and has not been included in the package. For more details, please refer to the README file.
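The toggle between loading pre-prepared data and rebuilding it from raw sources might look roughly like the sketch below. This is not the package's actual code: the names RUN_DATA_PREP, project_dir, and plan_stages are hypothetical and used only for illustration. The real parameter name and folder layout are documented in _main.R and the README.

```r
# Hypothetical sketch of a main-script toggle; all names are illustrative only.

# Root folder of the replication package (assumed location; adjust as needed).
project_dir <- file.path(tempdir(), "replication-package")
data_dir    <- file.path(project_dir, "data")

# Hypothetical switch: FALSE loads pre-prepared analysis-ready data,
# TRUE means the raw-data preparation scripts should be run first.
RUN_DATA_PREP <- FALSE

# Build the ordered list of stages implied by the switch.
plan_stages <- function(run_data_prep) {
  stages <- character(0)
  if (run_data_prep) {
    stages <- c(stages, "prepare raw data (run manually, see README)")
  }
  c(stages, "run analysis on analysis-ready data")
}

stages <- plan_stages(RUN_DATA_PREP)
for (s in stages) message("Stage: ", s)
```

Keeping the switch and the paths at the top of a single script means a user only has to edit one place before running the analysis.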
Author | Affiliation | Email |
---|---|---|
Robert Marty | World Bank | rmarty@worldbank.org |
Alice Duhaut | World Bank | aduhaut@worldbank.org |
2024-02
Location | Code |
---|---|
World | WLD |
The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development/The World Bank. The findings, interpretations, and conclusions expressed in this event do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.
Name | URI |
---|---|
Modified BSD3 | https://opensource.org/license/bsd-3-clause/ |
Name | Affiliation | Email |
---|---|---|
Robert Marty | World Bank | rmarty@worldbank.org |
Reproducibility WBG | World Bank | reproducibility@worldbank.org |
Name | Abbreviation | Affiliation | Role |
---|---|---|---|
Reproducibility WBG | DIME | World Bank - Development Impact Department | Verification and preparation of metadata |
2024-02-14
1