{"type":"script","doc_desc":{"producers":[{"name":"Reproducibility WBG","abbr":"DIME","affiliation":"World Bank - Development Impact Department ","role":"Verification and preparation of metadata"}],"prod_date":"2024-02-14","version":"1"},"project_desc":{"authoring_entity":[{"name":"Robert Marty","email":"rmarty@worldbank.org","affiliation":"World Bank"},{"name":"Alice Duhaut","email":"aduhaut@worldbank.org","affiliation":"World Bank"}],"output":[{"title":"Global Poverty Estimation Using Private and Public Sector Big Data Sources","type":"Published Paper","authors":"Robert Marty and Alice Duhaut","uri":"https:\/\/www.nature.com\/articles\/s41598-023-49564-6","doi":"https:\/\/doi.org\/10.1038\/s41598-023-49564-6"}],"software":[{"name":"R","version":"4.2.0"},{"name":"Python","version":"3.9"},{"name":"Stata","version":"17"}],"scripts":[{"title":"Reproducibility package (code and partial data) for Global Poverty Estimation Using Private and Public Sector Big Data Sources","file_name":"PP_WLD_2024_67-v01.zip","zip_package":"PP_WLD_2024_67-v01.zip","date":"2024-02","description":"The code in this folder generates the tables and figures in the paper \"Global Poverty Estimation Using Private and Public Sector Big Data Sources\" by Robert Marty and Alice Duhaut","dependencies":"All dependencies are in the renv of the package and explicitly mentioned in the scripts. ","instructions":"See README in the reproducibility package.","notes":"Computational reproducibility verified by Development Impact (DIME) Analytics team, World Bank."}],"title_statement":{"idno":"PP_WLD_2024_67-v01","title":"Reproducibility package for Global Poverty Estimation Using Private and Public Sector Big Data Sources"},"production_date":"2024-02","abstract":"Household surveys give a precise estimate of poverty; however, surveys are costly and are fielded infrequently. 
We demonstrate the importance of jointly using multiple public and private sector data sources to estimate levels and changes in wealth for a large set of countries. We train models using 63,854 survey cluster locations across 59 countries, relying on data from satellites, Facebook Marketing information, and OpenStreetMaps. The model generalizes previous approaches to a wide set of countries. On average, across countries, the model explains 55% (min = 14%; max = 85%) of the variation in levels of wealth at the survey cluster level and 59% (min = 0%; max = 93%) of the variation at the district level, and the model explains 4% (min = 0%; max = 17%) and 6% (min = 0%; max = 26%) of the variation of changes in wealth at the cluster and district levels. Models perform best in lower-income countries and in countries with higher variance in wealth. Features from nighttime lights, OpenStreetMaps, and land cover data are most important in explaining levels of wealth, and features from nighttime lights are most important in explaining changes in wealth.","language":[{"name":"English","code":"EN"}],"repository_uri":[{"name":"Reproducible Research Repository (World Bank)","uri":"https:\/\/reproducibility.worldbank.org"},{"uri":"https:\/\/github.com\/dime-worldbank\/big-data-poverty-estimation","name":"Github"}],"technology_environment":"The code was reproduced in a computer with the following specifications:\n\u2022 OS: Windows 11 Home\n\u2022 Processor: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.3GHz\n\u2022 Memory available: 15.8 GB\n\u2022 Software version: R 4.2","technology_requirements":"~2 hours run-time starting from analysis-ready data","reproduction_instructions":"This replication package is divided into two parts:\n\n## Part 1: Creating Analysis-Ready Datasets from Raw Data\nThe first phase focuses on preparing the raw data for analysis. Due to data sharing restrictions, users must manually download specific datasets from their respective sources. 
The package's README file provides detailed instructions, including URLs, necessary access protocols, and guidelines for organizing the data folder, which is crucial for the package\u2019s proper operation.\n\nThe `_main.R` script is set up to skip the process of creating analysis-ready datasets from raw data and instead loads pre-prepared datasets directly. However, users can modify a parameter in `_main.R` to run scripts that transform the raw data into analysis-ready datasets. This process involves:\n- Manually downloading raw data from the specified sources.\n- Running a series of scripts written in Stata and Python. While `_main.R` points to these scripts, they must be opened and executed manually by the user, as documented in the README file.\n\n## Part 2: Replicating the Analysis Using Analysis-Ready Datasets\n\n- After the raw data has been processed into analysis-ready datasets, the `_main.R` script continues with the analysis. Users need to adjust the file paths, and the script will automatically run the necessary analyses, producing the figures and tables shown in the paper.\n- For this replication, the replicators ran the analysis using the analysis-ready datasets due to the significant time and computational resources required to process the raw data. The verification process began with an intermediate dataset provided by the authors. Although the reproducibility package includes the necessary code to construct this intermediate dataset, we did not verify this process due to the extensive time required\u2014approximately five months for Facebook data and two weeks for other sources.\n","disclaimer":"The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development\/The World Bank. 
The findings, interpretations, and conclusions expressed in these materials do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.","license":[{"name":"Modified BSD3","uri":"https:\/\/opensource.org\/license\/bsd-3-clause\/"}],"contacts":[{"name":"Robert Marty","affiliation":"World Bank","email":"rmarty@worldbank.org"},{"name":"Reproducibility WBG","affiliation":"World Bank","email":"reproducibility@worldbank.org"}],"datasets":[{"name":"Demographic and Health Surveys (DHS)","access_type":"The dataset is public but cannot be republished, and thus is not included in the reproducibility package.","note":"The user should download the datasets from the link below and put the data in Data\/DHS\/RawData; this directory contains folders that indicate which datasets need to be downloaded. For example, 2020 data for Kenya for the \"HR\" (Household Recode) dataset should be placed here: \/KE\/KE_2020_MIS_03292022_2054_82518\/KEHR81DT\nAfter this, the user must run the scripts in 01_clean_dhs.\nThe analysis-ready, cleaned datasets within the \/Data folder will be located at DHS\/FinalData\/Merged Datasets\/survey_alldata_clean.[Rds\/csv]","uri":"https:\/\/dhsprogram.com\/data\/"},{"name":"Demographic and Health Surveys (DHS) - Nigeria","note":"The paper includes a specific analysis of Nigeria. Following a similar process as above, data should be downloaded and placed in Data\/DHS_nga_policy_experiment\/RawData. The cleaning code will then produce the analysis-ready cleaned datasets. For this, the user must run the scripts in 01_clean_dhs 01_clean_dhs_nga_experiment. 
\nThe analysis-ready, cleaned datasets within the \/Data folder will be located at DHS_nga_policy_experiment\/FinalData\/Merged Datasets\/survey_alldata_clean.[Rds\/csv].","access_type":"The dataset is public but cannot be republished, and thus is not included in the reproducibility package.","uri":"https:\/\/dhsprogram.com\/data\/"},{"note":"To start the analysis from the original data, navigate to Data\/LSMS\/RawData\/individual_files, where you'll find a dedicated folder for each country. Inside each country's folder, there's a README file that documents the specific datasets you will need to download into that folder. Go to the links below and download the indicated files. After downloading and placing the files in the folder, you must execute the scripts located in 01_clean_lsms to generate the analysis-ready datasets.\nThe cleaned datasets will be placed in the package at Data\/LSMS\/FinalData\/Merged Datasets\/survey_alldata_clean.[Rds\/csv].\n","name":"Living Standards Measurement Study (LSMS)","access_type":"The dataset is public, but not published with the package. Clear instructions to download the data and run the analysis are available in the README file.","uri":"BEN: https:\/\/microdata.worldbank.org\/index.php\/catalog\/4291\/; BFA: https:\/\/microdata.worldbank.org\/index.php\/catalog\/4290; CIV: https:\/\/microdata.worldbank.org\/index.php\/catalog\/4292; ETH: https:\/\/microdata.worldbank.org\/index.php\/catalog\/3823; MWI: https:\/\/microdata.worldbank.org\/index.php\/catalog\/3818; TGO: https:\/\/microdata.worldbank.org\/index.php\/catalog\/4298"},{"name":"Harmonized Nighttime Lights","note":"To obtain the data from the source, the user should download the files from the link provided and place them in Data\/DMSPOLS_VIIRS_Harmonized\/RawData. To create the analysis-ready datasets, the user must run the scripts in 02_get_process_ancillary_data\/DMSPOLS_VIIRS_Harmonized.\nSource: Li, Xuecao; Zhou, Yuyu; Zhao, Min; Zhao, Xia (2020). 
Harmonization of DMSP and VIIRS nighttime light data from 1992-2020 at the global scale. figshare. Dataset. https:\/\/doi.org\/10.6084\/m9.figshare.9828827.v5","uri":"https:\/\/figshare.com\/articles\/dataset\/Harmonization_of_DMSP_and_VIIRS_nighttime_light_data_from_1992-2018_at_the_global_scale\/9828827\/5","access_type":"Published with the reproducibility package."},{"uri":"https:\/\/cds.climate.copernicus.eu\/cdsapp#!\/dataset\/satellite-land-cover?tab=form","name":"ESA Land Cover Classification Gridded Maps","note":"To obtain the data, the user must follow the link provided, create an account, download the data, and place the files in Data\/Globcover\/RawData.\nFor 1992 to 2015 data, put the ESACCI-LC-L4-LCCS-Map-300m-P1Y-1992_2015-v2.0.7.tif file in the \/1992_2015_data folder.\nFor 2016 to 2018 data, (1) put the .nc files in the 2016_2018 folder, then (2) use the globcover_netcdf_to_geotiff script to convert the .nc files to .tif files.\nThen the user should run the scripts in 02_get_process_ancillary_data\/Globcover.\nSource: Copernicus Climate Change Service, Climate Data Store, (2019): Land cover classification gridded maps from 1992 to present derived from satellite observation. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381\/cds.006f2c9a\n\nIn the analysis-ready data, GlobCover variables are distinguished by a numeric ID (e.g., _2, _3, etc). The following dataset shows what each parameter ID corresponds to: \/Data\/Globcover\/RawData\/gc_classes.csv","access_type":"The data is publicly available but has not been included in the reproducibility package due to the large size of the files. Detailed instructions for accessing the data and understanding the folder structure are provided."},{"name":"Open Street Maps","note":"Users must download data from Geofabrik at the link below. 
\nTo find data for a specific country, (1) click the continent the country is in, (2) click the name of the country, (3) click \"raw directory index\", and (4) find the relevant date; the file to download is the one that ends in shp.zip. Download the file and unzip it. Place the file in the relevant folder within Data\/OSM\/RawData; this folder contains subfolders for each country and year where OpenStreetMap data needs to be downloaded and stored. For example, the data downloaded and unzipped from kenya-210101-free.shp.zip should be placed in Data\/OSM\/RawData\/kenya-210101-free.shp","uri":"https:\/\/download.geofabrik.de\/","access_type":"The data is publicly available but has not been included in the reproducibility package due to the large size of the shape files. Detailed instructions for accessing the data and understanding the folder structure are provided."},{"name":"GADM maps and data","access_type":"The code to download the information is directly provided with the package (script 00_download_gadm).","note":"00_download_gadm downloads GADM data that is used in cleaning survey data."},{"name":"Sentinel 5P Pollution Data","note":"To obtain this data, run the script 01_download_s5p.js in the Google Earth Engine code editor and put the data in Data\/Sentinel 5P Pollution\/RawData","access_type":"Public, but republication is not allowed, so the data is not included in the package; the code to download it is included in the package."},{"name":"Facebook Marketing","note":"This data is extracted directly by the code in 02_get_process_ancillary_data. This part of the code took five months to run, and therefore the ready-to-analyze datasets are directly included in the reproducibility package at Data\/Facebook Marketing\/.\nIn the analysis-ready data, Facebook variables are distinguished by a numeric ID (e.g., _2, _3, etc). 
The following dataset shows what each parameter ID corresponds to: \/Data\/Facebook Marketing\/FinalData\/facebook_marketing_parameters_clean.[Rds\/csv]","access_type":"Public. The data and the code to download it are directly included in the package.","uri":"https:\/\/developers.facebook.com\/docs\/marketing-api"},{"name":"NASA's Black Marble","access_type":"Public. The package includes code that enables direct download of the data.","note":"This data is downloaded directly by the scripts 01_download_black_marble_annual.R and 01_download_black_marble_monthly.R. After execution, the files will be saved at NTL Black Marble\/FinalData. Please see the links below for more information about NASA's Black Marble data.","uri":"https:\/\/ladsweb.modaps.eosdis.nasa.gov\/archive\/allData\/5000\/VNP46A3\/, https:\/\/worldbank.github.io\/blackmarbler\/"},{"name":"World Development Indicators","access_type":"Published with the package, along with the code to retrieve it.","note":"Located at Data\/WDI\/FinalData\/wdi; the code to retrieve it is 02_get_process_ancillary_data\/WDI\/download_wdi.R","uri":"https:\/\/github.com\/vincentarelbundock\/WDI"}],"data_statement":"Some data is confidential and has not been included in the package. For more details, please refer to the README file.","geographic_units":[{"name":"World","code":"WLD","type":"Region"}]},"tags":[{"tag":"DOI"},{"tag":"Open code"},{"tag":"Restricted data"}],"schematype":"script"}