{"type":"script","doc_desc":{"producers":[{"name":"Reproducibility WBG","abbr":"DECDI","affiliation":"World Bank - Development Impact Department","role":"Verification and preparation of metadata"}],"prod_date":"2026-04-08","version":"1"},"project_desc":{"authoring_entity":[{"name":"Kamwoo Lee","affiliation":"World Bank","email":"klee16@worldbank.org"},{"name":"Brian Blankespoor","affiliation":"World Bank","email":"bblankespoor@worldbank.org"},{"name":"David Newhouse","affiliation":"World Bank","email":"dnewhouse@worldbank.org"}],"title_statement":{"title":"Reproducibility package for Fine-Scale Spatial Disaggregation Of Statistical Data Via Graph Neural Networks","idno":"RR_WLD_2026_606"},"data_statement":"All data are publicly available, but not all are directly included in the reproducibility package. All intermediate data needed to run the package are included in the reproducibility package.","software":[{"name":"Python","version":"3.11.15"}],"scripts":[{"title":"Reproducibility package for Fine-Scale Spatial Disaggregation Of Statistical Data Via Graph Neural Networks","date":"2026-04","notes":"Computational reproducibility verified by Development Impact (DECDI) Analytics team, World Bank.","instructions":"See README in reproducibility package.","file_name":"RR_WLD_2026_606","zip_package":"RR_WLD_2026_606.zip","dependencies":"Dependencies are stored in the requirements.txt file."}],"repository_uri":[{"name":"Reproducible Research Repository (World Bank)","uri":"https:\/\/reproducibility.worldbank.org"},{"name":"GitHub","uri":"https:\/\/github.com\/worldbank\/gnn-gdp-disaggregation\/releases\/tag\/v1.01","type":"Available to World Bank Staff"}],"production_date":"2026-04-08","abstract":"Fine-grained spatial data are critical for informed decision-making in domains ranging from economic planning to environmental management. 
However, many statistics are only available for coarse administrative units, necessitating techniques for fine-scale spatial disaggregation. In this paper, we introduce a graph neural network (GNN) based framework for disaggregating aggregated indicators to a finer spatial resolution. The GNN approach leverages graph representations of spatial units to incorporate both feature information and spatial relationships, addressing challenges of heterogeneity and data sparsity. The approach also adopts the H3 hierarchical hexagonal indexing system to define fine-resolution cells, providing a globally consistent, multi-resolution spatial grid well suited to graph-based modeling. We demonstrate the framework using gross domestic product (GDP) as a representative example, disaggregating national or regional GDP to fine-resolution cells. While illustrated with GDP, the proposed methodology is applicable to a broad class of aggregate indicators, offering a flexible and scalable tool for spatial analysis of economic, social, and environmental statistics. Our results show that the framework produces high-resolution estimates that are consistent with known aggregates and aligned with ancillary covariate patterns. 
This general-purpose approach to spatial disaggregation enables more detailed mapping of indicators like GDP and beyond, unlocking finer insights from coarse data.","geographic_units":[{"name":"World","code":"WLD"}],"keywords":[{"name":"Spatial Disaggregation"},{"name":"Graph Neural Networks"},{"name":"Fine-Scale Statistical Mapping"},{"name":"Regional GDP"},{"name":"H3 Spatial Indexing"}],"topics":[{"id":"C45","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Neural Networks and Related Topics","parent_id":"C4"},{"id":"C55","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Large Data Sets: Modeling and Analysis","parent_id":"C5"},{"id":"C81","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Methodology for Collecting, Estimating, and Organizing Microeconomic Data \u2022 Data Access","parent_id":"C8"},{"id":"R12","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Size and Spatial Distributions of Regional Economic Activity","parent_id":"R1"},{"id":"R15","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Econometric and Input\u2013Output Models \u2022 Other Models","parent_id":"R1"}],"output":[{"type":"Working Paper","description":"Policy Research Working Papers (PRWP)","title":"Fine-Scale Spatial Disaggregation Of Statistical Data Via Graph Neural Networks"}],"language":[{"name":"English","code":"EN"}],"disclaimer":"The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development\/The World Bank. 
The findings, interpretations, and conclusions expressed in these materials do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.","license":[{"name":"Modified BSD3","uri":"https:\/\/opensource.org\/license\/bsd-3-clause\/"}],"contacts":[{"name":"Kamwoo Lee","affiliation":"World Bank","email":"klee16@worldbank.org"},{"name":"Reproducibility WBG","affiliation":"World Bank","email":"reproducibility@worldbank.org"}],"datasets":[{"name":"Global Dataset of Reported Subnational Economic Output (DOSE) (V2.11)","note":"Harmonized subnational GDP data for 1,661 sub-national regions across 83 countries (1953\u20132020), with sectoral detail. Regional identifiers use GADM 3.6 GID codes. Downloaded files: DOSE_V2.11.csv; DoseV2p11_changes.pdf (01_raw_data\/admin_gdp\/DOSE\/). Corresponding administrative boundary shapefiles downloaded separately (DOSE_shapefiles.gpkg; 01_raw_data\/admin_boundaries\/DOSE\/). Used in the model training pipeline (steps 02\u201308). \nIntermediate files derived from this source and included in the reproducibility package: 04_admin_mapping\/output\/h3_res6_to_dose_adm1.csv (H3-to-DOSE admin-1 mapping, input to 11_visualization); 08_model_inference\/output\/ and 10_data_product\/output\/ (gdp_intensity_r6_estimates_{year}.csv, years 2015\u20132024, inputs to 09_model_validation and 11_visualization).","uri":"https:\/\/zenodo.org\/records\/16313760","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/","citation":"Druckenmiller, H., & Burke, M. (2025). Global dataset of reported subnational economic output (DOSE), V2.11 [dataset]. Zenodo. https:\/\/doi.org\/10.5281\/zenodo.16313760","access_type":"Data are publicly available but not directly included in the reproducibility package. 
Intermediate files derived from this source are included."},{"name":"World Development Indicators","note":"Country-level GDP data are used in two stages of the pipeline. (1) Raw GDP series from 01_raw_data\/admin_gdp\/WB_WDI\/: GDP (current LCU).csv, GDP (current USD).csv, GDP (constant 2015 USD).csv \u2014 used in the model training pipeline (steps 02\u201308). (2) PPP-adjusted GDP series from 10_data_product\/additional_input\/WB_WDI\/downloaded_data\/: GDP, PPP (constant 2021 international $).csv; GDP, PPP (current international $).csv \u2014 directly included in the reproducibility package (see data_hash_report.csv) and used in step 10_data_product to normalize cell-level estimates. Intermediate files using this source: 10_data_product\/output\/gdp_intensity_r6_estimates_{year}.csv (2015\u20132024), inputs to 11_visualization.","uri":"https:\/\/databank.worldbank.org\/source\/world-development-indicators","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.worldbank.org\/en\/about\/legal\/terms-of-use-for-datasets","citation":"World Bank. (2025). World Development Indicators [dataset]. Available at https:\/\/databank.worldbank.org\/source\/world-development-indicators","access_type":"Data are publicly available and included in the reproducibility package. "},{"name":"Regional Economic Database","note":"Sub-national (TL2 and TL3) GDP data in PPP USD for OECD member and partner countries. Downloaded files: OECD.CFE.EDS,DSD_REG_ECO@DF_GDP,2.0+all.csv; OECD Territorial correspondence - TL2024.xlsx (01_raw_data\/admin_gdp\/OECD\/). Used in the model training pipeline (steps 02\u201308). Non-NUTS OECD country mapping files included in package: 04_admin_mapping\/output\/h3_res6_to_oecd_non_nuts_{country}_adm1.csv for Australia, Canada, Chile, Colombia, Egypt, Indonesia, Japan, Korea, Mexico, New Zealand, and Peru (inputs to 11_visualization). 
Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/data-explorer.oecd.org\/vis?df[id]=DSD_REG_ECO%40DF_GDP&df[ag]=OECD.CFE.EDS","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.oecd.org\/en\/about\/terms-conditions.html","citation":"Organisation for Economic Co-operation and Development (OECD). (2025). Regional Economic Database [dataset]. Available at https:\/\/data-explorer.oecd.org","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"World Economic Outlook Database","note":"Country-level macroeconomic data including nominal and real GDP used to fill gaps in national accounts coverage. Downloaded file: weoapr2025all.xls (01_raw_data\/admin_gdp\/IMF_WEO\/). Used in the model training pipeline (steps 02\u201308). Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.imf.org\/en\/publications\/weo\/weo-database\/2025\/april","license":"IMF Data Terms of Use","license_uri":"https:\/\/www.imf.org\/external\/terms.htm","citation":"International Monetary Fund. (2025). World Economic Outlook Database [dataset]. Available at https:\/\/www.imf.org\/en\/publications\/weo\/weo-database\/2025\/april","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"National Accounts Estimates of Main Aggregates","note":"Country-level GDP in current and constant prices (LCU and USD) from the United Nations Statistics Division (UNSD). Downloaded files: UNdata_Export_GDP_current_lcu.csv; UNdata_Export_GDP_current_usd.csv; UNdata_Export_GDP_constant_2015_usd.csv (01_raw_data\/admin_gdp\/UNData\/). 
Used in the model training pipeline (steps 02\u201308). Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"http:\/\/data.un.org\/Explorer.aspx","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/data.un.org\/Host.aspx?Content=UNdataUse","citation":"United Nations Statistics Division. (2025). National Accounts Estimates of Main Aggregates [dataset]. Available at http:\/\/data.un.org\/Explorer.aspx","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Regional Accounts in Albania ","note":"Gross Domestic Product by statistical regions for Albania (2019\u20132023). Downloaded file: llogarit\u00eb_rajonale_n\u00eb_shqip\u00ebri_2019-2023-angl.xlsx (01_raw_data\/admin_gdp\/NSO_ALB\/). Used in the model training pipeline (steps 02\u201308). Boundary mapping file included in the package: 04_admin_mapping\/output\/h3_res6_to_nso_alb_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.instat.gov.al\/en\/themes\/economy-and-finance\/regional-accounts-in-albania\/","license":"Direct download, no licensing information found ","citation":"Institute of Statistics of Albania (INSTAT). (2025). Gross Domestic Product by Statistical Regions in Albania, 2019\u20132023 [dataset]. Available at https:\/\/www.instat.gov.al\/en\/themes\/economy-and-finance\/regional-accounts-in-albania\/","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Regional Accounts of Brazil","note":"State-level gross domestic product for Brazil. 
Downloaded file: Especiais_2010_2023_xls.zip (01_raw_data\/admin_gdp\/NSO_BRA\/). An administrative lookup file derived from this source is included in the package: 02_preprocessing\/admin_gdp\/06_NSO_BRA\/admin_lookup.xlsx. Boundary mapping file: 04_admin_mapping\/output\/h3_res6_to_nso_bra_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.ibge.gov.br\/en\/statistics\/economic\/national-accounts\/16855-regional-accounts-of-brazil.html","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.ibge.gov.br\/en\/institutional\/institutional.html","citation":"Instituto Brasileiro de Geografia e Estat\u00edstica (IBGE). (2025). Regional Accounts of Brazil [dataset]. Available at https:\/\/www.ibge.gov.br\/en\/statistics\/economic\/national-accounts\/16855-regional-accounts-of-brazil.html","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"National Accounts and Regional GDP","note":"National and provincial-level gross domestic product for China. Downloaded files: Annual.csv (national accounts); AnnualbyProvince.csv (provincial GDP) (01_raw_data\/admin_gdp\/NSO_CHN\/). An administrative lookup file derived from this source is included in the package: 02_preprocessing\/admin_gdp\/06_NSO_CHN\/admin_lookup.xlsx. Boundary mapping file: 04_admin_mapping\/output\/h3_res6_to_nso_chn_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/data.stats.gov.cn\/english\/index.htm","license":"National Bureau of Statistics of China Terms of Service","license_uri":"https:\/\/www.stats.gov.cn\/english\/nbs\/200701\/t20070104_59236.html","citation":"National Bureau of Statistics of China. (2025). 
National Accounts and GDP by Province [dataset]. Available at https:\/\/data.stats.gov.cn\/english\/index.htm","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Handbook of Statistics on Indian Economy and Indian States","note":"Macro-economic aggregates at current prices (national) and Gross State Domestic Product (GSDP) at current prices for Indian states. Downloaded files from Handbook of Statistics on Indian Economy (years 2022\u20132025) and Handbook of Statistics of Indian States (year 2025) (01_raw_data\/admin_gdp\/NSO_IND\/). An administrative lookup file derived from this source is included in the package: 02_preprocessing\/admin_gdp\/06_NSO_IND\/admin_lookup.xlsx. Boundary mapping file: 04_admin_mapping\/output\/h3_res6_to_nso_ind_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.rbi.org.in\/Scripts\/AnnualPublications.aspx?head=Handbook%20of%20Statistics%20on%20Indian%20Economy","license":"Reserve Bank of India Terms of Use","license_uri":"https:\/\/www.rbi.org.in\/scripts\/Disclaimer.aspx","citation":"Reserve Bank of India. (2025). Handbook of Statistics on Indian Economy and Handbook of Statistics of Indian States [dataset]. Available at https:\/\/www.rbi.org.in\/Scripts\/publications.aspx","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Bureau of National Statistics of Kazakhstan \u2014 Gross Regional Product","note":"Gross Regional Product (GRP) at current prices for Kazakhstan's regions. Downloaded file: 1. GRP.xlsx (01_raw_data\/admin_gdp\/NSO_KAZ\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_kaz_adm1.csv. 
Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/stat.gov.kz\/en\/industries\/economy\/national-accounts\/dynamic-tables\/","license":"Open Data of the Republic of Kazakhstan","license_uri":"https:\/\/stat.gov.kz\/en\/description\/","citation":"Bureau of National Statistics, Agency for Strategic Planning and Reforms of the Republic of Kazakhstan. (2025). Gross Regional Product [dataset]. Available at https:\/\/stat.gov.kz\/en\/industries\/economy\/national-accounts\/dynamic-tables\/","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"National Statistical Committee of the Kyrgyz Republic \u2014 Gross Regional Product","note":"Gross Regional Product (GRP) at current prices for Kyrgyz regions. Downloaded file: 1.01.00.09 \u0412\u0430\u043b\u043e\u0432\u043e\u0439 \u0440\u0435\u0433\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u044b\u0439 \u043f\u0440\u043e\u0434\u0443\u043a\u0442 (\u0412\u0420\u041f) \u0432 \u0442\u0435\u043a\u0443\u0449\u0438\u0445 \u0446\u0435\u043d\u0430\u0445..xlsx (01_raw_data\/admin_gdp\/NSO_KGZ\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_kgz_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.stat.gov.kg\/en\/statistics\/nacionalnye-scheta\/","license":"Creative Commons Attribution-NonCommercial-ShareAlike license ","license_uri":"https:\/\/www.stat.gov.kg\/en\/","citation":"National Statistical Committee of the Kyrgyz Republic. (2025). Gross Regional Product (GRP) at current prices [dataset]. 
Available at https:\/\/www.stat.gov.kg\/en\/statistics\/download\/dynamic\/743\/","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Malta National Statistics Office \u2014 Regional Gross Domestic Product","note":"GDP at market prices by region (NUTS 3) for Malta. Downloaded file: NR-237-2025-Table-2-b053b1aa83d93e3b.xlsx (01_raw_data\/admin_gdp\/NSO_MLT\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_mlt_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/nso.gov.mt\/regional_and_geospatial_statistics\/","license":"Direct download","citation":"National Statistics Office Malta. (2025). Regional Gross Domestic Product: 2024 [dataset]. Available at https:\/\/nso.gov.mt\/regional-and-geospatial\/regional-gross-domestic-product-2024\/","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Philippine Statistics Authority \u2014 Gross Domestic Regional Product ","note":"GRDP by region for the Philippines (2000\u20132024). Downloaded files: GRDP_Reg_2018PSNA_2022_to_2024_with_NIR.xlsx; GRDP_Reg_2018PSNA_2000_to_2023_without_NIR.xlsx (01_raw_data\/admin_gdp\/NSO_PHL\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_phl_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/psa.gov.ph\/statistics\/grdp\/data-series","license":"Philippine Government Open Data License","license_uri":"https:\/\/psa.gov.ph\/","citation":"Philippine Statistics Authority (PSA). (2025). Gross Domestic Regional Product Data Series [dataset]. 
Available at https:\/\/psa.gov.ph\/statistics\/grdp\/data-series","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Rosstat Federal State Statistics Service \u2014 Russian Statistical Yearbook","note":"Regional gross domestic product data for Russia, extracted from Russian Statistical Yearbooks 2022, 2023, and 2024. Downloaded files: Russian Statistical Yearbook 2024.rar; Russian Statistical Yearbook 2023.rar; Russian Statistical Yearbook 2022.rar (01_raw_data\/admin_gdp\/NSO_RUS\/). An administrative lookup file derived from this source is included in the package: 02_preprocessing\/admin_gdp\/06_NSO_RUS\/admin_lookup.xlsx. Boundary mapping file: 04_admin_mapping\/output\/h3_res6_to_nso_rus_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/eng.rosstat.gov.ru\/Publications\/document\/74811","license":"Rosstat Open Data Terms","license_uri":"https:\/\/eng.rosstat.gov.ru\/","citation":"Federal State Statistics Service of Russia (Rosstat). (2024). Russian Statistical Yearbook 2024 [dataset]. Available at https:\/\/eng.rosstat.gov.ru\/Publications\/document\/74811","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Tanzania National Bureau of Statistics \u2014 National Accounts of Mainland Tanzania","note":"National accounts data for Mainland Tanzania including regional GDP estimates. Downloaded file: en-1737111899-National Accounts of Mainland Tanzania 2024.xlsx (01_raw_data\/admin_gdp\/NSO_TZA\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_tza_adm1.csv. 
Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.nbs.go.tz\/index.php\/statistics\/topic\/annual-national-accounts-publications","license":"Direct download","citation":"National Bureau of Statistics, United Republic of Tanzania. (2025). National Accounts of Mainland Tanzania 2024 [dataset]. Available at https:\/\/www.nbs.go.tz\/index.php\/statistics\/topic\/annual-national-accounts-publications","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Gross Domestic Product by County and Metropolitan Statistical Area (CAGDP2)","note":"GDP in current dollars by county and MSA for the United States. Downloaded file: CAGDP2.zip (01_raw_data\/admin_gdp\/NSO_USA\/). Boundary mapping files included in package: 04_admin_mapping\/output\/h3_res6_to_nso_usa_adm1.csv; h3_res6_to_nso_usa_adm2.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024). US county and state shapefiles (downloaded separately from US Census Bureau TIGER\/Line; see separate entry) are used as boundary reference for this dataset.","uri":"https:\/\/www.bea.gov\/data\/gdp\/gdp-county-metro-and-other-areas","license":"Open Data","license_uri":"https:\/\/www.bea.gov\/open-data","citation":"Bureau of Economic Analysis, U.S. Department of Commerce. (2025). Gross Domestic Product by County and Metropolitan Statistical Area (CAGDP2) [dataset]. Available at https:\/\/www.bea.gov\/data\/gdp\/gdp-county-metro-and-other-areas","access_type":"Data are publicly available but not directly included in the reproducibility package. 
Intermediate files derived from this source are included."},{"name":"Statistics South Africa \u2014 Provincial Gross Domestic Product (P0441.2)","note":"GDP by province at current prices for South Africa (2014\u20132024). Downloaded file: P044122024.pdf (01_raw_data\/admin_gdp\/NSO_ZAF\/). Boundary mapping file included in package: 04_admin_mapping\/output\/h3_res6_to_nso_zaf_adm1.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.statssa.gov.za\/?page_id=1854&PPN=P0441.2","license":"Direct download","citation":"Statistics South Africa. (2025). P0441.2 \u2014 Provincial Gross Domestic Product, 2024 [dataset]. Available at https:\/\/www.statssa.gov.za\/?page_id=1854&PPN=P0441.2","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Comprehensive Global Administrative Zones (CGAZ)","note":"Global composite administrative boundary files (ADM1 and ADM2 geoPackages) and individual country files (Morocco ADM1). Downloaded files: geoBoundariesCGAZ_ADM1.gpkg; geoBoundariesCGAZ_ADM2.gpkg; geoBoundaries-MAR-ADM1-all zip (01_raw_data\/admin_boundaries\/CGAZ\/). Used as administrative boundary shapefiles to generate H3-to-admin mapping files in the pipeline (steps 03\u201304). H3-to-admin mapping files derived from CGAZ boundaries and included in package: 04_admin_mapping\/output\/h3_res6_to_adm0.csv; and various h3_res6_to_nso_* and h3_res6_to_oecd_non_nuts_* files. These are intermediate inputs to 11_visualization.","uri":"https:\/\/www.geoboundaries.org\/globalDownloads.html","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.geoboundaries.org\/index.html#usage","citation":"Runfola, D., Anderson, A., Baier, H., Crittenden, M., Dowker, E., Fuhrig, S., ... & Hobbs, B. (2020). 
geoBoundaries: A global database of political administrative boundaries. PLOS ONE, 15(4), e0231866. https:\/\/doi.org\/10.1371\/journal.pone.0231866","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Database of Global Administrative Areas (GADM)","note":"Global database of administrative area boundaries at multiple levels. Versions 4.1 and 3.6 downloaded (gadm_410-levels.zip; gadm36_levels_gpkg.zip; 01_raw_data\/admin_boundaries\/GADM\/). Note: DOSE V2.11 subnational regions use GADM 3.6 GID codes as region identifiers. Used in steps 03\u201304 for spatial crosswalks and H3-to-admin boundary mapping. H3-to-admin mapping files derived (in part) using GADM boundaries are included in the package: 04_admin_mapping\/output\/ h3_res6_to_* CSV files. These are intermediate inputs to 11_visualization.","uri":"https:\/\/gadm.org\/index.html","license":"GADM License ","license_uri":"https:\/\/gadm.org\/license.html","citation":"University of California, Davis. (2022). GADM database of Global Administrative Areas, version 4.1 [dataset]. Available at https:\/\/gadm.org\/download_world.html","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"Territorial Units for Statistics (2021 and 2024)","note":"GeoPackage boundary files for NUTS 2021 and NUTS 2024 statistical regions of Europe at 1:1M scale (EPSG:4326). Downloaded files: NUTS_RG_01M_2021_4326.gpkg; NUTS_RG_01M_2024_4326.gpkg (01_raw_data\/admin_boundaries\/NUTS\/). H3-to-NUTS mapping files derived from these boundaries and included in the package: 04_admin_mapping\/output\/h3_res6_to_oecd_nuts_2021_adm1.csv; h3_res6_to_oecd_nuts_2021_adm2.csv; h3_res6_to_oecd_nuts_2024_adm1.csv; h3_res6_to_oecd_nuts_2024_adm2.csv. 
These are intermediate inputs to 11_visualization.","uri":"https:\/\/ec.europa.eu\/eurostat\/web\/gisco\/geodata\/statistical-units\/territorial-units-statistics","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/ec.europa.eu\/eurostat\/web\/gisco\/geodata","citation":"Eurostat GISCO. (2024). Territorial units for statistics (NUTS 2021 and NUTS 2024) [dataset]. Available at https:\/\/ec.europa.eu\/eurostat\/web\/gisco\/geodata\/statistical-units\/territorial-units-statistics","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"OpenStreetMap (OSM) Boundaries","note":"OpenStreetMap-derived administrative boundary polygons for Tanzania \u2013 Zanzibar and Taiwan. Downloaded file: OSMB-6b1f9a1a800f048091aade6e439c2a0594d9e726.geojson (01_raw_data\/admin_boundaries\/OSM\/). Used in steps 03\u201304 for H3-to-admin mapping for these jurisdictions. Downstream intermediate files: 04_admin_mapping\/output\/h3_res6_to_* CSV files and 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/osm-boundaries.com\/","license":" Creative Commons Attribution-ShareAlike 2.0 license (CC BY-SA 2.0)","license_uri":"https:\/\/www.openstreetmap.org\/copyright","citation":"OpenStreetMap Contributors. (2025). OpenStreetMap (OSM) Boundaries - administrative boundary polygons [dataset]. Available at https:\/\/osm-boundaries.com\/","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"OpenStreetMap \u2014 Planet OSM","note":"Global OpenStreetMap data (full history extract, November 2025 snapshot) used to derive road network and settlement features as geospatial covariates. 
Downloaded via AWS S3 (s3:\/\/osm-planet-eu-central-1; history-251117.osm.pbf; 01_raw_data\/geo_covariates\/OSM\/). Note: a separate OSM-Boundaries download was also used for administrative boundary polygons (see OSM-Boundaries entry). Used in the model training pipeline (steps 02\u201308) for geospatial feature extraction. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/wiki.openstreetmap.org\/wiki\/Planet.osm","license":" Creative Commons Attribution-ShareAlike 2.0 license (CC BY-SA 2.0)","license_uri":"https:\/\/www.openstreetmap.org\/copyright","citation":"OpenStreetMap Contributors. (2025). OpenStreetMap \u2014 Planet OSM [dataset]. Available at https:\/\/wiki.openstreetmap.org\/wiki\/Planet.osm","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"TIGER\/Line Shapefiles (Counties and States)","note":"County and state boundary shapefiles for the United States for years 2021\u20132024. Downloaded files: tl_{year}_us_county.zip and tl_{year}_us_state.zip for years 2021, 2022, 2023, 2024 (01_raw_data\/admin_boundaries\/TIGER\/). Used in steps 03\u201304 for H3-to-admin mapping for the United States. Boundary mapping files derived from TIGER and included in package: 04_admin_mapping\/output\/h3_res6_to_nso_usa_adm1.csv; h3_res6_to_nso_usa_adm2.csv. Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/www.census.gov\/cgi-bin\/geo\/shapefiles\/index.php","license":"Direct download","citation":"US Census Bureau. (2024). TIGER\/Line Shapefiles: Counties and States [dataset]. 
Available at https:\/\/www.census.gov\/cgi-bin\/geo\/shapefiles\/index.php","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"World Bank Official Boundaries","note":"World Bank Official Boundaries GeoPackage files (Admin 0, Admin 0 all layers, Admin 1, Admin 2, NDLSA, Ocean Mask) used as the canonical country boundary reference for the model pipeline. Downloaded from World Bank Data Catalog (01_raw_data\/admin_boundaries\/WBOB\/). (1) Used in steps 03\u201304 for spatial operations and country-level administrative mapping: intermediate outputs include 04_admin_mapping\/output\/h3_res6_to_adm0.csv and other h3_res6_to_* mapping files. (2) Admin 2 and NDLSA boundary layers are also read directly in step 10 (data product creation) by 10_data_product\/01_create_final_dataset.ipynb to assign WB administrative codes and territory status to each H3 cell: files 01_raw_data\/admin_boundaries\/WBOB\/downloaded_data\/World Bank Official Boundaries - Admin 2.gpkg; World Bank Official Boundaries - NDLSA.gpkg. Final data product files used as inputs to steps 09 (model validation) and 11 (visualization): 10_data_product\/output\/gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/datacatalog.worldbank.org\/search\/dataset\/0038272\/World-Bank-Official-Boundaries","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.worldbank.org\/en\/about\/legal\/terms-of-use-for-datasets","citation":"World Bank. (2025). World Bank Official Boundaries [dataset]. World Bank Data Catalog. Available at https:\/\/datacatalog.worldbank.org\/search\/dataset\/0038272\/World-Bank-Official-Boundaries","access_type":"Data are publicly available but not directly included in the reproducibility package. 
Intermediate files derived from this source are included."},{"name":"MODIS\/Terra+Aqua Land Cover Type Yearly L3 Global 500m (MCD12Q1 v061)","note":"Annual land cover classification rasters at 500 m resolution (EPSG:4326) for years 2015\u20132024. Granule IDs: MCD12Q1.A{year}001.* (01_raw_data\/geo_covariates\/MCD12\/). Used as geospatial covariate features in the model training pipeline (steps 03\u201308). Downstream intermediate files embodying this information: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/doi.org\/10.5067\/MODIS\/MCD12Q1.061","license":"Direct download","citation":"Friedl, M., & Sulla-Menashe, D. (2022). MODIS\/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061 [dataset]. https:\/\/doi.org\/10.5067\/MODIS\/MCD12Q1.061","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"VIIRS Day\/Night Band Annual Composites (VNL v21 and v22)","note":"Annual nighttime light radiance composites at 15 arc-second (~500 m) resolution (EPSG:4326) for 2015\u20132024. VNL v21 covers 2015\u20132021; VNL v22 covers 2022\u20132024. Downloaded files: VNL_v21_npp_{year}_global_vcmslcfg_c202205302300.median_masked.dat.tif (2015\u20132021); VNL_v22_npp-j01_2022_global_vcmslcfg_c202303062300.median_masked.dat.tif (2022); VNL_npp_{year}_global_vcmslcfg_v2_*.median_masked.dat.tif (2023\u20132024) (01_raw_data\/geo_covariates\/VNL\/). Used as geospatial covariate features in the model training pipeline (steps 03\u201308). 
Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/developers.google.com\/earth-engine\/datasets\/catalog\/NOAA_VIIRS_DNB_ANNUAL_V22","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/eogdata.mines.edu\/products\/vnl\/","citation":"Elvidge, C. D., Zhizhin, M., Ghosh, T., Hsu, F.-C., & Taneja, J. (2021). Annual time series of global VIIRS nighttime lights derived from monthly averages: 2012 to 2019. Remote Sensing 13(5), 922. https:\/\/doi.org\/10.3390\/rs13050922","access_type":"Data are publicly available but not directly included in the reproducibility package. Intermediate files derived from this source are included."},{"name":"WorldPop Global Population Counts \u2014 Global Mosaics 2015\u20132030 (1 km, R2025A v1)","note":"Annual global population count rasters at 30 arc-second (~1 km) resolution (EPSG:4326) for 2015\u20132024 (R2025A v1). Downloaded files: global_pop_{year}_CN_1km_R2025A_UA_v1.tif for years 2015\u20132024 (01_raw_data\/geo_covariates\/WorldPop\/). Used as a geospatial population covariate in the model training pipeline (steps 03\u201308). Downstream intermediate files: 08_model_inference\/ and 10_data_product\/output\/ gdp_intensity_r6_estimates_{year}.csv (2015\u20132024).","uri":"https:\/\/hub.worldpop.org\/geodata\/listing?id=137","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/hub.worldpop.org\/","citation":"WorldPop (School of Geography and Environmental Science, University of Southampton). (2025). Global Population Counts \u2014 Global Mosaics 2015\u20132030, 1 km resolution, R2025A v1 [dataset]. https:\/\/doi.org\/10.5258\/SOTON\/WP00789","access_type":"Data are publicly available but not directly included in the reproducibility package. 
Intermediate files derived from this source are included."},{"name":"World Bank Country and Lending Groups","note":"Historical income group classifications (low, lower-middle, upper-middle, high income) for all World Bank member countries from 1987 to 2025. Downloaded file: OGHIST_2025_10_07.xlsx (11_visualization\/additional_input\/WB_country_classification\/downloaded_data\/). This file is directly included in the reproducibility package (see data_hash_report.csv) and is a direct input to 11_visualization\/02_check_temporal_dynamic.ipynb, where it is used to classify countries by income group for distributional analysis of cell-level GDP estimates.","uri":"https:\/\/datahelpdesk.worldbank.org\/knowledgebase\/articles\/906519-world-bank-country-and-lending-groups","license":"Creative Commons Attribution 4.0 International license (CC BY 4.0)","license_uri":"https:\/\/www.worldbank.org\/en\/about\/legal\/terms-of-use-for-datasets","citation":"World Bank. (2025). World Bank Country and Lending Groups [dataset]. Available at https:\/\/datahelpdesk.worldbank.org\/knowledgebase\/articles\/906519-world-bank-country-and-lending-groups","access_type":"Data are publicly available and included in the reproducibility package (11_visualization\/additional_input\/WB_country_classification\/downloaded_data\/OGHIST_2025_10_07.xlsx)."},{"name":"Demographic and Health Surveys \u2014 Household Recode Datasets","note":"Cluster-level wealth index scores derived from DHS Household Recode (HR) Stata files and geographic cluster locations from GE shapefiles, for 83 DHS surveys conducted in 2015 or later. Raw data downloaded from the DHS Program data portal (requires a registered account). Processing notebooks in 09_model_validation\/additional_input\/DHS\/ query the DHS API for the survey catalog, download files, and aggregate wealth indices to H3 resolution-6 cells, producing the processed intermediate file: 09_model_validation\/additional_input\/DHS\/output\/dhs_cluster_wealth_r6.csv. 
This processed file is a direct input to 09_model_validation\/04_validate_against_DHS_wealth_index.ipynb for external validation of the GNN model. Access date: 2025.","uri":"https:\/\/api.dhsprogram.com\/rest\/dhs\/datasets","license":"DHS Program Data Use Agreement","license_uri":"https:\/\/dhsprogram.com\/data\/terms-of-use.cfm","citation":"DHS Program. (2025). Demographic and Health Surveys \u2014 Household Recode Datasets [dataset]. Available at https:\/\/dhsprogram.com\/data\/","access_type":"Data access requires purchase or human approval and is not included in the reproducibility package. Intermediate data is included in the reproducibility package."}],"technology_environment":"Paper exhibits were reproduced on a computer with the following specifications:\n\u2022 OS: macOS\n\u2022 Processor: Apple M4 Pro\n\u2022 Memory available: 24 GB","technology_requirements":"Runtime: 10 minutes","reproduction_instructions":"To reproduce the findings of this paper, a new user should:\n\n1. Recreate the environment by installing the required packages listed in `requirements.txt`.\n2. Run the Jupyter notebooks in the `09_model_validation` folder.\n3. Run the Jupyter notebooks in the `11_visualization` folder.\n\n> **Note:** All data needed to run the package and reproduce the findings in the manuscript are included in the reproducibility package. The starting point is **intermediate data**, which is already included. The associated repository (https:\/\/github.com\/worldbank\/gnn-gdp-disaggregation\/releases\/tag\/v1.01) contains the code that produces this intermediate data from the raw data; it is currently accessible to World Bank staff only. All raw data sources are documented in the Data Section of this entry. 
"},"datacite":{"creators":[{"givenName":"Kamwoo","familyName":"Lee","nameType":"Personal","affiliation":[{"name":"World Bank"}]},{"givenName":"Brian","familyName":"Blankespoor","nameType":"Personal","affiliation":[{"name":"World Bank"}]},{"givenName":"David","familyName":"Newhouse","nameType":"Personal","affiliation":[{"name":"World Bank"}]}],"titles":[{"lang":"en","title":"Reproducibility package for Fine-Scale Spatial Disaggregation Of Statistical Data Via Graph Neural Networks"},{"title":"RR_WLD_2026_606","titleType":"Other"}],"publisher":"World Bank","publicationYear":"2026","types":{"resourceType":"Reproducibility package","resourceTypeGeneral":"Other"},"url":"https:\/\/reproducibility.worldbank.org\/index.php\/catalog\/study\/RR_WLD_2026_606","language":"en","doi":"10.60572\/sj2b-wp55","prefix":"10.60572","suffix":"sj2b-wp55"},"tags":[{"tag":"Accessible Data"},{"tag":"DOI"},{"tag":"Open Code"}],"schematype":"script"}