{"type":"script","doc_desc":{"producers":[{"name":"Reproducibility WBG","abbr":"DECDI","affiliation":"World Bank - Development Impact Department","role":"Verification and preparation of metadata"}],"prod_date":"2026-01-28","version":"1"},"project_desc":{"authoring_entity":[{"name":"Amer Hasan","affiliation":"World Bank","email":"ahasan1@worldbank.org"},{"name":"Umar Saif","affiliation":"Aisightai","email":"saif.umar@gmail.com"},{"name":"Ayesha Tahir","affiliation":"World Bank","email":"atahir@worldbank.org"},{"name":"Koen Geven","affiliation":"World Bank","email":"kgeven@worldbank.org"},{"name":"Mohsen Ali","affiliation":"Information Technology University of the Punjab","email":"mohsen.ali@itu.edu.pk"},{"name":"Aliza Masood","affiliation":"Aisightai","email":"aliza.masood@surveyauto.com"},{"name":"Muhammad Abdul Rahman","affiliation":"Information Technology University of the Punjab","email":"abdulrehmanbhalli@live.com"},{"name":"Sheena Fazili","affiliation":"World Bank","email":"fazilisheena@gmail.com"}],"title_statement":{"title":"Reproducibility package for Where Are Pakistan\u2019s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods","idno":"RR_PAK_2025_493"},"data_statement":"Some data is restricted and has not been included in the reproducibility package. For more details, please refer to the README file.","software":[{"name":"Python","version":"3.9.11"},{"name":"Stata","version":"19 MP"},{"name":"Tableau"},{"name":"QGIS","version":"3.38.1"}],"scripts":[{"title":"Reproducibility package for Where Are Pakistan\u2019s Out-Of-School Children? 
Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods","date":"2026-01","notes":"Computational reproducibility verified by Development Impact (DECDI) Analytics team, World Bank.","instructions":"See README in reproducibility package.","file_name":"RR_PAK_2025_493","zip_package":"RR_PAK_2025_493.zip"}],"repository_uri":[{"name":"Reproducible Research Repository (World Bank)","uri":"https:\/\/reproducibility.worldbank.org"}],"production_date":"2026-01-28","abstract":"According to administrative data, as many as 25 million children between the ages of 5 and 16 are out-of-school in Pakistan. However, little is known about where these children are in relation to available public and private schooling options. This paper brings together novel data from satellite imagery, the population census, government school censuses of public schools and publicly available location data on private schools to understand the out-of-school phenomenon at the community level. The paper uses satellite imagery analysis and machine learning methods to present estimates of the accessibility of schools and to describe the white space between schools. The paper relies on administrative data from over 137,000 public schools and documents the existence of over 72,700 private schools using publicly available location data. Our approach suggests that 21.7 million children (49% of the total) are enrolled in private schools, higher than previously reported official estimates. We also present detailed data on how the out-of-school population, accessibility and whitespace are distributed geo-spatially across Pakistan. As an application of the strength of using satellite imagery, we also analyze the effect of a major flood event in 2022 on accessibility, white space and catchment areas. Doing so, we estimate that approximately 3.5 million children in Pakistan had their schooling affected by the 2022 floods. 
Lastly, we corroborate our findings from an in-person ground-truthing survey conducted about 18 months after the floods. Together, the datasets and methods presented in this paper have applications for both day-to-day monitoring of school infrastructure as well as pre- and post-disaster planning efforts in Pakistan and elsewhere.","geographic_units":[{"name":"Pakistan","code":"PAK"}],"keywords":[{"name":"Public Schooling"},{"name":"Private Schooling"},{"name":"Out-Of-School Children"},{"name":"Machine Learning"},{"name":"Satellite Imagery"},{"name":"Floods"},{"name":"Resilient Infrastructure"}],"topics":[{"id":"I21","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"Analysis of Education","parent_id":"I2"},{"id":" I30","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","vocabulary":"Journal of Economic Literature (JEL)","name":"General","parent_id":"I3"}],"output":[{"type":"Working Paper","description":"Policy Research Working Papers (PRWP) 11341","title":"Where Are Pakistan\u2019s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods","uri":"http:\/\/documents.worldbank.org\/curated\/en\/099846503202639394"}],"language":[{"name":"English","code":"EN"}],"disclaimer":"The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development\/The World Bank. The findings, interpretations, and conclusions expressed in this event do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. 
The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.","license":[{"name":"Modified BSD3","uri":"https:\/\/opensource.org\/license\/bsd-3-clause\/"}],"contacts":[{"name":"Amer Hasan","affiliation":"World Bank","email":"ahasan1@worldbank.org"},{"name":"Reproducibility WBG","affiliation":"World Bank","email":"reproducibility@worldbank.org"}],"datasets":[{"name":"Administrative Data on Public Schools in Pakistan","note":"Files location:\ndata\/government_publicSchools.csv; data\/inundation_publicSchools.csv; Deatails of Private Schools District Wise.xlsx.\n\nThe data were directly obtained from the following government agencies in Pakistan: the School Education Department, Government of Punjab; the Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa; the School Education and Literacy Department, Government of Sindh; and the Government of Balochistan.\n\nThe datasets contain information on the location of public schools, as well as school type and enrollment. A complete list and description of variables is provided in the project README file.\n\nThese data are restricted-access and cannot be shared publicly. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.","access_type":"Data is restricted and not included in the reproducibility package","citation":"School Education Department, Government of Punjab; Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa; School Education and Literacy Department, Government of Sindh; Government of Balochistan (n.d). Administrative Data on Public Schools (location, type, and enrollment) [Data set]. 
","license":"Custom License"},{"name":"Public Schools Location Satellite Data ","uri":"https:\/\/www.airbus.com\/en\/products-services\/space\/earth-observation\/satellite-imagery; https:\/\/www.earthdata.nasa.gov\/dashboard\/data-catalog\/la-fires-maxar","citation":"Maxar & Airbus. (n.d). Public Schools Location Satellite Data [Data set]. ","access_type":"Data is limited-access and not included in the reproducibility package","license":"Custom License","note":"Files location: all_province_satellite_vs_gov_vs_flood_vs_rain.csv. Datasets were purchased from Maxar and Airbus. This shows images of public school locations before and after the 2022 floods. World Bank Staff can consult the data at: https:\/\/geo.worldbank.org\/home\/imagery\/WBG%20Purchased%20Imagery\/country\/PAK\/date\/2026-01-28\/range\/ALL\n"},{"name":"CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations","citation":"CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations, CHIRPS v2.0: Climate Hazards Group InfraRed Precipitation with Stations [Data set]. https:\/\/www.chc.ucsb.edu\/data\/chirps (accessed 22 Aug 2025). ","uri":"https:\/\/www.chc.ucsb.edu\/data\/chirps","note":"Files location: rainfall_data\/balochistan\/*.tif;\nrainfall_data\/kpk\/*.tif;\nrainfall_data\/punjab\/*.tif;\nrainfall_data\/sindh\/*.tif.","license_uri":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/","access_type":"Data is publicly available and included in the reproducibility package"},{"name":"Household Ground-Truthing Survey ","citation":"World Bank. (2025). Household Ground-Truthing Survey [Data set]. 
Forthcoming at: https:\/\/microdata.worldbank.org\/home","uri":"Forthcoming at: https:\/\/microdata.worldbank.org\/home","access_type":"Data is forthcoming and will be made available through the World Bank Microdata Library","note":"Files location: Survey Data\/HH_survey_clean_desc.dta; Survey Data\/school_damage.dta; Survey Data\/school_survey_clean_desc.dta."},{"name":"Pakistan Administrative Boundary Polygons ","note":"Files location: balochistan_boundaries\/*.kml, kpk_boundaries\/*.kml, punjab_boundaries\/*.kml, and sindh_boundaries\/*.kml. The boundary files were obtained from aisight.ai and were generated and exported using Google Earth Pro. The data are restricted and are not included in the reproducibility package. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.","citation":"AiSight.ai. (2025). Pakistan Administrative Boundary Polygons [dataset]. Obtained via aisight.ai (https:\/\/aisight.ai\/).","uri":"https:\/\/aisight.ai\/","license":"Custom License","access_type":"Data is restricted and not included in the reproducibility package"},{"name":"National and District Level Population Raster","note":"Files location: population\/pk_population.tif;  tifs\/districts\/balochistan\/*.tif; tifs\/districts\/kpk\/*.tif; tifs\/districts\/punjab\/*.tif; and tifs\/districts\/sindh\/*.tif. The raster was obtained from aisight.ai. The processing and preparation steps used to generate this raster for the analysis are described in Appendix 1 of the paper. The data are restricted and are not included in the reproducibility package. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.","access_type":"Data is restricted and not included in the reproducibility package","citation":"AiSight.ai. (2025). National and District Level Population Raster [dataset]. 
Obtained via aisight.ai (https:\/\/aisight.ai\/)."},{"name":"Pakistan Enrollment Data","note":"This dataset was generated by the authors using restricted data provided by local government authorities. A detailed description of the data construction process is provided in Appendix 3 of the paper. The list of variables is included in the README file.\nFor further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.","access_type":"Data is restricted and not included in the reproducibility package","citation":"World Bank. (2025). Pakistan Enrollment Data [dataset]. "}],"reproduction_instructions":"- To reproduce the findings in this paper, a new user must first obtain access to the restricted datasets and place them in the appropriate folders. Most of the data used in this project is restricted.\n- The full list of required datasets is provided in the Data section of this package.\n- The file `data_hash_report.csv`, included in the package, specifies the exact file paths where each dataset must be placed once access is granted.\n- After all datasets are available and correctly located, the code must be executed in the following order:\n  - `\/2025\/directory_creator.py`\n  - `\/2025\/whitespace\/code\/`\n  - `\/2025\/spatial_accessibility\/code\/`\n  - `\/2025\/catchment\/`\n  - `\/2025\/school_analysis\/code\/`\n- Each of these folders contains multiple scripts, and the precise execution order within each folder is documented in the README file.\n- The final results reported in the paper are produced by running `main.ipynb` and `plot.ipynb` in `\/2025\/school_analysis\/code\/`.\n- Because most of the data in this package is restricted and access may be difficult for external users, the package includes the outputs from the last successful run in `\/2025\/school_analysis\/results\/`, which can be used to inspect and compare the results against those reported in the 
paper.","technology_requirements":"Runtime: ~19 hours","technology_environment":"Paper exhibits were reproduced on a computer with the following specifications:\n\u2013 OS: Windows 11 Enterprise\n\u2013 Processor: INTEL(R) XEON(R) PLATINUM 8562Y+, 2800 MHz, 8 Core(s), 8 Logical Processor(s)\n\u2013 Memory available: 106 GB"},"tags":[{"tag":"DOI"},{"tag":"Open Code"},{"tag":"Restricted Data"}],"schematype":"script"}