Reproducible Research Repository
PRWP

Reproducibility package for Where Are Pakistan’s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods

2026
Reference ID
RR_PAK_2025_493
DOI
https://doi.org/10.60572/htha-ta93
Author(s)
Amer Hasan, Umar Saif, Ayesha Tahir, Koen Geven, Mohsen Ali, Aliza Masood, Muhammad Abdul Rahman, Sheena Fazili
Collections
World Bank Policy Research Working Papers
Created on
Feb 03, 2026
Last modified
Mar 23, 2026
  • Overview

    Abstract

    According to administrative data, as many as 25 million children between the ages of 5 and 16 are out-of-school in Pakistan. However, little is known about where these children are in relation to available public and private schooling options. This paper brings together novel data from satellite imagery, the population census, government school censuses of public schools and publicly available location data on private schools to understand the out-of-school phenomenon at the community level. The paper uses satellite imagery analysis and machine learning methods to present estimates of the accessibility of schools and to describe the white space between schools. The paper relies on administrative data from over 137,000 public schools and documents the existence of over 72,700 private schools using publicly available location data. Our approach suggests that 21.7 million children (49% of the total) are enrolled in private schools, higher than previously reported official estimates. We also present detailed data on how the out-of-school population, accessibility and whitespace are distributed geo-spatially across Pakistan. As an application of the strength of using satellite imagery, we also analyze the effect of a major flood event in 2022 on accessibility, white space and catchment areas. Doing so, we estimate that approximately 3.5 million children in Pakistan had their schooling affected by the 2022 floods. Lastly, we corroborate our findings from an in-person ground-truthing survey conducted about 18 months after the floods. Together, the datasets and methods presented in this paper have applications for both day-to-day monitoring of school infrastructure as well as pre- and post-disaster planning efforts in Pakistan and elsewhere.

    Reproducibility Package

    Scripts
    Readme
    Link: https://reproducibility.worldbank.org/catalog/455/download/1454/README.pdf
    Reproducibility package for Where Are Pakistan’s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods
    File name
    RR_PAK_2025_493
    Zip package
    RR_PAK_2025_493.zip
    Title
    Reproducibility package for Where Are Pakistan’s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods
    Date
    2026-01
    Instructions
    See README in reproducibility package.
    Notes
    Computational reproducibility verified by Development Impact (DECDI) Analytics team, World Bank.
    Source code repository
    Repository name URI
    Reproducible Research Repository (World Bank) https://reproducibility.worldbank.org
    Software
    Name | Version
    Python | 3.9.11
    Stata | 19 MP
    Tableau |
    QGIS | 3.38.1

    Reproducibility

    Technology environment

    Paper exhibits were reproduced on a computer with the following specifications:
    – OS: Windows 11 Enterprise
    – Processor: INTEL(R) XEON(R) PLATINUM 8562Y+, 2800 MHz, 8 Core(s), 8 Logical Processor(s)
    – Memory available: 106 GB

    Technology requirements

    Runtime: ~19 hours

    Reproduction instructions
    • To reproduce the findings in this paper, a new user must first obtain access to the restricted datasets and place them in the appropriate folders. Most of the data used in this project is restricted.
    • The full list of required datasets is provided in the Data section of this package.
    • The file data_hash_report.csv, included in the package, specifies the exact file paths where each dataset must be placed once access is granted.
    • After all datasets are available and correctly located, the code must be executed in the following order:
      • /2025/directory_creator.py
      • /2025/whitespace/code/
      • /2025/spatial_accessibility/code/
      • /2025/catchment/
      • /2025/school_analysis/code/
    • Each of these folders contains multiple scripts, and the precise execution order within each folder is documented in the README file.
    • The final results reported in the paper are produced by running main.ipynb and plot.ipynb in /2025/school_analysis/code/.
    • Because most of the data in this package is restricted and access may be difficult for external users, the package includes the outputs from the last successful run in /2025/school_analysis/results/, which can be used to inspect and compare the results against those reported in the paper.
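
    The placement checks described above can be sketched in Python before committing to the ~19 hour run. This is a minimal illustration, not part of the package. It assumes that data_hash_report.csv has a header row with columns named filepath and hash, and that the hashes are SHA-256 hex digests; none of this is confirmed by this page, so adjust the column names and the algorithm to match the README. The function names are hypothetical.

```python
import csv
import hashlib
from pathlib import Path

# Execution order as listed in the reproduction instructions.
# Leading slashes are dropped so the paths resolve relative to the package root.
STAGES = [
    "2025/directory_creator.py",
    "2025/whitespace/code",
    "2025/spatial_accessibility/code",
    "2025/catchment",
    "2025/school_analysis/code",
]


def missing_stages(package_root):
    """Return the listed stage paths that do not exist under package_root."""
    root = Path(package_root)
    return [stage for stage in STAGES if not (root / stage).exists()]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large rasters are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_data_files(report_csv, package_root, path_col="filepath", hash_col="hash"):
    """Classify each row of the report as 'ok', 'missing', or 'mismatch'.

    path_col and hash_col are ASSUMED column names; SHA-256 is an ASSUMED
    algorithm. Check both against the real report before relying on this.
    """
    root = Path(package_root)
    status = {}
    with open(report_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            rel_path = row[path_col]
            target = root / rel_path
            if not target.is_file():
                status[rel_path] = "missing"
            elif sha256_of(target) != row[hash_col].strip().lower():
                status[rel_path] = "mismatch"
            else:
                status[rel_path] = "ok"
    return status
```

    Calling missing_stages(package_root) first confirms that the package was unpacked with the expected folder layout. check_data_files(report_csv, package_root) then shows which restricted datasets still need to be obtained or re-placed. The execution order of scripts within each folder is not covered here and remains as documented in the README.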

    Data

    Datasets
    Administrative Data on Public Schools in Pakistan
    Name
    Administrative Data on Public Schools in Pakistan
    Note
    Files location: data/government_publicSchools.csv; data/inundation_publicSchools.csv; Deatails of Private Schools District Wise.xlsx. The data were directly obtained from the following government agencies in Pakistan: the School Education Department, Government of Punjab; the Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa; the School Education and Literacy Department, Government of Sindh; and the Government of Balochistan. The datasets contain information on the location of public schools, as well as school type and enrollment. A complete list and description of variables is provided in the project README file. These data are restricted-access and cannot be shared publicly. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.
    Access policy
    Data is restricted and not included in the reproducibility package
    License
    Custom License
    Citation
    School Education Department, Government of Punjab; Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa; School Education and Literacy Department, Government of Sindh; Government of Balochistan (n.d.). Administrative Data on Public Schools (location, type, and enrollment) [Data set].
    Public Schools Location Satellite Data
    Name
    Public Schools Location Satellite Data
    Note
    Files location: all_province_satellite_vs_gov_vs_flood_vs_rain.csv. The datasets were purchased from Maxar and Airbus and show images of public school locations before and after the 2022 floods. World Bank staff can consult the data at: https://geo.worldbank.org/home/imagery/WBG%20Purchased%20Imagery/country/PAK/date/2026-01-28/range/ALL
    Access policy
    Data is limited-access and not included in the reproducibility package
    License
    Custom License
    Data URL
    https://www.airbus.com/en/products-services/space/earth-observation/satellite-imagery; https://www.earthdata.nasa.gov/dashboard/data-catalog/la-fires-maxar
    Citation
    Maxar & Airbus. (n.d.). Public Schools Location Satellite Data [Data set].
    CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations
    Name
    CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations
    Note
    Files location: rainfall_data/balochistan/*.tif; rainfall_data/kpk/*.tif; rainfall_data/punjab/*.tif; rainfall_data/sindh/*.tif.
    Access policy
    Data is publicly available and included in the reproducibility package
    License URL
    https://creativecommons.org/licenses/by/4.0/
    Data URL
    https://www.chc.ucsb.edu/data/chirps
    Citation
    CHIRPS: Rainfall Estimates from Rain Gauge and Satellite Observations, CHIRPS v2.0: Climate Hazards Group InfraRed Precipitation with Stations [Data set]. https://www.chc.ucsb.edu/data/chirps (accessed 22 Aug 2025).
    Household Ground-Truthing Survey
    Name
    Household Ground-Truthing Survey
    Note
    Files location: Survey Data/HH_survey_clean_desc.dta; Survey Data/school_damage.dta; Survey Data/school_survey_clean_desc.dta.
    Access policy
    Data is forthcoming and will be made available through the World Bank Microdata Library
    Data URL
    Forthcoming at: https://microdata.worldbank.org/home
    Citation
    World Bank. (2025). Household Ground-Truthing Survey [Data set]. Forthcoming at: https://microdata.worldbank.org/home
    Pakistan Administrative Boundary Polygons
    Name
    Pakistan Administrative Boundary Polygons
    Note
    Files location: balochistan_boundaries/*.kml, kpk_boundaries/*.kml, punjab_boundaries/*.kml, and sindh_boundaries/*.kml. The boundary files were obtained from aisight.ai and were generated and exported using Google Earth Pro. The data are restricted and are not included in the reproducibility package. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.
    Access policy
    Data is restricted and not included in the reproducibility package
    License
    Custom License
    Data URL
    https://aisight.ai/
    Citation
    AiSight.ai. (2025). Pakistan Administrative Boundary Polygons [dataset]. Obtained via aisight.ai (https://aisight.ai/).
    National and District Level Population Raster
    Name
    National and District Level Population Raster
    Note
    Files location: population/pk_population.tif; tifs/districts/balochistan/*.tif; tifs/districts/kpk/*.tif; tifs/districts/punjab/*.tif; and tifs/districts/sindh/*.tif. The raster was obtained from aisight.ai. The processing and preparation steps used to generate this raster for the analysis are described in Appendix 1 of the paper. The data are restricted and are not included in the reproducibility package. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.
    Access policy
    Data is restricted and not included in the reproducibility package
    Citation
    AiSight.ai. (2025). National and District Level Population Raster [dataset]. Obtained via aisight.ai (https://aisight.ai/).
    Pakistan Enrollment Data
    Name
    Pakistan Enrollment Data
    Note
    This dataset was generated by the authors using restricted data provided by local government authorities. A detailed description of the data construction process is provided in Appendix 3 of the paper. The list of variables is included in the README file. For further information about the data and access conditions, researchers should contact the corresponding author at ahasan1@worldbank.org.
    Access policy
    Data is restricted and not included in the reproducibility package
    Citation
    World Bank. (2025). Pakistan Enrollment Data [dataset].
    Data statement

    Some data is restricted and has not been included in the reproducibility package. For more details, please refer to the README file.

    Description

    Output
    Where Are Pakistan’s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods
    Type
    Working Paper
    Title
    Where Are Pakistan’s Out-Of-School Children? Mapping Accessibility Of Schooling Using Satellite Data And Machine Learning Methods With An Application To The 2022 Floods
    Description
    Policy Research Working Papers (PRWP) 11341
    URL
    http://documents.worldbank.org/curated/en/099846503202639394
    Authors
    Author | Affiliation | Email
    Amer Hasan | World Bank | ahasan1@worldbank.org
    Umar Saif | Aisightai | saif.umar@gmail.com
    Ayesha Tahir | World Bank | atahir@worldbank.org
    Koen Geven | World Bank | kgeven@worldbank.org
    Mohsen Ali | Information Technology University of the Punjab | mohsen.ali@itu.edu.pk
    Aliza Masood | Aisightai | aliza.masood@surveyauto.com
    Muhammad Abdul Rahman | Information Technology University of the Punjab | abdulrehmanbhalli@live.com
    Sheena Fazili | World Bank | fazilisheena@gmail.com
    Date of production

    2026-01-28

    Scope and coverage

    Geographic locations
    Location Code
    Pakistan PAK
    Keywords
    Public Schooling; Private Schooling; Out-Of-School Children; Machine Learning; Satellite Imagery; Floods; Resilient Infrastructure
    Topics
    ID | Topic | Parent topic ID | Vocabulary | Vocabulary URI
    I21 | Analysis of Education | I2 | Journal of Economic Literature (JEL) |
    I30 | General | I3 | Journal of Economic Literature (JEL) |

    Disclaimer

    The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development/The World Bank. The findings, interpretations, and conclusions expressed in this event do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.

    Access and rights

    License
    Name URI
    Modified BSD3 https://opensource.org/license/bsd-3-clause/

    Contacts

    Name Affiliation Email
    Amer Hasan World Bank ahasan1@worldbank.org
    Reproducibility WBG World Bank reproducibility@worldbank.org

    Information on metadata

    Producers
    Name Abbreviation Affiliation Role
    Reproducibility WBG DECDI World Bank - Development Impact Department Verification and preparation of metadata
    Date of Production

    2026-01-28

    Document version

    1
