Reproducible Research Repository
PRWP

Reproducibility Package for Missing Evidence: Tracking Academic Data Use around the World

2023
Reference ID
RR_WLD_2023_39-v01
DOI
https://doi.org/10.1596/1813-9450-10673
Author(s)
Brian Stacy, Xiaoyu Wang, Daniel Gerszon Mahler, Umar Serajuddin, Lucas Kitzmüller
Collections
World Bank Policy Research Working Papers
Created on
Dec 13, 2023
Last modified
Apr 24, 2024
    Reproducibility Package

    Scripts
    Readme
    Link: https://reproducibility.worldbank.org/index.php/catalog/88/download/224/README.pdf
    Reproducibility package for Missing Evidence: Tracking Academic Data Use around the World
    Date
    2023-12
    Software
    R
    Dependencies
    Run renv::restore() to install necessary dependencies.
    Instructions
    See README in the reproducibility package
    Repository
    https://github.com/stacybri/academic_data_use
    Notes
    Computational reproducibility verified by the Development Impact (DIME) Analytics team, World Bank.
    Source code repository
    Repository name | Type | URI
    academic_data_use | GitHub | https://github.com/stacybri/academic_data_use
    Software
    R
    Version
    4.3.1

    Reproducibility

    Technology environment

    Reproduction of the paper's exhibits was attempted on a computer with the following specifications:
    • OS: Windows 10 Enterprise, version 21H2
    • Processor: Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80 GHz
    • Memory available: 5.9 GB
    • Software version: R 4.3.1

    Technology requirements

    Runtime: approximately 20 to 30 minutes

    Reproduction instructions

    A README and the R Markdown file 02_programs/Data_Use_Academia_tables_figures.Rmd are provided.
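The reproduction steps named in this record (restoring dependencies with renv, then running the R Markdown file) can be sketched in R as follows. This is a minimal sketch, not the authors' own script: it assumes the GitHub repository has been cloned locally, that it contains an renv lockfile at its root, and that the renv and rmarkdown packages are installed. The helper name `reproduce_exhibits` and its `repo_dir` argument are hypothetical.

```r
# Hypothetical helper (not part of the package): restore the package library
# recorded by renv, then knit the R Markdown file that builds the exhibits.
reproduce_exhibits <- function(repo_dir = ".") {
  rmd_path <- file.path(repo_dir, "02_programs",
                        "Data_Use_Academia_tables_figures.Rmd")
  if (!file.exists(rmd_path)) {
    stop("R Markdown file not found: ", rmd_path)
  }
  # Install the package versions pinned in the project's lockfile
  # (assumes the lockfile sits in repo_dir).
  renv::restore(project = repo_dir, prompt = FALSE)
  # Knit the document; this record reports a runtime of 20 to 30 minutes.
  rmarkdown::render(rmd_path)
}

# Usage (not run):
# reproduce_exhibits("path/to/academic_data_use")
```

The README in the package remains the authoritative source for the exact steps; the sketch only strings together the two actions this record mentions.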

    Data

    Datasets
    World Bank World Development Indicators (WDI)
    Note
    File name: correlates_df.csv. The file contains indicator data from the World Bank World Development Indicators (WDI). A set of indicators was pulled from the World Bank API and was current as of November 2023. Because the WDI is updated periodically, the data may have changed since they were downloaded. The data were pulled with the wbstats package in R, in wdi_pull.R. Please refer to the README for the collections of indicators pulled from the WDI.
    Access policy
    Published with the package.
    Data URL
    https://datatopics.worldbank.org/world-development-indicators/
    The Semantic Scholar Open Research Corpus
    Note
    Filename: results_completed_updated_20231003.fst. Source: Lo, Kyle, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. 2020. “S2ORC: The Semantic Scholar Open Research Corpus.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4969–83. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.447. This file contains article metadata for the 1 million articles classified by the machine learning algorithm. The file is too large to host on GitHub, so it is hosted on Amazon S3. It is stored in fst format, a compressed format that is faster to read than CSV.
    Access policy
    Publicly hosted on Amazon S3 at "s3://data-use-academia/03_output/results_completed_updated_20231003.fst". It can also be downloaded from https://datacatalog.worldbank.org/int/search/dataset/0065200/data-use-in-academia-dataset by clicking on "data_use_dataset_fst".
    Data URL
    https://datacatalog.worldbank.org/int/search/dataset/0065200/data-use-in-academia-dataset
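Once the fst file described above has been downloaded, it can be read into R with the fst package's `read_fst` function. The following is a minimal sketch under stated assumptions: the file has been downloaded manually from the Data Catalog page and saved locally, and the fst package is installed. The helper name `read_article_metadata` and the default path are hypothetical, and this is not necessarily how the authors' scripts load the file.

```r
# Hypothetical helper (not part of the package): read the article metadata
# from a locally saved copy of the fst file.
read_article_metadata <- function(path = "results_completed_updated_20231003.fst") {
  if (!requireNamespace("fst", quietly = TRUE)) {
    stop("Package 'fst' is required; install it with install.packages('fst').")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # read_fst() returns a data frame by default.
  fst::read_fst(path)
}

# Usage (not run):
# articles <- read_article_metadata("path/to/results_completed_updated_20231003.fst")
# dim(articles)
```

Because the file is large, it may help on machines with limited memory to pass the `columns`, `from`, and `to` arguments of `fst::read_fst` to load only a subset of columns or rows.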
    Statistical Performance Indicators
    Note
    Filename: SPI_index.csv. Statistical Performance Indicators (Dang et al. 2023; Cameron et al. 2021).
    Access policy
    Published with the package.
    Data URL
    https://github.com/worldbank/SPI
    Amazon Mturk
    Note
    Filename: test_doc_wcounts.csv. This file was produced by randomly selecting from the set of articles that were classified by human raters using Amazon Mturk. This subset of the Mturk-classified articles was used to test the predictions of our machine learning model, which classifies data use in articles.
    Access policy
    Published with the package.
    Amazon Mturk
    Note
    Filename: train_doc_wcounts.csv. This file is a randomly selected set of articles that were classified by Amazon Mturk workers and used to train our machine learning model. The test and training sets do not overlap.
    Access policy
    Published with the package.
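The statement that the training and test sets do not overlap can be checked with a few lines of base R. This is a minimal sketch: the helper name `shared_ids` and the identifier column `doc_id` are hypothetical, since this record does not list the columns of train_doc_wcounts.csv or test_doc_wcounts.csv. The toy data frames below stand in for the two files.

```r
# Hypothetical helper: return the identifiers that appear in both sets.
# `id_col` is an assumed column name; replace it with the actual identifier
# column used in the two CSV files.
shared_ids <- function(train, test, id_col) {
  stopifnot(id_col %in% names(train), id_col %in% names(test))
  intersect(train[[id_col]], test[[id_col]])
}

# Toy data standing in for train_doc_wcounts.csv and test_doc_wcounts.csv.
train <- data.frame(doc_id = c("a1", "a2", "a3"))
test  <- data.frame(doc_id = c("b1", "b2"))

overlap <- shared_ids(train, test, "doc_id")
length(overlap) == 0  # TRUE when the two sets are disjoint
```

With the real files, the data frames would come from `read.csv()` on each file, and a non-empty result would list the articles present in both sets.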
    Das, Do, Shaines, and Srikant (2013) and Porteous (2022)
    Note
    Filename: das_porteous_compare.csv. This dataset was prepared by our team by transcribing the articles produced per country from two papers: Das, Do, Shaines, and Srikant (2013) and Porteous (2022). Columns B-D of the spreadsheet were populated from Table A.3 of Das et al. (2013). Columns E-H were populated from Table 1 of Porteous (2022). Das et al. (2013) can be accessed at https://www.sciencedirect.com/science/article/pii/S0304387813001090#s0035 and Porteous (2022) at https://onlinelibrary.wiley.com/doi/epdf/10.1111/obes.12510
    Access policy
    Published with the paper
    Data statement

    All data are publicly available.

    Description

    Output
    Missing Evidence: Tracking Academic Data Use around the World
    Type
    WP
    Title
    Missing Evidence: Tracking Academic Data Use around the World
    Authors
    Brian Stacy, Lucas Kitzmüller, Xiaoyu Wang, Daniel Gerszon Mahler, and Umar Serajuddin
    Description
    PRWP 10673
    URL
    https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099503001172493286/idu1ad0286b61aec014d8c19d3e1e651d3fe6286
    DOI
    https://doi.org/10.1596/1813-9450-10673
    Authors
    Author | Role | Affiliation | Email
    Brian Stacy | Data Scientist | World Bank | bstacy@worldbank.org
    Xiaoyu Wang | Consultant | World Bank | xiaoyuplus@gmail.com
    Daniel Gerszon Mahler | Senior Economist | World Bank | dmahler@worldbank.org
    Umar Serajuddin | Manager | World Bank | userajuddin@worldbank.org
    Lucas Kitzmüller | Economist | European Bank for Reconstruction and Development (EBRD) | mail@lucaskitzmueller.eu
    Date of production

    2023-12

    Scope and coverage

    Geographic locations
    Location | Code
    World | WLD
    Topics
    ID | Topic | Vocabulary | Vocabulary URI
    C45 | Neural Networks and Related Topics | JEL Classifications |
    C52 | Model Evaluation, Validation, and Selection | JEL Classifications |
    O30 | Innovation; Research and Development; Technological Change; Intellectual Property Rights: General | JEL Classifications |

    Disclaimer

    The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development/the World Bank. The findings, interpretations, and conclusions expressed in this event do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.

    Access and rights

    License
    Name | URI
    Modified BSD3 | https://opensource.org/license/bsd-3-clause/

    Contacts

    Name | Role | Affiliation | Email
    Brian Stacy | Data Scientist | World Bank | bstacy@worldbank.org
    Reproducibility WBG | | World Bank | reproducibility@worldbank.org

    Information on metadata

    Producers
    Name | Abbreviation | Affiliation | Role
    Yukiko Suzuki | YS | World Bank | Junior Professional Officer
    Date of Production

    2023-12

    Document version

    1

    © The World Bank Group, All Rights Reserved.