{"type":"script","doc_desc":{"producers":[{"name":"Yukiko Suzuki","abbr":"YS","affiliation":"World Bank","role":"Junior Professional Officer"}],"prod_date":"2023-12","version":"1"},"project_desc":{"authoring_entity":[{"name":"Brian Stacy","role":"Data Scientist","affiliation":"World Bank","email":"bstacy@worldbank.org"},{"email":"xiaoyuplus@gmail.com","affiliation":"World Bank","role":"Consultant","name":"Xiaoyu Wang"},{"name":"Daniel Gerszon Mahler","role":"Senior Economist","affiliation":"World Bank","email":"dmahler@worldbank.org"},{"name":"Umar Serajuddin","role":"Manager","affiliation":"World Bank","email":"userajuddin@worldbank.org"},{"name":"Lucas Kitzm\u00fcller","role":"Economist","affiliation":"European Bank for Reconstruction and Development (EBRD)","email":"mail@lucaskitzmueller.eu"}],"output":[{"type":"WP","title":"Missing Evidence: Tracking Academic Data Use around the World","authors":"Brian Stacy, Lucas Kitzm\u00fcller, Xiaoyu Wang, Daniel Gerszon Mahler, and Umar Serajuddin","description":"PRWP 10673","doi":"https:\/\/doi.org\/10.1596\/1813-9450-10673","uri":"https:\/\/documents.worldbank.org\/en\/publication\/documents-reports\/documentdetail\/099503001172493286\/idu1ad0286b61aec014d8c19d3e1e651d3fe6286"}],"datasets":[{"note":"File name: correlates_df.csv\n\nThe file contains indicator data from the World Bank World Development Indicators (WDI). A set of indicators was pulled from the World Bank API, which was current as of November 2023. However, because the WDI is updated periodically, the data may have changed since they were downloaded. The data were pulled in wdi_pull.R using the web stats package in R.\nPlease refer to the README for the collections of indicators pulled from the WDI. 
","name":"World Bank World Development Indicators (WDI)","access_type":"Published with the package.","uri":"https:\/\/datatopics.worldbank.org\/world-development-indicators\/"},{"name":"The Semantic Scholar Open Research Corpus","note":"Filename: results_completed_updated_20231003.fst\t\n\nLo, Kyle, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. 2020. \u201cS2ORC: The Semantic Scholar Open Research Corpus.\u201d In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4969\u201383. Online: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.447.\n\nThis file contains article metadata for the 1 million articles classified by the machine learning algorithm. The file is too large to host on github, so it is hosted on S3 from AWS. The file is in fst format, which is a compressed format that is faster to read than csv.","access_type":"Publicly hosted on S3 from AWS at \"s3:\/\/data-use-academia\/03_output\/results_completed_updated_20231003.fst\".\nIt can also be downloaded from https:\/\/datacatalog.worldbank.org\/int\/search\/dataset\/0065200\/data-use-in-academia-dataset by clicking on \"data_use_dataset_fst\"","uri":"https:\/\/datacatalog.worldbank.org\/int\/search\/dataset\/0065200\/data-use-in-academia-dataset"},{"uri":"https:\/\/github.com\/worldbank\/SPI","name":"Statistical Performance Indicators","note":"Filename: SPI_index.csv\n\nStatistical Performance Indicators (Dang et al. 2023, Cameron et al. 2021)","access_type":"Published with the package."},{"name":"Amazon Mturk","note":"Filename: test_doc_wcounts.csv\n\ntest_doc_wcounts.csv was produced by randomly selecting from the set of articles that were classified by human raters using Amazon Mturk.  
This subset of the articles classified by Mturk workers was used to test the predictions of our machine learning model, which classifies data use in articles.\n","access_type":"Published with the package."},{"access_type":"Published with the package.","name":"Amazon Mturk","note":"Filename: train_doc_wcounts.csv\n\ntrain_doc_wcounts.csv is a randomly selected set of articles classified by Amazon Mturk workers and used to train our machine learning model. The test and training sets do not overlap."},{"name":"Das, Do, Shaines, and Srikant (2013) and Porteous (2022)","note":"Filename: das_porteous_compare.csv \n\ndas_porteous_compare.csv is a dataset our team prepared by reading in the articles produced per country from two papers: Das, Do, Shaines, and Srikant (2013) and Porteous (2022).  Columns B-D of the spreadsheet were populated from Table A.3 of Das et al. (2013).  Columns E-H were populated from Table 1 of Porteous (2022).\n\nDas et al. 
(2013) can be accessed at https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0304387813001090#s0035\nPorteous (2022) can be accessed at https:\/\/onlinelibrary.wiley.com\/doi\/epdf\/10.1111\/obes.12510\n","access_type":"Published with the paper"}],"software":[{"name":"R","version":"4.3.1"}],"scripts":[{"file_name":"RR_WLD_2023_39-v0_prg_v01.zip","zip_package":"RR_WLD_2023_39-v0_prg_v01.zip","title":"Reproducibility package for Missing Evidence: Tracking Academic Data Use around the World","date":"2023-12","software":"R","dependencies":"Run renv::restore()  to install necessary dependencies.","source_code_repo":"https:\/\/github.com\/stacybri\/academic_data_use","instructions":"See README in the reproducibility package","notes":"Computational reproducibility verified by the Development Impact (DIME) Analytics team, World Bank."}],"title_statement":{"idno":"RR_WLD_2023_39-v01","title":"Reproducibility Package for Missing Evidence: Tracking Academic Data Use around the World","identifiers":[{"type":"DOI","identifier":"https:\/\/doi.org\/10.1596\/1813-9450-10673"}]},"production_date":"2023-12","geographic_units":[{"code":"WLD","name":"World"}],"topics":[{"id":"C45","vocabulary":"JEL Classifications","name":" Neural Networks and Related Topics ","url":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=econlit"},{"id":"C52","vocabulary":"JEL Classifications","name":" Model Evaluation, Validation, and Selection ","url":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=econlit"},{"id":"O30","vocabulary":"JEL Classifications","name":" Innovation; Research and Development; Technological Change; Intellectual Property Rights: General ","url":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=jel","uri":"https:\/\/www.aeaweb.org\/econlit\/jelCodes.php?view=econlit"}],"acknowledgment_statement":"We acknowledge financial support from a 
World Bank Research Support Grant. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.\nThey do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Views presented are those of the authors and not necessarily of the EB.","language":[{"name":"English","code":"EN"}],"data_statement":"All data are publicly available.","repository_uri":[{"uri":"https:\/\/github.com\/stacybri\/academic_data_use","type":"GitHub","name":"academic_data_use"}],"technology_environment":"Reproduction of the paper exhibits was attempted on a computer with the following specifications:\n\u2022 OS: Windows 10 Enterprise, version 21H2\n\u2022 Processor: Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz 2.80 GHz\n\u2022 Memory available: 5.9 GB\n\u2022 Software version: R 4.3.1","technology_requirements":"20~30 min runtime","reproduction_instructions":"README and 02_programs\/Data_Use_Academia_tables_figures.Rmd are provided.","disclaimer":"The materials in the reproducibility packages are distributed as they were prepared by the staff of the International Bank for Reconstruction and Development\/the World Bank. The findings, interpretations, and conclusions expressed in this event do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank, or the governments they represent. The World Bank does not guarantee the accuracy of the materials included in the reproducibility package.","license":[{"name":"Modified BSD3","uri":"https:\/\/opensource.org\/license\/bsd-3-clause\/"}],"contacts":[{"name":"Brian Stacy","role":"Data Scientist","affiliation":"World Bank","email":"bstacy@worldbank.org"},{"name":"Reproducibility WBG","affiliation":"World Bank","email":"reproducibility@worldbank.org"}]},"tags":[{"tag":"DOI"}],"schematype":"script"}