Scripts, data, and replication materials for "A study of gender and regional differences in scientific mobility and immobility among researchers identified as potentially talented"
This repository provides replication materials, including data and Python and R scripts, to reproduce the analysis and figures of the article with the following metadata. It also includes additional results that were left out of the manuscript and its Appendix due to space limitations, e.g., results using alternative gender assignments or a more granular classification of geographical regions.

Script authors/maintainers: Aliakbar Akbaritabar
Contact: akbaritabar@demogr.mpg.de
Article title: A study of gender and regional differences in scientific mobility and immobility among researchers identified as potentially talented
Manuscript authors: Aliakbar Akbaritabar, Robin Haunschild and Lutz Bornmann
Journal: Journal of Informetrics
Article DOI: https://doi.org/10.1016/j.joi.2025.101744
Replication package on Zenodo: https://doi.org/10.5281/zenodo.17495895

Abstract: Identifying talented academics worldwide using publication data has been proven to be successful with other performance measures based on citations and funding data in previous studies. In this study, we investigate the scientific mobility and immobility among academics as an additional performance measure. We reconstruct the mobility trajectory of potentially talented researchers throughout their scientific careers to study whether they have a different propensity to be mobile or non-mobile than other researchers in the group for comparison. Since the researchers’ gender may play an important role in scientific careers, we delve into gender differences. Our results indicate that potentially talented researchers have a higher propensity to be mobile than other researchers in the group for comparison – more so among male than female talented researchers. Women are overrepresented among non-mobile researchers in the other researchers group.
We conclude – based on our findings – that the proposed method for identifying potentially talented individuals seems to select researchers who are more successful in their academic careers than the researchers in the group for comparison. The results agree with the findings of the previous studies based on citation and funding data. In the interpretation of our study results, one should consider yet that higher mobility is a privilege (that may be independent of talent). Specific groups, such as those with fewer caring responsibilities and visa restrictions, could have better access to this privilege. Further research is necessary thus on the trade-off between higher mobility's potential advantages and disadvantages as a strategy to build a successful academic career and unequal access to mobility.

Keywords: talented academics, bibliometric indicators, scientific mobility and immobility, gendered mobility

How to replicate the analysis

The scripts, data, and the reproducible pipeline that creates the final data for our manuscript and runs the analysis and visualizations using R and Python are provided. The pipeline covers data preparation, statistical analysis, and visualizations (see the figures under the results folder). The pipeline scripts are written in Python. The reproducible workflow (described below) uses the SnakeMake workflow management system to ensure full replicability. There are also R scripts for the results in the manuscript (see the workflow\scripts folder) and figures for the manuscript and Appendix (see the results folder). Below, we describe the requirements that must be installed before the analysis can be replicated.

Python requirements

For the reproducible pipeline to recreate the paper's replication data and analysis, SnakeMake version 8 or above should be installed.
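Once a Python environment with SnakeMake is available, the version requirement can be checked from Python. The following is only a minimal sketch and not part of the replication package: the helper names major_version and snakemake_is_recent_enough are hypothetical, and it assumes the installed distribution is registered under the name "snakemake" with a version string that starts with the major version number (e.g., "8.4.2").

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading (major) number of a dotted version string."""
    return int(version.split(".")[0])


def snakemake_is_recent_enough(minimum_major: int = 8) -> bool:
    """Check whether Snakemake is installed with at least the given major version.

    Assumption: the distribution name is "snakemake". Returns False when the
    package is not installed in the active environment.
    """
    try:
        installed = metadata.version("snakemake")
    except metadata.PackageNotFoundError:
        return False
    return major_version(installed) >= minimum_major


if __name__ == "__main__":
    if snakemake_is_recent_enough():
        print("Snakemake version 8 or above was found.")
    else:
        print("Snakemake version 8 or above was not found in this environment.")
```

Running the script inside the activated environment prints one of the two messages; it does not install or modify anything.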
Creating a conda environment with Python 3 (3.11.9 was used here) and the libraries listed below makes the pipeline reproducible. Copy the yml code below into a file named "requirements.yml" and run conda env create -f requirements.yml. Then open a CLI, activate the conda environment with conda activate talents, and run either a dry-run with snakemake -np all or a full reproduction with snakemake --cores 4 all.

An HTML report shows a directed acyclic graph (DAG) of the dependencies among the pipeline's steps (rules in SnakeMake parlance). It is accessible at https://akbaritabar.github.io/Replication-package-for-gender-and-regional-differences-in-scientific-mobility-and-immobility .

Some rules require Scopus data at the individual level. Since the data are licensed, they cannot be shared publicly. Hence, the output files of these rules are commented out in the main Snakefile and in rule all. However, the aggregated data needed to prepare the analysis presented in the manuscript and the statistical models needed to recreate all figures are included in this repository. The repository fully complies with the license terms of the data provider.

    name: talents
    channels:
      - conda-forge
      - bioconda
      # to prevent using default anaconda channels
      - nodefaults
    dependencies:
      - python 3.*
      - pandas
      - pyarrow
      - plotnine
      - argparse
      - numpy
      - mizani # (is installed with plotnine)
      - duckdb
      - tabulate

R packages

To replicate the statistical analysis, the following packages should be installed. The scripts and the data they use are in the folder workflow\scripts.

    install.packages(c("tidyverse", "RColorBrewer", "argparser", "nnet", "broom", "countrycode", "nanoparquet", "stargazer", "texreg", "GGally", "ggeffects", "lmtest", "DescTools"))

Publication figures

Figures 1-5 are in the main manuscript. All figures, including these and the Appendix figures, are available in the results folder.
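Before running the pipeline, it may help to confirm that the Python libraries listed in the requirements above can be imported in the active environment. The following is a minimal sketch and not part of the replication package: the names REQUIRED_MODULES and missing_modules are hypothetical, and the sketch assumes that each conda package name in the list is also its import name.

```python
from importlib.util import find_spec

# Libraries taken from the dependency list in requirements.yml.
# Assumption: conda package name and Python import name coincide.
REQUIRED_MODULES = [
    "pandas",
    "pyarrow",
    "plotnine",
    "argparse",
    "numpy",
    "mizani",
    "duckdb",
    "tabulate",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed Python modules can be imported.")
```

The check only looks up whether each module can be located; it does not import the libraries or verify their versions.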