1. Gao J, Wang D. Quantifying the use and potential benefits of artificial intelligence in scientific research. Nat Hum Behav 2024; 8:2281-2292. [PMID: 39394445 DOI: 10.1038/s41562-024-02020-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 09/12/2024] [Indexed: 10/13/2024]
Abstract
The rapid advancement of artificial intelligence (AI) is poised to reshape almost every line of work. Despite enormous efforts devoted to understanding AI's economic impacts, we lack a systematic understanding of the benefits to scientific research associated with the use of AI. Here we develop a measurement framework to estimate the direct use of AI and associated benefits in science. We find that the use and benefits of AI appear widespread throughout the sciences, growing especially rapidly since 2015. However, there is a substantial gap between AI education and its application in research, highlighting a misalignment between AI expertise supply and demand. Our analysis also reveals demographic disparities, with disciplines with higher proportions of women or Black scientists reaping fewer benefits from AI, potentially exacerbating existing inequalities in science. These findings have implications for the equity and sustainability of the research enterprise, especially as the integration of AI with science continues to deepen.
Affiliation(s)
- Jian Gao
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Ryan Institute on Complexity, Northwestern University, Evanston, IL, USA
- Faculty of Social Sciences, The University of Hong Kong, Hong Kong SAR, China
- Dashun Wang
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Ryan Institute on Complexity, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA
2. Kojaku S, Radicchi F, Ahn YY, Fortunato S. Network community detection via neural embeddings. Nat Commun 2024; 15:9446. [PMID: 39487114 PMCID: PMC11530665 DOI: 10.1038/s41467-024-52355-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 09/03/2024] [Indexed: 11/04/2024]
Abstract
Recent advances in machine learning research have produced powerful neural graph embedding methods, which learn useful, low-dimensional vector representations of network data. These neural methods for graph embedding excel in graph machine learning tasks and are now widely adopted. However, how and why these methods work, particularly how network structure gets encoded in the embedding, remains largely unexplained. Here, we show that node2vec, a shallow, linear neural network, encodes communities into separable clusters better than random partitioning down to the information-theoretic detectability limit for the stochastic block model. We show that this is due to the equivalence between the embedding learned by node2vec and the spectral embedding via the eigenvectors of the symmetric normalized Laplacian matrix. Numerical simulations demonstrate that node2vec is capable of learning communities on sparse graphs generated by the stochastic block model, as well as on sparse degree-heterogeneous networks. Our results highlight the features of graph neural networks that enable them to separate communities in the embedding space.
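The abstract ties node2vec to spectral embedding with the symmetric normalized Laplacian, L_sym = I - D^{-1/2} A D^{-1/2}. Below is a minimal sketch of that spectral side only, assuming NumPy and a small hypothetical graph. It is not node2vec and not the authors' code, and splitting nodes by the sign of one coordinate is just a two-community illustration.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Embed nodes with eigenvectors of the symmetric normalized Laplacian.

    L_sym = I - D^{-1/2} A D^{-1/2}. The eigenvectors for the `dim` smallest
    non-trivial eigenvalues become the node coordinates. Assumes an undirected,
    connected graph with no isolated nodes (every degree > 0).
    """
    degrees = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    lap_sym = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(lap_sym)
    # Column 0 belongs to eigenvalue 0 (proportional to sqrt(degree)); skip it
    return eigvecs[:, 1 : dim + 1]


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

emb = spectral_embedding(A, dim=1)
# The sign of the single coordinate separates the two triangles
labels = (emb[:, 0] > 0).astype(int)
print(labels)
```

Eigenvector signs are arbitrary, so the two triangles may come out labelled 0/1 or 1/0. Only the grouping is meaningful.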
Affiliation(s)
- Sadamori Kojaku
- School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY, USA
- Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Filippo Radicchi
- Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Yong-Yeol Ahn
- Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Santo Fortunato
- Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
3. Lyutov A, Uygun Y, Hütt MT. Machine learning misclassification networks reveal a citation advantage of interdisciplinary publications only in high-impact journals. Sci Rep 2024; 14:21906. [PMID: 39300204 DOI: 10.1038/s41598-024-72364-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 09/06/2024] [Indexed: 09/22/2024]
Abstract
Given a large enough volume of data and precise, meaningful categories, training a statistical model to solve a classification problem is straightforward and has become a standard application of machine learning (ML). If the categories are not precise, but rather fuzzy, as in the case of scientific disciplines, the systematic failures of ML classification can be informative about properties of the underlying categories. Here we classify a large volume of academic publications using only the abstract as information. From the publications that are classified differently by journal categories and ML categories (i.e., misclassified publications, when using the journal assignment as ground truth) we construct a network among disciplines. Analysis of these misclassifications provides insight into two topics at the core of the science of science: (1) Mapping out the interplay of disciplines. We show that this misclassification network is informative about the interplay of academic disciplines and that it is similar to, but distinct from, a citation-based map of science, where nodes are scientific disciplines and an edge indicates a strong co-citation count between publications in these disciplines. (2) Analyzing the success of interdisciplinarity. By evaluating the citation patterns of publications, we show that misclassification can be linked to interdisciplinarity and, furthermore, that misclassified articles have different citation frequencies than correctly classified articles: In the highest 10 percent of journals in each discipline, these misclassified articles are on average cited more frequently, while in the rest of the journals they are cited less frequently.
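Building a misclassification network can be sketched as counting, for each pair of disciplines, the publications whose journal category and ML-predicted category disagree. This is a minimal illustration, not the authors' pipeline. The function name, the toy labels, and the choice to treat edges as undirected are all assumptions.

```python
from collections import Counter


def misclassification_network(pairs):
    """Count misclassified publications between each pair of disciplines.

    `pairs` holds (journal_category, predicted_category) tuples, one per
    publication. Correctly classified publications (both labels equal) are
    skipped. Edges are treated as undirected here, which is an assumption.
    """
    edges = Counter()
    for journal_cat, predicted_cat in pairs:
        if journal_cat != predicted_cat:
            # Sort the pair so (A, B) and (B, A) accumulate on the same edge
            edges[tuple(sorted((journal_cat, predicted_cat)))] += 1
    return edges


# Hypothetical toy labels, not data from the study
pairs = [
    ("Physics", "Physics"),
    ("Physics", "Mathematics"),
    ("Mathematics", "Physics"),
    ("Biology", "Chemistry"),
    ("Chemistry", "Chemistry"),
]
network = misclassification_network(pairs)
print(network)
```

A directed variant would drop the `sorted` call and keep (journal category, predicted category) as the edge key.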
Affiliation(s)
- Alexey Lyutov
- School of Business, Social and Decision Science, Constructor University, 28759, Bremen, Germany
- Yilmaz Uygun
- School of Business, Social and Decision Science, Constructor University, 28759, Bremen, Germany
4. Peng H, Qiu HS, Fosse HB, Uzzi B. Promotional language and the adoption of innovative ideas in science. Proc Natl Acad Sci U S A 2024; 121:e2320066121. [PMID: 38861605 PMCID: PMC11194578 DOI: 10.1073/pnas.2320066121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 05/01/2024] [Indexed: 06/13/2024]
Abstract
How are the merits of innovative ideas communicated in science? Here, we conduct semantic analyses of grant application success with a focus on scientific promotional language, which may help to convey an innovative idea's originality and significance. Our analysis attempts to surmount the limitations of prior grant studies by examining the full text of tens of thousands of both funded and unfunded grants from three leading public and private funding agencies: the NIH, the NSF, and the Novo Nordisk Foundation, one of the world's largest private science funding foundations. We find a robust association between promotional language and the support and adoption of innovative ideas by funders and other scientists. First, a grant proposal's percentage of promotional language is associated with up to a doubling of the grant's probability of being funded. Second, a grant's promotional language reflects its intrinsic innovativeness. Third, the percentage of promotional language is predictive of the expected citation and productivity impact of publications that are supported by funded grants. Finally, a computer-assisted experiment that manipulates the promotional language in our data demonstrates how promotional language can communicate the merit of ideas through cognitive activation. With the incidence of promotional language in science steeply rising, and the pivotal role of grants in converting promising and aspirational ideas into solutions, our analysis provides empirical evidence that promotional language is associated with effectively communicating the merits of innovative scientific ideas.
Affiliation(s)
- Hao Peng
- Department of Management & Organizations, Kellogg School of Management, Northwestern University, Evanston, IL 60208
- Northwestern Institute on Complex Systems, Evanston, IL 60208
- Huilian Sophie Qiu
- Department of Management & Organizations, Kellogg School of Management, Northwestern University, Evanston, IL 60208
- Northwestern Institute on Complex Systems, Evanston, IL 60208
- Brian Uzzi
- Department of Management & Organizations, Kellogg School of Management, Northwestern University, Evanston, IL 60208
- Northwestern Institute on Complex Systems, Evanston, IL 60208
5. Fan Y, Blok A, Lehmann S. Understanding scholar-trajectories across scientific periodicals. Sci Rep 2024; 14:5309. [PMID: 38438413 PMCID: PMC10912201 DOI: 10.1038/s41598-024-54693-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 02/15/2024] [Indexed: 03/06/2024]
Abstract
Despite the rapid growth in the number of scientific publications, our understanding of author publication trajectories remains limited. Here we propose an embedding-based framework for tracking author trajectories in a geometric space that leverages the information encoded in the publication sequences, namely the list of the consecutive publication venues for each scholar. Using the publication histories of approximately 30,000 social media researchers, we obtain a knowledge space that broadly captures essential information about periodicals as well as complex (inter-)disciplinary structures of science. Based on this space, we study academic success through the prism of movement across scientific periodicals. We use a measure from human mobility, the radius of gyration, to characterize individual scholars' trajectories. Results show that author mobility across periodicals negatively correlates with citations, suggesting that successful scholars tend to publish in a relatively proximal range of periodicals. Overall, our framework discovers intricate structures in large-scale sequential data and provides new ways to explore mobility and trajectory patterns.
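The radius of gyration is the root-mean-square distance of a trajectory's points from their centroid, r_g = sqrt((1/n) * sum_i ||x_i - x_cm||^2). The minimal sketch below makes two assumptions: each publication contributes one point (the embedding vector of its venue), and distances in the embedding space are Euclidean. The paper may weight points or measure distance differently, and the trajectory shown is hypothetical.

```python
import math


def radius_of_gyration(points):
    """Radius of gyration of a trajectory of equal-length coordinate tuples.

    r_g = sqrt((1/n) * sum_i ||x_i - x_cm||^2), with x_cm the mean position.
    Euclidean distance is an assumption of this sketch.
    """
    n = len(points)
    dim = len(points[0])
    center = [sum(p[k] for p in points) / n for k in range(dim)]
    squared = sum(
        sum((p[k] - center[k]) ** 2 for k in range(dim)) for p in points
    )
    return math.sqrt(squared / n)


# Hypothetical 2-D venue vectors for one scholar's consecutive publications
trajectory = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(radius_of_gyration(trajectory))  # sqrt(2), about 1.414
```

A scholar who keeps publishing in nearby venues gets a small r_g. Under the abstract's finding, that pattern goes with higher citations.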
Affiliation(s)
- Yangliu Fan
- Copenhagen Center for Social Data Science, University of Copenhagen, Copenhagen, Denmark
- Anders Blok
- Copenhagen Center for Social Data Science, University of Copenhagen, Copenhagen, Denmark
- Department of Sociology, University of Copenhagen, Copenhagen, Denmark
- Sune Lehmann
- Copenhagen Center for Social Data Science, University of Copenhagen, Copenhagen, Denmark
- DTU Compute, Technical University of Denmark, Lyngby, Denmark
6. Dubova M, Goldstone RL. Carving joints into nature: reengineering scientific concepts in light of concept-laden evidence. Trends Cogn Sci 2023; 27:656-670. [PMID: 37173157 DOI: 10.1016/j.tics.2023.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
A new wave of proposals suggests that scientists must reassess scientific concepts in light of accumulated evidence. However, reengineering scientific concepts in light of data is challenging because scientific concepts affect the evidence itself in multiple ways. Among other possible influences, concepts (i) prime scientists to overemphasize within-concept similarities and between-concept differences; (ii) lead scientists to measure conceptually relevant dimensions more accurately; (iii) serve as units of scientific experimentation, communication, and theory-building; and (iv) affect the phenomena themselves. When looking for improved ways to carve nature at its joints, scholars must take the concept-laden nature of evidence into account to avoid entering a vicious circle of concept-evidence mutual substantiation.
Affiliation(s)
- Marina Dubova
- Cognitive Science Program, Indiana University, 1101 E. 10th Street, Bloomington, IN 47405, USA
- Robert L Goldstone
- Cognitive Science Program, Indiana University, 1101 E. 10th Street, Bloomington, IN 47405, USA
- Department of Psychological and Brain Sciences, Indiana University, 1101 E. 10th Street, Bloomington, IN 47405, USA
7. Lin Z, Yin Y, Liu L, Wang D. SciSciNet: A large-scale open data lake for the science of science research. Sci Data 2023; 10:315. [PMID: 37264014 DOI: 10.1038/s41597-023-02198-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 05/02/2023] [Indexed: 06/03/2023]
Abstract
The science of science has attracted growing research interest, partly due to the increasing availability of large-scale datasets capturing the inner workings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and linkages across datasets. Here we present SciSciNet, a large-scale open data lake for the science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of pre-processing steps and analytical choices in constructing the data lake. We further supplement the data lake by computing frequently used measures in the literature, illustrating how researchers may contribute collectively to enriching the data lake. Overall, this data lake serves as an initial but useful resource for the field, by lowering the barrier to entry, reducing duplication of efforts in data processing and measurements, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.
Affiliation(s)
- Zihang Lin
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- School of Computer Science, Fudan University, Shanghai, China
- Yian Yin
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA
- Lu Liu
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Dashun Wang
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA
8. Liu L, Jones BF, Uzzi B, Wang D. Data, measurement and empirical methods in the science of science. Nat Hum Behav 2023:10.1038/s41562-023-01562-4. [PMID: 37264084 DOI: 10.1038/s41562-023-01562-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Accepted: 02/17/2023] [Indexed: 06/03/2023]
Abstract
The advent of large-scale datasets that trace the workings of science has encouraged researchers from many different disciplinary backgrounds to turn scientific methods into science itself, cultivating a rapidly expanding 'science of science'. This Review considers this growing, multidisciplinary literature through the lens of data, measurement and empirical methods. We discuss the purposes, strengths and limitations of major empirical approaches, seeking to increase understanding of the field's diverse methodologies and expand researchers' toolkits. Overall, new empirical developments provide enormous capacity to test traditional beliefs and conceptual frameworks about science, discover factors associated with scientific productivity, predict scientific outcomes and design policies that facilitate scientific progress.
Affiliation(s)
- Lu Liu
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA
- Benjamin F Jones
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- National Bureau of Economic Research, Cambridge, MA, USA
- Brookings Institution, Washington, DC, USA
- Brian Uzzi
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Dashun Wang
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA
9. Liu F, Rahwan T, AlShebli B. Non-White scientists appear on fewer editorial boards, spend more time under review, and receive fewer citations. Proc Natl Acad Sci U S A 2023; 120:e2215324120. [PMID: 36940343 PMCID: PMC10068789 DOI: 10.1073/pnas.2215324120] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 09/07/2022] [Accepted: 01/28/2023] [Indexed: 03/22/2023]
Abstract
Disparities continue to pose major challenges in various aspects of science. One such aspect is editorial board composition, which has been shown to exhibit racial and geographical disparities. However, the literature on this subject lacks longitudinal studies quantifying the degree to which the racial composition of editors reflects that of scientists. Other aspects that may exhibit racial disparities include the time spent between the submission and acceptance of a manuscript and the number of citations a paper receives relative to textually similar papers, but these have not been studied to date. To fill this gap, we compile a dataset of 1,000,000 papers published between 2001 and 2020 by six publishers, while identifying the handling editor of each paper. Using this dataset, we show that most countries in Asia, Africa, and South America (where the majority of the population is ethnically non-White) have fewer editors than would be expected based on their share of authorship. Focusing on US-based scientists reveals Black as the most underrepresented race. In terms of acceptance delay, we find, again, that papers from Asia, Africa, and South America spend more time compared to other papers published in the same journal and the same year. Regression analysis of US-based papers reveals that Black authors suffer from the greatest delay. Finally, by analyzing citation rates of US-based papers, we find that Black and Hispanic scientists receive significantly fewer citations compared to White ones doing similar research. Taken together, these findings highlight significant challenges facing non-White scientists.
Affiliation(s)
- Fengyuan Liu
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi 129188, UAE
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- Talal Rahwan
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi 129188, UAE
- Bedoor AlShebli
- Social Science Division, New York University Abu Dhabi, Abu Dhabi 129188, UAE
10. Liu F, Holme P, Chiesa M, AlShebli B, Rahwan T. Gender inequality and self-publication are common among academic editors. Nat Hum Behav 2023; 7:353-364. [PMID: 36646836 PMCID: PMC10038799 DOI: 10.1038/s41562-022-01498-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 08/04/2021] [Accepted: 11/10/2022] [Indexed: 01/18/2023]
Abstract
Scientific editors shape the content of academic journals and set standards for their fields. Yet, the degree to which the gender makeup of editors reflects that of scientists, and the rate at which editors publish in their own journals, are not entirely understood. Here, we use algorithmic tools to infer the gender of 81,000 editors serving more than 1,000 journals and 15 disciplines over five decades. Only 26% of authors in our dataset are women, and we find even fewer women among editors (14%) and editors-in-chief (8%). Career length explains the gender gap among editors, but not editors-in-chief. Moreover, by analysing the publication records of 20,000 editors, we find that 12% publish at least one-fifth, and 6% publish at least one-third, of their papers in the journal they edit. Editors-in-chief tend to self-publish at a higher rate. Finally, compared with women, men have a higher increase in the rate at which they publish in a journal soon after becoming its editor.
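The self-publication thresholds in this abstract ("at least one-fifth" and "at least one-third" of an editor's papers) come down to a per-editor share compared against a cutoff. The sketch below assumes each editor is represented by the venues of their papers plus the journal they edit. The data and function names are hypothetical, and exact fractions avoid floating-point rounding exactly at the threshold.

```python
from fractions import Fraction


def self_publication_share(venues, edited_journal):
    """Fraction of an editor's papers published in the journal they edit."""
    in_own_journal = sum(1 for venue in venues if venue == edited_journal)
    return Fraction(in_own_journal, len(venues))


def share_of_editors_at_or_above(editors, threshold):
    """Fraction of editors whose self-publication share is >= threshold.

    editors: list of (venues, edited_journal) pairs; each venues list is non-empty.
    """
    hits = sum(
        1 for venues, edited in editors
        if self_publication_share(venues, edited) >= threshold
    )
    return Fraction(hits, len(editors))


# Hypothetical editors: (venues of their papers, journal they edit)
editors = [
    (["J1", "X", "Y", "Z", "W"], "J1"),  # 1/5 in own journal
    (["J2", "J2", "X"], "J2"),           # 2/3
    (["X", "Y", "Z", "W"], "J3"),        # 0
    (["J4", "X", "Y"], "J4"),            # 1/3
]

print(share_of_editors_at_or_above(editors, Fraction(1, 5)))  # 3/4
print(share_of_editors_at_or_above(editors, Fraction(1, 3)))  # 1/2
```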
Affiliation(s)
- Fengyuan Liu
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Petter Holme
- Department of Computer Science, Aalto University, Espoo, Finland
- Center for Computational Social Science, Kobe University, Kobe, Japan
- Matteo Chiesa
- Laboratory for Energy and Nano Science, Khalifa University of Science and Technology, Abu Dhabi, UAE
- Department of Physics and Technology, UiT - The Arctic University of Norway, Tromsø, Norway
- Bedoor AlShebli
- Social Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Talal Rahwan
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
11. Deep representation learning of scientific paper reveals its potential scholarly impact. J Informetr 2023. [DOI: 10.1016/j.joi.2023.101376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
12. AlShebli B, Cheng E, Waniek M, Jagannathan R, Hernández-Lagos P, Rahwan T. Beijing's central role in global artificial intelligence research. Sci Rep 2022; 12:21461. [PMID: 36509790 PMCID: PMC9744801 DOI: 10.1038/s41598-022-25714-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 12/05/2022] [Indexed: 12/14/2022]
Abstract
Nations worldwide are mobilizing to harness the power of Artificial Intelligence (AI) given its massive potential to shape global competitiveness over the coming decades. Using a dataset of 2.2 million AI papers, we study inter-city citations, collaborations, and talent migrations to uncover dependencies between Eastern and Western cities worldwide. Beijing emerges as a clear outlier, as it has been the most impactful city since 2007, the most productive since 2002, and the one housing the largest number of AI scientists since 1995. Our analysis also reveals that Western cities cite each other far more frequently than expected by chance, East-East collaborations are far more common than East-West or West-West collaborations, and migration of AI scientists mostly takes place from one Eastern city to another. We then propose a measure that quantifies each city's role in bridging East and West. Beijing's role surpasses that of all other cities combined, making it the central gateway through which knowledge and talent flow from one side to the other. We also track the center of mass of AI research by weighing each city's geographic location by its impact, productivity, and AI workforce. The center of mass has moved thousands of kilometers eastward over the past three decades, with Beijing's pull increasing each year. These findings highlight the eastward shift in the tides of global AI research, and the growing role of the Chinese capital as a hub connecting researchers across the globe.
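The abstract does not say exactly how the center of mass is computed. One reasonable sketch represents each city as a (latitude, longitude, weight) triple, where the weight stands in for impact, productivity, or AI workforce. It averages the weighted 3-D unit vectors and converts the mean back to coordinates, which avoids the wrap-around error of averaging raw longitudes. The function and inputs are hypothetical and are not the authors' method.

```python
import math


def weighted_center_of_mass(cities):
    """Weighted geographic center of (lat_deg, lon_deg, weight) triples.

    Each location is mapped to a 3-D unit vector, the vectors are averaged with
    the given weights, and the mean is mapped back to latitude and longitude.
    Assumes positive weights whose vectors do not cancel to the Earth's center.
    """
    x = y = z = total = 0.0
    for lat_deg, lon_deg, weight in cities:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += weight * math.cos(lat) * math.cos(lon)
        y += weight * math.cos(lat) * math.sin(lon)
        z += weight * math.sin(lat)
        total += weight
    x, y, z = x / total, y / total, z / total
    center_lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    center_lon = math.degrees(math.atan2(y, x))
    return center_lat, center_lon


# Hypothetical cities on the equator; giving the eastern one more weight
# pulls the center east of the unweighted midpoint (longitude 45)
print(weighted_center_of_mass([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)]))
print(weighted_center_of_mass([(0.0, 0.0, 1.0), (0.0, 90.0, 3.0)]))
```

Tracking this point year by year, with weights updated annually, gives the kind of eastward drift the abstract describes.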
Affiliation(s)
- Bedoor AlShebli
- Social Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Enshu Cheng
- Social Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Marcin Waniek
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Ramesh Jagannathan
- Engineering Division, New York University Abu Dhabi, Abu Dhabi, UAE
- Pablo Hernández-Lagos
- Sy Syms School of Business, Yeshiva University, New York, USA
- Talal Rahwan
- Computer Science, Science Division, New York University Abu Dhabi, Abu Dhabi, UAE
13. Ke Q, Liang L, Ding Y, David SV, Acuna DE. A dataset of mentorship in bioscience with semantic and demographic estimations. Sci Data 2022; 9:467. [PMID: 35918351 PMCID: PMC9345966 DOI: 10.1038/s41597-022-01578-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 07/11/2022] [Indexed: 12/04/2022]
Abstract
Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Typically, researchers who study mentorship use article co-authorship and doctoral dissertation datasets. However, available datasets of this type focus on narrow selections of fields and miss out on early career and non-publication-related interactions. Here, we describe Mentorship, a crowdsourced dataset of 743,176 mentorship relationships among 738,989 scientists primarily in biosciences that avoids these shortcomings. Our dataset enriches the Academic Family Tree project by adding publication data from the Microsoft Academic Graph and "semantic" representations of research using deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimations of these factors. We perform extensive validations of the profile-publication matching, semantic content, and demographic inferences, which mostly cover neuroscience and biomedical sciences. We anticipate this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists' career outcomes.
Affiliation(s)
- Qing Ke
- School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Lizhen Liang
- School of Information Studies, Syracuse University, Syracuse, New York, 13244, USA
- Ying Ding
- School of Information, University of Texas at Austin, Austin, Texas, 78712, USA
- Stephen V David
- Oregon Hearing Research Center, Oregon Health and Science University, Portland, Oregon, 97239, USA
- Daniel E Acuna
- School of Information Studies, Syracuse University, Syracuse, New York, 13244, USA
14. Metrics and mechanisms: Measuring the unmeasurable in the science of science. J Informetr 2022. [DOI: 10.1016/j.joi.2022.101290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
15. Nelson AP, Gray RJ, Ruffle JK, Watkins HC, Herron D, Sorros N, Mikhailov D, Cardoso MJ, Ourselin S, McNally N, Williams B, Rees GE, Nachev P. Deep forecasting of translational impact in medical research. Patterns 2022; 3:100483. [PMID: 35607619 PMCID: PMC9122964 DOI: 10.1016/j.patter.2022.100483] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/07/2021] [Revised: 01/10/2022] [Accepted: 03/04/2022] [Indexed: 11/26/2022]
16. Yun J. Generalization of bibliographic coupling and co-citation using the node split network. J Informetr 2022. [DOI: 10.1016/j.joi.2022.101291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
17. Liu H, Zhang L, Wang W, Huang Y, Li S, Ren Z, Zhou Z. Prediction of Online Psychological Help-Seeking Behavior During the COVID-19 Pandemic: An Interpretable Machine Learning Method. Front Public Health 2022; 10:814366. [PMID: 35309216 PMCID: PMC8929708 DOI: 10.3389/fpubh.2022.814366] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/13/2021] [Accepted: 01/17/2022] [Indexed: 12/02/2022]
Abstract
Online mental health services (OMHS) have been named the best psychological assistance measure during the COVID-19 pandemic. Interpretable, accurate, and early prediction of OMHS demand is crucial for local governments and organizations that must allocate mental health resources and make decisions about them. The present study aimed to investigate the influence of the COVID-19 pandemic on online psychological help-seeking (OPHS) behavior in OMHS, and then to propose a machine learning model that predicts and interprets the OPHS number in advance. The data were crawled from two Chinese OMHS platforms. Linguistic inquiry and word count (LIWC), neural embedding-based topic modeling, and time series analysis were used to build time series feature sets with lags of one, three, seven, and 14 days. Correlation analysis was used to examine the impact of COVID-19 on OPHS behavior across different OMHS platforms. Machine learning algorithms and Shapley additive explanations (SHAP) were used to build and interpret the predictions. The results showed that the massive growth of OPHS behavior during the COVID-19 pandemic was a common phenomenon. The predictive model based on random forest (RF) with feature sets containing temporal features of the OPHS number, mental health topics, LIWC, and COVID-19 cases achieved the best performance. Temporal features of the OPHS number had the largest positive and negative predictive power. The topic features improved prediction performance incrementally across the different lags and were more suitable for OPHS prediction than the LIWC features. The interpretable model showed that the increase in OPHS behavior was influenced by cumulative confirmed cases and cumulative deaths, whereas it was not sensitive to new confirmed cases or new deaths. The present study was the first to predict the demand for OMHS using machine learning during the COVID-19 pandemic.
This study proposes an interpretable machine learning method that can facilitate quick, early, and interpretable prediction of OPHS behavior and support operational decision-making; it also demonstrates the value of OMHS platforms as an always-on data source for obtaining a high-resolution timeline and real-time prediction of the online public's psychological response.
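The abstract mentions feature sets built with lags of one, three, seven, and 14 days. As a minimal sketch, assuming each lag defines a separate dataset in which the value observed `lag` days earlier is the feature and the current value is the target, the pairing step might look like this. The function name `make_lagged_pairs` and the daily counts are hypothetical illustrations, not the authors' pipeline or data, and the LIWC, topic, and COVID-19 case features are omitted.

```python
from typing import Dict, List, Sequence, Tuple


def make_lagged_pairs(series: Sequence[float], lag: int) -> List[Tuple[float, float]]:
    """Pair each day's value with the value observed `lag` days earlier.

    The earlier value is the feature and the later value is the target,
    so a model trained on these pairs predicts `lag` days in advance.
    """
    if lag < 1:
        raise ValueError("lag must be a positive number of days")
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


# Hypothetical daily help-seeking counts (illustrative numbers, not study data).
daily_counts = [12, 15, 14, 20, 26, 31, 29, 35, 40, 38, 44, 50, 47, 55, 61, 58]

# One dataset per lag, mirroring the 1-, 3-, 7-, and 14-day lags in the abstract.
lagged_sets: Dict[int, List[Tuple[float, float]]] = {
    lag: make_lagged_pairs(daily_counts, lag) for lag in (1, 3, 7, 14)
}

for lag, pairs in lagged_sets.items():
    print(lag, len(pairs), pairs[0])
```

Longer lags leave fewer usable training pairs from the same series, which is one practical cost of predicting further in advance.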
Affiliation(s)
- Hui Liu
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Lin Zhang
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Weijun Wang
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Yinghui Huang
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Shen Li
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Zhihong Ren
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Zongkui Zhou
- Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
18
Zhan L, Jia T. CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling. Entropy (Basel) 2022; 24:276. [PMID: 35205570 PMCID: PMC8870891 DOI: 10.3390/e24020276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Revised: 02/07/2022] [Accepted: 02/11/2022] [Indexed: 11/16/2022]
Abstract
Heterogeneous information network (HIN) embedding is an important tool for tasks such as node classification, community detection, and recommendation. It aims to find node representations that preserve the proximity between entities of different types. A widely adopted family of approaches applies random walks to generate sequences of heterogeneous contexts, from which the embedding is learned. However, because of the multipartite graph structure of HINs, hub nodes tend to be over-represented relative to their context in the sampled sequences, giving rise to imbalanced samples of the network. Here, we propose a new embedding method: CoarSAS2hvec. Self-avoiding short sequence sampling combined with an HIN coarsening procedure (CoarSAS) is used to better capture the rich information in the HIN. An optimized loss function is used to improve the performance of the HIN structure embedding. CoarSAS2hvec outperforms nine other methods in node classification and community detection on four real-world data sets. Using entropy as a measure of the amount of information, we confirm that CoarSAS captures richer information about the network than other sampling methods do. Hence, even the traditional loss function, when applied to samples generated by CoarSAS, can yield improved results. Our work addresses a previously underemphasized limitation of random-walk-based HIN embedding and may shed light on a range of problems in HIN analysis.
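The over-representation of hubs, and the use of entropy to measure how much of the network a sample covers, can be illustrated on a toy graph. The sketch below is not CoarSAS2hvec: it omits the coarsening procedure and the optimized loss, and the graph, function names, and walk lengths are hypothetical. It only contrasts an ordinary random walk with a short self-avoiding walk, which by construction contains any node (including a hub) at most once, and computes the Shannon entropy of node frequencies pooled over the samples.

```python
import math
import random
from collections import Counter
from typing import Dict, List

# Adjacency list for an undirected heterogeneous graph (node type encoded in the name).
Graph = Dict[str, List[str]]


def plain_random_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """Ordinary random walk: nodes may be revisited, so hubs can recur many times."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def self_avoiding_walk(graph: Graph, start: str, max_len: int, rng: random.Random) -> List[str]:
    """Short walk that never revisits a node; stops at max_len or when stuck."""
    walk = [start]
    visited = {start}
    while len(walk) < max_len:
        candidates = [n for n in graph[walk[-1]] if n not in visited]
        if not candidates:
            break
        nxt = rng.choice(candidates)
        walk.append(nxt)
        visited.add(nxt)
    return walk


def sample_entropy(walks: List[List[str]]) -> float:
    """Shannon entropy (bits) of node frequencies pooled over all sampled sequences."""
    counts = Counter(node for walk in walks for node in walk)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical author-paper graph: hub author A0 wrote every paper P1..P4,
# and each paper also has one other author A1..A4.
graph: Graph = {"A0": ["P1", "P2", "P3", "P4"]}
for i in range(1, 5):
    graph[f"P{i}"] = ["A0", f"A{i}"]
    graph[f"A{i}"] = [f"P{i}"]

rng = random.Random(0)
plain = [plain_random_walk(graph, node, 8, rng) for node in graph]
avoiding = [self_avoiding_walk(graph, node, 4, rng) for node in graph]

for name, walks in (("plain", plain), ("self-avoiding", avoiding)):
    hub_share = sum(w.count("A0") for w in walks) / sum(len(w) for w in walks)
    print(f"{name}: hub share {hub_share:.2f}, entropy {sample_entropy(walks):.2f} bits")
```

A more even spread of node frequencies gives higher entropy, which is the sense in which a balanced sample is said to carry more information about the network.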
Affiliation(s)
- Tao Jia
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
19
20
Liu L, Dehmamy N, Chown J, Giles CL, Wang D. Understanding the onset of hot streaks across artistic, cultural, and scientific careers. Nat Commun 2021; 12:5392. [PMID: 34518529 PMCID: PMC8438033 DOI: 10.1038/s41467-021-25477-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/22/2021] [Accepted: 08/04/2021] [Indexed: 11/09/2022]
Abstract
Across a range of creative domains, individual careers are characterized by hot streaks, which are bursts of high-impact works clustered together in close succession. Yet it remains unclear whether there are any regularities underlying the beginning of hot streaks. Here, we analyze the career histories of artists, film directors, and scientists, and develop deep learning and network science methods to build high-dimensional representations of their creative outputs. We find that across all three domains, individuals tend to explore diverse styles or topics before their hot streak but become notably more focused after the hot streak begins. Crucially, hot streaks appear to be associated with neither exploration nor exploitation behavior in isolation, but with a particular sequence of exploration followed by exploitation, where the transition from exploration to exploitation closely traces the onset of a hot streak. Overall, these results may have implications for identifying and nurturing talent across a wide range of creative domains.
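The exploration-then-exploitation pattern can be made concrete once each work has a vector representation. The abstract does not specify the diversity measure here, so the sketch below assumes one: the mean pairwise cosine distance among works in a window before and after a known onset. The 2-D vectors stand in for the learned high-dimensional representations, and the function names and onset index are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def diversity(works: List[Vector]) -> float:
    """Mean pairwise cosine distance among works; higher suggests more exploration."""
    pairs = list(combinations(works, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def diversity_around_onset(career: List[Vector], onset: int, window: int) -> Tuple[float, float]:
    """Diversity of the `window` works before `onset` and the `window` works from `onset` on."""
    before = career[max(0, onset - window):onset]
    after = career[onset:onset + window]
    return diversity(before), diversity(after)


# Toy 2-D "embeddings" for one career in chronological order (illustrative only).
career = [
    [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0],  # varied directions: exploration
    [2.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0],    # one direction: exploitation
]
before, after = diversity_around_onset(career, onset=4, window=4)
print(f"before onset: {before:.3f}, after onset: {after:.3f}")
# before onset: 1.333, after onset: 0.000
```

Cosine distance ignores vector length, so works pointing in the same direction count as maximally focused regardless of magnitude; other distance choices would change the numbers but not the idea of comparing the two windows.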
Affiliation(s)
- Lu Liu
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA
- Nima Dehmamy
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Jillian Chown
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- C Lee Giles
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Dashun Wang
- Center for Science of Science and Innovation, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Kellogg School of Management, Northwestern University, Evanston, IL, USA
- McCormick School of Engineering, Northwestern University, Evanston, IL, USA