1
Lan M, Cheng M, Hoang L, Ter Riet G, Kilicoglu H. Automatic categorization of self-acknowledged limitations in randomized controlled trial publications. J Biomed Inform 2024; 152:104628. [PMID: 38548008] [DOI: 10.1016/j.jbi.2024.104628]
Abstract
OBJECTIVE Acknowledging study limitations in a scientific publication is a crucial element in scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. METHODS We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize the limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at larger scale. RESULTS Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8) with statistical significance (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, with statistical significance (p<.001). CONCLUSION The model could support automated screening tools which can be used by journals to draw the authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.
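The Easy Data Augmentation (EDA) operations the abstract mentions (synonym replacement, random swap, random deletion) can be sketched in a few lines. This is an illustrative toy version with a hand-made synonym table, not the authors' implementation; a real EDA setup typically draws synonyms from WordNet:

```python
import random

# Toy synonym table; assumed for illustration only.
SYNONYMS = {"small": ["limited"], "sample": ["cohort"], "study": ["trial"]}

def eda_augment(sentence, p_swap=0.1, p_delete=0.1, seed=0):
    """Apply simplified EDA operations to one sentence."""
    rng = random.Random(seed)
    words = sentence.split()
    # Synonym replacement: substitute a synonym where one is known.
    words = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]
    # Random swap: exchange two positions with probability p_swap.
    if len(words) > 1 and rng.random() < p_swap:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    # Random deletion: drop words with probability p_delete, keep at least one.
    kept = [w for w in words if rng.random() >= p_delete] or [words[0]]
    return " ".join(kept)

print(eda_augment("the small sample size limits this study"))
```

Each augmented variant is then added to the training set alongside the original sentence, which is how EDA inflates a small annotated corpus before fine-tuning.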
Affiliation(s)
- Mengfei Lan
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Mandy Cheng
- Department of Biological Sciences, Binghamton University, 4400 Vestal Parkway East, Binghamton, NY 13902, USA
- Linh Hoang
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Gerben Ter Riet
- Faculty of Health, Amsterdam University of Applied Sciences, Tafelbergweg 51, Amsterdam, 1105 BD, The Netherlands
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
2
Borg DN, Impellizzeri FM, Borg SJ, Hutchins KP, Stewart IB, Jones T, Baguley BJ, Orssatto LBR, Bach AJE, Osborne JO, McMaster BS, Buhmann RL, Bon JJ, Barnett AG. Meta-analysis prediction intervals are under reported in sport and exercise medicine. Scand J Med Sci Sports 2024; 34:e14603. [PMID: 38501202] [DOI: 10.1111/sms.14603]
Abstract
AIM Prediction intervals are a useful measure of uncertainty for meta-analyses that capture the likely effect size of a new (similar) study based on the included studies. In comparison, confidence intervals reflect the uncertainty around the point estimate but provide an incomplete summary of the underlying heterogeneity in the meta-analysis. This study aimed to estimate (i) the proportion of meta-analysis studies that report a prediction interval in sports medicine; and (ii) the proportion of studies with a discrepancy between the reported confidence interval and a calculated prediction interval. METHODS We screened, at random, 1500 meta-analysis studies published between 2012 and 2022 in highly ranked sports medicine and medical journals. Articles that used a random effect meta-analysis model were included in the study. We randomly selected one meta-analysis from each article to extract data from, which included the number of estimates, the pooled effect, and the confidence and prediction interval. RESULTS Of the 1500 articles screened, 866 (514 from sports medicine) used a random effect model. The probability of a prediction interval being reported in sports medicine was 1.7% (95% CI = 0.9%, 3.3%). In medicine the probability was 3.9% (95% CI = 2.4%, 6.6%). A prediction interval could be calculated for 220 sports medicine studies. For 60% of these studies, there was a discrepancy in study findings between the reported confidence interval and the calculated prediction interval. Prediction intervals were 3.4 times wider than confidence intervals. CONCLUSION Very few meta-analyses report prediction intervals and hence are prone to missing the impact of between-study heterogeneity on the overall conclusions. The widespread misinterpretation of random effect meta-analyses could mean that potentially harmful treatments, or those lacking a sufficient evidence base, are being used in practice. Authors, reviewers, and editors should be aware of the importance of prediction intervals.
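The gap the abstract describes can be made concrete: under a random-effects model, an approximate 95% prediction interval for a new study adds the between-study variance τ² to the standard error of the pooled estimate, so it is always at least as wide as the confidence interval. The sketch below uses made-up numbers and a normal quantile in place of the t quantile (with k−2 degrees of freedom) that the usual formula calls for, so it is illustrative only:

```python
import math

def intervals(mu_hat, se_mu, tau2, z=1.96):
    """Approximate 95% confidence and prediction intervals for a
    random-effects meta-analysis. Normal quantile used for simplicity;
    the standard formula uses a t quantile with k-2 degrees of freedom."""
    ci = (mu_hat - z * se_mu, mu_hat + z * se_mu)
    pi_half = z * math.sqrt(tau2 + se_mu ** 2)
    pi = (mu_hat - pi_half, mu_hat + pi_half)
    return ci, pi

# Hypothetical pooled effect with moderate between-study heterogeneity.
ci, pi = intervals(mu_hat=0.30, se_mu=0.08, tau2=0.04)
print(ci, pi)  # the prediction interval is wider whenever tau^2 > 0
```

With τ² = 0.04 the prediction interval is several times wider than the confidence interval, which is exactly the kind of discrepancy the study found in 60% of the sports medicine meta-analyses it checked.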
Affiliation(s)
- David N Borg
- Australian Centre for Health Services Innovation (AusHSI), School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- Franco M Impellizzeri
- School of Sport, Exercise and Rehabilitation, University of Technology Sydney, Sydney, New South Wales, Australia
- Samantha J Borg
- Australian Centre for Health Services Innovation (AusHSI), School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
- Kate P Hutchins
- School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Ian B Stewart
- School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Tamara Jones
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Brenton J Baguley
- Institute for Physical Activity and Nutrition (IPAN), School of Exercise and Nutrition Sciences, Deakin University, Burwood, Victoria, Australia
- Lucas B R Orssatto
- School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Institute for Physical Activity and Nutrition (IPAN), School of Exercise and Nutrition Sciences, Deakin University, Burwood, Victoria, Australia
- Aaron J E Bach
- School of Health Sciences and Social Work, Griffith University, Gold Coast, Queensland, Australia
- Cities Research Institute, Griffith University, Gold Coast, Queensland, Australia
- John O Osborne
- School of Sport Sciences, UiT The Arctic University of Norway, Tromsø, Norway
- Benjamin S McMaster
- School of Exercise and Nutrition Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Robert L Buhmann
- School of Health, University of Sunshine Coast, Sippy Downs, Queensland, Australia
- Joshua J Bon
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Adrian G Barnett
- Australian Centre for Health Services Innovation (AusHSI), School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
3
Yoshizawa G, Shinomiya N, Kawamoto S, Kawahara N, Kiga D, Hanaki KI, Minari J. Limiting open science? Three approaches to bottom-up governance of dual-use research of concern. Pathog Glob Health 2023:1-10. [PMID: 37791645] [DOI: 10.1080/20477724.2023.2265626]
Abstract
Governing dual-use research of concern (DURC) in the life sciences has become difficult owing to the diversification of scientific domains, digitalization of potential threats, and the proliferation of actors. This paper proposes three approaches to realize bottom-up governance of DURC from laboratory operation to institutional decision-making levels. First, a technological approach can predict and monitor the dual-use nature of the research target pathogens and their information. Second, an interactive approach is proposed in which diverse stakeholders proactively discuss and examine dual-use issues through research practice. Third, a personnel approach can identify the right persons involved in DURC. These approaches suggest that, going beyond self-governance by researchers, collaborative and networked governance involving diverse actors should become essential. This mode of governance can also be seen in light of the management of research use. Therefore, program design by funding agencies and publication screening by journal publishers continuously contribute to governance at the meso-level. Bottom-up governance may be realized by using an appropriately integrated design of these three approaches at the micro-level, such as dual-use prediction and monitoring, stakeholder dialogue, and background checks. Given that the term 'open science' has been promoted to the research community as part of top-down governance, paying due attention on site to research subjects, research practices, and persons involved in research will provide an opportunity to develop a more socially conscious open science.
Affiliation(s)
- Go Yoshizawa
- Innovation System Research Center, Kwansei Gakuin University, Hyogo, Japan
- Shishin Kawamoto
- Graduate School of Science, Hokkaido University, Hokkaido, Japan
- Naoto Kawahara
- Center for Clinical and Translational Research, Kyushu University Hospital, Fukuoka, Japan
- Daisuke Kiga
- Center for Advanced Biomedical Sciences, School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Ken-Ichi Hanaki
- Management Department of Biosafety, Laboratory Animal, and Pathogen Bank, National Institute of Infectious Diseases, Tokyo, Japan
- Jusaku Minari
- Uehiro Research Division for iPS Cell Ethics, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
4
Thibault RT, Amaral OB, Argolo F, Bandrowski AE, Davidson AR, Drude NI. Open Science 2.0: Towards a truly collaborative research ecosystem. PLoS Biol 2023; 21:e3002362. [PMID: 37856538] [PMCID: PMC10617723] [DOI: 10.1371/journal.pbio.3002362]
Abstract
Conversations about open science have reached the mainstream, yet many open science practices such as data sharing remain uncommon. Our efforts towards openness therefore need to increase in scale and aim for a more ambitious target. We need an ecosystem not only where research outputs are openly shared but also in which transparency permeates the research process from the start and lends itself to more rigorous and collaborative research. To support this vision, this Essay provides an overview of a selection of open science initiatives from the past 2 decades, focusing on methods transparency, scholarly communication, team science, and research culture, and speculates about what the future of open science could look like. It then draws on these examples to provide recommendations for how funders, institutions, journals, regulators, and other stakeholders can create an environment that is ripe for improvement.
Affiliation(s)
- Robert T. Thibault
- Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America
- Olavo B. Amaral
- Institute of Medical Biochemistry Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Anita E. Bandrowski
- FAIR Data Informatics Lab, Department of Neuroscience, UCSD, San Diego, California, United States of America
- SciCrunch Inc., San Diego, California, United States of America
- Alexandra R. Davidson
- Institute for Evidence-Based Health Care, Bond University, Robina, Australia
- Faculty of Health Science and Medicine, Bond University, Robina, Australia
- Natascha I. Drude
- Berlin Institute of Health (BIH) at Charité, BIH QUEST Center for Responsible Research, Berlin, Germany
5
Kamel SA, El-Sobky TA. Reporting quality of abstracts and inconsistencies with full text articles in pediatric orthopedic publications. Res Integr Peer Rev 2023; 8:11. [PMID: 37608346] [PMCID: PMC10463470] [DOI: 10.1186/s41073-023-00135-3]
Abstract
BACKGROUND Abstracts should provide a brief yet comprehensive reporting of all components of a manuscript. Inaccurate reporting may mislead readers and impact citation practices. It was our goal to investigate the reporting quality of abstracts of interventional observational studies in three major pediatric orthopedic journals and to analyze any reporting inconsistencies between those abstracts and their corresponding full-text articles. METHODS We selected a sample of 55 abstracts and their full-text articles published between 2018 and 2022. Included articles were primary therapeutic research investigating the results of treatments or interventions. Abstracts were scrutinized for reporting quality and inconsistencies with their full-text versions with a 22-item checklist. The reporting quality of titles was assessed by a 3-item categorical scale. RESULTS In 48 (87%) of articles there were abstract reporting inaccuracies related to patient demographics. The study's follow-up and complications were not reported in 21 (38%) of abstracts each. Most common inconsistencies between the abstracts and full-text articles were related to reporting of inclusion or exclusion criteria in 39 (71%) and study correlations in 27 (49%) of articles. Reporting quality of the titles was insufficient in 33 (60%) of articles. CONCLUSIONS In our study we found low reporting quality of abstracts and noticeable inconsistencies with full-text articles, especially regarding inclusion or exclusion criteria and study correlations. While the current sample is likely not representative of overall pediatric orthopedic literature, we recommend that authors, reviewers, and editors ensure abstracts are reported accurately, ideally following the appropriate reporting guidelines, and that they double-check that there are no inconsistencies between abstracts and full-text articles. To capture essential study information, journals should also consider increasing abstract word limits.
Affiliation(s)
- Sherif Ahmed Kamel
- Department of Orthopedic Surgery, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- University Hospitals of Leicester NHS Trust, Leicester, UK
- Tamer A. El-Sobky
- Division of Pediatric Orthopedics, Department of Orthopedic Surgery, Faculty of Medicine, Ain Shams University, Cairo, Egypt
6
Alqahtani T, Badreldin HA, Alrashed M, Alshaya AI, Alghamdi SS, Bin Saleh K, Alowais SA, Alshaya OA, Rahman I, Al Yami MS, Albekairy AM. The emergent role of artificial intelligence, natural learning processing, and large language models in higher education and research. Res Social Adm Pharm 2023:S1551-7411(23)00280-2. [PMID: 37321925] [DOI: 10.1016/j.sapharm.2023.05.016]
Abstract
Artificial Intelligence (AI) has revolutionized various domains, including education and research. Natural language processing (NLP) techniques and large language models (LLMs) such as GPT-4 and BARD have significantly advanced our comprehension and application of AI in these fields. This paper provides an in-depth introduction to AI, NLP, and LLMs, discussing their potential impact on education and research. By exploring the advantages, challenges, and innovative applications of these technologies, this review gives educators, researchers, students, and readers a comprehensive view of how AI could shape educational and research practices in the future, ultimately leading to improved outcomes. Key applications discussed in the field of research include text generation, data analysis and interpretation, literature review, formatting and editing, and peer review. AI applications in academics and education include educational support and constructive feedback, assessment, grading, tailored curricula, personalized career guidance, and mental health support. Addressing the challenges associated with these technologies, such as ethical concerns and algorithmic biases, is essential for maximizing their potential to improve education and research outcomes. Ultimately, the paper aims to contribute to the ongoing discussion about the role of AI in education and research and highlight its potential to lead to better outcomes for students, educators, and researchers.
Affiliation(s)
- Tariq Alqahtani
- Department of Pharmaceutical Sciences, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia; King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
- Hisham A Badreldin
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Mohammed Alrashed
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Abdulrahman I Alshaya
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Sahar S Alghamdi
- Department of Pharmaceutical Sciences, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia; King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
- Khalid Bin Saleh
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Shuroug A Alowais
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Omar A Alshaya
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Ishrat Rahman
- Department of Basic Dental Sciences, College of Dentistry, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Majed S Al Yami
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
- Abdulkareem M Albekairy
- King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Department of Pharmacy Practice, College of Pharmacy, King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; Pharmaceutical Care Department, King Abdulaziz Medical City, National Guard Health Affairs, Riyadh, Saudi Arabia
7
Barnett A. Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials. F1000Res 2023; 11:783. [PMID: 37360941] [PMCID: PMC10285343] [DOI: 10.12688/f1000research.123002.2]
Abstract
Background: Papers describing the results of a randomised trial should include a baseline table that compares the characteristics of randomised groups. Researchers who fraudulently generate trials often unwittingly create baseline tables that are implausibly similar (under-dispersed) or have large differences between groups (over-dispersed). I aimed to create an automated algorithm to screen for under- and over-dispersion in the baseline tables of randomised trials. Methods: Using a cross-sectional study I examined 2,245 randomised controlled trials published in health and medical journals on PubMed Central. I estimated the probability that a trial's baseline summary statistics were under- or over-dispersed using a Bayesian model that examined the distribution of t-statistics for the between-group differences, and compared this with an expected distribution without dispersion. I used a simulation study to test the ability of the model to find under- or over-dispersion and compared its performance with an existing test of dispersion based on a uniform test of p-values. My model combined categorical and continuous summary statistics, whereas the uniform test used only continuous statistics. Results: The algorithm had a relatively good accuracy for extracting the data from baseline tables, matching well on the size of the tables and sample size. Using t-statistics in the Bayesian model out-performed the uniform test of p-values, which had many false positives for skewed, categorical and rounded data that were not under- or over-dispersed. For trials published on PubMed Central, some tables appeared under- or over-dispersed because they had an atypical presentation or had reporting errors. Some trials flagged as under-dispersed had groups with strikingly similar summary statistics. Conclusions: Automated screening for fraud of all submitted trials is challenging due to the widely varying presentation of baseline tables. The Bayesian model could be useful in targeted checks of suspected trials or authors.
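The core idea of the dispersion check can be illustrated with a toy example: compute a two-sample t-statistic for each baseline variable and ask whether the collection of t-statistics is suspiciously concentrated near zero (under-dispersion) or spread far into the tails (over-dispersion). This is a simplified sketch on simulated data, not the paper's Bayesian model:

```python
import random
import statistics

def t_statistic(a, b):
    """Welch two-sample t-statistic for one baseline variable."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def dispersion_ratio(t_stats):
    """Variance of the observed t-statistics: roughly 1 is expected under
    proper randomisation; well below 1 suggests under-dispersion and well
    above 1 suggests over-dispersion."""
    return statistics.variance(t_stats)

rng = random.Random(1)
# Simulate 50 baseline variables from a genuinely randomised trial
# (two arms of 100 participants each, no true between-group difference).
t_stats = []
for _ in range(50):
    a = [rng.gauss(0, 1) for _ in range(100)]
    b = [rng.gauss(0, 1) for _ in range(100)]
    t_stats.append(t_statistic(a, b))
print(round(dispersion_ratio(t_stats), 2))
```

A fabricated table whose groups were copied with tiny tweaks would push the ratio toward zero; the paper's model formalises this comparison against the expected t distribution within a Bayesian framework and also handles categorical summaries.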
Affiliation(s)
- Adrian Barnett
- Australian Centre for Health Services Innovation & Centre for Healthcare Transformation, Queensland University of Technology, Kelvin Grove, Queensland, 4059, Australia
8
Barnett A. Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials. F1000Res 2023; 11:783. [PMID: 37360941] [PMCID: PMC10285343] [DOI: 10.12688/f1000research.123002.1] (earlier version of the article in entry 7; abstract and affiliation identical)
9
Hosseini M, Horbach SPJM. Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Res Integr Peer Rev 2023; 8:4. [PMID: 37198671] [DOI: 10.1186/s41073-023-00133-5]
Abstract
BACKGROUND The emergence of systems based on large language models (LLMs) such as OpenAI's ChatGPT has created a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks including writing peer review reports could result in improved productivity. Given the significance of peer reviews in the existing scholarly publication landscape, exploring challenges and opportunities of using LLMs in peer review seems urgent. After the generation of the first scholarly outputs with LLMs, we anticipate that peer review reports too would be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks. METHODS To investigate the potential impact of using LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer. These include 1) reviewers' role, 2) editors' role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT's performance regarding identified issues. RESULTS LLMs have the potential to substantially alter the role of both peer reviewers and editors. Through supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher quality review and address issues of review shortage. However, the fundamental opacity of LLMs' training data, inner workings, data handling, and development processes raise concerns about potential biases, confidentiality and the reproducibility of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major enhancements in a short period and expect LLMs to continue developing. CONCLUSIONS We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While potentially beneficial to the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews and decision letters, reviewers and editors should disclose their use and accept full responsibility for data security and confidentiality, and their reports' accuracy, tone, reasoning and originality.
Affiliation(s)
- Mohammad Hosseini
- Feinberg School of Medicine, Northwestern University, 420 E. Superior Street, Chicago, IL, 60611, USA
- Serge P J M Horbach
- Danish Centre for Studies in Research and Research Policy, Aarhus University, Bartholins Alle 7, 8000 Aarhus C, Denmark
10
Khan KS. International multi-stakeholder consensus statement on clinical trial integrity. BJOG 2023. [PMID: 37161843] [DOI: 10.1111/1471-0528.17451]
Abstract
OBJECTIVE To prepare a set of statements for randomised clinical trial (RCT) integrity through an international multi-stakeholder consensus. METHODS The consensus was developed via: composition and engagement of a multi-country, multidisciplinary stakeholder group; evidence synthesis of 55 systematic reviews concerning RCT integrity; an anonymised two-round modified Delphi survey with a consensus threshold based on the average percentage of majority opinions; and a final consensus development meeting. Prospective registrations: https://osf.io/bhncy, https://osf.io/3ursn. RESULTS There were 30 stakeholders representing 15 countries from five continents, including triallists, ethicists, methodologists, statisticians, consumer representatives, industry representatives, systematic reviewers, funding body panel members, regulatory experts, authors, journal editors, peer reviewers and advisors for resolving integrity concerns. The Delphi survey response rate was 86.7% (26/30 stakeholders). There were 111 statements (73 stakeholder-provided, 46 systematic review-generated, 8 supported by both) in the initial long list, with eight additional statements provided during the consensus rounds. Through consensus, the final set was consolidated to 81 statements (49 stakeholder-provided, 41 systematic review-generated, 9 supported by both). The set of statements covered the entire RCT life cycle, including general aspects (n = 6), design and approval (n = 11), conduct and monitoring (n = 19), reporting of protocols and findings (n = 20), post-publication concerns (n = 12), and future research and development (n = 13). CONCLUSION Implementation of this multi-stakeholder consensus statement is expected to enhance RCT integrity.
11
Lauria M. Reviewing Peer Review: A Flawed System With Immense Potential. PUBLISHING RESEARCH QUARTERLY 2023. [DOI: 10.1007/s12109-023-09943-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/16/2023]
12
Abstract
Scholars need to be able to trust each other, because otherwise they cannot collaborate or use each other's findings. Similarly, trust is essential for research to be applied for the benefit of individuals, society or the natural environment. This trustworthiness is threatened when researchers engage in questionable research practices, or worse. By adopting open science practices, research becomes transparent and accountable. Only then is it possible to verify whether trust in research findings is justified. The magnitude of the issue is substantial, with a prevalence of four percent for both fabrication and falsification, and of more than 50% for questionable research practices. This implies that researchers regularly engage in behaviours that harm the validity and trustworthiness of their work. What is good for the quality and reliability of research is not always good for a scholarly career. How this dilemma is navigated depends on how virtuous the researcher at issue is, but also on the local research climate and on the perverse incentives in the way the research system functions. Research institutes, funding agencies and scholarly journals can do a lot to foster research integrity, first and foremost by improving the quality of peer review and by reforming researcher assessment.
Affiliation(s)
- Lex Bouter
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands; Department of Philosophy, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
13
14
The Postdigital-Biodigital Revolution. POSTDIGITAL SCIENCE AND EDUCATION 2022. [PMCID: PMC9483348 DOI: 10.1007/s42438-022-00338-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]