1. Lan M, Cheng M, Hoang L, Ter Riet G, Kilicoglu H. Automatic categorization of self-acknowledged limitations in randomized controlled trial publications. J Biomed Inform 2024;152:104628. PMID: 38548008. DOI: 10.1016/j.jbi.2024.104628.
Abstract
OBJECTIVE: Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications.
METHODS: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale.
RESULTS: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), with statistical significance (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuned with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also with statistical significance (p<.001).
CONCLUSION: The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate evidence from the clinical trial literature.
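As a concrete illustration of the data augmentation this abstract mentions, the sketch below implements two of the four Easy Data Augmentation (EDA) operations (random swap and random deletion) on a limitation sentence. This is a minimal, self-contained sketch, not the authors' code; the function name, parameters, and example sentence are all illustrative, and the other two EDA operations (synonym replacement, random insertion) would additionally need a synonym lexicon such as WordNet.

```python
import random

def eda_augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Generate augmented variants of a sentence using two EDA operations:
    random swap and random deletion. `alpha` controls how aggressively
    each variant is perturbed."""
    rng = random.Random(seed)
    words = sentence.split()
    n_ops = max(1, round(alpha * len(words)))
    variants = []
    for _ in range(n_aug):
        w = words[:]
        if rng.random() < 0.5:
            # Random swap: exchange n_ops randomly chosen word pairs.
            for _ in range(n_ops):
                i, j = rng.randrange(len(w)), rng.randrange(len(w))
                w[i], w[j] = w[j], w[i]
        else:
            # Random deletion: drop each word with probability alpha,
            # keeping at least one word.
            w = [t for t in w if rng.random() > alpha] or w[:1]
        variants.append(" ".join(w))
    return variants

sent = "The small sample size limits the generalizability of our findings"
for v in eda_augment(sent):
    print(v)
```

Each variant reuses only tokens from the original sentence, so the augmented examples stay on-topic while varying surface form, which is the point of EDA for small annotated corpora.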
Affiliation(s)
- Mengfei Lan
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Mandy Cheng
- Department of Biological Sciences, Binghamton University, 4400 Vestal Parkway East, Binghamton, NY 13902, USA
- Linh Hoang
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Gerben Ter Riet
- Faculty of Health, Amsterdam University of Applied Sciences, Tafelbergweg 51, 1105 BD Amsterdam, The Netherlands
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
2. Ott DE. Limitations in Medical Research: Recognition, Influence, and Warning. JSLS 2024;28:e2023.00049. PMID: 38405216. PMCID: PMC10882193. DOI: 10.4293/jsls.2023.00049.
Abstract
Background: As the number of limitations in a medical research article increases, their consequences multiply and the validity of the findings decreases. How often do limitations occur in a medical article? What are the implications of limitation interaction? How often are conclusions hedged in their explanation?
Objective: To identify the number, type, and frequency of limitations, and the words used to describe conclusion(s), in medical research articles.
Methods: Search, analysis, and evaluation of open-access research articles from 2021 and 2022 in the Journal of the Society of Laparoscopic and Robotic Surgeons and from 2022 in Surgical Endoscopy for the type(s) of limitation(s) admitted to by the author(s) and the number of times they occurred. Obvious limitations that the authors did not admit to were also identified. An automated text analysis was performed for hedging words in conclusion statements. A limitation index score is proposed to gauge the validity of statements and conclusions as the number of limitations increases.
Results: A total of 298 articles were reviewed and analyzed, yielding 1,764 limitations. Four articles had no limitations. The average number of limitations per article was between 3.7 and 6.9. Hedging, weasel words, and words of estimative probability were found in 95.6% of the conclusions.
Conclusions: Limitations and their number matter. The greater the number of limitations and the ramifications of their effects, the more outcomes and conclusions are affected. Wording ambiguity using hedging or weasel words shows that limitations affect the uncertainty of claims. The limitation index scoring method shows the diminished validity of finding(s) and conclusion(s).
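The automated text analysis for hedging described above can be approximated with a simple lexicon lookup over conclusion sentences. The cue list below is a small illustrative sample of hedging and estimative-probability words, not the lexicon used in the paper.

```python
# Toy hedging detector: flag conclusion sentences that contain a hedging
# cue or a "word of estimative probability". Illustrative cue list only.
HEDGES = {"may", "might", "could", "appears", "appear", "suggests",
          "suggest", "possibly", "likely", "probably", "seems", "perhaps"}

def is_hedged(sentence):
    # Normalize tokens by stripping punctuation and lowercasing,
    # then check for any overlap with the cue lexicon.
    tokens = {t.strip(".,;:()").lower() for t in sentence.split()}
    return bool(tokens & HEDGES)

conclusions = [
    "Our findings suggest that the intervention may reduce pain.",
    "The procedure is safe and effective.",
]
print([is_hedged(s) for s in conclusions])  # → [True, False]
```

A production version would need sentence segmentation, multi-word cues ("it is possible that"), and handling of negation, but the counting logic behind a "95.6% of conclusions hedged" figure reduces to exactly this kind of per-sentence lookup.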
3. Knez N, Kroflin K, Fraga GR. Publications on the diagnostic accuracy of dermatopathology tests: A cross-sectional quality analysis. J Cutan Pathol 2023;50:1020-1026. PMID: 37565501. DOI: 10.1111/cup.14504.
Abstract
BACKGROUND: Ancillary diagnostic tests are frequent in dermatopathology practice. Publications on their accuracy influence their utilization, yet the transparency and completeness of these publications are unknown.
METHODS: We performed a cross-sectional study of diagnostic accuracy studies in dermatopathology published between 2020 and 2022, assessing compliance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2).
RESULTS: On average, 14.67 ± 3.02 STARD items were reported in the 62 publications (range, 9.5-23.5 out of the recommended total of 30). More items were reported in high-impact-factor journals (16.01 vs. 13.32, p = 0.0002) and in journals that endorsed STARD in their author instructions (17.22 vs. 14.11, p = 0.0039). Less than 10% of publications reported quantifiable hypotheses, sample size calculations, flow diagrams, or study registrations. The risk of bias by our analysis of QUADAS-2 criteria was high or uncertain for index test interpretation (36/62, 58%) and patient selection (44/62, 71%).
CONCLUSIONS: Publications on dermatopathology tests are exploratory studies without prespecified hypotheses or study designs. They do not meet the criteria for transparent reporting. We suggest that medical journal leadership consider updating their instructions with more explicit guidance on recommended manuscript elements.
Affiliation(s)
- Nora Knez
- School of Medicine, University of Zagreb, Zagreb, Croatia
- Karla Kroflin
- School of Medicine, University of Zagreb, Zagreb, Croatia
- Garth R Fraga
- School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, USA
4. Schulz R, Barnett A, Bernard R, Brown NJL, Byrne JA, Eckmann P, Gazda MA, Kilicoglu H, Prager EM, Salholz-Hillel M, Ter Riet G, Vines T, Vorland CJ, Zhuang H, Bandrowski A, Weissgerber TL. Is the future of peer review automated? BMC Res Notes 2022;15:203. PMID: 35690782. PMCID: PMC9188010. DOI: 10.1186/s13104-022-06080-6.
Abstract
The rising rate of preprints and publications, combined with persistent inadequate reporting practices and problems with study design and execution, have strained the traditional peer review system. Automated screening tools could potentially enhance peer review by helping authors, journal editors, and reviewers to identify beneficial practices and common problems in preprints or submitted manuscripts. Tools can screen many papers quickly, and may be particularly helpful in assessing compliance with journal policies and with straightforward items in reporting guidelines. However, existing tools cannot understand or interpret the paper in the context of the scientific literature. Tools cannot yet determine whether the methods used are suitable to answer the research question, or whether the data support the authors' conclusions. Editors and peer reviewers are essential for assessing journal fit and the overall quality of a paper, including the experimental design, the soundness of the study's conclusions, potential impact and innovation. Automated screening tools cannot replace peer review, but may aid authors, reviewers, and editors in improving scientific papers. Strategies for responsible use of automated tools in peer review may include setting performance criteria for tools, transparently reporting tool performance and use, and training users to interpret reports.
Affiliation(s)
- Robert Schulz
- BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
- Adrian Barnett
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health & Social Work, Queensland University of Technology, Brisbane, QLD, Australia
- René Bernard
- NeuroCure Cluster of Excellence, Charité Universitätsmedizin Berlin, Berlin, Germany
- Jennifer A Byrne
- Faculty of Medicine and Health, New South Wales Health Pathology, The University of Sydney, New South Wales, Australia
- Peter Eckmann
- Department of Neuroscience, University of California, San Diego, La Jolla, CA, USA
- Małgorzata A Gazda
- UMR 3525, Institut Pasteur, Université de Paris, CNRS, INSERM UA12, Comparative Functional Genomics group, Paris, France
- Halil Kilicoglu
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Eric M Prager
- Translational Research and Development, Cohen Veterans Bioscience, New York, NY, USA
- Maia Salholz-Hillel
- BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
- Gerben Ter Riet
- Faculty of Health, Center of Expertise Urban Vitality, Amsterdam University of Applied Sciences, Amsterdam, The Netherlands
- Timothy Vines
- DataSeer Research Data Services Ltd, Vancouver, BC, Canada
- Colby J Vorland
- Indiana University School of Public Health-Bloomington, Bloomington, IN, USA
- Han Zhuang
- School of Information Studies, Syracuse University, Syracuse, NY, USA
- Anita Bandrowski
- Department of Neuroscience, University of California, San Diego, La Jolla, CA, USA
- Tracey L Weissgerber
- BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
5. The science of science: Clinical Science launches a new translational meta-research collection. Clin Sci (Lond) 2021;135:2031-2034. PMID: 34427290. DOI: 10.1042/cs20210777.
Abstract
Clinical Science is proud to launch a new translational meta-research collection. Meta-research, or the science of science, applies the scientific method to study science itself. Meta-research is a powerful tool for identifying common problems in scientific papers, assessing their impact, and testing solutions to improve the transparency, rigor, trustworthiness, and usefulness of biomedical research. The collection welcomes science of science studies that link basic science to disease mechanisms, as well as meta-research articles highlighting opportunities to improve transparency, rigor, and reproducibility among the types of papers published in Clinical Science. Submissions might include science of science studies that explore factors linked to successful translation, or meta-research on experimental methods or study designs that are often used in translational research. We hope that this collection will encourage scientists to think critically about current practices and take advantage of opportunities to make their own research more transparent, rigorous, and reproducible.
6. Hogan KO, Fraga GR. Compliance With Standards for STARD 2015 Reporting Recommendations in Pathology. Am J Clin Pathol 2020;154:828-836. PMID: 32789451. DOI: 10.1093/ajcp/aqaa103.
Abstract
OBJECTIVES: Lack of experimental reproducibility has led to growing interest in guidelines to enhance completeness and transparency in research reporting. This retrospective survey sought to determine compliance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 statement in the recent pathology scientific literature.
METHODS: Two raters independently scored 171 pathology diagnostic accuracy studies for compliance with 34 STARD items and subcomponents. Overall adherence was calculated as a proportion after excluding nonapplicable items.
RESULTS: After excluding nonapplicable items, there was 50% overall adherence to STARD reporting recommendations. In total, 15.44 ± 3.59 items were reported per article (range, 4-28 out of a maximum possible of 34). There was substantial heterogeneity in individual item reporting, with greater than 75% reporting for eight of 34 items and less than 25% reporting for 11 of 34 items. Less than 10% of articles reported hypotheses, subgroup analyses for confounding, sample size calculations, subject flow diagrams, study registrations, and links to full study protocols. Significantly more items were reported in articles from journals that endorsed STARD (16.14 vs. 14.84, p = .0175).
CONCLUSIONS: These findings demonstrate incomplete reporting of essential items in pathology diagnostic accuracy studies. More vigorous enforcement of reporting checklists might improve adherence to minimum reporting standards.
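The adherence calculation described in the methods, a proportion of checklist items reported after excluding nonapplicable items, is simple arithmetic; the sketch below shows it with hypothetical item names (the actual STARD items are not reproduced here).

```python
# Toy STARD-style compliance score: proportion of applicable checklist
# items that an article reports. Item names are illustrative placeholders.
def adherence(reported, not_applicable, total_items=34):
    # Exclude nonapplicable items from the denominator before scoring.
    applicable = total_items - len(not_applicable)
    return len(reported) / applicable

reported = {"abstract", "hypothesis", "flow_diagram"}
not_applicable = {"test_cutoffs"}
print(f"{adherence(reported, not_applicable):.3f}")  # → 0.091
```

Averaging this proportion over all rated articles gives the kind of overall adherence figure (here, 50%) that the survey reports.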
Affiliation(s)
- Keenan O Hogan
- Department of Pathology and Laboratory Medicine, University of Kansas School of Medicine, Kansas City
- Garth R Fraga
- Department of Pathology and Laboratory Medicine, University of Kansas School of Medicine, Kansas City
7. Alvarez G, Núñez-Cortés R, Solà I, Sitjà-Rabert M, Fort-Vanmeerhaeghe A, Fernández C, Bonfill X, Urrútia G. Sample size, study length, and inadequate controls were the most common self-acknowledged limitations in manual therapy trials: A methodological review. J Clin Epidemiol 2020;130:96-106. PMID: 33144246. DOI: 10.1016/j.jclinepi.2020.10.018.
Abstract
OBJECTIVES: The aim of this study was to quantify and analyze the presence and type of self-acknowledged limitations (SALs) in a sample of manual therapy (MT) randomized controlled trials.
STUDY DESIGN AND SETTING: We randomly selected 120 MT trials. We extracted data related to SALs from the original reports and classified them into 12 categories. After data extraction, specific limitations within each category were identified. A descriptive analysis was performed using frequencies and percentages for qualitative variables.
RESULTS: The number of SALs per trial article ranged from 0 to 8, and more than two-thirds of trials acknowledged at least two different limitations. A small proportion of trials (9%) did not report any SALs. The most common limitation, declared in almost half of our sample, related to sample size (47.5%), followed by limitations related to study length and follow-up (33.3%) and inadequate controls (32.5%).
CONCLUSION: Our results indicate that at least two different limitations are consistently acknowledged in MT trial reports, the most common being those related to sample size, study length, follow-up, and inadequate controls. Analysis of the reasons behind the SALs gives some insight into the main difficulties of conducting research in this field and may help develop strategies to improve future research.
Affiliation(s)
- Gerard Alvarez
- Iberoamerican Cochrane Centre - Sant Pau Biomedical Research Institute (IIB Sant Pau), Barcelona, Spain; Foundation Centre for Osteopathic Medicine Collaboration, Spain National Centre, Barcelona, Spain
- Rodrigo Núñez-Cortés
- Department of Physical Therapy, Faculty of Medicine, University of Chile, Santiago, Chile
- Ivan Solà
- Iberoamerican Cochrane Centre - Sant Pau Biomedical Research Institute (IIB Sant Pau), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Spain
- Mercè Sitjà-Rabert
- Blanquerna School of Health Science (FCS), Ramon Llull University, Barcelona, Spain; Global Research on Wellbeing (GRoW) Research Group, Ramon Llull University, Barcelona, Spain
- Azahara Fort-Vanmeerhaeghe
- Blanquerna School of Health Science (FCS), Ramon Llull University, Barcelona, Spain; Blanquerna Faculty of Psychology, Education Sciences and Sport (FPCEE), Ramon Llull University, Barcelona, Spain
- Carles Fernández
- Blanquerna School of Health Science (FCS), Ramon Llull University, Barcelona, Spain; Global Research on Wellbeing (GRoW) Research Group, Ramon Llull University, Barcelona, Spain
- Xavier Bonfill
- Iberoamerican Cochrane Centre - Sant Pau Biomedical Research Institute (IIB Sant Pau), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Spain
- Gerard Urrútia
- Iberoamerican Cochrane Centre - Sant Pau Biomedical Research Institute (IIB Sant Pau), Barcelona, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Spain
8. Josefsson J, Hiron M, Arlt D, Auffret AG, Berg Å, Chevalier M, Glimskär A, Hartman G, Kačergytė I, Klein J, Knape J, Laugen AT, Low M, Paquet M, Pasanen-Mortensen M, Rosin ZM, Rubene D, Żmihorski M, Pärt T. Improving scientific rigour in conservation evaluations and a plea for transparency on potential biases. Conserv Lett 2020. DOI: 10.1111/conl.12726.
Affiliation(s)
- Jonas Josefsson
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Matthew Hiron
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Debora Arlt
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Alistair G. Auffret
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Åke Berg
- Swedish Biodiversity Centre, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Mathieu Chevalier
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Anders Glimskär
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Göran Hartman
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ineta Kačergytė
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Julian Klein
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jonas Knape
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ane T. Laugen
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Natural Sciences, Centre for Coastal Research, University of Agder, Kristiansand, Norway
- Matthew Low
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Matthieu Paquet
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Marianne Pasanen-Mortensen
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Zoology, Stockholm University, Stockholm, Sweden
- Zuzanna M. Rosin
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Cell Biology, Institute of Experimental Biology, Faculty of Biology, Adam Mickiewicz University, Umultowska, Poznań, Poland
- Diana Rubene
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Crop Production Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Michał Żmihorski
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Mammal Research Institute, Polish Academy of Sciences, Białowieża, Poland
- Tomas Pärt
- Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
9. Keserlioglu K, Kilicoglu H, Ter Riet G. Impact of peer review on discussion of study limitations and strength of claims in randomized trial reports: a before and after study. Res Integr Peer Rev 2019;4:19. PMID: 31534784. PMCID: PMC6745784. DOI: 10.1186/s41073-019-0078-2.
Abstract
Background: In their research reports, scientists are expected to discuss the limitations that their studies have. Previous research showed that such discussion is often absent. Also, many journals emphasize the importance of avoiding overstatement of claims. We wanted to see to what extent editorial handling and peer review affect self-acknowledgment of limitations and hedging of claims.
Methods: Using software that automatically detects limitation-acknowledging sentences and calculates the level of hedging in sentences, we compared submitted manuscripts and their ultimate publications for all randomized trials published in 2015 in 27 BioMed Central (BMC) journals and BMJ Open. We used mixed linear and logistic regression models, accounting for clustering of manuscript-publication pairs within journals, to quantify before-after changes in the mean number of limitation-acknowledging sentences, in the probability that a manuscript with zero self-acknowledged limitations ended up as a publication with at least one, and in hedging scores.
Results: Four hundred forty-six manuscript-publication pairs were analyzed. The median number of manuscripts per journal was 10.5 (interquartile range 6-18). The average number of distinct limitation sentences increased by 1.39 (95% CI 1.09-1.76), from 2.48 in manuscripts to 3.87 in publications. Two hundred two manuscripts (45.3%) did not mention any limitations; 63 of these (31%, 95% CI 25-38) mentioned at least one after peer review. Changes in mean hedging scores were negligible.
Conclusions: Our findings support the idea that editorial handling and peer review lead to more self-acknowledgment of study limitations, but not to changes in linguistic nuance.
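The core before-after quantities in this study (the mean change in limitation sentences, and the share of zero-limitation manuscripts that gained at least one) are easy to compute once sentences have been counted per manuscript-publication pair. The sketch below uses fabricated toy counts, not the study's data, and omits the mixed-model adjustment for clustering within journals.

```python
# Each pair is (limitation sentences in manuscript, in publication).
# Toy data for illustration only; the study analyzed 446 real pairs.
pairs = [(0, 1), (2, 3), (3, 3), (0, 0), (1, 4), (5, 6)]

n = len(pairs)
# Mean before-after change in limitation-sentence counts.
mean_change = sum(after - before for before, after in pairs) / n
# Among manuscripts with zero limitations, how many gained at least one?
zero_before = [(b, a) for b, a in pairs if b == 0]
gained = sum(1 for b, a in zero_before if a >= 1)

print(f"mean change: {mean_change:.2f}")
print(f"{gained}/{len(zero_before)} zero-limitation manuscripts gained >=1")
```

With the toy data this prints a mean change of 1.00 and 1 of 2 zero-limitation manuscripts gaining a limitation; the paper's mixed models estimate the same quantities while accounting for journal-level clustering.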
Affiliation(s)
- Kerem Keserlioglu
- Department of General Practice, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD, USA
- Gerben Ter Riet
- Department of Cardiology, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; ACHIEVE Centre for Applied Research, Amsterdam University of Applied Sciences, Tafelbergweg 51, 1105 BD Amsterdam, The Netherlands
10. Avidan MS, Ioannidis JPA, Mashour GA. Independent discussion sections for improving inferential reproducibility in published research. Br J Anaesth 2019;122:413-420. PMID: 30857597. DOI: 10.1016/j.bja.2018.12.010.
Abstract
There is a reproducibility crisis in science. There are many potential contributors to replication failure in research across the translational continuum. In this perspective piece, we focus on the narrow topic of inferential reproducibility. Although replication of methods and results is necessary to demonstrate reproducibility, it is not sufficient. Also fundamental is consistent interpretation in the Discussion section. Current deficiencies in the Discussion sections of manuscripts might limit the inferential reproducibility of scientific research. Lack of contextualisation using systematic reviews, overinterpretation and misinterpretation of results, and insufficient acknowledgement of limitations are common problems in Discussion sections; these deficiencies can harm the translational process. Proposed solutions include eliminating or not reading Discussions, writing accompanying editorials, and post-publication review and comments; however, none of these solutions works very well. A second Discussion written by an independent author with appropriate expertise in research methodology is a new testable solution that could help probe inferential reproducibility, and address some deficiencies in primary Discussion sections.
Affiliation(s)
- Michael S Avidan
- Department of Anesthesiology, Washington University School of Medicine, St Louis, MO, USA
- John P A Ioannidis
- Departments of Health Research and Policy, Medicine, Biomedical Data Science, and Statistics, Meta-Research Innovation Center, Stanford University, Palo Alto, CA, USA
- George A Mashour
- Department of Anesthesiology, University of Michigan, Ann Arbor, MI, USA
11. Kilicoglu H. Biomedical text mining for research rigor and integrity: tasks, challenges, directions. Brief Bioinform 2018;19:1400-1414. PMID: 28633401. PMCID: PMC6291799. DOI: 10.1093/bib/bbx057.
Abstract
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise.
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, US National Library of Medicine
12. Kilicoglu H, Rosemblat G, Malički M, ter Riet G. Automatic recognition of self-acknowledged limitations in clinical research literature. J Am Med Inform Assoc 2018;25:855-861. PMID: 29718377. PMCID: PMC6016608. DOI: 10.1093/jamia/ocy038.
Abstract
Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.
Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).
Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance, with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs. 89.6%, 95% CI [88.1-91.1]).
Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve the reporting of limitations in clinical studies.
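The best-performing method in this paper was rule-based and exploited document structure. A minimal sketch of that idea follows: treat a sentence as a self-acknowledged limitation if it occurs in a "Limitations" (sub)section, or if it contains an explicit limitation cue. The cue patterns and function signature are illustrative guesses, not the authors' actual rules.

```python
import re

# Illustrative cue patterns for limitation-acknowledging language.
CUES = re.compile(
    r"\b(limitations? of (this|our|the) (study|trial|work)"
    r"|study has (several |some )?limitations"
    r"|is limited by"
    r"|should be interpreted with caution)\b",
    re.IGNORECASE,
)

def is_limitation_sentence(sentence, section_title=""):
    # Document-structure rule: anything under a "Limitations" heading counts.
    if "limitation" in section_title.lower():
        return True
    # Otherwise fall back to lexical cues within the sentence itself.
    return bool(CUES.search(sentence))

print(is_limitation_sentence(
    "Our study has several limitations.", section_title="Discussion"))  # True
print(is_limitation_sentence(
    "The sample was small.", section_title="Study limitations"))        # True
print(is_limitation_sentence(
    "Mean follow-up was 12 months.", section_title="Results"))          # False
```

This kind of high-precision structural rule is plausible as a strong baseline here because limitation discussions in clinical articles cluster in predictable places (the end of the Discussion, or a dedicated subsection).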
Affiliation(s)
- Halil Kilicoglu
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD, USA
- Graciela Rosemblat
- Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD, USA
- Mario Malički
- Department of General Practice, Academic Medical Center, Amsterdam, The Netherlands
- Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia
- Gerben ter Riet
- Department of General Practice, Academic Medical Center, Amsterdam, The Netherlands
Collapse
|
13
|
Cohen JF, Korevaar DA, Altman DG, Bruns DE, Gatsonis CA, Hooft L, Irwig L, Levine D, Reitsma JB, de Vet HCW, Bossuyt PMM. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 2016; 6:e012799. [PMID: 28137831 PMCID: PMC5128957 DOI: 10.1136/bmjopen-2016-012799] [Citation(s) in RCA: 1224] [Impact Index Per Article: 153.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/26/2016] [Revised: 08/03/2016] [Accepted: 08/25/2016] [Indexed: 12/11/2022] Open
Abstract
Diagnostic accuracy studies are, like other clinical studies, at risk of bias due to shortcomings in design and conduct, and the results of a diagnostic accuracy study may not apply to other patient groups and settings. Readers of study reports need to be informed about study design and conduct in sufficient detail to judge the trustworthiness and applicability of the study findings. The STARD statement (Standards for Reporting of Diagnostic Accuracy Studies) was developed to improve the completeness and transparency of reports of diagnostic accuracy studies. STARD contains a list of essential items that can be used as a checklist, by authors, reviewers and other readers, to ensure that a report of a diagnostic accuracy study contains the necessary information. STARD was recently updated. All updated STARD materials, including the checklist, are available at http://www.equator-network.org/reporting-guidelines/stard. Here, we present the STARD 2015 explanation and elaboration document. Through commented examples of appropriate reporting, we clarify the rationale for each of the 30 items on the STARD 2015 checklist, and describe what is expected from authors in developing sufficiently informative study reports.
Affiliation(s)
- Jérémie F Cohen
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Department of Pediatrics, INSERM UMR 1153, Necker Hospital, AP-HP, Paris Descartes University, Paris, France
- Daniël A Korevaar
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Douglas G Altman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- David E Bruns
- Department of Pathology, University of Virginia School of Medicine, Charlottesville, Virginia, USA
- Constantine A Gatsonis
- Department of Biostatistics, Brown University School of Public Health, Providence, Rhode Island, USA
- Lotty Hooft
- Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, University of Utrecht, Utrecht, The Netherlands
- Les Irwig
- Screening and Diagnostic Test Evaluation Program, School of Public Health, University of Sydney, Sydney, New South Wales, Australia
- Deborah Levine
- Department of Radiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- Radiology Editorial Office, Boston, Massachusetts, USA
- Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, University of Utrecht, Utrecht, The Netherlands
- Henrica C W de Vet
- Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
- Patrick M M Bossuyt
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
|