1. Reviewing research reporting in randomised controlled trials: Confidence and P-values. Indian J Anaesth 2024; 68:492-495. PMID: 38764952; PMCID: PMC11100656; DOI: 10.4103/ija.ija_189_24.
2. Enhancing biostatistics education for medical students in Poland: factors influencing perception and educational recommendations. BMC Medical Education 2024; 24:428. PMID: 38649993; PMCID: PMC11034022; DOI: 10.1186/s12909-024-05389-z.
Abstract
BACKGROUND A number of recommendations for the teaching of biostatistics have been published to date; however, student opinion on them has not yet been studied. For this reason, the aim of this study was to gather the opinions of medical students at universities in Poland on two forms of teaching biostatistics, traditional and practical, and to derive related educational recommendations from the results obtained. METHODS The study involved a group of 527 students studying at seven medical faculties in Poland, who were asked to imagine two different courses. The traditional form of teaching biostatistics was based on the standard teaching scheme of running a test from memory in a statistical package, while the practical one involved reading an article in which a particular test was applied and then applying it based on the instruction provided. Other aspects related to the teaching of the subject were also assessed. RESULTS According to the students, the practical form of teaching biostatistics reduces the stress associated with both the course and the exam (p < 0.001) and contributes to an increased level of knowledge (p < 0.001), while satisfaction after passing the exam is higher (p < 0.001). A greater proportion of students (p < 0.001) believed that credit for the course could be given for conducting a statistical review of an article or carrying out a survey and then applying the tests learned in class. More than 95% also said that course delivery should be tailored to their field of study, and that they would like the opportunity to take part in optional activities and attend lectures from experts. CONCLUSION It is recommended that more emphasis be placed on practical teaching of biostatistics.
3. Disinformation and trust in vaccines in the era of artificial intelligence: the necessity of implementing statistical recommendations. Future Microbiol 2024; 19:293-295. PMID: 38411115; DOI: 10.2217/fmb-2024-0004.
4. Statistical advice provided by ChatGPT regarding an accepted article in Allergy. Allergy 2024; 79:748-751. PMID: 37985460; DOI: 10.1111/all.15956.
5. Poor statistical reporting, inadequate data presentation and spin persist despite Journal awareness and updated Information for Authors. F1000Res 2023; 12:1483. PMID: 38434651; PMCID: PMC10905014; DOI: 10.12688/f1000research.142841.1.
Abstract
Sound reporting of research results is fundamental to good science. Unfortunately, poor reporting is common and does not improve with editorial educational strategies. We investigated whether publicly highlighting poor reporting at a journal can lead to improved reporting practices. We also investigated whether reporting practices that are required or strongly encouraged in journal Information for Authors are enforced by journal editors and staff. A 2016 audit highlighted poor reporting practices in the Journal of Neurophysiology. In August 2016 and 2018, the American Physiological Society updated the Information for Authors, which included the introduction of several required or strongly encouraged reporting practices. We audited Journal of Neurophysiology papers published in 2019 and 2020 (downloaded through the library of the University of New South Wales) on reporting items selected from the 2016 audit, the newly introduced reporting practices, and items from previous audits. Summary statistics (means, counts) were used to summarize audit results. In total, 580 papers were audited. Compared to results from the 2016 audit, several reporting practices remained unchanged or worsened. For example, 60% of papers erroneously reported standard errors of the mean, 23% of papers included undefined measures of variability, 40% of papers failed to define a statistical threshold for their tests, and 64% of the papers that reported p-values between 0.05 and 0.1 misinterpreted them as statistical trends. As for the newly introduced reporting practices, required practices were consistently adhered to by 34 to 37% of papers, while strongly encouraged practices were consistently adhered to by 9 to 26% of papers. Adherence to the other audited reporting practices was comparable to our previous audits. Publicly highlighting poor reporting practices did little to improve research reporting. Similarly, requiring or strongly encouraging reporting practices was only partly effective. Although the present audit focused on a single journal, this is likely not an isolated case. Stronger, more strategic measures are required to improve poor research reporting.
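The audit's most common error, reporting the standard error of the mean (SEM) as if it described variability among subjects, is easy to make concrete. A minimal sketch with illustrative numbers, not the audit's data:

```python
import math
import statistics

def describe(sample):
    """Return (mean, sd, sem). SD describes spread among observations;
    SEM = SD / sqrt(n) describes precision of the estimated mean, so it
    shrinks as n grows even when the spread is unchanged."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation
    sem = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, sem

# Reporting SEM as a "measure of variability" understates spread:
mean, sd, sem = describe([4.0, 5.0, 6.0, 5.0, 4.0, 6.0])
```

Because SEM divides SD by √n, it is always smaller than SD for n > 1; papers describing variability among subjects should report SD.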
6. ChatGPT's Skills in Statistical Analysis Using the Example of Allergology: Do We Have Reason for Concern? Healthcare (Basel) 2023; 11:2554. PMID: 37761751; PMCID: PMC10530997; DOI: 10.3390/healthcare11182554.
Abstract
BACKGROUND Content generated by artificial intelligence is sometimes not truthful. To date, a number of medical studies have examined the validity of ChatGPT's responses; however, studies addressing various aspects of statistical analysis are lacking. The aim of this study was to assess the validity of the answers provided by ChatGPT in relation to statistical analysis, and to identify recommendations to be implemented in the future in light of the results obtained. METHODS The study was divided into four parts and was based on the exemplary medical field of allergology. The first part consisted of asking ChatGPT 30 different questions related to statistical analysis. The next five questions asked ChatGPT to perform the relevant statistical analyses, and another five asked it to indicate which statistical test should be applied to articles accepted for publication in Allergy. The final part involved asking ChatGPT the same statistical question three times. RESULTS ChatGPT did not fully answer half of the 40 general questions related to broad statistical analysis. Assumptions necessary for the application of specific statistical tests were not included. ChatGPT also gave completely divergent answers to the one question it was asked three times about which test should be used. CONCLUSION The answers provided by ChatGPT to various statistical questions may lead to the use of inappropriate statistical tests and, consequently, misinterpretation of the research results obtained. Questions asked in this regard need to be framed more precisely.
7. Responsible research practices could be more strongly endorsed by Australian university codes of research conduct. Res Integr Peer Rev 2023; 8:5. PMID: 37277861; DOI: 10.1186/s41073-023-00129-1.
Abstract
BACKGROUND This study aimed to investigate how strongly Australian university codes of research conduct endorse responsible research practices. METHODS Codes of research conduct from 25 Australian universities active in health and medical research were obtained from public websites, and audited against 19 questions to assess how strongly they (1) defined research integrity, research quality, and research misconduct, (2) required research to be approved by an appropriate ethics committee, (3) endorsed 9 responsible research practices, and (4) discouraged 5 questionable research practices. RESULTS Overall, a median of 10 (IQR 9 to 12) of 19 practices covered in the questions were mentioned, weakly endorsed, or strongly endorsed. Five to 8 of the 9 responsible research practices were mentioned, weakly endorsed, or strongly endorsed, and 3 questionable research practices were discouraged. Results are stratified by Group of Eight (n = 8) and other (n = 17) universities. Specifically, (1) 6 (75%) Group of Eight and 11 (65%) other codes of research conduct defined research integrity, 4 (50%) and 8 (47%) defined research quality, and 7 (88%) and 16 (94%) defined research misconduct. (2) All codes required ethics approval for human and animal research. (3) All codes required conflicts of interest to be declared, but there was variability in how strongly other research practices were endorsed. The most commonly endorsed practices were ensuring researcher training in research integrity [8 (100%) and 16 (94%)] and making study data publicly available [6 (75%) and 12 (71%)]. The least commonly endorsed practices were making analysis code publicly available [0 (0%) and 0 (0%)] and registering analysis protocols [0 (0%) and 1 (6%)]. (4) Most codes discouraged fabricating data [5 (63%) and 15 (88%)], selectively deleting or modifying data [5 (63%) and 15 (88%)], and selective reporting of results [3 (38%) and 15 (88%)]. No codes discouraged p-hacking or hypothesising after results are known.
CONCLUSIONS Responsible research practices could be more strongly endorsed by Australian university codes of research conduct. Our findings may not be generalisable to smaller universities, or those not active in health and medical research.
8. A change to Experimental Physiology's statistics policy. Exp Physiol 2023; 108:795-796. PMID: 37079429; PMCID: PMC10988475; DOI: 10.1113/ep091248.
9. Consistency between trials presented at conferences, their subsequent publications and press releases. BMJ Evid Based Med 2023; 28:95-102. PMID: 36357160; PMCID: PMC10086295; DOI: 10.1136/bmjebm-2022-111989.
Abstract
OBJECTIVE This study examined the extent to which trials presented at major international medical conferences in 2016 consistently reported their study design, end points and results across conference abstracts, published article abstracts and press releases. DESIGN Cross-sectional analysis of clinical trials presented at 12 major medical conferences in the USA in 2016. Conferences were identified from a list of the largest clinical research meetings aggregated by the Healthcare Convention and Exhibitors Association and were included if their abstracts were publicly available. From these conferences, all late-breaker clinical trials were included, as well as a random selection of all other clinical trials, such that the total sample included up to 25 trial abstracts per conference. MAIN OUTCOME MEASURES First, it was determined if trials were registered and reported results in an International Committee of Medical Journal Editors-approved clinical trial registry. Second, it was determined if trial results were published in a peer-reviewed journal. Finally, information on trial media coverage and press releases was collected using LexisNexis. For all published trials, the consistency of reporting of the following characteristics was examined, through comparison of the trials' conference and publication abstracts: primary efficacy endpoint definition, safety endpoint identification, sample size, follow-up period, primary end point effect size and characterisation of trial results. For all published abstracts with press releases, the characterisation of trial results across conference abstracts, press releases and publications was compared. Authors determined consistency of reporting when identical information was presented across abstracts and press releases. Primary analyses were descriptive; secondary analyses included χ2 tests and multiple logistic regression. 
RESULTS Among 240 clinical trials presented at 12 major medical conferences, 208 (86.7%) were registered, 95 (39.6%) reported summary results in a registry and 177 (73.8%) were published; 82 (34.2%) were covered by the media and 68 (28.3%) had press releases. Among the 177 published trials, 171 (96.6%) reported the definition of primary efficacy endpoints consistently across conference and publication abstracts, whereas 96/128 (75.0%) consistently identified safety endpoints. There were 107/172 (62.2%) trials with consistent sample sizes across conference and publication abstracts, 101/137 (73.7%) that reported their follow-up periods consistently, 92/175 (52.6%) that described their effect sizes consistently and 157/175 (89.7%) that characterised their results consistently. Among the trials that were published and had press releases, 32/32 (100%) characterised their results consistently across conference abstracts, press releases and publication abstracts. No trial characteristics were associated with reporting primary efficacy end points consistently. CONCLUSIONS For clinical trials presented at major medical conferences, primary efficacy endpoint definitions were consistently reported and results were consistently characterised across conference abstracts, registry entries and publication abstracts; consistency rates were lower for sample sizes, follow-up periods, and effect size estimates. REGISTRATION This study was registered at the Open Science Framework (https://doi.org/10.17605/OSF.IO/VGXZY).
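The secondary analyses above used χ2 tests to compare reporting consistency between groups. A minimal sketch of the Pearson chi-squared statistic for a 2×2 table; the counts below are invented for illustration, not the study's data:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Illustrative counts: consistent vs inconsistent sample-size reporting
# in two groups of trials.
stat = chi2_2x2(60, 20, 47, 45)
significant = stat > 3.84   # chi-squared critical value, 1 df, alpha = 0.05
```

A perfectly balanced table yields a statistic of zero; larger values indicate stronger evidence that consistency differs between the two groups.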
10. Biostatistics in allergy - recommendations for authors. Allergy 2022; 77:3493-3495. PMID: 35916056; DOI: 10.1111/all.15463.
11. Rigor and reproducibility for data analysis and design in the study of eating disorders. Int J Eat Disord 2022; 55:1267-1278. PMID: 35852964; DOI: 10.1002/eat.23774.
Abstract
OBJECTIVE Incorporating open science practices has become a priority for submission criteria in the International Journal of Eating Disorders (IJED). In this systematic review, we used the rigor and reproducibility framework developed by Hildebrandt and Prenoveau (2020) to examine the implementation of statistically sound open science principles in IJED, determining whether the cost and effort of incorporating these practices ultimately make research more likely to be cited. METHOD For this systematic review, six trained coders examined 1145 articles published from January 2011 to May 2021, including the 5 years prior to the 2016 introduction of Open Science Framework article preregistration. We coded for the presence or absence of 10 specific open science elements and calculated citation metrics for each article. RESULTS There was a significant positive relationship between time and the total rigor and reproducibility (Total RR) criteria included in IJED articles following the implementation of preregistration in 2016. For each year from 2011 to 2016, Total RR criteria decreased by 0.14; from 2016 to 2021, Total RR criteria increased by 0.42 per volume. There was no statistically significant relationship between Total RR criteria and citation impact. DISCUSSION Although findings indicate that statistical rigor and reproducibility in this field have increased, the lack of a direct relationship between open science methods and article visibility suggests that there is limited incentive for researchers to follow reporting guidelines. PUBLIC SIGNIFICANCE Statistical controversies within science threaten the rigor and reproducibility of published research. Open science practices, including the preregistration of study hypotheses, links to statistical code, and explicit data sharing, arguably generate more reliable and valid inferences. This review illustrates the rigor and reproducibility of articles published in IJED between 2011 and 2021 and identifies whether open science practices have become increasingly prevalent in eating disorder research.
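The trend reversal reported here, a yearly decrease before 2016 followed by an increase after, amounts to fitting two least-squares slopes. A sketch with hypothetical yearly Total RR means chosen to mimic the reported pattern, not the review's data:

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical mean Total RR scores per year, one segment per period:
pre = ols_slope([2011, 2012, 2013, 2014, 2015, 2016],
                [3.0, 2.9, 2.7, 2.6, 2.4, 2.3])
post = ols_slope([2016, 2017, 2018, 2019, 2020, 2021],
                 [2.3, 2.7, 3.1, 3.6, 4.0, 4.4])
```

Fitting the two periods separately, rather than one line across 2011-2021, is what exposes the change in direction at the 2016 policy introduction.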
12. Quality Output Checklist and Content Assessment (QuOCCA): a new tool for assessing research quality and reproducibility. BMJ Open 2022; 12:e060976. PMID: 36167369; PMCID: PMC9516158; DOI: 10.1136/bmjopen-2022-060976.
Abstract
Research must be well designed, properly conducted and clearly and transparently reported. Our independent medical research institute wanted a simple, generic tool to assess the quality of the research conducted by its researchers, with the goal of identifying areas that could be improved through targeted educational activities. Unfortunately, none was available, so we devised our own. Here, we report development of the Quality Output Checklist and Content Assessment (QuOCCA), and its application to publications from our institute's scientists. Following consensus meetings and external review by statistical and methodological experts, 11 items were selected for the final version of the QuOCCA: research transparency (items 1-3), research design and analysis (items 4-6) and research reporting practices (items 7-11). Five pairs of raters assessed all 231 articles published in 2017 and 221 in 2018 by researchers at our institute. Overall, the results were similar between years and revealed limited engagement with several recommended practices highlighted in the QuOCCA. These results will be useful to guide educational initiatives and evaluate their effectiveness. The QuOCCA is brief and focuses on broadly applicable and relevant concepts to open, high-quality, reproducible and well-reported science. Thus, the QuOCCA could be used by other biomedical institutions and individual researchers to evaluate research publications, assess changes in research practice over time and guide the discussion about high-quality, open science. Given its generic nature, the QuOCCA may also be useful in other research disciplines.
13. Recommendations to medical journals on ways to encourage statistical experts to review submissions. Curr Med Res Opin 2022; 38:1553-1554. PMID: 35770863; DOI: 10.1080/03007995.2022.2096335.
14. Structured reporting to improve transparency of analyses in prognostic marker studies. BMC Med 2022; 20:184. PMID: 35546237; PMCID: PMC9095054; DOI: 10.1186/s12916-022-02304-5.
Abstract
BACKGROUND Factors contributing to the lack of understanding of research studies include poor reporting practices, such as selective reporting of statistically significant findings or insufficient methodological details. Systematic reviews have shown that prognostic factor studies continue to be poorly reported, even for important aspects, such as the effective sample size. The REMARK reporting guidelines support researchers in reporting key aspects of tumor marker prognostic studies. The REMARK profile was proposed to augment these guidelines to aid in structured reporting with an emphasis on including all aspects of analyses conducted. METHODS A systematic search of prognostic factor studies was conducted, and fifteen studies published in 2015 were selected, three from each of five oncology journals. A paper was eligible for selection if it included survival outcomes and multivariable models were used in the statistical analyses. For each study, we summarized the key information in a REMARK profile consisting of details about the patient population with available variables and follow-up data, and a list of all analyses conducted. RESULTS Structured profiles allow easy assessment of whether the reporting of a study merely has weaknesses or is poor because many relevant details are missing. Studies had incomplete reporting of patient exclusions, missing information about the number of events, or lacked details about statistical analyses, e.g., subgroup analyses in small populations without any information about the number of events. The profiles exhibit severe reporting weaknesses in more than 50% of the studies. The quality of analyses was not assessed, but some profiles exhibit several deficits at a glance. CONCLUSIONS A substantial proportion of prognostic factor studies are poorly reported and analyzed, with severe consequences for related systematic reviews and meta-analyses. We consider inadequate reporting of single studies to be one of the most important reasons that the clinical relevance of most markers is still unclear after years of research and dozens of publications. We conclude that structured reporting is an important step toward improving the quality of prognostic marker research, and we discuss its role in the context of selective reporting, meta-analysis, study registration, predefined statistical analysis plans, and improvement of marker research.
15. Explanation and Use of Uncertainty Quantified by Bayesian Neural Network Classifiers for Breast Histopathology Images. IEEE Transactions on Medical Imaging 2022; 41:815-825. PMID: 34699354; DOI: 10.1109/tmi.2021.3123300.
Abstract
Despite the promise of convolutional neural network (CNN) based classification models for histopathological images, it is infeasible to quantify their uncertainties. Moreover, CNNs may suffer from overfitting when the data is biased. We show that a Bayesian-CNN can overcome these limitations by regularizing automatically and by quantifying the uncertainty. We have developed a novel technique to utilize the uncertainties provided by the Bayesian-CNN that significantly improves performance on a large fraction of the test data (about 6% improvement in accuracy on 77% of test data). Further, we provide a novel explanation for the uncertainty by projecting the data into a low-dimensional space through a nonlinear dimensionality reduction technique. This dimensionality reduction enables interpretation of the test data through visualization and reveals the structure of the data in a low-dimensional feature space. We show that the Bayesian-CNN can perform much better than the state-of-the-art transfer learning CNN (TL-CNN) by reducing false negatives and false positives by 11% and 7.7%, respectively, for the present data set. It achieves this performance with only 1.86 million parameters, compared to 134.33 million for the TL-CNN. In addition, we modify the Bayesian-CNN by introducing a stochastic adaptive activation function. The modified Bayesian-CNN performs slightly better than the Bayesian-CNN on all performance metrics and significantly reduces the number of false negatives and false positives (3% reduction for both). We also show that these results are statistically significant by performing McNemar's test. This work demonstrates the advantages of the Bayesian-CNN over the state of the art, and explains and utilizes the uncertainties for histopathological images. It should find applications in various medical image classifications.
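McNemar's test, used here to compare paired classifiers, depends only on the discordant counts: cases one model classifies correctly and the other does not. A minimal sketch with illustrative counts, not the paper's data:

```python
def mcnemar_statistic(b, c):
    """McNemar chi-squared statistic with continuity correction, from
    the discordant counts: b = cases only model A gets right,
    c = cases only model B gets right."""
    return (abs(b - c) - 1) ** 2 / (b + c)

# Illustrative discordant counts: model A fixes 40 of model B's errors
# while introducing 15 new errors of its own.
stat = mcnemar_statistic(40, 15)
significant = stat > 3.84   # chi-squared critical value, 1 df, alpha = 0.05
```

Because the test conditions on the discordant pairs only, the (usually large) number of cases both models classify identically does not dilute the comparison.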
16. An observational analysis of the trope “A p-value of < 0.05 was considered statistically significant” and other cut-and-paste statistical methods. PLoS One 2022; 17:e0264360. PMID: 35263374; PMCID: PMC8906599; DOI: 10.1371/journal.pone.0264360.
Abstract
Appropriate descriptions of statistical methods are essential for evaluating research quality and reproducibility. Despite continued efforts to improve reporting in publications, inadequate descriptions of statistical methods persist. At times, reading statistical methods sections can conjure feelings of déjà vu, with content resembling cut-and-pasted or “boilerplate text” from already published work. Instances of boilerplate text suggest a mechanistic approach to statistical analysis, where the same default methods are being used and described using standardized text. To investigate the extent of this practice, we analyzed text extracted from published statistical methods sections from PLOS ONE and the Australian and New Zealand Clinical Trials Registry (ANZCTR). Topic modeling was applied to analyze data from 111,731 papers published in PLOS ONE and 9,523 studies registered with the ANZCTR. PLOS ONE topics emphasized definitions of statistical significance, software and descriptive statistics. One in three PLOS ONE papers contained at least one sentence that was a direct copy from another paper. 12,675 papers (11%) closely matched the sentence “a p-value < 0.05 was considered statistically significant”. Common topics across ANZCTR studies differentiated between study designs and analysis methods, with matching text found in approximately 3% of sections. Our findings quantify a serious problem affecting the reporting of statistical methods and shed light on perceptions about the communication of statistics as part of the scientific process. Results further emphasize the importance of rigorous statistical review to ensure that adequate descriptions of methods are prioritized over relatively minor details such as p-values and software when reporting research outcomes.
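The paper's matching analysis is far more sophisticated, but the core idea of flagging a boilerplate sentence can be sketched with simple text normalisation and substring matching (the helper names below are ours, not the paper's):

```python
import re

BOILERPLATE = "a p-value < 0.05 was considered statistically significant"

def normalise(text):
    """Lowercase and collapse punctuation/whitespace so trivially
    reworded copies (hyphenation, spacing, case) still match."""
    return re.sub(r"[^a-z0-9<>=.]+", " ", text.lower()).strip()

def contains_boilerplate(methods_text):
    """Flag a methods section containing the boilerplate sentence."""
    return normalise(BOILERPLATE) in normalise(methods_text)

flagged = contains_boilerplate(
    "Analyses used R 4.1. A p-value < 0.05 was considered "
    "statistically significant."
)
```

Normalising both the target sentence and the methods text before comparing is what makes the match robust to superficial edits such as capitalisation or hyphen changes.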
17. Statistical recommendations for the authors of manuscripts submitted to the Journal of Cancer Research and Clinical Oncology. J Cancer Res Clin Oncol 2022; 148:1011-1013. PMID: 35238999; PMCID: PMC8892391; DOI: 10.1007/s00432-022-03956-9.
Abstract
In recent years, a negative picture of statistical analyses carried out in medicine has emerged around the world; unfortunately, this also applies to COVID-19 research. Here, the most important statistical guidelines for readers of the Journal of Cancer Research and Clinical Oncology and authors of articles submitted to it, covering numerous factors related to statistical analysis, are presented.
18. Detecting Tuberculosis-Consistent Findings in Lateral Chest X-Rays Using an Ensemble of CNNs and Vision Transformers. Front Genet 2022; 13:864724. PMID: 35281798; PMCID: PMC8907925; DOI: 10.3389/fgene.2022.864724.
Abstract
Research on detecting Tuberculosis (TB) findings on chest radiographs (or chest X-rays: CXR) using convolutional neural networks (CNNs) has demonstrated superior performance due to the emergence of publicly available, large-scale datasets with expert annotations and the availability of scalable computational resources. However, these studies use only the frontal CXR projections, i.e., the posterior-anterior (PA) and anterior-posterior (AP) views, for analysis and decision-making. Lateral CXRs, which have heretofore been understudied, help detect clinically suspected pulmonary TB, particularly in children. Further, Vision Transformers (ViTs) with built-in self-attention mechanisms have recently emerged as a viable alternative to traditional CNNs. Although ViTs have demonstrated notable performance in several medical image analysis tasks, potential differences in performance and computational efficiency between CNN and ViT models necessitate a comprehensive analysis to select appropriate models for the problem under study. This study aims to detect TB-consistent findings in lateral CXRs by constructing an ensemble of CNN and ViT models. Several models are trained on lateral CXR data extracted from two large public collections to transfer modality-specific knowledge, then fine-tuned for detecting findings consistent with TB. We observed that the weighted averaging ensemble of the predictions of the CNN and ViT models, using the optimal weights computed with the Sequential Least-Squares Quadratic Programming method, delivered significantly superior performance (MCC: 0.8136, 95% confidence interval (CI): 0.7394, 0.8878, p < 0.05) compared to the individual models and other ensembles. We also interpreted the decisions of the CNN and ViT models using class-selective relevance maps and attention maps, respectively, and combined them to highlight the discriminative image regions contributing to the final output.
We observed that (i) model accuracy is not related to disease region of interest (ROI) localization and (ii) the bitwise-AND of the heatmaps of the top-2 performing models delivered significantly superior ROI localization performance in terms of mean average precision [mAP@(0.1 0.6) = 0.1820, 95% CI: 0.0771, 0.2869, p < 0.05], compared to the other individual models and ensembles. The code is available at https://github.com/sivaramakrishnan-rajaraman/Ensemble-of-CNN-and-ViT-for-TB-detection-in-lateral-CXR.
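The weighted-averaging step described in this abstract can be sketched as follows. Note that the prediction values, labels, and the two-model setup are illustrative placeholders, not the study's data or code; the snippet only shows the general idea of optimizing convex mixing weights on held-out predictions with SciPy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical held-out probabilities from two models (e.g. a CNN and a ViT)
# and the matching ground-truth labels -- illustrative values only.
preds = np.array([[0.9, 0.2, 0.7, 0.4],   # model 1
                  [0.8, 0.3, 0.6, 0.1]])  # model 2
labels = np.array([1, 0, 1, 0])

def ensemble_loss(w):
    """Log loss of the weighted-average ensemble with mixing weights w."""
    p = np.clip(w @ preds, 1e-7, 1 - 1e-7)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# SLSQP with the weights constrained to the probability simplex.
res = minimize(ensemble_loss, x0=[0.5, 0.5], method="SLSQP",
               bounds=[(0, 1), (0, 1)],
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
weights = res.x  # optimal mixing weights for the weighted average
```

Because log loss is convex in the averaged probabilities, the optimized mixture can do no worse on the held-out set than the better of the two constituent models.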
|
19
|
Between two stools: preclinical research, reproducibility, and statistical design of experiments. BMC Res Notes 2022; 15:73. [PMID: 35189946 PMCID: PMC8862533 DOI: 10.1186/s13104-022-05965-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Accepted: 02/08/2022] [Indexed: 11/11/2022] Open
Abstract
Translation of animal-based preclinical research is hampered by poor validity and reproducibility issues. Unfortunately, preclinical research has ‘fallen between the stools’ of competing study design traditions. Preclinical studies are often characterised by small sample sizes, large variability, and ‘problem’ data. Although Fisher-type designs with randomisation and blocking are appropriate and have been vigorously promoted, structured statistically-based designs are almost unknown. Traditional analysis methods are commonly misapplied, and basic terminology and principles of inference testing misinterpreted. Problems are compounded by the lack of adequate statistical training for researchers, and failure of statistical educators to account for the unique demands of preclinical research. The solution is a return to the basics: statistical education tailored to non-statistician investigators, with clear communication of statistical concepts, and curricula that address design and data issues specific to preclinical research. Statistics curricula should focus on statistics as process: data sampling and study design before analysis and inference. Properly-designed and analysed experiments are a matter of ethics as much as procedure. Shifting the focus of statistical education from rote hypothesis testing to sound methodology will reduce the numbers of animals wasted in noninformative experiments and increase overall scientific quality and value of published research.
|
20
|
Abstract
INTRODUCTION In recent years, unfortunately, low-quality statistical analysis has been observed in medicine. As it turns out, this also applies to the COVID-19 literature. METHODS The study included 2600 medical articles published between the beginning of 2020 and June 2021 in which the authors reported results related to COVID-19. RESULTS Of the analysed articles, 39% were correct in terms of the statistical analysis performed. CONCLUSIONS More emphasis should be placed on conducting statistical reviews of authors' contributions on various aspects of COVID-19.
|
21
|
Strengthening the incentives for responsible research practices in Australian health and medical research funding. Res Integr Peer Rev 2021; 6:11. [PMID: 34340719 PMCID: PMC8328133 DOI: 10.1186/s41073-021-00113-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 05/16/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Australian health and medical research funders support substantial research efforts, and incentives within grant funding schemes influence researcher behaviour. We aimed to determine to what extent Australian health and medical funders incentivise responsible research practices. METHODS We conducted an audit of instructions from research grant and fellowship schemes. Eight national research grants and fellowships were purposively sampled to select schemes that awarded the largest amount of funds. The funding scheme instructions were assessed against 9 criteria to determine to what extent they incentivised these responsible research and reporting practices: (1) publicly register study protocols before starting data collection, (2) register analysis protocols before starting data analysis, (3) make study data openly available, (4) make analysis code openly available, (5) make research materials openly available, (6) discourage use of publication metrics, (7) conduct quality research (e.g. adhere to reporting guidelines), (8) collaborate with a statistician, and (9) adhere to other responsible research practices. Each criterion was answered using one of the following responses: "Instructed", "Encouraged", or "No mention". RESULTS Across the 8 schemes from 5 funders, applicants were instructed or encouraged to address a median of 4 (range 0 to 5) of the 9 criteria. Three criteria received no mention in any scheme (register analysis protocols, make analysis code open, collaborate with a statistician). Importantly, most incentives did not seem strong as applicants were only instructed to register study protocols, discourage use of publication metrics and conduct quality research. Other criteria were encouraged but were not required. CONCLUSIONS Funders could strengthen the incentives for responsible research practices by requiring grant and fellowship applicants to implement these practices in their proposals. 
Administering institutions could be required to implement these practices to be eligible for funding. Strongly rewarding researchers for implementing robust research practices could lead to sustained improvements in the quality of health and medical research.
|
22
|
A Review of Statistical Reporting in Dietetics Research (2010-2019): How is a Canadian Journal Doing? CAN J DIET PRACT RES 2021; 82:59-67. [PMID: 33876983 DOI: 10.3148/cjdpr-2021-005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Despite the widespread use of statistical techniques in quantitative research, methodological flaws and inadequate statistical reporting persist. The objective of this study is to evaluate the quality of statistical reporting and procedures in all original, quantitative articles published in the Canadian Journal of Dietetic Practice and Research (CJDPR) from 2010 to 2019 using a checklist created by our research team. In total, 107 articles were independently evaluated by 2 raters. The hypothesis or objective(s) was clearly stated in 97.2% of the studies. Over half (51.4%) of the articles reported the study design and 57.9% adequately described the statistical techniques used. Only 21.2% of the studies that required a prestudy sample size calculation reported one. Of the 281 statistical tests conducted, 88.3% of them were correct. P values >0.05-0.10 were reported as "statistically significant" and/or a "trend" in 11.4% of studies. While this evaluation reveals both strengths and areas for improvement in the quality of statistical reporting in CJDPR, we encourage dietitians to pursue additional statistical training and/or seek the assistance of a statistician. Future research should consider validating this new checklist and using it to evaluate the statistical quality of studies published in other nutrition journals and disciplines.
|
23
|
The Quality of Statistical Reporting and Data Presentation in Predatory Dental Journals Was Lower Than in Non-Predatory Journals. ENTROPY (BASEL, SWITZERLAND) 2021; 23:468. [PMID: 33923391 PMCID: PMC8071575 DOI: 10.3390/e23040468] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/12/2021] [Revised: 04/05/2021] [Accepted: 04/14/2021] [Indexed: 01/18/2023]
Abstract
Proper peer review and quality of published articles are often regarded as signs of reliable scientific journals. The aim of this study was to compare whether the quality of statistical reporting and data presentation differs between articles published in 'predatory dental journals' and in other dental journals. We evaluated 50 articles published in 'predatory open access (OA) journals' and 100 clinical trials published in legitimate dental journals between 2019 and 2020. The quality of statistical reporting and data presentation of each paper was assessed on a scale from 0 (poor) to 10 (high). The mean (SD) quality score of the statistical reporting and data presentation was 2.5 (1.4) for the predatory OA journals, 4.8 (1.8) for the legitimate OA journals, and 5.6 (1.8) for the more visible dental journals. The mean values differed significantly (p < 0.001). The quality of statistical reporting of clinical studies published in predatory journals was lower than in legitimate open access and highly cited journals. This difference in quality is a wake-up call to read study results critically. Poor statistical reporting signals generally lower quality in publications whose authors and journals are less likely to be critiqued by peer review.
|
24
|
Improved Semantic Segmentation of Tuberculosis-Consistent Findings in Chest X-rays Using Augmented Training of Modality-Specific U-Net Models with Weak Localizations. Diagnostics (Basel) 2021; 11:diagnostics11040616. [PMID: 33808240 PMCID: PMC8065621 DOI: 10.3390/diagnostics11040616] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2021] [Revised: 03/25/2021] [Accepted: 03/28/2021] [Indexed: 11/16/2022] Open
Abstract
Deep learning (DL) has drawn tremendous attention for object localization and recognition in both natural and medical images. U-Net segmentation models have demonstrated superior performance compared to conventional hand-crafted feature-based methods. Medical image modality-specific DL models are better at transferring domain knowledge to a relevant target task than those pretrained on stock photography images. This characteristic helps improve model adaptation, generalization, and class-specific region of interest (ROI) localization. In this study, we train chest X-ray (CXR) modality-specific U-Nets and other state-of-the-art U-Net models for semantic segmentation of tuberculosis (TB)-consistent findings. Automated segmentation of such manifestations could help radiologists reduce errors and supplement decision-making while improving patient care and productivity. Our approach uses the publicly available TBX11K CXR dataset with weak TB annotations, typically provided as bounding boxes, to train a set of U-Net models. Next, we improve the results by augmenting the training data with weak localization, postprocessed into an ROI mask, from a DL classifier trained to classify CXRs as showing normal lungs or suspected TB manifestations. Test data are individually derived from the TBX11K CXR training distribution and other cross-institutional collections, including the Shenzhen TB and Montgomery TB CXR datasets. We observe that our augmented training strategy helped the CXR modality-specific U-Net models achieve superior performance with test data derived from the TBX11K CXR training distribution and cross-institutional collections (p < 0.05). We believe that this is the first study to i) use CXR modality-specific U-Nets for semantic segmentation of TB-consistent ROIs and ii) evaluate the segmentation performance while augmenting the training data with weak TB-consistent localizations.
|
25
|
Statistical inference in abstracts of three influential clinical pharmacology journals analyzed using a text-mining algorithm. Br J Clin Pharmacol 2021; 87:4173-4182. [PMID: 33769597 DOI: 10.1111/bcp.14836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2020] [Revised: 03/08/2021] [Accepted: 03/19/2021] [Indexed: 11/30/2022] Open
Abstract
AIM To describe the trend in the prevalence of statistical inference in three influential clinical pharmacology journals. METHODS We applied a computer-based algorithm to abstracts of three clinical pharmacology journals published from 1976 to 2016 to identify statistical inference and its subtypes. Furthermore, we manually reviewed a random sample of 300 articles to assess the algorithm's performance in finding statistical inference in abstracts and as a screening tool for the presence and absence of statistical inference in full text. RESULTS The algorithm identified 59% (13,375/22,516 [mid-p 95% CI, 59%-60%]) of article abstracts as containing statistical inference. The percentage of abstracts with statistical inference was similar in 1976 and 2016: 48% (179/377 [mid-p 95% CI, 42%-52%]) versus 49% (386/791 [mid-p 95% CI, 45%-52%]). Statistical reporting patterns varied among journals. Among abstracts containing any statistical inference in the publications from 1976 to 2016, null-hypothesis significance testing was the most prevalent reported form of statistical inference. The algorithm had high sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for finding statistical inference in abstracts. While the PPV for predicting statistical inference in full text (including abstract, text, tables and figures) was high, the NPV was low. CONCLUSION Despite journals' editorials and statistical associations' guidelines, most authors focused on testing rather than estimation. In the future, better statistical reporting might be ensured by improving authors' statistical knowledge and by adding statistical guidance to journals' instructions to authors, to the extent that editors would like their statistical inference preferences incorporated into submitted manuscripts.
|
26
|
|
27
|
Creating clear and informative image-based figures for scientific publications. PLoS Biol 2021; 19:e3001161. [PMID: 33788834 PMCID: PMC8041175 DOI: 10.1371/journal.pbio.3001161] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2020] [Revised: 04/12/2021] [Accepted: 02/26/2021] [Indexed: 11/18/2022] Open
Abstract
Scientists routinely use images to display data. Readers often examine figures first; therefore, it is important that figures are accessible to a broad audience. Many resources discuss fraudulent image manipulation and technical specifications for image acquisition; however, data on the legibility and interpretability of images are scarce. We systematically examined these factors in non-blot images published in the top 15 journals in 3 fields: plant sciences, cell biology, and physiology (n = 580 papers). Common problems included missing scale bars, misplaced or poorly marked insets, images or labels that were not accessible to colorblind readers, and insufficient explanations of colors, labels, annotations, or the species and tissue or object depicted in the image. Papers that met all good practice criteria examined for all image-based figures were uncommon (physiology 16%, cell biology 12%, plant sciences 2%). We present detailed descriptions and visual examples to help scientists avoid common pitfalls when publishing images. Our recommendations address image magnification, scale information, insets, annotation, and color and may encourage discussion about quality standards for bioimage publishing.
|
28
|
Publications, replication and statistics in physiology plus two neglected curves. J Physiol 2021; 599:1719-1721. [DOI: 10.1113/jp281360] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
|
29
|
Common methodological issues and suggested solutions in bone research. Osteoporos Sarcopenia 2021; 6:161-167. [PMID: 33426303 PMCID: PMC7783208 DOI: 10.1016/j.afos.2020.11.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 11/12/2020] [Accepted: 11/19/2020] [Indexed: 11/30/2022] Open
Abstract
Bone research is a dynamic area of scientific investigation that usually encompasses multiple disciplines. Virtually all basic cellular research, clinical research and epidemiologic research relies on statistical concepts and methodology for inference. This paper discusses common issues and suggested solutions concerning the application of statistical thinking in bone research, particularly in clinical and epidemiological investigations. The issues are sample size estimation, biases and confounders, analysis of longitudinal data, categorization of continuous data, selection of significant variables, over-fitting, P-values, false positive findings, confidence intervals, and Bayesian inference. It is hoped that by adopting the suggested measures the scientific quality of bone research can improve.
|
30
|
Analyzing inter-reader variability affecting deep ensemble learning for COVID-19 detection in chest radiographs. PLoS One 2020; 15:e0242301. [PMID: 33180877 PMCID: PMC7660555 DOI: 10.1371/journal.pone.0242301] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Accepted: 11/01/2020] [Indexed: 01/17/2023] Open
Abstract
Data-driven deep learning (DL) methods using convolutional neural networks (CNNs) demonstrate promising performance in natural image computer vision tasks. However, their use in medical computer vision tasks faces several limitations, viz., (i) adapting to visual characteristics that are unlike natural images; (ii) modeling random noise during training due to stochastic optimization and backpropagation-based learning strategy; (iii) challenges in explaining DL black-box behavior to support clinical decision-making; and (iv) inter-reader variability in the ground truth (GT) annotations affecting learning and evaluation. This study proposes a systematic approach to address these limitations through application to the pandemic-caused need for Coronavirus disease 2019 (COVID-19) detection using chest X-rays (CXRs). Specifically, our contribution highlights significant benefits obtained through (i) pretraining specific to CXRs in transferring and fine-tuning the learned knowledge toward improving COVID-19 detection performance; (ii) using ensembles of the fine-tuned models to further improve performance over individual constituent models; (iii) performing statistical analyses at various learning stages for validating results; (iv) interpreting learned individual and ensemble model behavior through class-selective relevance mapping (CRM)-based region of interest (ROI) localization; and, (v) analyzing inter-reader variability and ensemble localization performance using Simultaneous Truth and Performance Level Estimation (STAPLE) methods. We find that ensemble approaches markedly improved classification and localization performance, and that inter-reader variability and performance level assessment helps guide algorithm design and parameter optimization. To the best of our knowledge, this is the first study to construct ensembles, perform ensemble-based disease ROI localization, and analyze inter-reader variability and algorithm performance for COVID-19 detection in CXRs.
|
31
|
Abstract spin in physiotherapy interventions using virtual reality or robotics: protocol for two Meta-research reviews. PHYSICAL THERAPY REVIEWS 2020. [DOI: 10.1080/10833196.2020.1832708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
32
|
|
33
|
Statistical Reporting in Nursing Research: Addressing a Common Error in Reporting of p Values (p = .000). J Nurs Scholarsh 2020; 52:688-695. [PMID: 32890425 DOI: 10.1111/jnu.12595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/08/2020] [Indexed: 11/30/2022]
Abstract
PURPOSE Confidence in a study is reduced by incorrect representation of statistical results. However, it is unknown to what extent p values are incorrectly represented in published nursing journals. This study aims to evaluate articles in 30 nursing journals for the error of reporting p values as p = .000. DESIGN AND METHODS This was a bibliometric analysis. All papers published in 10 leading nursing journals (between 2015 and 2019), the 10 bottom-ranked nursing journals (2019), and 10 selected key nursing journals (2019) indexed in the Science Citation Index Journal Citation Reports were reviewed to detect errors in reporting of p values (p = .000). RESULTS A total of 3,788 papers were reviewed. Notably, 93.3% (28/30) of the nursing journals contained incorrect representations of p values (p = .000). The reporting rate of these journals ranged from 0% to 57.1%, with an overall rate of 12.8% (486/3,788). In addition, the rate of incorrect representation of p values (p = .000) showed no statistically significant difference between publication years (χ2 = 4.976, p = .290). However, the rate of reporting differed between study types, journals, and regions (p = .007, p = .020, and p < .001, respectively). CONCLUSIONS The incorrect representation of p values is common in nursing journals. CLINICAL RELEVANCE We recommend that both publishers and researchers take responsibility for preventing statistical errors in manuscripts. Furthermore, various kinds of statistical training should be adopted to ensure that nurses and journal reviewers have sufficient statistical literacy.
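The p = .000 error discussed in this abstract (statistical software truncating very small p-values to .000, which should instead be reported as p < .001) is simple enough to screen for mechanically. The sketch below is an illustrative assumption, not the authors' detection procedure; the function names and the exact pattern are hypothetical:

```python
import re

# Matches "p = .000" and "p = 0.000" (any spacing, any case), but not
# legitimate values such as "p = .0001".
P_ZERO = re.compile(r"\bp\s*=\s*[0.]*\.000\b", re.IGNORECASE)

def flag_p_zero(text):
    """Return all 'p = .000'-style strings found in a manuscript excerpt."""
    return P_ZERO.findall(text)

def correct_p_zero(text):
    """Rewrite 'p = .000' as the conventional 'p < .001'."""
    return P_ZERO.sub("p < .001", text)
```

For example, `flag_p_zero("the effect was significant (p = .000)")` flags the parenthetical, while a valid `p = .0001` is left alone.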
|
34
|
Part II: Statistics in practice: Nature of data, descriptive statistics, and presentation of data. JOURNAL OF THE AMERICAN COLLEGE OF CLINICAL PHARMACY 2020. [DOI: 10.1002/jac5.1312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
35
|
Reporting Standards for a Bland-Altman Agreement Analysis: A Review of Methodological Reviews. Diagnostics (Basel) 2020; 10:E334. [PMID: 32456091 PMCID: PMC7278016 DOI: 10.3390/diagnostics10050334] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2020] [Revised: 05/05/2020] [Accepted: 05/20/2020] [Indexed: 12/28/2022] Open
Abstract
The Bland-Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland-Altman analysis. The present study aimed to identify the most comprehensive and appropriate list of items for such an analysis. Seven proposals were identified from a MEDLINE/PubMed search, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland-Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569-575). An exemplification with interrater data from a local study accentuated the straightforwardness of transparent reporting of the Bland-Altman analysis. The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland-Altman agreement analyses.
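The core Bland-Altman computation behind the reports reviewed above is compact. This is a minimal sketch with made-up rater data, omitting the recommended extras (confidence intervals for each estimate, repeatability, and the difference plot) that the 13-item checklist calls for:

```python
import numpy as np

def bland_altman(x, y, z=1.96):
    """Bias and limits of agreement (LoA) for paired measurements x and y.

    Returns (bias, lower_loa, upper_loa), where bias is the mean difference
    and the LoA are bias +/- z * SD of the differences (z = 1.96 for ~95%).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd

# Illustrative paired readings from two raters (made-up numbers).
rater_a = [10.1, 9.8, 10.5, 11.0, 10.2]
rater_b = [10.0, 10.1, 10.2, 10.8, 10.0]
bias, lo, hi = bland_altman(rater_a, rater_b)
```

A full analysis per the recommended items would also plot the differences against the pairwise means and report 95% CIs for the bias and for each limit.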
|
36
|
Reply from Jacob Graves McPherson, Albert Chen, Michael D. Ellis, Jun Yao, C. J. Heckman and Julius P. A. Dewald. J Physiol 2020; 597:4413-4414. [PMID: 31414488 DOI: 10.1113/jp278464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
|
37
|
Rigor and reproducibility for data analysis and design in the behavioral sciences. Behav Res Ther 2020; 126:103552. [PMID: 32014693 DOI: 10.1016/j.brat.2020.103552] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2019] [Revised: 01/02/2020] [Accepted: 01/13/2020] [Indexed: 01/19/2023]
Abstract
The rigor and reproducibility of scientific methods depend heavily on the appropriate use of statistical methods to answer research questions and make meaningful and accurate inferences based on data. The increasing analytic complexity and valuation of novel statistical and methodological approaches to data place greater emphasis on statistical review. We outline the controversies within the statistical sciences that threaten the rigor and reproducibility of research published in the behavioral sciences and discuss ongoing approaches to generate reliable and valid inferences from data. We outline nine major areas to consider for generally evaluating the rigor and reproducibility of published articles and apply this framework to the 116 Behaviour Research and Therapy (BRAT) articles published in 2018. The results of our analysis highlight a pattern of missing rigor and reproducibility elements, especially pre-registration of study hypotheses, links to statistical code/output, and explicit archiving or sharing of the data used in analyses. We recommend that reviewers consider these elements in their peer review and that journals consider publishing the results of these rigor and reproducibility ratings with manuscripts to incentivize authors to publish these elements with their manuscript.
|
38
|
Applications of Raman spectroscopy in the development of cell therapies: state of the art and future perspectives. Analyst 2020; 145:2070-2105. [DOI: 10.1039/c9an01811e] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
This comprehensive review article discusses current and future perspectives of Raman spectroscopy-based analyses of cell therapy processes and products.
|
39
|
A world beyond P: policies, strategies, tactics and advice. Exp Physiol 2019; 105:13-16. [PMID: 31675153 DOI: 10.1113/ep088040] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2019] [Accepted: 10/29/2019] [Indexed: 11/08/2022]
Abstract
A short review of the changing approach to statistics' contribution to the conduct of physiological experiments, with suggestions for further changes and better practice.
|
40
|
New Guidelines for Data Reporting and Statistical Analysis: Helping Authors With Transparency and Rigor in Research. J Bone Miner Res 2019; 34:1981-1984. [PMID: 31648410 DOI: 10.1002/jbmr.3885] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/07/2019] [Revised: 09/27/2019] [Accepted: 10/01/2019] [Indexed: 11/05/2022]
|
41
|
|
42
|
Let’s all play with the same rules. Eur J Appl Physiol 2019; 119:2121-2122. [DOI: 10.1007/s00421-019-04194-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2019] [Accepted: 07/13/2019] [Indexed: 10/26/2022]
|
43
|
Statistics and spin: it's time to improve. J Physiol 2019; 597:4411-4412. [PMID: 31414487 DOI: 10.1113/jp278335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
|
44
|
Correction to 'Open science and modified funding lotteries can impede the natural selection of bad science'. ROYAL SOCIETY OPEN SCIENCE 2019; 6:191249. [PMID: 31543978 PMCID: PMC6731693 DOI: 10.1098/rsos.191249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
[This corrects the article DOI: 10.1098/rsos.190194.].
|
45
|
Poor statistical reporting and spin in neuromuscular fatigue research. Eur J Appl Physiol 2019; 119:2119-2120. [PMID: 31350638 DOI: 10.1007/s00421-019-04193-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2019] [Accepted: 07/13/2019] [Indexed: 10/26/2022]
|
46
|
Open science and modified funding lotteries can impede the natural selection of bad science. ROYAL SOCIETY OPEN SCIENCE 2019; 6:190194. [PMID: 31417725 PMCID: PMC6689639 DOI: 10.1098/rsos.190194] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/04/2019] [Accepted: 06/04/2019] [Indexed: 06/10/2023]
Abstract
Assessing scientists using exploitable metrics can lead to the degradation of research methods even without any strategic behaviour on the part of individuals, via 'the natural selection of bad science.' Institutional incentives to maximize metrics like publication quantity and impact drive this dynamic. Removing these incentives is necessary, but institutional change is slow. However, recent developments suggest possible solutions with more rapid onsets. These include what we call open science improvements, which can reduce publication bias and improve the efficacy of peer review. In addition, there have been increasing calls for funders to move away from prestige- or innovation-based approaches in favour of lotteries. We investigated whether such changes are likely to improve the reproducibility of science even in the presence of persistent incentives for publication quantity through computational modelling. We found that modified lotteries, which allocate funding randomly among proposals that pass a threshold for methodological rigour, effectively reduce the rate of false discoveries, particularly when paired with open science improvements that increase the publication of negative results and improve the quality of peer review. In the absence of funding that targets rigour, open science improvements can still reduce false discoveries in the published literature but are less likely to improve the overall culture of research practices that underlie those publications.
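The modified lottery this entry describes can be illustrated with a toy simulation. This is a minimal sketch, not the authors' model: the function name, all parameter values, and the assumed inverse link between a lab's rigour and its false-discovery rate are illustrative assumptions.

```python
import random

def simulate(n_labs=1000, n_funded=100, rigor_threshold=0.5, lottery=True, seed=1):
    """Toy comparison of productivity-based funding vs. a modified lottery.

    Each lab gets a 'rigor' score in [0, 1]; by assumption, a funded
    lab's chance of producing a false discovery is 1 - rigor.
    """
    rng = random.Random(seed)
    labs = [rng.random() for _ in range(n_labs)]  # rigor scores
    if lottery:
        # Modified lottery: fund at random among labs passing a rigour bar.
        eligible = [r for r in labs if r >= rigor_threshold]
        funded = rng.sample(eligible, k=min(n_funded, len(eligible)))
    else:
        # Productivity proxy: in this toy model, low-rigour labs publish
        # fastest, so selecting on output favours the least rigorous labs.
        funded = sorted(labs)[:n_funded]
    # Average false-discovery rate among funded labs.
    return sum(1 - r for r in funded) / len(funded)
```

Under these assumptions, `simulate(lottery=True)` yields a markedly lower false-discovery rate than `simulate(lottery=False)`, echoing the paper's qualitative conclusion without reproducing its actual model.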
|
47
|
One hertz versus ten hertz repetitive TMS treatment of PTSD: A randomized clinical trial. Psychiatry Res 2019; 273:153-162. [PMID: 30641346 DOI: 10.1016/j.psychres.2019.01.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/09/2018] [Revised: 12/19/2018] [Accepted: 01/01/2019] [Indexed: 02/06/2023]
Abstract
The purpose of this trial was to test whether 1 Hz versus 10 Hz rTMS over the right prefrontal cortex provides a significantly greater improvement in PTSD symptoms and/or function. Veterans 18 to 50 years of age suffering from PTSD were randomized to right prefrontal 1 Hz rTMS [2400 pulses/session] or right prefrontal 10 Hz rTMS [2400 pulses/session]. Treatments were delivered 5 days a week for 6 weeks, with a 3-week taper, using the NeuroStar system. Follow-up evaluations were performed at one and three months post-treatment. Forty-four participants were enrolled, with 17 randomized to 1 Hz rTMS and 18 to 10 Hz rTMS. Both groups had significant improvement in PTSD and depression scores from baseline to the end of acute treatment. The 10 Hz group, but not the 1 Hz group, demonstrated significant improvement in function. Although both groups demonstrated significant improvement in PTSD and depression symptoms, no significant advantage for either the 1 Hz or the 10 Hz group was demonstrated on any of the scales acquired. Further work with larger sample sizes is required to test whether low or high frequency is superior, or whether individual differences indicate the more effective frequency.
|
48
|
Why we need to report more than 'Data were Analyzed by t-tests or ANOVA'. eLife 2018; 7:e36163. [PMID: 30574870 PMCID: PMC6326723 DOI: 10.7554/elife.36163] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2018] [Accepted: 12/16/2018] [Indexed: 12/18/2022] Open
Abstract
Transparent reporting is essential for the critical evaluation of studies. However, the reporting of statistical methods for studies in the biomedical sciences is often limited. This systematic review examines the quality of reporting for two statistical tests, t-tests and ANOVA, for papers published in a selection of physiology journals in June 2017. Of the 328 original research articles examined, 277 (84.5%) included an ANOVA or t-test or both. However, papers in our sample were routinely missing essential information about both types of tests: 213 papers (95% of the papers that used ANOVA) did not contain the information needed to determine what type of ANOVA was performed, and 26.7% of papers did not specify what post-hoc test was performed. Most papers also omitted the information needed to verify ANOVA results. Essential information about t-tests was also missing in many papers. We conclude by discussing measures that could be taken to improve the quality of reporting.
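The reporting elements this review finds missing (the exact test variant, degrees of freedom, an exact p-value, an effect size) can be sketched in a brief example. The sample data are invented for illustration, `scipy` is assumed to be installed, and Welch's t-test with Cohen's d is one reasonable reporting choice rather than the paper's prescription.

```python
import statistics
from scipy import stats

control = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9]
treated = [5.2, 5.8, 4.9, 6.1, 5.5, 5.0]

# Name the variant: Welch's t-test (unequal variances), not a bare "t-test".
res = stats.ttest_ind(treated, control, equal_var=False)

# Welch-Satterthwaite degrees of freedom, computed explicitly so they
# can be reported alongside the statistic.
v1 = statistics.variance(treated) / len(treated)
v2 = statistics.variance(control) / len(control)
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(treated) - 1) + v2 ** 2 / (len(control) - 1))

# Cohen's d from the pooled standard deviation, as one effect-size option.
sp = ((statistics.variance(treated) * (len(treated) - 1)
       + statistics.variance(control) * (len(control) - 1))
      / (len(treated) + len(control) - 2)) ** 0.5
d = (statistics.mean(treated) - statistics.mean(control)) / sp

print(f"Welch's t({df:.1f}) = {res.statistic:.2f}, p = {res.pvalue:.4f}, d = {d:.2f}")
```

A sentence built from these values ("Welch's t(10.0) = ..., p = ..., d = ...") contains everything a reader needs to verify the result, which is precisely the information the surveyed papers tended to omit.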
|
49
|
Confidence intervals that cross zero must be interpreted correctly. Scand J Med Sci Sports 2018; 29:476-477. [PMID: 30506736 DOI: 10.1111/sms.13352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|