1. Lucas SL, Carroll AH, Backstrom ZK, Dylan Pasko KB, Mesfin A. Utilization of the Fragility Index to Assess Randomized Controlled Trials Comparing Cervical Total Disc Arthroplasty to Anterior Cervical Discectomy and Fusion. Global Spine J 2025:21925682251341812. [PMID: 40347150 PMCID: PMC12065715 DOI: 10.1177/21925682251341812] [Received: 11/27/2024] [Revised: 04/11/2025] [Accepted: 04/28/2025] [Indexed: 05/12/2025] Open
Abstract
Study design: Systematic Review. Objectives: Cervical total disc arthroplasty (CTDA) remains an alternative to anterior cervical discectomy and fusion (ACDF) in select patients with cervical radiculopathy or myelopathy secondary to degenerative disc disease. Studies comparing CTDA to ACDF often have conflicting conclusions and varying quality. The purpose of this study was to utilize the fragility index (FI) to assess the robustness of randomized controlled trials (RCT) comparing CTDA to ACDF. Methods: A systematic review was performed by searching PubMed, Ovid MEDLINE, Web of Science, and Embase for RCTs with 2 parallel study arms and 1:1 allocation of subjects investigating CTDA vs ACDF with at least 1 statistically significant, dichotomous outcome. The FI was calculated by individually shifting 1 patient from the event group to the non-event group with re-calculation of Fisher's Exact test until the reported P value was no longer statistically significant (P > 0.05). Results: The search identified 934 abstracts with 19 RCTs meeting inclusion criteria. The mean patient sample size was 276.4 (median 209, range 30-541). The number of patients lost to follow-up ranged from 0-229 (mean 69.7, median 45). The mean FI was 4.6 (range 0-30, median 2) with 3 (13.6%) of the studies having an associated FI of 0. Loss to follow-up exceeded the fragility index in all but 2 studies. Conclusion: RCTs comparing ACDF to CTDA are often fragile, with only 1-2 patients experiencing an alternative outcome or lost to follow-up needed to change the studied outcome.
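The FI procedure described in the Methods can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes the commonly used convention of converting non-events to events in the arm with fewer events, a two-sided Fisher's exact test computed directly from the hypergeometric distribution, and a 0.05 threshold. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Count non-event -> event flips in the lower-event arm until p >= alpha.

    Returns 0 if the result is already non-significant, or None if the
    arm runs out of patients to flip before significance is lost.
    """
    if events_a <= events_b:
        low, n_low, high, n_high = events_a, n_a, events_b, n_b
    else:
        low, n_low, high, n_high = events_b, n_b, events_a, n_a

    flips = 0
    while fisher_exact_p(low, n_low - low, high, n_high - high) < alpha:
        if low == n_low:
            return None  # nothing left to flip in this arm
        low += 1
        flips += 1
    return flips


# Hypothetical trial: 1/50 events in one arm versus 10/50 in the other.
print(fragility_index(1, 50, 10, 50))
```

Because the total number of patients stays fixed, each flip only redistributes outcomes within one arm; this is what distinguishes the FI from adding patients back into a trial.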
Affiliation(s)
- Sarah L. Lucas
  - Georgetown University School of Medicine, Washington, DC, USA
- Austin H. Carroll
  - Department of Orthopaedic Surgery, MedStar Georgetown University Hospital, Washington, DC, USA
- Addisu Mesfin
  - Department of Orthopaedic Surgery, MedStar Georgetown University Hospital, Washington, DC, USA
  - Department of Orthopaedic Surgery, MedStar Washington Hospital Center, Washington, DC, USA
2. Oeding JF, Krych AJ, Camp CL, Varady NH. The Number of Patients Lost to Follow-Up May Exceed the Fragility Index of a Randomized Controlled Trial Without Reversing Statistical Significance: A Systematic Review and Statistical Model. Arthroscopy 2025; 41:442-451.e1. [PMID: 38777001 DOI: 10.1016/j.arthro.2024.05.006] [Received: 11/13/2023] [Revised: 04/21/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE To (1) analyze trends in the publishing of statistical fragility index (FI)-based systematic reviews in the orthopaedic literature, including the prevalence of misleading or inaccurate statements related to the statistical fragility of randomized controlled trials (RCTs) and patients lost to follow-up (LTF), and (2) determine whether RCTs with relatively "low" FIs are truly as sensitive to patients LTF as previously portrayed in the literature. METHODS All FI-based studies published in the orthopaedic literature were identified using the Cochrane Database of Systematic Reviews, Web of Science Core Collection, PubMed, and MEDLINE databases. All articles involving application of the FI or reverse FI to study the statistical fragility of studies in orthopaedics were eligible for inclusion in the study. Study characteristics, median FIs and sample sizes, and misleading or inaccurate statements related to the FI and patients LTF were recorded. Misleading or inaccurate statements (defined as those basing conclusions of trial fragility on the false assumption that adding patients LTF back to a trial has the same statistical effect as existing patients in a trial experiencing the opposite outcome) were determined by 2 authors. A theoretical RCT with a sample size of 100, P = .006, and FI of 4 was used to evaluate the difference in effect on statistical significance between flipping outcome events of patients already included in the trial (FI) and adding patients LTF back to the trial to show the true sensitivity of RCTs to patients LTF. RESULTS Of the 39 FI-based studies, 37 (95%) directly compared the FI with the number of patients LTF. Of these 37 studies, 22 (59%) included a statement regarding the FI and patients LTF that was determined to be inaccurate or misleading. In the theoretical RCT, a reversal of significance was not observed until 7 patients LTF (nearly twice the FI) were added to the trial in the distribution of maximal significance reversal. CONCLUSIONS The claim that any RCT in which the number of patients LTF exceeds the FI could potentially have its significance reversed simply by maintaining study follow-ups is commonly inaccurate and prevalent in orthopaedic studies applying the FI. Patients LTF and the FI are not equivalent. The minimum number of patients LTF required to flip the significance of a typical RCT was shown to be greater than the FI, suggesting that RCTs with relatively low FIs may not be as sensitive to patients LTF as previously portrayed in the literature; however, only a holistic approach that considers the context in which the trial was conducted, potential biases, and study results can determine the merits of any particular RCT. CLINICAL RELEVANCE Surgeons may benefit from re-examining their interpretation of prior FI reviews that have made claims of substantial RCT fragility based on comparisons between the FI and patients LTF; it is possible the results are more robust than previously believed.
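The difference between flipping outcomes of enrolled patients and adding patients back can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' statistical model: it uses a two-sided Fisher's exact test, hypothetical event counts (not the theoretical RCT from the abstract), and a brute-force search in which each added patient joins either the lower-event arm as an event or the higher-event arm as a non-event. All function names are invented for this example.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def flips_to_reverse(low, n_low, high, n_high, alpha=0.05):
    """FI-style count: flip non-events to events inside the lower-event arm.

    The sample size never changes. Stops early if the arm is exhausted.
    """
    flips = 0
    while low < n_low and fisher_exact_p(low, n_low - low, high, n_high - high) < alpha:
        low += 1
        flips += 1
    return flips


def additions_to_reverse(low, n_low, high, n_high, alpha=0.05, max_added=200):
    """Smallest number of added patients that yields p >= alpha.

    Every split of the added patients is tried: some enter the lower-event
    arm as events, the rest enter the higher-event arm as non-events.
    Returns None if no split within max_added patients works.
    """
    non_low, non_high = n_low - low, n_high - high
    for added in range(max_added + 1):
        for as_events in range(added + 1):
            as_non_events = added - as_events
            p = fisher_exact_p(low + as_events, non_low, high, non_high + as_non_events)
            if p >= alpha:
                return added
    return None


# Hypothetical trial, lower-event arm first: 1/50 versus 10/50 events.
fi = flips_to_reverse(1, 50, 10, 50)
ltf = additions_to_reverse(1, 50, 10, 50)
print(f"outcome flips needed: {fi}; added-back patients needed: {ltf}")
```

The two functions answer different questions: the first holds the sample size fixed, while the second enlarges the trial, so their results need not coincide for a given table.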
Affiliation(s)
- Jacob F Oeding
  - School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, Minnesota, U.S.A.; Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
- Aaron J Krych
  - Department of Orthopaedic Surgery, Mayo Clinic, Rochester, Minnesota, U.S.A
- Christopher L Camp
  - Department of Orthopaedic Surgery, Mayo Clinic, Rochester, Minnesota, U.S.A
- Nathan H Varady
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, U.S.A
3. Khan NS, Dhanda AK, Takashima M, Liu R, Yoshiyasu Y, Wu W, Jin W, McCoul ED, Ramanathan M, Ahmed OG. What is the robustness of randomized controlled trials supporting rhinosinusitis guidelines? Am J Otolaryngol 2025; 46:104575. [PMID: 39740532 DOI: 10.1016/j.amjoto.2024.104575] [Received: 09/30/2024] [Accepted: 12/17/2024] [Indexed: 01/02/2025]
Abstract
PURPOSE To determine the robustness of randomized controlled trials (RCTs) supporting the current rhinosinusitis guideline, the International Consensus Statement on Allergy and Rhinology: Rhinosinusitis (ICAR-RS). MATERIALS & METHODS RCTs referenced by ICAR-RS with primary dichotomous outcomes were analyzed. The Fragility Index (FI) was calculated for trials with statistically significant findings. Trial characteristics, the FI, and FI minus number lost to follow-up (LTF) were assessed for associations. RESULTS A total of 317 RCTs were identified, with 38 trials possessing a primary dichotomous outcome. Thirty-one percent evaluated surgical interventions and 24% were industry-sponsored. The mean sample size was 116 with 9 patients, on average, LTF. Sixty-three percent were eligible for FI calculation and had a median FI of 2.5 (IQR 1, 4.25). Sixty-seven percent of trials had an FI ≤ 3, indicating low robustness. No difference in FI was observed between trials with and without industry support (p = 0.577). The FI was less than or equal to the number of patients LTF in 33% of trials (n = 8). Higher FI was strongly correlated with higher sample size, total number of events, p-value, and grade of recommendation (p < 0.001). After adjusting for covariates, higher sample size and total number of events were associated with higher FI. CONCLUSION The RCTs used to support the ICAR-RS have an overall low robustness, and future rhinosinusitis trials should report FI measures to provide improved context for their results.
Affiliation(s)
- Najm S Khan
  - Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA; Department of Otolaryngology - Head and Neck Surgery, Houston Methodist, Houston, TX, USA.
- Aatin K Dhanda
  - Department of Otolaryngology - Head and Neck Surgery, Houston Methodist, Houston, TX, USA
- Masayoshi Takashima
  - Department of Otolaryngology - Head and Neck Surgery, Houston Methodist, Houston, TX, USA
- Richard Liu
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, NY, New York, USA
- Yuki Yoshiyasu
  - Department of Otolaryngology-Head and Neck Surgery, University of Texas Medical Branch, Galveston, TX, USA
- Wenbo Wu
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, NY, New York, USA
- Whitney Jin
  - Baylor College of Medicine, Houston, TX, USA
- Edward D McCoul
  - Department of Otolaryngology - Head and Neck Surgery, Tulane University School of Medicine, New Orleans, LA, USA; Department of Otorhinolaryngology and Communication Sciences, Ochsner Clinic Foundation, New Orleans, LA, USA
- Murugappan Ramanathan
  - Department of Otolaryngology - Head and Neck Surgery, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Omar G Ahmed
  - Department of Otolaryngology - Head and Neck Surgery, Houston Methodist, Houston, TX, USA
4. Yu A, Mohamed KS, Kurapatti M, Song J, Huang JJ, Singh P, Alasadi Y, Grewal A, Yendluri A, Namiri N, Corvi J, Kim JS, Cho SK. The statistical fragility of vertebroplasty outcomes: A systematic review of randomized controlled trials. J Craniovertebr Junction Spine 2025; 16:26-33. [PMID: 40292175 PMCID: PMC12029381 DOI: 10.4103/jcvjs.jcvjs_13_25] [Received: 01/18/2025] [Accepted: 02/08/2025] [Indexed: 04/30/2025] Open
Abstract
Randomized clinical trials (RCTs) on vertebroplasty are crucial for guiding the treatment of vertebral compression fractures, but their overlooked statistical fragility can undermine clinical reliability. Minor outcome changes may overturn significant findings, risking unreliable evidence, and impacting patient care. This study assessed the fragility of significant outcomes in vertebroplasty RCTs, hypothesizing high sensitivity to such changes. PubMed, Embase, and MEDLINE were searched for RCTs on vertebroplasty reporting dichotomous outcomes. The fragility index (FI) and reverse FI quantified the number of outcome reversals needed to change statistical significance for significant and nonsignificant results, respectively. The fragility quotient (FQ) was calculated as the FI divided by the study sample size. Subgroup analysis was conducted by outcome category. A total of 276 outcomes from RCTs were analyzed. The median FI was 5 (interquartile range [IQR]: 4-5), with a FQ of 0.053 (IQR: 0.019-0.088). Statistically significant outcomes (n = 36) had a median FI of 3 (IQR: 2-4) and FQ of 0.034 (IQR: 0.018-0.051), whereas nonsignificant outcomes (n = 240) showed a median FI of 5 (IQR: 4-5) and FQ of 0.062 (IQR: 0.021-0.088). Fracture-related outcomes were the most robust (FI: 5, FQ: 0.088), whereas cement leakage was the most fragile (FI: 3, FQ: 0.041). Pain outcomes had an FI of 5 (FQ: 0.062), and complications and vertebroplasty versus kyphoplasty outcomes were more robust (FI: 5, FQ: 0.013). Patients lost to follow-up exceeded the FI in 79% of outcomes. The statistical findings in vertebroplasty RCTs are fragile and warrant cautious interpretation. A small number of outcome reversals or consistent postoperative follow-up can shift the significance of the results. Standardized reporting of P values alongside FI and FQ metrics is recommended to help clinicians evaluate the robustness of study findings.
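The fragility quotient defined above is a simple normalization, sketched here for concreteness. The function name and the example numbers are hypothetical and are not taken from any trial in this review.

```python
def fragility_quotient(fragility_index, sample_size):
    """Fragility quotient: the FI (or reverse FI) divided by the sample size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return fragility_index / sample_size


# Hypothetical outcome: an FI of 4 in a trial of 100 patients.
fq = fragility_quotient(4, 100)
print(f"FQ = {fq:.3f}, i.e. about {fq * 100:.1f} outcome reversals per 100 patients")
```

Expressing fragility per patient makes outcomes from trials of different sizes comparable, which a raw FI does not.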
Affiliation(s)
- Alexander Yu
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kareem S. Mohamed
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mark Kurapatti
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Junho Song
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jonathan J. Huang
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Prabhjot Singh
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yazan Alasadi
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Abhijeet Grewal
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Avanish Yendluri
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nikan Namiri
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John Corvi
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jun S. Kim
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Samuel K. Cho
  - Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
5. Byrne R, Ahn B, Zhao L, Quinn M, Naphade O, Owens BD. The Statistical Fragility of Lateral Extra-articular Tenodesis Research: A Systematic Review. Orthop J Sports Med 2024; 12:23259671241266329. [PMID: 39221044 PMCID: PMC11363240 DOI: 10.1177/23259671241266329] [Received: 09/16/2023] [Accepted: 10/05/2023] [Indexed: 09/04/2024] Open
Abstract
Background A P value of <.05 is often used to denote statistical significance; however, in many scenarios, this threshold is vulnerable to a small number of outcome reversals. This study joins a body of studies within the orthopaedic literature that evaluate the statistical fragility of existing research via metrics such as fragility index (FI) and fragility quotient (FQ). Purpose/Hypothesis The purpose of this study was to investigate the statistical fragility of randomized controlled trials (RCTs) and comparative studies on the topic, given the resurgent interest in lateral extra-articular tenodesis (LET) to augment primary or revision anterior cruciate ligament reconstruction (ACLR). It was hypothesized that the outcomes reported in these studies would be statistically fragile. Study Design Systematic review; Level of evidence, 4. Methods Comparative studies and RCTs regarding LET as an adjunct procedure to ACLR published between 2000 and 2022 were analyzed. Descriptive characteristics, dichotomous outcomes, and continuous outcomes were extracted. The FI and continuous FI (CFI) were calculated by the number of event reversals to change significance; the FQ and continuous FQ (CFQ) were calculated to normalize the fragility metrics per sample size. Results Of 455 studies screened, 29 studies were included (9 RCTs, 20 comparative); 79.3% of included studies were published after 2020. A total of 48 dichotomous and 265 continuous outcomes were analyzed. The median FI was 9.0 (IQR, 7.0-13.3), with FQ of 0.1 (IQR, 0.04-0.17); the median CFI was 7.8 (IQR, 4.2-19.6), with CFQ of 0.12 (IQR, 0.08-0.19). The FQ and CFQ for studies on LET with revision ACLR were larger (0.117 and 0.113, respectively) than those focused on primary ACLR (0.042 and 0.095, respectively). Conclusion Studies focused on LET with primary ACLR were more fragile than those on LET with revision, which suggests that further research on the indications for LET with primary ACLR is necessary. Future orthopaedic comparative research should include fragility metrics alongside traditional P values.
Affiliation(s)
- Rory Byrne
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
- Benjamin Ahn
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
- Leon Zhao
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
- Matthew Quinn
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
- Om Naphade
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
- Brett D. Owens
  - Department of Orthopaedic Surgery, Alpert Medical School of Brown University, Providence, Rhode Island, USA
6. Proal JD, Moon AS, Kwon B. The fragility index and reverse fragility index of FDA investigational device exemption trials in spinal fusion surgery: a systematic review. Eur Spine J 2024; 33:2594-2603. [PMID: 38802596 DOI: 10.1007/s00586-024-08317-3] [Received: 01/09/2024] [Revised: 04/20/2024] [Accepted: 05/16/2024] [Indexed: 05/29/2024]
Abstract
PURPOSE FDA investigational device exemption (IDE) studies are considered a gold standard of assessing safety and efficacy of novel devices through RCTs. The fragility index (FI) has emerged as a means to assess robustness of statistically significant study results and, inversely, the reverse fragility index (RFI) for non-significant differences. Previous authors have defined results as fragile if loss to follow-up is greater than the FI or RFI. The aim of this study was to assess the FI, RFI, and robustness of data supplied by IDE studies in spinal surgery. METHODS This was a systematic review of the literature. Inclusion criteria included randomized controlled trials with dichotomous outcome measures conducted under IDE guidelines between 2000 and 2023. FI and RFI were calculated through successively changing events to non-events until the outcome changed to non-significance or significance, respectively. The fragility quotient (FQ) and reverse fragility quotient (RFQ) were calculated by dividing the FI and RFI, respectively, by the sample size. RESULTS Thirty-two studies met inclusion criteria with a total of 40 unique outcome measures; 240 outcomes were analyzed. Twenty-six studies reported 96 statistically significant results. The median FI was 6 (IQR: 3-9.25), and the number of patients lost to follow-up was greater than the FI in 99.0% (95/96) of results. The average FQ was 0.027. Thirty studies reported 144 statistically insignificant results and a median RFI of 6 (IQR: 4-8). The average RFQ extrapolated was 0.021, and loss to follow-up was greater than the RFI in 98.6% (142/144) of results. CONCLUSIONS IDE studies in spine surgery are surprisingly fragile given their reputations, large sample sizes, and intent to establish safety in investigational devices. This study found a median FI and RFI of 6. The number of patients lost to follow-up was greater than the FI and RFI in 98.8% (237/240) of reported outcomes. FQ and RFQ tell us that changes of two to three patients per hundred can flip the significance of reported outcomes. This is an important reminder of the limitations of RCTs. Analysis of fragility in future studies may help clarify the strength of the relationship between reported data and their conclusions.
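The reverse fragility index can be sketched as the mirror image of the FI: outcomes are changed one at a time until a non-significant result becomes significant. The code below is a minimal illustration under assumptions, not the authors' method: it uses a two-sided Fisher's exact test, converts events to non-events in the arm with fewer events, and falls back to converting non-events to events in the other arm only when the first arm has no events left. The function names are hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def reverse_fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Count outcome changes needed to make a non-significant result significant.

    Returns 0 if the result is already significant, or None if no further
    change is possible before significance is reached.
    """
    if events_a <= events_b:
        low, n_low, high, n_high = events_a, n_a, events_b, n_b
    else:
        low, n_low, high, n_high = events_b, n_b, events_a, n_a

    flips = 0
    while fisher_exact_p(low, n_low - low, high, n_high - high) >= alpha:
        if low > 0:
            low -= 1       # event -> non-event in the lower-event arm
        elif high < n_high:
            high += 1      # fallback: non-event -> event in the other arm
        else:
            return None
        flips += 1
    return flips


# Hypothetical non-significant outcome: 10/50 events in each arm.
print(reverse_fragility_index(10, 50, 10, 50))
```

Dividing the returned count by the sample size gives the reverse fragility quotient in the same way the FQ is derived from the FI.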
Affiliation(s)
- Joshua D Proal
  - Tufts University School of Medicine, 145 Harrison Ave, Boston, MA, 02111, USA.
- Andrew S Moon
  - Department of Orthopedic Surgery, Tufts Medical Center, Tufts University School of Medicine, 800 Washington St, Tufts MC Box #306, Boston, MA, 02111, USA
- Brian Kwon
  - New England Baptist Hospital, Department of Orthopaedic Surgery, 125 Parker Hill Ave, Boston, MA, 02120, USA
7. Zabat MA, Giakas AM, Hohmann AL, Lonner JH. Interpreting the Current Literature on Outcomes of Robotic-Assisted Versus Conventional Total Knee Arthroplasty Using Fragility Analysis: A Systematic Review and Cross-Sectional Study of Randomized Controlled Trials. J Arthroplasty 2024; 39:1882-1887. [PMID: 38309638 DOI: 10.1016/j.arth.2024.01.044] [Received: 10/04/2023] [Revised: 01/18/2024] [Accepted: 01/24/2024] [Indexed: 02/05/2024] Open
Abstract
BACKGROUND Fragility analysis is a method of further characterizing outcomes in terms of the stability of statistical findings. This study assesses the statistical fragility of recent randomized controlled trials (RCTs) evaluating robotic-assisted versus conventional total knee arthroplasty (RA-TKA versus C-TKA). METHODS We queried PubMed for RCTs comparing alignment, function, and outcomes between RA-TKA and C-TKA. Fragility index (FI) and reverse fragility index (RFI) (collectively, "FI") were calculated for dichotomous outcomes as the number of outcome reversals needed to change statistical significance. Fragility quotient (FQ) was calculated by dividing the FI by the sample size for that outcome event. Median FI and FQ were calculated for all outcomes collectively as well as for each individual outcome. Subanalyses were performed to assess FI and FQ based on outcome event type and statistical significance, as well as study loss to follow-up and year of publication. RESULTS The overall median FI was 3.0 (interquartile range, [IQR] 1.0 to 6.3) and the median reverse fragility index was 3.0 (IQR 2.0 to 4.0). The overall median FQ was 0.027 (IQR 0.012 to 0.050). Loss to follow-up was greater than FI for 23 of the 38 outcomes assessed. CONCLUSIONS A small number of alternative outcomes is often enough to reverse the statistical significance of findings in RCTs evaluating dichotomous outcomes in RA-TKA versus C-TKA. We recommend reporting FI and FQ alongside P values to improve the interpretability of RCT results.
Affiliation(s)
- Michelle A Zabat
  - Department of Orthopaedic Surgery, NYU Langone Orthopaedic Hospital, New York, New York
- Alec M Giakas
  - Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
- Alexandra L Hohmann
  - Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
- Jess H Lonner
  - Rothman Orthopaedic Institute at Thomas Jefferson University, Philadelphia, Pennsylvania
8. Dennstädt F, Zink J, Putora PM, Hastings J, Cihoric N. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev 2024; 13:158. [PMID: 38879534 PMCID: PMC11180407 DOI: 10.1186/s13643-024-02575-4] [Received: 06/17/2023] [Accepted: 05/30/2024] [Indexed: 06/19/2024] Open
Abstract
BACKGROUND Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose. METHODS LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review. RESULTS The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaptation of the instruction prompt and/or changing the range of the Likert scale from 1-5 to 1-10) had a considerable impact on the performance. CONCLUSIONS LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
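The pipeline described in the Methods (a structured prompt, a Likert relevance score, and a threshold that turns scores into include/exclude decisions) can be sketched as follows. This is an illustration only: the prompt layout, the function names, and the example scores are hypothetical, the authors' actual script is not reproduced here, and the call to an LLM is deliberately left out.

```python
def build_prompt(instruction, title, abstract, criteria):
    """Assemble one structured prompt string (the layout is an assumption)."""
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    return (
        f"{instruction}\n\n"
        f"Title: {title}\n\n"
        f"Abstract: {abstract}\n\n"
        f"Relevant criteria:\n{criteria_text}"
    )


def classify(scores, threshold):
    """Include a publication when its Likert score reaches the threshold."""
    return [score >= threshold for score in scores]


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for boolean include decisions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical Likert scores (1-5) and human reference labels.
scores = [5, 4, 2, 1, 3, 3]
relevant = [True, True, False, False, True, False]

for threshold in (3, 4):
    sens, spec = sensitivity_specificity(classify(scores, threshold), relevant)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is why a single set of scores defines a family of classifiers rather than one.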
Affiliation(s)
- Fabio Dennstädt
  - Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland.
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
- Johannes Zink
  - Institute for Computer Science, University of Würzburg, Würzburg, Germany
- Paul Martin Putora
  - Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
- Janna Hastings
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - School of Medicine, University of St. Gallen, St. Gallen, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nikola Cihoric
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
9. Skorochod R, Gronovich Y. Fragility Index and Fragility Quotient in Statistically Significant Randomized Controlled Trials in Plastic Breast Surgery. Plast Reconstr Surg Glob Open 2024; 12:e5916. [PMID: 38903137 PMCID: PMC11188868 DOI: 10.1097/gox.0000000000005916] [Received: 03/05/2024] [Accepted: 05/01/2024] [Indexed: 06/22/2024]
Abstract
Background The fragility index (FI) was conceived as an adjunct to the P value, signifying the strength of statistically significant results. The index states the minimal number of patients whose outcome must be changed from "event" to "nonevent" for the results to be statistically nonsignificant. The FI was applied in various medical specialties to assess the robustness of results presented in studies. We aim to assess the robustness of statistically significant results in studies on plastic surgery of the breast and determine factors correlated with studies deemed fragile. Methods A systematic literature review of PubMed databases using designated keywords was performed. Background characteristics were extracted from the studies, alongside the significance of outcomes. FI and fragility quotient were calculated for each analyzed outcome and correlated with various baseline characteristics. Results FI and fragility quotient were both significantly correlated only with the P value of the analyzed outcomes. However, grouping studies based on the P value into three categories did not demonstrate a difference in FI. Comparisons of fragile and robust studies did not demonstrate a statistically significant change in terms of baseline variables, except for the mean P value of the outcome. Conclusion Statistically significant results of randomized controlled trials in plastic surgery of the breast suffer from extensive fragility, and researchers should critically implement their conclusions in their practice.
Affiliation(s)
- Ron Skorochod
  - Department of Plastic and Reconstructive Surgery, Shaare Zedek Medical Center; Hebrew University Faculty of Medicine, Jerusalem, Israel
- Yoav Gronovich
  - Department of Plastic and Reconstructive Surgery, Shaare Zedek Medical Center; Hebrew University Faculty of Medicine, Jerusalem, Israel
10. Meade M, Buchan L, Stark M, Woods B. Evidence-Based Medicine and Observational Studies. Clin Spine Surg 2024; 37:242-244. [PMID: 37941105 DOI: 10.1097/bsd.0000000000001550] [Received: 01/29/2023] [Accepted: 10/03/2023] [Indexed: 11/10/2023]
Abstract
Evidence-based medicine drives medical decision-making in the modern era, which has historically favored randomized control trials. Despite their notoriety, randomized control trials have multiple disadvantages when applied to spinal surgery. Observational studies are popular in spinal surgery literature and are seen in various forms, such as retrospective studies and prospective cohort studies. For researchers, learners, and practicing spine surgeons, this paper describes options for study design when applied to spinal surgery.
Affiliation(s)
- Matthew Meade
  - Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Levi Buchan
  - Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Michael Stark
  - Division of Orthopaedic Surgery, Jefferson Health, Stratford, NJ
- Barrett Woods
  - The Rothman Institute at Thomas Jefferson University, Philadelphia, PA
11. Brown AN, Yendluri A, Lawrence KW, Cordero JK, Moucha CS, Hayden BL, Parisien RL. The Statistical Fragility of Tranexamic Acid Use in the Orthopaedic Surgery Literature: A Systematic Review of Randomized Controlled Trials. J Am Acad Orthop Surg 2024; 32:508-515. [PMID: 38574390 DOI: 10.5435/jaaos-d-23-00503] [Received: 08/06/2023] [Accepted: 02/15/2024] [Indexed: 04/06/2024] Open
Abstract
INTRODUCTION Randomized controlled trials (RCTs) represent the highest level of evidence in orthopaedic surgery literature, although the robustness of statistical findings in these trials may be unreliable. We used the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to evaluate the statistical stability of outcomes reported in RCTs that assess the use of tranexamic acid (TXA) across orthopaedic subspecialties. METHODS PubMed, EMBASE, and MEDLINE were queried for RCTs (2010-present) reporting dichotomous outcomes with study groups stratified by TXA administration. The FI and rFI were defined as the number of outcome event reversals needed to alter the significance level of significant and nonsignificant outcomes, respectively. FQ was determined by dividing the FI or rFI by sample size. Subgroup analyses were conducted based on orthopaedic subspecialty. RESULTS Six hundred five RCTs were screened with 108 studies included for analysis comprising 192 total outcomes. The median FI of the 192 outcomes was 4 (IQR 2 to 5) with an associated FQ of 0.03 (IQR 0.019 to 0.050). 45 outcomes were reported as statistically significant with a median FI of 1 (IQR 1 to 5) and associated FQ of 0.02 (IQR 0.011 to 0.034). 147 outcomes were reported as nonsignificant with a median rFI of 4 (IQR 3 to 5) and associated FQ of 0.04 (IQR 0.023 to 0.051). The adult reconstruction, trauma, and spine subspecialties had a median FI of 4. Sports had a median FI of 3. Shoulder and elbow and foot and ankle had median FIs of 6. DISCUSSION Statistical outcomes reported in RCTs on the use of TXA in orthopaedic surgery are fragile. Reversal of a few outcomes is sufficient to alter statistical significance. We recommend reporting FI, rFI, and FQ metrics to aid in interpreting the outcomes reported in comparative trials.
Affiliation(s)
- Ashley N Brown
- From the Icahn School of Medicine at Mount Sinai, New York, NY (Brown, Yendluri, Cordero, Moucha, Hayden, Parisien), and the Boston University School of Medicine, Boston, MA (Lawrence)
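The FI and FQ definitions in the abstract above can be illustrated with a minimal Python sketch. It assumes a two-arm trial with a dichotomous outcome, a two-sided Fisher exact test, and one common convention: converting non-events to events, one patient at a time, in the arm with the lower event proportion. The function names (`fisher_exact_two_sided`, `fragility_index`, `fragility_quotient`) are illustrative and do not come from the cited studies, whose exact procedures may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: events and non-events in arm 1; c, d: events and non-events in arm 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in arm 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Count event reversals needed before the Fisher p-value reaches alpha.

    Returns 0 when the Fisher test is already nonsignificant.
    """
    e = [events1, events2]
    n = [n1, n2]
    # Assumption: modify the arm with the lower event proportion.
    low = 0 if events1 / n1 <= events2 / n2 else 1
    fi = 0
    while e[low] < n[low]:
        p = fisher_exact_two_sided(e[0], n[0] - e[0], e[1], n[1] - e[1])
        if p >= alpha:
            break
        e[low] += 1  # one non-event becomes an event; arm size is unchanged
        fi += 1
    return fi


def fragility_quotient(fi, total_sample_size):
    """FQ as described in the abstract: FI divided by the sample size."""
    return fi / total_sample_size


# Hypothetical trial: 0/10 events in one arm versus 5/10 in the other.
fi = fragility_index(0, 10, 5, 10)
print(fi, fragility_quotient(fi, 20))
```

An FI of 0 arises in this sketch when a result reported as significant (for example, by a chi-square test) is not significant under the Fisher exact test.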
12
Suresh NV, Go BC, Fritz CG, Harris J, Ahluwalia V, Xu K, Lu J, Rajasekaran K. The fragility index: how robust are the outcomes of head and neck cancer randomised, controlled trials? J Laryngol Otol 2024; 138:451-456. [PMID: 37795709 PMCID: PMC10950446 DOI: 10.1017/s0022215123001755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 08/12/2023] [Accepted: 08/29/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND The fragility index represents the minimum number of patients required to convert an outcome from statistically significant to insignificant. This report assesses the fragility index of head and neck cancer randomised, controlled trials. METHODS Studies were extracted from PubMed/Medline, Scopus, Embase and Cochrane databases. RESULTS Overall, 123 randomised, controlled trials were included. The sample size and fragility index medians (interquartile ranges) were 103 (56-213) and 2 (0-5), respectively. The fragility index exceeded the number of patients lost to follow up in 42.3 per cent (n = 52) of studies. A higher fragility index correlated with higher sample size (r = 0.514, p < 0.001), number of events (r = 0.449, p < 0.001) and statistical significance via p-value (r = -0.367, p < 0.001). CONCLUSION Head and neck cancer randomised, controlled trials demonstrated low fragility index values, in which statistically significant results could be nullified by altering the outcomes of just two patients, on average. Future head and neck oncology randomised, controlled trials should report the fragility index in order to provide insight into statistical robustness.
Affiliation(s)
- Neeraj V Suresh
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Department of Otolaryngology – Head and Neck Surgery, Yale University, New Haven, CT, USA
- Beatrice C Go
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Christian G Fritz
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Jacob Harris
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Vinayak Ahluwalia
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Katherine Xu
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Joseph Lu
- Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, USA
- Karthik Rajasekaran
- Department of Otorhinolaryngology – Head and Neck Surgery, University of Pennsylvania, Philadelphia, PA, USA
- Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA, USA
13
Zhang J, Wei H, Chang X, Liang J, Lou Z, Tang X. Statistical fragility of randomized clinical trials pertaining to femoral neck fractures. Injury 2023; 54:111161. [PMID: 39491900 DOI: 10.1016/j.injury.2023.111161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 10/09/2023] [Accepted: 10/22/2023] [Indexed: 11/05/2024]
Abstract
OBJECTIVE P values are frequently misused and misinterpreted; the fragility index (FI) has been used to evaluate the robustness of randomized controlled trials (RCTs) as a complement to P values. This study aimed to assess the statistical robustness of RCTs for femoral neck fractures using the FI. METHODS We systematically reviewed the PubMed, Cochrane Library, and Embase databases to identify RCTs pertaining to femoral neck fractures published in the top 25 highest-impact orthopaedic journals and 4 high-impact general medical journals from January 1, 2000, to December 31, 2022. The FI was calculated for the dichotomous, categorical study outcomes in the identified RCTs using the Fisher exact test, with previously published methods. Spearman correlation analyses were used to evaluate factors potentially associated with the FI. RESULTS We identified 10 eligible RCTs with a median total sample size of 101 (IQR, 79.5 to 174.75) and a median number of patients lost to follow-up of 19.5 (IQR, 4.5 to 28). The median FI was 3.5 (IQR, 1 to 14.25), implying that reversal of outcome in only 4 patients was sufficient to alter trial significance. The FI was less than the number of patients lost to follow-up in seven (70%) RCTs. P values were negatively associated with the FI, while the number of patients lost to follow-up and the number of patients enrolled were not statistically significantly associated with the FI. CONCLUSIONS The RCTs pertaining to femoral neck fractures were not as statistically robust as previously thought and should be interpreted with caution. We recommend that orthopaedic RCTs report the FI as a supplement to P values to help readers draw reliable conclusions based on the fragility of the outcomes.
Affiliation(s)
- Jian Zhang
- Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, 222 Zhong Shan Road, Xi Gang District, Dalian, Liaoning 116011, China
- Haotian Wei
- Department of Urology, Second Affiliated Hospital of Tianjin Medical University, Tianjin 300211, China
- Xiaohu Chang
- Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, 222 Zhong Shan Road, Xi Gang District, Dalian, Liaoning 116011, China
- Jiahui Liang
- Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, 222 Zhong Shan Road, Xi Gang District, Dalian, Liaoning 116011, China
- Zhiyuan Lou
- Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, 222 Zhong Shan Road, Xi Gang District, Dalian, Liaoning 116011, China
- Xin Tang
- Department of Orthopedics, First Affiliated Hospital of Dalian Medical University, 222 Zhong Shan Road, Xi Gang District, Dalian, Liaoning 116011, China.
14
Stern BZ, Poeran J. Statistics in Brief: The Fragility Index. Clin Orthop Relat Res 2023; 481:1288-1291. [PMID: 36862056 PMCID: PMC10263243 DOI: 10.1097/corr.0000000000002622] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2022] [Accepted: 02/10/2023] [Indexed: 03/03/2023]
Affiliation(s)
- Brocha Z. Stern
- Leni and Peter W. May Department of Orthopaedics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jashvant Poeran
- Leni and Peter W. May Department of Orthopaedics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Healthcare Delivery Science, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
15
Geisler FH, Moghaddamjou A, Wilson JRF, Fehlings MG. Methylprednisolone in acute traumatic spinal cord injury: case-matched outcomes from the NASCIS2 and Sygen historical spinal cord injury studies with contemporary statistical analysis. J Neurosurg Spine 2023; 38:595-606. [PMID: 36640098 DOI: 10.3171/2022.12.spine22713] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Accepted: 12/12/2022] [Indexed: 01/15/2023]
Abstract
OBJECTIVE Methylprednisolone (MP) to treat acute traumatic spinal cord injury (ATSCI) remains controversial since the release of the second National Acute Spinal Cord Injury Study (NASCIS2) in 1990. As two historical studies, NASCIS2 and Sygen in ATSCI, used identical MP dosages, it was possible to construct a new case-level pooled ATSCI data set satisfying contemporary criteria and able to clarify the effect of MP. METHODS The new pooled data set was first modernized by excluding patients with injury levels caudal to T10, lower-extremity American Spinal Injury Association (ASIA) motor scores (LEMSs) ≥ 46, Glasgow Coma Scale scores ≤ 11, and age < 15 or > 75 years, and then standardized to the ASIA grading and scoring format. A new updated NASCIS2 data set from this pooled data set contained 31.6% fewer patients than the 1990 NASCIS2 data set. RESULTS In the new pooled data set, recovery of LEMSs from baseline to 26 weeks, the primary outcome variable, was separated statistically into five different injury severity cohorts (p < 0.0001). The severity cohorts contained groups with severe floor (62.9%) and ceiling (10.7%) effects, which do not contribute to drug effects. The new NASCIS2 data set duplicated the p value for MP versus placebo in the sub-subgroup analysis of MP initiated ≤ 8 hours (the subgroup) and recovery of motor function on only the right side of the body (a further subgroup within the ≤ 8-hour subgroup), presented as the positive MP effect in the original NASCIS2 reporting. However, current statistical interpretation considers results seen only in post hoc sub-subgroups, without multi-test corrections, to be random effects without clinical significance. The combined case-level pooled data set from the NASCIS2 and Sygen studies increased the MP group from 106 to 431 patients, creating a new MP combined group. This new data set served as a surrogate for a contemporary MP study and found that administration of MP did not enhance ASIA motor score improvement in the lower extremities at 26 weeks. Secondary analysis of descending ASIA motor and sensory cervical neurological levels in cervical ATSCI patients at 26 weeks also found no MP drug effect. CONCLUSIONS Analysis of both the new updated NASCIS2 data set and the new case-matched pooled data set from two historical ATSCI studies revealed that administration of MP after spinal cord injury did not demonstrate any enhancement in neurological recovery at 26 weeks. The results of this analysis warrant review by clinical guideline groups.
Affiliation(s)
- Fred H Geisler
- Department of Medical Imaging, College of Medicine at the University of Saskatchewan, Saskatoon, Saskatchewan
- Ali Moghaddamjou
- Division of Neurosurgery, Department of Surgery, University of Toronto and Spinal Program, Toronto Western Hospital, University Health Network, Toronto, Ontario, Canada
- Jamie R F Wilson
- Department of Neurosurgery, College of Medicine, University of Nebraska Medical Center, Omaha, Nebraska
- Michael G Fehlings
- Division of Neurosurgery, Department of Surgery, University of Toronto and Spinal Program, Toronto Western Hospital, University Health Network, Toronto, Ontario, Canada
16
Lee Y, Samarasinghe Y, Chen LH, Jong A, Hapugall A, Javidan A, McKechnie T, Doumouras A, Hong D. Fragility of statistically significant findings from randomized trials in comparing laparoscopic versus robotic abdominopelvic surgeries. Surg Endosc 2023:10.1007/s00464-023-10063-4. [PMID: 37095233 DOI: 10.1007/s00464-023-10063-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Accepted: 04/01/2023] [Indexed: 04/26/2023]
Abstract
BACKGROUND Utility of robotic over laparoscopic approach has been an area of debate across all surgical specialties over the past decade. The fragility index (FI) is a metric that evaluates the frailty of randomized controlled trials (RCTs) findings by altering the status of patients from an event to non-event until significance is lost. This study aims to evaluate the robustness of RCTs comparing laparoscopic and robotic abdominopelvic surgeries through the FI. METHODS A search was conducted in MEDLINE and EMBASE for RCTs with dichotomous outcomes comparing laparoscopic and robot-assisted surgery in general surgery, gynecology, and urology. The FI and reverse fragility Index (RFI) metrics were used to assess the strength of findings reported by RCTs, and bivariate correlation was conducted to analyze relationships between FI and trial characteristics. RESULTS A total of 21 RCTs were included, with a median sample size of 89 participants (Interquartile range [IQR] 62-126). The median FI was 2 (IQR 0-15) and median RFI 5.5 (IQR 4-8.5). The median FI was 3 (IQR 1-15) for general surgery (n = 7), 2 (0.5-3.5) for gynecology (n = 4), and 0 (IQR 0-8.5) for urology RCTs (n = 4). Correlation was found between increasing FI and decreasing p-value, but not sample size, number of outcome events, journal impact factor, loss to follow-up, or risk of bias. CONCLUSION RCTs comparing laparoscopic and robotic abdominal surgery did not prove to be very robust. While possible advantages of robotic surgery may be emphasized, it remains novel and requires further concrete RCT data.
Affiliation(s)
- Yung Lee
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Lucy H Chen
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Audrey Jong
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Akithma Hapugall
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Arshia Javidan
- Division of Vascular Surgery, University of Toronto, Toronto, ON, Canada
- Tyler McKechnie
- Division of General Surgery, McMaster University, Hamilton, ON, Canada
- Department of Health Research Methods and Evidence, McMaster University, Hamilton, ON, Canada
- Dennis Hong
- Division of General Surgery, McMaster University, Hamilton, ON, Canada.
- Division of General Surgery, St. Joseph's Healthcare, 50 Charlton Avenue East, Hamilton, ON, L8N 4A6, Canada.
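The reverse fragility index (RFI) mentioned in the abstract above applies to nonsignificant results. A minimal Python sketch follows, assuming a two-sided Fisher exact test and one possible convention: moving patients from event to non-event in the arm with the lower event proportion until the p-value falls below the threshold. The names `fisher_p` and `reverse_fragility_index` are hypothetical, and the cited study may have used a different procedure.

```python
from math import comb


def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in arm 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def reverse_fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Count event reversals needed to make a nonsignificant result significant.

    Returns 0 if the result is already significant, and None if significance
    cannot be reached by removing events from the lower-proportion arm alone.
    """
    e = [events1, events2]
    n = [n1, n2]
    # Assumption: modify the arm with the lower event proportion.
    low = 0 if events1 / n1 <= events2 / n2 else 1
    rfi = 0
    while fisher_p(e[0], n[0] - e[0], e[1], n[1] - e[1]) >= alpha:
        if e[low] == 0:
            return None
        e[low] -= 1  # one event becomes a non-event; arm size is unchanged
        rfi += 1
    return rfi


# Hypothetical trial: 1/10 events versus 5/10 events (nonsignificant).
print(reverse_fragility_index(1, 10, 5, 10))
```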
17
Muthu S. The efficiency of machine learning-assisted platform for article screening in systematic reviews in orthopaedics. Int Orthop 2023; 47:551-556. [PMID: 36562816 DOI: 10.1007/s00264-022-05672-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE With the development of machine learning and artificial intelligence, various platforms have been developed to aid in the time-consuming process of article screening in systematic reviews. We aim to analyze, as an end-user, the efficiency of a machine learning-assisted platform in screening articles for selection into systematic reviews in orthopaedic surgery. METHODS We included three previously published systematic reviews in the field of orthopaedics, with research questions of increasing structural difficulty, to assess the efficiency of a platform with active-learning technology for article screening. We compared the efficiency of the platform with that of traditional screening and also across the various scenarios tested. We performed five iterations for each review analyzed. The outcome parameters analyzed were the work saved at 95% recall (WSS-95), work saved at 100% recall (WSS-100), and relevant records found after screening the first 30% of the total records (RRF-30). RESULTS The machine learning-assisted screening significantly improved the rate of identifying the relevant records compared to the traditional screening method (p<0.001). The WSS-95 for the easy, intermediate, and advanced screening scenarios were 78%, 59%, and 38%, respectively. The WSS-100 for the easy, intermediate, and advanced screening scenarios were 75%, 48%, and 7%, respectively. The RRF-30 for the easy, intermediate, and advanced screening scenarios were 97%, 86%, and 64%, respectively. We noted a significant reduction (p<0.001) in the efficiency with the increasing level of difficulty of the screening scenarios. CONCLUSION The machine learning platform is significantly better than the traditional method as an assistive technology to aid in article screening. However, the efficiency of the platform significantly decreases as the complexity of the research question increases.
Affiliation(s)
- Sathish Muthu
- Orthopaedic Research Group, Coimbatore, Tamil Nadu, India.
- Department of Biotechnology, School of Engineering and Technology, Sharda University, New Delhi, India.
- Department of Orthopaedics, Government Medical College, Dindigul, Tamil Nadu, India.
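The screening metrics named in the abstract above (WSS-95, WSS-100, RRF-30) can be sketched in Python. The abstract does not state its formulas, so this sketch assumes the definitions commonly used in the screening literature: WSS at recall R is the fraction of records left unscreened when recall R is reached, minus (1 − R), and RRF-30 is the share of relevant records found within the first 30% of the ranked list. The function names and the toy ranking are hypothetical.

```python
import math


def records_screened_at_recall(ranked_labels, recall):
    """Records screened, in ranked order, before the target recall is reached.

    ranked_labels: 1 for a relevant record, 0 otherwise, in screening order.
    Assumes at least one relevant record is present.
    """
    total_relevant = sum(ranked_labels)
    needed = math.ceil(recall * total_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return screened
    return len(ranked_labels)


def wss(ranked_labels, recall=0.95):
    """Work saved over sampling at the given recall (assumed definition)."""
    n = len(ranked_labels)
    k = records_screened_at_recall(ranked_labels, recall)
    return (n - k) / n - (1 - recall)


def rrf(ranked_labels, fraction=0.30):
    """Share of relevant records found in the first `fraction` of the list."""
    cutoff = math.floor(fraction * len(ranked_labels) + 1e-9)
    return sum(ranked_labels[:cutoff]) / sum(ranked_labels)


# Toy ranking of 10 records, 3 of them relevant.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(wss(ranking, 0.95), wss(ranking, 1.0), rrf(ranking, 0.30))
```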
18
Milto AJ, Negri CE, Baker J, Thuppal S. The Statistical Fragility of Foot and Ankle Surgery Randomized Controlled Trials. J Foot Ankle Surg 2022; 62:191-196. [PMID: 36182644 DOI: 10.1053/j.jfas.2022.08.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/02/2022] [Revised: 08/16/2022] [Accepted: 08/27/2022] [Indexed: 02/03/2023]
Abstract
Fragility index (FI) is a metric used to interpret the results of randomized controlled trials (RCTs), and describes the number of subjects that would need to be switched from event to non-event for a result to no longer be significant. Studies that analyze FI of RCTs in various orthopedic subspecialties have shown the RCTs to be largely underpowered and highly fragile. However, FI has not been assessed in foot and ankle RCTs. The MEDLINE and Embase online databases were searched from 1/1/2011 through 11/19/2021 for RCTs involving foot and ankle conditions. FI, fragility quotient (FQ), and the difference between the FI and the number of subjects lost to follow-up were calculated. Spearman correlation was performed to determine the relationship between sample size and FI. Overall, 1262 studies were identified, of which 18 were included in the final analysis. The median sample size was 65 (interquartile range [IQR] 57-95.5), the median FI was 2 (IQR 1-2.5), and the median FQ was 0.026 (IQR 0.012-0.033). Ten of 15 (67%) studies with non-zero FI values had FI values less than the number of subjects lost to follow-up. There was a linear association between FI and sample size (R2 = 0.495, p-value: .031). This study demonstrates that RCTs in the field of foot and ankle surgery are highly fragile, similar to other orthopedic subspecialties.
Affiliation(s)
- Anthony J Milto
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL; Center for Clinical Research, Southern Illinois University School of Medicine, Springfield, IL
- Cecily E Negri
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL
- Jeffrey Baker
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL
- Sowmyanarayanan Thuppal
- Division of Orthopedics and Rehabilitation, Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL; Center for Clinical Research, Southern Illinois University School of Medicine, Springfield, IL.
19
Fackler NP, Karasavvidis T, Ehlers CB, Callan KT, Lai WC, Parisien RL, Wang D. The Statistical Fragility of Operative vs Nonoperative Management for Achilles Tendon Rupture: A Systematic Review of Comparative Studies. Foot Ankle Int 2022; 43:1331-1339. [PMID: 36004430 PMCID: PMC9527367 DOI: 10.1177/10711007221108078] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 02/01/2023]
Abstract
BACKGROUND The statistical significance of randomized controlled trials (RCTs) and comparative studies is often conveyed utilizing the P value. However, P values are an imperfect measure and may be vulnerable to a small number of outcome reversals to alter statistical significance. The interpretation of the statistical strength of these studies may be aided by the inclusion of a Fragility Index (FI) and Fragility Quotient (FQ). This study examines the statistical stability of studies comparing operative vs nonoperative management for Achilles tendon rupture. METHODS A systematic search was performed of 10 orthopaedic journals between 2000 and 2021 for comparative studies focusing on management of Achilles tendon rupture reporting dichotomous outcome measures. FI for each outcome was determined by the number of event reversals necessary to alter significance (P < .05). FQ was calculated by dividing the FI by the respective sample size. Additional subgroup analyses were performed. RESULTS Of 8020 studies screened, 1062 met initial search criteria with 17 comparative studies ultimately included for analysis, 10 of which were RCTs. A total of 40 outcomes were examined. Overall, the median FI was 2.5 (interquartile range [IQR] 2-4), the mean FI was 2.90 (±1.58), the median FQ was 0.032 (IQR 0.012-0.069), and the mean FQ was 0.049 (±0.062). The FI was less than the number of patients lost to follow-up for 78% of outcomes. CONCLUSION Studies examining the efficacy of operative vs nonoperative management of Achilles tendon rupture may not be as statistically stable as previously thought. The average number of outcome reversals needed to alter the significance of a given study was 2.90. Future analyses may benefit from the inclusion of a fragility index and a fragility quotient in their statistical analyses.
Affiliation(s)
- Nathan P. Fackler
- University of California, Irvine, CA, USA
- Georgetown University School of Medicine, Washington, DC, USA
- Dean Wang
- University of California, Irvine, CA, USA
- Dean Wang, MD, University of California, Irvine, 101 The City Drive South, Pavilion III, Building 29A, Orange, CA 92686, USA.
20
Fackler NP, Ehlers CB, Callan KT, Amirhekmat A, Smith EJ, Parisien RL, Wang D. Statistical Fragility of Single-Row Versus Double-Row Anchoring for Rotator Cuff Repair: A Systematic Review of Comparative Studies. Orthop J Sports Med 2022; 10:23259671221093391. [PMID: 35571970 PMCID: PMC9096204 DOI: 10.1177/23259671221093391] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/06/2022] [Accepted: 02/17/2022] [Indexed: 01/08/2023] Open
Abstract
Background: Comparative studies and randomized controlled trials (RCTs) often use the P (probability) value to convey the statistical significance of their findings. P values are an imperfect measure, however, and are vulnerable to a small number of outcome reversals to alter statistical significance. The inclusion of a fragility index (FI) and fragility quotient (FQ) may aid in the interpretation of a study’s statistical strength. Purpose/Hypothesis: The purpose of this study was to examine the statistical stability of studies comparing single-row to double-row rotator cuff repair. It was hypothesized that the findings of these studies would be vulnerable to a small number of outcome event reversals, often fewer than the number of patients lost to follow-up. Study Design: Systematic review; Level of evidence, 3. Methods: We analyzed comparative studies and RCTs on primary single-row versus double-row rotator cuff repair that were published between 2000 and 2021 in 10 leading orthopaedic journals. Statistical significance was defined as a P < .05. The FI for each outcome was determined by the number of event reversals necessary to alter significance. The FQ was calculated by dividing the FI by the respective sample size. Results: Of 4896 studies screened, 22 comparative studies, 10 of which were RCTs, were ultimately included for analysis. A total of 74 outcomes were examined. Overall, the median FI was 2 (interquartile range [IQR], 1-3), and the median FQ was 0.035 (IQR, 0.020-0.057). The mean FI was 2.55 ± 1.29, and the mean FQ was 0.043 ± 0.027. In 64% of outcomes, the FI was less than the number of patients lost to follow-up. Additionally, 81% of significant outcomes needed just a single outcome reversal to lose their significance. Conclusion: Over half of the studies currently used to guide clinical practice have a number of patients lost to follow-up greater than their FI. The results of these studies should be interpreted within the context of these limitations. Future analyses may benefit from the inclusion of the FI and the FQ in their statistical analyses.
Affiliation(s)
- Nathan P. Fackler
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Georgetown University School of Medicine, Washington, DC, USA
- Cooper B. Ehlers
- Department of Orthopaedic Surgery, University of California, San Diego, San Diego, California, USA
- Kylie T. Callan
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Arya Amirhekmat
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Eric J. Smith
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
- Dean Wang
- Department of Orthopaedic Surgery, University of California, Irvine, Irvine, California, USA
21
Itaya T, Isobe Y, Suzuki S, Koike K, Nishigaki M, Yamamoto Y. The Fragility of Statistically Significant Results in Randomized Clinical Trials for COVID-19. JAMA Netw Open 2022; 5:e222973. [PMID: 35302631 PMCID: PMC8933746 DOI: 10.1001/jamanetworkopen.2022.2973] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/24/2022] Open
Abstract
IMPORTANCE Interpreting results from randomized clinical trials (RCTs) for COVID-19, which have been published rapidly and in vast numbers, is challenging during a pandemic. OBJECTIVE To evaluate the robustness of statistically significant findings from RCTs for COVID-19 using the fragility index. DESIGN, SETTING, AND PARTICIPANTS This cross-sectional study included COVID-19 trial articles that randomly assigned patients 1:1 into 2 parallel groups and reported at least 1 binary outcome as significant in the abstract. A systematic search was conducted using PubMed to identify RCTs on COVID-19 published until August 7, 2021. EXPOSURES Trial characteristics, such as type of intervention (treatment drug, vaccine, or others), number of outcome events, and sample size. MAIN OUTCOMES AND MEASURES Fragility index. RESULTS Of the 47 RCTs for COVID-19 included, 36 (77%) were studies of the effects of treatment drugs, 5 (11%) were studies of vaccines, and 6 (13%) were of other interventions. A total of 138 235 participants were included in these trials. The median (IQR) fragility index of the included trials was 4 (1-11). The medians (IQRs) of the fragility indexes of RCTs of treatment drugs, vaccines, and other interventions were 2.5 (1-6), 119 (61-139), and 4.5 (1-18), respectively. The fragility index among more than half of the studies was less than 1% of each sample size, although the fragility index as a proportion of events needing to change would be much higher. CONCLUSIONS AND RELEVANCE This cross-sectional study found a relatively small number of events (a median of 4) would be required to change the results of COVID-19 RCTs from statistically significant to not significant. These findings suggest that health care professionals and policy makers should not rely heavily on individual results of RCTs for COVID-19.
Affiliation(s)
- Takahiro Itaya
- Department of Healthcare Epidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
- Yotsuha Isobe
- Department of Human Health Sciences, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Sayoko Suzuki
- Department of Human Health Sciences, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Kanako Koike
- Department of Medical Genetics, International University of Health and Welfare Graduate School, Tokyo, Japan
- Masakazu Nishigaki
- Department of Medical Genetics, International University of Health and Welfare Graduate School, Tokyo, Japan
- Yosuke Yamamoto
- Department of Healthcare Epidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan
22
Marasco D, Russo J, Izzo A, Vallefuoco S, Coppola F, Patel S, Smeraglia F, Balato G, Mariconda M, Bernasconi A. Static versus dynamic fixation of distal tibiofibular syndesmosis: a systematic review of overlapping meta-analyses. Knee Surg Sports Traumatol Arthrosc 2021; 29:3534-3542. [PMID: 34455448 DOI: 10.1007/s00167-021-06721-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/24/2021] [Accepted: 08/24/2021] [Indexed: 11/26/2022]
Abstract
PURPOSE Multiple Level I meta-analyses have compared traditional static fixation with more recently introduced dynamic fixation strategies for injuries of the distal tibiofibular syndesmosis (TFS). The aim of this review was to assess their robustness and methodological quality, in order to support the choice of a treatment strategy for TFS injury using the highest level of evidence. METHODS In this systematic review, conducted in accordance with the PRISMA guidelines, meta-analyses/systematic reviews comparing static and dynamic fixation methods after acute TFS injury were identified. The robustness of studies was evaluated using the fragility index (FI) for meta-analysis and the fragility quotient (FQ). The risk of bias was evaluated using the Assessment of Multiple Systematic Reviews (AMSTAR) instrument. Finally, the Jadad algorithm was applied to select the study providing the highest quality of evidence on which to base recommendations for the fixation strategy of these lesions. RESULTS Out of 1,302 records, four Level I meta-analyses were included in this study. Among the statistically significant dichotomous outcomes, the median FI was 3.5 (IQR, 2 to 5.5; range, 1 to 9), while the median FQ was 1.9% (IQR, 1 to 3.5; range, 0.35 to 4.4). In total, 37% of outcomes had an FI of 2 or less and 75% had an FI of 4 or less. According to the AMSTAR score and Jadad algorithm, the largest meta-analysis was selected as the highest evidence provided so far. CONCLUSION The meta-analyses with statistically significant dichotomous outcomes comparing dynamic and static fixation for treating injuries of the distal tibiofibular syndesmosis are fragile, with a change in fewer than four patients or less than 2% of the study population sufficient to reverse a significant outcome to nonsignificant. LEVEL OF EVIDENCE Level I.
Affiliation(s)
- Domenico Marasco
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Jacopo Russo
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Antonio Izzo
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Salvatore Vallefuoco
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Francesco Coppola
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Shelain Patel
- Foot and Ankle Unit, Royal National Orthopaedic Hospital, Stanmore, UK
- Francesco Smeraglia
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Giovanni Balato
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Massimo Mariconda
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
- Alessio Bernasconi
- Department of Public Health, Trauma and Orthopaedics, University Federico II of Naples, Via Pansini 5, 80131, Naples, Italy
|