Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Muthu S, Ramakrishnan E. Fragility Analysis of Statistically Significant Outcomes of Randomized Control Trials in Spine Surgery: A Systematic Review. Spine (Phila Pa 1976) 2021;46:198-208. [PMID: 32756285 DOI: 10.1097/brs.0000000000003645] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 02/05/2023]

Cited by Other Article(s)

Lucas SL, Carroll AH, Backstrom ZK, Dylan Pasko KB, Mesfin A. Utilization of the Fragility Index to Assess Randomized Controlled Trials Comparing Cervical Total Disc Arthroplasty to Anterior Cervical Discectomy and Fusion. Global Spine J 2025:21925682251341812. [PMID: 40347150 PMCID: PMC12065715 DOI: 10.1177/21925682251341812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Revised: 04/11/2025] [Accepted: 04/28/2025] [Indexed: 05/12/2025]

Oeding JF, Krych AJ, Camp CL, Varady NH. The Number of Patients Lost to Follow-Up May Exceed the Fragility Index of a Randomized Controlled Trial Without Reversing Statistical Significance: A Systematic Review and Statistical Model. Arthroscopy 2025;41:442-451.e1. [PMID: 38777001 DOI: 10.1016/j.arthro.2024.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 04/21/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]

Abstract

PURPOSE

To (1) analyze trends in the publishing of statistical fragility index (FI)-based systematic reviews in the orthopaedic literature, including the prevalence of misleading or inaccurate statements related to the statistical fragility of randomized controlled trials (RCTs) and patients lost to follow-up (LTF), and (2) determine whether RCTs with relatively "low" FIs are truly as sensitive to patients LTF as previously portrayed in the literature.

METHODS

All FI-based studies published in the orthopaedic literature were identified using the Cochrane Database of Systematic Reviews, Web of Science Core Collection, PubMed, and MEDLINE databases. All articles involving application of the FI or reverse FI to study the statistical fragility of studies in orthopaedics were eligible for inclusion in the study. Study characteristics, median FIs and sample sizes, and misleading or inaccurate statements related to the FI and patients LTF were recorded. Misleading or inaccurate statements (defined as those basing conclusions of trial fragility on the false assumption that adding patients LTF back to a trial has the same statistical effect as existing patients in a trial experiencing the opposite outcome) were determined by 2 authors. A theoretical RCT with a sample size of 100, P = .006, and FI of 4 was used to evaluate the difference in effect on statistical significance between flipping outcome events of patients already included in the trial (FI) and adding patients LTF back to the trial to show the true sensitivity of RCTs to patients LTF.
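
The distinction drawn above, between flipping the outcome of an enrolled patient and adding a lost patient back to the trial, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1/10 versus 7/10 trial is hypothetical (it is not the theoretical RCT from the abstract, whose 2x2 table is not given here), the function names are made up for this sketch, significance is judged with a two-sided Fisher exact test at 0.05, and returning patients are assumed to take the outcome that most weakens the observed difference.

```python
from math import comb


def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    probs = [comb(r1, x) * comb(r2, c1 - x) / total
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    observed = comb(r1, a) * comb(r2, c) / total
    # Sum every table that is no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def flips_to_lose_significance(e_lo, n_lo, e_hi, n_hi, alpha=0.05):
    """FI-style count: non-events flipped to events in the lower-event arm.

    Assumes the input is significant and that the first arm has the lower
    event rate; arm sizes stay fixed.
    """
    flips = 0
    while fisher_p(e_lo + flips, n_lo - e_lo - flips, e_hi, n_hi - e_hi) < alpha:
        flips += 1
    return flips


def ltf_to_lose_significance(e_lo, n_lo, e_hi, n_hi, alpha=0.05, max_added=50):
    """Fewest returning patients that can push p to alpha or above.

    Each returning patient enlarges an arm: either an extra event in the
    lower-event arm or an extra non-event in the higher-event arm. Every
    split between the two is tried (an assumption of this sketch).
    """
    for added in range(max_added + 1):
        for to_lo in range(added + 1):
            to_hi = added - to_lo
            p = fisher_p(e_lo + to_lo, n_lo - e_lo, e_hi, n_hi - e_hi + to_hi)
            if p >= alpha:
                return added
    return None


# Hypothetical trial: 1/10 events in one arm, 7/10 in the other.
print(flips_to_lose_significance(1, 10, 7, 10))  # flips of enrolled patients
print(ltf_to_lose_significance(1, 10, 7, 10))    # patients added back
```

For this hypothetical table, one flipped outcome is enough to lose significance, whereas two returning patients are needed, because each added patient also changes the denominator of an arm.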

RESULTS

Of the 39 FI-based studies, 37 (95%) directly compared the FI with the number of patients LTF. Of these 37 studies, 22 (59%) included a statement regarding the FI and patients LTF that was determined to be inaccurate or misleading. In the theoretical RCT, a reversal of significance was not observed until 7 patients LTF (nearly twice the FI) were added to the trial in the distribution of maximal significance reversal.

CONCLUSIONS

The claim that any RCT in which the number of patients LTF exceeds the FI could potentially have its significance reversed simply by maintaining study follow-ups is commonly inaccurate and prevalent in orthopaedic studies applying the FI. Patients LTF and the FI are not equivalent. The minimum number of patients LTF required to flip the significance of a typical RCT was shown to be greater than the FI, suggesting that RCTs with relatively low FIs may not be as sensitive to patients LTF as previously portrayed in the literature; however, only a holistic approach that considers the context in which the trial was conducted, potential biases, and study results can determine the merits of any particular RCT.

CLINICAL RELEVANCE

Surgeons may benefit from re-examining their interpretation of prior FI reviews that have made claims of substantial RCT fragility based on comparisons between the FI and patients LTF; it is possible the results are more robust than previously believed.

Khan NS, Dhanda AK, Takashima M, Liu R, Yoshiyasu Y, Wu W, Jin W, McCoul ED, Ramanathan M, Ahmed OG. What is the robustness of randomized controlled trials supporting rhinosinusitis guidelines? Am J Otolaryngol 2025;46:104575. [PMID: 39740532 DOI: 10.1016/j.amjoto.2024.104575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Accepted: 12/17/2024] [Indexed: 01/02/2025]

Yu A, Mohamed KS, Kurapatti M, Song J, Huang JJ, Singh P, Alasadi Y, Grewal A, Yendluri A, Namiri N, Corvi J, Kim JS, Cho SK. The statistical fragility of vertebroplasty outcomes: A systematic review of randomized controlled trials. J Craniovertebr Junction Spine 2025;16:26-33. [PMID: 40292175 PMCID: PMC12029381 DOI: 10.4103/jcvjs.jcvjs_13_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Accepted: 02/08/2025] [Indexed: 04/30/2025]

Abstract

Randomized clinical trials (RCTs) on vertebroplasty are crucial for guiding the treatment of vertebral compression fractures, but their overlooked statistical fragility can undermine clinical reliability. Minor outcome changes may overturn significant findings, risking unreliable evidence, and impacting patient care. This study assessed the fragility of significant outcomes in vertebroplasty RCTs, hypothesizing high sensitivity to such changes. PubMed, Embase, and MEDLINE were searched for RCTs on vertebroplasty reporting dichotomous outcomes. The fragility index (FI) and reverse FI quantified the number of outcome reversals needed to change statistical significance for significant and nonsignificant results, respectively. The fragility quotient (FQ) was calculated as the FI divided by the study sample size. Subgroup analysis was conducted by outcome category. A total of 276 outcomes from RCTs were analyzed. The median FI was 5 (interquartile range [IQR]: 4-5), with a FQ of 0.053 (IQR: 0.019-0.088). Statistically significant outcomes (n = 36) had a median FI of 3 (IQR: 2-4) and FQ of 0.034 (IQR: 0.018-0.051), whereas nonsignificant outcomes (n = 240) showed a median FI of 5 (IQR: 4-5) and FQ of 0.062 (IQR: 0.021-0.088). Fracture-related outcomes were the most robust (FI: 5, FQ: 0.088), whereas cement leakage was the most fragile (FI: 3, FQ: 0.041). Pain outcomes had an FI of 5 (FQ: 0.062), and complications and vertebroplasty versus kyphoplasty outcomes were more robust (FI: 5, FQ: 0.013). Patients lost to follow-up exceeded the FI in 79% of outcomes. The statistical findings in vertebroplasty RCTs are fragile and warrant cautious interpretation. A small number of outcome reversals or consistent postoperative follow-up can shift the significance of the results. Standardized reporting of P values alongside FI and FQ metrics is recommended to help clinicians evaluate the robustness of study findings.

Byrne R, Ahn B, Zhao L, Quinn M, Naphade O, Owens BD. The Statistical Fragility of Lateral Extra-articular Tenodesis Research: A Systematic Review. Orthop J Sports Med 2024;12:23259671241266329. [PMID: 39221044 PMCID: PMC11363240 DOI: 10.1177/23259671241266329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 10/05/2023] [Indexed: 09/04/2024]

Abstract

Background

A P value of <.05 is often used to denote statistical significance; however, in many scenarios, this threshold is vulnerable to a small number of outcome reversals. This study joins a body of studies within the orthopaedic literature that evaluate the statistical fragility of existing research via metrics such as fragility index (FI) and fragility quotient (FQ).

Purpose/Hypothesis

The purpose of this study was to investigate the statistical fragility of randomized controlled trials (RCTs) and comparative studies on the topic, given the resurgent interest in lateral extra-articular tenodesis (LET) to augment primary or revision anterior cruciate ligament reconstruction (ACLR). It was hypothesized that the outcomes reported in these studies would be statistically fragile.

Study Design

Systematic review; Level of evidence, 4.

Methods

Comparative studies and RCTs regarding LET as an adjunct procedure to ACLR published between 2000 and 2022 were analyzed. Descriptive characteristics, dichotomous outcomes, and continuous outcomes were extracted. The FI and continuous FI (CFI) were calculated by the number of event reversals to change significance; the FQ and continuous FQ (CFQ) were calculated to normalize the fragility metrics per sample size.

Results

Of 455 studies screened, 29 studies were included (9 RCTs, 20 comparative); 79.3% of included studies were published after 2020. A total of 48 dichotomous and 265 continuous outcomes were analyzed. The median FI was 9.0 (IQR, 7.0-13.3), with FQ of 0.1 (IQR, 0.04-0.17); the median CFI was 7.8 (IQR, 4.2-19.6), with CFQ of 0.12 (IQR, 0.08-0.19). The FQ and CFQ for studies on LET with revision ACLR were larger (0.117 and 0.113, respectively) than those focused on primary ACLR (0.042 and 0.095, respectively).

Conclusion

Studies focused on LET with primary ACLR were more fragile than those on LET with revision, which suggests that further research on the indications for LET with primary ACLR is necessary. Future orthopaedic comparative research should include fragility metrics alongside traditional P values.

Proal JD, Moon AS, Kwon B. The fragility index and reverse fragility index of FDA investigational device exemption trials in spinal fusion surgery: a systematic review. Eur Spine J 2024;33:2594-2603. [PMID: 38802596 DOI: 10.1007/s00586-024-08317-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/20/2024] [Accepted: 05/16/2024] [Indexed: 05/29/2024]

Abstract

PURPOSE

FDA investigational device exemption (IDE) studies are considered a gold standard of assessing safety and efficacy of novel devices through RCTs. The fragility index (FI) has emerged as a means to assess robustness of statistically significant study results and inversely, the reverse fragility index (RFI) for non-significant differences. Previous authors have defined results as fragile if loss to follow up is greater than the FI or RFI. The aim of this study was to assess the FI, RFI, and robustness of data supplied by IDE studies in spinal surgery.

METHODS

This was a systematic review of the literature. Inclusion criteria included randomized controlled trials with dichotomous outcome measures conducted under IDE guidelines between 2000 and 2023. FI and RFI were calculated through successively changing events to non-events until the outcome changed to non-significance or significance, respectively. The fragility quotient (FQ) and reverse fragility quotient (RFQ) were calculated by dividing the FI and RFI, respectively, by the sample size.
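
The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a two-sided Fisher exact test at a 0.05 threshold, edits outcomes only in the arm with the lower event rate (the abstract does not say which arm or test was used), and all function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)
    probs = [comb(r1, x) * comb(r2, c1 - x) / total
             for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    observed = comb(r1, a) * comb(r2, c) / total
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def fragility(events_1, n_1, events_2, n_2, alpha=0.05):
    """Return ("FI", flips) for a significant result, else ("RFI", flips).

    flips is None when editing the lower-event arm alone cannot cross alpha.
    """
    # Order the arms so that `lo` has the smaller event proportion.
    (e_lo, n_lo), (e_hi, n_hi) = sorted(
        [(events_1, n_1), (events_2, n_2)], key=lambda arm: arm[0] / arm[1])

    def p_value(e):
        return fisher_exact_two_sided(e, n_lo - e, e_hi, n_hi - e_hi)

    significant = p_value(e_lo) < alpha
    label = "FI" if significant else "RFI"
    # FI: shrink the difference by adding events to the low arm.
    # RFI: widen the difference by removing events from the low arm.
    step = 1 if significant else -1
    flips = 0
    while (p_value(e_lo) < alpha) == significant:
        e_lo += step
        if not 0 <= e_lo <= n_lo:
            return (label, None)
        flips += 1
    return (label, flips)


def fragility_quotient(flips, n_1, n_2):
    """FQ or RFQ: the flip count divided by the total sample size."""
    return flips / (n_1 + n_2)


# Hypothetical two-arm trials with 10 patients per arm.
print(fragility(1, 10, 7, 10))        # significant result, so an FI
print(fragility(2, 10, 7, 10))        # nonsignificant result, so an RFI
print(fragility_quotient(1, 10, 10))  # flips per patient enrolled
```

Returning the label alongside the count keeps the FI and RFI cases distinguishable when a mixed set of outcomes is processed in one pass.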

RESULTS

Thirty-two studies met inclusion criteria with a total of 40 unique outcome measures; 240 outcomes were analyzed. Twenty-six studies reported 96 statistically significant results. The median FI was 6 (IQR: 3-9.25), and the number of patients lost to follow-up was greater than the FI in 99.0% (95/96) of results. The average FQ was 0.027. Thirty studies reported 144 statistically nonsignificant results, with a median RFI of 6 (IQR: 4-8). The average RFQ extrapolated was 0.021, and loss to follow-up was greater than the RFI in 98.6% (142/144) of results.

CONCLUSIONS

IDE studies in spine surgery are surprisingly fragile given their reputations, large sample sizes, and intent to establish safety in investigational devices. This study found a median FI and RFI of 6. The number of patients lost to follow-up was greater than the FI and RFI in 98.8% (237/240) of reported outcomes. FQ and RFQ tell us that changes of two to three patients per hundred can flip the significance of reported outcomes. This is an important reminder of the limitations of RCTs. Analysis of fragility in future studies may help clarify the strength of the relationship between reported data and their conclusions.
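
The summary figures in this abstract can be cross-checked with simple arithmetic. The short Python sketch below uses only the counts and quotients reported above; the variable and function names are illustrative.

```python
# Counts reported in the RESULTS section of the abstract.
fi_exceeded, significant = 95, 96          # LTF greater than FI
rfi_exceeded, nonsignificant = 142, 144    # LTF greater than RFI

combined_exceeded = fi_exceeded + rfi_exceeded
combined_total = significant + nonsignificant
combined_pct = 100 * combined_exceeded / combined_total


def per_hundred(quotient):
    """Convert a fragility quotient into outcome flips per 100 patients."""
    return quotient * 100


print(combined_exceeded, combined_total, combined_pct)
print(per_hundred(0.027), per_hundred(0.021))
```

The pooled count is 237 of 240 outcomes, or 98.75%, which rounds to the reported 98.8%. The average FQ of 0.027 and RFQ of 0.021 correspond to about 2.7 and 2.1 flips per hundred patients, which is the "two to three patients per hundred" in the conclusion.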

Zabat MA, Giakas AM, Hohmann AL, Lonner JH. Interpreting the Current Literature on Outcomes of Robotic-Assisted Versus Conventional Total Knee Arthroplasty Using Fragility Analysis: A Systematic Review and Cross-Sectional Study of Randomized Controlled Trials. J Arthroplasty 2024;39:1882-1887. [PMID: 38309638 DOI: 10.1016/j.arth.2024.01.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 01/18/2024] [Accepted: 01/24/2024] [Indexed: 02/05/2024]

Dennstädt F, Zink J, Putora PM, Hastings J, Cihoric N. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev 2024;13:158. [PMID: 38879534 PMCID: PMC11180407 DOI: 10.1186/s13643-024-02575-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/30/2024] [Indexed: 06/19/2024]

Abstract

BACKGROUND

Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose.

METHODS

LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review.
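
A rough sketch of the pipeline described above is given below. It is not the authors' script: the prompt wording, function names, and the example scores are all hypothetical, and the call to an LLM is deliberately left out, so the Likert scores are supplied by hand. The sketch only shows how a structured prompt could be assembled and how one score threshold turns Likert ratings into an include/exclude classifier with a measurable sensitivity and specificity.

```python
def build_prompt(instruction, title, abstract, criteria, scale_max=5):
    """Assemble one structured prompt from the four text parts."""
    criteria_text = "\n".join(f"- {item}" for item in criteria)
    return (
        f"{instruction}\n\n"
        f"Title: {title}\n\n"
        f"Abstract: {abstract}\n\n"
        f"Relevant criteria:\n{criteria_text}\n\n"
        f"Relevance (1 = low, {scale_max} = high):"
    )


def include(score, threshold):
    """A publication is included when its Likert score reaches the threshold."""
    return score >= threshold


def sensitivity_specificity(scores, labels, threshold):
    """Compare thresholded scores with human include/exclude labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y and include(s, threshold))
    fn = sum(1 for s, y in zip(scores, labels) if y and not include(s, threshold))
    tn = sum(1 for s, y in zip(scores, labels) if not y and not include(s, threshold))
    fp = sum(1 for s, y in zip(scores, labels) if not y and include(s, threshold))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Likert scores (as if returned by a model) and human labels.
scores = [5, 4, 2, 3, 1, 2]
labels = [True, True, True, False, False, False]

prompt = build_prompt(
    "Rate how relevant this publication is to the review topic.",
    "Example title",
    "Example abstract text.",
    ["Randomized controlled trial", "Reports dichotomous outcomes"],
)
print(prompt)
print(sensitivity_specificity(scores, labels, threshold=2))  # lenient cut-off
print(sensitivity_specificity(scores, labels, threshold=4))  # strict cut-off
```

Lowering the threshold trades specificity for sensitivity, which matches the pattern of high sensitivity and low specificity reported for most of the models.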

RESULTS

The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1-5 to 1-10) had a considerable impact on the performance.

CONCLUSIONS

LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.

Skorochod R, Gronovich Y. Fragility Index and Fragility Quotient in Statistically Significant Randomized Controlled Trials in Plastic Breast Surgery. Plast Reconstr Surg Glob Open 2024;12:e5916. [PMID: 38903137 PMCID: PMC11188868 DOI: 10.1097/gox.0000000000005916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 05/01/2024] [Indexed: 06/22/2024]

Meade M, Buchan L, Stark M, Woods B. Evidence-Based Medicine and Observational Studies. Clin Spine Surg 2024;37:242-244. [PMID: 37941105 DOI: 10.1097/bsd.0000000000001550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 10/03/2023] [Indexed: 11/10/2023]

Brown AN, Yendluri A, Lawrence KW, Cordero JK, Moucha CS, Hayden BL, Parisien RL. The Statistical Fragility of Tranexamic Acid Use in the Orthopaedic Surgery Literature: A Systematic Review of Randomized Controlled Trials. J Am Acad Orthop Surg 2024;32:508-515. [PMID: 38574390 DOI: 10.5435/jaaos-d-23-00503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/15/2024] [Indexed: 04/06/2024]

Abstract

INTRODUCTION

Randomized controlled trials (RCTs) represent the highest level of evidence in orthopaedic surgery literature, although the robustness of statistical findings in these trials may be unreliable. We used the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to evaluate the statistical stability of outcomes reported in RCTs that assess the use of tranexamic acid (TXA) across orthopaedic subspecialties.

METHODS

PubMed, EMBASE, and MEDLINE were queried for RCTs (2010-present) reporting dichotomous outcomes with study groups stratified by TXA administration. The FI and rFI were defined as the number of outcome event reversals needed to alter the significance level of significant and nonsignificant outcomes, respectively. FQ was determined by dividing the FI or rFI by sample size. Subgroup analyses were conducted based on orthopaedic subspecialty.

RESULTS

Six hundred five RCTs were screened with 108 studies included for analysis comprising 192 total outcomes. The median FI of the 192 outcomes was 4 (IQR 2 to 5) with an associated FQ of 0.03 (IQR 0.019 to 0.050). 45 outcomes were reported as statistically significant with a median FI of 1 (IQR 1 to 5) and associated FQ of 0.02 (IQR 0.011 to 0.034). 147 outcomes were reported as nonsignificant with a median rFI of 4 (IQR 3 to 5) and associated FQ of 0.04 (IQR 0.023 to 0.051). The adult reconstruction, trauma, and spine subspecialties had a median FI of 4. Sports had a median FI of 3. Shoulder and elbow and foot and ankle had median FIs of 6.

DISCUSSION

Statistical outcomes reported in RCTs on the use of TXA in orthopaedic surgery are fragile. Reversal of a few outcomes is sufficient to alter statistical significance. We recommend reporting FI, rFI, and FQ metrics to aid in interpreting the outcomes reported in comparative trials.

Suresh NV, Go BC, Fritz CG, Harris J, Ahluwalia V, Xu K, Lu J, Rajasekaran K. The fragility index: how robust are the outcomes of head and neck cancer randomised, controlled trials? J Laryngol Otol 2024;138:451-456. [PMID: 37795709 PMCID: PMC10950446 DOI: 10.1017/s0022215123001755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 08/12/2023] [Accepted: 08/29/2023] [Indexed: 10/06/2023]

Zhang J, Wei H, Chang X, Liang J, Lou Z, Tang X. Statistical fragility of randomized clinical trials pertaining to femoral neck fractures. Injury 2023;54:111161. [PMID: 39491900 DOI: 10.1016/j.injury.2023.111161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 10/09/2023] [Accepted: 10/22/2023] [Indexed: 11/05/2024]

Stern BZ, Poeran J. Statistics in Brief: The Fragility Index. Clin Orthop Relat Res 2023;481:1288-1291. [PMID: 36862056 PMCID: PMC10263243 DOI: 10.1097/corr.0000000000002622] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2022] [Accepted: 02/10/2023] [Indexed: 03/03/2023]

Geisler FH, Moghaddamjou A, Wilson JRF, Fehlings MG. Methylprednisolone in acute traumatic spinal cord injury: case-matched outcomes from the NASCIS2 and Sygen historical spinal cord injury studies with contemporary statistical analysis. J Neurosurg Spine 2023;38:595-606. [PMID: 36640098 DOI: 10.3171/2022.12.spine22713] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Accepted: 12/12/2022] [Indexed: 01/15/2023]

Abstract

OBJECTIVE

Methylprednisolone (MP) to treat acute traumatic spinal cord injury (ATSCI) remains controversial since the release of the second National Acute Spinal Cord Injury Study (NASCIS2) in 1990. As two historical studies, NASCIS2 and Sygen in ATSCI, used identical MP dosages, it was possible to construct a new case-level pooled ATSCI data set satisfying contemporary criteria and able to clarify the effect of MP.

METHODS

The new pooled data set was first modernized by excluding patients with injury levels caudal to T10, lower-extremity American Spinal Injury Association (ASIA) motor scores (LEMSs) ≥ 46, Glasgow Coma Scale scores ≤ 11, and age < 15 or > 75 years, and then standardized to the ASIA grading and scoring format. A new updated NASCIS2 data set from this pooled data set contained 31.6% fewer patients than the 1990 NASCIS2 data set.

RESULTS

In the new pooled data set, recovery of LEMSs from baseline to 26 weeks, the primary outcome variable, was separated statistically into five different injury severity cohorts (p < 0.0001). The severity cohorts contained groups with severe floor (62.9%) and ceiling (10.7%) effects, which do not contribute to drug effects. The new NASCIS2 data set duplicated the p value for MP versus placebo in the sub-subgroup analysis of MP initiated ≤ 8 hours (the subgroup) and recovery of motor function on only the right side of the body (a further subgroup within the ≤ 8-hour subgroup), presented as the positive MP effect in the original NASCIS2 reporting. However, current statistical interpretation considers results seen only in post hoc sub-subgroups, without multi-test corrections, to be random effects without clinical significance. The combined case-level pooled data set from the NASCIS2 and Sygen studies increased the MP group from 106 to 431 patients, creating a new MP combined group. This new data set served as a surrogate for a contemporary MP study and found that administration of MP did not enhance ASIA motor score improvement in the lower extremities at 26 weeks. Secondary analysis of descending ASIA motor and sensory cervical neurological levels in cervical ATSCI patients at 26 weeks also found no MP drug effect.

CONCLUSIONS

Analysis of both the new updated NASCIS2 data set and the new case-matched pooled data set from two historical ATSCI studies revealed that administration of MP after spinal cord injury did not demonstrate any enhancement in neurological recovery at 26 weeks. The results of this analysis warrant review by clinical guideline groups.

Lee Y, Samarasinghe Y, Chen LH, Jong A, Hapugall A, Javidan A, McKechnie T, Doumouras A, Hong D. Fragility of statistically significant findings from randomized trials in comparing laparoscopic versus robotic abdominopelvic surgeries. Surg Endosc 2023:10.1007/s00464-023-10063-4. [PMID: 37095233 DOI: 10.1007/s00464-023-10063-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2023] [Accepted: 04/01/2023] [Indexed: 04/26/2023]

Muthu S. The efficiency of machine learning-assisted platform for article screening in systematic reviews in orthopaedics. Int Orthop 2023;47:551-556. [PMID: 36562816 DOI: 10.1007/s00264-022-05672-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]

Abstract

PURPOSE

With the development of machine learning and artificial intelligence, various platforms were developed to aid in the time-consuming process of article screening in systematic reviews. We aim to analyze the efficiency of a machine learning-assisted platform as an end-user to aid in the screening of the articles for selection into systematic review in orthopaedic surgery.

METHODS

We included three previously published systematic reviews in the field of orthopaedics of increasing levels of difficulty in the structure of the research question to assess the efficiency of a platform with active-learning technology for article screening. We compared the efficiency of the platform with that of traditional screening and across the various scenarios tested. We performed five iterations for each review analyzed. The outcome parameters analyzed were the work saved at 95% recall (WSS-95), work saved at 100% recall (WSS-100), and relevant records found after screening the first 30% of the total records (RRF-30).
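
The outcome parameters named above can be illustrated with a short Python sketch. The abstract does not give formulas, so this assumes the definitions commonly used in the screening-automation literature: work saved over sampling at recall R is the share of records left unscreened when recall R is reached, minus (1 - R), and RRF is the share of relevant records found within the first X% of screened records. The function names and the ranked list are hypothetical.

```python
def records_needed(ranked_labels, recall_percent):
    """Records screened, in ranked order, before the target recall is reached."""
    n_relevant = sum(ranked_labels)
    target = (recall_percent * n_relevant + 99) // 100  # integer ceiling
    found = 0
    for position, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= target:
            return position
    return len(ranked_labels)


def wss(ranked_labels, recall_percent):
    """Work saved over sampling at the given recall (assumed definition)."""
    n = len(ranked_labels)
    screened = records_needed(ranked_labels, recall_percent)
    return (n - screened) / n - (1 - recall_percent / 100)


def rrf(ranked_labels, screened_percent):
    """Share of relevant records found in the first screened_percent of records."""
    cutoff = (screened_percent * len(ranked_labels)) // 100
    return sum(ranked_labels[:cutoff]) / sum(ranked_labels)


# Hypothetical screening order: 1 = relevant record, 0 = irrelevant record.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1] + [0] * 10

print(wss(ranked, 95))   # WSS-95
print(wss(ranked, 100))  # WSS-100
print(rrf(ranked, 30))   # RRF-30
```

In this made-up ranking the last relevant record sits at position 10 of 20, so half the records never need screening; a model that ranks relevant records earlier would score higher on all three measures.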

RESULTS

The machine learning-assisted screening significantly improved the rate of identifying the relevant records compared to the traditional screening method (p<0.001). The WSS-95 for the easy, intermediate, and advanced screening scenarios were 78%, 59%, and 38%, respectively. The WSS-100 for the easy, intermediate, and advanced screening scenarios were 75%, 48%, and 7%, respectively. The RRF-30 for the easy, intermediate, and advanced screening scenarios were 97%, 86%, and 64%, respectively. We noted a significant reduction (p<0.001) in the efficiency with the increasing level of difficulty of the screening scenarios.

CONCLUSION

The machine learning platform is significantly better than the traditional method as an assistive technology to aid in article screening. However, the efficiency of the platform significantly decreases as the complexity of the research question increases.

Milto AJ, Negri CE, Baker J, Thuppal S. The Statistical Fragility of Foot and Ankle Surgery Randomized Controlled Trials. J Foot Ankle Surg 2022;62:191-196. [PMID: 36182644 DOI: 10.1053/j.jfas.2022.08.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/02/2022] [Revised: 08/16/2022] [Accepted: 08/27/2022] [Indexed: 02/03/2023]

Fackler NP, Karasavvidis T, Ehlers CB, Callan KT, Lai WC, Parisien RL, Wang D. The Statistical Fragility of Operative vs Nonoperative Management for Achilles Tendon Rupture: A Systematic Review of Comparative Studies. Foot Ankle Int 2022;43:1331-1339. [PMID: 36004430 PMCID: PMC9527367 DOI: 10.1177/10711007221108078] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 02/01/2023]

Abstract

BACKGROUND

The statistical significance of randomized controlled trials (RCTs) and comparative studies is often conveyed utilizing the P value. However, P values are an imperfect measure, and a small number of outcome reversals may be enough to alter statistical significance. The interpretation of the statistical strength of these studies may be aided by the inclusion of a Fragility Index (FI) and Fragility Quotient (FQ). This study examines the statistical stability of studies comparing operative vs nonoperative management for Achilles tendon rupture.

METHODS

A systematic search was performed of 10 orthopaedic journals between 2000 and 2021 for comparative studies focusing on management of Achilles tendon rupture reporting dichotomous outcome measures. FI for each outcome was determined by the number of event reversals necessary to alter significance (P < .05). FQ was calculated by dividing the FI by the respective sample size. Additional subgroup analyses were performed.

RESULTS

Of 8020 studies screened, 1062 met initial search criteria with 17 comparative studies ultimately included for analysis, 10 of which were RCTs. A total of 40 outcomes were examined. Overall, the median FI was 2.5 (interquartile range [IQR] 2-4), the mean FI was 2.90 (±1.58), the median FQ was 0.032 (IQR 0.012-0.069), and the mean FQ was 0.049 (±0.062). The FI was less than the number of patients lost to follow-up for 78% of outcomes.

CONCLUSION

Studies examining the efficacy of operative vs nonoperative management of Achilles tendon rupture may not be as statistically stable as previously thought. The average number of outcome reversals needed to alter the significance of a given study was 2.90. Future analyses may benefit from the inclusion of a fragility index and a fragility quotient in their statistical analyses.

Fackler NP, Ehlers CB, Callan KT, Amirhekmat A, Smith EJ, Parisien RL, Wang D. Statistical Fragility of Single-Row Versus Double-Row Anchoring for Rotator Cuff Repair: A Systematic Review of Comparative Studies. Orthop J Sports Med 2022;10:23259671221093391. [PMID: 35571970 PMCID: PMC9096204 DOI: 10.1177/23259671221093391] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/06/2022] [Accepted: 02/17/2022] [Indexed: 01/08/2023]

Abstract

Background:

Comparative studies and randomized controlled trials (RCTs) often use the P (probability) value to convey the statistical significance of their findings. P values are an imperfect measure, however, and statistical significance can be altered by only a small number of outcome reversals. The inclusion of a fragility index (FI) and fragility quotient (FQ) may aid in the interpretation of a study’s statistical strength.

Purpose/Hypothesis:

The purpose of this study was to examine the statistical stability of studies comparing single-row to double-row rotator cuff repair. It was hypothesized that the findings of these studies would be vulnerable to a small number of outcome event reversals, often fewer than the number of patients lost to follow-up.

Study Design:

Systematic review; Level of evidence, 3.

Methods:

We analyzed comparative studies and RCTs on primary single-row versus double-row rotator cuff repair that were published between 2000 and 2021 in 10 leading orthopaedic journals. Statistical significance was defined as P < .05. The FI for each outcome was determined by the number of event reversals necessary to alter significance. The FQ was calculated by dividing the FI by the respective sample size.

Results:

Of 4896 studies screened, 22 comparative studies, 10 of which were RCTs, were ultimately included for analysis. A total of 74 outcomes were examined. Overall, the median FI was 2 (interquartile range [IQR], 1-3), and the median FQ was 0.035 (IQR, 0.020-0.057). The mean FI was 2.55 ± 1.29, and the mean FQ was 0.043 ± 0.027. In 64% of outcomes, the FI was less than the number of patients lost to follow-up. Additionally, 81% of significant outcomes needed just a single outcome reversal to lose their significance.

Conclusion:

Over half of the studies currently used to guide clinical practice have a number of patients lost to follow-up greater than their FI. The results of these studies should be interpreted within the context of these limitations. Future analyses may benefit from the inclusion of the FI and the FQ in their statistical analyses.

Itaya T, Isobe Y, Suzuki S, Koike K, Nishigaki M, Yamamoto Y. The Fragility of Statistically Significant Results in Randomized Clinical Trials for COVID-19. JAMA Netw Open 2022;5:e222973. [PMID: 35302631 PMCID: PMC8933746 DOI: 10.1001/jamanetworkopen.2022.2973] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/24/2022] Open

Abstract

IMPORTANCE

Interpreting results from randomized clinical trials (RCTs) for COVID-19, which have been published rapidly and in vast numbers, is challenging during a pandemic.

OBJECTIVE

To evaluate the robustness of statistically significant findings from RCTs for COVID-19 using the fragility index.

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional study included COVID-19 trial articles that randomly assigned patients 1:1 into 2 parallel groups and reported at least 1 binary outcome as significant in the abstract. A systematic search was conducted using PubMed to identify RCTs on COVID-19 published until August 7, 2021.

EXPOSURES

Trial characteristics, such as type of intervention (treatment drug, vaccine, or others), number of outcome events, and sample size.

MAIN OUTCOMES AND MEASURES

Fragility index.

RESULTS

Of the 47 RCTs for COVID-19 included, 36 (77%) were studies of the effects of treatment drugs, 5 (11%) were studies of vaccines, and 6 (13%) were of other interventions. A total of 138 235 participants were included in these trials. The median (IQR) fragility index of the included trials was 4 (1-11). The medians (IQRs) of the fragility indexes of RCTs of treatment drugs, vaccines, and other interventions were 2.5 (1-6), 119 (61-139), and 4.5 (1-18), respectively. The fragility index among more than half of the studies was less than 1% of each sample size, although the fragility index as a proportion of events needing to change would be much higher.

CONCLUSIONS AND RELEVANCE

This cross-sectional study found a relatively small number of events (a median of 4) would be required to change the results of COVID-19 RCTs from statistically significant to not significant. These findings suggest that health care professionals and policy makers should not rely heavily on individual results of RCTs for COVID-19.

Marasco D, Russo J, Izzo A, Vallefuoco S, Coppola F, Patel S, Smeraglia F, Balato G, Mariconda M, Bernasconi A. Static versus dynamic fixation of distal tibiofibular syndesmosis: a systematic review of overlapping meta-analyses. Knee Surg Sports Traumatol Arthrosc 2021;29:3534-3542. [PMID: 34455448 DOI: 10.1007/s00167-021-06721-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/24/2021] [Accepted: 08/24/2021] [Indexed: 11/26/2022]

Abstract

PURPOSE

Multiple Level I meta-analyses were conducted comparing traditional static vs. more recently introduced dynamic strategies of fixation for injuries of the distal tibiofibular syndesmosis (TFS). The aim of this review was to assess their robustness and methodological quality, providing support in the choice of a treatment strategy in case of TFS injury using the highest level of evidence.

METHODS

In this systematic review, conducted in accordance with the PRISMA guidelines, meta-analyses/systematic reviews comparing static and dynamic fixation methods after acute TFS injury were identified. The robustness of studies was evaluated using the fragility index (FI) for meta-analysis and the fragility quotient (FQ). The risk of bias was evaluated using the Assessment of Multiple Systematic Reviews (AMSTAR) instrument. Finally, the Jadad algorithm was applied to select the study that provided the highest quality of evidence to develop recommendations for the fixation strategy of these lesions.

RESULTS

Out of 1,302 records, four Level I meta-analyses were included in this study. Analyzing the statistically significant dichotomous outcomes, the median FI was 3.5 (IQR, 2 to 5.5; range, 1 to 9), while the median FQ was 1.9% (IQR, 1 to 3.5; range, 0.35 to 4.4). In total, 37% of outcomes had an FI of 2 or less and 75% had an FI of 4 or less. According to the AMSTAR score and Jadad algorithm, the largest meta-analysis was selected as providing the highest level of evidence so far.

CONCLUSION

The meta-analyses with statistically significant dichotomous outcomes comparing dynamic and static fixation for treating injuries of the distal tibiofibular syndesmosis are fragile, with a change in fewer than four patients or less than 2% of the study population sufficient to reverse a significant outcome to nonsignificant.

LEVEL OF EVIDENCE

Level I.
