Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cao X, Xing L, Majd E, He H, Gu J, Zhang X. A Systematic Evaluation of Supervised Machine Learning Algorithms for Cell Phenotype Classification Using Single-Cell RNA Sequencing Data. Front Genet 2022;13:836798. [PMID: 35281805 PMCID: PMC8905542 DOI: 10.3389/fgene.2022.836798] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Kidd M, Drozdov IA, Chirindel A, Nicolas G, Imagawa D, Gulati A, Tsuchikawa T, Prasad V, Halim AB, Strosberg J. NETest® 2.0-A decade of innovation in neuroendocrine tumor diagnostics. J Neuroendocrinol 2025;37:e70002. [PMID: 39945192 PMCID: PMC11975799 DOI: 10.1111/jne.70002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/27/2025] [Accepted: 01/31/2025] [Indexed: 04/09/2025]

Abstract

Gastroenteropancreatic neuroendocrine neoplasms (GEP-NENs) are challenging to diagnose and manage. Because there is a critical need for a reliable biomarker, we previously developed the NETest, a liquid biopsy test that quantifies the expression of 51 neuroendocrine tumor (NET)-specific genes in blood using real-time PCR (NETest 1.0). In this study, we have combined our well-established laboratory approach (blood collection, RNA isolation, qPCR) with contemporary supervised machine learning methods and expanded training and testing sets to improve the discrimination and calibration of the NETest algorithm (NETest 2.0). qPCR measurements of RNA-stabilized blood-derived gene expression of 51 NET markers were used to train two supervised classifiers. The first classifier, trained on 78 controls and 162 NETs, distinguished NETs from controls; the second, trained on 134 stable disease samples and 61 progressive disease samples, differentiated stable from progressive NET disease. In all cases, 80% of the data were retained for model training, while the remaining 20% were used for performance evaluation. The predictive performance of the AI system was assessed using sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). The algorithm with the highest performance was retained for validation in two independent validation sets. Validation Cohort #I consisted of 277 patients and 186 healthy controls from the United States, Latin America, Europe, Africa and Asia, while Validation Cohort #II comprised 291 European patients from the Swiss NET Registry. A specificity cohort of 147 gastrointestinal, pancreatic and lung malignancies (non-NETs) was also evaluated. NETest 2.0 Algorithm #1 (Random Forest/gene expression normalized to ATG4B) achieved an AUROC of 0.91 for distinguishing NETs from controls (Validation Cohort #I), with a sensitivity of 95% and specificity of 81%. In Validation Cohort #II, 92% of NETs with image-positive disease were detected. The AUROC for differentiating NETs from other malignancies was 0.95; the sensitivity was 92% and specificity 90%. NETest 2.0 Algorithm #2 (Random Forest/gene expression normalized to ALG9) demonstrated an AUROC of 0.81 in Validation Cohort #I and 0.82 in Validation Cohort #II for differentiating stable from progressive disease, with specificities of 81% and 82%, respectively. Model performance was not affected by gender, ethnicity or age. Substantial improvements in performance for both algorithms were identified in head-to-head comparisons with NETest 1.0 (diagnostic: p = 1.73 × 10⁻⁹; prognostic: p = 1.02 × 10⁻¹⁰). NETest 2.0 exhibited improved diagnostic and prognostic capabilities over NETest 1.0. The assay also demonstrated improved sensitivity for differentiating NETs from other gastrointestinal, pancreatic and lung malignancies. The validation of this tool in geographically diverse cohorts highlights its potential for widespread clinical use.
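
As a rough illustration of the evaluation metrics named in this abstract (sensitivity, specificity and AUROC), the Python sketch below computes them for a binary classifier. The labels, scores and the 0.5 threshold are made-up toy values, and the function names are hypothetical; nothing here is taken from the NETest study or its code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (assumed for illustration): 1 = disease, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

sens, spec = sensitivity_specificity(labels, scores)  # 2/3 and 2/3
area = auroc(labels, scores)                          # 8/9
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why abstracts like this one tend to report all three.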

Stathopoulou KM, Georgakopoulos S, Tasoulis S, Plagianakos VP. Investigating the overlap of machine learning algorithms in the final results of RNA-seq analysis on gene expression estimation. Health Inf Sci Syst 2024;12:14. [PMID: 38435719 PMCID: PMC10904690 DOI: 10.1007/s13755-023-00265-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/04/2023] [Accepted: 12/05/2023] [Indexed: 03/05/2024] Open

Cai K, Fu W, Liu H, Yang X, Wang Z, Zhao X. Leveraging Bioinformatics and Machine Learning for Identifying Prognostic Biomarkers and Predicting Clinical Outcomes in Lung Adenocarcinoma. Genes (Basel) 2024;15:1497. [PMID: 39766765 PMCID: PMC11675206 DOI: 10.3390/genes15121497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 11/06/2024] [Accepted: 11/21/2024] [Indexed: 01/11/2025] Open

Modlin IM, Kidd M, Drozdov IA, Boegemann M, Bodei L, Kunikowska J, Malczewska A, Bernemann C, Koduru SV, Rahbar K. Development of a multigenomic liquid biopsy (PROSTest) for prostate cancer in whole blood. Prostate 2024;84:850-865. [PMID: 38571290 DOI: 10.1002/pros.24704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/04/2024] [Accepted: 03/25/2024] [Indexed: 04/05/2024]

Abstract

INTRODUCTION

We describe the development of a molecular assay from publicly available tumor tissue mRNA databases using machine learning and present preliminary evidence of functionality as a diagnostic and monitoring tool for prostate cancer (PCa) in whole blood.

MATERIALS AND METHODS

We assessed 1055 PCas (public microarray data sets) to identify putative mRNA biomarkers. Specificity was confirmed against 32 different solid and hematological cancers from The Cancer Genome Atlas (n = 10,990). This defined a 27-gene panel which was validated by qPCR in 50 histologically confirmed PCa surgical specimens and matched blood. An ensemble classifier (Random Forest, Support Vector Machines, XGBoost) was trained in age-matched PCas (n = 294), and in 72 controls and 64 BPH. Classifier performance was validated in two independent sets (n = 263 PCas; n = 99 controls). We assessed the panel as a postoperative disease monitor in a radical prostatectomy cohort (RPC: n = 47).

RESULTS

A PCa-specific 27-gene panel was identified. Matched blood and tumor gene expression levels were concordant (r = 0.72, p < 0.0001). The ensemble classifier ("PROSTest") was scaled 0%-100% and the industry-standard operating point of ≥50% used to define a PCa. Using this, the PROSTest exhibited an 85% sensitivity and 95% specificity for PCa versus controls. In two independent sets, the metrics were 92%-95% sensitivity and 100% specificity. In the RPCs (n = 47), PROSTest scores decreased from 72% ± 7% to 33% ± 16% (p < 0.0001, Mann-Whitney test). PROSTest was 26% ± 8% in 37 with normal postoperative PSA levels (<0.1 ng/mL). In 10 with elevated postoperative PSA, PROSTest was 60% ± 4%.

CONCLUSION

A 27-gene whole blood signature for PCa is concordant with tissue mRNA levels. Measuring blood expression provides a minimally invasive genomic tool that may facilitate prostate cancer management.
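
The scoring rule described in the Results (an ensemble output scaled 0%-100%, with scores of ≥50% called PCa) can be sketched in Python as follows. The abstract does not say how the three models are combined, so the simple averaging of per-model probabilities used here is an assumption, as are the function names and the toy probability values.

```python
from typing import Sequence


def ensemble_score(model_probabilities: Sequence[float]) -> float:
    """Combine per-model probabilities (0-1) into one score on a 0-100 scale.

    ASSUMPTION: an unweighted mean; the source does not specify the
    combination rule used by the actual assay.
    """
    return 100.0 * sum(model_probabilities) / len(model_probabilities)


def is_positive(score: float, operating_point: float = 50.0) -> bool:
    """Apply the >= 50% operating point mentioned in the abstract."""
    return score >= operating_point


# Toy probabilities (made up) from three hypothetical models, e.g. a random
# forest, a support vector machine and a gradient-boosted model.
high_sample = [0.9, 0.7, 0.8]
low_sample = [0.2, 0.4, 0.3]

for probabilities in (high_sample, low_sample):
    score = ensemble_score(probabilities)
    print(f"score={score:.0f}% positive={is_positive(score)}")
```

Keeping the combination step separate from the thresholding step makes it straightforward to evaluate other operating points without changing how the score itself is computed.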

Lee YH, Chang J, Lee JE, Jung YS, Lee D, Lee HS. Essential elements of physical fitness analysis in male adolescent athletes using machine learning. PLoS One 2024;19:e0298870. [PMID: 38564629 PMCID: PMC10986970 DOI: 10.1371/journal.pone.0298870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 02/01/2024] [Indexed: 04/04/2024] Open

Fuller GW, Hasan M, Hodkinson P, McAlpine D, Goodacre S, Bath PA, Sbaffi L, Omer Y, Wallis L, Marincowitz C. Training and testing of a gradient boosted machine learning model to predict adverse outcome in patients presenting to emergency departments with suspected covid-19 infection in a middle-income setting. PLOS Digit Health 2023;2:e0000309. [PMID: 37729117 PMCID: PMC10511129 DOI: 10.1371/journal.pdig.0000309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 06/27/2023] [Indexed: 09/22/2023]

Abstract

COVID-19 infection rates remain high in South Africa. Clinical prediction models may be helpful for rapid triage and for supporting clinical decision making in patients with suspected COVID-19 infection. The Western Cape, South Africa, has integrated electronic health care data facilitating large-scale linked routine datasets. The aim of this study was to develop a machine learning model, suitable for use in a middle-income setting, to predict adverse outcome in patients presenting with suspected COVID-19. A retrospective cohort study was conducted using linked routine data from patients presenting with suspected COVID-19 infection to public-sector emergency departments (EDs) in the Western Cape, South Africa, between 27th August 2020 and 31st October 2021. The primary outcome was death or critical care admission at 30 days. An XGBoost machine learning model was trained and internally tested using split-sample validation. External validation was performed in 3 test cohorts: Western Cape patients presenting during the Omicron COVID-19 wave, a UK cohort during the ancestral COVID-19 wave, and a Sudanese cohort during ancestral and Eta waves. A total of 282,051 cases were included in a complete case training dataset. The prevalence of 30-day adverse outcome was 4.0%. The most important features for predicting adverse outcome were the requirement for supplemental oxygen, peripheral oxygen saturations, level of consciousness and age. Internal validation using split-sample test data revealed excellent discrimination (C-statistic 0.91, 95% CI 0.90 to 0.91) and calibration (CITL of 1.05). The model achieved C-statistics of 0.84 (95% CI 0.84 to 0.85), 0.72 (95% CI 0.71 to 0.73), and 0.62 (95% CI 0.59 to 0.65) in the Omicron, UK, and Sudanese test cohorts, respectively. Results were materially unchanged in sensitivity analyses examining missing data. An XGBoost machine learning model achieved good discrimination and calibration in prediction of adverse outcome in patients presenting with suspected COVID-19 to Western Cape EDs. Performance was reduced in temporal and geographical external validation.

Dimitsaki S, Gavriilidis GI, Dimitriadis VK, Natsiavas P. Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence. Artif Intell Med 2023;137:102490. [PMID: 36868685 PMCID: PMC9846931 DOI: 10.1016/j.artmed.2023.102490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/19/2023]

Abstract

The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated in terms of predicting the severity of patients' condition using plasma proteomics and clinical data as input. An overview of AI-based technical developments to support COVID-19 patient management is presented, outlining the landscape of relevant technical developments. Based on this review, an ensemble of ML algorithms that analyze clinical and biological data (i.e., plasma proteomics) of COVID-19 patients is designed and deployed to evaluate the potential use of AI for early COVID-19 patient triage. The proposed pipeline is evaluated using three publicly available datasets for training and testing. Three ML "tasks" are defined, and several algorithms are tested through a hyperparameter tuning method to identify the highest-performance models. As overfitting is one of the typical pitfalls for such approaches (mainly due to the size of the training/validation datasets), a variety of evaluation metrics are used to mitigate this risk. In the evaluation procedure, recall scores ranged from 0.6 to 0.74 and F1-scores from 0.62 to 0.75. The best performance is observed with the Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) algorithms. Additionally, input data (proteomics and clinical data) were ranked based on corresponding Shapley additive explanation (SHAP) values and evaluated for their prognostic capacity and immuno-biological credence. This "interpretable" approach revealed that our ML models could discern critical COVID-19 cases predominantly based on patients' age and plasma proteins related to B cell dysfunction, hyper-activation of inflammatory pathways like Toll-like receptors, and hypo-activation of developmental and immune pathways like SCF/c-Kit signaling. Finally, the computational workflow described here is corroborated in an independent dataset, as are the superiority of MLP and the implication of the abovementioned predictive biological pathways. Regarding limitations of the presented ML pipeline, the datasets used in this study contain fewer than 1000 observations and a significant number of input features, hence constituting a high-dimensional low-sample (HDLS) dataset which could be sensitive to overfitting. An advantage of the proposed pipeline is that it combines biological data (plasma proteomics) with clinical-phenotypic data. Thus, in principle, the presented approach could enable patient triage in a timely fashion if used on already trained models. However, larger datasets and further systematic validation are needed to confirm the potential clinical value of this approach. The code is available on Github: https://github.com/inab-certh/Predicting-COVID-19-severity-through-interpretable-AI-analysis-of-plasma-proteomics.

Sen Puliparambil B, Tomal JH, Yan Y. A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data. Biology (Basel) 2022;11:1495. [PMID: 36290397 PMCID: PMC9598401 DOI: 10.3390/biology11101495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/21/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]

Le H, Peng B, Uy J, Carrillo D, Zhang Y, Aevermann BD, Scheuermann RH. Machine learning for cell type classification from single nucleus RNA sequencing data. PLoS One 2022;17:e0275070. [PMID: 36149937 PMCID: PMC9506651 DOI: 10.1371/journal.pone.0275070] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/03/2022] [Accepted: 09/09/2022] [Indexed: 11/18/2022] Open