1
Zeng Y, Zhang Y, Xiao Z, Sui H. A multi-classification deep neural network for cancer type identification from high-dimension, small-sample and imbalanced gene microarray data. Sci Rep 2025; 15:5239. [PMID: 39939378 PMCID: PMC11822135 DOI: 10.1038/s41598-025-89475-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Accepted: 02/05/2025] [Indexed: 02/14/2025]
Abstract
Gene microarray technology provides an efficient way to diagnose cancer. However, microarray gene expression data face the challenges of high dimensionality, small sample size, and multi-class imbalance. The coupling of these challenges leads to inaccurate results when traditional feature selection and classification algorithms are used. Owing to their fast learning speed and good classification performance, deep neural networks such as generative adversarial networks have proven to be among the best classification algorithms, especially in the bioinformatics domain. However, generative adversarial networks are limited to binary applications and are inefficient at processing high-dimensional sparse features. This paper proposes a multi-classification generative adversarial network model combined with feature bundling (MGAN-FB) to handle the coupling of high dimensionality, small sample size, and multi-class imbalance in gene microarray data classification at both the feature and algorithmic levels. At the feature level, a deep encoder structure combining a feature bundling (FB) mechanism and a squeeze-and-excitation (SE) mechanism is designed for the generator, so that the sparsity, correlation, and consequence of high-dimensional features are all taken into consideration during adaptive feature extraction. It achieves effective dimensionality reduction without transitional information loss. At the algorithmic level, a softmax module coupled with a multi-classifier is introduced into the discriminator, and a new objective function that considers encoding loss, reconstruction loss, discrimination loss, and multi-classification loss is designed specifically for the proposed MGAN-FB model. We thereby extend the generative adversarial framework from binary classification to multi-classification. Experiments on eight open-source gene microarray datasets, evaluated in terms of classification performance, running time, and non-parametric tests, demonstrate that the proposed method has clear advantages over the seven compared methods.
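The abstract names a squeeze-and-excitation (SE) mechanism in the generator's encoder. As a hedged illustration only, the sketch below shows a generic SE block applied to features arranged as a channels × length array: the squeeze step averages each channel, a two-layer bottleneck produces one gate per channel, and the input is rescaled by those gates. How MGAN-FB arranges gene features into channels, its reduction ratio, and its trained weights are not given here, so the shapes and random weights are assumptions, and the feature bundling (FB) step is not shown.

```python
import numpy as np


def squeeze_excite(x, w1, w2):
    """Apply a generic squeeze-and-excitation block to x (channels x length).

    w1: (channels // r, channels) weights of the reduction layer.
    w2: (channels, channels // r) weights of the expansion layer.
    Biases are omitted for brevity.
    """
    # Squeeze: average over the length axis to get one descriptor per channel.
    z = x.mean(axis=1)
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    s = np.maximum(w1 @ z, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Scale: reweight each channel by its gate in [0, 1].
    return x * gates[:, None], gates


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
channels, length, r = 8, 16, 2
x = rng.normal(size=(channels, length))
w1 = rng.normal(size=(channels // r, channels))
w2 = rng.normal(size=(channels, channels // r))
y, gates = squeeze_excite(x, w1, w2)
```

Channels with gates near 0 are suppressed and channels with gates near 1 pass through almost unchanged, which is how an SE block lets a network weight feature groups adaptively.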
Affiliation(s)
- Yifu Zeng
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
- Department of Information Technology, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Yixiang Zhang
- Department of Infectious Diseases, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Zikai Xiao
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
- He Sui
- College of Aeronautical Engineering, Civil Aviation University of China, Tianjin, 300300, China.
- Information Security Evaluation Center, Civil Aviation University of China, Tianjin, 300300, China.
2
Rathore AS, Kumar N, Choudhury S, Mehta NK, Raghava GPS. Prediction of hemolytic peptides and their hemolytic concentration. Commun Biol 2025; 8:176. [PMID: 39905233 PMCID: PMC11794569 DOI: 10.1038/s42003-025-07615-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Accepted: 01/28/2025] [Indexed: 02/06/2025]
Abstract
Peptide-based drugs often fail in clinical trials due to their toxicity or hemolytic activity against red blood cells (RBCs). Existing methods predict whether peptides are hemolytic but not the concentration (HC50) required to lyse 50% of RBCs. This study develops classification and regression models to identify and quantify hemolytic activity. These models were trained on 1926 peptides with experimentally determined HC50 against mammalian RBCs. Analysis indicates that hydrophobic and positively charged residues are associated with higher hemolytic activity. Among classification models, including machine learning (ML), quantum ML, and protein language models, a hybrid model combining random forest (RF) and a motif-based approach achieves the highest area under the receiver operating characteristic curve (AUROC) of 0.921. Regression models achieve a Pearson correlation coefficient (R) of 0.739 and a coefficient of determination (R²) of 0.543. These models outperform existing methods and are implemented in HemoPI2, a web-based platform and standalone software for designing peptides with desired HC50 values (http://webs.iiitd.edu.in/raghava/hemopi2/).
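The two regression metrics quoted above can be computed as in the minimal sketch below; the example values are hypothetical and not taken from the HemoPI2 data. It assumes R² is computed as 1 - SS_res/SS_tot on the predictions, in which case it is not in general the square of Pearson's R: R measures only linear association, while R² also penalizes any offset or scale error in the predictions.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_t) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, predictions that are all shifted by a constant give R = 1 but a negative R², which illustrates why the two values reported in the abstract need not satisfy R² = R × R.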
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Nishant Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Naman Kumar Mehta
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
3
Kim WS, Heo DW, Maeng J, Shen J, Tsogt U, Odkhuu S, Zhang X, Cheraghi S, Kim SW, Ham BJ, Rami FZ, Sui J, Kang CY, Suk HI, Chung YC. Deep Learning-based Brain Age Prediction in Patients With Schizophrenia Spectrum Disorders. Schizophr Bull 2024; 50:804-814. [PMID: 38085061 PMCID: PMC11283195 DOI: 10.1093/schbul/sbad167] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2024]
Abstract
BACKGROUND AND HYPOTHESIS The brain-predicted age difference (brain-PAD) may serve as a biomarker for neurodegeneration. We investigated the brain-PAD in patients with schizophrenia (SCZ), first-episode schizophrenia spectrum disorders (FE-SSDs), and treatment-resistant schizophrenia (TRS) using structural magnetic resonance imaging (sMRI). STUDY DESIGN We employed a convolutional network-based regression model (SFCNR) and compared its performance with models based on three machine learning (ML) algorithms. We pretrained the SFCNR with sMRI data of 7590 healthy controls (HCs) selected from the UK Biobank. The parameters of the pretrained model were transferred to the next training phase with a new set of HCs (n = 541). The brain-PAD was analyzed in independent HCs (n = 209) and patients (n = 233). Correlations between the brain-PAD and clinical measures were investigated. STUDY RESULTS The SFCNR model outperformed three commonly used ML models. Advanced brain aging was observed in patients with SCZ, FE-SSDs, and TRS compared to HCs. A significant difference in brain-PAD was observed between FE-SSDs and TRS with ridge regression but not with the SFCNR model. Chlorpromazine equivalent dose and cognitive function were correlated with the brain-PAD in SCZ and FE-SSDs. CONCLUSIONS Our findings indicate that there is advanced brain aging in patients with SCZ and that higher brain-PAD in SCZ can be used as a surrogate marker for cognitive dysfunction. These findings warrant further investigation into the causes of advanced brain age in SCZ. In addition, possible psychosocial and pharmacological interventions targeting brain health should be considered in early-stage SCZ patients with advanced brain age.
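Brain-PAD is conventionally defined as the model-predicted brain age minus the chronological age, so a positive value indicates a brain that appears older than expected. The sketch below computes it per subject and compares group means. The ages are hypothetical, the SFCNR model that would produce the predictions is not shown, and any age-bias correction the study may have applied is not included.

```python
def brain_pad(predicted_ages, chronological_ages):
    """Brain-PAD per subject: predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted_ages, chronological_ages)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical predicted and chronological ages (years).
controls_pad = brain_pad([30.5, 41.0, 52.0], [31, 40, 50])
patients_pad = brain_pad([38.0, 47.5, 60.0], [33, 42, 55])

# A positive difference suggests "older-looking" brains in the patient group.
group_difference = mean(patients_pad) - mean(controls_pad)
```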
Affiliation(s)
- Woo-Sung Kim
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Da-Woon Heo
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Junyeong Maeng
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Jie Shen
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Department of Psychiatry, Yanbian University, Medical School, Yanji, China
- Uyanga Tsogt
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Soyolsaikhan Odkhuu
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Xuefeng Zhang
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Sahar Cheraghi
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Sung-Wan Kim
- Department of Psychiatry, Chonnam National University Medical School, Gwangju, Korea
- Byung-Joo Ham
- Department of Psychiatry, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Korea
- Fatima Zahra Rami
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Jing Sui
- Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Chae Yeong Kang
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
- Heung-Il Suk
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
- Young-Chul Chung
- Department of Psychiatry, Jeonbuk National University, Medical School, Jeonju, Korea
- Department of Psychiatry, Jeonbuk National University Hospital, Jeonju, Korea
- Research Institute of Clinical Medicine of Jeonbuk National University-Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, Korea
4
Hong S, Zhang Y, Wang D, Wang H, Zhang H, Jiang J, Chen L. Disulfidptosis-related lncRNAs signature predicting prognosis and immunotherapy effect in lung adenocarcinoma. Aging (Albany NY) 2024; 16:9972-9989. [PMID: 38862217 PMCID: PMC11210254 DOI: 10.18632/aging.205911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 04/22/2024] [Indexed: 06/13/2024]
Abstract
PURPOSE Lung adenocarcinoma (LUAD) is a prevalent malignant tumor worldwide, with high incidence and mortality rates. However, specific and sensitive biomarkers for its early diagnosis and targeted treatment are still lacking. Disulfidptosis is a newly identified mode of cell death characterized by disulfide stress. Therefore, exploring the correlation between disulfidptosis-related long non-coding RNAs (DRGs-lncRNAs) and patient prognosis can provide new molecular targets for LUAD patients. METHODS The study analysed transcriptome and clinical data of LUAD patients from The Cancer Genome Atlas (TCGA) database; gene co-expression and univariate Cox regression analyses were used to screen for prognosis-related DRGs-lncRNAs. An lncRNA risk score model was established using univariate and multivariate Cox regression. TIMER, CIBERSORT, CIBERSORT-ABS, and other methods were used to analyze immune infiltration and to further evaluate immune function, immune checkpoints, and drug sensitivity. Real-time polymerase chain reaction (RT-PCR) was performed to detect the expression of DRGs-lncRNAs in LUAD cell lines. RESULTS A total of 108 lncRNAs significantly associated with disulfidptosis were identified. A prognostic model was constructed from 10 lncRNAs with independent prognostic significance, screened through univariate Cox regression analysis, LASSO regression analysis, and multivariate Cox regression analysis. Survival analysis using the prognostic model showed clear survival differences between the high- and low-risk groups. The risk score can serve as a prognostic factor independent of other clinical traits, and it increases with stage. Further analysis showed that the high- and low-risk groups defined by the model also differed in tumor immune cell infiltration, immune function, and immune checkpoint genes.
Chemotherapy drug susceptibility analysis showed that high-risk patients were more sensitive to Paclitaxel, 5-Fluorouracil, Gefitinib, Docetaxel, Cytarabine, and Cisplatin. Additionally, RT-PCR analysis demonstrated differential expression of DRGs-lncRNAs between LUAD cell lines and the human bronchial epithelial cell line. CONCLUSIONS The DRGs-lncRNAs prognostic model constructed in this study shows a degree of accuracy and reliability in predicting the survival of LUAD patients, and provides clues to the interaction between disulfidptosis and LUAD immunotherapy.
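Cox-based lncRNA signatures of this kind commonly compute each patient's risk score as the sum of each signature lncRNA's expression weighted by its multivariate Cox coefficient, and then split patients at the median score into high- and low-risk groups. The abstract does not state the exact formula or cutoff used, so both are assumptions in this sketch, and the lncRNA names and coefficients are hypothetical placeholders.

```python
from statistics import median


def risk_scores(samples, coefficients):
    """Risk score per sample: sum of Cox coefficient x expression over signature lncRNAs."""
    return [sum(coef * sample[name] for name, coef in coefficients.items())
            for sample in samples]


def split_by_median(scores):
    """Label scores above the median 'high' risk and the rest 'low' risk."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical two-lncRNA signature and expression values.
coefficients = {"LNC_A": 0.5, "LNC_B": -0.2}
samples = [
    {"LNC_A": 1.0, "LNC_B": 2.0},
    {"LNC_A": 3.0, "LNC_B": 0.5},
    {"LNC_A": 0.0, "LNC_B": 1.0},
]
scores = risk_scores(samples, coefficients)
groups = split_by_median(scores)
```

A negative coefficient marks an lncRNA whose higher expression lowers the risk score; in this sketch a patient exactly at the median falls into the low-risk group.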
Affiliation(s)
- Suifeng Hong
- Department of Respiratory and Critical Care Medicine, The Affiliated People’s Hospital of Ningbo University, Ningbo 315400, China
- Yu Zhang
- Department of Oncology Radiation, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200433, China
- Dongfeng Wang
- Dongying People’s Hospital (Dongying Hospital of Shandong Provincial Hospital Group), Dongying, Shandong 257091, China
- Huaying Wang
- Department of Respiratory and Critical Care Medicine, The Affiliated People’s Hospital of Ningbo University, Ningbo 315400, China
- Huihui Zhang
- Department of Respiratory and Critical Care Medicine, The Affiliated People’s Hospital of Ningbo University, Ningbo 315400, China
- Jing Jiang
- Department of Respiratory and Critical Care Medicine, The Affiliated People’s Hospital of Ningbo University, Ningbo 315400, China
- Liping Chen
- Department of Respiratory and Critical Care Medicine, The Affiliated People’s Hospital of Ningbo University, Ningbo 315400, China
5
Li J, Varghese RS, Ressom HW. RNA-Seq Data Analysis. Methods Mol Biol 2024; 2822:263-290. [PMID: 38907924 DOI: 10.1007/978-1-0716-3918-4_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2024]
Abstract
RNA-Seq data analysis stands as a vital part of genomics research, turning vast and complex datasets into meaningful biological insights. It is a field marked by rapid evolution and ongoing innovation, necessitating a thorough understanding for anyone seeking to unlock the potential of RNA-Seq data. In this chapter, we describe the intricate landscape of RNA-seq data analysis, elucidating a comprehensive pipeline that navigates through the entirety of this complex process. Beginning with quality control, the chapter underscores the paramount importance of ensuring the integrity of RNA-seq data, as it lays the groundwork for subsequent analyses. Preprocessing is then addressed, where the raw sequence data undergo necessary modifications and enhancements, setting the stage for the alignment phase. This phase involves mapping the processed sequences to a reference genome, a step pivotal for decoding the origins and functions of these sequences. Venturing into the heart of RNA-seq analysis, the chapter then explores differential expression analysis, the process of identifying genes that exhibit varying expression levels across different conditions or sample groups. Recognizing the biological context of these differentially expressed genes is pivotal; hence, the chapter transitions into functional analysis. Here, methods and tools like Gene Ontology and pathway analyses help contextualize the roles and interactions of the identified genes within broader biological frameworks. However, the chapter does not stop at conventional analysis methods. Embracing the evolving paradigms of data science, it delves into machine learning applications for RNA-seq data, introducing advanced techniques in dimension reduction and both unsupervised and supervised learning. These approaches allow patterns and relationships to be discerned in the data that might be imperceptible through traditional methods.
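As a toy illustration of the normalization and differential expression steps outlined above, the sketch below converts raw read counts to counts per million (CPM) and computes a per-gene log2 fold change between two groups. It is a simplified example under stated assumptions, not a substitute for dedicated tools: it performs no statistical testing, dispersion estimation, or multiple-testing correction, and the gene names and counts are hypothetical.

```python
import math


def cpm(counts):
    """Counts per million for one sample; counts maps gene -> raw read count."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-gene mean log2(CPM + pseudocount) in group_b minus that in group_a."""
    norm_a = [cpm(s) for s in group_a]
    norm_b = [cpm(s) for s in group_b]

    def mean_log(norm, gene):
        return sum(math.log2(s[gene] + pseudocount) for s in norm) / len(norm)

    return {gene: mean_log(norm_b, gene) - mean_log(norm_a, gene) for gene in norm_a[0]}


# Hypothetical raw counts for two genes in two groups of two samples each.
control = [{"GENE1": 100, "GENE2": 100}, {"GENE1": 120, "GENE2": 80}]
treated = [{"GENE1": 300, "GENE2": 100}, {"GENE1": 330, "GENE2": 110}]
lfc = log2_fold_change(control, treated)
```

The pseudocount keeps the logarithm defined for genes with zero counts; positive values mean higher normalized expression in the second group.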
Affiliation(s)
- James Li
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Rency S Varghese
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
- Habtom W Ressom
- Genomics & Epigenomics Shared Resource, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
6
Jiao R, Xue B, Zhang M. Benefiting From Single-Objective Feature Selection to Multiobjective Feature Selection: A Multiform Approach. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:7773-7786. [PMID: 36346857 DOI: 10.1109/tcyb.2022.3218345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Evolutionary multiobjective feature selection (FS) has gained increasing attention in recent years. However, it still faces some challenges: for example, duplicate solutions frequently appear in either the search space or the objective space, which leads to a loss of population diversity, and the huge search space results in low search efficiency. Minimizing the number of selected features and maximizing the classification performance are the two major objectives in FS. Usually, the fitness function of a single-objective FS problem linearly aggregates these two objectives through a weighted-sum method. Given a predefined direction (weight) vector, the single-objective FS task can explore the specified direction or area extensively, and different direction vectors result in different search directions in the objective space. Motivated by this, this article proposes a multiform framework, which solves a multiobjective FS task together with its auxiliary single-objective FS tasks in a multitask environment. By setting different direction vectors, promising feature subsets from the single-objective FS tasks can be utilized to boost the evolutionary search of the multiobjective FS task. In comparisons with five classical and state-of-the-art multiobjective evolutionary algorithms, as well as four well-performing FS algorithms, extensive experiments on 18 classification datasets verify the effectiveness and efficiency of the proposed method. Furthermore, the effectiveness of the proposed method is also investigated in a noisy environment.
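The weighted-sum aggregation described above can be sketched as a fitness to minimize, where a direction vector (w, 1 - w) trades classification error against the fraction of selected features. This is an assumption for illustration: the paper's exact fitness terms and normalization are not given in the abstract, and the candidate subsets are hypothetical (error rate, number of selected features) pairs.

```python
def weighted_fitness(error_rate, n_selected, n_total, w):
    """Fitness to minimize for direction vector (w, 1 - w)."""
    return w * error_rate + (1.0 - w) * (n_selected / n_total)


def best_subset(candidates, n_total, w):
    """Index of the candidate (error_rate, n_selected) with the lowest fitness."""
    return min(range(len(candidates)),
               key=lambda i: weighted_fitness(candidates[i][0], candidates[i][1], n_total, w))


# Hypothetical subsets drawn from 100 features: (error rate, features kept).
candidates = [(0.10, 50), (0.15, 5)]
```

With w = 0.99 the more accurate 50-feature subset wins, while with w = 0.5 the 5-feature subset wins; different direction vectors therefore pull the search toward different regions of the objective space, which is the effect the multiform framework exploits.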
7
van der Sar IG, van Jaarsveld N, Spiekerman IA, Toxopeus FJ, Langens QL, Wijsenbeek MS, Dauwels J, Moor CC. Evaluation of different classification methods using electronic nose data to diagnose sarcoidosis. J Breath Res 2023; 17:047104. [PMID: 37595574 DOI: 10.1088/1752-7163/acf1bf] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 08/18/2023] [Indexed: 08/20/2023]
Abstract
Electronic nose (eNose) technology is an emerging diagnostic application that uses artificial intelligence to classify human breath patterns. These patterns can be used to diagnose medical conditions. Sarcoidosis is often difficult to diagnose, as no standard procedure or conclusive test exists. An accurate diagnostic model based on eNose data could therefore be helpful in clinical decision-making. The aim of this paper is to evaluate the performance of various dimensionality reduction methods and classifiers in order to design an accurate diagnostic model for sarcoidosis. Various dimensionality reduction methods and multiple hyperparameter-optimised classifiers were tested and cross-validated on a dataset of patients with pulmonary sarcoidosis (n = 224) and other interstitial lung diseases (n = 317). The best-performing methods were selected to create a model to diagnose patients with sarcoidosis. Nested cross-validation was applied to calculate the overall diagnostic performance. A classification model with feature selection and a random forest (RF) classifier showed the highest accuracy. The overall diagnostic performance was an accuracy of 87.1% and an area under the curve of 91.2%. After comparing different dimensionality reduction methods and classifiers, a highly accurate model to diagnose sarcoidosis using eNose data was created. The RF classifier and feature selection showed the best performance. The presented systematic approach could also be applied to other eNose datasets to compare methods and select the optimal diagnostic model.
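Nested cross-validation keeps model selection (here, the choice of dimensionality reduction method and classifier hyperparameters) inside an inner loop, so the outer test folds are never used for tuning. The sketch below shows only the index bookkeeping, under the simplifying assumptions of round-robin fold assignment without shuffling or stratification; model fitting and scoring are omitted, and the sample and fold counts are arbitrary.

```python
def k_fold_indices(indices, k):
    """Yield (train, test) index lists, assigning indices to k folds round-robin."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_splits(n_samples, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits).

    Inner splits are built from outer_train only, so tuning never sees outer_test.
    """
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


# For each outer split: tune on inner_splits, refit the best setting on
# outer_train, and score it once on outer_test.
splits = list(nested_splits(10, 5, 3))
```

Averaging the outer-fold scores estimates the performance of the whole selection procedure rather than of one tuned model, which is why the overall diagnostic performance is reported from the nested loop.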
Affiliation(s)
- Iris G van der Sar
- Department of Respiratory Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands
- Nynke van Jaarsveld
- Educational Program Technical Medicine, Leiden University Medical Center, Delft University of Technology & Erasmus University Medical Center, Leiden, Delft & Rotterdam, The Netherlands
- Imme A Spiekerman
- Educational Program Technical Medicine, Leiden University Medical Center, Delft University of Technology & Erasmus University Medical Center, Leiden, Delft & Rotterdam, The Netherlands
- Floor J Toxopeus
- Educational Program Technical Medicine, Leiden University Medical Center, Delft University of Technology & Erasmus University Medical Center, Leiden, Delft & Rotterdam, The Netherlands
- Quint L Langens
- Educational Program Technical Medicine, Leiden University Medical Center, Delft University of Technology & Erasmus University Medical Center, Leiden, Delft & Rotterdam, The Netherlands
- Marlies S Wijsenbeek
- Department of Respiratory Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands
- Justin Dauwels
- Department of Microelectronics, Delft University of Technology, Delft, The Netherlands
- Catharina C Moor
- Department of Respiratory Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands
8
Hamraz M, Ali A, Mashwani WK, Aldahmani S, Khan Z. Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio. PLoS One 2023; 18:e0284619. [PMID: 37098036 PMCID: PMC10128961 DOI: 10.1371/journal.pone.0284619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/04/2023] [Indexed: 04/26/2023]
Abstract
Feature selection in high-dimensional gene expression datasets reduces not only the dimension of the data but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method called weighted signal-to-noise ratio (WSNR), which exploits feature weights based on support vectors and on the signal-to-noise ratio, with the objective of identifying the most informative genes in high-dimensional classification problems. The combination of these two state-of-the-art procedures enables the extraction of the most informative genes. The corresponding weights from the two procedures are multiplied and arranged in decreasing order; a larger weight indicates greater discriminatory power in classifying tissue samples into their true classes. The method is validated on eight gene expression datasets, and its results are compared with those of four well-known feature selection methods. WSNR outperformed the other competing methods on 6 of the 8 datasets. Box plots and bar plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data, and the simulation analysis reveals that WSNR outperforms all the other methods included in the study.
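A sketch of the ranking idea described above, assuming a two-class problem: each gene's signal-to-noise ratio is taken in the common form (mean1 - mean0) / (sd1 + sd0) and multiplied by a per-gene weight that would come from a linear support vector machine. Taking absolute values, and supplying the SVM weights directly rather than training an SVM, are assumptions of this sketch; the data and weights are hypothetical.

```python
from statistics import mean, stdev


def snr(class1, class0):
    """Signal-to-noise ratio of one gene: (mean1 - mean0) / (sd1 + sd0)."""
    return (mean(class1) - mean(class0)) / (stdev(class1) + stdev(class0))


def wsnr_ranking(X, y, svm_weights):
    """Rank genes by |SNR| * |SVM weight|, largest first.

    X: samples as lists of gene values; y: labels 0/1 (each class needs >= 2 samples).
    svm_weights: one weight per gene, assumed to come from a linear SVM.
    """
    scores = []
    for j in range(len(X[0])):
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(snr(g1, g0)) * abs(svm_weights[j]))
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Hypothetical data: 4 samples x 3 genes, and hypothetical SVM weights.
X = [[1.0, 5.0, 0.0], [1.2, 5.1, 1.0], [3.0, 5.0, 2.0], [3.2, 5.1, 3.0]]
y = [0, 0, 1, 1]
ranking, scores = wsnr_ranking(X, y, [0.1, 1.0, 1.0])
```

Gene 0 has the largest raw SNR, but its small SVM weight drops it below gene 2, showing how the product of the two weights can reorder genes compared with either criterion alone.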
Affiliation(s)
- Muhammad Hamraz
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Amjad Ali
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Wali Khan Mashwani
- Institute of Numerical Sciences, Kohat University of Science and Technology, Kohat, Pakistan
- Saeed Aldahmani
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
- Zardad Khan
- Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE
9
Xu Z, Vekaria V, Wang F, Cukor J, Su C, Adekkanattu P, Brandt P, Jiang G, Kiefer RC, Luo Y, Rasmussen LV, Xu J, Xiao Y, Alexopoulos G, Pathak J. Using Machine Learning to Predict Antidepressant Treatment Outcome From Electronic Health Records. PSYCHIATRIC RESEARCH AND CLINICAL PRACTICE 2023; 5:118-125. [PMID: 38077277 PMCID: PMC10698704 DOI: 10.1176/appi.prcp.20220015] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/20/2022] [Revised: 02/26/2023] [Accepted: 02/28/2023] [Indexed: 03/28/2023]
Abstract
Objective To evaluate whether a machine learning approach can accurately predict antidepressant treatment outcome using electronic health records (EHRs) from patients with depression. Method This study examined 808 patients with depression at a New York City-based outpatient mental health clinic between June 13, 2016 and June 22, 2020. Antidepressant treatment outcome was defined by the trend in depression symptom severity over time and was categorized as either "Recovering" or "Worsening" (i.e., non-Recovering), as measured by the slope of the individual-level Patient Health Questionnaire-9 (PHQ-9) score trajectory over the 6 months following treatment initiation. A patient was designated as "Recovering" if the slope was less than 0 and as "Worsening" if the slope was greater than or equal to 0. Multiple machine learning (ML) models, including L2-norm regularized Logistic Regression, Naive Bayes, Random Forest, and Gradient Boosting Decision Tree (GBDT), were used to predict treatment outcome based on additional data from EHRs, including demographics and diagnoses. Shapley Additive Explanations were applied to identify the most important predictors. Results The GBDT achieved the best results in predicting "Recovering" (AUC: 0.7654 ± 0.0227; precision: 0.6002 ± 0.0215; recall: 0.5131 ± 0.0336). When patients with low baseline PHQ-9 scores (<10) were excluded, predicting "Recovering" yielded an AUC of 0.7254 ± 0.0218, a precision of 0.5392 ± 0.0437, and a recall of 0.4431 ± 0.0513. Prior diagnosis of anxiety, psychotherapy, recurrent depression, and baseline depression symptom severity were strong predictors. Conclusions The results demonstrate the potential utility of applying ML to longitudinal EHRs to predict antidepressant treatment outcome. Our predictive tool holds promise for accelerating personalized medical management in patients with psychiatric illnesses.
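The outcome definition above can be sketched as fitting a least-squares slope to each patient's PHQ-9 scores over time, then labeling negative slopes "Recovering" and non-negative slopes "Worsening". Ordinary least squares is an assumption here, since the abstract does not say how the slope was estimated, and the time points (in weeks) and scores are hypothetical.

```python
def ols_slope(times, scores):
    """Ordinary least-squares slope of scores regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def label_outcome(times, scores):
    """'Recovering' if the PHQ-9 slope is below 0, otherwise 'Worsening'."""
    return "Recovering" if ols_slope(times, scores) < 0 else "Worsening"


# Hypothetical PHQ-9 trajectory over the first 24 weeks after treatment start.
weeks = [0, 4, 8, 12, 24]
print(label_outcome(weeks, [18, 15, 12, 10, 7]))  # prints Recovering
```

Note that a perfectly flat trajectory (slope exactly 0) is labeled "Worsening", matching the "no less than 0" rule in the abstract.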
Affiliation(s)
- Fei Wang
- Weill Cornell Medicine, New York, New York, USA
- Chang Su
- Temple University, Philadelphia, Pennsylvania, USA
- Yuan Luo
- Northwestern University, Chicago, Illinois, USA
- Jie Xu
- University of Florida, Gainesville, Florida, USA
- Yunyu Xiao
- Weill Cornell Medicine, New York, New York, USA
10
Karger E, Kureljusic M. Artificial Intelligence for Cancer Detection-A Bibliometric Analysis and Avenues for Future Research. Curr Oncol 2023; 30:1626-1647. [PMID: 36826086 PMCID: PMC9954989 DOI: 10.3390/curroncol30020125] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/27/2022] [Revised: 01/18/2023] [Accepted: 01/27/2023] [Indexed: 01/31/2023]
Abstract
After cardiovascular diseases, cancer is responsible for the most deaths worldwide. Detecting cancer early significantly improves the chances of healing. One group of technologies increasingly applied to cancer detection is artificial intelligence, which has great potential to support clinicians and medical practitioners because it allows for the early detection of carcinomas. In recent years, research on artificial intelligence for cancer detection has grown substantially. In this article, we conducted a bibliometric study of the existing research on the application of artificial intelligence in cancer detection. We analyzed 6450 articles on this topic published between 1986 and 2022. By doing so, we were able to give an overview of this research field, including its key topics, relevant outlets, institutions, and articles. Based on our findings, we developed a future research agenda that can help to advance research on artificial intelligence for cancer detection. In summary, our study is intended to serve as a platform and foundation for researchers interested in the potential of artificial intelligence for detecting cancer.
Affiliation(s)
- Erik Karger
- Information Systems and Strategic IT Management, University of Duisburg-Essen, 45141 Essen, Germany
- Marko Kureljusic
- International Accounting, University of Duisburg-Essen, 45141 Essen, Germany
11
Okamura H, Yamano H, Tsuda T, Morihiro J, Hirayama K, Nagano H. Development of a clinical microarray system for genetic analysis screening. Pract Lab Med 2022; 33:e00306. [PMID: 36593945 PMCID: PMC9803787 DOI: 10.1016/j.plabm.2022.e00306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 10/14/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022]
Abstract
Objectives Research on the relationship between diseases and genes and advances in genetic analysis technologies have made genetic testing in medical care possible. Various methods exist for genetic testing, including PCR-based methods and next-generation sequencing; however, as screening tests in clinical laboratories become more diverse, novel measurement systems and equipment are required to meet the needs of each situation. In this study, we aimed to develop a novel microarray-based genetic analysis system that uses a Peltier element to overcome the issues of conventional microarrays, such as long measurement time and high cost. Methods We constructed a microarray system to detect the UDP-glucuronosyltransferase gene polymorphisms UGT1A1*6 and UGT1A1*28 in patients eligible for irinotecan hydrochloride treatment, for use in clinical laboratories. To evaluate the performance of the system, the hybridization temperature and reaction time were determined, and the results were compared with those obtained using a conventional hybridization oven. Results The hybridization temperature reached its target in 1/27th of the time required by the conventional system. We assessed 111 human clinical samples and found that our results agreed with those obtained using existing methods. The total time for the newly developed device was reduced by 85 min compared with existing methods, as the automated DNA microarray eliminates the time that existing methods spend on manual operation. Conclusions The surface treatment technology used in our system enables high-density and strong DNA fixation, allowing the construction of a measurement system suitable for clinical applications.
Affiliation(s)
- Hiroshi Okamura
- Toyo Kohan Co., Ltd., Shinagawa, Tokyo, Japan (corresponding author)
- Hiroaki Nagano
- Department of Gastroenterological, Breast and Endocrine Surgery, Yamaguchi University Graduate School of Medicine, Ube, Yamaguchi, Japan
12
Tsamardinos I. Don't lose samples to estimation. PATTERNS (NEW YORK, N.Y.) 2022; 3:100612. [PMID: 36569551 PMCID: PMC9782254 DOI: 10.1016/j.patter.2022.100612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
In a typical predictive modeling task, we are asked to produce a final predictive model to employ operationally for predictions, as well as an estimate of its out-of-sample predictive performance. Typically, analysts hold out a portion of the available data, called a Test set, to estimate the model predictive performance on unseen (out-of-sample) records, thus "losing these samples to estimation." However, this practice is unacceptable when the total sample size is low. To avoid losing data to estimation, we need a shift in our perspective: we do not estimate the performance of a specific model instance; we estimate the performance of the pipeline that produces the model. This pipeline is applied on all available samples to produce the final model; no samples are lost to estimation. An estimate of its performance is provided by training the same pipeline on subsets of the samples. When multiple pipelines are tried, additional considerations that correct for the "winner's curse" need to be in place.
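The pipeline-level estimation idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the toy pipeline (a one-feature filter followed by a nearest-centroid rule), the helper names and the stratified k-fold scheme are all hypothetical choices. What it shows is that feature selection is refit inside every fold, the reported estimate describes the pipeline rather than one model instance, and the final model is trained on all samples. It does not include any correction for the winner's curse.

```python
import random
from statistics import mean


def stratified_folds(y, k, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def fit_pipeline(X, y):
    """Hypothetical pipeline for 0/1 labels: keep the feature with the largest
    class-mean gap, then classify by the nearer class centroid on that feature."""
    def class_mean(j, label):
        return mean(row[j] for row, lab in zip(X, y) if lab == label)

    j = max(range(len(X[0])), key=lambda f: abs(class_mean(f, 1) - class_mean(f, 0)))
    c1, c0 = class_mean(j, 1), class_mean(j, 0)
    return lambda row: 1 if abs(row[j] - c1) < abs(row[j] - c0) else 0


def estimate_pipeline(X, y, k=5, seed=0):
    """Mean held-out accuracy of the whole pipeline, refit on each training split."""
    accuracies = []
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_pipeline([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)


def final_model_and_estimate(X, y, k=5):
    """The final model sees every sample; the estimate comes from the same pipeline on subsets."""
    return fit_pipeline(X, y), estimate_pipeline(X, y, k)
```

No sample is held back from the returned model; the cross-validated number is interpreted as the expected performance of models produced by this pipeline.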
Affiliation(s)
- Ioannis Tsamardinos
- Computer Science Department, University of Crete, Heraklion, Greece; JADBio – Gnosis DA S.A., Heraklion, Greece; Institute of Applied and Computational Mathematics, Foundation for Research and Technology, Hellas, Heraklion, Greece (corresponding author)
13
Ibal JC, Park YJ, Park MK, Lee J, Kim MC, Shin JH. Review of the Current State of Freely Accessible Web Tools for the Analysis of 16S rRNA Sequencing of the Gut Microbiome. Int J Mol Sci 2022; 23:10865. [PMID: 36142775 PMCID: PMC9501225 DOI: 10.3390/ijms231810865] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/12/2022] [Revised: 09/13/2022] [Accepted: 09/15/2022] [Indexed: 11/16/2022]
Abstract
Owing to the emergence and improvement of high-throughput technology and the associated reduction in costs, next-generation sequencing (NGS) technology has made large-scale sampling and sequencing possible. With the large volume of data produced, the processing and downstream analysis of data are important for ensuring meaningful results and interpretation. Problems in data analysis may be encountered if researchers have little experience in using programming languages, especially if they are clinicians and beginners in the field. A strategy for solving this problem involves ensuring easy access to commercial software and tools. Here, we observed the current status of free web-based tools for microbiome analysis that can help users analyze and handle microbiome data effortlessly. We limited our search to freely available web-based tools and identified MicrobiomeAnalyst, Mian, gcMeta, VAMPS, and Microbiome Toolbox. We also highlighted the various analyses that each web tool offers, how users can analyze their data using each web tool, and noted some of their limitations. From the abovementioned list, gcMeta, VAMPS, and Microbiome Toolbox had several issues that made the analysis more difficult. Over time, as more data are generated and accessed, more users will analyze microbiome data. Thus, the availability of free and easily accessible web tools can enable the easy use and analysis of microbiome data, especially for those users with less experience in using command-line interfaces.
Affiliation(s)
- Jerald Conrad Ibal
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Yeong-Jun Park
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Min-Kyu Park
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Department of Applied Biosciences, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Jooeun Lee
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Min-Chul Kim
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Department of Applied Biosciences, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Jae-Ho Shin
- NGS Core Facility, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Department of Applied Biosciences, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
- Department of Integrative Biotechnology, Kyungpook National University, Daehak-ro 80, Daegu 41566, Korea
14
Beheshti Z. BMPA-TVSinV: A Binary Marine Predators Algorithm using time-varying sine and V-shaped transfer functions for wrapper-based feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
15
Goettsch KA, Zhang L, Singh AB, Dhawan P, Bastola DK. Reliable epithelial-mesenchymal transition biomarkers for colorectal cancer detection. Biomark Med 2022; 16:889-901. [PMID: 35892269 PMCID: PMC9442548 DOI: 10.2217/bmm-2022-0071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Aims: To combat increases in colorectal cancer (CRC) incidence and mortality, biomarkers among differentially expressed genes (DEGs) have been identified to objectively detect cancer. However, DEGs are numerous, and additional parameters may identify more reliable biomarkers. Here, CRC DEGs were filtered into a prioritized list of biomarkers. Materials & methods: Two independent datasets (COAD-READ [n = 698] and GSE50760 [n = 36]) were input alternatively to the recently published data-driven reference (DDR) method. Results were filtered based on epithelial-mesenchymal transition enrichment (χ² statistic: 919.05; p = 2.2e-16) to produce 37 potential CRC biomarkers. Results: All 37 genes reliably classified CRC samples, and ETV4, CLDN1 and CA2 together were top-ranked by DDR (accuracy: 89%; F1 score: 0.89). Conclusion: Biological and statistical information were combined to produce a better set of CRC detection biomarkers.
Affiliation(s)
- Kaitlin A Goettsch
- School of Interdisciplinary Informatics, College of Information Science & Technology, University of Nebraska at Omaha, 1110 S. 67th Street, Omaha, NE 68182, USA
- Ling Zhang
- School of Interdisciplinary Informatics, College of Information Science & Technology, University of Nebraska at Omaha, 1110 S. 67th Street, Omaha, NE 68182, USA
- Amar B Singh
- Department of Biochemistry & Molecular Biology, University of Nebraska Medical Center, 42nd & Emile Streets, Omaha, NE 68198, USA; Veterans Affairs Nebraska - Western Iowa Health Care System, Research Service, Omaha, NE 68105, USA
- Punita Dhawan
- Department of Biochemistry & Molecular Biology, University of Nebraska Medical Center, 42nd & Emile Streets, Omaha, NE 68198, USA
- Dhundy K Bastola
- School of Interdisciplinary Informatics, College of Information Science & Technology, University of Nebraska at Omaha, 1110 S. 67th Street, Omaha, NE 68182, USA
16
Lombardi A, Diacono D, Amoroso N, Biecek P, Monaco A, Bellantuono L, Pantaleo E, Logroscino G, De Blasi R, Tangaro S, Bellotti R. A robust framework to investigate the reliability and stability of explainable artificial intelligence markers of Mild Cognitive Impairment and Alzheimer's Disease. Brain Inform 2022; 9:17. [PMID: 35882684 PMCID: PMC9325942 DOI: 10.1186/s40708-022-00165-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/12/2022] [Accepted: 07/03/2022] [Indexed: 11/11/2022]
Abstract
In clinical practice, several standardized neuropsychological tests have been designed to assess and monitor the neurocognitive status of patients with neurodegenerative diseases such as Alzheimer's disease. Important research efforts have been devoted so far to the development of multivariate machine learning models that combine the different test indexes to predict the diagnosis and prognosis of cognitive decline with remarkable results. However, less attention has been devoted to the explainability of these models. In this work, we present a robust framework to (i) perform a threefold classification between healthy control subjects, individuals with cognitive impairment, and subjects with dementia using different cognitive indexes and (ii) analyze the variability of the explainability SHAP values associated with the decisions taken by the predictive models. We demonstrate that the SHAP values can accurately characterize how each index affects a patient's cognitive status. Furthermore, we show that a longitudinal analysis of SHAP values can provide effective information on Alzheimer's disease progression.
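SHAP values are estimates of Shapley values. For a model with only a few inputs, such as a handful of cognitive test indexes, Shapley values can be computed exactly by enumerating feature coalitions. The sketch below is a hypothetical illustration, not the framework used in the paper: it assumes that a feature outside a coalition is replaced by a baseline value (one common convention), and the scoring functions in the test are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition are set to their baseline value (an assumption).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear score the result reduces to w_i (x_i - baseline_i), and the values always sum to f(x) - f(baseline) (the efficiency property), which is a useful sanity check. Enumeration is exponential in the number of features, which is why SHAP implementations approximate it for larger models.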
Affiliation(s)
- Angela Lombardi
- Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Domenico Diacono
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Przemysław Biecek
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Loredana Bellantuono
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Dipartimento di Scienze mediche di base, Neuroscienze e Organi di senso, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Ester Pantaleo
- Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Giancarlo Logroscino
- Dipartimento di Scienze mediche di base, Neuroscienze e Organi di senso, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Pia Fondazione “Card. G. Panico”, Tricase, Italy
- Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Roberto Bellotti
- Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
17
Kulwa F, Li C, Zhang J, Shirahama K, Kosov S, Zhao X, Jiang T, Grzegorzek M. A new pairwise deep learning feature for environmental microorganism image analysis. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:51909-51926. [PMID: 35257344 DOI: 10.1007/s11356-022-18849-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/10/2021] [Accepted: 01/20/2022] [Indexed: 06/14/2023]
Abstract
Environmental microorganisms (EMs) offer a highly efficient, harmless, and low-cost solution to environmental pollution. They are used in sanitation, monitoring, and decomposition of environmental pollutants. However, this depends on the proper identification of suitable microorganisms. To speed up identification, lower its cost, and increase its consistency and accuracy, we propose novel pairwise deep learning features (PDLFs) to analyze microorganisms. The PDLF technique combines the capabilities of handcrafted and deep learning features. In this technique, we use Shi and Tomasi interest points, extracting deep learning features from patches centered at the interest points' locations. Then, to increase the number of potential features that have intermediate spatial characteristics between nearby interest points, we use Delaunay triangulation and straight-line geometry to pair nearby deep learning features. The potential of the pairwise features is demonstrated on EM classification using SVMs, linear discriminant analysis, logistic regression, XGBoost and random forest classifiers. The pairwise features obtain outstanding results of 99.17%, 91.34%, 91.32%, 91.48%, and 99.56% in accuracy, F1-score, recall, precision, and specificity, respectively, which are increases of about 5.95%, 62.40%, 62.37%, 61.84%, and 3.23% compared to non-paired deep learning features.
Affiliation(s)
- Frank Kulwa
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, People's Republic of China
- Chen Li
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, People's Republic of China
- Jinghua Zhang
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, People's Republic of China
- Kimiaki Shirahama
- Department of Informatics, Kindai University, Osaka, Higashiosaka, Japan
- Sergey Kosov
- Faculty of Data Engineering, Jacobs University Bremen, Bremen, Germany
- Xin Zhao
- Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, People's Republic of China
- Tao Jiang
- Control Engineering College, Chengdu University of Information Technology, Chengdu, China
- Marcin Grzegorzek
- Institute for Medical Informatics, University of Lübeck, 23538, Lübeck, Germany
18
Machine learning-enabled cancer diagnostics with widefield polarimetric second-harmonic generation microscopy. Sci Rep 2022; 12:10290. [PMID: 35717344 PMCID: PMC9206659 DOI: 10.1038/s41598-022-13623-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2021] [Accepted: 05/03/2022] [Indexed: 11/08/2022]
Abstract
The extracellular matrix (ECM) collagen undergoes major remodeling during tumorigenesis. However, alterations to the ECM are not widely considered in cancer diagnostics, due to the mostly uniform appearance of collagen fibers in white light images of hematoxylin and eosin-stained (H&E) tissue sections. Polarimetric second-harmonic generation (P-SHG) microscopy enables label-free visualization and ultrastructural investigation of non-centrosymmetric molecules, which, when combined with texture analysis, provides multiparameter characterization of tissue collagen. This paper demonstrates whole slide imaging of breast tissue microarrays using high-throughput widefield P-SHG microscopy. The resulting P-SHG parameters are used in classification to differentiate tumor from normal tissue, achieving 94.2% for both accuracy and F1-score, with a 6.3% false discovery rate. Subsequently, the trained classifier is employed to predict tumor tissue with 91.3% accuracy, a 90.7% F1-score, and a 13.8% false omission rate. As such, we show that widefield P-SHG microscopy reveals collagen ultrastructure over large tissue regions and can be utilized as a sensitive biomarker for cancer diagnostics and prognostics studies.
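The metrics quoted in this abstract all derive from the binary confusion matrix. A minimal sketch with hypothetical counts (not the study's data) shows how accuracy, F1-score, false discovery rate and false omission rate are computed, taking tumor as the positive class.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts (tumor = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "false_discovery_rate": fp / (tp + fp),   # equals 1 - precision
        "false_omission_rate": fn / (fn + tn),    # equals 1 - negative predictive value
    }


# Hypothetical counts chosen only for illustration.
print(diagnostic_metrics(tp=45, fp=5, tn=40, fn=10))
```

The false discovery rate is measured over predicted positives and the false omission rate over predicted negatives, so the two are not complements of each other.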
19
Just Add Data: automated predictive modeling for knowledge discovery and feature selection. NPJ Precis Oncol 2022; 6:38. [PMID: 35710826 PMCID: PMC9203777 DOI: 10.1038/s41698-022-00274-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/04/2020] [Accepted: 04/13/2022] [Indexed: 01/20/2023]
Abstract
Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to predictive and diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome or phenotype of interest. It also returns a palette of useful information for interpretation, clinical use of the models, and decision making. JADBio is qualitatively and quantitatively compared against Hyper-Parameter Optimization Machine Learning libraries. Results show that in typical omics dataset analysis, JADBio manages to identify signatures comprising just a handful of features while maintaining competitive predictive performance and accurate out-of-sample performance estimation.
20
Machine learning and bioinformatics approaches for classification and clinical detection of bevacizumab responsive glioblastoma subtypes based on miRNA expression. Sci Rep 2022; 12:8685. [PMID: 35606527 PMCID: PMC9126877 DOI: 10.1038/s41598-022-12566-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/07/2021] [Accepted: 05/03/2022] [Indexed: 11/29/2022]
Abstract
For the precise treatment of patients with glioblastoma multiforme (GBM), we classified and detected bevacizumab (BVZ)-responsive subtypes of GBM and found their differential expression (DE) of miRNAs and mRNAs, clinical characteristics, and related functional pathways. Based on miR-21 and miR-10b expression z-scores, approximately 30% of GBM patients were classified as having the GBM BVZ-responsive subtype. Patients with this subtype had a significantly shorter survival time than other GBM patients (p = 0.014), and their vascular endothelial growth factor A (VEGF) methylation was significantly lower (p = 0.005). The analysis also revealed 14 DE miRNAs and 7 DE mRNAs, as well as functional characteristics that differ between GBM BVZ subgroups. After comparing several machine learning algorithms, we constructed and cross-validated an SVM classifier. For clinical use, miR-197 was optimized and added to the miRNA panel for better classification. Afterwards, we validated the classifier with several GBM datasets and discovered some key related issues. According to this study, GBM BVZ subtypes can be classified and detected by a combination of SVM classifiers and miRNA panels in existing tissue GBM datasets. With certain modifications, the classifier may be used for the classification and detection of GBM BVZ subtypes in future clinical use.
21
Al Rashid SZ. Collaborative Computing-Based K-Nearest Neighbour Algorithm and Mutual Information to Classify Gene Expressions for Type 2 Diabetes. INTERNATIONAL JOURNAL OF E-COLLABORATION 2022. [DOI: 10.4018/ijec.304044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Classification is applied to gene expression data from human umbilical vein endothelial cells to study insulin regulation, using dynamic gene expression data for two classes: control and insulin-exposed. The mutual information statistical feature selection method is applied to all available datasets to select significant genes. The reduced data are divided into training and testing sets and then fed to a KNN classifier for diabetes classification. The results show that, using the top 10,000 genes ranked by mutual information, the KNN classifier reaches a test classification accuracy of 100%. Pathway analysis and gene ontology enrichment are used to evaluate the targeted genes. The results show the importance of finding the most informative genes in the database with a statistical gene selection technique, which reduces time and cost and increases the efficiency of the classifier. The method and its results may be applicable to other data and diseases.
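A mutual-information filter followed by a KNN classifier can be sketched as below. This is a minimal illustration under assumptions, not the paper's implementation: the abstract does not specify the MI estimator, the discretization, the number of neighbours or the distance, so equal-width binning into three bins, k = 3 and squared Euclidean distance are hypothetical choices. To avoid an optimistic accuracy estimate, the ranking would normally be computed on the training split only.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning (an assumption; the paper's estimator is not specified)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def top_k_by_mi(X, y, k):
    """Indices of the k genes (columns of X) with the highest MI with the labels."""
    scores = [(mutual_information(discretize([row[j] for row in X]), y), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def knn_predict(train_X, train_y, query, features, k=3):
    """Majority vote among the k nearest rows (squared Euclidean, selected genes only)."""
    dists = sorted(
        (sum((row[j] - query[j]) ** 2 for j in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]
```

Ranking genes independently keeps the filter cheap for tens of thousands of probes, at the cost of ignoring redundancy between the selected genes.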
22
The ability to classify patients based on gene-expression data varies by algorithm and performance metric. PLoS Comput Biol 2022; 18:e1009926. [PMID: 35275931 PMCID: PMC8942277 DOI: 10.1371/journal.pcbi.1009926] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Revised: 03/23/2022] [Accepted: 02/15/2022] [Indexed: 01/02/2023]
Abstract
By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
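Nested cross validation, as used in this benchmark for hyperparameter optimization, can be sketched for a single algorithm. This is a hypothetical illustration, not the study's code: k-nearest neighbours with a small grid of candidate k values stands in for the 52 algorithms, and the fold counts are arbitrary. The inner loop chooses the hyperparameter using only the outer-training rows, so the outer test folds never influence tuning.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote of the k nearest (squared Euclidean) labelled points in `train`."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, query)), label) for x, label in train)
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def make_folds(n, n_folds, rng):
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(data, k, folds):
    """Accuracy of k-NN over the given folds of `data`, a list of (features, label)."""
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [d for i, d in enumerate(data) if i not in held_out]
        correct += sum(knn_predict(train, data[i][0], k) == data[i][1] for i in test)
    return correct / len(data)


def nested_cv(data, candidate_ks, outer=5, inner=3, seed=0):
    """Outer loop estimates performance; inner loop tunes k on outer-training rows only."""
    rng = random.Random(seed)
    scores = []
    for test in make_folds(len(data), outer, rng):
        held_out = set(test)
        train = [d for i, d in enumerate(data) if i not in held_out]
        inner_folds = make_folds(len(train), inner, rng)  # same splits for every candidate
        best_k = max(candidate_ks, key=lambda k: cv_accuracy(train, k, inner_folds))
        hits = sum(knn_predict(train, data[i][0], best_k) == data[i][1] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Feature selection can be nested the same way by moving it inside the outer-training block, next to the hyperparameter search.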
23
Rahman SM, Lan J, Kaeli D, Dy J, Alshawabkeh A, Gu AZ. Machine learning-based biomarkers identification from toxicogenomics - Bridging to regulatory relevant phenotypic endpoints. JOURNAL OF HAZARDOUS MATERIALS 2022; 423:127141. [PMID: 34560480 PMCID: PMC9628282 DOI: 10.1016/j.jhazmat.2021.127141] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/05/2021] [Revised: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 05/30/2023]
Abstract
One of the major challenges in realizing and implementing the Tox21 vision is the urgent need to establish a quantitative link between in-vitro assay molecular endpoints and in-vivo regulatory-relevant phenotypic toxicity endpoints. Current toxicomics approaches still mostly rely on a large number of redundant markers without pre-selection or ranking; therefore, selecting relevant biomarkers with minimal redundancy would reduce the number of markers to be monitored and reduce the cost, time, and complexity of toxicity screening and risk monitoring. Here, we demonstrated that, using time-series toxicomics in-vitro assays along with machine learning-based feature selection (maximum relevance and minimum redundancy (MRMR)) and classification (support vector machine (SVM)), an "optimal" number of biomarkers with minimum redundancy can be identified for prediction of phenotypic toxicity endpoints with good accuracy. We included two case studies, for in-vivo carcinogenicity and Ames genotoxicity prediction, using 20 selected chemicals including model genotoxic chemicals and negative controls. The results suggested that, employing the adverse outcome pathway (AOP) concept, molecular endpoints based on a relatively small, properly selected biomarker ensemble involved in the DNA-damage and repair pathways conserved among eukaryotes were able to predict both Ames genotoxicity endpoints and in-vivo carcinogenicity in rats. A prediction accuracy of 76% with AUC = 0.81 was achieved when predicting in-vivo carcinogenicity with the top-ranked five biomarkers. For Ames genotoxicity prediction, the top-ranked five biomarkers achieved a prediction accuracy of 70% with AUC = 0.75. However, the specific top-ranked five biomarkers differ between the two phenotypic genotoxicity assays.
The top-ranked biomarkers for in-vivo carcinogenicity prediction mainly focused on double-strand break repair and DNA recombination, whereas the top-ranked biomarkers selected for Ames genotoxicity prediction are associated with base- and nucleotide-excision repair. The method developed in this study will help to fill the knowledge gap in phenotypic anchoring and predictive toxicology, and contribute to progress in implementing the Tox21 vision for environmental and health applications.
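The greedy MRMR step can be sketched as follows. This is a minimal illustration, not the study's pipeline: MRMR is usually defined with mutual information, whereas this sketch swaps in absolute Pearson correlation for both relevance and redundancy (a simplification), uses the difference form relevance minus mean redundancy, and the marker vectors in the test are invented.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def mrmr(columns, target, k):
    """Greedy max-relevance min-redundancy selection (difference form).

    Relevance and redundancy use |Pearson r| here instead of mutual information.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j):
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In the test, a scaled duplicate of the first marker is as relevant as the original but fully redundant, so the second greedy step prefers the weaker but complementary marker.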
Affiliation(s)
- Sheikh Mokhlesur Rahman
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; Department of Civil Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
- Jiaqi Lan
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
- David Kaeli
- Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
- Jennifer Dy
- Department of Electrical and Computer Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
- Akram Alshawabkeh
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
- April Z Gu
- Department of Civil and Environmental Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA; School of Civil and Environmental Engineering, Cornell University, 263 Hollister Hall, Ithaca, NY 14853, USA
24
Roohafza H, Noohi F, Hosseini SG, Alemzadeh-Ansari M, Bagherieh S, Marateb H, Mansourian M, Mousavi AF, Seyedhosseini M, Farshidi H, Ahmadi N, Yazdani A, Sadeghi M. A Cardiovascular Risk Assessment model according to behavioral, pSychosocial and traditional factors in patients with ST-segment elevation Myocardial Infarction (CRAS-MI): review of literature and methodology of a multi-center cohort study. Curr Probl Cardiol 2022; 48:101158. [DOI: 10.1016/j.cpcardiol.2022.101158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 02/16/2022] [Indexed: 11/17/2022]
25
26
Ma Y, Guo J, Li D, Cai X. Identification of potential key genes and functional role of CENPF in osteosarcoma using bioinformatics and experimental analysis. Exp Ther Med 2021; 23:80. [PMID: 34934449 PMCID: PMC8652394 DOI: 10.3892/etm.2021.11003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/14/2020] [Accepted: 09/21/2021] [Indexed: 11/25/2022]
Abstract
Osteosarcoma, which arises from bone tissue, is considered to be one of the most common types of cancer in children and teenagers. As the etiology of osteosarcoma has not been fully elucidated, the overall prognosis for patients is generally poor. In recent years, the development of bioinformatics technology has allowed researchers to identify numerous molecular biological characteristics associated with the prognosis of osteosarcoma using online databases. In the present study, three microarray datasets were obtained from the Gene Expression Omnibus (GEO) database. The GEO2R web tool was used to identify differentially expressed genes (DEGs) in osteosarcoma tissue. Venn analysis was performed to determine the intersection of the DEG profiles. DEGs were analyzed by Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis. Protein-protein interactions (PPIs) between these DEGs were analyzed using the Search Tool for the Retrieval of Interacting Genes (STRING) database, and the PPI network was then visualized using Cytoscape software. The top ten genes were identified based on degree, density of maximum neighborhood component, maximal clique centrality and maximum neighborhood component in the PPI network, and five overlapping genes [origin recognition complex subunit 6 (ORC6), IGF-binding protein 5 (IGFBP5), minichromosome maintenance 10 replication initiation factor (MCM10), MET proto-oncogene, receptor tyrosine kinase (MET) and centromere protein F (CENPF)] were identified. Additionally, three module networks were analyzed by Molecular Complex Detection (MCODE), and six key genes [ORC6, MCM10, DEP domain containing 1 (DEPDC1), CENPF, TIMELESS interacting protein (TIPIN) and shugoshin 1 (SGOL1)] were screened. Combined with the results from Cytoscape and MCODE, eight hub genes (ORC6, MCM10, DEPDC1, CENPF, TIPIN, SGOL1, MET and IGFBP5) were obtained.
Furthermore, Kaplan-Meier plotter survival analysis was used to evaluate the prognostic value of these eight hub genes in patients with osteosarcoma. Oncomine and GEPIA databases were applied to further confirm the expression levels of hub genes in tissue. Finally, the functional roles of the core gene CENPF were investigated using Cell Counting Kit-8, wound healing and Transwell assays, which indicated that CENPF knockdown inhibited the proliferation, migration and invasion of osteosarcoma cells. These results provided potential prognostic markers, as well as a basis for further investigation of the mechanism underlying osteosarcoma.
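Two of the screening steps above, intersecting DEG lists (the Venn analysis) and ranking PPI nodes by degree, can be sketched in plain Python. This is a hypothetical illustration only: the edge list and gene names in the test are placeholders, not real STRING interactions, each interaction is assumed to appear once in the edge list, and the other three ranking measures named in the abstract are not implemented here.

```python
from collections import defaultdict


def degree_ranking(edges, top_n):
    """Rank nodes of an undirected PPI edge list by degree (ties broken alphabetically)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=lambda g: (-degree[g], g))[:top_n]


def overlap(*gene_lists):
    """Genes shared by every list (the centre of a Venn diagram)."""
    common = set(gene_lists[0])
    for genes in gene_lists[1:]:
        common &= set(genes)
    return sorted(common)
```

Applying `overlap` to the top-ten lists produced by several ranking measures mirrors how overlapping hub candidates are picked out.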
Affiliation(s)
- Yihui Ma
- Department of Stomatology, General Hospital of Central Theater Command of the People's Liberation Army, Wuhan, Hubei 430070, P.R. China
- Jiaping Guo
- Department of Stomatology, General Hospital of Central Theater Command of the People's Liberation Army, Wuhan, Hubei 430070, P.R. China
- Da Li
- Department of Stomatology, General Hospital of Central Theater Command of the People's Liberation Army, Wuhan, Hubei 430070, P.R. China
- Xianhua Cai
- Department of Orthopedics, General Hospital of Central Theater Command of the People's Liberation Army, Wuhan, Hubei 430070, P.R. China
27
Xu K, Lian F, Quan Y, Liu J, Yin L, Li X, Tian S, Pei H, Xia Q. Septicemic Melioidosis Detection Using Support Vector Machine with Five Immune Cell Types. DISEASE MARKERS 2021; 2021:8668978. [PMID: 34912476 PMCID: PMC8668356 DOI: 10.1155/2021/8668978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Accepted: 11/17/2021] [Indexed: 11/29/2022]
Abstract
Melioidosis, caused by Burkholderia pseudomallei (B. pseudomallei), occurs predominantly in tropical regions. Of the various types of melioidosis, septicemic melioidosis is the most lethal, with a mortality rate of 40%. Early detection of the disease is paramount for improving the chances of cure. In this study, we developed a novel approach for detecting septicemic melioidosis using a machine learning technique, the support vector machine (SVM). Nineteen features, each corresponding to an immune cell type, were generated by Cell type Identification Estimating Relative Subsets Of RNA Transcripts (CIBERSORT), and several SVM models were built from them. Using these features, we trained a binomial SVM model on the training set and evaluated it on an independent testing set. The model performed well, with mean sensitivity and specificity of up to 0.962 and 0.979, respectively, and receiver operating characteristic (ROC) curve analysis gave areas under the curve (AUCs) ranging from 0.952 to 1.000. Furthermore, we found that a concise SVM model built on a combination of five cell types (CD8+ T cells, resting CD4+ memory T cells, monocytes, M2 macrophages, and activated mast cells) performed excellently in detecting septicemic melioidosis: its mean sensitivity was up to 0.976, its mean specificity up to 0.993, and its AUC close to 1.000. Taken together, this SVM model is a robust classification tool and may serve as a complementary technique for diagnosing septicemic melioidosis.
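The abstract reports mean sensitivity and specificity over evaluations of the SVM. The sketch below shows how these two metrics are computed from true and predicted labels and averaged over repeated train/test splits, assuming label 1 marks a septicemic case; it does not train an SVM, and the function names and example labels are hypothetical.

```python
from statistics import mean


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for one binary test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    return sensitivity, specificity


def mean_over_runs(runs, positive=1):
    """Average sensitivity and specificity over repeated splits.

    `runs` is a list of (y_true, y_pred) pairs, one per split.
    """
    scores = [sensitivity_specificity(t, p, positive) for t, p in runs]
    return mean(s for s, _ in scores), mean(sp for _, sp in scores)


# Hypothetical predictions from two evaluation splits.
runs = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
]
print(mean_over_runs(runs))  # (0.875, 0.875)
```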
Affiliation(s)
- Ke Xu
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Fang Lian
- Department of Clinical Laboratory, The Second Affiliated Hospital, Hainan Medical University, Haikou, China
- Yunfan Quan
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Jun Liu
- School of Basic Medicine and Life Sciences, Hainan Medical University, Haikou, Hainan, China
- Li Yin
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Xuexia Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Shen Tian
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Hua Pei
- Department of Clinical Laboratory, The Second Affiliated Hospital, Hainan Medical University, Haikou, China
- Qianfeng Xia
- Key Laboratory of Tropical Translational Medicine of Ministry of Education and School of Tropical Medicine and Laboratory Medicine, Hainan Medical University, Haikou, Hainan, China
- Department of Clinical Laboratory, The Second Affiliated Hospital, Hainan Medical University, Haikou, China
28
B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets. COMPUTERS 2021. [DOI: 10.3390/computers10110136] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/16/2022]
Abstract
Advancements in medical technology have created numerous large datasets containing many features. Usually, not all captured features are necessary: redundant and irrelevant features reduce the performance of learning algorithms. To tackle this challenge, many metaheuristic algorithms have been used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from both small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets, and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of B-MFO and the comparative algorithms was assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate the superior performance of B-MFO in solving the feature selection problem for different medical datasets compared with the other algorithms.
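The abstract states that S-, V- and U-shaped transfer functions convert the continuous MFO into a binary optimizer. The sketch below shows one common form of each (the sigmoid, |tanh(x)| and α|x|^β) together with the usual binarization rules: S-shaped values set a bit with probability T(x), while V- and U-shaped values flip the current bit with probability T(x). These exact formulas, the clipping of the U-shaped value to 1 and the helper names are assumptions; the moth-flame position update itself is not shown.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def v_shaped(x):
    """V-shaped transfer function |tanh(x)|."""
    return abs(math.tanh(x))


def u_shaped(x, alpha=1.0, beta=2.0):
    """U-shaped transfer function alpha * |x|**beta, clipped to 1 (assumed form)."""
    return min(1.0, alpha * abs(x) ** beta)


def binarize(position, bits, kind, rng):
    """Map a continuous moth position to a binary feature mask (1 = feature selected).

    kind "s": each bit is set to 1 with probability T(x).
    kind "v" or "u": each current bit is flipped with probability T(x).
    """
    if kind not in {"s", "v", "u"}:
        raise ValueError("kind must be 's', 'v' or 'u'")
    new_bits = []
    for x, bit in zip(position, bits):
        if kind == "s":
            new_bits.append(1 if rng.random() < s_shaped(x) else 0)
        else:
            t = v_shaped(x) if kind == "v" else u_shaped(x)
            new_bits.append(1 - bit if rng.random() < t else bit)
    return new_bits


# Hypothetical position of one moth over four candidate features.
rng = random.Random(42)
mask = binarize([0.3, -2.0, 1.5, 0.0], [0, 1, 0, 1], "v", rng)
print(mask)
```

Note the design difference: the S-shaped rule ignores the previous bit, whereas the V- and U-shaped rules treat T(x) as a change probability, so a position of zero leaves the mask unchanged.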
29
Shahjaman M, Rahman MR, Islam T, Auwul MR, Moni MA, Mollah MNH. rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data. Comput Biol Med 2021; 138:104911. [PMID: 34634637 DOI: 10.1016/j.compbiomed.2021.104911] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/20/2021] [Revised: 09/25/2021] [Accepted: 09/25/2021] [Indexed: 12/14/2022]
Abstract
Transcriptomics and metabolomics data often contain missing values or outliers due to limitations of data acquisition techniques. Most statistical methods require complete datasets for downstream analysis. A number of missing value imputation methods have been developed using the classical mean and variance based on maximum likelihood estimators, which are not robust against outliers; consequently, the performance of these methods deteriorates in the presence of outliers. Precise imputation of missing values and handling of outliers are therefore both important. In this paper, we developed a robust iterative approach using robust estimators based on the minimum beta divergence method, which simultaneously imputes missing values and outliers. We compared the performance of the proposed method with six frequently used missing value imputation methods, namely Zero, KNN, robust SVD, EM, random forest (RF) and the weighted least square approach (WLSA), through feature selection using both simulated and real datasets. Ten performance indices were used to identify the optimal method: Frobenius norm (FOBN), accuracy (ACC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), detection rate (DR), misclassification error rate (MER), the area under the ROC curve (AUC) and computational runtime. Evaluation on both simulated and real data suggests that the proposed method is superior to the other traditional methods across various rates of outliers and missing values. In the absence of outliers, the suggested approach performs almost as well as the other methods. The proposed method is accurate and simple, and requires less computational time than the other methods. We therefore recommend the proposed procedure for large-scale transcriptomics and metabolomics data analysis.
The computational tool has been implemented in an R package, which is publicly available from https://CRAN.R-project.org/package=rMisbeta.
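rMisbeta's estimators are based on minimum beta divergence. As a rough illustration of why beta-type weighting resists outliers, the sketch below fills missing entries in one feature row with an iteratively reweighted mean whose Gaussian-shaped weights shrink toward zero for outlying values. This is a simplified stand-in, not the package's algorithm: the fixed MAD-based scale, the default `beta = 0.2`, row-wise imputation and the helper names are assumptions, and outlier replacement is not implemented.

```python
import math
import statistics


def beta_weighted_mean(values, beta=0.2, n_iter=100, tol=1e-10):
    """Robust location estimate via an iteratively reweighted mean.

    Each value gets weight exp(-beta/2 * ((v - mu) / sigma)**2), so points far from
    the current centre (likely outliers) contribute almost nothing. The scale sigma
    is fixed at the normal-consistent MAD (an assumption to keep the sketch short).
    Requires at least one value.
    """
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values]) * 1.4826
    sigma = mad if mad > 0 else 1.0
    for _ in range(n_iter):
        weights = [math.exp(-0.5 * beta * ((v - mu) / sigma) ** 2) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu


def impute_row(row, beta=0.2):
    """Replace None entries in one feature row with the beta-weighted mean of observed values."""
    observed = [v for v in row if v is not None]
    fill = beta_weighted_mean(observed, beta=beta)
    return [fill if v is None else v for v in row]


# Hypothetical expression values for one feature, with one outlier and one missing value.
row = [1.0, 1.1, 0.9, 1.0, 100.0, None]
print(impute_row(row))  # last entry is close to 1.0; the plain mean of observed values is 20.8
```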
Affiliation(s)
- Md Shahjaman
- Department of Statistics, Begum Rokeya University, Rangpur, 5400, Bangladesh.
- Md Rezanur Rahman
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- Tania Islam
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
- Md Rabiul Auwul
- School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China
- Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, Australia
- Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
30
Interval modelling in optimization of k‐NN classifiers for large number of attributes in data sets on an example of DNA microarrays. INT J INTELL SYST 2021. [DOI: 10.1002/int.22679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
31
A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data. Processes (Basel) 2021. [DOI: 10.3390/pr9081466] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
Data-driven models with predictive ability are important in medicine and healthcare. However, the most challenging task in predictive modeling is constructing the prediction model, which can be addressed using machine learning (ML) methods. These methods learn and train a model from a gene expression dataset without being explicitly programmed. Because of the vast amount of gene expression data, this task becomes complex and time consuming. This paper reviews recent progress in ML and deep learning (DL) for cancer classification, which has received increasing attention in bioinformatics and computational biology. The review focuses mainly on the development of cancer classification methods based on ML and DL. Although many methods have been applied to the cancer classification problem, recent progress shows that most of the successful techniques are based on supervised and DL methods. In addition, the sources of healthcare datasets are described. The development of many machine learning methods for insight analysis in cancer classification has brought considerable improvement to healthcare. Currently, further development of efficient classification methods appears to be in high demand to support the expansion of healthcare applications.
32
Gumaei A, Sammouda R, Al-Rakhami M, AlSalman H, El-Zaart A. Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression. Health Informatics J 2021; 27:1460458221989402. [PMID: 33570011 DOI: 10.1177/1460458221989402] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Cancer diagnosis using machine learning algorithms is one of the main research topics in computer-based medical science. Prostate cancer is considered one of the leading causes of death worldwide. Analysis of microarray gene expression data using machine learning and soft computing algorithms is a useful tool for detecting prostate cancer in medical diagnosis. Even though traditional machine learning methods have been successfully applied to detect prostate cancer, the large number of attributes combined with the small sample size of microarray data is still a challenge that limits their effectiveness for medical diagnosis. Selecting a subset of relevant features and choosing an appropriate machine learning method can exploit the information in microarray data to improve detection accuracy. In this paper, we propose using a correlation feature selection (CFS) method with random committee (RC) ensemble learning to detect prostate cancer from microarray gene expression data. A set of experiments is conducted on a public benchmark dataset using 10-fold cross-validation to evaluate the proposed approach. The experimental results revealed that the proposed approach attains 95.098% accuracy, which is higher than that of related methods on the same dataset.
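Correlation-based feature selection is commonly formulated, following Hall, with a subset merit that rewards feature-class correlation and penalizes redundancy between features: merit = k·r_cf / sqrt(k + k(k-1)·r_ff), where k is the subset size, r_cf the mean absolute feature-class correlation and r_ff the mean absolute feature-feature correlation. The sketch below assumes absolute Pearson correlation and a greedy forward search; the abstract does not specify the correlation measure or search strategy the authors used, and the random committee classifier is not shown. The helper names and toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, labels):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a feature subset."""
    k = len(columns)
    r_cf = sum(abs(pearson(col, labels)) for col in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, labels):
    """Greedy forward search: add the feature that most improves merit until none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit([columns[j] for j in selected + [i]], labels), i) for i in remaining]
        # Highest merit first; ties go to the lower feature index.
        score, idx = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best


# Hypothetical toy data: gene 0 tracks the class, gene 1 duplicates it, gene 2 is noise.
labels = [0, 0, 0, 1, 1, 1]
genes = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 1, 0]]
print(forward_select(genes, labels))  # ([0], 1.0): the redundant copy adds no merit
```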
Affiliation(s)
- Abdu Gumaei
- Research Chair of Pervasive and Mobile Computing, King Saud University, Saudi Arabia; Taiz University, Yemen
- Mabrook Al-Rakhami
- Research Chair of Pervasive and Mobile Computing, King Saud University, Saudi Arabia
33
Harnessing artificial intelligence for the next generation of 3D printed medicines. Adv Drug Deliv Rev 2021; 175:113805. [PMID: 34019957 DOI: 10.1016/j.addr.2021.05.015] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 02/19/2021] [Revised: 05/02/2021] [Accepted: 05/13/2021] [Indexed: 02/06/2023]
Abstract
Artificial intelligence (AI) is redefining how we exist in the world. In almost every sector of society, AI is performing tasks with super-human speed and intellect: from the prediction of stock market trends to driverless vehicles, diagnosis of disease, and robotic surgery. Despite this growing success, the pharmaceutical field has yet to truly harness AI. Development and manufacture of medicines remain largely in a 'one size fits all' paradigm, in which mass-produced, identical formulations are expected to meet individual patient needs. Recently, 3D printing (3DP) has illuminated a path for on-demand production of fully customisable medicines. Due to its flexibility, pharmaceutical 3DP presents innumerable options during formulation development that generally require expert navigation. Leveraging AI within pharmaceutical 3DP removes the need for human expertise, as optimal process parameters can be accurately predicted by machine learning. AI can also be incorporated into a pharmaceutical 3DP 'Internet of Things', moving the personalised production of medicines into an intelligent, streamlined, and autonomous pipeline. Supportive infrastructure, such as the cloud and blockchain, will also play a vital role. Crucially, these technologies will expedite the use of pharmaceutical 3DP in clinical settings and drive the global movement towards personalised medicine and Industry 4.0.
34
35
Gershanov S, Madiwale S, Feinberg-Gorenshtein G, Vainer I, Nehushtan T, Michowiz S, Goldenberg-Cohen N, Birger Y, Toledano H, Salmon-Divon M. Classifying Medulloblastoma Subgroups Based on Small, Clinically Achievable Gene Sets. Front Oncol 2021; 11:637482. [PMID: 34178626 PMCID: PMC8223061 DOI: 10.3389/fonc.2021.637482] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/03/2020] [Accepted: 05/10/2021] [Indexed: 12/21/2022]
Abstract
As treatment protocols for medulloblastoma (MB) are becoming subgroup-specific, means for reliably distinguishing between its subgroups are a timely need. Currently available methods include immunohistochemical stains, which are subjective and often inconclusive, and molecular techniques (e.g., NanoString, microarrays, or DNA methylation assays), which are time-consuming, expensive and not widely available. Quantitative PCR (qPCR) provides a good alternative to these methods, but the current NanoString panel, which includes 22 genes, is impractical for qPCR. Here, we applied machine-learning-based classifiers to extract reliable, concise gene sets for distinguishing between the four MB subgroups, and we compared the accuracy of these gene sets with that of the known NanoString 22-gene set. We validated our results using an independent microarray-based dataset of 92 samples covering all four subgroups. In addition, we performed a qPCR validation on a cohort of 18 patients diagnosed with SHH, Group 3 and Group 4 MB. We found that the 22-gene set can be reduced to only six genes (IMPG2, NPR3, KHDRBS2, RBM24, WIF1, and EMX2) without compromising accuracy. The identified gene set is sufficiently small to make qPCR-based MB subgroup classification easily accessible to clinicians, even in developing, poorly equipped countries.
Affiliation(s)
- Sivan Gershanov
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Shreyas Madiwale
- Hemato-Oncology Laboratory, Division of Pediatric Hematology Oncology, Schneider Children's Medical Center of Israel, Petach Tikva, Israel; Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Galina Feinberg-Gorenshtein
- Hemato-Oncology Laboratory, Division of Pediatric Hematology Oncology, Schneider Children's Medical Center of Israel, Petach Tikva, Israel
- Igor Vainer
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Tamar Nehushtan
- Department of Molecular Biology, Ariel University, Ariel, Israel
- Shalom Michowiz
- Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel; Department of Pediatric Neurosurgery, Schneider Children's Medical Center of Israel, Petach-Tikva, Israel
- Nitza Goldenberg-Cohen
- Department of Ophthalmology, Bnai Zion Medical Center, Haifa, Israel; The Krieger Eye Research Laboratory, Felsenstein Medical Research Center, Rabin Medical Center, Petach-Tikva, Israel; The Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Yehudit Birger
- Hemato-Oncology Laboratory, Division of Pediatric Hematology Oncology, Schneider Children's Medical Center of Israel, Petach Tikva, Israel
- Helen Toledano
- Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel; Department of Pediatric Oncology, Schneider Children's Medical Center of Israel, Petach-Tikva, Israel
- Mali Salmon-Divon
- Department of Molecular Biology, Ariel University, Ariel, Israel; Adelson School of Medicine, Ariel University, Ariel, Israel
36
Tavolara TE, Niazi MKK, Gower AC, Ginese M, Beamer G, Gurcan MN. Deep learning predicts gene expression as an intermediate data modality to identify susceptibility patterns in Mycobacterium tuberculosis infected Diversity Outbred mice. EBioMedicine 2021; 67:103388. [PMID: 34000621 PMCID: PMC8138606 DOI: 10.1016/j.ebiom.2021.103388] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/01/2021] [Revised: 04/22/2021] [Accepted: 04/23/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Machine learning has been successfully applied to many diagnostic and prognostic problems in computational histopathology, yet few efforts have been made to model gene expression from histopathology. This study proposes a methodology that predicts selected gene expression values (microarray) from haematoxylin and eosin whole-slide images as an intermediate data modality to identify fulminant-like pulmonary tuberculosis ('supersusceptible') in an experimentally infected cohort of Diversity Outbred mice (n = 77). METHODS: Gradient-boosted trees were used as a novel feature selector to identify gene transcripts predictive of fulminant-like pulmonary tuberculosis. A novel attention-based multiple instance learning model for regression was used to predict the selected genes' expression from whole-slide images. The predicted gene expression values were shown to replicate the ground truth sufficiently well to identify supersusceptible mice using gradient-boosted trees trained on ground truth gene expression data. FINDINGS: The model was accurate, showing high positive correlations with ground truth gene expression on both cross-validation (n = 77, 0.63 ≤ ρ ≤ 0.84) and external testing sets (n = 33, 0.65 ≤ ρ ≤ 0.84). The sensitivity and specificity of the gene expression predictions for identifying supersusceptible mice (n = 77) were 0.88 and 0.95, respectively, and for an external set of mice (n = 33) 0.88 and 0.93, respectively. IMPLICATIONS: Our methodology maps histopathology to gene expression with sufficient accuracy to predict a clinical outcome. The proposed methodology exemplifies a computational template for gene expression panels, in which relatively inexpensive and widely available tissue histopathology may be mapped to specific genes' expression to serve as a diagnostic or prognostic tool. FUNDING: National Institutes of Health and American Lung Association.
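An attention-based multiple instance learning (MIL) regressor summarizes a whole-slide image (the bag) as an attention-weighted average of patch embeddings (the instances) and regresses gene expression from that summary. The sketch below follows the widely used attention pooling of Ilse et al. (2018), where the score of patch k is wᵀ tanh(V h_k) and the weights are a softmax over patches; whether the authors used exactly this form is an assumption, the weights `V`, `w`, `u` and `b` are untrained hypothetical placeholders, and patch feature extraction is not shown.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attention_pool(instances, V, w):
    """Return (bag_embedding, attention_weights) for a list of instance embeddings.

    score_k = w . tanh(V h_k); the weights are a numerically stable softmax of the scores.
    """
    scores = [sum(wi * math.tanh(t) for wi, t in zip(w, matvec(V, h))) for h in instances]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attention, instances)) for d in range(dim)]
    return bag, attention


def predict_expression(instances, V, w, u, b):
    """Linear regression head on the pooled bag embedding: y = u . z + b."""
    bag, attention = attention_pool(instances, V, w)
    return sum(ui * zi for ui, zi in zip(u, bag)) + b, attention


# Hypothetical 2-D embeddings of three patches from one slide, with untrained weights.
patches = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
V = [[0.5, -0.2], [0.1, 0.3]]
w = [1.0, -1.0]
y, attn = predict_expression(patches, V, w, u=[0.4, 0.6], b=0.1)
print(y, attn)
```

The attention weights double as an interpretability aid: patches with larger weights contribute more to the predicted expression value.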
Affiliation(s)
- Thomas E Tavolara
- Center for Biomedical Informatics, Wake Forest School of Medicine, 486 Patterson Avenue, Winston-Salem, NC 27101, United States
- M K K Niazi
- Center for Biomedical Informatics, Wake Forest School of Medicine, 486 Patterson Avenue, Winston-Salem, NC 27101, United States
- Adam C Gower
- Department of Medicine, Boston University School of Medicine, 72 E. Concord St Evans Building, Boston, MA 02118, United States
- Melanie Ginese
- Department of Infectious Disease and Global Health, Tufts University Cummings School of Veterinary Medicine, 200 Westboro Rd., North Grafton, MA 01536, United States
- Gillian Beamer
- Department of Infectious Disease and Global Health, Tufts University Cummings School of Veterinary Medicine, 200 Westboro Rd., North Grafton, MA 01536, United States
- Metin N Gurcan
- Center for Biomedical Informatics, Wake Forest School of Medicine, 486 Patterson Avenue, Winston-Salem, NC 27101, United States
37
Li WX, Dai SX, An SQ, Sun T, Liu J, Wang J, Liu LG, Xun Y, Yang H, Fan LX, Zhang XL, Liao WQ, You H, Tamagnone L, Liu F, Huang JF, Liu D. Transcriptome integration analysis and specific diagnosis model construction for Hodgkin's lymphoma, diffuse large B-cell lymphoma, and mantle cell lymphoma. Aging (Albany NY) 2021; 13:11833-11859. [PMID: 33885377 PMCID: PMC8109084 DOI: 10.18632/aging.202882] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Accepted: 03/02/2021] [Indexed: 01/20/2023]
Abstract
Transcriptome differences between Hodgkin's lymphoma (HL), diffuse large B-cell lymphoma (DLBCL), and mantle cell lymphoma (MCL), all of which are derived from B cells, have remained unclear. This study aimed to construct lymphoma-specific diagnostic models by screening for lymphoma marker genes. Transcriptome data of HL, DLBCL, and MCL were obtained from public databases. Lymphoma marker genes were screened by comparing cases with controls as well as by examining intergroup differences among the lymphomas. A total of 9 HL marker genes, 7 DLBCL marker genes, and 4 MCL marker genes were identified. Most HL marker genes were upregulated, whereas DLBCL and MCL marker genes were downregulated compared with controls. The optimal HL-specific diagnostic model contains one marker gene (MYH2) with an AUC of 0.901. The optimal DLBCL-specific diagnostic model contains 7 marker genes (LIPF, CCDC144B, PRO2964, PHF1, SFTPA2, NTS, and HP) with an AUC of 0.951. The optimal MCL-specific diagnostic model contains 3 marker genes (IGLV3-19, IGKV4-1, and PRB3) with an AUC of 0.843. The present study reveals transcriptome-based differences between HL, DLBCL, and MCL which, when combined with other clinical markers, may aid clinical diagnosis and prognosis.
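Each diagnostic model above is summarized by an AUC. For a single-gene model, one can score every sample by the gene's expression and compute the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half; this equals the Mann-Whitney U statistic divided by the number of case-control pairs. The sketch below assumes higher scores indicate disease (for a downregulated marker the scores would be negated first); the function name and sample values are hypothetical.

```python
def auc_pairwise(case_scores, control_scores):
    """AUC as P(case > control) + 0.5 * P(tie), computed over all case-control pairs."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values of one marker gene.
cases = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.0]
print(auc_pairwise(cases, controls))  # 0.944... (8.5 of 9 pairs favour the case)
```

The pairwise loop is O(n·m), which is fine for cohort sizes typical of transcriptome studies; a rank-based formula gives the same value faster for large samples.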
Affiliation(s)
- Wen-Xing Li
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Provincial Key Laboratory of Single Cell Technology and Application, Southern Medical University, Guangzhou, Guangdong, China
- Shao-Xing Dai
- Yunnan Key Laboratory of Primate Biomedical Research, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan, China
- San-Qi An
- Biosafety Level-3 Laboratory, Life Sciences Institute & Guangxi Key Laboratory of AIDS Prevention and Treatment & Guangxi Collaborative Innovation Center for Biomedicine, Guangxi Medical University, Nanning, Guangxi, China
- Tingting Sun
- National School of Development, Peking University, Beijing 100871, China
- Justin Liu
- Department of Statistics, University of California, Riverside, CA 92521, USA
- Jun Wang
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Yang Xun
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Hua Yang
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Li-Xia Fan
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Xiao-Li Zhang
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Wan-Qin Liao
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Hua You
- Affiliated Cancer Hospital & Institute of Guangzhou Medical University, Guangzhou, Guangdong, China
- Luca Tamagnone
- Istituto di Istologia ed Embriologia, Università Cattolica del Sacro Cuore, Rome, Italy
- Fang Liu
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
- Jing-Fei Huang
- Key Laboratory of Animal Models and Human Disease Mechanisms, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Dahai Liu
- Foshan Stomatology Hospital, School of Medicine, Foshan University, Foshan, Guangdong, China
38
Chiodi E, Marn AM, Geib MT, Ünlü MS. The Role of Surface Chemistry in the Efficacy of Protein and DNA Microarrays for Label-Free Detection: An Overview. Polymers (Basel) 2021; 13:1026. [PMID: 33810267 PMCID: PMC8036480 DOI: 10.3390/polym13071026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/07/2021] [Revised: 03/23/2021] [Accepted: 03/23/2021] [Indexed: 01/04/2023]
Abstract
The importance of microarrays in diagnostics and medicine has drastically increased in the last few years. Nevertheless, the efficiency of a microarray-based assay intrinsically depends on the density and functionality of the biorecognition elements immobilized onto each sensor spot. Recently, researchers have put effort into developing new functionalization strategies and technologies which provide efficient immobilization and stability of any sort of molecule. Here, we present an overview of the most widely used methods of surface functionalization of microarray substrates, as well as the most recent advances in the field, and compare their performance in terms of optimal immobilization of the bioreceptor molecules. We focus on label-free microarrays and, in particular, we aim to describe the impact of surface chemistry on two types of microarray-based sensors: microarrays for single particle imaging and for label-free measurements of binding kinetics. Both protein and DNA microarrays are taken into consideration, and the effect of different polymeric coatings on the molecules' functionalities is critically analyzed.
Affiliation(s)
- Elisa Chiodi
- Department of Electrical Engineering, Boston University, Boston, MA 02215, USA
- Allison M. Marn
- Department of Electrical Engineering, Boston University, Boston, MA 02215, USA
- Matthew T. Geib
- Department of Electrical Engineering, Boston University, Boston, MA 02215, USA
- M. Selim Ünlü
- Department of Electrical Engineering, Boston University, Boston, MA 02215, USA
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
39
Ye H, Li T, Wang H, Wu J, Yi C, Shi J, Wang P, Song C, Dai L, Jiang G, Huang Y, Yu Y, Li J. TSPAN1, TMPRSS4, SDR16C5, and CTSE as Novel Panel for Pancreatic Cancer: A Bioinformatics Analysis and Experiments Validation. Front Immunol 2021; 12:649551. [PMID: 33815409 PMCID: PMC8015801 DOI: 10.3389/fimmu.2021.649551] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/05/2021] [Accepted: 02/23/2021] [Indexed: 12/14/2022]
Abstract
Pancreatic cancer is a lethal malignancy with a poor prognosis. This study aims to identify pancreatic cancer-related genes and develop a robust diagnostic model to detect this disease. Weighted gene co-expression network analysis (WGCNA) was used to determine potential hub genes for pancreatic cancer. Their mRNA and protein expression levels were validated by reverse transcription PCR (RT-PCR) and immunohistochemistry (IHC). Diagnostic models were developed using eight machine learning algorithms and ten-fold cross-validation. Four hub genes (TSPAN1, TMPRSS4, SDR16C5, and CTSE) were identified by bioinformatics analysis. RT-PCR showed that the four hub genes were expressed at medium to high levels, and IHC revealed that their protein expression levels were higher in pancreatic cancer tissues. For the panel of these four genes, the eight models achieved area under the curve (AUC) values of 0.87-0.92, sensitivities of 0.91-0.94, and specificities of 0.84-0.86 in the validation cohort. In the external validation set, these models also performed well (AUC 0.86-0.98, sensitivity 0.84-1.00, and specificity 0.86-1.00). In conclusion, this study has identified four hub genes that might be closely related to pancreatic cancer: TSPAN1, TMPRSS4, SDR16C5, and CTSE. This four-gene panel might provide a theoretical basis for the diagnosis of pancreatic cancer.
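The diagnostic models were developed with ten-fold cross-validation. The sketch below shows one way to build stratified folds so that every fold keeps roughly the original case-to-control ratio, which matters when a cohort is small. Stratification, the round-robin assignment, the fixed seed and the function name are assumptions, since the abstract does not say how the folds were formed; each classifier would then be fit on `train` and scored on `test` for every fold.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold receives close to the same share of each class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class, key=str):  # fixed class order for reproducibility
        members = by_class[label]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    everything = set(range(len(labels)))
    for test in folds:
        yield sorted(everything - set(test)), sorted(test)


# Hypothetical cohort: 20 controls (0) and 10 cancer samples (1).
labels = [0] * 20 + [1] * 10
for train, test in stratified_kfold(labels, k=10):
    pass  # fit each classifier on `train`, evaluate on `test`
```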
Affiliation(s)
- Hua Ye
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Tiandong Li
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Laboratory of Molecular Biology, Henan Luoyang Orthopedic Hospital (Henan Provincial Orthopedic Hospital), Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Hua Wang
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Jinyu Wu
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Chuncheng Yi
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Jianxiang Shi
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
- Peng Wang
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Chunhua Song
- College of Public Health, Zhengzhou University, Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Liping Dai
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
- Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
| | - Guozhong Jiang
- Deparment of Pathology, First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Yuxin Huang
- Program in Public Health, University of California, Irvine, Irvine, CA, United States
| | - Yongwei Yu
- Department of Pathology, Second Military Medical University, Shanghai, China
| | - Jitian Li
- Laboratory of Molecular Biology, Henan Luoyang Orthopedic Hospital (Henan Provincial Orthopedic Hospital), Zhengzhou, China
- Henan Key Laboratory of Tumor Epidemiology, Zhengzhou, China
| |
40
Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Comput Sci 2021; 7:e562. [PMID: 34141889 PMCID: PMC8176540 DOI: 10.7717/peerj-cs.562] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 05/10/2023]
Abstract
This paper proposes a novel feature selection method for microarray gene expression datasets, called the Robust Proportional Overlapping Score (RPOS), which uses a robust measure of dispersion: the median absolute deviation (MAD). The method robustly identifies the most discriminative genes by computing overlapping scores of gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded, and those that discriminate between the classes are selected. The proposed method is compared with five state-of-the-art gene selection methods in terms of classification error, Brier score, and sensitivity on eleven gene expression datasets. Observations are classified using the gene subsets selected by the proposed method with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box plots and stability scores of the results are also reported. In most cases, the proposed method outperforms the other methods.
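The abstract does not state the RPOS formula, so the Python sketch below only illustrates the general idea of a MAD-based overlap score with a simplified, hypothetical definition: each class's core interval is assumed to be median ± c·MAD (with c = 3 chosen arbitrarily), and a gene's score is the share of samples falling inside the overlap of the two class intervals, so lower scores mean more discriminative genes. The published RPOS score differs in detail, and all function names here are illustrative.

```python
from statistics import median
from typing import List, Sequence, Tuple


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation (unscaled): median of |v - median(values)|."""
    m = median(values)
    return median([abs(v - m) for v in values])


def core_interval(values: Sequence[float], c: float = 3.0) -> Tuple[float, float]:
    """Robust 'core' range of one class: median +/- c * MAD (c = 3 is an assumed choice)."""
    m, d = median(values), mad(values)
    return m - c * d, m + c * d


def overlap_score(x: Sequence[float], y: Sequence[int], c: float = 3.0) -> float:
    """Share of all samples inside the overlap of the two class core intervals.

    0.0 means the core ranges do not overlap (a strongly discriminative gene).
    This is a simplified stand-in, not the published RPOS formula.
    """
    lo0, hi0 = core_interval([v for v, t in zip(x, y) if t == 0], c)
    lo1, hi1 = core_interval([v for v, t in zip(x, y) if t == 1], c)
    lo, hi = max(lo0, lo1), min(hi0, hi1)
    if lo > hi:
        return 0.0
    return sum(1 for v in x if lo <= v <= hi) / len(x)


def rank_genes(X: List[List[float]], y: Sequence[int], c: float = 3.0) -> List[int]:
    """X is samples x genes; return gene indices ordered from least to most overlapping."""
    scores = [overlap_score([row[g] for row in X], y, c) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g])


# Made-up data: gene 0 separates the classes cleanly, gene 1 overlaps heavily.
X = [[1.0, 1], [1.1, 2], [0.9, 3], [1.0, 4], [5.0, 2], [5.1, 3], [4.9, 4], [5.0, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(rank_genes(X, y))
```

Using the median and MAD rather than the mean and standard deviation keeps a single outlying expression value from stretching a class's core interval.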
Affiliation(s)
- Muhammad Hamraz
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Naz Gul
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mushtaq Raza
  - Department of Computer Sciences, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Dost Muhammad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Umair Khalil
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Seema Zubair
  - Department of Mathematics, Statistics and Computer Science, University of Agriculture Peshawar, Peshawar, Pakistan
- Zardad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
41
Iliopoulos A, Beis G, Apostolou P, Papasotiriou I. Complex Networks, Gene Expression and Cancer Complexity: A Brief Review of Methodology and Applications. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017093504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/28/2022]
Abstract
This brief survey describes various aspects of cancer complexity and how this complexity can be confronted using modern complex network theory and gene expression datasets. In particular, it underlines the causes and basic features of cancer complexity and the challenges they pose, and it highlights the importance of gene expression data in cancer research and in reverse engineering gene co-expression networks. In addition, it introduces the corresponding theoretical and mathematical framework of graph theory and complex networks. It then describes the basics of network reconstruction, the limitations of gene network inference, enrichment and survival analysis, and the evolution, robustness-resilience, and cascades of complex networks. Finally, it gives an illustrative example of inferring and analyzing a cancer gene co-expression network.
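To make "reverse engineering of gene co-expression networks" concrete, the Python sketch below shows one of the simplest reconstruction schemes: connect two genes when the absolute Pearson correlation of their expression profiles reaches a hard threshold, then read off hub genes by degree. The 0.8 threshold, the function names, and the toy profiles are illustrative assumptions; other methods, such as WGCNA, use a soft power transform of the correlation instead of a hard cut-off.

```python
from math import sqrt
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def coexpression_network(expr: Dict[str, Sequence[float]], threshold: float = 0.8) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: link two genes when |r| between their profiles >= threshold."""
    genes = list(expr)
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def hubs(adj: Dict[str, Set[str]], top: int = 1) -> List[str]:
    """Genes with the highest degree (ties broken alphabetically)."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top]


# Toy expression profiles across five samples (made-up values).
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, 3, 2, 5, 1],
}
net = coexpression_network(expr)
print({g: sorted(n) for g, n in net.items()})
print(hubs(net, top=3))
```

Taking the absolute value lets strongly anti-correlated genes (A and C above) count as co-expressed, which is a modelling choice rather than a requirement.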
Affiliation(s)
- A.C. Iliopoulos
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- G. Beis
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- P. Apostolou
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- I. Papasotiriou
  - Research Genetic Cancer Centre International GmbH, Zug, Switzerland
42
Kelidari M, Hamidzadeh J. Feature selection by using chaotic cuckoo optimization algorithm with levy flight, opposition-based learning and disruption operator. Soft Comput 2020. [DOI: 10.1007/s00500-020-05349-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/30/2022]
43
Classification of gene expression patterns using a novel type-2 fuzzy multigranulation-based SVM model for the recognition of cancer mediating biomarkers. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05241-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/23/2022]
44
Brain Tumor Segmentation Using Deep Learning and Fuzzy K-Means Clustering for Magnetic Resonance Images. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10326-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/01/2023]
45
Wang T, Tian Y, Qiu RG. Long Short-Term Memory Recurrent Neural Networks for Multiple Diseases Risk Prediction by Leveraging Longitudinal Medical Records. IEEE J Biomed Health Inform 2020; 24:2337-2346. [PMID: 31880573 DOI: 10.1109/jbhi.2019.2962366] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
Abstract
Chronic diseases that are not identified in time place a heavy disease burden on society. This paper presents a multiple-disease risk prediction method that systematically assesses patients' future disease risks from their longitudinal medical records. Medical diagnoses coded with the International Classification of Diseases (ICD) are aggregated to different levels for prediction to meet the needs of different stakeholders. The approach is validated on two independent hospital datasets: 7105 patients with 18,893 visits and 4170 patients with 13,124 visits. Initial analysis reveals high variation in patient characteristics. The study demonstrates that a recurrent neural network with long short-term memory (LSTM) units performs well at different levels of diagnosis aggregation. In particular, the model predicts future disease risks with exact-match scores of 98.90% and 95.12% using 3-digit ICD code aggregation, and 96.60% and 96.83% using 4-digit ICD code aggregation, for the two datasets, respectively. Moreover, the approach could be developed into a reference tool for hospital information systems, improving patients' healthcare management over time.
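The abstract does not define its aggregation rule or its "exact-match score". The Python sketch below assumes (a) that aggregation means truncating each ICD code to its first 3 or 4 characters, and (b) that the exact-match score is subset accuracy, the share of records whose predicted code set equals the true set. Both readings, the helper names, and the example codes are assumptions for illustration only; the sketch does not implement the LSTM model.

```python
from typing import List, Set


def aggregate(code: str, level: int) -> str:
    """Truncate an ICD code to `level` characters (ignoring the dot), restoring the dot
    after the 3-character category, e.g. 'E11.65' -> 'E11' (level 3) or 'E11.6' (level 4).
    Assumed aggregation rule, not necessarily the paper's."""
    chars = code.replace(".", "")[:level]
    return chars if len(chars) <= 3 else chars[:3] + "." + chars[3:]


def aggregate_set(codes: Set[str], level: int) -> Set[str]:
    """Aggregate every code in one record's diagnosis set."""
    return {aggregate(c, level) for c in codes}


def exact_match_score(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Subset accuracy: fraction of records whose predicted set equals the true set exactly."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)


# Two hypothetical records: true next-visit diagnoses vs. model predictions.
true_codes = [{"E11.65", "I10"}, {"J45.909"}]
pred_codes = [{"E11.9", "I10"}, {"J45.20"}]
for level in (3, 4):
    t = [aggregate_set(s, level) for s in true_codes]
    p = [aggregate_set(s, level) for s in pred_codes]
    print(level, exact_match_score(t, p))
```

The toy records show why coarser aggregation usually yields higher exact-match scores: predictions that disagree at the 4-character level can still agree on the 3-character category.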
46
Yang Z, LaRiviere MJ, Ko J, Till JE, Christensen T, Yee SS, Black TA, Tien K, Lin A, Shen H, Bhagwat N, Herman D, Adallah A, O'Hara MH, Vollmer CM, Katona BW, Stanger BZ, Issadore D, Carpenter EL. A Multianalyte Panel Consisting of Extracellular Vesicle miRNAs and mRNAs, cfDNA, and CA19-9 Shows Utility for Diagnosis and Staging of Pancreatic Ductal Adenocarcinoma. Clin Cancer Res 2020; 26:3248-3258. [PMID: 32299821 PMCID: PMC7334066 DOI: 10.1158/1078-0432.ccr-19-3313] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 10/15/2019] [Revised: 02/14/2020] [Accepted: 03/30/2020] [Indexed: 12/27/2022]
Abstract
PURPOSE To determine whether a multianalyte liquid biopsy can improve the detection and staging of pancreatic ductal adenocarcinoma (PDAC). EXPERIMENTAL DESIGN We analyzed plasma from 204 subjects (71 healthy, 44 non-PDAC pancreatic disease, and 89 PDAC) for the following biomarkers: tumor-associated extracellular vesicle miRNA and mRNA isolated on a nanomagnetic platform that we developed and measured by next-generation sequencing or qPCR, circulating cell-free DNA (ccfDNA) concentration measured by qPCR, ccfDNA KRAS G12D/V/R mutations detected by droplet digital PCR, and CA19-9 measured by electrochemiluminescence immunoassay. We applied machine learning to training sets and subsequently evaluated model performance in independent, user-blinded test sets. RESULTS To identify patients with PDAC versus those without, we generated a classification model using a training set of 47 subjects (20 PDAC and 27 noncancer). When applied to a blinded test set (N = 136), the model achieved an AUC of 0.95 and accuracy of 92%, superior to the best individual biomarker, CA19-9 (89%). We next used a cohort of 20 patients with PDAC to train our model for disease staging and applied it to a blinded test set of 25 patients clinically staged by imaging as metastasis-free, including 9 subsequently determined to have had occult metastasis. Our workflow achieved significantly higher accuracy for disease staging (84%) than imaging alone (accuracy = 64%; P < 0.05). CONCLUSIONS Algorithmically combining blood-based biomarkers may improve PDAC diagnostic accuracy and preoperative identification of nonmetastatic patients best suited for surgery, although larger validation studies are necessary.
Affiliation(s)
- Zijian Yang
  - Department of Mechanical Engineering and Applied Mechanics, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Michael J LaRiviere
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jina Ko
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Jacob E Till
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Theresa Christensen
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Stephanie S Yee
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Taylor A Black
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Kyle Tien
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Andrew Lin
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Hanfei Shen
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Neha Bhagwat
  - Division of Gastroenterology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Daniel Herman
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Andrew Adallah
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Mark H O'Hara
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Charles M Vollmer
  - Division of General Surgery, Department of Surgery, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania
- Bryson W Katona
  - Division of Gastroenterology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Ben Z Stanger
  - Division of Gastroenterology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- David Issadore
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
  - Department of Electrical and Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania
- Erica L Carpenter
  - Division of Hematology-Oncology, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
47
Yang X, Tian L, Chen Y, Yang L, Xu S, Wu W. Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1262-1275. [PMID: 30575544 DOI: 10.1109/tcbb.2018.2886334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Sparse representation-based classification (SRC) methods have achieved remarkable results. However, SRC methods still suffer from the need for sufficient training samples, insufficient use of test samples, and unstable representations. In this paper, a stable inverse projection representation-based classification (IPRC) method is presented to tackle these problems by making effective use of test samples. An inverse projection representation (IPR) is first proposed, and its feasibility and stability are analyzed. A classification criterion named the category contribution rate is constructed to match the IPR and complete the classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented for interpreting microarray gene expression data, in which a two-stage hybrid gene selection method is introduced to select informative genes. Finally, a functional analysis of the candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods.
48
Abdulrauf Sharifai G, Zainol Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes (Basel) 2020; 11:genes11070717. [PMID: 32605144 PMCID: PMC7397300 DOI: 10.3390/genes11070717] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/26/2019] [Revised: 12/19/2019] [Accepted: 01/07/2020] [Indexed: 11/16/2022]
Abstract
Training a machine learning algorithm on an imbalanced data set is inherently challenging. It becomes even more demanding when samples are limited but the number of features is massive (high dimensionality). High-dimensional, imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data. Numerous researchers have investigated either the imbalanced-class problem or the high-dimensional problem and proposed various methods. Nonetheless, few approaches in the literature address the intersection of the two, owing to their complicated interactions. Recently, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA), which employs an ensemble of multiple filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary grasshopper optimization algorithm (BGOA) formulates feature selection as an optimization problem and selects the best (near-optimal) combination of features from the majority and minority classes. The results, supported by appropriate statistical analysis, indicate that rCBR-BGOA can improve classification performance on high-dimensional, imbalanced datasets in terms of the G-mean and area under the curve (AUC) metrics.
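rCBR-BGOA is evaluated with G-mean and AUC. G-mean is worth spelling out because, unlike accuracy, it drops to zero when a classifier ignores the minority class. The Python sketch below assumes binary labels with the minority class coded as 1; the function names and the 95/5 split are illustrative, and the sketch does not implement the grasshopper search itself.

```python
from math import sqrt
from typing import Sequence


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity (recall on the positive/minority class) and specificity."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(1 for p in pos if p == positive) / len(pos)
    specificity = sum(1 for p in neg if p != positive) / len(neg)
    return sqrt(sensitivity * specificity)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions equal to the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# 95 majority-class samples, 5 minority-class samples (illustrative).
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100
print(accuracy(y_true, always_majority), g_mean(y_true, always_majority))
```

A classifier that always predicts the majority class scores 95% accuracy here but a G-mean of zero, which is why imbalance-aware studies report G-mean alongside AUC.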
Affiliation(s)
- Garba Abdulrauf Sharifai
  - Department of Computer Sciences, Yusuf Maitama Sule University, 700222 Kofar Nassarawa, Kano, Nigeria
  - School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Malaysia
  - Correspondence: Tel. +60-111-317-0481 or +60-194-004-327
- Zurinahni Zainol
  - School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Malaysia
49
Mohammed A, Podila PSB, Davis RL, Ataga KI, Hankins JS, Kamaleswaran R. Using Machine Learning to Predict Early Onset Acute Organ Failure in Critically Ill Intensive Care Unit Patients With Sickle Cell Disease: Retrospective Study. J Med Internet Res 2020; 22:e14693. [PMID: 32401216 PMCID: PMC7254279 DOI: 10.2196/14693] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/13/2019] [Revised: 08/18/2019] [Accepted: 01/28/2020] [Indexed: 12/22/2022]
Abstract
Background Sickle cell disease (SCD) is a genetic disorder of the red blood cells, resulting in multiple acute and chronic complications, including pain episodes, stroke, and kidney disease. Patients with SCD develop chronic organ dysfunction, which may progress to organ failure during disease exacerbations. Early detection of acute physiological deterioration leading to organ failure is not always attainable. Machine learning techniques that allow for prediction of organ failure may enable early identification and treatment and potentially reduce mortality. Objective The aim of this study was to test the hypothesis that machine learning physiomarkers can predict the development of organ dysfunction in a sample of adult patients with SCD admitted to intensive care units (ICUs). Methods We applied diverse machine learning methods, statistical methods, and data visualization techniques to develop classification models to distinguish SCD from controls. Results We studied 63 sequential SCD patients admitted to ICUs with 163 patient encounters (mean age 30.7 years, SD 9.8 years). A subset of these patient encounters, 22.7% (37/163), met the sequential organ failure assessment criteria. The other 126 SCD patient encounters served as controls. A set of signal processing features (such as fast Fourier transform, energy, and continuous wavelet transform) derived from heart rate, blood pressure, and respiratory rate was identified to distinguish patients with SCD who developed acute physiological deterioration leading to organ failure from patients with SCD who did not meet the criteria. A multilayer perceptron model accurately predicted organ failure up to 6 hours before onset, with an average sensitivity and specificity of 96% and 98%, respectively. Conclusions This retrospective study demonstrated the viability of using machine learning to predict acute organ failure among hospitalized adults with SCD. The discovery of salient physiomarkers through machine learning techniques has the potential to further accelerate the development and implementation of innovative care delivery protocols and strategies for medically vulnerable patients.
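The study's features include Fourier-transform and energy features derived from vital-sign time series. The Python sketch below shows one plausible way to turn a single window of such a signal into two spectral features: signal energy and dominant frequency. The window length, sampling rate, and feature choice are illustrative assumptions, not the study's specification; a naive DFT is used to stay dependency-free, where real pipelines would use an FFT library.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def spectral_features(x: Sequence[float], fs: float) -> Tuple[float, float]:
    """Return (energy, dominant frequency in Hz) for one window sampled at fs Hz.

    The window is mean-centred first so the DC component does not dominate. Energy is
    computed in the frequency domain; by Parseval's theorem it equals the sum of squared
    centred samples.
    """
    n = len(x)
    mean = sum(x) / n
    spectrum = dft([v - mean for v in x])
    energy = sum(abs(c) ** 2 for c in spectrum) / n
    half = spectrum[1 : n // 2 + 1]  # positive-frequency bins 1..n/2
    k = 1 + max(range(len(half)), key=lambda i: abs(half[i]))
    return energy, k * fs / n


# Hypothetical 16-sample window at 16 Hz containing a pure 2 Hz oscillation.
window = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
print(spectral_features(window, fs=16.0))
```

For a real heart-rate or blood-pressure trace, such features would be computed over sliding windows and fed to the classifier as one feature vector per window.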
Affiliation(s)
- Akram Mohammed
  - Center for Biomedical Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Pradeep S B Podila
  - Faith and Health Division, Methodist Le Bonheur Healthcare, Memphis, TN, United States
- Robert L Davis
  - Center for Biomedical Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Kenneth I Ataga
  - Center for Sickle Cell Disease, University of Tennessee Health Science Center, Memphis, TN, United States
- Jane S Hankins
  - Department of Hematology, St Jude Children's Research Hospital, Memphis, TN, United States
- Rishikesan Kamaleswaran
  - Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, United States
50
Heisinger S, Hitzl W, Hobusch GM, Windhager R, Cotofana S. Predicting Total Knee Replacement from Symptomology and Radiographic Structural Change Using Artificial Neural Networks-Data from the Osteoarthritis Initiative (OAI). J Clin Med 2020; 9:jcm9051298. [PMID: 32369985 PMCID: PMC7288322 DOI: 10.3390/jcm9051298] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/09/2020] [Revised: 04/22/2020] [Accepted: 04/29/2020] [Indexed: 12/12/2022]
Abstract
The aim of the study was to longitudinally investigate symptomatic and structural factors prior to total knee replacement (TKR) surgery in order to identify influential factors that can predict a patient's need for TKR surgery. In total, 165 participants (60% female; age 64.5 ± 8.4 years; BMI 29.7 ± 4.7 kg/m²) who received a TKR in either knee within a four-year period were analyzed. Radiographic change, knee pain, knee function, and quality of life were assessed annually prior to the TKR procedure. Self-learning artificial neural networks were applied to identify the factors driving the surgical procedure. Radiographic structural change worsened significantly prior to TKR (p ≤ 0.0046), whereas knee symptoms (pain, function, quality of life) worsened significantly only in the year before the TKR procedure. Our prediction model correctly classified 80% of individuals with respect to undergoing TKR surgery, with a positive predictive value of 84% and a negative predictive value of 73%. The model offers the opportunity to assess a patient's need for TKR surgery two years in advance based on easily available patient data and could therefore be used in a primary care setting.
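The abstract reports an overall rate of correct classification together with positive and negative predictive values. Reading the 80% figure as overall accuracy is an interpretation. The Python sketch below shows how the three quantities follow from a 2x2 confusion matrix; the counts are made up and are not the study's data, and the function name is hypothetical. Unlike sensitivity and specificity, PPV and NPV depend on how common TKR is in the analyzed group.

```python
from typing import Dict


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, positive predictive value and negative predictive value from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),  # of those predicted to need TKR, the share who did
        "npv": tn / (tn + fn),  # of those predicted not to need TKR, the share who did not
    }


# Made-up counts for illustration only.
print(predictive_values(tp=8, fp=2, tn=6, fn=4))
```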
Affiliation(s)
- Stephan Heisinger
  - Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
  - Correspondence: Tel. +43-1-40400-40830
- Wolfgang Hitzl
  - Research Office, Biostatistics, Paracelsus Medical University, 5020 Salzburg, Austria
  - Department of Ophthalmology and Optometry, Paracelsus Medical University, 5020 Salzburg, Austria
  - Research Program Experimental Ophthalmology and Glaucoma Research, Paracelsus Medical University, 5020 Salzburg, Austria
- Gerhard M. Hobusch
  - Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Reinhard Windhager
  - Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Sebastian Cotofana
  - Department of Clinical Anatomy, Mayo Clinic College of Medicine and Science, Rochester, MN 55905, USA