1
Hazar Y, Ertuğrul ÖF. Process management in diabetes treatment by blending technique. Comput Biol Med 2025; 190:110034. [PMID: 40107027 DOI: 10.1016/j.compbiomed.2025.110034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 03/04/2025] [Accepted: 03/14/2025] [Indexed: 03/22/2025]
Abstract
Diabetes is a condition marked by persistent metabolic issues and elevated blood glucose levels, which can damage several organs, including the eyes, heart, kidneys and nervous system. Effective management of this disease is vital to mitigate long-term complications. This research uses advanced AI and ML methods, based on data from the E-Nabız personal health record, to predict blood glucose levels in people with diabetes and identify factors that affect these levels. The study is primarily aimed at monitoring and managing diabetes and investigates whether the condition of diabetic individuals improves. Within this framework, 108 features and 86115 records sourced from E-Nabız, including lab results, medical history and medication records, were examined to determine key indicators of diabetes management. Features were selected by intersecting the best 20 features determined by each of 9 techniques: SFM, MI, RFE, CHI2, ANOVA, KW, CATB, XGB and LGBM. The selected features were evaluated using a blending technique, with CATB, XGB and LGBM as first-level models and ETC as the meta-model. The blending approach produced strong performance, achieving 92.52 % precision, 92.51 % recall, 92.51 % F1-score and 92.50 % accuracy in the final score. This approach leverages the strengths of different classification models, reducing weaknesses, increasing reliability and improving overall performance by better representing various features of the dataset. While the literature generally focuses on a single model or traditional ensemble methods, this work presents a more advanced and effective combination strategy. It also highlights the important role that certain factors such as age, medications and cholesterol levels play in diabetes assessment. This study contributes to the literature on both theoretical and practical levels by increasing the applicability of AI in clinical practice and health management. These findings could help healthcare professionals better monitor patients' conditions, develop more personalized approaches, and help ensure a positive patient response to treatment.
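The feature selection step described in this abstract (keeping only features that every technique ranks among its best) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the three selector names and all scores are hypothetical toy values, and `k=2` stands in for the 20 used in the study.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def intersect_top_k(per_method: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Keep only the features that every method ranks in its top k."""
    selected = [top_k(scores, k) for scores in per_method.values()]
    common = set.intersection(*selected) if selected else set()
    return sorted(common)


# Hypothetical importance scores from three selectors (toy values, not study data).
toy_scores = {
    "mutual_info": {"age": 0.9, "hdl": 0.7, "ldl": 0.6, "height": 0.1},
    "anova": {"age": 0.8, "ldl": 0.75, "hdl": 0.2, "height": 0.3},
    "tree_gain": {"ldl": 0.9, "age": 0.85, "height": 0.4, "hdl": 0.1},
}
common_features = intersect_top_k(toy_scores, k=2)
print(common_features)  # ['age']
```

Intersecting the per-method top lists is a conservative rule: a feature survives only if all selectors agree on it, so the result can be much smaller than k.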
Affiliation(s)
- Yunus Hazar
- Electrical and Electronic Engineering, Batman University, Batman, Turkey.
2
Li N, Zhang Y, Zhang Q, Jin H, Han M, Guo J, Zhang Y. Machine learning reveals glycolytic key gene in gastric cancer prognosis. Sci Rep 2025; 15:8688. [PMID: 40082583 PMCID: PMC11906761 DOI: 10.1038/s41598-025-93512-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 03/07/2025] [Indexed: 03/16/2025] Open
Abstract
Glycolysis is recognized as a central metabolic pathway in the neoplastic evolution of gastric cancer, exerting profound effects on the tumor microenvironment and the neoplastic growth trajectory. However, the identification of key glycolytic genes that significantly affect gastric cancer prognosis remains underexplored. In this work, five machine-learning algorithms were used to elucidate the intimate association between the glycolysis-associated gene phosphofructokinase fructose-bisphosphate 3 (PFKFB3) and the prognosis of gastric cancer patients. Validation across multiple independent datasets confirmed the prognostic significance of PFKFB3. Further, we delved into the functional implications of PFKFB3 in modulating immune responses and biological processes within gastric cancer patients, as well as its broader relevance across multiple cancer types. Results underscore the potential of PFKFB3 as a prognostic biomarker and therapeutic target in gastric cancer. Our project can be found at https://github.com/PiPiNam/ML-GCP .
Affiliation(s)
- Nan Li
- China Academy of Electronics and Information Technology, National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), Beijing, China
- Yuzhe Zhang
- The First Laboratory of Cancer Institute, The First Hospital of China Medical University, Shenyang, China
- Qianyue Zhang
- China Academy of Electronics and Information Technology, National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), Beijing, China
- Hao Jin
- China Academy of Electronics and Information Technology, National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), Beijing, China
- Mengfei Han
- China Academy of Electronics and Information Technology, National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), Beijing, China
- Junhan Guo
- Center for Reproductive Medicine, Henan Key Laboratory of Reproduction and Genetics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Ye Zhang
- The First Laboratory of Cancer Institute, The First Hospital of China Medical University, Shenyang, China.
3
Sadad T, Aurangzeb RA, Safran M, Alfarhood S, Kim J. Classification of Highly Divergent Viruses from DNA/RNA Sequence Using Transformer-Based Models. Biomedicines 2023; 11:1323. [PMID: 37238994 DOI: 10.3390/biomedicines11051323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Viruses infect millions of people worldwide each year, and some can lead to cancer or increase the risk of cancer. As viruses have highly mutable genomes, new viruses may emerge in the future, such as COVID-19 and influenza. Traditional virology relies on predefined rules to identify viruses, but new viruses may be completely or partially divergent from the reference genome, rendering statistical methods and similarity calculations insufficient for all genome sequences. Identifying DNA/RNA-based viral sequences is a crucial step in differentiating different types of lethal pathogens, including their variants and strains. While various tools in bioinformatics can align them, expert biologists are required to interpret the results. Computational virology is a scientific field that studies viruses, their origins, and drug discovery, where machine learning plays a crucial role in extracting domain- and task-specific features to tackle this challenge. This paper proposes a genome analysis system that uses advanced deep learning to identify dozens of viruses. The system uses nucleotide sequences from the NCBI GenBank database and a BERT tokenizer to extract features from the sequences by breaking them down into tokens. We also generated synthetic data for viruses with small sample sizes. The proposed system has two components: a scratch BERT architecture specifically designed for DNA analysis, which is used to learn the next codons unsupervised, and a classifier that identifies important features and understands the relationship between genotype and phenotype. Our system achieved an accuracy of 97.69% in identifying viral sequences.
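Breaking a nucleotide sequence into tokens, as this abstract describes, is often done with fixed-length k-mers. The sketch below assumes that scheme purely for illustration; the abstract does not specify the tokenizer's vocabulary or window, and `kmer_tokens` with its `k` and `stride` parameters is a hypothetical helper, not the paper's implementation.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide sequence into k-mers taken every `stride` bases.

    A trailing fragment shorter than k is dropped.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


overlapping = kmer_tokens("ATGCGTAC", k=3)        # sliding window, stride 1
codons = kmer_tokens("ATGCGTAC", k=3, stride=3)   # non-overlapping, codon-like
print(overlapping)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
print(codons)       # ['ATG', 'CGT']
```

With `stride=3` the tokens line up with codons when the sequence starts in frame, which matches the abstract's mention of learning "the next codons"; whether the authors used overlapping or non-overlapping windows is not stated.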
Affiliation(s)
- Tariq Sadad
- Department of Computer Science, University of Engineering & Technology, Mardan 23200, Pakistan
- Raja Atif Aurangzeb
- Department of Computer Science & Software Engineering, International Islamic University Islamabad, Islamabad 44000, Pakistan
- Mejdl Safran
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
- Sultan Alfarhood
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
- Jungsuk Kim
- Department of Biomedical Engineering, Gachon University, Seongnam-si 13120, Republic of Korea
4
Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023; 13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023] Open
Abstract
The dramatic increase in the number of single-cell RNA-sequence (scRNA-seq) investigations is an endorsement of the new-fangled proficiencies of next generation sequencing technologies that facilitate the accurate measurement of tens of thousands of RNA expression levels at cellular resolution. Nevertheless, missing values from RNA amplification persist and remain a significant computational challenge, as these data omissions induce further noise in the respective cellular data and ultimately impede downstream functional analysis of scRNA-seq data. Consequently, it becomes imperative to develop robust and efficient scRNA-seq data imputation methods for improved downstream functional analysis outcomes. To overcome this adversity, we have designed an imputation framework, namely the deep generative autoencoder network (DGAN). In essence, DGAN is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data manifested as a sparse gene expression matrix. DGAN principally reckons count distribution, besides data sparsity, utilizing a Gaussian model whereby cell dependencies are capitalized on to detect and exclude outlier cells via imputation. When tested on five publicly available scRNA-seq datasets, DGAN outperformed every baseline method it was compared against with respect to downstream functional analysis, including cell data visualization, clustering, classification and differential expression analysis. DGAN is implemented in Python and is accessible at https://github.com/dikshap11/DGAN .
Affiliation(s)
- Diksha Pandey
- Department of Biotechnology, National Institute of Technology, Warangal, India
- Perumal P Onkara
- Department of Biotechnology, National Institute of Technology, Warangal, India.
5
Multi-Stage Temporal Convolution Network for COVID-19 Variant Classification. Diagnostics (Basel) 2022; 12:2736. [DOI: 10.3390/diagnostics12112736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 10/18/2022] [Accepted: 10/31/2022] [Indexed: 11/11/2022] Open
Abstract
The outbreak of the novel coronavirus disease COVID-19 (SARS-CoV-2) has developed into a global epidemic. Due to the pathogenic virus’s high transmission rate, accurate identification and early prediction are required for subsequent therapy. Moreover, the virus’s polymorphic nature allows it to evolve and adapt to various environments, making prediction difficult. However, other diseases, such as dengue, MERS-CoV, Ebola, SARS-CoV-1, and influenza, necessitate the employment of a predictor based on their genomic information. To alleviate the situation, we propose a deep learning-based mechanism for the classification of various SARS-CoV-2 virus variants, including the most recent, Omicron. Our model uses a neural network with a temporal convolution neural network to accurately identify different variants of COVID-19. The proposed model first encodes the sequences in the numerical descriptor, and then the convolution operation is applied for discriminative feature extraction from the encoded sequences. The sequential relations between the features are collected using a temporal convolution network to classify COVID-19 variants accurately. We collected recent data from the NCBI, on which the proposed method outperforms various baselines with a high margin.
6
Khandakji MN, Mifsud B. Gene-specific machine learning model to predict the pathogenicity of BRCA2 variants. Front Genet 2022; 13:982930. [PMID: 36246618 PMCID: PMC9561395 DOI: 10.3389/fgene.2022.982930] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Accepted: 09/12/2022] [Indexed: 11/18/2022] Open
Abstract
Background: Existing BRCA2-specific variant pathogenicity prediction algorithms focus on the prediction of the functional impact of a subtype of variants alone. General variant effect predictors are applicable to all subtypes, but are trained on putative benign and pathogenic variants and do not account for gene-specific information, such as hotspots of pathogenic variants. Local, gene-specific information has been shown to aid variant pathogenicity prediction; therefore, our aim was to develop a BRCA2-specific machine learning model to predict pathogenicity of all types of BRCA2 variants. Methods: We developed an XGBoost-based machine learning model to predict pathogenicity of BRCA2 variants. The model utilizes general variant information such as position, frequency, and consequence for the canonical BRCA2 transcript, as well as deleteriousness prediction scores from several tools. We trained the model on 80% of the variants expert-reviewed by the Evidence-Based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium and tested its performance on the remaining 20%, as well as on an independent set of variants of uncertain significance with experimentally determined functional scores. Results: The novel gene-specific model predicted the pathogenicity of ENIGMA BRCA2 variants with an accuracy of 99.9%. The model also performed well in predicting the functional consequence of the independent set of variants (accuracy up to 91.3%). Conclusion: This new, gene-specific model is an accurate method for interpreting the pathogenicity of variants in the BRCA2 gene. It is a valuable addition for variant classification and can prioritize unreviewed variants for functional analysis or expert review.
Affiliation(s)
- Mohannad N. Khandakji
- College of Health and Life Sciences, Hamad Bin Khalifa University, Ar-Rayyan, Qatar
- Hamad Medical Corporation, Doha, Qatar
- Borbala Mifsud
- College of Health and Life Sciences, Hamad Bin Khalifa University, Ar-Rayyan, Qatar
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- *Correspondence: Borbala Mifsud
7
Lai Y, Lin X, Lin C, Lin X, Chen Z, Zhang L. Identification of endoplasmic reticulum stress-associated genes and subtypes for prediction of Alzheimer’s disease based on interpretable machine learning. Front Pharmacol 2022; 13:975774. [PMID: 36059957 PMCID: PMC9438901 DOI: 10.3389/fphar.2022.975774] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/22/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022] Open
Abstract
Introduction: Alzheimer’s disease (AD) is a severe dementia with clinical and pathological heterogeneity. Our study aimed to explore the roles of endoplasmic reticulum (ER) stress-related genes in AD patients based on interpretable machine learning. Methods: Microarray datasets were obtained from the Gene Expression Omnibus (GEO) database. We applied nine machine learning algorithms, including AdaBoost, Logistic Regression, Light Gradient Boosting (LightGBM), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Random Forest, K-nearest neighbors (KNN), Naïve Bayes, and support vector machines (SVM), to screen ER stress-related feature genes and estimate the efficiency of these genes for early diagnosis of AD. ROC curves were used to evaluate model performance. Shapley additive explanation (SHAP) was applied to interpret the results of these models. AD patients were classified using a consensus clustering algorithm. Immune infiltration and functional enrichment analysis were performed via CIBERSORT and GSVA, respectively. CMap analysis was utilized to identify subtype-specific small-molecule compounds. Results: Higher levels of immune infiltration were found in AD individuals and were markedly linked to deregulated ER stress-related genes. The SVM model exhibited the highest AUC (0.879), accuracy (0.808), recall (0.773), and precision (0.809). Six characteristic genes (RNF5, UBAC2, DNAJC10, RNF103, DDX3X, and NGLY1) were determined, which enable precise prediction of AD progression. The SHAP plots illustrated how each feature gene influences the output of the SVM prediction model. Patients with AD could obtain clinical benefits from the feature gene-based nomogram. Two ER stress-related subtypes were defined in AD; subtype 2 exhibited elevated immune infiltration levels and immune score, as well as higher expression of immune checkpoints. We finally identified several subtype-specific small-molecule compounds. Conclusion: Our study provides new insights into the role of ER stress in AD heterogeneity and the development of novel targets for individualized treatment in patients with AD.
Affiliation(s)
- Yongxing Lai
- Department of Geriatric Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Fujian Provincial Center for Geriatrics, Fujian Provincial Hospital, Fuzhou, China
- Xueyan Lin
- Department of Gastroenterology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Chunjin Lin
- Department of Geriatric Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Fujian Provincial Center for Geriatrics, Fujian Provincial Hospital, Fuzhou, China
- Xing Lin
- Department of Geriatric Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Fujian Provincial Center for Geriatrics, Fujian Provincial Hospital, Fuzhou, China
- Zhihan Chen
- Department of Rheumatology and Immunology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- *Correspondence: Li Zhang; Zhihan Chen
- Li Zhang
- Department of Nephrology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- *Correspondence: Li Zhang; Zhihan Chen
8
BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection. Comput Biol Chem 2022; 99:107732. [PMID: 35863177 DOI: 10.1016/j.compbiolchem.2022.107732] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/16/2022] [Accepted: 07/12/2022] [Indexed: 02/01/2023]
Abstract
A promoter is a sequence of DNA that initiates the process of transcription and regulates when and where genes are expressed in the organism. Because of its importance in molecular biology, identifying DNA promoters is a challenging task that can provide useful information related to their functions and related diseases. Several computational models have been developed over the past decade to predict promoters from high-throughput sequencing data. Although some useful predictors have been proposed, there remain shortfalls in those models and there is an urgent need to enhance the predictive performance to meet practical requirements. In this study, we proposed a novel architecture that incorporated transformer natural language processing (NLP) and explainable machine learning to address this problem. More specifically, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model was employed to encode DNA sequences, and SHapley Additive exPlanations (SHAP) analysis served as a feature selection step to identify the top-ranked BERT encodings. At the last stage, different machine learning classifiers were implemented to learn the top features and produce the prediction outcomes. This study predicted not only the DNA promoters but also their activities (strong or weak promoters). Overall, several experiments showed an accuracy of 85.5 % and 76.9 % for these two levels, respectively. Our performance was superior to previously published predictors on the same dataset in most measurement metrics. We named our predictor BERT-Promoter and it is freely available at https://github.com/khanhlee/bert-promoter.
9
Yao Y, Zhang S, Xue T. Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites. Curr Genomics 2022; 23:83-93. [PMID: 36778978 PMCID: PMC9878833 DOI: 10.2174/1389202923666220214122506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Revised: 12/11/2021] [Accepted: 01/18/2022] [Indexed: 11/22/2022] Open
Abstract
Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered to be the basis of biological inheritance and the most fundamental process in all biological life. Considering that DNA replication initiates at a specific location, namely the origin of replication, better and more accurate prediction of the origins of replication sites (ORIs) is essential to gain insight into the relationship with gene expression. Objective: In this study, we have developed an efficient predictor called iORI-LAVT for ORI identification. Methods: This work focuses on extracting feature information from three aspects: mono-nucleotide encoding, k-mer and ring-function-hydrogen-chemical properties. Subsequently, the least absolute shrinkage and selection operator (LASSO) is applied as a feature selection step to select the optimal features. After comparing the results of different combined soft voting classifiers, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Based on a 10-fold cross-validation test, the prediction accuracies on two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, our method achieves a high accuracy of 91.3%. Conclusion: iORI-LAVT outperforms previous predictors. It is believed that the iORI-LAVT predictor is a promising alternative for further research on identifying ORIs.
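Soft voting, the combination rule named in this abstract, averages the class probabilities from each classifier and predicts the class with the highest mean. The following is a minimal pure-Python sketch under that standard definition; the function name and the probability values are illustrative assumptions, not outputs of the GaussianNB or Logistic Regression models used in the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across classifiers and pick the argmax."""
    n_models = len(prob_sets)
    labels = []
    for sample_probs in zip(*prob_sets):  # one probability row per model, per sample
        n_classes = len(sample_probs[0])
        mean = [sum(p[c] for p in sample_probs) / n_models for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


# Toy class probabilities [P(non-ORI), P(ORI)] for three sequences (illustrative values).
model_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.30, 0.70]]
predictions = soft_vote([model_a, model_b])
print(predictions)  # [0, 1, 1]
```

The third sequence shows why soft voting can differ from a per-model majority: the two models disagree, and the more confident one decides the outcome. In scikit-learn the same rule is available through `VotingClassifier(voting="soft")`.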
Affiliation(s)
- Yingying Yao
- School of Mathematics and Statistics, Xidian University, Xi’an 710071, P.R. China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi’an 710071, P.R. China
- Address correspondence to this author at the School of Mathematics and Statistics, Xidian University, Xi’an 710071, P.R. China; Tel/Fax: +86-29-88202860
- Tian Xue
- School of Mathematics and Statistics, Xidian University, Xi’an 710071, P.R. China
10
Guan R, Pang H, Liang Y, Shao Z, Gao X, Xu D, Feng X. Discovering trends and hotspots of biosafety and biosecurity research via machine learning. Brief Bioinform 2022; 23:6590367. [PMID: 35596953 PMCID: PMC9487701 DOI: 10.1093/bib/bbac194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 04/06/2022] [Accepted: 04/27/2022] [Indexed: 11/14/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19) has infected hundreds of millions of people and killed millions of them. As an RNA virus, COVID-19 is more susceptible to variation than other viruses. Many problems involved in this epidemic have made biosafety and biosecurity (hereafter collectively referred to as ‘biosafety’) a popular and timely topic globally. Biosafety research covers a broad and diverse range of topics, and it is important to quickly identify hotspots and trends in biosafety research through big data analysis. However, the data-driven literature on biosafety research discovery is quite scant. We developed a novel topic model based on latent Dirichlet allocation, affinity propagation clustering and the PageRank algorithm (LDAPR) to extract knowledge from biosafety research publications from 2011 to 2020. Then, we conducted hotspot and trend analysis with LDAPR and carried out further studies, including annual hot topic extraction, a 10-year keyword evolution trend analysis, topic map construction, hot region discovery and fine-grained correlation analysis of interdisciplinary research topic trends. These analyses revealed valuable information that can guide epidemic prevention work: (1) the research enthusiasm over a certain infectious disease not only is related to its epidemic characteristics but also is affected by the progress of research on other diseases, and (2) infectious diseases are not only strongly related to their corresponding microorganisms but also potentially related to other specific microorganisms. The detailed experimental results and our code are available at https://github.com/KEAML-JLU/Biosafety-analysis.
Affiliation(s)
- Renchu Guan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
- Zhuhai Sub Laboratory, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, 519041, Guangdong, China
- Haoyu Pang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
- Yanchun Liang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
- Zhuhai Sub Laboratory, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, 519041, Guangdong, China
- Zhongjun Shao
- Department of Epidemiology, Ministry of Education Key Laboratory of Hazard Assessment and Control in Special Operational Environment, School of Public Health, Air Force Medical University, Xi'an, 710032, Shaanxi, China
- Xin Gao
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- BioMap, Beijing, 100192, China
- Dong Xu
- Department of Electric Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, 65201, Missouri, USA
- Xiaoyue Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China
11
Zhou Z, Xu J, Huang N, Tang J, Ma P, Cheng Y. A Pyroptosis-Related Gene Signature Associated with Prognosis and Tumor Immune Microenvironment in Gliomas. Int J Gen Med 2022; 15:4753-4769. [PMID: 35571289 PMCID: PMC9091698 DOI: 10.2147/ijgm.s353762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/21/2021] [Accepted: 03/16/2022] [Indexed: 11/23/2022] Open
Abstract
Background: Pyroptosis is a novel form of cell death that plays a significant role in cancer, but the prognostic value of pyroptosis-related genes in gliomas has not been revealed. Methods: We analyzed the RNA-seq and clinical data of gliomas from the University of California Santa Cruz (UCSC) Xena database to determine differentially expressed pyroptosis-related genes. Based on these genes, a pyroptosis gene signature was constructed after univariate Cox and Lasso Cox analyses. The sensitivity and specificity of the pyroptosis gene signature were verified on the Chinese Glioma Genome Atlas (CGGA) dataset. Finally, we explored the association of the risk signature with the tumor microenvironment and immune cell infiltration. Results: Of 15 differentially expressed pyroptosis-related genes, three genes, BCL2 associated X (BAX), caspase 3 (CASP3), and caspase 4 (CASP4), were used to construct the risk signature. The effectiveness of the risk signature for predicting survival at 1, 3 and 5 years was assessed with the receiver operating characteristic (ROC) curve, and the areas under the curve (AUC) were 0.739, 0.817, and 0.800, respectively. Functional enrichment results showed that signal transduction, cell adhesion, immune response, and inflammatory response were enriched. The immune analysis revealed that pyroptosis had a remarkable effect on the immune microenvironment. Conclusion: In this study, we constructed a pyroptosis-related gene signature, which can serve as a potential biomarker for predicting the survival of glioma patients. Additionally, we suggested that pyroptosis may promote glioma development by inducing a chronic inflammatory microenvironment.
Affiliation(s)
- Zunjie Zhou
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jing Xu
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Ning Huang
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jun Tang
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Ping Ma
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yuan Cheng
- Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Correspondence: Yuan Cheng, Department of Neurosurgery, the Second Affiliated Hospital of Chongqing Medical University, No. 74 Linjiang Road, Yuzhong District, Chongqing, People’s Republic of China, Tel +8613708329653
12
Badré A, Pan C. LINA: A Linearizing Neural Network Architecture for Accurate First-Order and Second-Order Interpretations. IEEE ACCESS: PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2022; 10:36166-36176. [PMID: 35462722 PMCID: PMC9032252 DOI: 10.1109/access.2022.3163257] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/14/2023]
Abstract
While neural networks can provide high predictive performance, it has been a challenge to identify the salient features and important feature interactions used for their predictions. This represents a key hurdle for deploying neural networks in many biomedical applications that require interpretability, including predictive genomics. In this paper, a linearizing neural network architecture (LINA) was developed to provide both first-order and second-order interpretations on both the instance-wise and the model-wise levels. LINA combines the representational capacity of a deep inner attention neural network with a linearized intermediate representation for model interpretation. In comparison with DeepLIFT, LIME, Grad*Input and L2X, the first-order interpretation of LINA had better Spearman correlation with the ground-truth importance rankings of features in synthetic datasets. In comparison with NID and GEH, the second-order interpretation results from LINA achieved better precision for identification of the ground-truth feature interactions in synthetic datasets. These algorithms were further benchmarked using predictive genomics as a real-world application. LINA identified larger numbers of important single nucleotide polymorphisms (SNPs) and salient SNP interactions than the other algorithms at given false discovery rates. The results showed accurate and versatile model interpretation using LINA.
Affiliation(s)
- Adrien Badré
- School of Computer Science, The University of Oklahoma, Norman, OK 73019, USA
| | - Chongle Pan
- School of Computer Science, The University of Oklahoma, Norman, OK 73019, USA
| |
|
13
|
Comparison of Selection Criteria for Model Selection of Support Vector Machine on Physiological Data with Inter-Subject Variance. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12031749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Support vector machines (SVMs) utilize hyper-parameters for classification. Model selection (MS) is an essential step in the construction of the SVM classifier as it involves the identification of the appropriate parameters. Several selection criteria have been proposed for MS, but their usefulness is limited for physiological data exhibiting inter-subject variance (ISV), which causes the characteristics of training and test data to differ. To identify an effective solution to this limitation, this study considered a leave-one-subject-out cross validation-based selection criterion (LSSC) alongside six well-known selection criteria and compared their effectiveness. Nine classification problems were examined for the comparison, and the MS results of each selection criterion were obtained and analyzed. The results showed that the SVM model selected by the LSSC yielded the highest average classification accuracy among all selection criteria in the nine problems. The average accuracy was 2.96% higher than that obtained with the conventional K-fold cross validation-based selection criterion. In addition, the advantage of the LSSC was more evident for data with larger ISV. Thus, the results of this study can help optimize SVM classifiers for physiological data and are expected to be useful for the analysis of physiological data to develop various medical decision systems.
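As a rough illustration of the leave-one-subject-out idea, the sketch below generates one split per subject and picks the candidate hyper-parameter with the best mean held-out score. The names `leave_one_subject_out`, `select_by_lsso` and `score_fn` are hypothetical; `score_fn` stands in for "train an SVM with this setting on the training indices and return its accuracy on the held-out subject", which is not implemented here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def select_by_lsso(candidates, subject_ids, score_fn):
    """Return the candidate with the highest mean held-out-subject score."""
    def mean_score(candidate):
        scores = [
            score_fn(candidate, train, test)
            for _, train, test in leave_one_subject_out(subject_ids)
        ]
        return sum(scores) / len(scores)

    return max(candidates, key=mean_score)


# Hypothetical toy setup: six samples recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subject_ids))

# Placeholder scorer (assumption): pretends that C = 1.0 generalizes best.
def toy_score(c_value, train, test):
    return -abs(c_value - 1.0)

best_c = select_by_lsso([0.1, 1.0, 10.0], subject_ids, toy_score)
print(splits[0], best_c)
```

Because every sample from the held-out subject is excluded from training, the score reflects performance on an unseen subject, which is the situation that inter-subject variance makes difficult for ordinary K-fold splits.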
|
14
|
Exploring the effectiveness of word embedding based deep learning model for improving email classification. DATA TECHNOLOGIES AND APPLICATIONS 2022. [DOI: 10.1108/dta-07-2021-0191] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose: Classifying emails as ham or spam based on their content is essential. Determining the semantic and syntactic meaning of words and putting them in a high-dimensional feature vector form for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of the pre-trained embedding model for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and convolutional neural network (CNN) model. Design/methodology/approach: In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation. Findings: In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methods. The second set of experiments compares the deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large-sized datasets. Originality/value: The experiment reveals that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms to classify an email as ham or spam. It is concluded that the word embedding models improve the accuracy of email classifiers.
|
15
|
Yao M, Fu L, Liu X, Zheng D. In-Silico Multi-Omics Analysis of the Functional Significance of Calmodulin 1 in Multiple Cancers. Front Genet 2022; 12:793508. [PMID: 35096010 PMCID: PMC8790318 DOI: 10.3389/fgene.2021.793508] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 12/23/2021] [Indexed: 01/14/2023] Open
Abstract
Aberrant activation of calmodulin 1 (CALM1) has been reported in human cancers. However, a comprehensive understanding of the role of CALM1 in most cancer types is still lacking. We systematically analyzed the expression landscape, DNA methylation, gene alteration, immune infiltration, clinical relevance, and molecular pathway of CALM1 in multiple cancers using various online tools, including The Cancer Genome Atlas, cBioPortal and the Human Protein Atlas databases. Kaplan–Meier and receiver operating characteristic (ROC) curves were plotted to explore the prognostic and diagnostic potential of CALM1 expression. Multivariate analyses were used to evaluate whether CALM1 expression could be an independent risk factor. A nomogram predicting the overall survival (OS) of patients was developed, evaluated, and compared with the traditional Tumor-Node-Metastasis (TNM) model using decision curve analysis. R language was employed as the main tool for analysis and visualization. Results revealed CALM1 to be highly expressed in most cancers, its expression being regulated by DNA methylation in multiple cancers. CALM1 had a low mutation frequency (within 3%) and was associated with immune infiltration. We observed a substantial positive correlation between CALM1 expression and macrophage and neutrophil infiltration levels in multiple cancers. Different mutational forms of CALM1 hampered immune cell infiltration. Additionally, CALM1 expression had high diagnostic and prognostic potential. Multivariate analyses revealed CALM1 expression to be an independent risk factor for OS. Therefore, our newly developed nomogram had a higher clinical value than the TNM model. The concordance index, calibration curve, and time-dependent ROC curves of the nomogram exhibited excellent performance in terms of predicting the survival rate of patients. Moreover, elevated CALM1 expression contributes to the activation of cancer-related pathways, such as the WNT and MAPK pathways. Overall, our findings improved our understanding of the function of CALM1 in human cancers.
Affiliation(s)
- Maolin Yao
- Laboratory of Genetics and Molecular Biology, College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
| | - Lanyi Fu
- Laboratory of Genetics and Molecular Biology, College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
| | - Xuedong Liu
- Laboratory of Genetics and Molecular Biology, College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
| | - Dong Zheng
- Laboratory of Genetics and Molecular Biology, College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
| |
|
16
|
Xiao K, Zhao S, Yuan J, Pan Y, Song Y, Tang L. Construction of Molecular Subtypes and Related Prognostic and Immune Response Models Based on M2 Macrophages in Glioblastoma. Int J Gen Med 2022; 15:913-926. [PMID: 35115817 PMCID: PMC8801375 DOI: 10.2147/ijgm.s343152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2021] [Accepted: 12/23/2021] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVES To identify the molecular subtypes of glioblastoma multiforme (GBM) related to M2 macrophage-based prognostic genes, then to preliminarily explore their biological functions and construct immunotherapy response gene models. MATERIAL AND METHODS We used R language to analyze GBM microarray data, and other tools, including xCell and CIBERSORTx, to identify subtypes of GBM that are related to M2 macrophages. The process started with the exploration of the biological functions of the two subtypes by pathway analyses and GSEA, and continued with a combined procedure of constructing an M2 macrophage-related prognostic gene model and exploring the immune treatment response for GBM. RESULTS A high abundance of M2 macrophages in GBM was associated with poor prognosis. According to M2 macrophage-related prognostic genes, GBM was divided into two subtypes (cluster A and cluster B). The differential gene enrichment analysis of the two clusters showed that cluster A was less enriched in M2 macrophages and had immunopotential. The M2score, which was constructed based on M2 macrophage-related prognostic genes, was not only related to the survival and prognosis of patients with GBM, but also predictive of the effectiveness of immunotherapy in these patients. This result has been effectively verified in an external data set. CONCLUSION GBM was successfully divided into two subtypes according to M2-macrophage-related prognostic genes. In GBM, a high M2score may indicate better clinical outcome and enhancement of the immunotherapy response.
Affiliation(s)
- Kai Xiao
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
| | - Shushan Zhao
- Department of Orthopedics, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
| | - Jian Yuan
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
| | - Yimin Pan
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
| | - Ya Song
- Department of Orthopedics, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
| | - Lanhua Tang
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, People’s Republic of China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, People's Republic of China
| |
|
17
|
Huang D, Zheng S, Liu Z, Zhu K, Zhi H, Ma G. Machine Learning Revealed Ferroptosis Features and a Novel Ferroptosis-Based Classification for Diagnosis in Acute Myocardial Infarction. Front Genet 2022; 13:813438. [PMID: 35145551 PMCID: PMC8821875 DOI: 10.3389/fgene.2022.813438] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Accepted: 01/05/2022] [Indexed: 12/30/2022] Open
Abstract
Acute myocardial infarction (AMI) is a leading cause of death and disability worldwide. Early diagnosis of AMI and interventional treatment can significantly reduce myocardial damage. However, owing to limitations in sensitivity and specificity, existing myocardial markers are not efficient for early identification of AMI. Transcriptome-wide association studies (TWASs) have shown excellent performance in identifying significant gene–trait associations and several cardiovascular diseases (CVDs). Furthermore, ferroptosis is a major driver of ischaemic injury in the heart. However, its specific regulatory mechanisms remain unclear. In this study, we screened three Gene Expression Omnibus (GEO) datasets of peripheral blood samples to assess the efficiency of ferroptosis-related genes (FRGs) for early diagnosis of AMI. To the best of our knowledge, for the first time, TWAS and mRNA expression data were integrated in this study to identify 11 FRGs specifically expressed in the peripheral blood of patients with AMI. Subsequently, using multiple machine learning algorithms, an optimal prediction model for AMI was constructed, which demonstrated satisfactory diagnostic efficiency in the training cohort (area under the curve (AUC) = 0.794) and two external validation cohorts (AUC = 0.745 and 0.711). Our study suggests that FRGs are involved in the progression of AMI, thus providing a new direction for early diagnosis, and offers potential molecular targets for optimal treatment of AMI.
Affiliation(s)
- Dan Huang
- Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
| | - Shiya Zheng
- Department of Oncology, Zhongda Hospital, Southeast University, Nanjing, China
| | - Zhuyuan Liu
- Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
| | - Kongbo Zhu
- Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
| | - Hong Zhi
- Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
| | - Genshan Ma
- Department of Cardiology, Zhongda Hospital, Southeast University, Nanjing, China
- *Correspondence: Genshan Ma,
| |
|
18
|
Lu G, Shi W, Zhang Y. Prognostic Implications and Immune Infiltration Analysis of ALDOA in Lung Adenocarcinoma. Front Genet 2021; 12:721021. [PMID: 34925439 PMCID: PMC8678114 DOI: 10.3389/fgene.2021.721021] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2021] [Accepted: 10/28/2021] [Indexed: 12/31/2022] Open
Abstract
Background: Aldolase A (ALDOA) has been reported to be involved in various kinds of cancers. However, the role of ALDOA in lung adenocarcinoma has not been fully elucidated. In this study, we explored the prognostic value and correlation with immune infiltration of ALDOA in lung adenocarcinoma. Methods: The expression of ALDOA was analyzed with the Oncomine database, the Cancer Genome Atlas (TCGA), and the Human Protein Atlas (HPA). The Mann-Whitney U test was performed to examine the relationship between clinicopathological characteristics and ALDOA expression. The receiver operating characteristic (ROC) curve and Kaplan-Meier method were used to describe the diagnostic and prognostic importance of ALDOA. The Search Tool for the Retrieval of Interacting Genes (STRING) and Cytoscape were used to construct PPI networks and identify hub genes. Functional annotation and immune infiltration analyses were conducted. Results: The mRNA and protein expression of ALDOA were higher in lung adenocarcinoma than in normal tissues. The overexpression of ALDOA was significantly correlated with high T stage, N stage, M stage, and TNM stage. Kaplan-Meier analysis showed that high expression of ALDOA was correlated with short overall survival (38.9 vs 72.5 months, p < 0.001). Multivariate analysis revealed that ALDOA (HR 1.435, 95%CI, 1.013-2.032, p = 0.042) was an independent poor prognostic factor for overall survival. Functional enrichment analysis showed that positively co-expressed genes of ALDOA were involved in the biological processes of mitochondrial translation, mitochondrial translational elongation, and negative regulation of cell cycle progression. KEGG pathway analysis showed enrichment in carbon metabolism, the HIF-1 signaling pathway, and glycolysis/gluconeogenesis. The "SCNA" module analysis indicated that the copy number alterations of ALDOA were correlated with three immune cell infiltration levels, including B cells, CD8+ T cells, and CD4+ T cells. The "Gene" module analysis indicated that ALDOA gene expression was negatively correlated with infiltrating levels of B cells, CD8+ T cells, CD4+ T cells, and macrophages. Conclusion: Our study suggested that upregulated ALDOA was significantly correlated with tumor progression, poor survival, and immune infiltrations in lung adenocarcinoma. These results suggest that ALDOA is a potential prognostic biomarker and therapeutic target in lung adenocarcinoma.
Affiliation(s)
- Guojun Lu
- Department of Respiratory Medicine, Nanjing Chest Hospital, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
| | - Wen Shi
- Department of Respiratory Medicine, Nanjing Chest Hospital, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
| | - Yu Zhang
- Department of Respiratory Medicine, Nanjing Chest Hospital, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
| |
|
19
|
Duong SQ, Zheng L, Xia M, Jin B, Liu M, Li Z, Hao S, Alfreds ST, Sylvester KG, Widen E, Teuteberg JJ, McElhinney DB, Ling XB. Identification of patients at risk of new onset heart failure: Utilizing a large statewide health information exchange to train and validate a risk prediction model. PLoS One 2021; 16:e0260885. [PMID: 34890438 PMCID: PMC8664210 DOI: 10.1371/journal.pone.0260885] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Accepted: 10/22/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND New-onset heart failure (HF) is associated with poor prognosis and high healthcare utilization. Early identification of patients at increased risk of incident-HF may allow for focused allocation of preventative care resources. Health information exchange (HIE) data span the entire spectrum of clinical care, but there are no HIE-based clinical decision support tools for diagnosis of incident-HF. We applied machine-learning methods to model the one-year risk of incident-HF from the Maine statewide-HIE. METHODS AND RESULTS We included subjects aged ≥ 40 years without prior HF ICD9/10 codes during a three-year period from 2015 to 2018, with incident-HF defined as assignment of two outpatient codes or one inpatient code in a year. A tree-boosting algorithm was used to model the probability of incident-HF in year two from data collected in year one, and then validated in year three. 5,668 of 521,347 patients (1.09%) developed incident-HF in the validation cohort. In the validation cohort, the model c-statistic was 0.824 and at a clinically predetermined risk threshold, 10% of patients identified by the model developed incident-HF and 29% of all incident-HF cases in the state of Maine were identified. CONCLUSIONS Utilizing machine learning modeling techniques on passively collected clinical HIE data, we developed and validated an incident-HF prediction tool that performs on par with other models that require proactively collected clinical data. Our algorithm could be integrated into other HIEs to leverage the EMR resources to provide individuals, systems, and payors with a risk stratification tool to allow for targeted resource allocation to reduce incident-HF disease burden on individuals and health care systems.
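The two kinds of figures reported in this abstract (a c-statistic, plus the share of flagged patients who develop the outcome and the share of all cases that are flagged) can be sketched as follows. This is a minimal illustration on made-up numbers, not the study's code; the function names, the risk values and the 0.5 threshold are all hypothetical.

```python
def c_statistic(risks, outcomes):
    """Fraction of (case, non-case) pairs in which the case has the higher risk.

    Ties in predicted risk count as half-concordant.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def flag_metrics(risks, outcomes, threshold):
    """Return (PPV, sensitivity) for patients flagged at risk >= threshold."""
    flagged = [y for r, y in zip(risks, outcomes) if r >= threshold]
    true_positives = sum(flagged)
    total_cases = sum(outcomes)
    ppv = true_positives / len(flagged) if flagged else 0.0
    sensitivity = true_positives / total_cases if total_cases else 0.0
    return ppv, sensitivity


# Hypothetical toy cohort: predicted one-year risks and observed outcomes.
risks = [0.9, 0.4, 0.35, 0.8, 0.1]
outcomes = [1, 0, 1, 0, 0]
c_stat = c_statistic(risks, outcomes)
ppv, sensitivity = flag_metrics(risks, outcomes, threshold=0.5)
print(round(c_stat, 3), ppv, sensitivity)
```

The pairwise loop is quadratic in cohort size, so for a cohort of several hundred thousand patients a rank-based formulation would be the practical choice; the loop is used here only because it mirrors the definition directly.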
Affiliation(s)
- Son Q. Duong
- Clinical and Translational Research Program, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Palo Alto, California, United States of America
- * E-mail: (SQD); (XBL)
| | - Le Zheng
- Clinical and Translational Research Program, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Palo Alto, California, United States of America
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, United States of America
| | - Minjie Xia
- HBI Solutions Inc., Palo Alto, California, United States of America
| | - Bo Jin
- HBI Solutions Inc., Palo Alto, California, United States of America
| | - Modi Liu
- HBI Solutions Inc., Palo Alto, California, United States of America
| | - Zhen Li
- Binhai Industrial Technology Research Institute, Zhejiang University, Tianjin, China
- School of Electrical Engineering, Southeast University, Nanjing, Jiangsu, China
| | - Shiying Hao
- Clinical and Translational Research Program, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Palo Alto, California, United States of America
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, United States of America
| | | | - Karl G. Sylvester
- Department of Surgery, Stanford University School of Medicine, Stanford, California, United States of America
| | - Eric Widen
- HBI Solutions Inc., Palo Alto, California, United States of America
| | - Jeffery J. Teuteberg
- Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, United States of America
| | - Doff B. McElhinney
- Clinical and Translational Research Program, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Palo Alto, California, United States of America
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, United States of America
| | - Xuefeng B. Ling
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Surgery, Stanford University School of Medicine, Stanford, California, United States of America
- * E-mail: (SQD); (XBL)
| |
|
20
|
Feng YZ, Liu S, Cheng ZY, Quiroz JC, Rezazadegan D, Chen PK, Lin QT, Qian L, Liu XF, Berkovsky S, Coiera E, Song L, Qiu XM, Cai XR. Severity Assessment and Progression Prediction of COVID-19 Patients Based on the LesionEncoder Framework and Chest CT. INFORMATION 2021; 12:471. [DOI: 10.3390/info12110471] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Automatic severity assessment and progression prediction can facilitate admission, triage, and referral of COVID-19 patients. This study aims to explore the potential use of lung lesion features in the management of COVID-19, based on the assumption that lesion features may carry important diagnostic and prognostic information for quantifying infection severity and forecasting disease progression. A novel LesionEncoder framework is proposed to detect lesions in chest CT scans and to encode lesion features for automatic severity assessment and progression prediction. The LesionEncoder framework consists of a U-Net module for detecting lesions and extracting features from individual CT slices, and a recurrent neural network (RNN) module for learning the relationship between feature vectors and collectively classifying the sequence of feature vectors. Chest CT scans of two cohorts of COVID-19 patients from two hospitals in China were used for training and testing the proposed framework. When applied to assessing severity, this framework outperformed baseline methods achieving a sensitivity of 0.818, specificity of 0.952, accuracy of 0.940, and AUC of 0.903. It also outperformed the other tested methods in disease progression prediction with a sensitivity of 0.667, specificity of 0.838, accuracy of 0.829, and AUC of 0.736. The LesionEncoder framework demonstrates a strong potential for clinical application in current COVID-19 management, particularly in automatic severity assessment of COVID-19 patients. This framework also has a potential for other lesion-focused medical image analyses.
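For readers less familiar with the metrics quoted above, the sketch below shows how sensitivity, specificity and accuracy follow from confusion-matrix counts. The counts are invented for illustration and are not taken from the study; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true severe cases detected
        "specificity": tn / (tn + fp),   # share of non-severe cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a 100-patient test set with 10 severe cases.
metrics = binary_metrics(tp=8, fp=5, tn=85, fn=2)
print({name: round(value, 3) for name, value in metrics.items()})
```

With a small positive class, as in this toy example, accuracy is dominated by specificity, which is one reason such studies report sensitivity and AUC alongside it.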
Affiliation(s)
- You-Zhen Feng
- Medical Imaging Centre, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
| | - Sidong Liu
- Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney 2113, Australia
| | - Zhong-Yuan Cheng
- Medical Imaging Centre, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
| | - Juan C. Quiroz
- Centre for Big Data Research in Health, University of New South Wales, Sydney 1466, Australia
| | - Dana Rezazadegan
- Department of Computer Science and Software Engineering, Swinburne University of Technology, Melbourne 3000, Australia
| | - Ping-Kang Chen
- Medical Imaging Centre, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
| | - Qi-Ting Lin
- Medical Imaging Centre, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
| | - Long Qian
- Department of Biomedical Engineering, Peking University, Beijing 100871, China
| | - Xiao-Fang Liu
- Tianjin Key Laboratory of Intelligent Robotics, Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, Tianjin 300350, China
| | - Shlomo Berkovsky
- Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney 2113, Australia
| | - Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney 2113, Australia
| | - Lei Song
- Department of Radiology, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang 441003, China
| | - Xiao-Ming Qiu
- Department of Radiology, Huangshi Central Hospital, Affiliated Hospital of Hubei Polytechnic University, Edong Healthcare Group, Huangshi 435002, China
| | - Xiang-Ran Cai
- Medical Imaging Centre, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
| |
|
21
|
Zhao Q, Ma J, Wang Y, Xie F, Lv Z, Xu Y, Shi H, Han K. Mul-SNO: A novel prediction tool for S-nitrosylation sites based on deep learning methods. IEEE J Biomed Health Inform 2021; 26:2379-2387. [PMID: 34762593 DOI: 10.1109/jbhi.2021.3123503] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Protein S-nitrosylation (SNO) is one of the most important post-translational modifications and is formed by the covalent modification of nitric oxide and cysteine residues. Extensive studies have shown that SNO plays a pivotal role in the plant immune response and treating various major human diseases. In recent years, SNO sites have become a hot research topic. Traditional biochemical methods for SNO site identification are time-consuming and costly. In this study, we developed an economical and efficient SNO site prediction tool named Mul-SNO. Mul-SNO ensembles the currently popular and powerful deep learning models bidirectional long short-term memory (BiLSTM) and bidirectional encoder representations from Transformers (BERT). Compared with existing state-of-the-art methods, Mul-SNO obtained better ACC of 0.911 and 0.796 based on 10-fold cross-validation and independent data sets, respectively. The prediction server can be obtained for free at http://lab.malab.cn/~mjq/Mul-SNO/.
|
22
|
Ebrahimie E, Zamansani F, Alanazi IO, Sabi EM, Khazandi M, Ebrahimi F, Mohammadi-Dehcheshmeh M, Ebrahimi M. Advances in understanding the specificity function of transporters by machine learning. Comput Biol Med 2021; 138:104893. [PMID: 34598069 DOI: 10.1016/j.compbiomed.2021.104893] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/25/2022]
Abstract
Understanding the underlying molecular mechanism of transporter activity is one of the major discussions in structural biology. A transporter can exclusively transport one ion (specific transporter) or multiple ions (general transporter). This study compared categorical and numerical features of general and specific calcium transporters using machine learning and attribute weighting models. To this end, 444 protein features, such as the frequency of dipeptides, organism, and subcellular location, were extracted for general (n = 103) and specific calcium transporters (n = 238). Aliphatic index, subcellular location, organism, Ile-Leu frequency, Glycine frequency, hydrophobic frequency, and specific dipeptides such as Ile-Leu, Phe-Val, and Tyr-Gln were the key features in differentiating general from specific calcium transporters. Calcium transporters in the cell outer membranes were specific, while the inner ones were general; additionally, when the hydrophobic frequency or Aliphatic index is increased, the calcium transporter acts as a general transporter. Random Forest with accuracy criterion showed the highest accuracy (88.88% ±5.75%) and high AUC (0.964 ± 0.020), based on 5-fold cross-validation. Decision Tree with accuracy criterion was able to predict the specificity of calcium transporter irrespective of the organism and subcellular location. This study demonstrates the precise classification of transporter function based on sequence-derived physicochemical features.
Affiliation(s)
- Esmaeil Ebrahimie
- Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia.
| | - Fatemeh Zamansani
- Department of Crop Production and Plant Breeding, College of Agriculture, Shiraz University, Shiraz, Iran.
| | - Ibrahim O Alanazi
- National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, 6086, Saudi Arabia.
| | - Essa M Sabi
- Department of Pathology, Clinical Biochemistry Unit, College of Medicine, King Saud University, Riyadh, 11461, Saudi Arabia.
| | - Manouchehr Khazandi
- UniSA Clinical and Health Sciences, The University of South Australia, Adelaide, 5000, Australia.
| | - Faezeh Ebrahimi
- Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran.
| | | | - Mansour Ebrahimi
- School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran.
| |
|
23
|
Wang Y, Xu L, Zou Q, Lin C. prPred-DRLF: Plant R protein predictor using deep representation learning features. Proteomics 2021; 22:e2100161. [PMID: 34569713 DOI: 10.1002/pmic.202100161] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2021] [Revised: 08/30/2021] [Accepted: 09/21/2021] [Indexed: 12/17/2022]
Abstract
Plant resistance (R) proteins play a significant role in the detection of pathogen invasion. Accurately predicting plant R proteins is a key task in phytopathology. Most plant R protein predictors are dependent on traditional feature extraction methods. Recently, deep representation learning methods have been successfully applied in solving protein classification problems. Motivated by this, we propose a new computational approach, called prPred-DRLF, which uses deep representation learning feature models to encode the amino acids as numerical vectors. The results show that the fused features of bidirectional long short-term memory (BiLSTM) embedding and unified representation (UniRep) embedding have a better performance than other features for plant R protein identification using a light gradient boosting machine (LGBM) classifier. The model was evaluated using an independent test achieving an accuracy of 0.956, F1-score of 0.933, and area under the receiver operating characteristic (ROC) curve (AUC) of 0.997. Meanwhile, compared with the state-of-the-art prPred and HMMER method, prPred-DRLF shows an overall improvement in accuracy, F1-score, AUC, and recall. prPred-DRLF is a higher-performance plant R protein prediction tool based on two kinds of deep representation learning technologies and offers a user-friendly interface for inspecting possible plant R proteins. We hope that prPred-DRLF will become a useful tool for biological research. A user-friendly webserver for prPred-DRLF is freely accessible at http://lab.malab.cn/soft/prPred-DRLF. The Python script can be downloaded from https://github.com/Wangys-prog/prPred-DRLF.
Affiliation(s)
- Yansu Wang
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Chen Lin
  - School of Informatics, Xiamen University, Xiamen, China
|
24
|
Huang L, Lin L, Fu X, Meng C. Development and validation of a novel survival model for acute myeloid leukemia based on autophagy-related genes. PeerJ 2021; 9:e11968. [PMID: 34447636 PMCID: PMC8364747 DOI: 10.7717/peerj.11968] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/19/2021] [Accepted: 07/23/2021] [Indexed: 12/21/2022] Open
Abstract
Background: Acute myeloid leukemia (AML) is one of the most common blood cancers and is characterized by impaired hematopoietic function and bone marrow (BM) failure. Under normal circumstances, autophagy may suppress tumorigenesis; under the stressful conditions of late-stage tumor growth, however, autophagy protects tumor cells, so inhibiting autophagy in these cases also inhibits tumor growth and promotes tumor cell death. Methods: AML gene expression profile data and corresponding clinical data were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, from which prognosis-related genes were screened to construct a risk score model through LASSO and univariate and multivariate Cox analyses. The model was then verified in the TCGA and GEO cohorts. We also analyzed the relationship between autophagy genes and immune infiltrating cells and therapeutic drugs. Results: We built a model containing 10 autophagy-related genes to predict the survival of AML patients by dividing them into high- and low-risk subgroups. The high-risk subgroup had a poorer prognosis in both the training TCGA-LAML cohort and the validation GSE37642 cohort. Univariate and multivariate Cox analyses revealed that the risk score of the autophagy model can be used as an independent prognostic factor. The high-risk subgroup had not only higher fractions of naïve CD4 T cells, activated NK cells, and resting mast cells but also higher expression of the immune checkpoint genes CTLA4 and CD274. Last, we screened drug sensitivity between the high- and low-risk subgroups. Conclusion: The risk score model based on 10 autophagy-related genes can serve as an effective prognostic predictor for AML patients and may guide patient stratification for immunotherapies and drugs.
Affiliation(s)
- Li Huang
  - Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Lier Lin
  - Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Xiangjun Fu
  - Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Can Meng
  - Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
|
25
|
Yun H, Choi J, Park JH. XGBoost Algorithm Prediction of Critical Care Outcome for Adult Patients Presenting to Emergency Department Using Initial Triage Information. JMIR Med Inform 2021; 9:e30770. [PMID: 34346889 PMCID: PMC8491120 DOI: 10.2196/30770] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/28/2021] [Revised: 06/27/2021] [Accepted: 07/27/2021] [Indexed: 12/23/2022] Open
Abstract
Background: Using the emergency department (ED) triage system to classify and prioritize patients from high risk to less urgent continues to be a challenge. Objective: This study, comprising 80,433 patients, aims to develop a machine learning prediction model of critical care outcomes for adult patients using information collected during ED triage and to compare its performance with that of a baseline model using the Korean Triage and Acuity Scale (KTAS). Methods: To predict the need for critical care, we used 13 predictors from triage information: age, gender, mode of ED arrival, the time interval between onset and ED arrival, reason for ED visit, chief complaints, systolic blood pressure, diastolic blood pressure, pulse rate, respiratory rate, body temperature, oxygen saturation, and level of consciousness. The baseline model with KTAS was developed using logistic regression, and the machine learning models with 13 variables were generated using extreme gradient boosting (XGB) and deep neural network (DNN) algorithms. Discrimination was measured by the area under the receiver operating characteristic curve (AUROC). Calibration was evaluated with the Hosmer-Lemeshow test and reclassification with the net reclassification index. The calibration plot and partial dependence plot were used in the analysis. Results: The AUROC of the models with the full set of variables (0.833-0.861) was better than that of the baseline model (0.796). The XGB model, with an AUROC of 0.861 (95% CI 0.848-0.874), showed higher discriminative performance than the DNN model, with an AUROC of 0.833 (95% CI 0.819-0.848). The XGB and DNN models showed better reclassification than the baseline model, with a positive net reclassification index. The XGB model was well calibrated (Hosmer-Lemeshow test; P>.05); however, the DNN showed poor calibration (Hosmer-Lemeshow test; P<.001). We further interpreted the nonlinear association between variables and critical care prediction. Conclusions: Our study demonstrated that the XGB model using initial information at ED triage outperformed the conventional model with KTAS in predicting patients in need of critical care.
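The discrimination measure reported in this abstract (AUROC) can be illustrated with a small, dependency-free sketch. It relies on the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half). The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    Quadratic in the number of cases, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties contribute one half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 of the 4 positive/negative pairs
# are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.796, 0.833, and 0.861 are compared.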
Affiliation(s)
- Hyoungju Yun
  - Interdisciplinary Program of Medical Informatics, College of Medicine, Seoul National University, Seoul, KR
- Jinwook Choi
  - Interdisciplinary Program of Medical Informatics, College of Medicine, Seoul National University, Seoul, KR; Department of Biomedical Engineering, College of Medicine, Seoul National University, Seoul, KR; Institute of Medical and Biological Engineering, Seoul National University Medical Research Center, 103 Daehak-Ro, Jongno-Gu, Seoul, KR
- Jeong Ho Park
  - Department of Emergency Medicine, College of Medicine, Seoul National University, Seoul, KR; Laboratory of Emergency Medical Services, Seoul National University Hospital Biomedical Research Institute, Seoul, KR
|
26
|
Analysis of DNA Sequence Classification Using CNN and Hybrid Models. Comput Math Methods Med 2021; 2021:1835056. [PMID: 34306171 PMCID: PMC8285202 DOI: 10.1155/2021/1835056] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/24/2021] [Accepted: 06/25/2021] [Indexed: 12/23/2022]
Abstract
In the general computational context of biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19, and they also help in detecting the effects of viruses and in drug design. Nevertheless, feature selection remains the most challenging aspect of the problem: the most commonly used representations worsen the problem of high dimensionality, and sequences lack explicit features. Deep learning (DL) models can now extract features automatically from the input. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using label and k-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. In the experimental results, the CNN and CNN-Bidirectional LSTM with k-mer encoding offer high accuracies of 93.16% and 93.13%, respectively, on testing data.
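The two sequence encodings named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the value of k, the integer mapping, or the handling of ambiguous bases, so `k=3`, the `ACGT` ordering, and the use of 0 for unknown symbols are choices made here for illustration only.

```python
from collections import Counter


def label_encode(seq, alphabet="ACGT"):
    """Map each base to an integer (A=1, C=2, G=3, T=4); unknown bases -> 0.

    The mapping is an assumption for illustration, not the paper's scheme.
    """
    index = {base: i + 1 for i, base in enumerate(alphabet)}
    return [index.get(base, 0) for base in seq.upper()]


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers ("words") with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq, k=3):
    """Bag-of-k-mers representation: how often each k-mer occurs."""
    return Counter(kmers(seq, k))


print(label_encode("ACGTN"))  # [1, 2, 3, 4, 0]
print(kmers("ATGCA", 3))      # ['ATG', 'TGC', 'GCA']
```

Either representation can then be fed to an embedding or convolutional layer; the k-mer form turns a sequence into a sentence of short words, which is what lets text-style models be applied to DNA.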
|
27
|
Tan X, Wu X, Han M, Wang L, Xu L, Li B, Yuan Y. Yeast autonomously replicating sequence (ARS): Identification, function, and modification. Eng Life Sci 2021. [DOI: 10.1002/elsc.202000085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
Affiliation(s)
- Xiao-Yu Tan
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Xiao-Le Wu
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Ming-Zhe Han
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Li Wang
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Li Xu
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Bing-Zhi Li
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
- Ying-Jin Yuan
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, P. R. China
  - Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, P. R. China
|
28
|
Le NQK, Do DT, Nguyen TTD, Le QA. A sequence-based prediction of Kruppel-like factors proteins using XGBoost and optimized features. Gene 2021; 787:145643. [PMID: 33848577 DOI: 10.1016/j.gene.2021.145643] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/09/2021] [Accepted: 04/07/2021] [Indexed: 10/21/2022]
Abstract
Krüppel-like factors (KLF) are a group of conserved zinc finger-containing transcription factors involved in various physiological and biological processes, including cell proliferation, differentiation, development, and apoptosis. Bioinformatics methods such as sequence similarity searches, multiple sequence alignment, phylogenetic reconstruction, and gene synteny analysis have been proposed to broaden our knowledge of KLF proteins. In this study, we propose a novel computational approach that applies machine learning to features calculated from primary sequences. In detail, our XGBoost-based model is efficient in identifying KLF proteins, with an accuracy of 96.4% and an MCC of 0.704. It also shows promising performance when tested on an independent dataset. Therefore, our model could serve as a useful tool to identify new KLF proteins and provide necessary information for biologists and researchers working on KLF proteins. Our machine learning source code and datasets are freely available at https://github.com/khanhlee/KLF-XGB.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Duyen Thi Do
  - Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 106, Taiwan
- Quynh Anh Le
  - Faculty of Applied Sciences, Ton Duc Thang University, No. 19 Nguyen Huu Tho Street, Tan Hung Ward, District 7, Ho Chi Minh City, Viet Nam
|
29
|
Yao Y, Zhang S, Liang Y. iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning. SAR QSAR Environ Res 2021; 32:317-331. [PMID: 33730950 DOI: 10.1080/1062936x.2021.1895884] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/21/2020] [Accepted: 02/23/2021] [Indexed: 06/12/2023]
Abstract
DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and in the regulation of gene expression. Hence, accurate identification of origin of replication sites (ORIs) is of great importance for further understanding the regulatory mechanism of gene expression and for treating genetic diseases. In this paper, a novel, feasible and powerful model, iORI-ENST, is designed for identifying ORIs. First, we extract features by incorporating mono-nucleotide binary encoding and dinucleotide-based spatial autocorrelation. Elastic net is then used as the feature selection method to select the optimal feature set. Stacking learning, which combines random forest, AdaBoost, gradient boosting decision tree, extra trees and support vector machine, is then employed to distinguish ORIs from non-ORIs. Finally, ORI sites are identified on the benchmark datasets S1 and S2 with accuracies of 91.41% and 95.07%, respectively. An independent dataset S3 is also employed to verify the validity and transferability of our model, on which its accuracy reaches 91.10%. Compared with state-of-the-art methods, our model achieves better performance. The results show that our model is a feasible, effective and powerful tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
Affiliation(s)
- Y Yao
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- S Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Y Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
|
30
|
Le NQK, Hung TNK, Do DT, Lam LHT, Dang LH, Huynh TT. Radiomics-based machine learning model for efficiently classifying transcriptome subtypes in glioblastoma patients from MRI. Comput Biol Med 2021; 132:104320. [PMID: 33735760 DOI: 10.1016/j.compbiomed.2021.104320] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 12/10/2020] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND: In the field of glioma, transcriptome subtypes have been considered an important diagnostic and prognostic biomarker that may help improve treatment efficacy. However, existing methods for identifying transcriptome subtypes are limited by the relatively long detection period, the unattainability of tumor specimens via biopsy or surgery, and the fleeting nature of intralesional heterogeneity. In search of a model superior to previous ones, this study evaluated the efficiency of an eXtreme Gradient Boosting (XGBoost)-based radiomics model for classifying transcriptome subtypes in glioblastoma patients. METHODS: This retrospective study retrieved patients with pathologically diagnosed glioblastoma from the TCGA-GBM and IvyGAP cohorts and separated them into transcriptome subtype groups. The MRI scans of GBM patients were then segmented into three regions: enhancement of the tumor core (ET), non-enhancing portion of the tumor core (NET), and peritumoral edema (ED). We subsequently used handcrafted radiomics features (n = 704) from multimodality MRI and two-level feature selection (Spearman correlation and F-score tests) to find the relevant features. RESULTS: After feature selection, we identified the 13 most meaningful radiomics features, which gave the optimal results. With these features, our XGBoost model reached predictive accuracies of 70.9%, 73.3%, 88.4%, and 88.4% for the classical, mesenchymal, neural, and proneural subtypes, respectively. Our model performed better than the other models as well as previous works on the same dataset. CONCLUSION: The combination of XGBoost and two-level feature selection (Spearman correlation and F-score) is a promising approach for classifying transcriptome subtypes with high performance and may encourage further research on radiomics-based GBM models.
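The first of the two selection levels mentioned in this abstract, a Spearman correlation test, can be sketched in plain Python. The abstract does not say how the correlation was applied, so this sketch assumes one common use: dropping a feature when it is strongly rank-correlated with a feature that has already been kept. The function names, the 0.9 threshold, and the toy feature table are hypothetical; the F-score level is not shown.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |rho| with every kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical radiomics-style table: "b" is a monotone function of "a".
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 1, 4, 2, 3],
}
print(drop_redundant(features))  # ['a', 'c']
```

Because Spearman's rho works on ranks, it flags any monotone relationship between two features, not only a linear one, which is why "b" is removed here even though it is not an exact multiple of "a".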
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 106, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan
- Truong Nguyen Khanh Hung
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Orthopedic and Trauma Department, Cho Ray Hospital, Ho Chi Minh City, 70000, Viet Nam
- Duyen Thi Do
  - Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, 106, Taiwan
- Luu Ho Thanh Lam
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Children's Hospital 2, Ho Chi Minh City, 70000, Viet Nam
- Luong Huu Dang
  - Department of Otolaryngology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 70000, Viet Nam
- Tuan-Tu Huynh
  - Department of Electrical Engineering, Yuan Ze University, No. 135, Yuandong Road, Zhongli, 320, Taoyuan, Taiwan; Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, No. 10, Huynh Van Nghe Road, Bien Hoa, Dong Nai, 76120, Viet Nam
|
31
|
Wu F, Yang R, Zhang C, Zhang L. A deep learning framework combined with word embedding to identify DNA replication origins. Sci Rep 2021; 11:844. [PMID: 33436981 PMCID: PMC7804333 DOI: 10.1038/s41598-020-80670-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/23/2020] [Accepted: 12/24/2020] [Indexed: 01/29/2023] Open
Abstract
DNA replication influences the inheritance of genetic information in the DNA life cycle. As the distribution of replication origins (ORIs) is the major determinant of precise regulation of the replication process, correct identification of ORIs is important for understanding DNA replication mechanisms and the regulatory mechanisms of gene expression. In eukaryotes in particular, multiple ORIs exist in each gene sequence so that replication completes in a reasonable period of time. To simplify the identification of eukaryotic ORIs, most existing methods rely on traditional machine learning algorithms and target gene sequences of a fixed length. Consequently, the identification results are not satisfactory and leave considerable room for improvement. To overcome the limitations of previous studies, this paper develops sequence segmentation methods and employs the word embedding technique Word2vec to convert gene sequences into word vectors, thereby capturing the inner correlations of gene sequences of different lengths. A deep learning framework for ORI identification is then constructed from a convolutional neural network with an embedding layer. Analysis of the similarity dimensionality-reduction diagram shows that Word2vec can effectively transform the inner relationships among words into numerical features. For the four species in this study, the best models are obtained with overall accuracies of 0.975, 0.765, 0.885, and 0.967, Matthews correlation coefficients of 0.940, 0.530, 0.771, and 0.934, and AUCs of 0.975, 0.800, 0.888, and 0.981, which indicate that the proposed predictor is stable and classifies both ORIs and non-ORIs with high confidence. Compared with state-of-the-art methods, the proposed predictor achieves ORI identification with significant improvement. It is therefore reasonable to anticipate that the proposed method will be a useful high-throughput tool for genome analysis.
Affiliation(s)
- Feng Wu
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai, 264200, China
- Runtao Yang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai, 264200, China
- Chengjin Zhang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai, 264200, China
- Lina Zhang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai, 264200, China
|
32
|
Le NQK, Do DT, Hung TNK, Lam LHT, Huynh TT, Nguyen NTK. A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification. Int J Mol Sci 2020; 21:E9070. [PMID: 33260643 PMCID: PMC7730808 DOI: 10.3390/ijms21239070] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/04/2020] [Revised: 11/25/2020] [Accepted: 11/26/2020] [Indexed: 01/13/2023] Open
Abstract
Essential genes contain key information of genomes that could be central to a comprehensive understanding of life and evolution. Because of their importance, the study of essential genes has been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular as a way to reduce the cost and time of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high-dimensional features and the use of traditional machine learning algorithms. Thus, there is a need for a novel model that improves predictive performance on this problem from DNA sequence features. This study took advantage of a natural language processing (NLP) model to learn biological sequences by treating them as natural language words. To learn from the NLP features, a supervised learning model was subsequently built using an ensemble deep neural network. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The overall performance exceeded that of the single models without ensembling, as well as that of the state-of-the-art predictors on the same benchmark dataset. This indicates the effectiveness of the proposed method in determining essential genes in particular and in other sequencing problems in general.
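The threshold-based metrics reported in this abstract (sensitivity, specificity, accuracy, and MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Ratios with an empty denominator are reported as 0.0 by convention.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate
        "specificity": ratio(tn, tn + fp),          # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),   # ranges from -1 to 1
    }


# Hypothetical counts: 100 positives (60 found) and 100 negatives (85 rejected).
for name, value in binary_metrics(tp=60, fp=15, tn=85, fn=40).items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells, so it stays informative when classes are imbalanced. That is why a predictor can combine a fairly high accuracy with a moderate MCC, as in the figures quoted above.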
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Duyen Thi Do
  - Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 106, Taiwan
- Truong Nguyen Khanh Hung
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Department of Orthopedic and Trauma, Cho Ray Hospital, Ho Chi Minh 70000, Vietnam
- Luu Ho Thanh Lam
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Intensive Care Unit, Children’s Hospital 2, Ho Chi Minh 70000, Vietnam
- Tuan-Tu Huynh
  - Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan
  - Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, Dong Nai 76120, Vietnam
- Ngan Thi Kim Nguyen
  - School of Nutrition and Health Sciences, Taipei Medical University, Taipei 110, Taiwan
|
33
|
Manavalan B, Basith S, Shin TH, Lee G. Computational prediction of species-specific yeast DNA replication origin via iterative feature representation. Brief Bioinform 2020; 22:6000361. [PMID: 33232970 PMCID: PMC8294535 DOI: 10.1093/bib/bbaa304] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/04/2020] [Revised: 10/08/2020] [Accepted: 10/09/2020] [Indexed: 12/13/2022] Open
Abstract
Deoxyribonucleic acid replication is one of the most crucial tasks taking place in the cell, and it has to be precisely regulated. This process is initiated at the replication origins (ORIs), so identifying such sites is essential for a deeper understanding of the cellular processes and functions related to the regulation of gene expression. Given the important tasks performed by ORIs, several experimental and computational approaches have been developed to predict such sites. However, existing computational predictors for ORIs have certain limitations, such as building only single-feature encoding models, limited systematic feature engineering, and failure to validate model robustness. Hence, we developed a novel species-specific yeast predictor called yORIpred that accurately identifies ORIs in yeast genomes. To develop yORIpred, we first constructed 40 optimal baseline models by exploring eight different sequence-based encodings and five different machine learning classifiers. Subsequently, the predicted probabilities of the 40 models were treated as a novel feature vector, and an iterative feature learning approach was carried out independently with five different classifiers. Our systematic analysis revealed that the feature representation learned by the support vector machine algorithm (yORIpred) could discriminate the distribution characteristics of ORIs and non-ORIs better than the other four algorithms. Comprehensive benchmarking experiments showed that yORIpred achieved superior and stable performance compared with the existing predictors on the same training datasets. Furthermore, independent evaluation showed that yORIpred gave the best and most accurate performance, underscoring the significance of iterative feature representation. To help users obtain their desired results without mathematical, statistical or computational hassle, we developed a web server for the yORIpred predictor, which is available at: http://thegleelab.org/yORIpred.
Affiliation(s)
- Shaherin Basith
  - Department of Physiology, Ajou University School of Medicine, Republic of Korea
- Tae Hwan Shin
  - Department of Physiology, Ajou University School of Medicine, Republic of Korea
- Gwang Lee
  - Department of Physiology, Ajou University School of Medicine, Republic of Korea
|
34
|
Sadak F, Saadat M, Hajiyavand AM. Real-time deep learning-based image recognition for applications in automated positioning and injection of biological cells. Comput Biol Med 2020; 125:103976. [DOI: 10.1016/j.compbiomed.2020.103976] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/09/2020] [Revised: 08/13/2020] [Accepted: 08/14/2020] [Indexed: 11/29/2022]
|