1
Liao Y, Yang G, Pan W, Lu Y. OA-HybridCNN (OHC): An advanced deep learning fusion model for enhanced diagnostic accuracy in knee osteoarthritis imaging. PLoS One 2025; 20:e0322540. [PMID: 40334259 PMCID: PMC12058133 DOI: 10.1371/journal.pone.0322540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Accepted: 03/24/2025] [Indexed: 05/09/2025] Open
Abstract
Knee osteoarthritis (KOA) is a leading cause of disability globally. Early and accurate diagnosis is paramount in preventing its progression and improving patients' quality of life. However, the inconsistency in radiologists' expertise and the onset of visual fatigue during prolonged image analysis often compromise diagnostic accuracy, highlighting the need for automated diagnostic solutions. In this study, we present an advanced deep learning model, OA-HybridCNN (OHC), which integrates ResNet and DenseNet architectures. This integration effectively addresses the gradient vanishing issue in DenseNet and augments prediction accuracy. To evaluate its performance, we conducted a thorough comparison with other deep learning models using five-fold cross-validation and external tests. The OHC model outperformed its counterparts across all performance metrics. In external testing, OHC exhibited an accuracy of 91.77%, precision of 92.34%, and recall of 91.36%. During the five-fold cross-validation, its average AUC and ACC were 86.34% and 87.42%, respectively. Deep learning, particularly exemplified by the OHC model, has greatly improved the efficiency and accuracy of KOA imaging diagnosis. The adoption of such technologies not only alleviates the burden on radiologists but also significantly enhances diagnostic precision.
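The five-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions chosen for illustration, and the model training step is left out.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, deals them
    into k folds, and uses each fold exactly once as the held-out set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Usage: every sample lands in exactly one test fold.
splits = list(k_fold_indices(10, k=5))
```

A per-fold metric such as AUC or accuracy would then be computed on each `test` split and averaged, which is how a figure like "average AUC" across folds is usually obtained.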
Affiliation(s)
- Yihan Liao
  - Department of Anesthesiology, Xuzhou Medical University, Xuzhou, China
- Guang Yang
  - Department of Neurology, Kunshan Hospital of Traditional Chinese Medicine, Kunshan, China
- Wenjin Pan
  - Department of Anesthesiology, Xuzhou Medical University, Xuzhou, China
- Yun Lu
  - Department of Min’s Wound, Kunshan Hospital of Traditional Chinese Medicine, Kunshan, China
  - School of Nursing, Yangzhou University, Yangzhou, China
2
Wang W, Zhang Y, Zhai Y, Yang W, Xing Y. Alternative splicing dynamics during gastrulation in mouse embryo. Sci Rep 2025; 15:10948. [PMID: 40159515 PMCID: PMC11955514 DOI: 10.1038/s41598-025-96148-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2025] [Accepted: 03/26/2025] [Indexed: 04/02/2025] Open
Abstract
Alternative splicing (AS) plays an essential role in development, differentiation and carcinogenesis. However, the mechanisms underlying splicing regulation during mouse embryo gastrulation remain unclear. Based on spatial-temporal transcriptome and epigenome data, we detected the dynamics of AS and revealed its regulatory mechanisms across primary germ layers during mouse gastrulation, spanning developmental stages from E6.5 to E7.5. Subsequently, the dynamic expression of splicing factors (SFs) during gastrulation was characterized, while the expression patterns and functions of germ layer-specific SFs were identified. The results indicate that AS and differential alternative splicing events (DASEs) exhibit dynamic changes and are significantly abundant during the late stage of gastrulation. Similarly, SFs demonstrate stage-specific expression, with elevated levels observed during the middle and late stages of gastrulation. Epigenetic signals associated with SFs and AS sites demonstrate significant enrichment and undergo dynamic changes throughout gastrulation. Overall, this study offers a systematic analysis of AS during mouse gastrulation, identifies primary germ layer-specific AS events, and characterizes the expression patterns of SFs and the associated epigenetic signals. These findings enhance the understanding of the mechanisms underlying the formation of the three germ layers during mammalian gastrulation, with a focus on pre-mRNA AS.
Affiliation(s)
- Wei Wang
  - Inner Mongolia Key Laboratory of Life Health and Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Yu Zhang
  - Inner Mongolia Key Laboratory of Life Health and Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Yuanyuan Zhai
  - Inner Mongolia Key Laboratory of Life Health and Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Wuritu Yang
  - Computer Department, Hohhot Vocational College, Hohhot, China
- Yongqiang Xing
  - Inner Mongolia Key Laboratory of Life Health and Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
3
Ye DX, Yu JW, Li R, Hao YD, Wang TY, Yang H, Ding H. The Prediction of Recombination Hotspot Based on Automated Machine Learning. J Mol Biol 2025; 437:168653. [PMID: 38871176 DOI: 10.1016/j.jmb.2024.168653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Revised: 05/12/2024] [Accepted: 06/06/2024] [Indexed: 06/15/2024]
Abstract
Meiotic recombination plays a pivotal role in genetic evolution. Genetic variation induced by recombination is a crucial factor in generating biodiversity and a driving force for evolution. At present, the development of recombination hotspot prediction methods faces challenges related to insufficient feature extraction and limited generalization capabilities. This paper focuses on recombination hotspot prediction methods. We explored deep learning-based recombination hotspot prediction and scrutinized the shortcomings of prevalent models in addressing this task. To address these deficiencies, an automated machine learning approach was used to construct a recombination hotspot prediction model. The model combined sequence information with physicochemical properties by employing TF-IDF-Kmer and DNA composition components to acquire more effective feature data. Experimental results validate the effectiveness of the feature extraction method and automated machine learning technology used in this study. The final model was validated on three distinct datasets and yielded accuracy rates of 97.14%, 79.71%, and 98.73%, surpassing the current leading models by 2%, 2.56%, and 4%, respectively. In addition, we incorporated tools such as SHAP and AutoGluon to analyze the interpretability of black-box models, examined the impact of individual features on the results, and investigated the reasons behind the misclassification of samples. Finally, an application for recombination hotspot prediction was established to give researchers easy access to the necessary information and tools. The research outcomes of this paper underscore the enormous potential of automated machine learning methods in gene sequence prediction.
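The idea of TF-IDF weighted k-mer features named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact TF-IDF-Kmer formulation, which the abstract does not specify: the function names, the choice of overlapping k-mers, relative-frequency TF, and the smoothed IDF `log((1 + n) / (1 + df)) + 1` are all assumptions.

```python
import math
from collections import Counter

def kmers(seq, k):
    """Return the overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def tfidf_kmer(seqs, k=3):
    """Hypothetical TF-IDF featurizer that treats each k-mer as a 'term'.

    TF is the relative frequency of a k-mer within one sequence.
    IDF is the smoothed form log((1 + n) / (1 + df)) + 1, where n is the
    number of sequences and df the number of sequences containing the k-mer.
    Returns the sorted k-mer vocabulary and one feature row per sequence.
    """
    docs = [Counter(kmers(s.upper(), k)) for s in seqs]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        total = sum(doc.values())
        rows.append([doc[t] / total * idf[t] if total else 0.0 for t in vocab])
    return vocab, rows

# Usage on two toy sequences (illustrative data only).
vocab, rows = tfidf_kmer(["ACGT", "ACGA"], k=2)
```

K-mers shared by every sequence receive the lowest weight, so the features emphasize k-mers that discriminate between sequences.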
Affiliation(s)
- Dong-Xin Ye
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jun-Wen Yu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Rui Li
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yu-Duo Hao
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Tian-Yu Wang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Hui Ding
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
4
Wang H, Yan S, Wang W, Chen Y, Hong J, He Q, Diao X, Lin Y, Chen Y, Cao Y, Guo W, Fang W. Cropformer: An interpretable deep learning framework for crop genomic prediction. Plant Communications 2025; 6:101223. [PMID: 39690739 PMCID: PMC11956090 DOI: 10.1016/j.xplc.2024.101223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/15/2024] [Accepted: 12/12/2024] [Indexed: 12/19/2024]
Abstract
Machine learning and deep learning are extensively employed in genomic selection (GS) to expedite the identification of superior genotypes and accelerate breeding cycles. However, a significant challenge with current data-driven deep learning models in GS lies in their low robustness and poor interpretability. To address these challenges, we developed Cropformer, a deep learning framework for predicting crop phenotypes and exploring downstream tasks. This framework combines convolutional neural networks with multiple self-attention mechanisms to improve accuracy. The ability of Cropformer to predict complex phenotypic traits was extensively evaluated on more than 20 traits across five major crops: maize, rice, wheat, foxtail millet, and tomato. Evaluation results show that Cropformer outperforms other GS methods in both precision and robustness, achieving up to a 7.5% improvement in prediction accuracy compared to the runner-up model. Additionally, Cropformer enhances the analysis and mining of genes associated with traits. We identified numerous single nucleotide polymorphisms (SNPs) with potential effects on maize phenotypic traits and revealed key genetic variations underlying these differences. Cropformer represents a significant advancement in predictive performance and gene identification, providing a powerful general tool for improving genomic design in crop breeding. Cropformer is freely accessible at https://cgris.net/cropformer.
Affiliation(s)
- Hao Wang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wenxi Wang
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yongming Chen
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
  - State Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Shandong 261325, China
- Jingpeng Hong
  - College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
- Qiang He
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xianmin Diao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yunan Lin
  - School of Engineering and Design, Technical University Munich, 85521 Munich, Germany
- Yanqing Chen
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yongsheng Cao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Weilong Guo
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Wei Fang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
5
Ke S, Huang Y, Wang D, Jiang Q, Luo Z, Li B, Yan D, Zhou J. BreCML: identifying breast cancer cell state in scRNA-seq via machine learning. Front Med (Lausanne) 2024; 11:1482726. [PMID: 39574916 PMCID: PMC11579858 DOI: 10.3389/fmed.2024.1482726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Accepted: 10/15/2024] [Indexed: 11/24/2024] Open
Abstract
Breast cancer is a prevalent malignancy and one of the leading causes of cancer-related mortality among women worldwide. This disease typically manifests through the abnormal proliferation and dissemination of malignant cells within breast tissue. Current diagnostic and therapeutic strategies face significant challenges in accurately identifying and localizing specific subtypes of breast cancer. In this study, we developed a novel machine learning-based predictor, BreCML, designed to accurately classify subpopulations of breast cancer cells and their associated marker genes. BreCML exhibits outstanding predictive performance, achieving an accuracy of 98.92% on the training dataset. Utilizing the XGBoost algorithm, BreCML demonstrates superior accuracy (98.67%), precision (99.15%), recall (99.49%), and F1-score (99.79%) on the test dataset. Through the application of machine learning and feature selection techniques, BreCML successfully identified new key genes. This predictor not only serves as a powerful tool for assessing breast cancer cellular status but also offers a rapid and efficient means to uncover potential biomarkers, providing critical insights for precision medicine and therapeutic strategies.
Affiliation(s)
- Shanbao Ke
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, China
- Dong Wang
  - Pudong Institute for Health Development, Shanghai, China
- Qiang Jiang
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Zhangyang Luo
  - Pudong Institute for Health Development, Shanghai, China
- Baiyu Li
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Danfang Yan
  - Department of Radiation Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Jianwei Zhou
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
6
Wang Z, Gu Y, Huang L, Liu S, Chen Q, Yang Y, Hong G, Ning W. Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data. Cardiovasc Diabetol 2024; 23:351. [PMID: 39342281 PMCID: PMC11439295 DOI: 10.1186/s12933-024-02439-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 09/11/2024] [Indexed: 10/01/2024] Open
Abstract
BACKGROUND Cardiovascular disease, also known as circulation system disease, remains the leading cause of morbidity and mortality worldwide. Traditional methods for diagnosing cardiovascular disease are often expensive and time-consuming. The purpose of this study was therefore to construct machine learning models for the diagnosis of cardiovascular diseases using easily accessible blood routine and biochemical detection data, and to explore the unique hematologic features of cardiovascular diseases, including some metabolic indicators.
METHODS After data preprocessing, blood routine and biochemical detection data from 25,794 healthy people and 32,822 circulation system disease patients were used in our study. We selected logistic regression, random forest, support vector machine, eXtreme Gradient Boosting (XGBoost), and deep neural network to construct models. Finally, the SHAP algorithm was used to interpret the models.
RESULTS The circulation system disease prediction model constructed with XGBoost performed best (AUC: 0.9921 (0.9911-0.9930); Acc: 0.9618 (0.9588-0.9645); Sn: 0.9690 (0.9655-0.9723); Sp: 0.9526 (0.9477-0.9572); PPV: 0.9631 (0.9592-0.9668); NPV: 0.9600 (0.9556-0.9644); MCC: 0.9224 (0.9165-0.9279); F1 score: 0.9661 (0.9634-0.9686)). Most models for distinguishing among circulation system diseases also performed well; the model distinguishing dilated cardiomyopathy from other circulation system diseases performed best (AUC: 0.9267 (0.8663-0.9752)). Model interpretation with the SHAP algorithm indicated that features from biochemical detection, such as potassium (K), total protein (TP), albumin (ALB), and indirect bilirubin (NBIL), made major contributions to predicting circulation system disease. For the models distinguishing among circulation system diseases, however, red blood cell count (RBC), K, direct bilirubin (DBIL), and glucose (GLU) were the top four features.
CONCLUSIONS The present study constructed multiple models using 50 features from the blood routine and biochemical detection data for the diagnosis of various circulation system diseases. At the same time, the unique hematologic features of various circulation system diseases, including some metabolic-related indicators, were also explored. This cost-effective work will benefit more people and help diagnose and prevent circulation system diseases.
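The threshold-based metrics reported in this abstract (Acc, Sn, Sp, PPV, NPV, MCC, F1) all derive from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    Hypothetical helper using the standard definitions. A ratio whose
    denominator is zero is returned as 0.0 to keep the sketch simple.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sn = ratio(tp, tp + fn)    # sensitivity (recall)
    sp = ratio(tn, tn + fp)    # specificity
    ppv = ratio(tp, tp + fp)   # positive predictive value (precision)
    npv = ratio(tn, tn + fn)   # negative predictive value
    acc = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"Acc": acc, "Sn": sn, "Sp": sp, "PPV": ppv,
            "NPV": npv, "F1": f1, "MCC": mcc}

# Usage with made-up counts (not data from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.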
Affiliation(s)
- Zhicheng Wang
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
  - Department of Laboratory Medicine, Xiamen Key Laboratory of Genetic Testing, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
  - Department of Otolaryngology, School of Medicine, Xiamen University, Xiamen, 361003, Fujian, China
- Ying Gu
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Lindan Huang
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
  - Department of Laboratory Medicine, Xiamen Key Laboratory of Genetic Testing, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Shuai Liu
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Qun Chen
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Yunyun Yang
  - Department of Laboratory Medicine, Xiamen Key Laboratory of Genetic Testing, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Guolin Hong
  - Department of Laboratory Medicine, Xiamen Key Laboratory of Genetic Testing, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
- Wanshan Ning
  - Institute for Clinical Medical Research, School of Medicine, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, 361003, Fujian, China
7
Hong Y, Li H, Long C, Liang P, Zhou J, Zuo Y. An increment of diversity method for cell state trajectory inference of time-series scRNA-seq data. Fundamental Research 2024; 4:770-776. [PMID: 39156571 PMCID: PMC11330101 DOI: 10.1016/j.fmre.2024.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 08/29/2023] [Accepted: 01/03/2024] [Indexed: 08/20/2024] Open
Abstract
With the increasing emergence of time-series single-cell RNA sequencing (scRNA-seq) data, inferring developmental trajectories by connecting transcriptomically similar cell states (i.e., cell types or clusters) has become a major challenge. Most existing computational methods are designed for individual cells and do not take into account the available time-series information. We present IDTI, a method based on the Increment of Diversity for Trajectory Inference, which combines time-series information with the minimum increment of diversity method to infer the cell state trajectory of time-series scRNA-seq data. We apply IDTI to simulated datasets and three real datasets of diverse tissue development, and compare it with six other commonly used trajectory inference methods in terms of topology similarity and branching accuracy. The results show that IDTI accurately constructs the cell state trajectory without requiring starting cells. In the performance test, we further demonstrate that IDTI has the advantages of high accuracy and strong robustness.
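The minimum increment of diversity idea can be illustrated with a short sketch. It assumes the commonly used definitions D(X) = N log N - Σ n_i log n_i for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y) for the increment of diversity; the abstract does not state the formula or how IDTI incorporates time points, so the function names, the `closest_state` helper, and the toy counts are assumptions.

```python
import math

def diversity(counts):
    """D(X) = N*log(N) - sum(n_i*log(n_i)), taking 0*log(0) as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two equal-length count vectors.

    The value is close to zero when the two vectors have the same
    composition and grows as their compositions diverge.
    """
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def closest_state(query, candidates):
    """Hypothetical helper: name of the candidate state with minimum ID."""
    return min(candidates, key=lambda name: increment_of_diversity(query, candidates[name]))

# Usage with toy expression-count profiles (illustrative only).
nearest = closest_state([5, 1, 0], {"a": [10, 2, 0], "b": [0, 1, 9]})
```

Linking each cell state to the state with the smallest increment of diversity at an adjacent time point is one plausible way such a measure could connect states into a trajectory.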
Affiliation(s)
- Chunshen Long
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010020, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010020, China
- Jian Zhou
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010020, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot 010020, China
8
Liu L, Huang Y, Zheng Y, Liao Y, Ma S, Wang Q. ScnML models single-cell transcriptome to predict spinal cord neuronal cell status. Front Genet 2024; 15:1413484. [PMID: 38894722 PMCID: PMC11183327 DOI: 10.3389/fgene.2024.1413484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open
Abstract
Injuries to the spinal cord nervous system often result in permanent loss of sensory, motor, and autonomic functions. Accurately identifying the cellular state of spinal cord nerves is extremely important and could facilitate the development of new therapeutic and rehabilitative strategies. Existing experimental techniques for identifying the development of spinal cord nerves are both labor-intensive and costly. In this study, we developed a machine learning predictor, ScnML, for predicting subpopulations of spinal cord nerve cells and identifying marker genes. On the training dataset, ScnML achieved an accuracy of 94.33%. On the test dataset, the XGBoost-based ScnML achieved an accuracy of 94.08%, with precision, recall, and F1-measure scores of 94.24%, 94.26%, and 94.24%, respectively. Importantly, ScnML identified new significant genes through model interpretation and biological landscape analysis. ScnML can be a powerful tool for predicting the status of spinal cord neuronal cells, revealing potential specific biomarkers quickly and efficiently, and providing crucial insights for precision medicine and rehabilitation.
Affiliation(s)
- Lijia Liu
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, Jiangsu, China
- Yuan Zheng
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Yihan Liao
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Siyuan Ma
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Qian Wang
  - Department of Neurology, The First Hospital of Tsinghua University, Beijing, China
9
Tong K, Chen X, Yan S, Dai L, Liao Y, Li Z, Wang T. PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics. Genes (Basel) 2024; 15:603. [PMID: 38790232 PMCID: PMC11120712 DOI: 10.3390/genes15050603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/05/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024] Open
Abstract
As a fundamental global staple crop, rice plays a pivotal role in human nutrition and agricultural production systems. However, its complex genetic architecture and extensive trait variability pose challenges for breeders and researchers in optimizing yield and quality. In particular, for breeding methods such as genomic selection, isolating core SNPs related to target traits from genome-wide data reduces noise from irrelevant mutations and enhances computational precision and efficiency. Thus, exploring efficient computational approaches to mine core SNPs is of great importance. This study introduces PlantMine, an innovative computational framework that integrates feature selection and machine learning techniques to effectively identify core SNPs critical for the improvement of rice traits. Using the dataset from the 3000 Rice Genomes Project, we applied different algorithms for analysis. The findings underscore the effectiveness of combining feature selection with machine learning in accurately identifying core SNPs, offering a promising avenue to expedite rice breeding efforts and improve crop productivity and resilience to stress.
Affiliation(s)
- Kai Tong
  - School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Xiaojing Chen
  - National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  - National Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya 572024, China
- Shen Yan
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Liangli Dai
  - School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Yuxue Liao
  - School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Zhaoling Li
  - School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Ting Wang
  - Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  - Key Laboratory of Big Agri-Data, Ministry of Agriculture and Rural Areas, Beijing 100081, China
10
Jin W, Jia J, Si Y, Liu J, Li H, Zhu H, Wu Z, Zuo Y, Yu L. Identification of Key lncRNAs Associated with Immune Infiltration and Prognosis in Gastric Cancer. Biochem Genet 2024:10.1007/s10528-024-10801-w. [PMID: 38658494 DOI: 10.1007/s10528-024-10801-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
Long non-coding RNAs (lncRNAs), as promising novel biomarkers for cancer treatment and prognosis, can function as tumor suppressors and oncogenes in the occurrence and development of many types of cancer, including gastric cancer (GC). However, little is known about the complex regulatory system of lncRNAs in GC. In this study, we systematically analyzed lncRNA and miRNA transcriptomic profiles of GC using bioinformatics methods and experimental validation. An lncRNA-miRNA interaction network related to GC was constructed, and nine crucial lncRNAs were identified. These nine lncRNAs were found to be associated with the prognosis of GC patients by Cox proportional hazards regression analysis. Among them, the expression of lncRNA SNHG14 can affect the survival of GC patients, making it a potential prognostic marker. Moreover, SNHG14 was shown to be involved in immune-related pathways and significantly correlated with immune cell infiltration in GC. We also found that SNHG14 affected immune function in many cancers, such as breast cancer and esophageal carcinoma. These findings suggest that SNHG14 may serve as a potential target for cancer immunotherapy. In addition, our study could provide practical and theoretical guidance for the clinical application of non-coding RNAs.
Collapse
Affiliation(s)
- Wen Jin
- Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
| | - Jianchao Jia
- Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
| | - Yangming Si
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
| | - Jianli Liu
- School of Water Resource and Environment Engineering, China University of Geosciences, Beijing, 100083, China
| | - Hanshuang Li
- College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Hao Zhu
- Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
| | - Zhouying Wu
- Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China
| | - Yongchun Zuo
- College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot, 010010, China.
- Inner Mongolia International Mongolian Hospital, Hohhot, 010065, China.
| | - Lan Yu
- Clinical Medical Research Center, Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Inner Mongolia People's Hospital, Hohhot, 010010, China.
- Department of Endocrine and Metabolic Diseases, Inner Mongolia People's Hospital, Hohhot, 010010, China.
|
11
|
Li C, Ye G, Jiang Y, Wang Z, Yu H, Yang M. Artificial Intelligence in battling infectious diseases: A transformative role. J Med Virol 2024; 96:e29355. [PMID: 38179882 DOI: 10.1002/jmv.29355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/01/2023] [Accepted: 12/17/2023] [Indexed: 01/06/2024]
Abstract
It is widely acknowledged that infectious diseases have wrought immense havoc on human society and are regarded as adversaries that humanity cannot elude. In recent years, the advancement of Artificial Intelligence (AI) technology has ushered in a revolutionary era in infectious disease prevention and control. This evolution encompasses early warning of outbreaks, contact tracing, infection diagnosis, drug discovery, and the facilitation of drug design, alongside other facets of epidemic management. This article presents an overview of the use of AI systems in the field of infectious diseases, with a specific focus on their role during the COVID-19 pandemic. The article also highlights the contemporary challenges that AI confronts within this domain and proposes strategies for their mitigation. The potential applications of AI across multiple domains must be harnessed further to strengthen its capacity to address future disease outbreaks.
Affiliation(s)
- Chunhui Li
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Guoguo Ye
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, The Third People's Hospital of Shenzhen, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
| | - Yinghan Jiang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Zhiming Wang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Haiyang Yu
- Hangzhou Yalla Information Technology Service Co., Ltd., Hangzhou, People's Republic of China
| | - Minghui Yang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
|
12
|
Wang H, Lin YN, Yan S, Hong JP, Tan JR, Chen YQ, Cao YS, Fang W. NRTPredictor: identifying rice root cell state in single-cell RNA-seq via ensemble learning. Plant Methods 2023; 19:119. [PMID: 37925413 PMCID: PMC10625708 DOI: 10.1186/s13007-023-01092-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/15/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) measurements of gene expression show great promise for studying the cellular heterogeneity of rice roots. Precisely annotating cell identity remains a major unresolved problem in plant scRNA-seq analysis because of the inherent high dimensionality and sparsity of the data. RESULTS To address this challenge, we present NRTPredictor, an ensemble-learning system that predicts rice root cell stage and mines biomarkers through complete model interpretability. The performance of NRTPredictor was evaluated on a test dataset, where it achieved 98.01% accuracy and 95.45% recall. With the interpretability provided by NRTPredictor, our model recognizes 110 marker genes, some of which are involved in phenylpropanoid biosynthesis. Expression patterns of rice root could be mapped by these candidate genes, showing the superiority of NRTPredictor. Integrated analysis of scRNA-seq and bulk RNA-seq data revealed aberrant expression of epidermis cell subpopulations under flooding, Pi, and salt stresses. CONCLUSION Taken together, our results demonstrate that NRTPredictor is a useful tool for automated prediction of rice root cell stage and provides a valuable resource for deciphering rice root cellular heterogeneity and the molecular mechanisms of flooding, Pi, and salt stresses. Based on the proposed model, a free webserver has been established, which is available at https://www.cgris.net/nrtp.
Affiliation(s)
- Hao Wang
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Yu-Nan Lin
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Shen Yan
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Jing-Peng Hong
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Jia-Rui Tan
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Yan-Qing Chen
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
| | - Yong-Sheng Cao
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
| | - Wei Fang
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
|
13
|
Momanyi BM, Zulfiqar H, Grace-Mercure BK, Ahmed Z, Ding H, Gao H, Liu F. CFNCM: Collaborative filtering neighborhood-based model for predicting miRNA-disease associations. Comput Biol Med 2023; 163:107165. [PMID: 37315383 DOI: 10.1016/j.compbiomed.2023.107165] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/14/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 06/16/2023]
Abstract
MicroRNAs play a significant role in the emergence of various human disorders. Consequently, it is essential to understand the interactions between miRNAs and diseases, as this will help scientists better study and comprehend the diseases' biological mechanisms. Predicted disease-related miRNAs can be employed as biomarkers or drug targets to advance the detection, diagnosis, and treatment of complex human disorders. Because conventional biological experiments are expensive and time-consuming, this study proposed a computational model for predicting potential miRNA-disease associations, called the Collaborative Filtering Neighborhood-based Classification Model (CFNCM). The model generated integrated miRNA and disease similarity matrices from the validated associations and the miRNA and disease similarity information, and used them as the input features for CFNCM. To produce class labels, we first determined the association scores for brand-new pairs using user-based collaborative filtering. With zero as the threshold, associations with scores >0 were labelled 1, indicating a potential positive association; all others were labelled 0. Then, we developed classification models using various machine-learning algorithms. By comparison, we found that the support vector machine (SVM) produced the best AUC of 0.96 under 10-fold cross-validation, with the GridSearchCV technique used to identify optimal parameter values. In addition, the models were evaluated and verified by analyzing the top 50 miRNAs related to breast and lung neoplasms, of which 46 and 47 associations, respectively, were verified in two authoritative databases, dbDEMC and miR2Disease.
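The labelling step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy association matrix, the use of cosine similarity between association profiles (the paper uses integrated similarity matrices), and the names `cosine`, `cf_scores`, and `label` are all hypothetical. The only detail taken from the abstract is that user-based collaborative filtering scores above zero are labelled 1 and all others 0.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cf_scores(assoc):
    """Score every unknown (miRNA, disease) pair as the similarity-weighted
    sum of the other miRNAs' known associations with that disease."""
    n = len(assoc)
    sims = [[cosine(assoc[i], assoc[k]) for k in range(n)] for i in range(n)]
    scores = {}
    for i, row in enumerate(assoc):
        for j, known in enumerate(row):
            if known:  # skip pairs that are already validated
                continue
            scores[(i, j)] = sum(
                sims[i][k] * assoc[k][j] for k in range(n) if k != i
            )
    return scores


def label(scores, threshold=0.0):
    """Label a pair 1 if its score exceeds the threshold, otherwise 0."""
    return {pair: int(score > threshold) for pair, score in scores.items()}


# Hypothetical toy data: rows are miRNAs, columns are diseases.
ASSOC = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
LABELS = label(cf_scores(ASSOC))
```

In this toy matrix only the pair (miRNA 0, disease 1) receives a positive score, because miRNA 1 shares an association with miRNA 0 and is linked to disease 1. The resulting labels would then serve as targets for a downstream classifier such as the SVM mentioned in the abstract.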
Affiliation(s)
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hasan Zulfiqar
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Zahoor Ahmed
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China.
|
14
|
Zhu W, Yuan SS, Li J, Huang CB, Lin H, Liao B. A First Computational Frame for Recognizing Heparin-Binding Protein. Diagnostics (Basel) 2023; 13:2465. [PMID: 37510209 PMCID: PMC10377868 DOI: 10.3390/diagnostics13142465] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 06/10/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023] Open
Abstract
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first machine learning-based framework for accurately identifying HBP. Using four sequence descriptors, HBP and non-HBP samples were represented as discrete numerical features. These features were fed into a support vector machine (SVM) and a random forest (RF) algorithm, and the prediction performance of the two methods was compared on training data and independent test data; the SVM-based classifier showed the greatest potential to identify HBP. The model produced an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it should support research on infectious diseases and stimulate further work in related fields.
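For readers unfamiliar with the auROC reported here, the metric can be computed directly from labels and classifier scores as the probability that a randomly chosen positive sample outscores a randomly chosen negative one (ties count one half). The sketch below is a generic illustration of that definition; the function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive outscoring a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = HBP, 0 = non-HBP.
Y = [1, 1, 0, 1, 0, 0]
S = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
AUC = auroc(Y, S)
```

In a 10-fold cross-validation such a value would be computed once per held-out fold, and a figure like "0.981 ± 0.028" would typically summarise the mean and spread across folds.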
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
| | - Shi-Shi Yuan
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Jian Li
- School of Basic Medical Sciences, Chengdu University, Chengdu 610106, China
| | - Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Chengdu 623002, China
| | - Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
|
15
|
Liu S, Liang Y, Li J, Yang S, Liu M, Liu C, Yang D, Zuo Y. Integrating reduced amino acid composition into PSSM for improving copper ion-binding protein prediction. Int J Biol Macromol 2023:124993. [PMID: 37307968 DOI: 10.1016/j.ijbiomac.2023.124993] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/14/2023]
Abstract
Copper ion-binding proteins play an essential role in metabolic processes and are critical factors in many diseases, such as breast cancer, lung cancer, and Menkes disease. Many algorithms have been developed for predicting metal ion classification and binding sites, but none have been applied to copper ion-binding proteins. In this study, we developed a copper ion-binding protein classifier, RPCIBP, which integrates the reduced amino acid composition into the position-specific scoring matrix (PSSM). The reduced amino acid composition filters out a large number of uninformative evolutionary features, improving the operational efficiency and predictive ability of the model (feature dimension reduced from 2900 to 200, ACC increased from 83% to 85.1%). Compared with the basic model using only three sequence feature extraction methods (ACC between 73.8% and 86.2% on the training set and between 69.3% and 87.5% on the test set), the model integrating the evolutionary features of the reduced amino acid composition showed higher accuracy and robustness (ACC between 83.1% and 90.8% on the training set and between 79.1% and 91.9% on the test set). The best copper ion-binding protein classifiers identified by the feature selection process were deployed in a user-friendly web server (http://bioinfor.imu.edu.cn/RPCIBP). RPCIBP can accurately predict copper ion-binding proteins, which is convenient for further structural and functional studies and conducive to mechanism exploration and targeted drug development.
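The idea of a reduced amino acid alphabet can be illustrated with a short sketch: the 20 standard residues are merged into a few groups, and features are computed per group rather than per residue, which shrinks the feature dimension. The five-group scheme, the names `GROUPS` and `reduced_composition`, and the example sequence below are assumptions chosen for illustration; the abstract does not state which reduction scheme RPCIBP uses, and the sketch computes plain sequence composition rather than the PSSM-based features of the paper.

```python
from collections import Counter

# Hypothetical 5-group reduction by rough physicochemical class (illustration only).
GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "charged": "KRDE",
    "special": "GP",
}
RESIDUE_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def reduced_composition(seq):
    """Fraction of residues falling in each reduced group; unknown symbols are skipped."""
    counts = Counter(
        RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP
    )
    total = sum(counts.values())
    return {group: (counts[group] / total if total else 0.0) for group in GROUPS}


# Hypothetical 8-residue fragment.
COMP = reduced_composition("MKVLHHGG")
```

With this grouping a 20-dimensional composition vector becomes 5-dimensional. Applying the same merging to the rows and columns of a PSSM is one way a reduction of the kind reported in the abstract (2900 to 200 features) could arise, although the exact construction there is not specified.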
Affiliation(s)
- Shanghua Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
| | - Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China
| | - Jinzhao Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Siqi Yang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Ming Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Chengfang Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Dezhi Yang
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China.
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Hohhot 010010, China.
|
16
|
Lin Y, Sun M, Zhang J, Li M, Yang K, Wu C, Zulfiqar H, Lai H. Computational identification of promoters in Klebsiella aerogenes by using support vector machine. Front Microbiol 2023; 14:1200678. [PMID: 37250059 PMCID: PMC10215528 DOI: 10.3389/fmicb.2023.1200678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 04/18/2023] [Indexed: 05/31/2023] Open
Abstract
Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). Numerical features were obtained and then optimized using mRMR in combination with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, these optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. Results of 10-fold CV showed that the model yielded an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will aid the study of promoters and gene regulation in K. aerogenes.
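The k-tuple part of the sequence encoding mentioned in this abstract can be sketched as a normalised k-mer frequency vector. This is a simplified illustration only: it omits the pseudo (sequence-order correlation) components of PseKNC and the PCSF entirely, and the function name `ktuple_composition`, the default `k=2`, and the example sequence are assumptions made here.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Normalised frequency of every k-tuple over the alphabet ACGT.

    Windows containing symbols outside ACGT (for example N) are ignored.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    return {kmer: (c / windows if windows else 0.0) for kmer, c in counts.items()}


# Hypothetical 6-nucleotide fragment.
FEATURES = ktuple_composition("ACGTAC", k=2)
```

For `k=2` this yields a 16-dimensional vector per sequence; vectors of this kind, extended with the additional PseKNC and PCSF terms, would be the input to feature selection and the SVM classifier described in the abstract.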
Affiliation(s)
- Yan Lin
- Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
| | - Meili Sun
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Junjie Zhang
- Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
| | - Mingyan Li
- Chifeng Product Quality Inspection and Testing Centre, Chifeng, China
| | - Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
| | - Chengyan Wu
- Baotou Teacher’s College, Inner Mongolia University of Science and Technology, Baotou, China
| | - Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
| | - Hongyan Lai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
|
17
|
Liu M, Zhou J, Xi Q, Liang Y, Li H, Liang P, Guo Y, Liu M, Temuqile T, Yang L, Zuo Y. A computational framework of routine test data for the cost-effective chronic disease prediction. Brief Bioinform 2023; 24:7034465. [PMID: 36772998 DOI: 10.1093/bib/bbad054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2022] [Revised: 01/04/2023] [Accepted: 01/26/2023] [Indexed: 02/12/2023] Open
Abstract
Chronic diseases, because of their insidious onset and long latent period, have become the major global disease burden. However, current chronic disease diagnosis methods based on genetic markers or imaging analysis are difficult to deploy widely because of their high cost, and so cannot be used universally. This study analyzed massive data from routine blood and biochemical tests of 32 448 patients and developed a novel framework for cost-effective chronic disease prediction with high accuracy (AUC 87.32%). Based on the best-performing XGBoost algorithm, 20 classification models were further constructed for 17 types of chronic diseases, including 9 types of cancer, 5 types of cardiovascular disease and 3 types of mental illness. The highest model accuracy was 90.13% for cardia cancer, and the lowest was 76.38% for rectal cancer. Model interpretation with the SHAP algorithm showed that CREA, R-CV, GLU and NEUT% might be important indices for identifying most chronic diseases. PDW and R-CV were also found to be crucial indices in classifying the three categories of chronic diseases (cardiovascular disease, cancer and mental illness). In addition, R-CV has a higher specificity for cancer, ALP for cardiovascular disease and GLU for mental illness. The association between chronic diseases was further revealed. Finally, we built a user-friendly, explainable machine-learning-based clinical decision support system (DisPioneer: http://bioinfor.imu.edu.cn/dispioneer) to assist in predicting, classifying and treating chronic diseases. This cost-effective work with simple blood tests will benefit more people and motivate clinical implementation and further investigation of chronic disease prevention and surveillance programs.
Affiliation(s)
- Mingzhu Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
| | - Jian Zhou
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China
| | - Qilemuge Xi
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Yuchao Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China
| | - Haicheng Li
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China
| | - Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Yuting Guo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Ming Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
| | - Temuqile Temuqile
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot 010021, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
|
18
|
Zulfiqar H, Guo Z, Grace-Mercure BK, Zhang ZY, Gao H, Lin H, Wu Y. Empirical comparison and recent advances of computational prediction of hormone binding proteins using machine learning methods. Comput Struct Biotechnol J 2023; 21:2253-2261. [PMID: 37035551 PMCID: PMC10073991 DOI: 10.1016/j.csbj.2023.03.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/21/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Hormone binding proteins (HBPs) belong to the group of soluble carrier proteins. These proteins selectively and non-covalently interact with hormones and promote growth hormone signaling in humans and other animals. HBPs are useful in many medical and commercial fields, so identifying them is important for understanding their functions in more detail. However, experimental methods for recognizing hormone binding proteins are time-consuming and expensive. Computational prediction methods that use sequence information and machine learning (ML) algorithms have played a significant role in the correct recognition of hormone binding proteins. In this review, we compared and assessed the implementation of ML-based tools for recognizing HBPs. We hope that this study will raise awareness of, and provide useful knowledge for, research on HBPs.
Affiliation(s)
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhiling Guo
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Gao
- School of Computer Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang 313001, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
| |
Collapse
|
19
|
Zhang H, Chi M, Su D, Xiong Y, Wei H, Yu Y, Zuo Y, Yang L. A random forest-based metabolic risk model to assess the prognosis and metabolism-related drug targets in ovarian cancer. Comput Biol Med 2023; 153:106432. [PMID: 36608460 DOI: 10.1016/j.compbiomed.2022.106432] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/04/2022] [Revised: 11/13/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
As one of the most common gynecologic malignant tumors, ovarian cancer is usually diagnosed at an advanced and incurable stage because of its early asymptomatic onset. Increasing research into tumor biology has demonstrated that abnormal cellular metabolism precedes tumorigenesis, and it has therefore become an area of active research. Cellular metabolism is of great significance in cancer diagnostic and prognostic studies. In this study, we integrated The Cancer Genome Atlas dataset with multiple Gene Expression Omnibus ovarian cancer datasets, identified 17 metabolic pathways with prognostic value using the random forest algorithm, constructed a metabolic risk scoring model based on metabolic pathway enrichment scores, and classified patients with ovarian cancer into two subtypes. Then, we systematically investigated the differences between the subtypes in terms of prognosis, differential gene expression, immune signature enrichment, Hallmark signature enrichment, and somatic mutations. We also successfully predicted differences in sensitivity to immunotherapy and chemotherapy drugs in patients with different metabolic risk subtypes. Moreover, we identified 5 drug targets associated with high and low metabolic risk ovarian cancer phenotypes through weighted correlation network analysis and investigated their roles in the genesis of ovarian cancer. Finally, we developed an XGBoost classifier for predicting metabolic risk types in patients with ovarian cancer, which showed good predictive performance. These findings will provide valuable information for prognostic prediction and personalized medical treatment of patients with ovarian cancer.
Affiliation(s)
- Haoxin Zhang
- Department of Gastrointestinal Oncology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
| | - Meng Chi
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin, 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd, Hohhot, 010010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| |
Collapse
|
20
|
Xiong Y, Wang S, Wei H, Li H, Lv Y, Chi M, Su D, Lu Q, Yu Y, Zuo Y, Yang L. Deep learning-based transcription factor activity for stratification of breast cancer patients. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2022; 1865:194838. [PMID: 35690313 DOI: 10.1016/j.bbagrm.2022.194838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 05/19/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]
Abstract
Transcription factors directly bind to DNA and regulate the expression of genes, causing epigenetic modification of the DNA. They often mediate epigenetic parameters of transcriptional and posttranscriptional mechanisms, and their expression activities can be used to characterize genomic aberrations in cancer cells. In this study, the activity profile of transcription factors was inferred by the VIPER algorithm. An autoencoder model was applied to compress the transcription factor activity profile into more useful transformed features for stratifying patients into two different breast cancer subtypes. The deep learning-based subtypes exhibited superior prognostic value and yielded better risk stratification than the transcription factor activity-based method. Importantly, a deep neural network built on the transformed features was constructed to predict the subtypes and achieved an accuracy of 94.98% and an area under the ROC curve of 0.9663. The proposed subtypes were found to be significantly associated with immune infiltration and tumor immunogenicity, among other characteristics. Furthermore, a ceRNA network was constructed for the breast cancer subtypes, and 11 master regulators were found to be associated with patients in cluster 1. Given the robust performance of our deep learning model across multiple breast cancer cohorts, we expect that this model may be useful for prognosis prediction and may open possibilities for personalized medicine in breast cancer patients.
Affiliation(s)
- Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Meng Chi
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
|
21
|
Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022; 12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022] Open
Abstract
The exploration of DNA-binding proteins (DBPs) is an important aspect of studying biological life activities. Research on life activities requires the support of scientific research results on DBPs, and the decline in many life activities is closely related to DBPs. DBPs are generally identified through biochemical experiments, an approach that is inefficient and requires considerable manpower, material resources and time. Several computational approaches have been developed to detect DBPs, among which machine learning (ML) algorithm-based techniques have shown excellent performance. Our method uses fewer features and simpler recognition methods than other methods while still obtaining satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, this feature information is spliced together, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Compared with other excellent methods, our proposed method has achieved better results. The accuracy achieved by our method is 78.26% for PDB2272 and 85.48% for PDB186. The accuracy of the experimental results achieved by our strategy is similar to that of previous detection methods.
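The splice-then-standardize step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the six feature extraction methods, so the two extractors below (`aac` and `length_feature`) are hypothetical placeholders, and the helper names `splice` and `standardize` are assumptions for illustration only. The XGBoost training step is omitted because it depends on a third-party library.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Placeholder extractor: fraction of each of the 20 standard residues."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def length_feature(seq):
    """Placeholder extractor: sequence length as a single feature."""
    return [float(len(seq))]


def splice(seq, extractors):
    """Concatenate the outputs of several extractors into one feature vector."""
    vec = []
    for extract in extractors:
        vec.extend(extract(seq))
    return vec


def standardize(rows):
    """Column-wise z-score; constant columns are divided by 1 to avoid 0/0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


# Toy sequences, not taken from PDB2272 or PDB186.
sequences = ["ACD", "AAAA", "WYWYW"]
features = [splice(s, [aac, length_feature]) for s in sequences]
scaled = standardize(features)
```

The standardized matrix `scaled` is what would then be passed to a gradient boosting classifier.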
Affiliation(s)
- Ziye Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yingjian Liang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
|
22
|
Chi M, Xi Q, Su D, Li H, Wei N, Shi X, Wang S, Zuo Y, Yang L. Characterized the diversity of ABCB1 subtypes in immunogenomic landscape for predicting the drug response in breast cancer. Methods 2022; 204:223-233. [PMID: 34999214 DOI: 10.1016/j.ymeth.2022.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Revised: 12/30/2021] [Accepted: 01/03/2022] [Indexed: 12/24/2022] Open
Abstract
ABCB1 is an important gene that is closely related to analgesic tolerance to opioids and plays an important role in postoperative treatment. Recent studies have demonstrated that the ABCB1 genotype is significantly associated with chemoresistance and chemosensitivity in breast cancer patients. It has therefore become important to investigate the role of ABCB1 in predicting drug response in breast cancer patients. In this study, Cox proportional hazards regression analysis in breast cancer patients revealed significant differences in prognosis between the ABCB1 high- and low-expression subtypes. Meanwhile, using immune infiltration profiles and transcriptomics datasets, the ABCB1 high subtype was found to be significantly enriched in many immune-related KEGG pathways and biological processes, and was characterized by high infiltration levels of immune cell types. Furthermore, bioinformatics inference revealed that the ABCB1 subtypes were associated with the therapeutic effect of immunotherapy, which would be important for patient prognosis. In conclusion, these findings may help in recognizing the diversity between ABCB1 subtypes in the tumor immune microenvironment, and may clarify the prognostic outcomes and immunotherapy utility of ABCB1 in breast cancer.
Affiliation(s)
- Meng Chi
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin 150081, China
| | - Qilemuge Xi
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Na Wei
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin 150081, China
| | - Xiaoding Shi
- Department of Anesthesiology, Harbin Medical University Cancer Hospital, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot 010010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
|
23
|
Lv H, Shi L, Berkenpas JW, Dao FY, Zulfiqar H, Ding H, Zhang Y, Yang L, Cao R. Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design. Brief Bioinform 2021; 22:bbab320. [PMID: 34410360 PMCID: PMC8511807 DOI: 10.1093/bib/bbab320] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 06/24/2021] [Revised: 07/15/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.
Affiliation(s)
- Hao Lv
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai 200433, China
| | | | - Fu-Ying Dao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hasan Zulfiqar
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
| | - Liming Yang
- Department of Pathophysiology, Harbin Medical University-Daqing, Daqing, 163319, China
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma 98447, USA
|
24
|
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022] Open
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model to identify cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model could identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, both higher than those of two recent studies on the same data.
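One of the seven encodings listed above, the composition of k-spaced amino acid pairs, can be sketched as follows. This is a minimal illustration assuming the usual definition (the frequency of each of the 400 residue pairs separated by exactly k positions); the function name `cksaap` and the toy sequence are hypothetical, and the abstract does not state which values of k the authors used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(seq, k):
    """Frequency of each residue pair (x, y) with exactly k residues between x and y."""
    total = len(seq) - k - 1  # number of k-spaced pairs in the sequence
    if total <= 0:
        raise ValueError("sequence too short for this gap size")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


# Toy example: with k = 1 the pairs in "ACACA" are A.A, C.C and A.A.
vector = cksaap("ACACA", k=1)
```

Each value of k contributes a 400-dimensional block, so concatenating several gap sizes produces the kind of high-dimensional feature set that a selection step such as ANOVA or mRMR is then used to prune.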
Affiliation(s)
- Hasan Zulfiqar
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou 570228, China
| | - Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
25
|
Xu L, Ru X, Song R. Application of Machine Learning for Drug-Target Interaction Prediction. Front Genet 2021; 12:680117. [PMID: 34234813 PMCID: PMC8255962 DOI: 10.3389/fgene.2021.680117] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/13/2021] [Accepted: 05/28/2021] [Indexed: 11/13/2022] Open
Abstract
Exploring drug–target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug–target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug–target interaction prediction research. In this review, details of the specific applications of machine learning in drug–target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored for future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Rong Song
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
|
26
|
Ao C, Zou Q, Yu L. RFhy-m2G: Identification of RNA N2-methylguanosine modification sites based on random forest and hybrid features. Methods 2021; 203:32-39. [PMID: 34033879 DOI: 10.1016/j.ymeth.2021.05.016] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/18/2021] [Revised: 05/04/2021] [Accepted: 05/20/2021] [Indexed: 12/31/2022] Open
Abstract
N2-methylguanosine (m2G) is a post-transcriptional modification of RNA that is found in eukaryotes and archaea. The biological function of m2G modification discovered so far is to control and stabilize the three-dimensional structure of tRNA and the dynamic barrier of reverse transcription. To discover additional biological functions of m2G, it is necessary to develop time-saving and labor-saving computational tools to identify m2G. In this paper, based on hybrid features and a random forest, a novel predictor, RFhy-m2G, was developed to identify m2G modification sites for three species. The hybrid feature fuses three feature types, ENAC, PseDNC, and NPPS, which capture primary sequence derivation properties, physicochemical properties, and position-specific properties, respectively. Since the hybrid features contain redundant features, MRMD2.0 is used for optimal feature selection. Feature analysis shows that the optimal hybrid features still contain all three kinds of properties, and that the hybrid features identify m2G modification sites more accurately and improve prediction performance. In five-fold cross-validation and independent testing of the prediction model, the accuracies obtained were 0.9982 and 0.9417, respectively. The robustness of the predictor is demonstrated by comparisons with other predictors.
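As an illustration of the first of the three fused feature types, the sketch below assumes that ENAC denotes enhanced nucleic acid composition in its commonly used form: the nucleotide composition computed inside a fixed-length window that slides along the sequence. The function name `enac`, the window size of 5 and the toy sequence are assumptions, not details taken from the paper.

```python
NUCLEOTIDES = "ACGU"


def enac(seq, window=5):
    """Nucleotide frequencies (A, C, G, U) inside each sliding window, concatenated."""
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        features.extend(chunk.count(n) / window for n in NUCLEOTIDES)
    return features


# A sequence of length L yields (L - window + 1) * 4 values.
vector = enac("GGACUAGCU", window=5)
```

Because each window position contributes its own block of four values, the encoding keeps positional information that a single whole-sequence composition would lose.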
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China.
|