1
Luo Y, Wang Y, Liu L, Huang F, Lu S, Yan Y. Identifying pathological myopia associated genes with GenePlexus in protein-protein interaction network. Front Genet 2025; 16:1533567. [PMID: 40110040 PMCID: PMC11919901 DOI: 10.3389/fgene.2025.1533567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Accepted: 02/18/2025] [Indexed: 03/22/2025] Open
Abstract
Introduction: Pathological myopia, a severe form of myopia, is characterized by an extreme elongation of the eyeball, leading to various vision-threatening complications. It is broadly classified into two primary types: high myopia, which primarily involves an excessive axial length of the eye with potential for reversible vision loss, and degenerative myopia, associated with progressive and irreversible retinal damage. Methods: Leveraging data from DisGeNET, reporting 184 genes linked to high myopia and 39 genes associated with degenerative myopia, we employed the GenePlexus methodology in conjunction with screening tests to further explore the genetic landscape of pathological myopia. Results and discussion: Our comprehensive analysis resulted in the discovery of 21 new genes associated with degenerative myopia and 133 genes linked to high myopia with significant confidence. Among these findings, genes such as ADCY4, a regulator of the cAMP pathway, were functionally linked to high myopia, while THBS1, involved in collagen degradation, was closely associated with the pathophysiology of degenerative myopia. These previously unreported genes play crucial roles in the underlying mechanisms of pathological myopia, thereby emphasizing the complexity and multifactorial nature of this condition. The importance of our study resides in the uncovering of new genetic associations with pathological myopia, the provision of potential biomarkers for early screening, and the identification of therapeutic targets.
Affiliation(s)
- Yuanyuan Luo
  - Department of Ophthalmology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Yihan Wang
  - Department of Ophthalmology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lin Liu
  - Department of Ophthalmology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Feiming Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Shiheng Lu
  - Department of Ophthalmology, Shanghai Eye Diseases Prevention and Treatment Center/Shanghai Eye Hospital, School of Medicine, Tongji University, Shanghai, China
- Yan Yan
  - Department of Ophthalmology, Eye and ENT Hospital, Fudan University, Shanghai, China
2
Yuan F, Zhang YH, Huang F, Cao X, Chen L, Li J, Shen W, Feng K, Bao Y, Huang T, Cai YD. Prediction of Lung Adenocarcinoma Driver Genes Through Protein-Protein Interaction Networks Utilizing GenePlexus. Proteomics 2025; 25:e202400296. [PMID: 39696915 DOI: 10.1002/pmic.202400296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 11/22/2024] [Accepted: 12/05/2024] [Indexed: 12/20/2024]
Abstract
Lung adenocarcinoma, a subtype of lung cancer, arises from the uncontrolled proliferation of somatic cells affected by tumorigenic factors. The origin of this disease can be attributed to the concept of the "cancer driver," which links the occurrence of a tumor with specific changes in some key genes. These key genes can be identified at various molecular levels. Our method uses a computing technology called GenePlexus to mine new genes related to lung adenocarcinoma. Initially, a vast network was synthesized from protein-protein interactions. Utilizing GenePlexus, we traversed paths interlinking aberrant genes across different layers and pinpointed emerging candidate genes situated on these trajectories. Finally, the candidate genes that were obtained underwent a series of filtering processes, including a permutation test, interaction test, and enrichment test. Compared with the shortest path method, GenePlexus has identified previously neglected genes involved in lung adenocarcinoma. For example, genes such as EGR2, EPHA3, FGFR4, HOXB1, and HEY1 play key roles at multiple molecular levels, including methylation, microRNA, mRNA and mutation, which affect tumorigenesis and lung cancer progression. These genes regulate various processes, from gene expression and cell proliferation to resistance to therapeutic drugs and the progression of lung adenocarcinoma.
Affiliation(s)
- Fei Yuan
  - Department of Science & Technology, Binzhou Medical University Hospital, Binzhou, Shandong, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Xiaoyu Cao
  - Department of Neurology, Binzhou Medical University Hospital, Binzhou, Shandong, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- JiaBo Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- WenFeng Shen
  - School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- YuSheng Bao
  - School of Life Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
3
Nourian R, Motamedi SA, Pourfard M. BHBA-GRNet: Cancer detection through improved gene expression profiling using Binary Honey Badger Algorithm and Gene Residual-based Network. Comput Biol Med 2025; 184:109348. [PMID: 39615230 DOI: 10.1016/j.compbiomed.2024.109348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 10/29/2024] [Accepted: 10/30/2024] [Indexed: 12/22/2024]
Abstract
Cancer, a pervasive and devastating disease, remains a leading global cause of mortality, emphasizing the growing urgency for effective detection methods. Gene Expression Microarray (GEM) data has emerged as a crucial tool in this context, offering insights into early cancer detection and treatment. While deep learning methods offer promise in detecting various cancers through GEM analysis, they suffer from high dimensionality inherent in gene sequences, preventing optimal detection performance across diverse cancer types. Additionally, existing methods often resort to synthetic features and data augmentation to enhance performance. To address these challenges and enhance accuracy, a novel Binary Honey Badger Algorithm (BHBA) integrated with the Gene Residual Network (GRNet) method has been proposed. Our approach capitalizes on BHBA's feature reduction mechanism, eliminating the need for additional preprocessing steps. Comprehensive evaluations on three well-established datasets representing lung and blood-type cancers demonstrate that our method reduces GEM data size by approximately 40 % and achieves a superior accuracy improvement of around 1 % in lung cancer types compared to state-of-the-art methods.
Affiliation(s)
- Reza Nourian
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Seyed Ahmad Motamedi
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Mohammadreza Pourfard
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
4
Qin X, Zhang L, Liu M, Liu G. PRFold-TNN: Protein Fold Recognition With an Ensemble Feature Selection Method Using PageRank Algorithm Based on Transformer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1740-1751. [PMID: 38875077 DOI: 10.1109/tcbb.2024.3414497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Understanding the tertiary structures of proteins is of great benefit to function in many aspects of human life. Protein fold recognition is a vital means of determining protein structure. Researchers have proposed a variety of methods for protein fold recognition, but novel and effective computational methods are still needed to handle this problem as protein structure databases are continuously updated. In this study, we develop a new protein structure dataset named AT and propose the PRFold-TNN model for protein fold recognition. First, different types of feature extraction methods, including AAC, HMM, HMM-Bigram and ACC, are selected to extract corresponding features from protein sequences. Then an ensemble feature selection method based on the PageRank algorithm, integrating various tree-based algorithms, is used to screen the fused features. Finally, a classifier based on the Transformer model produces the final prediction. Experiments show that the prediction accuracy is 86.27% on the AT dataset and 88.91% on the independent test set, indicating that the model demonstrates superior performance and generalization ability in protein fold recognition. Furthermore, we also evaluate the model on the DD, EDD and TG benchmark datasets, achieving prediction accuracies of 88.41%, 97.91% and 95.16%, which are at least 3.0%, 0.8% and 2.5% higher than those of the state-of-the-art methods. It can be concluded that the PRFold-TNN model outperforms these methods.
5
Jiang B, Bao L, He S, Chen X, Jin Z, Ye Y. Deep learning applications in breast cancer histopathological imaging: diagnosis, treatment, and prognosis. Breast Cancer Res 2024; 26:137. [PMID: 39304962 DOI: 10.1186/s13058-024-01895-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 09/16/2024] [Indexed: 09/22/2024] Open
Abstract
Breast cancer is the most common malignant tumor among women worldwide and remains one of the leading causes of death among women. Its incidence and mortality rates are continuously rising. In recent years, with the rapid advancement of deep learning (DL) technology, DL has demonstrated significant potential in breast cancer diagnosis, prognosis evaluation, and treatment response prediction. This paper reviews relevant research progress and applies DL models to image enhancement, segmentation, and classification based on large-scale datasets from TCGA and multiple centers. We employed foundational models such as ResNet50, Transformer, and Hover-net to investigate the performance of DL models in breast cancer diagnosis, treatment, and prognosis prediction. The results indicate that DL techniques have significantly improved diagnostic accuracy and efficiency, particularly in predicting breast cancer metastasis and clinical prognosis. Furthermore, the study emphasizes the crucial role of robust databases in developing highly generalizable models. Future research will focus on addressing challenges related to data management, model interpretability, and regulatory compliance, ultimately aiming to provide more precise clinical treatment and prognostic evaluation programs for breast cancer patients.
Affiliation(s)
- Bitao Jiang
  - Department of Hematology and Oncology, Beilun District People's Hospital, Ningbo, 315800, China.
  - Department of Hematology and Oncology, Beilun Branch of the First Affiliated Hospital of Zhejiang University, Ningbo, 315800, China.
- Lingling Bao
  - Department of Hematology and Oncology, Beilun District People's Hospital, Ningbo, 315800, China
  - Department of Hematology and Oncology, Beilun Branch of the First Affiliated Hospital of Zhejiang University, Ningbo, 315800, China
- Songqin He
  - Department of Oncology, The 906th Hospital of the Joint Logistics Force of the Chinese People's Liberation Army, Ningbo, 315100, China
- Xiao Chen
  - Department of Oncology, The 906th Hospital of the Joint Logistics Force of the Chinese People's Liberation Army, Ningbo, 315100, China
- Zhihui Jin
  - Department of Hematology and Oncology, Beilun District People's Hospital, Ningbo, 315800, China
  - Department of Hematology and Oncology, Beilun Branch of the First Affiliated Hospital of Zhejiang University, Ningbo, 315800, China
- Yingquan Ye
  - Department of Oncology, The 906th Hospital of the Joint Logistics Force of the Chinese People's Liberation Army, Ningbo, 315100, China.
6
Li L, Huang F, Zhang YH, Cai YD. Identifying allergic-rhinitis-associated genes with random-walk-based method in PPI network. Comput Biol Med 2024; 175:108495. [PMID: 38697003 DOI: 10.1016/j.compbiomed.2024.108495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/21/2024] [Accepted: 04/21/2024] [Indexed: 05/04/2024]
Abstract
Allergic rhinitis is a common allergic disease with a complex pathogenesis and many unresolved issues. Studies have shown that the incidence of allergic rhinitis is closely related to genetic factors, and research on the related genes could help further understand its pathogenesis and develop new treatment methods. In this study, 446 allergic rhinitis-related genes were obtained from the DisGeNET database. The protein-protein interaction network was searched using the random-walk-with-restart algorithm with these 446 genes as seed nodes to assess the linkages between other genes and allergic rhinitis. The result was then examined by three screening tests (permutation, interaction, and enrichment tests), which aimed to pick out genes that have strong and specific associations with allergic rhinitis. Finally, 52 novel genes were obtained. The functional enrichment test confirmed their relationships to the biological processes and pathways related to allergic rhinitis. Furthermore, some genes were extensively analyzed to uncover their specific or latent associations with allergic rhinitis, including IRAK2 and MAPK, which are involved in the pathogenesis of allergic rhinitis and the inhibition of allergic inflammation via the p38-MAPK pathway, respectively. The newly found genes may help subsequent investigations to understand the underlying molecular mechanisms of allergic rhinitis and to develop effective treatments.
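The random-walk-with-restart step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph, the function name `random_walk_with_restart`, the restart probability of 0.8, and the convergence tolerance are all assumptions chosen for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-6, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds: set of seed nodes (e.g. known disease genes).
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(adjacency)
    # Initial distribution: uniform over the seeds, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node spreads the rest of its mass evenly to neighbours.
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue  # isolated nodes have nowhere to send mass
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-protein chain A - B - C - D with A as the only seed.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Non-seed nodes with high scores are the candidates that would then go through screening tests such as the permutation, interaction, and enrichment tests named in the abstract.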
Affiliation(s)
- Lin Li
  - Department of Otolaryngology and Head&neck, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi Medical Center, Nanjing Medical University, Wuxi, 214023, China; Department of Otolaryngology and Head&neck, China-Japan Union Hospital, Jilin University, Changchun, 130033, China.
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China.
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA.
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China.
7
Jiang X, Hu Z, Wang S, Zhang Y. Deep Learning for Medical Image-Based Cancer Diagnosis. Cancers (Basel) 2023; 15:3608. [PMID: 37509272 PMCID: PMC10377683 DOI: 10.3390/cancers15143608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/22/2023] [Revised: 07/10/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open
Abstract
(1) Background: The application of deep learning technology to realize cancer diagnosis based on medical images is one of the research hotspots in the field of artificial intelligence and computer vision. Due to the rapid development of deep learning methods, cancer diagnosis requires very high accuracy and timeliness as well as the inherent particularity and complexity of medical imaging. A comprehensive review of relevant studies is necessary to help readers better understand the current research status and ideas. (2) Methods: Five radiological images, including X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), positron emission computed tomography (PET), and histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced neural networks emerging in recent years, including transfer learning, ensemble learning (EL), graph neural network, and vision transformer (ViT), are introduced. Five overfitting prevention methods are summarized: batch normalization, dropout, weight initialization, and data augmentation. The application of deep learning technology in medical image-based cancer analysis is sorted out. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning and faces challenges in rare cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: There is a need for more public standard databases for cancer. The pre-training model based on deep neural networks has the potential to be improved, and special attention should be paid to the research of multimodal data fusion and supervised paradigm. Technologies such as ViT, ensemble learning, and few-shot learning will bring surprises to cancer diagnosis based on medical images.
Grants
- MC_PC_17171 MRC, UK
- RP202G0230 Royal Society, UK
- AA/18/3/34220 BHF, UK
- RM60G0680 Hope Foundation for Cancer Research, UK
- P202PF11 GCRF, UK
- RP202G0289 Sino-UK Industrial Fund, UK
- P202ED10, P202RE969 LIAS, UK
- P202RE237 Data Science Enhancement Fund, UK
- 24NN201 Fight for Sight, UK
- OP202006 Sino-UK Education Fund, UK
- RM32G0178B8 BBSRC, UK
- 2023SJZD125 Major project of philosophy and social science research in colleges and universities in Jiangsu Province, China
Affiliation(s)
- Xiaoyan Jiang
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Zuojin Hu
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Shuihua Wang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Yudong Zhang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
8
Zhang SL, Cheng LS, Zhang ZY, Sun HT, Li JJ. Untangling determinants of gut microbiota and tumor immunologic status through a multi-omics approach in colorectal cancer. Pharmacol Res 2023; 188:106633. [PMID: 36574857 DOI: 10.1016/j.phrs.2022.106633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 12/23/2022] [Indexed: 12/25/2022]
Abstract
Changes in gut microbiota have been implicated in colorectal cancer (CRC). The interplay between the host and gut microbiota remains largely unclear, and few studies have investigated it using integrative multi-omics data. In this study, large-scale multi-omic datasets, including microbiome, metabolome, bulk transcriptomics and single-cell RNA sequencing of CRC patients, were analyzed individually and integrated through advanced bioinformatics methods. We further examined the clinical relevance of these findings in mice recolonized with microbiota from humans. We found that CRC patients had distinct microbiota compositions compared to healthy controls. A machine-learning model was developed with 28 biomarkers for detection of CRC, which had high accuracy and clinical applicability. We identified multiple significant correlations between genera and well-characterized genes, suggesting the potential role of gut microbiota in tumor immunity. Further analysis showed that specific metabolites worked as profound communicators between these genera and tumor immunity. Integrating microbiota and metabolome perspectives, we cataloged gut taxonomic and metabolomic features that represented the key multi-omics signature of CRC. Furthermore, gut microbiota transplanted from CRC patients compromised the response of CRC to immunotherapy. These phenotypes were strongly associated with the alterations in gut microbiota, immune cell infiltration as well as multiple metabolic pathways. The comprehensive interplay across multi-omic data of CRC might explain how gut microbiota influenced tumor immunity. Hence, we proposed that modifying the CRC microbiota using healthy donors might serve as a promising strategy to improve response to immunotherapy.
Affiliation(s)
- Shi-Long Zhang
  - Department of Medical Oncology, Zhongshan Hospital, Fudan University, Shanghai 200032, PR China.
- Li-Sha Cheng
  - Department of Medical Oncology, Zhongshan Hospital (Xiamen), Fudan University, Xiamen 361015, PR China; Xiamen Clinical Research Center for Cancer Therapy, Xiamen 361015, PR China
- Zheng-Yan Zhang
  - State Key Laboratory of Oncogenes and Related Genes, Stem Cell Research Center, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 201100, PR China
- Hai-Tao Sun
  - Department of Radiology, Zhongshan Hospital, Fudan University, Shanghai 200032, PR China; Shanghai Institute of Medical Imaging, Zhongshan Hospital, Fudan University, Shanghai 200032, PR China
- Jia-Jia Li
  - State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, Northwest A&F University, Yangling, Shaanxi 712100, PR China
9
Zhou M, Bian K, Hu F, Lai W. A New Strategy for Identification of Coal Miners With Abnormal Physical Signs Based on EN-mRMR. Front Bioeng Biotechnol 2022; 10:935481. [PMID: 35898648 PMCID: PMC9310099 DOI: 10.3389/fbioe.2022.935481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022] Open
Abstract
Coal miners’ occupational health is a key part of production safety in the coal mine. Accurate identification of abnormal physical signs is the key to preventing occupational diseases and improving miners’ working environment. There are many problems when evaluating the physical health status of miners manually, such as too many sign parameters, low diagnostic efficiency, missed diagnosis, and misdiagnosis. To solve these problems, a machine learning algorithm is used to identify miners with abnormal signs. We proposed a feature screening strategy integrating elastic net (EN) and Max-Relevance and Min-Redundancy (mRMR) to establish a model that identifies abnormal signs and obtains the key physical signs. First, the raw 21 physical signs were expanded to 25 by feature construction technology. Then, the EN was used to delete redundant physical signs. Finally, mRMR combined with support vector classification optimized by the Gravitational Search Algorithm (GSA-SVC) was applied to further reduce the remaining 12 relatively important physical signs and obtain the optimal model using data from six physical signs. With this model, the accuracy, precision, recall, specificity, G-mean, and MCC on the test set were 97.50%, 97.78%, 97.78%, 97.14%, 0.98, and 0.95. The experimental results show that the proposed strategy improves the model performance with the fewest features and realizes the accurate identification of coal miners with abnormal signs. The conclusion could provide reference evidence for intelligent classification and assessment of occupational health in the early stage.
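For readers unfamiliar with the evaluation metrics listed in this abstract, the sketch below shows how they are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical; they are not taken from the paper and are not meant to reproduce its reported values.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity, true-positive rate
    specificity = tn / (tn + fp)       # true-negative rate
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    # Matthews correlation coefficient, defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

G-mean and MCC are often reported alongside accuracy because both remain informative when the two classes are imbalanced.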
Affiliation(s)
- Mengran Zhou
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, China
- Kai Bian
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, China
- Feng Hu
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, China
- Wenhao Lai
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, China
10
Pan-cancer identification of the relationship of metabolism-related differentially expressed transcription regulation with non-differentially expressed target genes via a gated recurrent unit network. Comput Biol Med 2022; 148:105883. [PMID: 35878490 DOI: 10.1016/j.compbiomed.2022.105883] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/06/2022] [Revised: 07/10/2022] [Accepted: 07/16/2022] [Indexed: 11/20/2022]
Abstract
The transcriptome describes the expression of all genes in a sample. Most studies have investigated the differential patterns or discrimination powers of transcript expression levels. In this study, we hypothesized that the quantitative correlations between the expression levels of transcription factors (TFs) and their regulated target genes (mRNAs) serve as a novel view of healthy status, and a disease sample exhibits a differential landscape (mqTrans) of transcription regulations compared with healthy status. We formulated quantitative transcription regulation relationships of metabolism-related genes as a multi-input multi-output regression model via a gated recurrent unit (GRU) network. The GRU model was trained using healthy blood transcriptomes and the expression levels of mRNAs were predicted by those of the TFs. The mqTrans feature of a gene was defined as the difference between its predicted and actual expression levels. A pan-cancer investigation of the differentially expressed mqTrans features was conducted between the early- and late-stage cancers in 26 cancer types of The Cancer Genome Atlas database. This study focused on the differentially expressed mqTrans features, that did not show differential expression in the actual expression levels. These genes could not be detected by conventional differential analysis. Such dark biomarkers are worthy of further wet-lab investigation. The experimental data also showed that the proposed mqTrans investigation improved the classification between early- and late-stage samples for some cancer types. Thus, the mqTrans features serve as a complementary view to transcriptomes, an OMIC type with mature high-throughput production technologies, and abundant public resources.
11
Liang F, Fu X, Ding S, Li L. Use of a Network-Based Method to Identify Latent Genes Associated with Hearing Loss in Children. Front Cell Dev Biol 2021; 9:783500. [PMID: 34912812 PMCID: PMC8667072 DOI: 10.3389/fcell.2021.783500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Hearing loss is a total or partial inability to hear. Approximately 5% of people worldwide experience this condition. Hearing capacity is closely related to language, social, and basic emotional development; hearing loss is particularly serious in children. The pathogenesis of childhood hearing loss remains poorly understood. Here, we sought to identify new genes potentially associated with two types of hearing loss in children: congenital deafness and otitis media. We used a network-based method incorporating a random walk with restart algorithm, as well as a protein-protein interaction framework, to identify genes potentially associated with either pathogenesis. A subsequent screening procedure identified 18 and 87 genes that are potentially involved in the development of congenital deafness and otitis media, respectively. These findings provide novel biomarkers for clinical screening of childhood deafness; they contribute to a genetic understanding of the pathogenetic mechanisms involved.
Affiliation(s)
- Feng Liang
  - Anaesthesia Department, China-Japan Union Hospital, JiLin University, Changchun, China
- Xin Fu
  - Anaesthesia Department, China-Japan Union Hospital, JiLin University, Changchun, China
- ShiJian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Lin Li
  - Department of Otorhinolaryngology Head and Neck Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
12
Wu Y, Sa Y, Guo Y, Li Q, Zhang N. Identification of WHO II/III gliomas by 16 prognostic-related gene signatures using machine learning methods. Curr Med Chem 2021; 29:1622-1639. [PMID: 34455959 DOI: 10.2174/0929867328666210827103049] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/08/2021] [Revised: 05/27/2021] [Accepted: 05/28/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Clinical observation shows that prognosis differs widely among gliomas of the same grade within World Health Organization (WHO) grades II and III. Therefore, a better understanding of the genetics and molecular mechanisms underlying WHO grade II and III gliomas is required, with the aim of developing a classification scheme at the molecular level rather than the conventional pathological morphology level. METHOD We performed survival analysis combined with the Least Absolute Shrinkage and Selection Operator machine learning method, using expression datasets downloaded from the Chinese Glioma Genome Atlas as well as The Cancer Genome Atlas. Risk scores were calculated from the product of the expression level of each overall survival-related gene and its multivariate Cox proportional hazards regression coefficient. WHO grade II and III gliomas were categorized into low-risk, medium-risk, and high-risk subgroups. We used the 16 prognostic-related genes as input features to build a prognosis-based classification model using a fully connected neural network. Gene function annotations were also performed. RESULTS The 16 genes (AKNAD1, C7orf13, CDK20, CHRFAM7A, CHRNA1, EFNB1, GAS1, HIST2H2BE, KCNK3, KLHL4, LRRK2, NXPH3, PIGZ, SAMD5, ERINC2, and SIX6) related to glioma prognosis were screened. The 16 selected genes were associated with the development of gliomas and carcinogenesis. The accuracy of the fully connected neural network model on an external validation data set from the two cohorts reached 95.5%. Our method shows good potential for classifying WHO grade II and III gliomas into low-risk, medium-risk, and high-risk subgroups. The subgroups showed significant (P<0.01) differences in overall survival. CONCLUSION We identified 16 genes related to the prognosis of gliomas and developed a computational method to discriminate WHO grade II and III gliomas into three subgroups with distinct prognoses. The gene expression-based method provides a reliable alternative to determine the prognosis of gliomas.
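The risk score described in this abstract can be illustrated with a minimal sketch. It reads the score as the sum over signature genes of expression level times Cox coefficient, which is the usual form of such scores. The coefficient values, the function names, and the tertile cut-off rule below are assumptions made for illustration; the abstract does not state how the three subgroups were delimited.

```python
from typing import Dict, List


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Sum of (Cox coefficient * expression level) over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def assign_subgroups(scores: List[float]) -> List[str]:
    """Label each patient low/medium/high by tertile rank (assumed cut-off rule)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "low"
        elif rank < 2 * n / 3:
            labels[idx] = "medium"
        else:
            labels[idx] = "high"
    return labels


# Hypothetical coefficients for two of the 16 genes; not values from the paper.
example_coefficients = {"GAS1": 0.5, "SIX6": -0.25}
example_patient = {"GAS1": 2.0, "SIX6": 4.0}
example_score = risk_score(example_patient, example_coefficients)
```

In practice the coefficients would come from a fitted multivariate Cox model, and the subgroup thresholds would be fixed on the training cohort before being applied to a validation cohort.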
Affiliation(s)
- YaMeng Wu
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin. China
| | - Yu Sa
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin. China
| | - Yu Guo
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin. China
| | - QiFeng Li
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin. China
| | - Ning Zhang
- Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin. China
| |
|
13
|
Gu X, Yang B, Gao S, Yan LF, Xu D, Wang W. Application of bi-modal signal in the classification and recognition of drug addiction degree based on machine learning. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:6926-6940. [PMID: 34517564 DOI: 10.3934/mbe.2021344] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Most studies on the degree of drug addiction rely on statistical scales, addicts' accounts, and the subjective judgement of rehabilitation doctors; no objective, quantified evaluation has been made. This paper devises a synchronous bimodal signal collection and experimentation paradigm using electroencephalogram (EEG) and forehead high-density near-infrared spectroscopy (NIRS) devices. The drug addicts are classified into mild, moderate and severe groups with reference to the suggestions of researchers and medical experts. Data from 45 drug addicts (mild: 15; moderate: 15; severe: 15) are collected and then used to design an addiction degree testing algorithm based on decision fusion. The algorithm is used to classify mild, moderate and severe addiction. This paper pioneers the use of two types of Convolutional Neural Network (CNN) to abstract the EEG and NIR data of drug addicts, and introduces batch normalization to the CNNs, thus accelerating the training process, reducing parameter sensitivity, and enhancing system robustness. The characteristics output by the two CNNs are transformed into dimensions. The two new characteristics are assigned a weight of 50% each, and the data are used for decision fusion. In the networks, 27 subjects are used as the training set, 9 as the validation set, and 9 as the testing set. The 3-class accuracy is 63.15%, preliminarily justifying this method as an effective approach to measuring the degree of drug addiction. The method is ready to use, objective, and offers results in real time.
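The equal-weight decision fusion mentioned in this abstract can be sketched as follows. The sketch assumes that each CNN emits one probability per class (mild, moderate, severe) and that fusion is a weighted average of the two vectors; the abstract does not specify the exact form of the fused quantities, and the function names are hypothetical.

```python
from typing import List, Sequence

CLASSES = ("mild", "moderate", "severe")


def fuse(eeg_probs: Sequence[float], nirs_probs: Sequence[float],
         eeg_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors (50/50 by default)."""
    if len(eeg_probs) != len(nirs_probs):
        raise ValueError("both models must score the same set of classes")
    nirs_weight = 1.0 - eeg_weight
    return [eeg_weight * e + nirs_weight * n for e, n in zip(eeg_probs, nirs_probs)]


def predict(eeg_probs: Sequence[float], nirs_probs: Sequence[float]) -> str:
    """Return the class with the highest fused score."""
    fused = fuse(eeg_probs, nirs_probs)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best_index]


# Made-up model outputs for one subject, for illustration only.
label = predict([0.7, 0.2, 0.1], [0.2, 0.3, 0.5])
```

Fusing at the decision level keeps the two networks independent, so either modality can be retrained or reweighted without touching the other.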
Affiliation(s)
- Xuelin Gu
- School of Mechanical and Electrical Engineering and Automation, Shanghai University, Shanghai 200444, China
| | - Banghua Yang
- School of Mechanical and Electrical Engineering and Automation, Shanghai University, Shanghai 200444, China
| | - Shouwei Gao
- School of Mechanical and Electrical Engineering and Automation, Shanghai University, Shanghai 200444, China
| | - Lin Feng Yan
- Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710038, China
| | - Ding Xu
- Shanghai Drug Rehabilitation Administration Bureau, Shanghai 200080, China
| | - Wen Wang
- Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710038, China
| |
|
14
|
Thakur T, Batra I, Luthra M, Vimal S, Dhiman G, Malik A, Shabaz M. Gene Expression-Assisted Cancer Prediction Techniques. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:4242646. [PMID: 34545300 PMCID: PMC8449724 DOI: 10.1155/2021/4242646] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/13/2021] [Accepted: 08/13/2021] [Indexed: 02/07/2023]
Abstract
Cancer is one of the deadliest diseases, and as the number of cases grows, its detection and treatment become essential. Researchers have developed various methods based on gene expression. Gene expression is the process by which deoxyribonucleic acid (DNA) is converted to ribonucleic acid (RNA) and RNA is then converted to protein. This protein serves many purposes, such as creating cells, drugs for cancer, and even hybrid species. As genes carry genetic information from one generation to another, some gene deformities are also transferred to the next generation. Therefore, the deformity needs to be detected. There are many techniques available in the literature to predict cancerous and noncancerous genes from gene expression data. This is an important development for diagnostics and for giving a prognosis for the condition. This paper presents a review of some of those techniques from the literature, with details about the various datasets on which these techniques are implemented and their advantages and disadvantages.
Affiliation(s)
- Tanima Thakur
- School of Computer Science and Engineering, Lovely Professional University, Jalandhar, India
| | - Isha Batra
- School of Computer Science and Engineering, Lovely Professional University, Jalandhar, India
| | | | - Shanmuganathan Vimal
- Department of CSE, Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
| | - Gaurav Dhiman
- Department of Computer Science, Government Bikram College of Commerce, Patiala, India
| | - Arun Malik
- School of Computer Science and Engineering, Lovely Professional University, Jalandhar, India
| | - Mohammad Shabaz
- Arba Minch University, Arba Minch, Ethiopia
- Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
| |
|
15
|
Gupta M, Wu H, Arora S, Gupta A, Chaudhary G, Hua Q. Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:8689873. [PMID: 34367540 PMCID: PMC8337154 DOI: 10.1155/2021/8689873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Revised: 06/26/2021] [Accepted: 07/13/2021] [Indexed: 12/03/2022]
Abstract
A cancer tumour consists of thousands of genetic mutations. Despite advances in technology, the task of distinguishing the genetic mutations that act as drivers of tumour growth from passengers (neutral genetic mutations) is still done manually. This is a time-consuming process in which pathologists interpret every genetic mutation from the clinical evidence manually. These pieces of clinical evidence belong to a total of nine classes, but the criterion of classification is still unknown. The main aim of this research is to propose a multiclass classifier to classify the genetic mutations based on clinical evidence (i.e., the text description of these genetic mutations) using Natural Language Processing (NLP) techniques. The dataset for this research is taken from Kaggle and is provided by the Memorial Sloan Kettering Cancer Center (MSKCC). World-class researchers and oncologists contributed the dataset. Three text transformation models, namely, CountVectorizer, TfidfVectorizer, and Word2Vec, are utilized for the conversion of text to a matrix of token counts. Three machine learning classification models, namely, Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB), along with the Recurrent Neural Network (RNN) model of deep learning, are applied to the sparse matrix (keyword count representation) of text descriptions. The accuracy score of all the proposed classifiers is evaluated by using the confusion matrix. Finally, the empirical results show that the RNN model of deep learning performed better than the other proposed classifiers, with the highest accuracy of 70%.
Affiliation(s)
- Meenu Gupta
- Department of Computer Science and Engineering, Chandigarh University, Ajitgarh, Punjab, India
| | - Hao Wu
- Digital Zhejiang Technology Operations Co., Ltd., Hangzhou, China
| | - Simrann Arora
- Bharati Vidyapeeth's College of Engineering, New Delhi, India
| | - Akash Gupta
- Bharati Vidyapeeth's College of Engineering, New Delhi, India
| | - Gopal Chaudhary
- Bharati Vidyapeeth's College of Engineering, New Delhi, India
| | - Qiaozhi Hua
- Computer School, Hubei University of Arts and Science, Xiangyang 441000, China
| |
|
16
|
Liang Z, Chen Y, Gu T, She J, Dai F, Jiang H, Zhan Z, Li K, Liu Y, Zhou X, Tang L. LXR-Mediated Regulation of Marine-Derived Piericidins Aggravates High-Cholesterol Diet-Induced Cholesterol Metabolism Disorder in Mice. J Med Chem 2021; 64:9943-9959. [PMID: 34251816 DOI: 10.1021/acs.jmedchem.1c00175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Reported as two anti-renal cell carcinoma (RCC) drug candidates, marine-derived compounds piericidin A (PA) and glucopiericidin A (GPA) exhibit hepatotoxicity in renal carcinoma xenograft mice. Proteomics and transcriptomics reveal that the hepatotoxicity is related to cholesterol disposition, since RCC is characterized by cholesterol accumulation. PA/GPA aggravate hepatotoxicity in high-cholesterol diet (HCD)-fed mice while exhibiting no toxicity in chow diet-fed mice. High cholesterol accumulation in the liver arises from liver X receptor (LXR)-mediated cytochrome P450 family 7 subfamily a member 1 (CYP7A1) depression and low-density lipoprotein receptor (LDLR) activation. The farnesoid X nuclear receptor (FXR) is also depressed, with a downregulated target gene OSTα. Unlike PA, which binds LXRα directly as an inhibitor, GPA exists as a prodrug in the liver and exerts toxic effects through transformation into PA. Surface plasmon resonance (SPR) and docking results of 17 piericidins illustrate that glycosides exert no LXRα binding activity. A longer survival time of GPA-treated mice indicates that further exploration in anti-RCC drug research should focus on reducing the transformation of glycosides into PA and on concentrating them in the kidney tumor rather than the liver to lower the risk of hepatotoxicity.
Affiliation(s)
- Zhi Liang
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Yulian Chen
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Tanwei Gu
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Jianglian She
- CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Key Laboratory of Marine Materia Medica, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China
| | - Fahong Dai
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Huanguo Jiang
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Zhikun Zhan
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Kunlong Li
- CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Key Laboratory of Marine Materia Medica, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China
| | - Yonghong Liu
- CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Key Laboratory of Marine Materia Medica, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China
| | - Xuefeng Zhou
- CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Key Laboratory of Marine Materia Medica, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China.,Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
| | - Lan Tang
- NMPA Key Laboratory for Research and Evaluation of Drug Metabolism, Guangdong Provincial Key Laboratory of New Drug Screening, School of Pharmaceutical Sciences, Southern Medical University, Guangzhou 510515, China
| |
|
17
|
Zhou H, He Y, Wang Z, Wang Q, Hu C, Wang X, Lu S, Li K, Yang Y, Luan Z. Identifying the functions of two biomarkers in human oligodendrocyte progenitor cell development. J Transl Med 2021; 19:188. [PMID: 33933125 PMCID: PMC8088696 DOI: 10.1186/s12967-021-02857-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2020] [Accepted: 04/24/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Human oligodendrocyte precursor cells (hOPCs) are an important source of myelinating cells for cell transplantation to treat demyelinating diseases. Myelin oligodendrocytes develop from migratory and proliferative hOPCs. It is well known that NG2 and A2B5 are important biological markers of hOPCs. However, the functional differences between the cell populations represented by these two biomarkers have not been studied in depth. OBJECTIVE To study the difference between NG2 and A2B5 cells in the development of human oligodendrocyte progenitor cells. METHODS Using cell sorting technology, we obtained NG2+/-, A2B5+/- cells. Further research was then conducted via in vitro cell proliferation and migration assays, single-cell sequencing, mRNA sequencing, and cell transplantation into shiverer mice. RESULTS The proportion of PDGFR-α+ cells in the negative cell population was higher than that in the positive cell population. The migration ability of the NG2+/-, A2B5+/- cells was inversely proportional to their myelination ability. The migration, proliferation, and myelination capacities of the negative cell population were stronger than those of the positive cell population. The migration and proliferation ability of the four groups of cells, from high to low, was: A2B5- > NG2- > NG2+ > A2B5+. The content of PDGFR-α+ cells and the differentiation ability, from high to low, was: NG2- > A2B5- > A2B5+ > NG2+. CONCLUSION In summary, NG2+ and A2B5+ cells have poor myelination ability due to low levels of PDGFR-α+ cells. Therefore, hOPCs with a higher content of PDGFR-α+ cells may have a better effect in the cell transplantation treatment of demyelinating diseases.
Affiliation(s)
- Haipeng Zhou
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, 510515, China
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Ying He
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, 510515, China
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Zhaoyan Wang
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Qian Wang
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Caiyan Hu
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Xiaohua Wang
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, 510515, China
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Siliang Lu
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Ke Li
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China
| | - Yinxiang Yang
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China.
| | - Zuo Luan
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, 510515, China.
- The Sixth Medical Centre of PLA General Hospital, Beijing, 100048, China.
| |
|
18
|
Chen L, Li Z, Zeng T, Zhang YH, Li H, Huang T, Cai YD. Predicting gene phenotype by multi-label multi-class model based on essential functional features. Mol Genet Genomics 2021; 296:905-918. [PMID: 33914130 DOI: 10.1007/s00438-021-01789-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Accepted: 04/13/2021] [Indexed: 12/19/2022]
Abstract
Phenotype is one of the most significant concepts in genetics and describes all the observable characteristics of a research object. Because phenotype reflects the integrated features of genotype and environmental factors, it is hard to define phenotype characteristics and even harder to predict unknown phenotypes. Restricted by current biological techniques, it is still quite expensive and time-consuming to obtain sufficient structural information on large-scale phenotype-associated genes/proteins. Various bioinformatics methods have been presented to solve this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complicated inner structures for phenotype prediction. To further address this issue and improve the efficacy of phenotype prediction on more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene-gene network. All these features are analyzed by feature selection methods (Boruta, minimum redundancy maximum relevance) to generate a feature list. This list is fed into incremental feature selection, incorporating multi-label classifiers built by RAkEL and some classic base classifiers, to build an optimum multi-label multi-class classification model for phenotype prediction. According to recent research, our method has identified many literature-supported genes/proteins and their associated phenotypes, and even some candidate genes with re-assigned new phenotypes, which provides a new computational tool for accurate and effective phenotypic prediction.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.,College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
| | - Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Yu-Hang Zhang
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Hao Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, People's Republic of China
| | - Tao Huang
- Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.
| |
|
19
|
Qin X, Liu M, Zhang L, Liu G. Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms. Comput Biol Chem 2021; 91:107456. [PMID: 33610129 DOI: 10.1016/j.compbiolchem.2021.107456] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2020] [Revised: 01/04/2021] [Accepted: 02/06/2021] [Indexed: 11/18/2022]
Abstract
Understanding protein function is conducive to research in advanced fields such as gene therapy of diseases and the development and design of new drugs. The prerequisite for understanding the function of a protein is to determine its tertiary structure. Protein structure classification is indispensable for this problem, and fold recognition is a commonly used method of protein structure classification. In the current work, protein sequences of 40% identity in the ASTRAL protein classification database are used for fold recognition research to predict 27 folding types, which mostly belong to four protein structural classes: α, β, α/β and α + β. We extract features from the primary structure of proteins using methods covering DSSP, PSSM and HMM, which are based on secondary structure and evolutionary information, to convert protein sequences into feature vectors that can be recognized by machine learning algorithms. We then combine the LightGBM feature selection algorithm with the incremental feature selection (IFS) method to find the optimal classifiers constructed by tree-based machine learning algorithms, including Random Forest, XGBoost and LightGBM. The Bayesian optimization method is used for hyper-parameter adjustment of the machine learning algorithms, bringing the accuracy of fold recognition to as high as 93.45%. The result obtained by the model we propose is outstanding in the study of protein fold recognition.
Affiliation(s)
- Xinyi Qin
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Lu Zhang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Guangzhong Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| |
|
20
|
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open
Abstract
Osteoarthritis (OA) is one of the most prevalent chronic joint diseases among middle-aged and elderly people. In recent years, however, the number of young people suffering from the disease has increased quickly. Osteoarthritis is known to be a common degenerative disease caused by the combination and interaction of many factors, such as natural and environmental factors. DNA methylation reflects the effects of environmental factors. Several studies of DNA methylation at specific genes in OA cartilage have indicated the great potential roles of DNA methylation in OA. To systematically investigate the methylation pattern in knee and hip osteoarthritis, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples and 62 OA knee samples. Twelve discriminative methylation sites were identified using advanced minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. The SVM classifier built on these 12 methylation sites, from genes such as MEIS1, GABRG3, RXRA, and EN1, can perfectly classify the OA hip samples, control hip samples and OA knee samples when evaluated with LOOCV (Leave-One-Out Cross-Validation). These 12 methylation sites can not only serve as biomarkers but also shed light on the underlying mechanism of OA.
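The combination of incremental feature selection (IFS) and leave-one-out cross-validation (LOOCV) named in this abstract can be sketched in a few lines. The sketch assumes the features have already been ranked (for example by mRMR) and swaps the SVM for a simple nearest-centroid classifier so that the example needs no third-party library; the function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def nearest_centroid_predict(train_X: List[List[float]], train_y: List[str],
                             x: Sequence[float]) -> str:
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = "", float("inf")
    for label, acc in sums.items():
        dist = sum((acc[j] / counts[label] - x[j]) ** 2 for j in range(len(x)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X: List[List[float]], y: List[str]) -> float:
    """Hold out each sample once, train on the rest, and report mean accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X: List[List[float]], y: List[str],
                                  ranked: List[int]) -> Tuple[List[int], float]:
    """Evaluate the top-1, top-2, ... ranked features; keep the first best subset."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        columns = ranked[:k]
        subset = [[row[c] for c in columns] for row in X]
        acc = loocv_accuracy(subset, y)
        if acc > best_acc:  # strict '>' prefers the smaller subset on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc
```

Preferring the smallest subset on ties mirrors the usual motivation for IFS, which is a compact signature such as the 12 sites reported here.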
Affiliation(s)
- Zhen Wu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Lu Shou
- Department of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Jian Wang
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Xinwei Xu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| |
|
21
|
Zhang L, Liu M, Qin X, Liu G. Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:8858489. [PMID: 33224267 PMCID: PMC7673955 DOI: 10.1155/2020/8858489] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Revised: 09/25/2020] [Accepted: 10/24/2020] [Indexed: 01/08/2023]
Abstract
Succinylation is an important posttranslational modification of proteins, which plays a key role in protein conformation regulation and cellular function control. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which utilizes the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier, to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the combination of the LightGBM feature selection method and the IFS method is used to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computation load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results reveal that the IFS-LightGBM (BO)-based prediction model performs better when it is evaluated by some common metrics, such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure.
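The evaluation metrics listed at the end of this abstract all follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions of this illustration, not details taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F-measure and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # MCC stays informative on imbalanced data, which is common for
    # modification-site prediction where negatives outnumber positives.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

A perfect predictor yields 1.0 on every metric, while a predictor that is right exactly half the time in each class yields an MCC of 0.0.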
Affiliation(s)
- Lu Zhang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
| | - Xinyi Qin
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
| | - Guangzhong Liu
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
| |
|
22
|
Li S, Jiang L, Tang J, Gao N, Guo F. Kernel Fusion Method for Detecting Cancer Subtypes via Selecting Relevant Expression Data. Front Genet 2020; 11:979. [PMID: 33133130 PMCID: PMC7511763 DOI: 10.3389/fgene.2020.00979] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 08/03/2020] [Indexed: 12/19/2022] Open
Abstract
Recently, cancer has been characterized as a heterogeneous disease composed of many different subtypes. Early diagnosis of cancer subtypes is an important topic in cancer research and can be of tremendous help to patients after treatment. In this paper, we first extract a novel dataset, which contains gene expression, miRNA expression, and isoform expression of five cancers from The Cancer Genome Atlas (TCGA). Next, to avoid the effect of noise existing in 60,483 genes, we select a small number of genes by using LASSO, which employs gene expression and survival time of patients. Then, we construct one similarity kernel for each expression data type by using Chebyshev distance. We also used SKF to fuse the three similarity matrices built from gene, isoform, and miRNA expression, and finally clustered the fused similarity matrix with spectral clustering. In the experimental results, our method achieves better P-values in the Cox model than other methods on 10 cancer datasets from the Jiang Dataset and the Novel Dataset. We have drawn different survival curves for different cancers and found that some genes play a key role in cancer. For breast cancer, we find that HSPA2A, RNASE1, CLIC6, and IFITM1 are highly expressed in some specific groups. For lung cancer, we find that C4BPA, SESN3, and IRS1 are highly expressed in some specific groups. The code and all supporting data files are available from https://github.com/guofei-tju/Uncovering-Cancer-Subtypes-via-LASSO.
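A similarity kernel built from Chebyshev distance, as mentioned in this abstract, can be sketched as below. The Chebyshev distance itself is standard (the largest absolute coordinate difference), but the mapping from distance to similarity is not given in the abstract; the exponential form and the `scale` parameter here are assumptions chosen for illustration. The kernel fusion step (SKF) is not reproduced.

```python
import math
from typing import List, Sequence


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference between matching coordinates."""
    return max(abs(x - y) for x, y in zip(a, b))


def similarity_kernel(samples: List[Sequence[float]], scale: float = 1.0) -> List[List[float]]:
    """Pairwise similarity exp(-d / scale), where d is the Chebyshev distance.

    The exponential mapping is an assumed choice, not the paper's formula.
    """
    n = len(samples)
    return [
        [math.exp(-chebyshev(samples[i], samples[j]) / scale) for j in range(n)]
        for i in range(n)
    ]


# Three made-up expression profiles over three selected genes.
profiles = [[1.0, 5.0, 2.0], [4.0, 3.0, 2.0], [1.0, 5.0, 2.5]]
kernel = similarity_kernel(profiles, scale=2.0)
```

One such matrix would be built per data type (gene, isoform, miRNA) over the same patients, so that the matrices share dimensions before any fusion step.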
Affiliation(s)
- Shuhao Li
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
| | - Nan Gao
- School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
|
23
|
Ren X, Wang S, Huang T. Decipher the connections between proteins and phenotypes. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140503. [PMID: 32707349 DOI: 10.1016/j.bbapap.2020.140503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
As the outward-most representation of life, phenotype is the fundamental basis on which humans understand life and disease. But with the advent of molecular and sequencing techniques, a growing portion of scientific research focuses primarily on the molecular level of life. Our understanding of molecular variations and mechanisms can only be fully utilized when it is translated to the phenotypic level. In this study, we constructed a similarity network for phenotype ontology and then applied network analysis methods to discover phenotype/disease clusters. Then, we used machine learning models to predict protein-phenotype associations. Each protein was characterized by the functional profiles of its interaction neighbors on the protein-protein interaction network. Our methods can not only predict protein-phenotype associations, but also reveal the underlying mechanisms from protein to phenotype.
Affiliation(s)
- Xiaohui Ren
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Steven Wang
- Department of Molecular Biology, Columbia University, New York, USA
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
| |
Collapse
|
24
|
Pan X, Zeng T, Zhang YH, Chen L, Feng K, Huang T, Cai YD. Investigation and Prediction of Human Interactome Based on Quantitative Features. Front Bioeng Biotechnol 2020; 8:730. [PMID: 32766217 PMCID: PMC7379396 DOI: 10.3389/fbioe.2020.00730] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/19/2020] [Accepted: 06/09/2020] [Indexed: 01/27/2023] Open
Abstract
Proteins are among the most significant components of all living creatures. All significant and essential biological structures and functions rely on proteins and their respective biological functions. However, proteins cannot perform their biological roles independently. They have to interact with each other to realize the complicated biological processes in all living creatures, including human beings. In other words, proteins depend on protein-protein interactions (PPIs) to realize their significant effects. Thus, the relative significance and quantitative contribution of candidate PPI features urgently need to be determined. According to previous studies, 258 physical and chemical characteristics of proteins have been reported and confirmed to definitively affect the interaction efficiency of the related proteins. Among such features, essential physicochemical features of proteins such as stoichiometric balance, protein abundance, molecular weight, and charge distribution have been validated to be quite significant and irreplaceable for PPIs. Therefore, in this study, we presented a novel computational framework to identify the key factors affecting PPIs with Boruta feature selection (BFS), Monte Carlo feature selection (MCFS), and incremental feature selection (IFS), and we built a quantitative decision-rule system to evaluate potential PPIs under real conditions with random forest (RF) and RIPPER algorithms, thereby supplying several new insights into the detailed biological mechanisms of complicated PPIs. The main datasets and codes can be downloaded at https://github.com/xypan1232/Mass-PPI.
Affiliation(s)
- Xiaoyong Pan
- School of Life Sciences, Shanghai University, Shanghai, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
25
|
A Novel Deep Learning Approach for Tropical Cyclone Track Prediction Based on Auto-Encoder and Gated Recurrent Unit Networks. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10113965] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]
Abstract
Under global climate change, the frequency of typhoons and their strong wind, heavy rain, and storm surge increase, seriously threatening the life and property of human society. However, traditional tropical cyclone track prediction methods have difficulties in processing large amounts of complex data in terms of prediction efficiency and accuracy. Recently, deep learning methods have shown a potential capability to process complex data efficiently and accurately. In this paper, we propose a novel data-driven approach based on auto-encoder (AE) and gated recurrent unit (GRU) models to forecast tropical cyclone landing locations using the historical tropical cyclone tracks and various meteorological attributes. This approach fuses a data preprocessing layer, an AE layer, and a GRU layer with a customized batch process. The model is trained on a real-world tropical cyclone dataset from the years 1945–2017. Through a comparison with existing forecasting methods, the results verified that our proposed model performed around 15%, 42%, and 56% better than the Numerical Weather Prediction model (NWP) in 24, 48, and 72 h forecasts, and 27%, 13%, 17%, and 17% better than RNN, AE-RNN, GRU, and LSTM, respectively, in 24 h forecasts, using the absolute position error. In addition, a comparison of the meteorological variables indicated that the variable maximum sustained wind speed had the most significant effect on tropical cyclone track prediction.
|
26
|
Li M, Chen F, Zhang Y, Xiong Y, Li Q, Huang H. Identification of Post-myocardial Infarction Blood Expression Signatures Using Multiple Feature Selection Strategies. Front Physiol 2020; 11:483. [PMID: 32581823 PMCID: PMC7287215 DOI: 10.3389/fphys.2020.00483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/06/2020] [Accepted: 04/20/2020] [Indexed: 12/24/2022] Open
Abstract
Myocardial infarction (MI) is a type of serious heart attack in which the blood flow to the heart is suddenly interrupted, resulting in injury to the heart muscles due to a lack of oxygen supply. Although clinical diagnosis methods can be used to identify the occurrence of MI, using the changes of molecular markers or characteristic molecules in blood to characterize the early phase and later trend of MI will help us choose a more reasonable treatment plan. Previously, comparative transcriptome studies focused on finding differentially expressed genes between MI patients and healthy people. However, signature molecules altered in different phases of MI have not been well explored. We developed a set of computational approaches integrating multiple machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and support vector machine (SVM), to identify gene expression characteristics in different phases of MI. A total of 134 genes were selected as features for building optimal SVM classifiers to distinguish acute MI from post-MI. Subsequently, functional enrichment analyses followed by protein-protein interaction analysis of the 134 genes identified several hub genes (IL1R1, TLR2, and TLR4) associated with progression of MI, which can be used as new diagnostic molecules for MI.
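The incremental feature selection (IFS) step named in this abstract is commonly implemented as a search over growing prefixes of a ranked feature list. The sketch below shows that general idea only; it is not the authors' code. The ranking would come from a method such as MCFS, and `score_subset` stands in for whatever evaluation is used (for example, cross-validated SVM performance). The gene names and the toy scoring function are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Evaluate the top-1, top-2, ..., top-n ranked features and keep the
    prefix with the highest score. Ties favor the smaller subset."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = score_subset(subset)
        if score > best_score:  # strict comparison keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for a classifier evaluation: two informative genes, with a
# small penalty for every feature included.
informative = {"g1", "g2"}


def toy_score(subset: List[str]) -> float:
    return len(informative & set(subset)) - 0.1 * len(subset)


selected, score = incremental_feature_selection(["g1", "g2", "g3", "g4"], toy_score)
```

In practice each call to `score_subset` trains and evaluates a classifier, so the loop is often run with a step size larger than one to limit cost.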
Affiliation(s)
- Ming Li
- Department of Cardiology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Fuli Chen
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Yaling Zhang
- Department of Nephrology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Yan Xiong
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Qiyong Li
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
- Hui Huang
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
|
27
|
Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z, Chen E. The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front Genet 2020; 11:17. [PMID: 32117436 PMCID: PMC7010953 DOI: 10.3389/fgene.2020.00017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/11/2019] [Accepted: 01/07/2020] [Indexed: 12/11/2022] Open
Abstract
Lung cancer is a common malignant cancer. Kirsten rat sarcoma oncogene (KRAS) mutations have been considered a key driver of lung cancers. KRAS p.G12C mutations are the most predominant in non-small cell lung cancer (NSCLC), occurring in about 11–16% of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant KRAS). However, it is still not clear how KRAS mutation triggers lung cancer. To study the molecular mechanisms of KRAS mutation in lung cancer, we analyzed the gene expression profiles of 156 KRAS mutation samples and other negative samples with a two-stage feature selection approach: (1) minimal Redundancy Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS). Finally, 41 predictive genes for KRAS mutation were identified and a KRAS mutation predictor was constructed. Its leave-one-out cross-validation MCC was 0.879. Our results are helpful for understanding the roles of KRAS mutation in lung cancer.
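The reported metric, the Matthews correlation coefficient (MCC) under leave-one-out cross-validation, can be computed as in the following sketch. The MCC formula is standard; the `fit_predict` callback, the one-dimensional toy data, and the nearest-neighbor rule are illustrative assumptions and say nothing about the classifier used in the paper.

```python
import math
from typing import Callable, List, Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def loocv_mcc(
    samples: Sequence[float],
    labels: Sequence[int],
    fit_predict: Callable[[List[float], List[int], float], int],
) -> float:
    """Hold out each sample once, predict it from the rest, then compute
    MCC over the pooled predictions (labels are 0 or 1)."""
    tp = tn = fp = fn = 0
    for i in range(len(samples)):
        train_x = list(samples[:i]) + list(samples[i + 1:])
        train_y = list(labels[:i]) + list(labels[i + 1:])
        pred = fit_predict(train_x, train_y, samples[i])
        if pred == 1 and labels[i] == 1:
            tp += 1
        elif pred == 0 and labels[i] == 0:
            tn += 1
        elif pred == 1 and labels[i] == 0:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


def nearest_neighbor(train_x: List[float], train_y: List[int], x: float) -> int:
    """Toy classifier: copy the label of the closest training sample."""
    idx = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[idx]


toy_x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_y = [0, 0, 0, 1, 1, 1]
result = loocv_mcc(toy_x, toy_y, nearest_neighbor)
```

MCC ranges from -1 to 1 and stays informative when the two classes differ greatly in size, which is one reason it is often preferred to accuracy for mutation-status prediction.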
Affiliation(s)
- Jisong Zhang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- Huihui Hu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- Shan Xu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- Hanliang Jiang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- Jihong Zhu
- Department of Anesthesiology, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- E Qin
- Department of Respiratory Medicine, Shaoxing People's Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing, China
- Zhengfu He
- Department of Thoracic Surgery, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
- Enguo Chen
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
|
28
|
Chen L, Li D, Shao Y, Wang H, Liu Y, Zhang Y. Identifying Microbiota Signature and Functional Rules Associated With Bacterial Subtypes in Human Intestine. Front Genet 2019; 10:1146. [PMID: 31803234 PMCID: PMC6872643 DOI: 10.3389/fgene.2019.01146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open
Abstract
Gut microbiomes are integral microflora located in the human intestine with particular symbiosis. Among all microorganisms in the human intestine, bacteria are the most significant subgroup, containing many unique and functional species. The distribution patterns of bacteria in the human intestine not only reflect the different microenvironments in different sections of the intestine but also indicate that bacteria may have unique biological functions corresponding to their proper regions of the intestine. However, describing the functional differences between the bacterial subgroups and their distributions in different individuals is difficult using traditional computational approaches. Here, we first introduced four effective sets of bacterial features from independent databases. We then presented a novel computational approach to identify potential distinctive features among bacterial subgroups based on a systematic gut microbiome dataset of approximately 1,500 human gut bacterial strains. We also established a group of quantitative rules for explaining such distinctions. The results may reveal the microstructural characteristics of the intestinal flora and deepen our understanding of the regulatory role of bacterial subgroups in the human intestine.
Affiliation(s)
- Lijuan Chen
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Daojie Li
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Ye Shao
- School of Medicine, Huaqiao University, Quanzhou, China
- Hui Wang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Yuqing Liu
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
- Yunhua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
|
29
|
Le NQK, Yapp EKY, Nagasundaram N, Chua MCH, Yeh HY. Computational identification of vesicular transport proteins from sequences using deep gated recurrent units architecture. Comput Struct Biotechnol J 2019; 17:1245-1254. [PMID: 31921391 PMCID: PMC6944713 DOI: 10.1016/j.csbj.2019.09.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 05/06/2019] [Revised: 09/07/2019] [Accepted: 09/11/2019] [Indexed: 11/20/2022] Open
Abstract
Protein function prediction is one of the most well-studied topics, attracting attention from countless researchers in the field of computational biology. Implementing deep neural networks that help improve the prediction of protein function, however, is still a major challenge. In this research, we proposed a new strategy that combines gated recurrent units and position-specific scoring matrix profiles to predict vesicular transport proteins, which carry out a biological function of great importance. Although this function is difficult to identify, our model achieves accuracies of 82.3% and 85.8% on the cross-validation and independent datasets, respectively. We also address the problem of imbalance in the dataset by tuning the class weight in the deep learning model. The results show sensitivity, specificity, MCC, and AUC values of 79.2%, 82.9%, 0.52, and 0.861, respectively. Our strategy outperforms all other state-of-the-art algorithms on the same dataset. This work provides a technique for discovering more proteins, particularly those connected with vesicular transport. In addition, our results could encourage the use of the gated recurrent unit architecture in protein function prediction.
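The abstract mentions tuning the class weight to handle dataset imbalance but does not give the weights used. A common starting point, shown here purely as an assumption, is inverse-frequency weighting, where each class receives the weight n / (k * n_c) for n samples, k classes, and n_c samples in class c. The label counts below are made up for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent classes get weights
    below 1, so every class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 20 positive and 80 negative proteins.
toy_labels = [1] * 20 + [0] * 80
weights = balanced_class_weights(toy_labels)
```

Weights of this form are typically passed to the training loss and then adjusted by hand or by grid search, which matches the "tuning" the abstract refers to only in spirit.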
Affiliation(s)
- Nguyen Quoc Khanh Le
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Edward Kien Yee Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, #08-04, Innovis, 138634, Singapore
- N. Nagasundaram
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
- Matthew Chin Heng Chua
- Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, 119615, Singapore
- Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore
|
30
|
Zhang GL, Pan LL, Huang T, Wang JH. The transcriptome difference between colorectal tumor and normal tissues revealed by single-cell sequencing. J Cancer 2019; 10:5883-5890. [PMID: 31737124 PMCID: PMC6843882 DOI: 10.7150/jca.32267] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/14/2018] [Accepted: 06/17/2019] [Indexed: 12/29/2022] Open
Abstract
Previous cancer studies were difficult to reproduce because the tumor tissues were analyzed directly, yet tumor tissues are actually a mixture of different cancer cells. The single-cell transcriptome is much more robust than the transcriptome of a mixed tissue and has much smaller variance. In this study, we analyzed the single-cell transcriptomes of 272 colorectal cancer (CRC) epithelial cells and 160 normal epithelial cells and identified 342 discriminative transcripts using advanced machine learning methods. The most discriminative transcripts were LGALS4, PHGR1, C15orf48, HEPACAM2, PERP, FABP1, FCGBP, MT1G, TSPAN1, and CKB. We further clustered the 342 transcripts into two categories. The upregulated transcripts in CRC epithelial cells were significantly enriched in the Ribosome, Protein processing in endoplasmic reticulum, Antigen processing and presentation, and p53 signaling pathways. The downregulated transcripts in CRC epithelial cells were significantly enriched in the Mineral absorption, Aldosterone-regulated sodium reabsorption, and Oxidative phosphorylation pathways. The biological analysis of the discriminative transcripts revealed a possible mechanism of colorectal cancer.
Affiliation(s)
- Guo-Liang Zhang
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Le-Lin Pan
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jin-Hai Wang
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
|
31
|
Identifying Methylation Pattern and Genes Associated with Breast Cancer Subtypes. Int J Mol Sci 2019; 20:4269. [PMID: 31480430 PMCID: PMC6747348 DOI: 10.3390/ijms20174269] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/01/2019] [Revised: 08/19/2019] [Accepted: 08/29/2019] [Indexed: 12/18/2022] Open
Abstract
Breast cancer is regarded worldwide as a severe human disease. Various genetic variations, including hereditary and somatic mutations, contribute to the initiation and progression of this disease. The diagnostic parameters of breast cancer are not limited to the conventional protein content and can include newly discovered genetic variants and even genetic modification patterns such as methylation and microRNA. In addition, breast cancer detection extends to detailed breast cancer stratifications to provide subtype-specific indications for further personalized treatment. One genome-wide expression–methylation quantitative trait loci analysis confirmed that different breast cancer subtypes have various methylation patterns. However, recognizing clinically applied (methylation) biomarkers is difficult due to the large number of differentially methylated genes. In this study, we attempted to re-screen a small group of functional biomarkers for the identification and distinction of different breast cancer subtypes with advanced machine learning methods. The findings may contribute to biomarker identification for different breast cancer subtypes and provide a new perspective for differential pathogenesis in breast cancer subtypes.
|
32
|
Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms. Gene Ther 2019; 26:465-478. [PMID: 31455874 DOI: 10.1038/s41434-019-0099-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/28/2019] [Revised: 06/18/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022]
Abstract
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying genes related to OC is an important way of achieving this purpose. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked learning procedures, each proposed model adopted a one-class learning algorithm, which can provide deep insight into the features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce the representation of genes. The features of the OC-related genes were used to train the one-class algorithm, and the performance of the final inferring model was improved through a feature selection procedure. Then, candidate genes were produced by applying the trained inferring model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, any two of which were different. The inferred genes also differed from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
|