1
Pukhta IR, Rout RK. Identification and segregation of genes with improved recurrent neural network trained with optimal gene level and mutation level features. Comput Methods Biomech Biomed Engin 2024:1-16. [PMID: 38424698 DOI: 10.1080/10255842.2024.2311322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/20/2024] [Indexed: 03/02/2024]
Abstract
Even though many different approaches have been employed to address the complex mutational heterogeneity of cancer, finding driver genes is still problematic because other genomic factors cannot be fully integrated into combined analyses. This research paper presents a novel gene identification and segregation model with five key processes: (a) pre-processing, (b) treatment of class imbalance, (c) feature extraction, (d) feature selection, and (e) gene classification. To increase data quality, the initial data are first pre-processed using data cleaning and data normalization, which turns the raw data into a useful and effective form. In practice, the sample is skewed against drivers, because driver mutations appear in proportionally fewer instances than passenger mutations do. To address this class imbalance problem, improved K-Means + SMOTE is applied to the pre-processed data. The most important characteristics, including those at the gene and mutation levels, are then extracted from the balanced dataset. To reduce computation time, the best features among those extracted are selected using Forensic interpretation tailored hunger food search optimization (FIHFSO). These optimal features are used to train the deep learning classifier that performs the segregation. In this research, an Improved Recurrent Neural Network (I-RNN) makes the final decision about the genes. At a learning percentage of 90%, the proposed method achieves an accuracy of 0.98, compared with 0.83, 0.81, 0.65, 0.80, 0.92 and 0.63 for HGS, FBIO, AOA, AO, GOA and PRO, respectively.
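The normalization and class-imbalance steps can be illustrated with a minimal Python sketch: min-max normalization followed by SMOTE-style oversampling of the minority (driver) class. This is a generic illustration, not the authors' improved K-Means + SMOTE. The K-Means stage, which would decide in which clusters to oversample, is omitted, and the function names, the neighbour count `k` and the toy data are assumptions.

```python
import random
from typing import List

Vector = List[float]


def min_max_normalize(rows: List[Vector]) -> List[Vector]:
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 3, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy usage: three minority ("driver") rows plus five synthetic rows.
drivers = min_max_normalize([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
balanced_drivers = drivers + smote(drivers, n_new=5)
```

Because every synthetic row is an interpolation between two real minority rows, it stays inside their bounding box. A production pipeline would more likely use a maintained implementation such as imbalanced-learn's `KMeansSMOTE`.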
Affiliation(s)
- Irfan Rashid Pukhta
- Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India
- Ranjeet Kumar Rout
- Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India
2
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024]
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or on a gold standard for comparison. Hence, benchmarking results for the different tools depend highly on the input data, indicating that overfitting is still a massive problem. One solution is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While knowledge of cancer development increases daily, many bioinformatic pipelines consider single nucleotide variants or alterations in a vacuum, without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid silo research, moving towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
3
Lu X, Chen G, Li J, Hu X, Sun F. MAGCN: A Multiple Attention Graph Convolution Networks for Predicting Synthetic Lethality. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2681-2689. [PMID: 36374879 DOI: 10.1109/tcbb.2022.3221736] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Synthetic lethality (SL) is a promising strategy for cancer therapy and drug discovery. Computational approaches to identifying synthetic lethal genes have become an effective complement to wet-lab experiments, which are time-consuming and costly. Graph convolutional networks (GCNs) have been applied to this prediction task because they are good at capturing neighborhood dependencies in a graph. However, a mechanism for aggregating complementary neighboring information from multiple heterogeneous graphs is still lacking. Here, we propose Multiple Attention Graph Convolution Networks for predicting synthetic lethality (MAGCN). First, we obtain functional similarity features and topological structure features of genes from different data sources, such as Gene Ontology data and protein-protein interactions. Then, a graph convolutional network accumulates knowledge from neighboring nodes according to synthetic lethal associations. Meanwhile, we propose a multiple-graph attention model and construct a multiple-graph attention network that learns the contribution of each graph and generates an embedded representation by aggregating these graphs. Finally, the generated feature matrix is decoded to predict potential synthetic lethal interactions. Experimental results show that MAGCN outperforms other baseline methods, and a case study demonstrates its ability to predict human SL gene pairs.
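The two ingredients named in the abstract, neighbourhood aggregation by a GCN and attention-weighted fusion of several graphs, can be sketched in plain Python. The sketch uses the standard symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) and fuses per-graph embeddings with softmax weights. It does not reproduce MAGCN itself: the attention `scores` are fixed inputs here rather than learned, no decoder is included, and all names and toy matrices are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


def attention_fuse(embeddings: List[Matrix], scores: List[float]) -> Matrix:
    """Weighted sum of per-graph embeddings, weights = softmax(scores)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    alphas = [e / sum(exps) for e in exps]
    rows, cols = len(embeddings[0]), len(embeddings[0][0])
    return [[sum(a * emb[i][j] for a, emb in zip(alphas, embeddings)) for j in range(cols)]
            for i in range(rows)]


# Toy usage: two hypothetical gene graphs (e.g. GO similarity, PPI) over three genes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.4]]
go_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ppi_graph = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
fused = attention_fuse(
    [gcn_layer(go_graph, features, weights), gcn_layer(ppi_graph, features, weights)],
    scores=[0.3, 0.7],
)
```

In a trained model the weight matrix `W` and the attention scores would be learned by gradient descent; here they are constants so that the data flow stays visible.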
4
Liu P, Liu C, Mao Y, Guo J, Liu F, Cai W, Zhao F. Identification of essential proteins based on edge features and the fusion of multiple-source biological information. BMC Bioinformatics 2023; 24:203. [PMID: 37198530 DOI: 10.1186/s12859-023-05315-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Accepted: 04/30/2023] [Indexed: 05/19/2023]
Abstract
BACKGROUND A major current focus in the analysis of protein-protein interaction (PPI) data is how to identify essential proteins. The availability of massive PPI data calls for efficient computational methods for identifying essential proteins. Previous studies have achieved considerable performance; however, because PPI data are noisy and structurally complex, further improving the performance of identification methods remains a challenge. METHODS This paper proposes an identification method, named CTF, which identifies essential proteins based on edge features, including h-quasi-cliques and uv-triangle graphs, and on the fusion of multiple-source information. We first design an edge-weight function, named EWCT, for computing the topological scores of proteins based on quasi-cliques and triangle graphs. Then, we generate an edge-weighted PPI network using EWCT and dynamic PPI data. Finally, we compute the essentiality of proteins by fusing the topological scores with three scores derived from biological information. RESULTS We evaluated the performance of CTF by comparison with 16 other methods, such as MON, PeC, TEGS, and LBCC. Experimental results on three datasets of Saccharomyces cerevisiae show that CTF outperforms the state-of-the-art methods. Moreover, our results indicate that fusing additional biological information helps improve identification accuracy.
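Triangle-based edge weighting, one of the ideas behind EWCT, can be illustrated with a simplified sketch. The abstract does not give the EWCT formula (which also involves h-quasi-cliques and dynamic PPI data), so this sketch substitutes a normalized triangle count per edge, similar in spirit to the edge clustering coefficient, and sums incident edge weights into a per-protein topological score. All names and the toy network are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]
Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Edge]) -> Adjacency:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def triangle_edge_weights(adj: Adjacency) -> Dict[Edge, float]:
    """Weight edge (u, v) by the triangles it closes (shared neighbours),
    divided by the most it could close, min(deg u, deg v) - 1."""
    weights: Dict[Edge, float] = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                shared = len(adj[u] & adj[v])
                possible = min(len(adj[u]), len(adj[v])) - 1
                weights[(u, v)] = shared / possible if possible > 0 else 0.0
    return weights


def topological_scores(adj: Adjacency, weights: Dict[Edge, float]) -> Dict[str, float]:
    """Score each protein by summing the weights of its incident edges."""
    scores = {node: 0.0 for node in adj}
    for (u, v), w in weights.items():
        scores[u] += w
        scores[v] += w
    return scores


# Toy usage: a triangle A-B-C with a pendant protein D attached to C.
ppi = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
scores = topological_scores(ppi, triangle_edge_weights(ppi))
```

Edges embedded in dense, triangle-rich neighbourhoods receive high weights, while the pendant edge C-D closes no triangle and contributes nothing; CTF then fuses such topological scores with biological evidence.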
Affiliation(s)
- Peiqiang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Chang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Yanyan Mao
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, China
- Junhong Guo
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Fanshu Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Wangmin Cai
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Feng Zhao
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
5
Han S, Wang N, Guo Y, Tang F, Xu L, Ju Y, Shi L. Application of Sparse Representation in Bioinformatics. Front Genet 2021; 12:810875. [PMID: 34976030 PMCID: PMC8715914 DOI: 10.3389/fgene.2021.810875] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/08/2021] [Accepted: 12/01/2021] [Indexed: 11/15/2022]
Abstract
Inspired by L1-norm minimization methods such as basis pursuit, compressed sensing, and Lasso feature selection, sparse representation has in recent years emerged as a novel and powerful data processing method. Researchers have not only extended the sparse representation of signals to image representation, but also generalized the sparsity of vectors to that of matrices. Sparse representation has also been applied to pattern recognition with good results. Because of its advantages, such as insensitivity to noise, strong robustness, reduced sensitivity to the selected features, and no “overfitting” phenomenon, the application of sparse representation in bioinformatics merits further study. This article reviews the development of sparse representation and explains its applications in bioinformatics, namely the use of low-rank representation matrices to identify and study cancer molecules and of low-rank sparse representations to analyze and process gene expression profiles, and introduces related cancer and gene expression profile databases.
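The L1-regularized problem behind Lasso-style sparse representation, minimize 0.5·||y − Dx||² + λ·||x||₁, can be solved with the iterative shrinkage-thresholding algorithm (ISTA). The sketch below is a textbook ISTA in plain Python, offered to make the "sparsity via L1 penalty" idea concrete; it is not a method from the reviewed article, and the toy dictionary `D`, signal and λ are assumptions.

```python
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(D: List[List[float]], y: List[float], lam: float, step: float,
         iters: int = 200) -> List[float]:
    """Minimise 0.5*||y - D x||^2 + lam*||x||_1 by iterative soft-thresholding.

    `step` should be at most 1 / L, where L is the largest eigenvalue of D^T D.
    """
    m, n = len(D), len(D[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(D[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(D[i][j] * residual[i] for i in range(m)) for j in range(n)]  # D^T r
        # gradient step on the smooth term, then shrinkage for the L1 term
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# Toy usage: an over-complete 2 x 3 dictionary; largest eigenvalue of D^T D is 3.
D = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
code = ista(D, [1.0, 1.0], lam=0.1, step=1.0 / 3.0)
```

With `D` equal to the identity, ISTA reduces to a single soft-thresholding of `y`, which is the clearest way to see how the L1 penalty drives small coefficients exactly to zero.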
Affiliation(s)
- Shuguang Han
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Ning Wang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Furong Tang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- *Correspondence: Ying Ju; Lei Shi
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- *Correspondence: Ying Ju; Lei Shi
6
Lu X, Liu F, Miao Q, Liu P, Gao Y, He K. A novel method to identify gene interaction patterns. BMC Genomics 2021; 22:436. [PMID: 34112093 PMCID: PMC8194229 DOI: 10.1186/s12864-021-07628-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/15/2020] [Accepted: 04/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene interaction patterns, including modules and motifs, can be used to identify cancer-specific biomarkers and to reveal the mechanisms of tumorigenesis. Most existing module network inference methods focus on independent functional patterns of genes, while studies of the overlapping characteristics between modules are lacking. The objective of this study was to reveal functional overlapping patterns in gene modules, helping to elucidate the regulatory relationships between overlapping genes and communities, and to explore cancer formation and progression. RESULTS We analyzed six cancer datasets from The Cancer Genome Atlas and obtained three kinds of gene functional modules for each cancer: Independent-Community, Dependent-Community and Merged-Community. Across the six cancers, 59 (3.5%) Independent-Communities and 1631 (96.5%) Dependent-Communities were identified. Compared with Lemon-Tree and K-Means, the gene communities identified by our method were enriched in more known GO categories, with lower p-values. Meanwhile, the identified distinguishing communities significantly separated patients by survival prognosis in Kaplan-Meier analysis. Furthermore, driver genes identified in the gene communities can serve as biomarkers that accurately distinguish tumour from normal samples for each cancer type. CONCLUSIONS Dependent-Communities make up the majority of all identified communities. Our method is more effective than the other two methods, which do not consider the overlapping characteristics of modules. This indicates that overlapping genes are located in different specific functional groups and establish a communication bridge between communities, giving a more comprehensive picture of carcinogenesis.
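The survival comparison mentioned above relies on the Kaplan-Meier estimator, S(t) = ∏ over event times tᵢ ≤ t of (1 − dᵢ / nᵢ), where dᵢ is the number of events at tᵢ and nᵢ the number of patients still at risk. The plain-Python sketch below implements that estimator for illustration only; the patient groups and follow-up times are invented toy data, and the significance test usually used to compare groups (e.g. a log-rank test) is not shown.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier estimator.

    events[i] is 1 if the event (e.g. death) was observed at times[i],
    0 if that patient was censored at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve


# Toy usage: hypothetical patient groups split by a community signature.
high_risk = kaplan_meier([3, 5, 6, 8], [1, 1, 0, 1])
low_risk = kaplan_meier([7, 9, 12, 15], [0, 1, 0, 1])
```

Censored patients do not lower the survival estimate directly; they only shrink the risk set for later event times, which is what distinguishes Kaplan-Meier from a naive proportion of survivors. Maintained implementations such as `lifelines.KaplanMeierFitter` add confidence intervals.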
Affiliation(s)
- Xinguo Lu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Fang Liu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Qiumai Miao
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Ping Liu
- Hunan Want Want Hospital, Renmin Zhong Road, Changsha, 410006, China
- Yan Gao
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Keren He
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
7
Meng Z, Kuang L, Chen Z, Zhang Z, Tan Y, Li X, Wang L. Method for Essential Protein Prediction Based on a Novel Weighted Protein-Domain Interaction Network. Front Genet 2021; 12:645932. [PMID: 33815480 PMCID: PMC8010314 DOI: 10.3389/fgene.2021.645932] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/24/2020] [Accepted: 02/15/2021] [Indexed: 01/04/2023]
Abstract
In recent years, a number of computational models based on protein-protein interaction (PPI) networks have been proposed. However, because of false positives, false negatives, and the incompleteness of PPI networks, designing computational models with satisfactory accuracy for inferring key proteins remains challenging. This study proposes a prediction model called WPDINM for detecting key proteins based on a novel weighted protein-domain interaction (PDI) network. In WPDINM, a weighted PPI network is first constructed by combining the gene expression data of proteins with topological information extracted from the original PPI network. Simultaneously, a weighted domain-domain interaction (DDI) network is constructed based on the original PDI network. Next, a weighted PDI network is constructed by integrating the newly obtained weighted PPI and DDI networks with the original PDI network. Then, based on topological features and biological information, including the subcellular localization and orthology information of proteins, a novel PageRank-based iterative algorithm is designed and run on the weighted PDI network to estimate the criticality of proteins. Finally, to assess the prediction performance of WPDINM, we compared it with 12 competing measures. Experimental results show that WPDINM achieves prediction accuracies of 90.19, 81.96, 70.72, 62.04, 55.83, and 51.13% in the top 1%, 5%, 10%, 15%, 20%, and 25%, respectively, exceeding those of traditional state-of-the-art competing measures. Owing to its satisfactory identification performance, WPDINM may contribute to the further development of key protein identification.
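A PageRank-style iteration of the kind WPDINM builds on can be sketched as personalized PageRank on an undirected weighted network, where the restart distribution carries per-protein biological evidence. This is a generic sketch under stated assumptions, not the WPDINM algorithm: the abstract does not give its update rule, so the damping factor `alpha`, the use of a normalized `prior` as the restart vector and all names are hypothetical.

```python
from typing import Dict, List, Tuple

WeightedEdge = Tuple[str, str, float]


def weighted_pagerank(
    edges: List[WeightedEdge],
    prior: Dict[str, float],
    alpha: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Personalised PageRank on an undirected, edge-weighted network.

    `prior` holds non-negative per-node evidence (a stand-in for biological
    information such as localisation or orthology); it is normalised into
    the restart distribution.
    """
    nodes = sorted(set(prior) | {u for u, _, _ in edges} | {v for _, v, _ in edges})
    neighbours: Dict[str, List[Tuple[str, float]]] = {n: [] for n in nodes}
    strength = {n: 0.0 for n in nodes}  # weighted degree
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))
        strength[u] += w
        strength[v] += w
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("prior must contain at least one positive score")
    restart = {n: prior.get(n, 0.0) / total for n in nodes}
    score = dict(restart)
    for _ in range(max_iter):
        # Mass held by nodes without weighted edges returns via the restart vector.
        dangling = sum(score[n] for n in nodes if strength[n] == 0)
        new = {n: (1 - alpha + alpha * dangling) * restart[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue
            for v, w in neighbours[u]:
                # u passes on its score in proportion to edge weight
                new[v] += alpha * score[u] * w / strength[u]
        converged = sum(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Toy usage: a hub protein linked to three others, uniform prior evidence.
star = [("hub", "a", 1.0), ("hub", "b", 1.0), ("hub", "c", 1.0)]
ranking = weighted_pagerank(star, prior={"hub": 1.0, "a": 1.0, "b": 1.0, "c": 1.0})
```

Each iteration is a contraction with factor `alpha`, so the scores converge and always sum to one. NetworkX's `pagerank` exposes the same idea through its `personalization` and `weight` parameters.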
Affiliation(s)
- Zixuan Meng
- College of Computer, Xiangtan University, Xiangtan, China
- Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
- Zhiping Chen
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Zhen Zhang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Xueyong Li
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
8
Nojiri N, Meng Z, Saho K, Duan Y, Uemura K, Aravinda CV, Prabhu GA, Shimakawa H, Meng L. Apathy Classification Based on Doppler Radar Image for the Elderly Person. Front Bioeng Biotechnol 2020; 8:553847. [PMID: 33224927 PMCID: PMC7670046 DOI: 10.3389/fbioe.2020.553847] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/29/2020] [Accepted: 09/30/2020] [Indexed: 11/25/2022]
Abstract
Apathy is a disease characterized by diminished motivation not attributable to a diminished level of consciousness, cognitive impairment, or emotional distress. It is a serious problem facing the elderly in today's society. Apathy must be diagnosed at a clinic, which is particularly inconvenient and difficult for elderly patients. In this work, we examine the possibility of using Doppler radar imaging to classify apathy in the elderly. We recruited 178 elderly participants to create a dataset: each filled out a questionnaire and underwent Doppler radar imaging while walking. We selected walking because it is one of the most common actions in daily life and potentially contains a variety of useful health information. We used radar imaging rather than an RGB camera because of the greater privacy protection it affords. Seven machine learning models, including our proposed neural network model, were applied to apathy classification using the walking Doppler radar images. Before classification, we perform simple image pre-processing for feature extraction: each walking Doppler radar image is divided into four parts along the vertical and horizontal axes, and after binarization the number of feature points in each part is counted to create eight features. The binarization threshold is optimized by experimentally sliding it over candidate values. Our proposed neural network achieved an accuracy of more than 75% in apathy classification. This accuracy is not as high as that of other object classification methods in current use, but as initial research in this area, it demonstrates the potential of Doppler-radar-based apathy classification for the elderly. We will examine ways of increasing the accuracy in future work.
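The feature-extraction step can be sketched in a few lines of Python, but its description admits more than one reading. The sketch assumes that "four parts on the vertical and horizontal axes" means four horizontal bands plus four vertical bands (4 + 4 = 8 features), that a "feature point" is a foreground pixel after binarization, and that threshold sliding means picking the candidate with the best validation score. These readings, the helper names and the `evaluate` callback are assumptions, not the authors' code.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def binarize(image: Image, threshold: float) -> List[List[int]]:
    """1 where the pixel intensity reaches the threshold, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def band_counts(image: Image, threshold: float, parts: int = 4) -> List[int]:
    """For parts=4, eight features: foreground counts in four horizontal
    bands, then in four vertical bands, of the binarized image."""
    binary = binarize(image, threshold)
    height, width = len(binary), len(binary[0])
    features = []
    for k in range(parts):  # horizontal bands (split along the vertical axis)
        r0, r1 = k * height // parts, (k + 1) * height // parts
        features.append(sum(sum(row) for row in binary[r0:r1]))
    for k in range(parts):  # vertical bands (split along the horizontal axis)
        c0, c1 = k * width // parts, (k + 1) * width // parts
        features.append(sum(sum(row[c0:c1]) for row in binary))
    return features


def best_threshold(candidates: Sequence[float], evaluate: Callable[[float], float]) -> float:
    """Slide over candidate thresholds and keep the best-scoring one;
    `evaluate` is a hypothetical user-supplied validation metric."""
    return max(candidates, key=evaluate)
```

In practice `evaluate` would extract features at the candidate threshold, train or score the classifier on a validation split, and return its accuracy; the eight-number feature vector is then the classifier input.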
Affiliation(s)
- Naoto Nojiri
- College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
- Zelin Meng
- College of Science and Engineering, Ritsumeikan University, Kusatsu, Japan
- Kenshi Saho
- Faculty of Engineering, Toyama Prefectural University, Imizu, Japan
- Yucong Duan
- Data Science and Technology Department, Hainan University, Haikou, China
- Kazuki Uemura
- Faculty of Engineering, Toyama Prefectural University, Imizu, Japan
- C V Aravinda
- Department of Computer Science and Engineering, NMAM Institute of Technology, NITTE, Karkala, India
- G Amar Prabhu
- Department of Computer Science and Engineering, NMAM Institute of Technology, NITTE, Karkala, India
- Hiromitsu Shimakawa
- College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
- Lin Meng
- College of Science and Engineering, Ritsumeikan University, Kusatsu, Japan