Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Cheng J, Xu Z, Wu W, Zhao L, Li X, Liu Y, Tao S. Training set selection for the prediction of essential genes. PLoS One 2014;9:e86805. [PMID: 24466248 PMCID: PMC3899339 DOI: 10.1371/journal.pone.0086805] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2013] [Accepted: 12/13/2013] [Indexed: 01/23/2023] Open

For:	Cheng J, Xu Z, Wu W, Zhao L, Li X, Liu Y, Tao S. Training set selection for the prediction of essential genes. PLoS One 2014;9:e86805. [PMID: 24466248 PMCID: PMC3899339 DOI: 10.1371/journal.pone.0086805] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2013] [Accepted: 12/13/2013] [Indexed: 01/23/2023] Open

Number

Cited by Other Article(s)

Hu W, Li M, Xiao H, Guan L. Essential genes identification model based on sequence feature map and graph convolutional neural network. BMC Genomics 2024;25:47. [PMID: 38200437 PMCID: PMC10777564 DOI: 10.1186/s12864-024-09958-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024] Open

Abstract

BACKGROUND

Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes.

RESULTS

In this study, we introduce GCNN-SFM, a computational model for identifying essential genes in organisms, based on graph convolutional neural networks (GCNN). GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from gene sequences of essential genes. Initially, the gene sequence is transformed into a feature map using coding techniques. Subsequently, a multi-layer GCN is employed to perform graph convolution operations, effectively capturing both local and global features of the gene sequence. Further feature extraction is performed, followed by integrating convolution and fully-connected layers to generate prediction results for essential genes. The gradient descent algorithm is utilized to iteratively update the cross-entropy loss function, thereby enhancing the accuracy of the prediction results. Meanwhile, model parameters are tuned to determine the optimal parameter combination that yields the best prediction performance during training.

CONCLUSIONS

Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.

Collapse

Benstead-Hume G, Wooller SK, Renaut J, Dias S, Woodbine L, Carr AM, Pearl FMG. Biological network topology features predict gene dependencies in cancer cell-lines. BIOINFORMATICS ADVANCES 2022;2:vbac084. [PMID: 36699394 PMCID: PMC9681200 DOI: 10.1093/bioadv/vbac084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/24/2022] [Revised: 10/02/2022] [Accepted: 11/08/2022] [Indexed: 11/12/2022]

Li Y, Zeng M, Wu Y, Li Y, Li M. Accurate Prediction of Human Essential Proteins Using Ensemble Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:3263-3271. [PMID: 34699365 DOI: 10.1109/tcbb.2021.3122294] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN. Cells 2022;11:cells11172648. [PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 11/25/2022] Open

Al-Anzi BF, Khajah M, Fakhraldeen SA. Predicting and explaining the impact of genetic disruptions and interactions on organismal viability. Bioinformatics 2022;38:4088-4099. [PMID: 35861390 PMCID: PMC9438956 DOI: 10.1093/bioinformatics/btac519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Revised: 06/30/2022] [Accepted: 07/20/2022] [Indexed: 12/24/2022] Open

DELEAT: gene essentiality prediction and deletion design for bacterial genome reduction. BMC Bioinformatics 2021;22:444. [PMID: 34537011 PMCID: PMC8449488 DOI: 10.1186/s12859-021-04348-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Accepted: 08/26/2021] [Indexed: 11/10/2022] Open

Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study. MATHEMATICAL AND COMPUTATIONAL APPLICATIONS 2021. [DOI: 10.3390/mca26020040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

Zeng M, Li M, Fei Z, Wu FX, Li Y, Pan Y, Wang J. A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:296-305. [PMID: 30736002 DOI: 10.1109/tcbb.2019.2897679] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Le NQK, Do DT, Hung TNK, Lam LHT, Huynh TT, Nguyen NTK. A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification. Int J Mol Sci 2020;21:E9070. [PMID: 33260643 PMCID: PMC7730808 DOI: 10.3390/ijms21239070] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2020] [Revised: 11/25/2020] [Accepted: 11/26/2020] [Indexed: 01/13/2023] Open

Hasan MA, Lonardi S. DeeplyEssential: a deep neural network for predicting essential genes in microbes. BMC Bioinformatics 2020;21:367. [PMID: 32998698 PMCID: PMC7525945 DOI: 10.1186/s12859-020-03688-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Network Embedding the Protein-Protein Interaction Network for Human Essential Genes Identification. Genes (Basel) 2020;11:genes11020153. [PMID: 32023848 PMCID: PMC7074227 DOI: 10.3390/genes11020153] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Revised: 01/27/2020] [Accepted: 01/29/2020] [Indexed: 11/18/2022] Open

Zeng M, Li M, Wu FX, Li Y, Pan Y. DeepEP: a deep learning framework for identifying essential proteins. BMC Bioinformatics 2019;20:506. [PMID: 31787076 PMCID: PMC6886168 DOI: 10.1186/s12859-019-3076-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Abstract

Background

Essential proteins are crucial for cellular life and thus, identification of essential proteins is an important topic and a challenging problem for researchers. Recently lots of computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem; but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics.

Results

We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in protein-protein interaction (PPI) network. Gene expression profiles are treated as images and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the imbalanced characteristics. The sampling method samples the same number of the majority and minority samples in a training epoch, which is not biased to any class in training process. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors which are generated by node2vec technique contribute a lot to the improved performance. It is clear that the node2vec technique effectively captures the topological and semantic properties of PPI network. The sampling method also improves the performance of identifying essential proteins.

Conclusion

We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.

Collapse

Wen QF, Liu S, Dong C, Guo HX, Gao YZ, Guo FB. Geptop 2.0: An Updated, More Precise, and Faster Geptop Server for Identification of Prokaryotic Essential Genes. Front Microbiol 2019;10:1236. [PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2019] [Accepted: 05/17/2019] [Indexed: 12/16/2022] Open

Rasti S, Vogiatzis C. A survey of computational methods in protein–protein interaction networks. ANNALS OF OPERATIONS RESEARCH 2019;276:35-87. [DOI: 10.1007/s10479-018-2956-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]

Fang M, Lei X, Guo L. A Survey on Computational Methods for Essential Proteins and Genes Prediction. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181112150422] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Azhagesan K, Ravindran B, Raman K. Network-based features enable prediction of essential genes across diverse organisms. PLoS One 2018;13:e0208722. [PMID: 30543651 PMCID: PMC6292609 DOI: 10.1371/journal.pone.0208722] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2018] [Accepted: 11/21/2018] [Indexed: 12/19/2022] Open

Abstract

Machine learning approaches to predict essential genes have gained a lot of traction in recent years. These approaches predominantly make use of sequence and network-based features to predict essential genes. However, the scope of network-based features used by the existing approaches is very narrow. Further, many of these studies focus on predicting essential genes within the same organism, which cannot be readily used to predict essential genes across organisms. Therefore, there is clearly a need for a method that is able to predict essential genes across organisms, by leveraging network-based features. In this study, we extract several sets of network-based features from protein-protein association networks available from the STRING database. Our network features include some common measures of centrality, and also some novel recursive measures recently proposed in social network literature. We extract hundreds of network-based features from networks of 27 diverse organisms to predict the essentiality of 87000+ genes. Our results show that network-based features are statistically significantly better at classifying essential genes across diverse bacterial species, compared to the current state-of-the-art methods, which use mostly sequence and a few 'conventional' network-based features. Our diverse set of network properties gave an AUROC of 0.847 and a precision of 0.320 across 27 organisms. When we augmented the complete set of network features with sequence-derived features, we achieved an improved AUROC of 0.857 and a precision of 0.335. We also constructed a reduced set of 100 sequence and network features, which gave a comparable performance. Further, we show that our features are useful for predicting essential genes in new organisms by using leave-one-species-out validation. Our network features capture the local, global and neighbourhood properties of the network and are hence effective for prediction of essential genes across diverse organisms, even in the absence of other complex biological knowledge. Our approach can be readily exploited to predict essentiality for organisms in interactome databases such as the STRING, where both network and sequence are readily available. All codes are available at https://github.com/RamanLab/nbfpeg.

Collapse

Tian D, Wenlock S, Kabir M, Tzotzos G, Doig AJ, Hentges KE. Identifying mouse developmental essential genes using machine learning. Dis Model Mech 2018;11:11/12/dmm034546. [PMID: 30563825 PMCID: PMC6307915 DOI: 10.1242/dmm.034546] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2018] [Accepted: 10/19/2018] [Indexed: 12/20/2022] Open

Abstract

The genes that are required for organismal survival are annotated as ‘essential genes’. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.

Summary: Here, we used computer-based machine learning methodology to predict which genes in the mouse genome are essential for development, and present a database of mouse essential and non-essential genes.

Collapse

Dong C, Jin YT, Hua HL, Wen QF, Luo S, Zheng WX, Guo FB. Comprehensive review of the identification of essential genes using computational methods: focusing on feature implementation and assessment. Brief Bioinform 2018;21:171-181. [PMID: 30496347 DOI: 10.1093/bib/bby116] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2018] [Revised: 11/01/2018] [Accepted: 11/02/2018] [Indexed: 02/06/2023] Open

Mobegi FM, Zomer A, de Jonge MI, van Hijum SAFT. Advances and perspectives in computational prediction of microbial gene essentiality. Brief Funct Genomics 2017;16:70-79. [PMID: 26857942 DOI: 10.1093/bfgp/elv063] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Nigatu D, Sobetzko P, Yousef M, Henkel W. Sequence-based information-theoretic features for gene essentiality prediction. BMC Bioinformatics 2017;18:473. [PMID: 29121868 PMCID: PMC5679510 DOI: 10.1186/s12859-017-1884-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022] Open

Zhang X, Acencio ML, Lemke N. Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review. Front Physiol 2016;7:75. [PMID: 27014079 PMCID: PMC4781880 DOI: 10.3389/fphys.2016.00075] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 01/12/2023] Open

Peng X, Wang J, Wang J, Wu FX, Pan Y. Rechecking the Centrality-Lethality Rule in the Scope of Protein Subcellular Localization Interaction Networks. PLoS One 2015;10:e0130743. [PMID: 26115027 PMCID: PMC4482623 DOI: 10.1371/journal.pone.0130743] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2014] [Accepted: 05/23/2015] [Indexed: 12/12/2022] Open

Abstract

Essential proteins are indispensable for living organisms to maintain life activities and play important roles in the studies of pathology, synthetic biology, and drug design. Therefore, besides experiment methods, many computational methods are proposed to identify essential proteins. Based on the centrality-lethality rule, various centrality methods are employed to predict essential proteins in a Protein-protein Interaction Network (PIN). However, neglecting the temporal and spatial features of protein-protein interactions, the centrality scores calculated by centrality methods are not effective enough for measuring the essentiality of proteins in a PIN. Moreover, many methods, which overfit with the features of essential proteins for one species, may perform poor for other species. In this paper, we demonstrate that the centrality-lethality rule also exists in Protein Subcellular Localization Interaction Networks (PSLINs). To do this, a method based on Localization Specificity for Essential protein Detection (LSED), was proposed, which can be combined with any centrality method for calculating the improved centrality scores by taking into consideration PSLINs in which proteins play their roles. In this study, LSED was combined with eight centrality methods separately to calculate Localization-specific Centrality Scores (LCSs) for proteins based on the PSLINs of four species (Saccharomyces cerevisiae, Homo sapiens, Mus musculus and Drosophila melanogaster). Compared to the proteins with high centrality scores measured from the global PINs, more proteins with high LCSs measured from PSLINs are essential. It indicates that proteins with high LCSs measured from PSLINs are more likely to be essential and the performance of centrality methods can be improved by LSED. Furthermore, LSED provides a wide applicable prediction model to identify essential proteins for different species.

Collapse