1
Nussinov R, Yavuz BR, Demirel HC, Arici MK, Jang H, Tuncbag N. Review: Cancer and neurodevelopmental disorders: multi-scale reasoning and computational guide. Front Cell Dev Biol 2024; 12:1376639. [PMID: 39015651 PMCID: PMC11249571 DOI: 10.3389/fcell.2024.1376639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 06/10/2024] [Indexed: 07/18/2024]
Abstract
The connection and causality between cancer and neurodevelopmental disorders have been puzzling. How can the same cellular pathways, proteins, and mutations lead to pathologies with vastly different clinical presentations? And why do individuals with neurodevelopmental disorders, such as autism and schizophrenia, face higher chances of cancer emerging throughout their lifetime? Our broad review emphasizes the multi-scale aspect of this type of reasoning. As these examples demonstrate, rather than focusing on a specific organ system or disease, we aim at the new understanding that can be gained. Within this framework, our review calls attention to computational strategies which can be powerful in discovering connections, causalities, predicting clinical outcomes, and are vital for drug discovery. Thus, rather than centering on the clinical features, we draw on the rapidly increasing data on the molecular level, including mutations, isoforms, three-dimensional structures, and expression levels of the respective disease-associated genes. Their integrated analysis, together with chromatin states, can delineate how, despite being connected, neurodevelopmental disorders and cancer differ, and how the same mutations can lead to different clinical symptoms. Here, we seek to uncover the emerging connection between cancer, including pediatric tumors, and neurodevelopmental disorders, and the tantalizing questions that this connection raises.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
- Bengi Ruken Yavuz
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
- M. Kaan Arici
  - Graduate School of Informatics, Middle East Technical University, Ankara, Türkiye
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, United States
- Nurcan Tuncbag
  - Department of Chemical and Biological Engineering, Koc University, Istanbul, Türkiye
  - School of Medicine, Koc University, Istanbul, Türkiye
  - Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Türkiye
2
Bai K, Yang L, Xue J, Zhao L, Hao F. Pathogenicity classification of missense mutations based on deep generative model. Comput Biol Med 2024; 170:107980. [PMID: 38242017 DOI: 10.1016/j.compbiomed.2024.107980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/12/2024] [Accepted: 01/12/2024] [Indexed: 01/21/2024]
Abstract
Missense mutations affect the function of human proteins and are closely associated with multiple acute and chronic diseases. The identification of disease-associated missense mutations and their classification for pathogenicity can provide insights into the genetic basis of disease and protein function. This paper proposes MLAE (Method based on LSTM-Ladder AutoEncoder), a deep learning classification model for identifying disease-associated missense mutations and classifying their pathogenicity based on the Variational AutoEncoder (VAE) framework. MLAE overcomes the limitations of the VAE framework by introducing the Ladder structure, combined with LSTM networks. This reduces the loss of original information during the transmission process, thereby making the model more effective in learning. In the experiment, MLAE classified all 27572 possible missense variants of the three input proteins with an average classification AUC of 0.941. This result provides evidence that MLAE is effective in predicting pathogenicity. Additionally, MLAE provides results for multi-label classification, with an average Hamming loss of 0.196, supporting the classification of complex variants. The proposed MLAE method provides an insightful approach to effectively capture amino acid sequence information and accurately predict the pathogenicity of mutations, thereby providing an analytical basis for the study and prevention of related diseases.
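For readers unfamiliar with the multi-label metric quoted in this abstract, Hamming loss is the fraction of individual label slots that are predicted incorrectly, so lower is better. The following is a minimal sketch of that standard definition only; the function name and the toy label matrices are hypothetical and are not taken from the MLAE paper or its data.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of label slots that differ, pooled over all samples and labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    total = 0  # number of label slots seen
    wrong = 0  # number of slots where prediction and truth disagree
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("each sample must have the same number of labels")
        total += len(t_row)
        wrong += sum(t != p for t, p in zip(t_row, p_row))
    if total == 0:
        raise ValueError("no labels to score")
    return wrong / total


# Hypothetical toy data: 2 variants, 3 binary labels each (not from the paper).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)  # 2 mismatched slots out of 6
print(f"Hamming loss: {loss:.3f}")
```

On this scale, the reported average of 0.196 would correspond to roughly one in five label slots being wrong.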
Affiliation(s)
- Ke Bai
  - Shandong Jianzhu University, Jinan, 250101, PR China
- Lu Yang
  - Shandong Jianzhu University, Jinan, 250101, PR China
- Jian Xue
  - Shandong Jianzhu University, Jinan, 250101, PR China
- Lin Zhao
  - Shandong Jianzhu University, Jinan, 250101, PR China
- Fanchang Hao
  - Shandong Jianzhu University, Jinan, 250101, PR China
3
Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024; 64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
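Two of the metrics quoted in this abstract, the Matthews correlation coefficient (MCC) and the F1-score, can both be computed from the four confusion-matrix counts of a binary classifier. The sketch below implements the standard textbook formulas only; the function names and toy labels are hypothetical, it assumes labels are encoded as 0 (benign) and 1 (pathogenic), and it is not the TransEFVP evaluation code.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels (1 = pathogenic, 0 = benign), not from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, tn, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, which is one reason it is often reported alongside F1 on imbalanced variant datasets.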
Affiliation(s)
- Zihao Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Fang Ge
  - State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, PR China
- Yan Liu
  - Department of Computer Science, Yangzhou University, Yangzhou 225100, PR China
- Yumeng Zhang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
4
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
5
Truong A, Myerscough D, Campbell I, Atkinson J, Silberg JJ. A cellular selection identifies elongated flavodoxins that support electron transfer to sulfite reductase. Protein Sci 2023; 32:e4746. [PMID: 37551563 PMCID: PMC10503412 DOI: 10.1002/pro.4746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 07/17/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
Flavodoxins (Flds) mediate the flux of electrons between oxidoreductases in diverse metabolic pathways. To investigate whether Flds can support electron transfer to a sulfite reductase (SIR) that evolved to couple with a ferredoxin, we evaluated the ability of Flds to transfer electrons from a ferredoxin-NADP reductase (FNR) to a ferredoxin-dependent SIR using growth complementation of an Escherichia coli strain with a sulfur metabolism defect. We show that Flds from cyanobacteria complement this growth defect when coexpressed with an FNR and an SIR that evolved to couple with a plant ferredoxin. When we evaluated the effect of peptide insertion on Fld-mediated electron transfer, we observed a sensitivity to insertions within regions predicted to be proximal to the cofactor and partner binding sites, while a high insertion tolerance was detected within loops distal from the cofactor and within regions of helices and sheets that are proximal to those loops. Bioinformatic analysis showed that natural Fld sequence variability predicts a large fraction of the motifs that tolerate insertion of the octapeptide SGRPGSLS. These results represent the first evidence that Flds can support electron transfer to assimilatory SIRs, and they suggest that the pattern of insertion tolerance is influenced by interactions with oxidoreductase partners.
Affiliation(s)
- Albert Truong
  - Biochemistry and Cell Biology Graduate Program, Rice University, Houston, Texas, USA
  - Department of Biosciences, Rice University, Houston, Texas, USA
- Ian Campbell
  - Department of Biosciences, Rice University, Houston, Texas, USA
- Jonathan J. Silberg
  - Department of Biosciences, Rice University, Houston, Texas, USA
  - Department of Bioengineering, Rice University, Houston, Texas, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA