1. Gu J, Sim BR, Li J, Yu Y, Qin L, Wu L, Liu H, Xu Y, Zhao YL, Nie Y. Coevolution-based protein engineering of alcohol dehydrogenase at distal sites enables enzymatic compatibility with substrate diversity and stereoselectivity. Int J Biol Macromol 2025; 306:141233. [PMID: 39993679 DOI: 10.1016/j.ijbiomac.2025.141233] [Received: 09/04/2024] [Revised: 01/16/2025] [Accepted: 02/16/2025] [Indexed: 02/26/2025]
Abstract
Chiral alcohols with various substituents and functional groups are attractive building blocks in many fields. Biocatalysts have attracted great interest for their use in "sustainable chemistry". However, the substrate specificity of enzymes limits their widespread use as "generalists" in biocatalysis. In addition, engineering enzymes to simultaneously improve catalytic efficiency and stereoselectivity toward structurally diverse substrates is a contemporary challenge. Inspired by the naturally occurring coevolution of residues that are dedicated to a particular function and clustered together in space, we applied coevolution-based engineering to the alcohol dehydrogenase CpRCR from Candida parapsilosis to identify distal sites that can synergistically improve catalytic performance toward diverse substrates. Five variants were developed by clustering the coupling strength and structure of coevolutionary sites; these variants showed up to 28-fold improved catalytic efficiency with high stereoselectivity toward 16 structurally diverse substrates (aryl ketones, heterocyclic ketones and β-ketoesters). In particular, for the substrate 2-acetylpyridine, the specific activity of K191L/D216H is 12-fold higher than the highest previously reported activity of an alcohol dehydrogenase. These distal mutations do not directly modify the active center but instead modulate catalytic capacity through various allosteric effects that favor substrate diversity. This study provides a broadly applicable strategy for protein engineering and expands the applications of biocatalysts in the production of value-added chemicals.
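The abstract does not specify how coupling strength between coevolving sites was computed. As a minimal sketch of the general idea of scoring coevolution between two positions, the Python below uses mutual information between columns of a multiple sequence alignment (MSA). This is a simpler stand-in, not the authors' method; practical coupling analyses usually add sequence weighting, corrections such as APC, or direct-coupling models. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    A simple coevolution score used for illustration only; it is not the
    coupling-strength model of the cited study.
    """
    pairs = [(seq[i], seq[j]) for seq in msa]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Compare the observed joint frequency with the independence expectation.
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def top_coupled_pairs(msa, k=3):
    """Rank all column pairs by mutual information and return the top k."""
    length = len(msa[0])
    scores = [
        ((i, j), column_mutual_information(msa, i, j))
        for i in range(length)
        for j in range(i + 1, length)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy alignment: columns 0 and 1 covary perfectly.
msa = ["AKL", "AKV", "GDL", "GDV"]
print(top_coupled_pairs(msa, k=1))
```

In this toy alignment, columns 0 and 1 always change together (A with K, G with D), so that pair receives the highest score (ln 2). Column 2 varies independently of both and scores zero.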
Affiliation(s)
- Jie Gu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Byu Ri Sim
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Biochemistry, University of Toronto, Ontario M5S 3H6, Canada
- Jiarui Li
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yangqing Yu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Lei Qin
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Lunjie Wu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Huan Liu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yi-Lei Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
- Yao Nie
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China.
2. Howard CJ, Abell NS, Osuna BA, Jones EM, Chan LY, Chan H, Artis DR, Asfaha JB, Bloom JS, Cooper AR, Liao A, Mahdavi E, Mohammed N, Su AL, Uribe GA, Kosuri S, Dickel DE, Lubock NB. High-resolution deep mutational scanning of the melanocortin-4 receptor enables target characterization for drug discovery. eLife 2025; 13:RP104725. [PMID: 40202051 PMCID: PMC11981609 DOI: 10.7554/elife.104725] [Indexed: 04/10/2025]
Abstract
Deep Mutational Scanning (DMS) is an emerging method to systematically test the functional consequences of thousands of sequence changes to a protein target in a single experiment. Because of its utility in interpreting both human variant effects and protein structure-function relationships, it holds substantial promise to improve drug discovery and clinical development. However, applications in this domain require improved experimental and analytical methods. To address this need, we report novel DMS methods to precisely and quantitatively interrogate disease-relevant mechanisms and protein-ligand interactions, and to assess predicted responses to drug treatment. Using these methods, we performed a DMS of the melanocortin-4 receptor (MC4R), a G-protein-coupled receptor (GPCR) implicated in obesity and an active target of drug development efforts. We assessed the effects of >6600 single amino acid substitutions on MC4R's function across 18 distinct experimental conditions, resulting in >20 million unique measurements. From this, we identified variants that have unique effects on MC4R-mediated Gαs- and Gαq-signaling pathways, which could be used to design drugs that selectively bias MC4R's activity. We also identified pathogenic variants that are likely amenable to a corrector therapy. Finally, we functionally characterized structural relationships that distinguish the binding of peptide versus small molecule ligands, which could guide compound optimization. Collectively, these results demonstrate that DMS is a powerful method to empower drug discovery and development.
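The abstract does not describe how the >20 million measurements were scored. A common way to turn DMS sequencing counts into per-variant effects is a log ratio of post- versus pre-selection counts, normalized to wild type. The sketch below illustrates only that generic scheme and assumes simple count tables. It is not the scoring model used in the MC4R study, and the variant labels are hypothetical.

```python
import math


def dms_enrichment_scores(pre_counts, post_counts, wt_key="WT", pseudocount=0.5):
    """Wild-type-normalized log2 enrichment for each variant.

    pre_counts / post_counts map a variant label to its read count before and
    after selection. Labels and wt_key are hypothetical. Library-size totals
    cancel out once each ratio is normalized to the wild-type ratio.
    """

    def log2_ratio(key):
        # The pseudocount avoids log(0) for variants that drop out after selection.
        pre = pre_counts.get(key, 0) + pseudocount
        post = post_counts.get(key, 0) + pseudocount
        return math.log2(post / pre)

    wt_ratio = log2_ratio(wt_key)
    return {
        variant: log2_ratio(variant) - wt_ratio
        for variant in pre_counts
        if variant != wt_key
    }


pre = {"WT": 1000, "V1": 100, "V2": 100}
post = {"WT": 1000, "V1": 200, "V2": 50}
print(dms_enrichment_scores(pre, post, pseudocount=0.0))
```

A positive score means that the variant was enriched relative to wild type under the selection (V1 here). A negative score means that it was depleted (V2).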
Affiliation(s)
- Joshua S Bloom
- Department of Human Genetics and Department of Computational Medicine, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Chevy Chase, United States
3. Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. A structurally informed human protein-protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 2024:10.1038/s41587-024-02428-4. [PMID: 39448882 DOI: 10.1038/s41587-024-02428-4] [Received: 02/01/2024] [Accepted: 09/11/2024] [Indexed: 10/26/2024]
Abstract
To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
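The claim that disease-associated mutations are enriched in predicted interfaces rests on an enrichment test. The abstract does not name the test used, so the following is only an illustrative sketch. It applies a one-sided hypergeometric test to a single protein, asking whether more mutated residues fall in predicted interface residues than chance would explain. It assumes that each residue is mutated at most once and that all residues are equally likely to be mutated. The function names and arguments are hypothetical.

```python
from math import comb


def interface_enrichment_pvalue(n_residues, n_interface, n_mutated, n_mutated_in_interface):
    """One-sided hypergeometric P(X >= observed) for interface enrichment.

    n_residues: protein length; n_interface: residues predicted at the interface;
    n_mutated: distinct mutated residues; n_mutated_in_interface: how many of
    those mutated residues fall in the predicted interface.
    """
    total = comb(n_residues, n_mutated)
    max_overlap = min(n_interface, n_mutated)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_interface, x) * comb(n_residues - n_interface, n_mutated - x)
        for x in range(n_mutated_in_interface, max_overlap + 1)
    )
    return tail / total


def fold_enrichment(n_residues, n_interface, n_mutated, n_mutated_in_interface):
    """Observed interface mutations divided by the number expected by chance."""
    expected = n_mutated * n_interface / n_residues
    return n_mutated_in_interface / expected


# Hypothetical protein: 100 residues, 20 at the interface, 10 mutated, 6 in the interface.
print(fold_enrichment(100, 20, 10, 6), interface_enrichment_pvalue(100, 20, 10, 6))
```

A proteome-wide analysis would need to aggregate such per-protein tests and correct for multiple testing. This sketch does neither.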
Grants
- R01GM124559 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM125639 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM130885 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- RM1GM139738 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01DK115398 U.S. Department of Health & Human Services | NIH | National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases)
- U01HG007691 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R01HL155107 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL155096 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL166137 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U54HL119145 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- AHA957729 American Heart Association (American Heart Association, Inc.)
- 24MERIT1185447 American Heart Association (American Heart Association, Inc.)
- R01AG084250 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R56AG074001 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- U01AG073323 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG066707 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG076448 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG082118 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1AG082211 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R21AG083003 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1NS133812 U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS)
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Yunguang Qiu
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Junfei Zhao
- Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY, USA
- Yadi Zhou
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Shobhita Gupta
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Biophysics Program, Cornell University, Ithaca, NY, USA
- Mateo Torres
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Weiqiang Lu
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Charis Eng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Joseph Loscalzo
- Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Feixiong Cheng
- Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA.
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA.
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA.
- Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA.
4. Alicea B, Bastani S, Gordon NK, Crawford-Young S, Gordon R. The Molecular Basis of Differentiation Wave Activity in Embryogenesis. Biosystems 2024; 243:105272. [PMID: 39033973 DOI: 10.1016/j.biosystems.2024.105272] [Received: 06/04/2024] [Revised: 07/10/2024] [Accepted: 07/11/2024] [Indexed: 07/23/2024]
Abstract
As development varies greatly across the tree of life, it may seem difficult to propose a model with a single mechanism for understanding collective cell behaviors and the coordination of tissue formation. Here we propose a mechanism called differentiation waves, which unifies many disparate results involving developmental systems from across the tree of life. We demonstrate how a relatively simple model of differentiation proceeds not from function-related molecular mechanisms but from so-called differentiation waves. A phenotypic model of differentiation waves is introduced, and its relation to molecular mechanisms is proposed. These waves contribute to a differentiation tree, which offers an alternative way of viewing cell lineage and the local action of molecular factors. We construct a model of differentiation wave-related molecular mechanisms (genome, epigenome, and proteome) based on bioinformatic data from the nematode Caenorhabditis elegans. To validate this approach across different modes of development, we compare protein expression in Caenorhabditis elegans with that in several model organisms: fruit flies (Drosophila melanogaster), yeast (Saccharomyces cerevisiae), and mouse (Mus musculus). Inspired by gene regulatory networks, we use two Models of Interactive Contributions (fully-connected MICs and ordered MICs) to suggest potential genomic contributions to differentiation wave-related proteins. This, in turn, provides a framework for understanding differentiation and development.
Affiliation(s)
- Bradly Alicea
- Orthogonal Research and Education Lab, Champaign-Urbana, IL, USA; OpenWorm Foundation, Boston, MA, USA; University of Illinois Urbana-Champaign, USA.
- Suroush Bastani
- Orthogonal Research and Education Lab, Champaign-Urbana, IL, USA.
- Richard Gordon
- Gulf Specimen Marine Laboratory & Aquarium, Panacea, FL, USA.
5. Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. Nat Methods 2024; 21:1546-1557. [PMID: 39039335 PMCID: PMC11310085 DOI: 10.1038/s41592-024-02341-3] [Received: 08/17/2023] [Accepted: 06/10/2024] [Indexed: 07/24/2024]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
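The abstract describes PINNACLE's embedding space as supporting zero-shot retrieval, but it does not give the retrieval procedure. A minimal sketch of nearest-neighbor retrieval over context embeddings by cosine similarity is shown below. It assumes that embeddings are plain float vectors keyed by a context name. The context names and three-dimensional vectors are invented for illustration and are not PINNACLE outputs; real protein representations are much higher-dimensional.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_contexts(query, context_embeddings, k=2):
    """Return the k context names whose embeddings are most similar to query."""
    ranked = sorted(
        context_embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings for three cell type contexts.
contexts = {
    "hepatocyte": [1.0, 0.0, 0.0],
    "kupffer_cell": [0.9, 0.1, 0.0],
    "neuron": [0.0, 0.0, 1.0],
}
print(nearest_contexts([1.0, 0.0, 0.0], contexts))  # prints ['hepatocyte', 'kupffer_cell']
```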
Affiliation(s)
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Ashwin N Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
- Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
- Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
6. Liu M, Zhao Z, Wang C, Sang S, Cui Y, Lv C, Yang X, Zhang N, Xiong K, Chen B, Dong Q, Liu K, Gu Y. Harnessing genetic interactions for prediction of immune checkpoint inhibitors response signature in cancer cells. Cancer Lett 2024; 594:216991. [PMID: 38797232 DOI: 10.1016/j.canlet.2024.216991] [Received: 01/10/2024] [Revised: 05/20/2024] [Accepted: 05/23/2024] [Indexed: 05/29/2024]
Abstract
Genetic interactions (GIs) occur when alterations in two genes have a combined effect that is not seen when either gene is altered individually. They play a crucial role in influencing drug efficacy. We utilized CGIdb 2.0 (http://www.medsysbio.org/CGIdb2/), an updated database that comprehensively collects published GI information, encompassing synthetic lethality (SL), synthetic viability (SV), and chemical-genetic interactions. By integrating protein-protein physical interactions, CGIdb 2.0 elucidates GI relationships between or within protein complex models. Additionally, we introduced GENIUS (GENetic Interactions mediated drUg Signature) to leverage GIs for identifying the response signature of immune checkpoint inhibitors (ICIs). GENIUS identified high MAP4K4 expression as a resistance signature and high HERC4 expression as a sensitivity signature for ICI treatment. In melanoma patients, high MAP4K4 expression was associated with decreased efficacy and poorer survival following ICI treatment. Conversely, HERC4 overexpression in melanoma patients correlated with a positive response to ICIs. Notably, HERC4 enhances sensitivity to immunotherapy by facilitating antigen presentation. Analyses of immune cell infiltration and single-cell data revealed that B cells expressing MAP4K4 may contribute to resistance to ICIs in melanoma. Overall, CGIdb 2.0 provides integrated GI data and thus serves as a crucial tool for exploring drug effects.
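The definition of a genetic interaction above can be made quantitative. A widely used convention compares the observed double-mutant fitness with the product of the single-mutant fitness values (the multiplicative model). This convention is assumed here, since the abstract does not say how CGIdb 2.0 or GENIUS score interactions. The sketch below computes that interaction score and applies hypothetical cutoffs to label synthetic lethality and synthetic viability. Fitness values are relative to wild type (1.0), and the function names and thresholds are illustrative.

```python
def interaction_score(w_a, w_b, w_ab):
    """Multiplicative-model score: observed minus expected double-mutant fitness."""
    return w_ab - w_a * w_b


def classify_interaction(w_a, w_b, w_ab, score_cutoff=0.1, lethal_cutoff=0.1):
    """Label a gene pair using hypothetical fitness cutoffs.

    - synthetic lethal: both single mutants viable, double mutant (near-)lethal
    - synthetic viable: a lethal single mutant is rescued in the double mutant
    - otherwise the sign of the interaction score decides negative/positive/neutral
    """
    singles_viable = w_a > lethal_cutoff and w_b > lethal_cutoff
    if singles_viable and w_ab <= lethal_cutoff:
        return "synthetic lethal"
    if not singles_viable and w_ab > lethal_cutoff:
        return "synthetic viable"
    score = interaction_score(w_a, w_b, w_ab)
    if score <= -score_cutoff:
        return "negative"
    if score >= score_cutoff:
        return "positive"
    return "neutral"


print(classify_interaction(0.9, 0.8, 0.05))  # prints synthetic lethal
print(classify_interaction(0.9, 0.8, 0.72))  # prints neutral
```

Because the double-mutant fitness of 0.72 equals the product 0.9 × 0.8, the pair in the second call shows no interaction under this model.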
Affiliation(s)
- Mingyue Liu
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhangxiang Zhao
- Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, China
- Chengyu Wang
- Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
- Shaocong Sang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yanrui Cui
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Lv
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiuqi Yang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Nan Zhang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Kai Xiong
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Bo Chen
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qi Dong
- Department of Biochemistry and Molecular Biology, Harbin Medical University, Harbin, China
- Kaidong Liu
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yunyan Gu
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
7. Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations. bioRxiv: the preprint server for biology 2024:2023.04.24.538110. [PMID: 37162909 PMCID: PMC10168245 DOI: 10.1101/2023.04.24.538110] [Indexed: 05/11/2023]
Abstract
Human genome sequencing studies have identified numerous loci associated with complex diseases. However, translating human genetic and genomic findings to disease pathobiology and therapeutic discovery remains a major challenge at multiscale interactome network levels. Here, we present a deep-learning-based ensemble framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that accurately predicts protein binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms, generating comprehensive structurally-informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods. We further validated PIONEER predictions systematically and experimentally by generating 2,395 mutations and testing their impact on 6,754 mutation-interaction pairs, confirming the high quality and validity of PIONEER predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces after mapping mutations from ~60,000 germline exomes and ~36,000 somatic genomes. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from pan-cancer analysis of ~11,000 tumor whole-exomes across 33 cancer types. We show that PIONEER-predicted oncoPPIs are significantly associated with patient survival and drug responses from both cancer cell lines and patient-derived xenograft mouse models. We identify a landscape of PPI-perturbing tumor alleles upon ubiquitination by E3 ligases, and we experimentally validate the tumorigenic KEAP1-NRF2 interface mutation p.Thr80Lys in non-small cell lung cancer. We show that PIONEER-predicted PPI-perturbing alleles alter protein abundance and correlate with drug responses and patient survival in colon and uterine cancers, as demonstrated by proteogenomic data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Junfei Zhao
- Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY 10032, USA
- Yadi Zhou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Shobhita Gupta
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Biophysics Program, Cornell University, Ithaca, NY 14853, USA
- Mateo Torres
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Weiqiang Lu
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Charis Eng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Joseph Loscalzo
- Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
8. Xu X, Bonvin AMJJ. DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model. Bioinformatics Advances 2024; 4:vbad191. [PMID: 38213822 PMCID: PMC10782804 DOI: 10.1093/bioadv/vbad191] [Received: 09/04/2023] [Revised: 12/19/2023] [Indexed: 01/13/2024]
Abstract
Motivation: Protein-protein interactions (PPIs) play critical roles in numerous cellular processes. Modelling the 3D structures of the corresponding protein complexes can provide valuable insights, e.g. starting points for drug and protein design. One challenge in the modelling process, however, is identifying near-native models within the large pool of generated models. To this end, we previously developed DeepRank-GNN, a graph neural network that integrates structural and sequence information to enable effective pattern learning at PPI interfaces. Its main features are based on Position Specific Scoring Matrices (PSSMs), which are computationally expensive to generate; this significantly limits the algorithm's usability. Results: We introduce DeepRank-GNN-esm, which adds protein language model embeddings from the ESM-2 model as features. We show that the ESM-2 embeddings can replace the PSSM features with no loss in performance, or even with better performance, on two PPI-related tasks: scoring docking poses and detecting crystal artifacts. This new DeepRank version thus bypasses the need to generate PSSMs, greatly improving the usability of the software and opening new application opportunities for systems for which PSSM profiles cannot be obtained or are irrelevant (e.g. antibody-antigen complexes). Availability and implementation: DeepRank-GNN-esm is freely available from https://github.com/DeepRank/DeepRank-GNN-esm.
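DeepRank-GNN represents a PPI interface as a graph whose nodes carry per-residue features. In DeepRank-GNN-esm, ESM-2 embeddings can stand in for PSSM-derived features. The sketch below shows only the general idea of building such a residue graph from coordinates. It assumes one representative coordinate per residue, a hypothetical 8.5 Å contact cutoff, and placeholder embedding vectors. It does not use or reproduce the DeepRank-GNN-esm code or its exact feature set; the function name and the internal/external edge labels are illustrative.

```python
import math


def build_interface_graph(residues, cutoff=8.5):
    """Build a residue-level interface graph (illustrative sketch).

    residues: list of (chain_id, residue_number, (x, y, z), embedding).
    A residue is an interface node if it lies within `cutoff` angstroms of a
    residue on a different chain. Edges connect interface nodes within the
    cutoff and are tagged "inter" (different chains) or "intra" (same chain).
    """
    n = len(residues)
    interface = set()
    for i in range(n):
        for j in range(i + 1, n):
            different_chain = residues[i][0] != residues[j][0]
            if different_chain and math.dist(residues[i][2], residues[j][2]) <= cutoff:
                interface.update((i, j))

    nodes = sorted(interface)
    edges = []
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            if math.dist(residues[i][2], residues[j][2]) <= cutoff:
                kind = "inter" if residues[i][0] != residues[j][0] else "intra"
                edges.append((i, j, kind))

    # Node features: here simply the per-residue embedding vector.
    features = {i: residues[i][3] for i in nodes}
    return nodes, edges, features


# Hypothetical toy complex: residue B2 is far from chain A and is excluded.
toy = [
    ("A", 1, (0.0, 0.0, 0.0), [0.1, 0.2]),
    ("A", 2, (3.0, 0.0, 0.0), [0.3, 0.4]),
    ("B", 1, (5.0, 0.0, 0.0), [0.5, 0.6]),
    ("B", 2, (50.0, 0.0, 0.0), [0.7, 0.8]),
]
print(build_interface_graph(toy)[1])
```

A graph neural network would then pass messages along these edges to score how native-like a docked model is. That step is outside the scope of this sketch.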
Affiliation(s)
- Xiaotong Xu
- Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands
- Alexandre M J J Bonvin
- Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands
9
Zhang R, Xie J, Yuan X, Yu Y, Zhuang Y, Zhang F, Hou J, Liu Y, Huang W, Zhang M, Li J, Gong Q, Peng X. Newly discovered variants in unexplained neonatal encephalopathy. Mol Genet Genomic Med 2024; 12:e2354. [PMID: 38284441 PMCID: PMC10795097 DOI: 10.1002/mgg3.2354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 11/30/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
BACKGROUND The genetic background of neonatal encephalopathy (NE) is complicated, and early diagnosis is beneficial for optimizing therapeutic strategies for patients. METHODS NE patients with unclear etiology underwent routine clinical tests, including an ammonia test, metabolic screening, amplitude-integrated electroencephalographic (aEEG) monitoring, brain magnetic resonance imaging (MRI), and genetic testing. Changes in protein structure were predicted using DynaMut2 and RoseTTAFold. RESULTS Newly reported pathogenic variants were detected in 15 of the 113 NE patients. In this sub-cohort, (1) seizures were the primary initial symptom; (2) four patients had abnormal metabolic screening results, two of whom were also diagnosed with excessive blood ammonia concentration; (3) brain MRI results were abnormal in three infants, and brain waves showed moderate-to-severe abnormalities in about half of the patients. The novel pathogenic variants discovered in this study belonged to 12 genes, and seven of the variants were predicted to introduce premature translation termination. In silico predictions showed that four variants were destructive to the protein structure of KCNQ2. CONCLUSION Our study expands the mutation spectrum of genes associated with NE and provides new evidence for the molecular diagnosis of this neonatal illness.
Affiliation(s)
- Rong Zhang
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Jingjing Xie
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Xiao Yuan
- Department of Laboratory Diagnosis, Changsha Kingmed Center for Clinical Laboratory, Changsha, Hunan, China
- Yan Yu
- Department of Laboratory Diagnosis, Changsha Kingmed Center for Clinical Laboratory, Changsha, Hunan, China
- Yan Zhuang
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Fan Zhang
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Jianfei Hou
- Department of Laboratory Diagnosis, Changsha Kingmed Center for Clinical Laboratory, Changsha, Hunan, China
- Yanqin Liu
- Department of Laboratory Diagnosis, Changsha Kingmed Center for Clinical Laboratory, Changsha, Hunan, China
- Weiqing Huang
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Min Zhang
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Junshuai Li
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
- Qiang Gong
- Department of Laboratory Diagnosis, Changsha Kingmed Center for Clinical Laboratory, Changsha, Hunan, China
- Xiaoming Peng
- Department of Neonatology, Hunan Children's Hospital, Changsha, Hunan, China
10
Paul A, Shukla D. Oligomerization of Monoamine Transporters. Subcell Biochem 2024; 104:119-137. [PMID: 38963486 DOI: 10.1007/978-3-031-58843-3_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2024]
Abstract
Transporters of the monoamine transporter (MAT) family regulate the uptake of important neurotransmitters such as dopamine, serotonin, and norepinephrine. The MAT family uses the electrochemical gradient of ions across the membrane and comprises three transporters: the dopamine transporter (DAT), the serotonin transporter (SERT), and the norepinephrine transporter (NET). MAT transporters have been observed in states ranging from monomers to higher-order oligomers. Structural features, allosteric modulation, and the lipid environment regulate the oligomerization of MAT transporters. NET and SERT oligomerization is regulated by the level of PIP2 in the membrane. The kink in TM12 of the MAT family is crucial for formation of the dimer interface. Allosteric modulation at the dimer interface hinders dimer formation. Oligomerization also influences the transporters' function, trafficking, and regulation. This chapter focuses on recent studies of monoamine transporters and discusses the factors affecting their oligomerization and its impact on their function.
Affiliation(s)
- Arnav Paul
- Department of Chemistry, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, Department of Bioengineering, Center for Biophysics and Quantitative Biology, Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
11
Xing C, Chen P, Zhang L. Computational insight into stability-enhanced systems of anthocyanin with protein/peptide. FOOD CHEMISTRY. MOLECULAR SCIENCES 2023; 6:100168. [PMID: 36923156 PMCID: PMC10009195 DOI: 10.1016/j.fochms.2023.100168] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2022] [Revised: 12/24/2022] [Accepted: 02/18/2023] [Indexed: 02/24/2023]
Abstract
Anthocyanins, which belong to the flavonoid group, are commonly found in the organs of plants native to South and Central America. However, these pigments are unstable under varying pH, heat, and other conditions, which limits their potential applications. One method of preserving the stability of anthocyanins is encapsulation with proteins or peptides. Nevertheless, the complex and diverse structures of these molecules, as well as the limitations of experimental techniques, have hindered a comprehensive understanding of the encapsulation processes and of the mechanisms by which stability is enhanced. To address these challenges, computational methods such as molecular docking and molecular dynamics simulation have been used to study the binding affinity and dynamics of the interactions between proteins/peptides and anthocyanins. This review summarizes the interaction mechanisms of these systems as revealed by computational approaches and highlights the role of proteins and peptides in enhancing anthocyanin stability. It also discusses the current limitations of these methods and suggests possible solutions.
Affiliation(s)
- Cheng Xing
- Department of Chemical Engineering and Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
- School of Science, Beijing Jiaotong University, 100044 Beijing, China
- P. Chen
- Department of Chemical Engineering and Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
- Lei Zhang
- Department of Chemical Engineering and Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
12
Mészáros B, Park E, Malinverni D, Sejdiu BI, Immadisetty K, Sandhu M, Lang B, Babu MM. Recent breakthroughs in computational structural biology harnessing the power of sequences and structures. Curr Opin Struct Biol 2023; 80:102608. [PMID: 37182396 DOI: 10.1016/j.sbi.2023.102608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/12/2023] [Accepted: 04/17/2023] [Indexed: 05/16/2023]
Abstract
Recent advances in computational approaches and their integration into structural biology enable tackling increasingly complex questions. Here, we discuss several key areas, highlighting breakthroughs and remaining challenges. Theoretical modeling has provided tools to accurately predict and design protein structures on a scale currently difficult to achieve using experimental approaches. Molecular Dynamics simulations have become faster and more precise, delivering actionable information inaccessible to current experimental methods. Virtual screening workflows enable high-throughput discovery of ligands that bind and modulate protein function, while Machine Learning methods enable the design of proteins with new functionalities. Integrative structural biology combines several of these approaches, pushing the frontiers of structural and functional characterization to ever larger systems, advancing towards a complete understanding of the living cell. These breakthroughs will accelerate and significantly impact diverse areas of science.
Affiliation(s)
- Bálint Mészáros
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
- Electa Park
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
- Duccio Malinverni
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/DucMalinverni
- Besian I Sejdiu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/bisejdiu
- Kalyan Immadisetty
- Department of Bone Marrow Transplantation & Cellular Therapy, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/k_immadisetty
- Manbir Sandhu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/M5andhu
- Benjamin Lang
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/langbnj
- M Madan Babu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
13
Banerjee A, Saha S, Tvedt NC, Yang LW, Bahar I. Mutually beneficial confluence of structure-based modeling of protein dynamics and machine learning methods. Curr Opin Struct Biol 2023; 78:102517. [PMID: 36587424 PMCID: PMC10038760 DOI: 10.1016/j.sbi.2022.102517] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/18/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/31/2022]
Abstract
Proteins sample an ensemble of conformers under physiological conditions, having access to a spectrum of modes of motion, also called intrinsic dynamics. These motions enable adaptation to various interactions in the cell and largely assist in, if not determine, viable mechanisms of biological function. In recent years, machine learning (ML) frameworks have proven uniquely useful in structural biology, and recent studies further provide evidence for the utility and/or necessity of considering intrinsic dynamics to increase their predictive ability. Efficient quantification of dynamics-based attributes by recently developed physics-based theories and models, such as elastic network models, provides a unique opportunity to generate dynamics data for training ML models to infer mechanisms of protein function, assess pathogenicity, or estimate binding affinities.
Affiliation(s)
- Anupam Banerjee
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
- Satyaki Saha
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA
- Nathan C Tvedt
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA; Computational and Applied Mathematics and Statistics, The College of William and Mary, Williamsburg, VA 23185, USA
- Lee-Wei Yang
- Institute of Bioinformatics and Structural Biology, and PhD Program in Biomedical Artificial Intelligence, National Tsing Hua University, Hsinchu 300044, Taiwan; Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
- Ivet Bahar
- Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh PA 15261, USA.
14
Choudhary P, Waseem M, Kumar S, Subbarao N, Srivastava S, Chakdar H. Y12F mutation in Pseudomonas plecoglossicida S7 lipase enhances its thermal and pH stability for industrial applications: a combination of in silico and in vitro study. World J Microbiol Biotechnol 2023; 39:75. [PMID: 36637534 DOI: 10.1007/s11274-023-03518-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 01/03/2023] [Indexed: 01/14/2023]
Abstract
Appropriate amino acid substitutions are critical for protein engineering to redesign the catalytic properties of industrially important enzymes such as lipases. The present study aimed to improve the environmental stability of lipase from Pseudomonas plecoglossicida S7 through site-directed mutagenesis driven by computational studies. The lipA gene was amplified and sequenced. Both wild-type (WT) and mutant-type (MT) lipase genes were expressed using the pET SUMO system. The expressed proteins were purified and characterized for pH stability and thermostability. The lipase gene belonged to lipase subfamily I.1. Molecular docking revealed that the Y12F-palmitic acid complex had a greater binding affinity (-6.3 kcal/mol) than the WT complex (-6.0 kcal/mol). Interestingly, molecular dynamics simulations (MDS) showed that the binding energy of the WT complex (-130.314 ± 15.11 kJ/mol) was more favorable than that of the mutant complex (-108.405 ± 69.376 kJ/mol), with a marked increase in the electrostatic energy of the mutant (-26.969 ± 12.646 kJ/mol) compared with the WT (-15.082 ± 13.802 kJ/mol). The Y12F mutant yielded a 1.27-fold increase in lipase activity at 55 °C compared with the purified WT protein. The Y12F mutant also showed increased activity (~1.2-fold each) at both pH 6 and pH 10. The Y12F mutation altered the kinetic parameters of the MT (Km = 1.38 mM, Vmax = 22.32 µM/min) relative to the WT (Km = 1.52 mM, Vmax = 29.76 µM/min), the lower Km indicating increased substrate-binding affinity of the mutant lipase. The Y12F mutant lipase, with its better pH and thermal stability, can be used in biocatalysis.
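The kinetic constants reported above can be set side by side using the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The Python sketch below is an illustration only, not the authors' analysis: it assumes equal enzyme loading for WT and Y12F (the abstract gives no enzyme concentrations), uses Vmax/Km as a relative proxy for catalytic efficiency rather than kcat/Km, and evaluates arbitrary substrate concentrations rather than the study's assay conditions. The names `KINETICS`, `mm_rate`, and `efficiency` are hypothetical.

```python
"""Hedged sketch: compare the reported WT and Y12F lipase kinetics.

Assumes both sets of constants were measured under the same conditions and
with equal enzyme loading, so Vmax/Km is only a relative efficiency proxy.
"""

# Values reported in the abstract: (Km in mM, Vmax in uM/min).
KINETICS = {
    "WT": (1.52, 29.76),
    "Y12F": (1.38, 22.32),
}


def mm_rate(vmax: float, km: float, s: float) -> float:
    """Michaelis-Menten initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def efficiency(vmax: float, km: float) -> float:
    """Vmax/Km: initial slope of v versus [S] when [S] << Km."""
    return vmax / km


if __name__ == "__main__":
    # Illustrative substrate concentrations in mM (hypothetical, not assay values).
    for s in (0.5, 1.5, 5.0):
        parts = []
        for name, (km, vmax) in KINETICS.items():
            parts.append(f"{name} v = {mm_rate(vmax, km, s):.2f} uM/min")
        print(f"[S] = {s} mM: " + ", ".join(parts))
    for name, (km, vmax) in KINETICS.items():
        print(f"{name}: Km = {km} mM, Vmax/Km = {efficiency(vmax, km):.2f} (uM/min)/mM")
```

Under the equal-loading assumption, Y12F has the lower Km but also the lower Vmax/Km (about 16.2 versus 19.6), and its predicted rate stays below the WT rate at every substrate concentration, because the drop in Vmax outweighs the gain in Km. The stability gains at 55 °C, pH 6, and pH 10 were measured separately and are not captured by this comparison.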
Affiliation(s)
- Prassan Choudhary
- Microbial Technology Unit-II, ICAR-National Bureau of Agriculturally Important Microorganisms, 275103, Maunath Bhanjan, India
- Amity Institute of Biotechnology, Amity University, 226010, Lucknow, India
- Mohd Waseem
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, 110012, New Delhi, India
- Sunil Kumar
- Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute (IASRI), Library Avenue, 110012, Pusa, New Delhi, India
- Naidu Subbarao
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, 110012, New Delhi, India
- Shilpi Srivastava
- Amity Institute of Biotechnology, Amity University, 226010, Lucknow, India
- Hillol Chakdar
- Microbial Technology Unit-II, ICAR-National Bureau of Agriculturally Important Microorganisms, 275103, Maunath Bhanjan, India.
15
Zhang B, Fan T. Knowledge structure and emerging trends in the application of deep learning in genetics research: A bibliometric analysis [2000–2021]. Front Genet 2022; 13:951939. [PMID: 36081985 PMCID: PMC9445221 DOI: 10.3389/fgene.2022.951939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 07/13/2022] [Indexed: 11/13/2022]
Abstract
Introduction: Deep learning technology has been widely used in genetic research because of its capabilities for computation, statistical analysis, and prediction. Herein, we aimed to summarize standardized knowledge and potentially innovative approaches for deep learning applications in genetics by evaluating publications, to encourage further research. Methods: The Science Citation Index Expanded (SCIE) database was searched for publications on deep learning applications in genomics. Original articles and reviews were considered. We derived a clustered network from the 69,806 references cited by the 1,754 related manuscripts identified. We used CiteSpace and VOSviewer to identify countries, institutions, journals, co-cited references, keywords, subject evolution, paths, current characteristics, and emerging topics. Results: We assessed the rapidly increasing number of publications on deep learning applications in genomics and identified 1,754 articles focusing on this subject. A total of 101 countries and 2,487 institutes contributed publications. The United States of America had the most publications (728/1,754) and the highest h-index, and it has collaborated closely with China and Germany. The references of the SCI articles were grouped into seven clusters: deep learning, logic regression, variant prioritization, random forests, scRNA-seq (single-cell RNA-seq), genomic regulation, and recombination. The keywords representing the research frontiers by year were prediction (2016–2021), sequence (2017–2021), mutation (2017–2021), and cancer (2019–2021). Conclusion: Here, we summarized the current literature on deep learning for genetics applications and analyzed the current research characteristics and future trajectories in this field. This work aims to provide resources for further intensive exploration and to encourage more researchers to pursue research on deep learning applications in genetics.
Affiliation(s)
- Bijun Zhang
- Department of Clinical Genetics, Shengjing Hospital of China Medical University, Shenyang, China
- Ting Fan
- Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, China
- *Correspondence: Ting Fan,
16
Schauperl M, Denny RA. AI-Based Protein Structure Prediction in Drug Discovery: Impacts and Challenges. J Chem Inf Model 2022; 62:3142-3156. [PMID: 35727311 DOI: 10.1021/acs.jcim.2c00026] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 12/12/2022]
Abstract
Proteins are the molecular machinery of the human body, and their malfunction is often responsible for disease, making them crucial targets for drug discovery. The three-dimensional structure of a protein determines its biological function, and its conformational state determines the binding of substrates, cofactors, and other proteins. Rational drug discovery employs engineered small molecules that selectively interact with proteins to modulate their function. To selectively target a protein and design small molecules, knowing the protein structure in all its relevant conformations is critical. Unfortunately, for a large number of proteins relevant to drug discovery, the three-dimensional structure has not yet been solved experimentally. Therefore, accurately predicting their structure from their amino acid sequence is one of the grand challenges in biology. Recently, AlphaFold2, a machine learning application based on a deep neural network, was able to predict unknown protein structures with unprecedented accuracy. Despite the impressive progress made by AlphaFold2, nature still challenges the field of structure prediction. In this Perspective, we explore how AlphaFold2 and related methods help make drug design more efficient. Furthermore, we discuss the roles of predicting domain-domain orientations, all relevant conformational states, the influence of posttranslational modifications, and conformational changes due to protein binding partners. We highlight where further improvements are needed for advanced machine learning methods to be used successfully and frequently in the pharmaceutical industry.
Affiliation(s)
- Michael Schauperl
- Department of Computational Sciences, HotSpot Therapeutics, 50 Milk Street, Boston, Massachusetts 02110, United States
- Rajiah Aldrin Denny
- Department of Computational Sciences, HotSpot Therapeutics, 50 Milk Street, Boston, Massachusetts 02110, United States
17
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and fulfill diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the ultimate aim is to create a protein with desired structural and functional properties. Modeling the relationship between a protein's sequence, folded structure, and biological function is often critical to such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and epistatic interactions between mutations make the mapping between genetic modifications and protein function inherently complex. Therefore, estimating the quantitative effects of mutations on protein function(s), often known as variant effect prediction (VEP), remains a grand challenge of biology, bioinformatics, and many related fields, and success would rapidly accelerate protein engineering tasks. Nevertheless, progress has been demonstrated in recent years with the development of machine learning (ML) methods that model the relationship between mutations and protein function. In this Review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and on challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States