1
Boob AG, Tan SI, Zaidi A, Singh N, Xue X, Zhou S, Martin TA, Chen LQ, Zhao H. Design of diverse, functional mitochondrial targeting sequences across eukaryotic organisms using variational autoencoder. Nat Commun 2025; 16:4151. [PMID: 40320395 PMCID: PMC12050285 DOI: 10.1038/s41467-025-59499-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 04/16/2025] [Indexed: 05/08/2025] Open
Abstract
Mitochondria play a key role in energy production and metabolism, making them a promising target for metabolic engineering and disease treatment. However, despite the known influence of passenger proteins on localization efficiency, only a few protein-localization tags have been characterized for mitochondrial targeting. To address this limitation, we leverage a Variational Autoencoder to design novel mitochondrial targeting sequences. In silico analysis reveals that a high fraction of the generated peptides (90.14%) are functional and possess features important for mitochondrial targeting. We characterize artificial peptides in four eukaryotic organisms and, as a proof-of-concept, demonstrate their utility in increasing 3-hydroxypropionic acid titers through pathway compartmentalization and improving 5-aminolevulinate synthase delivery by 1.62-fold and 4.76-fold, respectively. Moreover, we employ latent space interpolation to shed light on the evolutionary origins of dual-targeting sequences. Overall, our work demonstrates the potential of generative artificial intelligence for both fundamental research and practical applications in mitochondrial biology.
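The latent space interpolation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which interpolation scheme was used, so this assumes plain linear interpolation between two latent vectors; the function name `interpolate_latent` and the list-of-floats representation are hypothetical, and the decoder that would turn each intermediate vector back into a peptide sequence is omitted.

```python
from typing import List


def interpolate_latent(z_start: List[float], z_end: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the straight line from z_start to z_end.

    Assumption: linear interpolation; the source abstract does not specify the scheme.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimensionality")
    points = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 (start) to 1.0 (end)
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points


if __name__ == "__main__":
    # Toy two-dimensional latent vectors, purely for illustration.
    for point in interpolate_latent([0.0, 2.0], [1.0, 0.0], steps=3):
        print(point)
```

In a trained model, each intermediate point would be passed through the decoder to see how the generated sequence changes along the path between the two encoded sequences.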
Affiliation(s)
- Aashutosh Girish Boob
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Shih-I Tan
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Airah Zaidi
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Nilmani Singh
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Xueyi Xue
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Shuaizhen Zhou
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Teresa A Martin
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Li-Qing Chen
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Huimin Zhao
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL, USA
2
Sherry J, Pawar KI, Dolat L, Smith E, Chang IC, Pha K, Kaake R, Swaney DL, Herrera C, McMahon E, Bastidas RJ, Johnson JR, Valdivia RH, Krogan NJ, Elwell CA, Verba K, Engel JN. The Chlamydia effector Dre1 binds dynactin to reposition host organelles during infection. Cell Rep 2025; 44:115509. [PMID: 40186871 DOI: 10.1016/j.celrep.2025.115509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Revised: 01/09/2025] [Accepted: 03/12/2025] [Indexed: 04/07/2025] Open
Abstract
The obligate intracellular pathogen Chlamydia trachomatis replicates in a specialized membrane-bound compartment where it repositions host organelles during infection to acquire nutrients and evade host surveillance. We describe a bacterial effector, Dre1, that binds specifically to dynactin associated with host microtubule organizing centers without globally impeding dynactin function. Dre1 is required to reposition the centrosome, mitotic spindle, Golgi apparatus, and primary cilia around the inclusion and contributes to pathogen fitness in cell-based and mouse models of infection. We utilized Dre1 to affinity purify the megadalton dynactin protein complex and determined the first cryoelectron microscopy (cryo-EM) structure of human dynactin. Our results suggest that Dre1 binds to the pointed end of dynactin and uncover the first bacterial effector that modulates dynactin function. Our work highlights how a pathogen employs a single effector to evoke targeted, large-scale changes in host cell organization that facilitate pathogen growth without inhibiting host viability.
Affiliation(s)
- Jessica Sherry
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Komal Ishwar Pawar
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Lee Dolat
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Erin Smith
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- I-Chang Chang
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Khavong Pha
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Robyn Kaake
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Danielle L Swaney
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Clara Herrera
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Eleanor McMahon
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Robert J Bastidas
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Jeffrey R Johnson
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Raphael H Valdivia
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Nevan J Krogan
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Cherilyn A Elwell
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Kliment Verba
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Joanne N Engel
  - Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
  - Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA
3
Grewal S, Iyamu U, Vinals D, Mitran C, Hegde N, Yanow S. A machine learning framework to identify complex physicochemical features of B cell epitopes. RESEARCH SQUARE 2025:rs.3.rs-6255613. [PMID: 40321766 PMCID: PMC12047986 DOI: 10.21203/rs.3.rs-6255613/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]
Abstract
During infection with Plasmodium falciparum in pregnancy, parasites express a unique virulence factor, VAR2CSA, that mediates binding of infected red blood cells to the placenta. A major goal in designing vaccines to protect pregnant women from malaria is to elicit antibodies to VAR2CSA. The challenge is that VAR2CSA is highly polymorphic and identifying conserved epitopes is essential to elicit strain-transcending immunity. Unexpectedly, a mouse monoclonal antibody, 3D10, raised against the unrelated Duffy binding protein from P. vivax (DBPII) cross-reacts with diverse alleles of VAR2CSA in vitro. To identify these potentially conserved epitopes in VAR2CSA, we designed a machine learning framework to analyse 3D10 reactivity to peptides derived from two alleles of VAR2CSA, DBPII, and PvEBP2 (negative control). We used decision trees and a panel of 430 features to extract features correlated to 3D10 binding. We analysed patterns of these features in the dataset and designed mutant peptides to test complex sequence motifs. Features associated with 3D10 reactivity were mapped onto predicted 3D structures of Plasmodium proteins and validated based on 3D10 reactivity to the recombinant antigens. While the array data identified certain linear epitopes, the framework predicted other epitopes that are conformational. With this approach, peptide array data can be mined to extract physicochemical properties of epitopes recognized by polyreactive antibodies.
4
Bekker G, Nagao C, Shirota M, Nakamura T, Katayama T, Kihara D, Kinoshita K, Kurisu G. Protein Data Bank Japan: Improved tools for sequence-oriented analysis of protein structures. Protein Sci 2025; 34:e70052. [PMID: 39969112 PMCID: PMC11837027 DOI: 10.1002/pro.70052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 01/19/2025] [Accepted: 01/23/2025] [Indexed: 02/20/2025]
Abstract
Protein Data Bank Japan (PDBj) is the Asian hub of three-dimensional macromolecular structure data, and a founding member of the worldwide Protein Data Bank. We have accepted, processed, and distributed experimentally determined biological macromolecular structures for over two decades. Although we collaborate with RCSB PDB and BMRB in the United States, PDBe and EMDB in Europe and recently PDBc in China for our data-in activities, we have developed our own unique services and tools for searching, exploring, visualizing and analyzing protein structures. We have recently introduced a new UniProt-integrated portal that provides users with a quick overview of their target protein and shows a recommended structure with integrated data from various internal and external resources. The portal page helps users identify known genomic variations of their protein of interest and provides insights into how these modifications might impact the structure, stability and dynamics of the protein. Furthermore, the portal page helps users select the optimal structure for further analysis. We have also introduced another service to explore proteins using experimental and computational approaches, which gives experimental structural biologists additional insight and helps them design their experimental studies more efficiently. With these new additions, we have enhanced our service portfolio to benefit both experimental and computational structural biologists in their search to interpret protein structures, their dynamics and function.
Affiliation(s)
- Chioko Nagao
  - Institute for Protein Research, Osaka University, Suita, Japan
- Matsuyuki Shirota
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, Tsukuba, Japan
- Toshiaki Katayama
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
- Daisuke Kihara
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, Tsukuba, Japan
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Genji Kurisu
  - Institute for Protein Research, Osaka University, Suita, Japan
  - Protein Research Foundation, Minoh, Japan
5
Bekker GJ, Nagao C, Shirota M, Nakamura T, Katayama T, Kihara D, Kinoshita K, Kurisu G. Protein Data Bank Japan: Computational Resources for Analysis of Protein Structures. J Mol Biol 2025:169013. [PMID: 40133793 DOI: 10.1016/j.jmb.2025.169013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2024] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 03/27/2025]
Abstract
Protein Data Bank Japan (PDBj, https://pdbj.org/) is the Asian hub of three-dimensional macromolecular structure data, and a founding member of the worldwide Protein Data Bank. We have accepted, processed, and distributed experimentally determined biological macromolecular structures for over two decades. Although we collaborate with RCSB PDB and BMRB in the United States, PDBe and EMDB in Europe and recently PDBc in China for our data-in activities, we have developed our own unique services and tools for searching, exploring, visualizing, and analyzing protein structures. We have also developed novel archives for computational data and raw crystal diffraction images. Recently, we introduced the Sequence Navigator Pro service to explore proteins using experimental and computational approaches, which gives experimental structural biologists additional insight and helps them design their experimental studies more efficiently. In addition, we introduced a new UniProt-integrated portal that provides users with a quick overview of their target protein, shows a recommended structure, and integrates data from various internal and external resources. With these new additions, we have enhanced our service portfolio to benefit both experimental and computational structural biologists in their search to interpret protein structures, their dynamics and function.
Affiliation(s)
- Gert-Jan Bekker
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
- Chioko Nagao
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
- Matsuyuki Shirota
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 980-8579, Japan
- Tsukasa Nakamura
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, 1-1 Oho, Tsukuba, Ibaraki 305-0801 Japan
- Toshiaki Katayama
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Daisuke Kihara
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Structural Biology Research Center, Institute of Material Structure Science, High Energy Accelerator Research Organization, 1-1 Oho, Tsukuba, Ibaraki 305-0801 Japan
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Sendai, Miyagi 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 980-8579, Japan
- Genji Kurisu
  - Institute for Protein Research, Osaka University, 3-2, Yamadaoka, Suita, Osaka 565-0871, Japan
  - Protein Research Foundation, Ina 4-1-2, Minoh, Osaka 562-8686, Japan
6
Chatzimiltis S, Agathocleous M, Promponas VJ, Christodoulou C. Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings. Comput Struct Biotechnol J 2025; 27:243-251. [PMID: 39866664 PMCID: PMC11764030 DOI: 10.1016/j.csbj.2024.12.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 12/20/2024] [Accepted: 12/21/2024] [Indexed: 01/28/2025] Open
Abstract
Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structures and their functions. Machine Learning and in particular Deep Learning approaches show promising results for the PSSP problem. In this paper, we deploy a Convolutional Neural Network (CNN) trained with the Subsampled Hessian Newton (SHN) method (a Hessian Free Optimisation variant), with a two-dimensional input representation of embeddings extracted from a language model pretrained with protein sequences. Utilising a CNN trained with the SHN method and the input embeddings, we achieved on average a 79.96% per residue (Q3) accuracy on the CB513 dataset and 81.45% Q3 accuracy on the PISCES dataset (without any post-processing techniques applied). The application of ensembles and filtering techniques to the results of the CNN improved the overall prediction performance. The Q3 accuracy on the CB513 dataset increased to 93.65% and on the PISCES dataset to 87.13%. Moreover, our method was evaluated using the CASP13 dataset, where we showed that as the post-processing window size increased, the prediction performance increased as well. In fact, with the biggest post-processing window size (limited by the smallest CASP13 protein), we achieved a Q3 accuracy of 98.12% and a Segment Overlap (SOV) score of 96.98 on the CASP13 dataset when the CNNs were trained with the PISCES dataset. Finally, we showed that input representations from embeddings can perform as well as representations extracted from multiple sequence alignments.
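As a rough illustration of the per-residue Q3 metric and of window-based post-processing, the sketch below computes Q3 as the percentage of residues whose predicted three-state label (H, E or C) matches the observed label, and applies a simple majority-vote filter over a sliding window. The majority-vote filter is an assumption chosen for illustration; the abstract does not describe the actual ensemble or filtering techniques, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def majority_filter(predicted: str, window: int = 3) -> str:
    """Hypothetical post-processing step: majority vote over a centred window.

    The current state is kept whenever it is tied for the most frequent state.
    """
    half = window // 2
    smoothed = []
    for i, state in enumerate(predicted):
        segment = predicted[max(0, i - half): i + half + 1]
        counts = Counter(segment)
        best = max(counts.values())
        if counts[state] == best:
            smoothed.append(state)  # current state wins or ties: keep it
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return "".join(smoothed)


if __name__ == "__main__":
    observed = "HHHHHCCCEEEE"   # toy reference labels
    predicted = "HHEHHCCCEECE"  # toy prediction with two isolated errors
    print(q3_accuracy(predicted, observed))
    print(q3_accuracy(majority_filter(predicted), observed))
```

A wider window smooths longer runs of isolated errors but can also erase short genuine segments, which is one reason the window size is treated as a tunable parameter.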
Affiliation(s)
- Sotiris Chatzimiltis
  - University of Cyprus, Department of Computer Science, Nicosia, Cyprus
  - 5G/6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom
- Michalis Agathocleous
  - University of Cyprus, Department of Computer Science, Nicosia, Cyprus
  - University of Nicosia, Department of Computer Science, Nicosia, Cyprus
7
Uddin MN, Mia MA, Akter Y, Chowdhury MAB, Rahman MH, Siddiqua H, Shathi US, Al-Mamun A, Siddika F, Marzan LW. Variations in Furin SNPs, a Major Concern of SARS-CoV-2 Susceptibility Among Different Populations: An In-Silico Approach. Bioinform Biol Insights 2024; 18:11779322241306388. [PMID: 39703750 PMCID: PMC11656424 DOI: 10.1177/11779322241306388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 11/25/2024] [Indexed: 12/21/2024] Open
Abstract
COVID-19, caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), had an adverse effect globally, causing a pandemic with several million deaths. The virus possesses a spike protein that is cleaved, or activated, by Furin-like protease enzymes present in mammalian lung or respiratory cells, which allows it to enter the mammalian body. The addition of the Furin cleavage site makes SARS-CoV-2 a more infectious and emerging virus than its ancestral viruses. Phylogenetic relationships of coronavirus spike proteins were analyzed and the Furin recognition motif was mapped on the tree using bioinformatics tools such as GTEx, KEGG, GO, NCBI, PolyPhen-2, SNAP2, PANTHER, Hidden Markov Models (Fathmm), Phd-single-nucleotide polymorphism (SNP), I-TASSER, Modpred, Phobius, SIFT, iPTREE-STAB, and PROVEAN. During this study, it was found that in certain regions Furin SNPs have some relation with susceptibility to SARS-CoV-2, whereas in other regions the effects are negligible. Finally, our study demonstrates that Furin SNPs have a strong relationship with susceptibility to SARS-CoV-2. Because Furin helps to cleave the spike protein of the virus, it can be targeted for inhibition at a particular site to prevent SARS-CoV-2 from entering the body.
Affiliation(s)
- Md Nasir Uddin
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Md Arzo Mia
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Yasmin Akter
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Mohammad Al-baruni Chowdhury
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Md Hadisur Rahman
  - Molecular Biotechnology Division, National Institute of Biotechnology, Savar, Bangladesh
- Hafsa Siddiqua
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Umme Salma Shathi
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Abdullah Al-Mamun
  - Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr, b), Dhaka, Bangladesh
- Farida Siddika
  - Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Lolo Wal Marzan
  - Laboratory of Microbial Genomics and Metabolic Engineering, Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
8
da Rocha W, Liberti L, Mucherino A, Malliavin TE. Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations. J Chem Inf Model 2024; 64:8999-9008. [PMID: 39560315 DOI: 10.1021/acs.jcim.4c01232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2024]
Abstract
Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or the analysis of protein sequence alignment in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited when the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong in the case of protein topologies defined by numerous long-range restraints, as in the case of proteins with β secondary structures. The systematic enumeration of the conformations improves the efficiency of the calculations. The analysis of DNA codons makes it possible to connect the variations of torsion angle ω to the positions of rare DNA codons.
Affiliation(s)
- Wagner da Rocha
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Leo Liberti
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Thérèse E Malliavin
  - LPCT, UMR 7019 Université de Lorraine CNRS, Vandoeuvre-lès-Nancy 54500, France
9
Puppala A, Sosa D, Castillo Suchkou J, French R, Dobosz-Bartoszek M, Kiernan K, Simonović M. Human selenocysteine synthase, SEPSECS, has evolved to optimize binding of a tRNA-based substrate. Nucleic Acids Res 2024; 52:13368-13385. [PMID: 39385655 PMCID: PMC11602143 DOI: 10.1093/nar/gkae875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 09/17/2024] [Accepted: 09/24/2024] [Indexed: 10/12/2024] Open
Abstract
The evolution of the genetic code to incorporate selenocysteine (Sec) enabled the development of a selenoproteome in all domains of life. O-phosphoseryl-tRNASec selenium transferase (SepSecS) catalyzes the terminal reaction of Sec synthesis on tRNASec in archaea and eukaryotes. Despite harboring four equivalent active sites, human SEPSECS binds no more than two tRNASec molecules, yet the basis for this asymmetry remains poorly understood. In humans, an acidic, C-terminal, α-helical extension precludes additional tRNA-binding events in two of the enzyme monomers, stabilizing the SEPSECS•tRNASec complex. However, the existence of a helix exclusively in vertebrates raised questions about the evolution of the tRNA-binding mechanism in SEPSECS and the origin of its C-terminal extension. Herein, using a comparative structural and phylogenetic analysis, we show that the tRNA-binding motifs in SEPSECS are poorly conserved across species. Consequently, in contrast to mammalian SEPSECS, the archaeal ortholog cannot bind unacylated tRNASec and requires an aminoacyl group. Moreover, the C-terminal α-helix 16 is a mammalian innovation, and its absence causes aggregation of the SEPSECS•tRNASec complex at low tRNA concentrations. Altogether, we propose that SEPSECS evolved a tRNASec binding mechanism as a crucial functional and structural feature, allowing for additional levels of regulation of Sec and selenoprotein synthesis.
Affiliation(s)
- Anupama K Puppala
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Dylan Sosa
  - Department of Ecology & Evolution, University of Chicago, Chicago, IL 60637, USA
- Jennifer Castillo Suchkou
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Rachel L French
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Malgorzata Dobosz-Bartoszek
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Kaitlyn A Kiernan
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Miljan Simonović
  - Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, Chicago, IL 60607, USA
10
He Y, Wang S, Zeng S, Zhu J, Xu D, Han W, Wang J. NRIMD, a Web Server for Analyzing Protein Allosteric Interactions Based on Molecular Dynamics Simulation. J Chem Inf Model 2024; 64:7176-7183. [PMID: 38991149 DOI: 10.1021/acs.jcim.4c00783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Long-range allosteric communication between distant sites and active sites in proteins is central to biological regulation but still poorly characterized, limiting the development of protein engineering and drug design. Addressing this gap, NRIMD is an open-access web server for analyzing long-range interactions in proteins from molecular dynamics (MD) simulations, such as the effect of mutations at distal sites or allosteric ligand binding at allosteric sites on the active center. Based on our recent works on neural relational inference using graph neural networks, this cloud-based web server accepts MD simulation data on any length of residues in the alpha-carbon skeleton format from mainstream MD software. The input trajectory data are validated at the frontend deployed on the cloud and then processed on the backend deployed on a high-performance computer system with a collection of complementary tools. The web server provides a one-stop-shop MD analysis platform to predict long-range interactions and their paths between distant sites and active sites. It provides a user-friendly interface for detailed analysis and visualization. To the best of our knowledge, NRIMD is the first-of-its-kind online service to provide comprehensive long-range interaction analysis on MD simulations, which significantly lowers the barrier of predictions on protein long-range interactions using deep learning. The NRIMD web server is publicly available at https://nrimd.luddy.indianapolis.iu.edu/.
Collapse
Affiliation(s)
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
| | - Shuang Wang
- Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana 47405, United States
| | - Shuai Zeng
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
| | - Jingxuan Zhu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States
| | - Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
| | - Juexin Wang
- Department of BioHealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis, Indianapolis, Indiana 46202, United States
|
11
|
Tsour S, Machne R, Leduc A, Widmer S, Guez J, Karczewski K, Slavov N. Alternate RNA decoding results in stable and abundant proteins in mammals. bioRxiv 2024:2024.08.26.609665. [PMID: 39253435 PMCID: PMC11383030 DOI: 10.1101/2024.08.26.609665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Amino acid substitutions may substantially alter protein stability and function, but the contribution of substitutions arising from alternate translation (deviations from the genetic code) is unknown. To explore it, we analyzed deep proteomic and transcriptomic data from over 1,000 human samples, including 6 cancer types and 26 healthy human tissues. This global analysis identified 60,024 high confidence substitutions corresponding to 8,801 unique sites in proteins derived from 1,990 genes. Some substitutions are shared across samples, while others exhibit strong tissue-type and cancer specificity. Surprisingly, products of alternate translation are more abundant than their canonical counterparts for hundreds of proteins, suggesting sense codon recoding. Recoded proteins include transcription factors, proteases, signaling proteins, and proteins associated with neurodegeneration. Mechanisms contributing to substitution abundance include protein stability, codon frequency, codon-anticodon mismatches, and RNA modifications. We characterize sequence motifs around alternatively translated amino acids and how substitution ratios vary across protein domains, tissue types and cancers. The substitution ratios are positively associated with intrinsically disordered regions and genetic polymorphisms in gnomAD, though the polymorphisms cannot account for the substitutions. Both the sequence and the tissue-specificity of alternatively translated proteins are conserved between human and mouse. These results demonstrate the contribution of alternate translation to diversifying mammalian proteomes, and its association with protein stability, tissue-specific proteomes, and diseases.
Affiliation(s)
- Shira Tsour
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Alnylam Pharmaceuticals, Cambridge, MA, USA
| | - Rainer Machne
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
| | - Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
| | - Simon Widmer
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
| | - Jeremy Guez
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Konrad Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
|
12
|
Gao Y, Zhu S, Li H, Hao X, Chen W, Pan D, Qian Z. AntigenBoost: enhanced mRNA-based antigen expression through rational amino acid substitution. Brief Bioinform 2024; 25:bbae468. [PMID: 39400114 PMCID: PMC11472322 DOI: 10.1093/bib/bbae468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 07/29/2024] [Accepted: 09/09/2024] [Indexed: 10/15/2024] Open
Abstract
Messenger RNA (mRNA) vaccines represent a groundbreaking advancement in immunology and public health, particularly highlighted by their role in combating the COVID-19 pandemic. Optimizing mRNA-based antigen expression is a crucial focus in this emerging industry. We have developed a bioinformatics tool named AntigenBoost to address the challenge posed by destabilizing dipeptides that hinder ribosomal translation. AntigenBoost identifies these dipeptides within specific antigens and provides a range of potential amino acid substitution strategies using a two-dimensional scoring system. Through a combination of bioinformatics analysis and experimental validation, we significantly enhanced the in vitro expression of mRNA-derived Respiratory Syncytial Virus fusion glycoprotein and Influenza A Hemagglutinin antigen. Notably, a single amino acid substitution improved the immune response in mice, underscoring the effectiveness of AntigenBoost in mRNA vaccine design.
Affiliation(s)
- Yumiao Gao
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Siran Zhu
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Huichun Li
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Xueting Hao
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Wen Chen
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Deng Pan
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
| | - Zhikang Qian
- NanoRibo (Shanghai) Biotechnology Co., Ltd., No. 1188 Lianhang Road, Minhang District, Shanghai 200003, China
|
13
|
Boshar S, Trop E, de Almeida BP, Copoiu L, Pierrot T. Are genomic language models all you need? Exploring genomic language models on protein downstream tasks. Bioinformatics 2024; 40:btae529. [PMID: 39212609 PMCID: PMC11399231 DOI: 10.1093/bioinformatics/btae529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 08/20/2024] [Accepted: 08/28/2024] [Indexed: 09/04/2024]
Abstract
MOTIVATION: Large language models, trained on enormous corpora of biological sequences, are state-of-the-art for downstream genomic and proteomic tasks. Since the genome contains the information to encode all proteins, genomic language models (gLMs) hold the potential to make downstream predictions not only about DNA sequences, but also about proteins. However, the performance of gLMs on protein tasks remains unknown, due to few tasks pairing proteins with the coding DNA sequences (CDS) that can be processed by gLMs. RESULTS: In this work, we curated five such datasets and used them to evaluate the performance of gLMs and proteomic language models (pLMs). We show that gLMs are competitive and even outperform their pLMs counterparts on some tasks. The best performance was achieved using the retrieved CDS compared to sampling strategies. We found that training a joint genomic-proteomic model outperforms each individual approach, showing that they capture different but complementary sequence representations, as we demonstrate through model interpretation of their embeddings. Lastly, we explored different genomic tokenization schemes to improve downstream protein performance. We trained a new Nucleotide Transformer (50M) foundation model with 3mer tokenization that outperforms its 6mer counterpart on protein tasks while maintaining performance on genomics tasks. The application of gLMs to proteomics offers the potential to leverage rich CDS data, and in the spirit of the central dogma, the possibility of a unified and synergistic approach to genomics and proteomics. AVAILABILITY AND IMPLEMENTATION: We make our inference code, 3mer pre-trained model weights and datasets available.
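The difference between 3mer and 6mer tokenization mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the tokenizer itself, so the details here are assumptions: tokens are taken as non-overlapping k-mers read left to right, leftover bases become single-nucleotide tokens, and the function name `kmer_tokenize` and the example sequence are hypothetical.

```python
from typing import List


def kmer_tokenize(seq: str, k: int) -> List[str]:
    """Split a nucleotide sequence into non-overlapping k-mers, left to right.

    Trailing bases that do not fill a whole k-mer are emitted as
    single-nucleotide tokens (an assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    n_full = len(seq) // k
    tokens = [seq[i * k:(i + 1) * k] for i in range(n_full)]
    tokens.extend(seq[n_full * k:])  # leftover bases, one token each
    return tokens


cds = "ATGGCCAAATGA"  # hypothetical in-frame CDS: Met-Ala-Lys-stop
print(kmer_tokenize(cds, 3))  # ['ATG', 'GCC', 'AAA', 'TGA']
print(kmer_tokenize(cds, 6))  # ['ATGGCC', 'AAATGA']
```

With k = 3 and an in-frame CDS, each token coincides with one codon and therefore with one amino acid, whereas a 6mer token spans two codons. This alignment is one plausible reason a 3mer vocabulary could suit protein tasks; the abstract reports the performance difference but does not state the cause.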
Affiliation(s)
- Sam Boshar
- InstaDeep, Cambridge, MA 02142, United States
| | - Evan Trop
- InstaDeep, Cambridge, MA 02142, United States
|
14
|
Welp LM, Sachsenberg T, Wulf A, Chernev A, Horokhovskyi Y, Neumann P, Pašen M, Siraj A, Raabe M, Johannsson S, Schmitzova J, Netz E, Pfeuffer J, He Y, Fritzemeier K, Delanghe B, Viner R, Vos SM, Cramer P, Ficner R, Liepe J, Kohlbacher O, Urlaub H. Chemical crosslinking extends and complements UV crosslinking in analysis of RNA/DNA nucleic acid-protein interaction sites by mass spectrometry. bioRxiv 2024:2024.08.29.610268. [PMID: 39257782 PMCID: PMC11383681 DOI: 10.1101/2024.08.29.610268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
UV (ultra-violet) crosslinking with mass spectrometry (XL-MS) has been established for identifying RNA- and DNA-binding proteins along with their domains and amino acids involved. Here, we explore chemical XL-MS for RNA-protein, DNA-protein, and nucleotide-protein complexes in vitro and in vivo. We introduce a specialized nucleotide-protein-crosslink search engine, NuXL, for robust and fast identification of such crosslinks at amino acid resolution. Chemical XL-MS complements UV XL-MS by generating different crosslink species, increasing crosslinked protein yields in vivo almost four-fold and thus it expands the structural information accessible via XL-MS. Our workflow facilitates integrative structural modelling of nucleic acid-protein complexes and adds spatial information to the described RNA-binding properties of enzymes, for which crosslinking sites are often observed close to their cofactor-binding domains. In vivo UV and chemical XL-MS data from E. coli cells analysed by NuXL establish a comprehensive nucleic acid-protein crosslink inventory with crosslink sites at amino acid level for more than 1500 proteins. Our new workflow combined with the dedicated NuXL search engine identified RNA crosslinks that cover most RNA-binding proteins, with DNA and RNA crosslinks detected in transcriptional repressors and activators.
|
15
|
Hummel NFC, Markel K, Stefani J, Staller MV, Shih PM. Systematic identification of transcriptional activation domains from non-transcription factor proteins in plants and yeast. Cell Syst 2024; 15:662-672.e4. [PMID: 38866009 DOI: 10.1016/j.cels.2024.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/26/2024] [Accepted: 05/22/2024] [Indexed: 06/14/2024]
Abstract
Transcription factors can promote gene expression through activation domains. Whole-genome screens have systematically mapped activation domains in transcription factors but not in non-transcription factor proteins (e.g., chromatin regulators and coactivators). To fill this knowledge gap, we employed the activation domain predictor PADDLE to analyze the proteomes of Arabidopsis thaliana and Saccharomyces cerevisiae. We screened 18,000 predicted activation domains from >800 non-transcription factor genes in both species, confirming that 89% of candidate proteins contain active fragments. Our work enables the annotation of hundreds of nuclear proteins as putative coactivators, many of which have never been ascribed any function in plants. Analysis of peptide sequence compositions reveals how the distribution of key amino acids dictates activity. Finally, we validated short, "universal" activation domains with comparable performance to state-of-the-art activation domains used for genome engineering. Our approach enables the genome-wide discovery and annotation of activation domains that can function across diverse eukaryotes.
Affiliation(s)
- Niklas F C Hummel
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Biology, Technische Universität Darmstadt, 64287 Darmstadt, Germany
| | - Kasey Markel
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jordan Stefani
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
| | - Max V Staller
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA; Center for Computational Biology, University of California, Berkeley, CA 94720, USA; Chan Zuckerberg Biohub-San Francisco, San Francisco, CA 9415, USA.
| | - Patrick M Shih
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; Feedstocks Division, Joint BioEnergy Institute, Emeryville, CA 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA.
|
16
|
Buchan DWA, Moffat L, Lau A, Kandathil S, Jones D. Deep learning for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res 2024; 52:W287-W293. [PMID: 38747351 PMCID: PMC11223827 DOI: 10.1093/nar/gkae328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/08/2024] [Accepted: 04/24/2024] [Indexed: 07/06/2024] Open
Abstract
The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update of the recent additions and developments to the webserver, with a focus on new Deep Learning based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and we give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.
Affiliation(s)
- Daniel W A Buchan
- UCL Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Lewis Moffat
- UCL Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Andy Lau
- UCL Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Shaun M Kandathil
- UCL Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - David T Jones
- UCL Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, UK
|
17
|
Wan F, Torres MDT, Peng J, de la Fuente-Nunez C. Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nat Biomed Eng 2024; 8:854-871. [PMID: 38862735 PMCID: PMC11310081 DOI: 10.1038/s41551-024-01201-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 03/25/2024] [Indexed: 06/13/2024]
Abstract
Molecular de-extinction aims at resurrecting molecules to solve antibiotic resistance and other present-day biological and biomedical problems. Here we show that deep learning can be used to mine the proteomes of all available extinct organisms for the discovery of antibiotic peptides. We trained ensembles of deep-learning models consisting of a peptide-sequence encoder coupled with neural networks for the prediction of antimicrobial activity and used it to mine 10,311,899 peptides. The models predicted 37,176 sequences with broad-spectrum antimicrobial activity, 11,035 of which were not found in extant organisms. We synthesized 69 peptides and experimentally confirmed their activity against bacterial pathogens. Most peptides killed bacteria by depolarizing their cytoplasmic membrane, contrary to known antimicrobial peptides, which tend to target the outer membrane. Notably, lead compounds (including mammuthusin-2 from the woolly mammoth, elephasin-2 from the straight-tusked elephant, hydrodamin-1 from the ancient sea cow, mylodonin-2 from the giant sloth and megalocerin-1 from the extinct giant elk) showed anti-infective activity in mice with skin abscess or thigh infections. Molecular de-extinction aided by deep learning may accelerate the discovery of therapeutic molecules.
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Jacqueline Peng
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA.
|
18
|
Delic S, Shuman B, Lee S, Bahmanyar S, Momany M, Onishi M. The evolutionary origins and ancestral features of septins. Front Cell Dev Biol 2024; 12:1406966. [PMID: 38994454 PMCID: PMC11238149 DOI: 10.3389/fcell.2024.1406966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 05/08/2024] [Indexed: 07/13/2024] Open
Abstract
Septins are a family of membrane-associated cytoskeletal guanine-nucleotide binding proteins that play crucial roles in various cellular processes, such as cell division, phagocytosis, and organelle fission. Despite their importance, the evolutionary origins and ancestral function of septins remain unclear. In opisthokonts, septins form five distinct groups of orthologs, with subunits from multiple groups assembling into heteropolymers, thus supporting their diverse molecular functions. Recent studies have revealed that septins are also conserved in algae and protists, indicating an ancient origin from the last eukaryotic common ancestor. However, the phylogenetic relationships among septins across eukaryotes remained unclear. Here, we expanded the list of non-opisthokont septins, including previously unrecognized septins from glaucophyte algae. Constructing a rooted phylogenetic tree of 254 total septins, we observed a bifurcation between the major non-opisthokont and opisthokont septin clades. Within the non-opisthokont septins, we identified three major subclades: Group 6 representing chlorophyte green algae (6A mostly for species with single septins, 6B for species with multiple septins), Group 7 representing algae in chlorophytes, heterokonts, haptophytes, chrysophytes, and rhodophytes, and Group 8 representing ciliates. Glaucophyte and some ciliate septins formed orphan lineages in-between all other septins and the outgroup. Combining ancestral-sequence reconstruction and AlphaFold predictions, we tracked the structural evolution of septins across eukaryotes. In the GTPase domain, we identified a conserved GAP-like arginine finger within the G-interface of at least one septin in most algal and ciliate species. This residue is required for homodimerization of the single Chlamydomonas septin, and its loss coincided with septin duplication events in various lineages. 
The loss of the arginine finger is often accompanied by the emergence of the α0 helix, a known NC-interface interaction motif, potentially signifying the diversification of septin-septin interaction mechanisms from homo-dimerization to hetero-oligomerization. Lastly, we found amphipathic helices in all septin groups, suggesting that membrane binding is an ancestral trait. Coiled-coil domains were also broadly distributed, while transmembrane domains were found in some septins in Group 6A and 7. In summary, this study advances our understanding of septin distribution and phylogenetic groupings, shedding light on their ancestral features, potential function, and early evolution.
Affiliation(s)
- Samed Delic
- Department of Biology, Duke University, Durham, NC, United States
| | - Brent Shuman
- Fungal Biology Group and Plant Biology Department, University of Georgia, Athens, GA, United States
| | - Shoken Lee
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT, United States
| | - Shirin Bahmanyar
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT, United States
| | - Michelle Momany
- Fungal Biology Group and Plant Biology Department, University of Georgia, Athens, GA, United States
| | - Masayuki Onishi
- Department of Biology, Duke University, Durham, NC, United States
|
19
|
Delic S, Shuman B, Lee S, Bahmanyar S, Momany M, Onishi M. The Evolutionary Origins and Ancestral Features of Septins. bioRxiv 2024:2024.03.25.586683. [PMID: 38585751 PMCID: PMC10996617 DOI: 10.1101/2024.03.25.586683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Septins are a family of membrane-associated cytoskeletal GTPases that play crucial roles in various cellular processes, such as cell division, phagocytosis, and organelle fission. Despite their importance, the evolutionary origins and ancestral function of septins remain unclear. In opisthokonts, septins form five distinct groups of orthologs, with subunits from multiple groups assembling into heteropolymers, thus supporting their diverse molecular functions. Recent studies have revealed that septins are also conserved in algae and protists, indicating an ancient origin from the last eukaryotic common ancestor. However, the phylogenetic relationships among septins across eukaryotes remained unclear. Here, we expanded the list of non-opisthokont septins, including previously unrecognized septins from rhodophyte red algae and glaucophyte algae. Constructing a rooted phylogenetic tree of 254 total septins, we observed a bifurcation between the major non-opisthokont and opisthokont septin clades. Within the non-opisthokont septins, we identified three major subclades: Group 6 representing chlorophyte green algae (6A mostly for species with single septins, 6B for species with multiple septins), Group 7 representing algae in chlorophytes, heterokonts, haptophytes, chrysophytes, and rhodophytes, and Group 8 representing ciliates. Glaucophyte and some ciliate septins formed orphan lineages in-between all other septins and the outgroup. Combining ancestral-sequence reconstruction and AlphaFold predictions, we tracked the structural evolution of septins across eukaryotes. In the GTPase domain, we identified a conserved GAP-like arginine finger within the G-interface of at least one septin in most algal and ciliate species. This residue is required for homodimerization of the single Chlamydomonas septin, and its loss coincided with septin duplication events in various lineages. 
The loss of the arginine finger is often accompanied by the emergence of the α0 helix, a known NC-interface interaction motif, potentially signifying the diversification of septin-septin interaction mechanisms from homo-dimerization to hetero-oligomerization. Lastly, we found amphipathic helices in all septin groups, suggesting that curvature-sensing is an ancestral trait of septin proteins. Coiled-coil domains were also broadly distributed, while transmembrane domains were found in some septins in Group 6A and 7. In summary, this study advances our understanding of septin distribution and phylogenetic groupings, shedding light on their ancestral features, potential function, and early evolution.
Affiliation(s)
- Samed Delic
- Department of Biology, Duke University, Durham, North Carolina, USA
| | - Brent Shuman
- Fungal Biology Group and Plant Biology Department, University of Georgia, Athens, Georgia, USA
| | - Shoken Lee
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
| | - Shirin Bahmanyar
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA
| | - Michelle Momany
- Fungal Biology Group and Plant Biology Department, University of Georgia, Athens, Georgia, USA
| | - Masayuki Onishi
- Department of Biology, Duke University, Durham, North Carolina, USA
|
20
|
Ma H, Jiang F, Rong Y, Guo Y, Huang J. Toward Robust Self-Training Paradigm for Molecular Prediction Tasks. J Comput Biol 2024; 31:213-228. [PMID: 38531049 DOI: 10.1089/cmb.2023.0187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024] Open
Abstract
Molecular prediction tasks normally demand a series of professional experiments to label the target molecule, which suffers from the limited labeled data problem. One of the semisupervised learning paradigms, known as self-training, utilizes both labeled and unlabeled data. Specifically, a teacher model is trained using labeled data and produces pseudo labels for unlabeled data. These labeled and pseudo-labeled data are then jointly used to train a student model. However, the pseudo labels generated from the teacher model are generally not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss function to handle such noisy labels in two paradigms, that is, generic and adaptive. We have conducted experiments on three molecular biology prediction tasks with four backbone models to gradually evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, notably within molecular regression tasks, where there has been an average enhancement of 41.5%. Furthermore, the visualization analysis confirms the superiority of our method. Our proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the labeled data insufficient issue in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded with any prediction task, which serves as a universal approach for the bioinformatics community.
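The teacher-student loop described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: the abstract does not specify its robust loss functions or backbone models, so a linear regressor stands in for the backbone and a Huber loss (fitted by iteratively reweighted least squares) stands in for the robust loss. All function names are hypothetical.

```python
import numpy as np


def _with_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((len(X), 1))])


def fit_least_squares(X, y):
    """Teacher: ordinary least squares on labeled data only."""
    w, *_ = np.linalg.lstsq(_with_bias(X), y, rcond=None)
    return w


def predict(w, X):
    return _with_bias(X) @ w


def fit_huber(X, y, delta=1.0, n_iter=50):
    """Student: Huber-loss regression via iteratively reweighted least squares.

    Samples with residual magnitude above `delta` are down-weighted, which
    limits the influence of inaccurate pseudo labels (assumed stand-in for
    the paper's robust loss).
    """
    Xb = _with_bias(X)
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    for _ in range(n_iter):
        r = np.abs(y - Xb @ w)
        weights = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        sw = np.sqrt(weights)
        w, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return w


def self_train(X_lab, y_lab, X_unlab, delta=1.0):
    teacher = fit_least_squares(X_lab, y_lab)   # 1. train teacher on labeled data
    pseudo = predict(teacher, X_unlab)          # 2. pseudo-label unlabeled data
    X_all = np.vstack([X_lab, X_unlab])         # 3. pool labeled + pseudo-labeled
    y_all = np.concatenate([y_lab, pseudo])
    student = fit_huber(X_all, y_all, delta)    # 4. train student with robust loss
    return teacher, student
```

In practice the teacher's pseudo labels are noisy, and the robust loss is what keeps the student from fitting those errors as if they were ground truth.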
Affiliation(s)
- Hehuan Ma
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
| | - Feng Jiang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
| | - Yu Rong
- Tencent AI Lab, Shenzhen, China
| | - Yuzhi Guo
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
| | - Junzhou Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
|
21
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
| | - Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
|
22
|
Hummel NFC, Markel K, Stefani J, Staller MV, Shih PM. Systematic identification of transcriptional activator domains from non-transcription factor proteins in plants and yeast. bioRxiv 2023:2023.09.12.557247. [PMID: 37745555 PMCID: PMC10515812 DOI: 10.1101/2023.09.12.557247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Transcription factors promote gene expression via trans-regulatory activation domains. Although whole genome scale screens in model organisms (e.g. human, yeast, fly) have helped identify activation domains from transcription factors, such screens have been less extensively used to explore the occurrence of activation domains in non-transcription factor proteins, such as transcriptional coactivators, chromatin regulators and some cytosolic proteins, leaving a blind spot on what role activation domains in these proteins could play in regulating transcription. We utilized the activation domain predictor PADDLE to mine the entire proteomes of two model eukaryotes, Arabidopsis thaliana and Saccharomyces cerevisiae (1). We characterized 18,000 fragments covering predicted activation domains from >800 non-transcription factor genes in both species, and experimentally validated that 89% of proteins contained fragments capable of activating transcription in yeast. Peptides with similar sequence composition show a broad range of activities, which is explained by the arrangement of key amino acids. We also annotated hundreds of nuclear proteins with activation domains as putative coactivators; many of which have never been ascribed any function in plants. Furthermore, our library contains >250 non-nuclear proteins containing peptides with activation domain function across both eukaryotic lineages, suggesting that there are unknown biological roles of these peptides beyond transcription. Finally, we identify and validate short, 'universal' eukaryotic activation domains that activate transcription in both yeast and plants with comparable or stronger performance to state-of-the-art activation domains. Overall, our dual host screen provides a blueprint on how to systematically discover novel genetic parts for synthetic biology that function across a wide diversity of eukaryotes.
Significance Statement: Activation domains promote transcription and play a critical role in regulating gene expression. Although the mapping of activation domains from transcription factors has been carried out in previous genome-wide screens, their occurrence in non-transcription factors has been less explored. We utilize an activation domain predictor to mine the entire proteomes of Arabidopsis thaliana and Saccharomyces cerevisiae for new activation domains on non-transcription factor proteins. We validate peptides derived from >750 non-transcription factor proteins capable of activating transcription, discovering many potentially new coactivators in plants. Importantly, we identify novel genetic parts that can function across both species, representing unique synthetic biology tools.
|
23
|
Kandathil SM, Lau AM, Jones DT. Machine learning methods for predicting protein structure from single sequences. Curr Opin Struct Biol 2023; 81:102627. [PMID: 37320955 DOI: 10.1016/j.sbi.2023.102627] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/25/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Recent breakthroughs in protein structure prediction have increasingly relied on the use of deep neural networks. These recent methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which presents many advantages. Although most techniques of this type make use of multiple sequence alignments as their primary input, a new wave of methods have attempted to use just single sequences as the input. We discuss the make-up and operating principles of these models, and highlight new developments in these areas, as well as areas for future development.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - Andy M Lau
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - David T Jones
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
|
24
|
Kesner JS, Chen Z, Shi P, Aparicio AO, Murphy MR, Guo Y, Trehan A, Lipponen JE, Recinos Y, Myeku N, Wu X. Noncoding translation mitigation. Nature 2023; 617:395-402. [PMID: 37046090 PMCID: PMC10560126 DOI: 10.1038/s41586-023-05946-4] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 07/27/2022] [Accepted: 03/13/2023] [Indexed: 04/14/2023]
Abstract
Translation is pervasive outside of canonical coding regions, occurring in long noncoding RNAs, canonical untranslated regions and introns [1-4], especially in ageing [4-6], neurodegeneration [5,7] and cancer [8-10]. Notably, the majority of tumour-specific antigens are results of noncoding translation [11-13]. Although the resulting polypeptides are often nonfunctional, translation of noncoding regions is nonetheless necessary for the birth of new coding sequences [14,15]. The mechanisms underlying the surveillance of translation in diverse noncoding regions and how escaped polypeptides evolve new functions remain unclear [10,16-19]. Functional polypeptides derived from annotated noncoding sequences often localize to membranes [20,21]. Here we integrate massively parallel analyses of more than 10,000 human genomic sequences and millions of random sequences with genome-wide CRISPR screens, accompanied by in-depth genetic and biochemical characterizations. Our results show that the intrinsic nucleotide bias in the noncoding genome and in the genetic code frequently results in polypeptides with a hydrophobic C-terminal tail, which is captured by the ribosome-associated BAG6 membrane protein triage complex for either proteasomal degradation or membrane targeting. By contrast, canonical proteins have evolved to deplete C-terminal hydrophobic residues. Our results reveal a fail-safe mechanism for the surveillance of unwanted translation from diverse noncoding regions and suggest a possible biochemical route for the preferential membrane localization of newly evolved proteins.
Affiliation(s)
- Jordan S Kesner
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Ziheng Chen
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Peiguo Shi
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Alexis O Aparicio
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Michael R Murphy
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Yang Guo
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Aditi Trehan
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Jessica E Lipponen
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Yocelyn Recinos
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Natura Myeku
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Xuebing Wu
- Cardiometabolic Genomics Program, Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA.
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
|
25
|
Chatterjee A, Walters R, Shafi Z, Ahmed OS, Sebek M, Gysi D, Yu R, Eliassi-Rad T, Barabási AL, Menichetti G. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat Commun 2023; 14:1989. [PMID: 37031187 PMCID: PMC10082765 DOI: 10.1038/s41467-023-37572-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 05/16/2022] [Accepted: 03/23/2023] [Indexed: 04/10/2023] Open
Abstract
Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identify drug-target combinations with the potential of becoming a powerful tool in drug discovery.
Affiliation(s)
- Ayan Chatterjee
- Network Science Institute, Northeastern University, Boston, MA, USA
| | - Robin Walters
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Zohair Shafi
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Omair Shafi Ahmed
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Michael Sebek
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Deisy Gysi
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, CA, USA
| | - Tina Eliassi-Rad
- Network Science Institute, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Santa Fe Institute, Santa Fe, NM, USA
- The Institute for Experiential AI, Northeastern University, Boston, MA, USA
| | - Albert-László Barabási
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Department of Network and Data Science, Central European University, Budapest, Hungary
| | - Giulia Menichetti
- Network Science Institute, Northeastern University, Boston, MA, USA.
- Department of Physics, Northeastern University, Boston, MA, USA.
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
|
26
|
Andongma BT, Huang Y, Chen F, Tang Q, Yang M, Chou SH, Li X, He J. In silico design of a promiscuous chimeric multi-epitope vaccine against Mycobacterium tuberculosis. Comput Struct Biotechnol J 2023; 21:991-1004. [PMID: 36733703 PMCID: PMC9883148 DOI: 10.1016/j.csbj.2023.01.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 07/18/2022] [Revised: 01/15/2023] [Accepted: 01/15/2023] [Indexed: 01/18/2023] Open
Abstract
Tuberculosis (TB) is a global health threat, killing approximately 1.5 million people each year. The eradication of Mycobacterium tuberculosis, the main causative agent of TB, is increasingly challenging due to the emergence of extensively drug-resistant strains. Vaccination is considered an effective way to protect the host from pathogens, but the only clinically approved TB vaccine, Bacillus Calmette-Guérin (BCG), has limited protection in adults. Multi-epitope vaccines have been found to enhance immunity to diseases by selectively combining epitopes from several candidate proteins. This study aimed to design a multi-epitope vaccine against TB using an immuno-informatics approach. Through functional enrichment, we identified eight proteins secreted by M. tuberculosis that are either required for pathogenesis, secreted into extracellular space, or both. We then analyzed the epitopes of these proteins and selected 16 helper T lymphocyte epitopes with interferon-γ inducing activity, 15 cytotoxic T lymphocyte epitopes, and 10 linear B-cell epitopes, and conjugated them with adjuvant and Pan HLA DR-binding epitope (PADRE) using appropriate linkers. Moreover, we predicted the tertiary structure of this vaccine, its potential interaction with Toll-Like Receptor-4 (TLR4), and the immune response it might elicit. The results showed that this vaccine had a strong affinity for TLR4, which could significantly stimulate CD4+ and CD8+ cells to secrete immune factors and B lymphocytes to secrete immunoglobulins, so as to obtain good humoral and cellular immunity. Overall, this multi-epitope protein was predicted to be stable, safe, highly antigenic, and highly immunogenic, which has the potential to serve as a global vaccine against TB.
Affiliation(s)
- Binda T. Andongma
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Yazheng Huang
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Fang Chen
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Qing Tang
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Min Yang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430070, PR China
| | - Shan-Ho Chou
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
| | - Xinfeng Li
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- CAS Key Laboratory of Special Pathogens and Biosafety, Center for Biosafety Mega-Science, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, PR China
- Correspondence to: The State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Street, Wuhan, Hubei 430070, PR China.
| | - Jin He
- State Key Laboratory of Agricultural Microbiology & Hubei Hongshan Laboratory, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, PR China
- Correspondence to: The State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Street, Wuhan, Hubei 430070, PR China.
|
27
|
Ismi DP, Pulungan R, Afiahayati. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J 2022; 20:6271-6286. [PMID: 36420164 PMCID: PMC9678802 DOI: 10.1016/j.csbj.2022.11.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/06/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/13/2022] Open
Abstract
This paper aims to provide a comprehensive review of the trends and challenges of deep neural networks for protein secondary structure prediction (PSSP). In recent years, deep neural networks have become the primary method for protein secondary structure prediction. Previous studies showed that deep neural networks have raised the accuracy of three-state secondary structure prediction to more than 80%. Favored deep learning methods, such as convolutional neural networks, recurrent neural networks, inception networks, and graph neural networks, have been implemented in protein secondary structure prediction. Methods adapted from natural language processing (NLP) and computer vision are also employed, including attention mechanisms, ResNet, and U-shape networks. In the post-AlphaFold era, PSSP studies focus on different objectives, such as enhancing the quality of evolutionary information and exploiting protein language models as the PSSP input. The recent trend to utilize pre-trained language models as input features for secondary structure prediction provides a new direction for PSSP studies. Moreover, the state-of-the-art accuracy achieved by previous PSSP models is still below its theoretical limit. There is still room for improvement in the field.
Affiliation(s)
- Dewi Pramudi Ismi
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Department of Informatics, Faculty of Industrial Technology, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
| | - Reza Pulungan
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
| | - Afiahayati
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
|
28
|
Burke PC, Park H, Subramaniam AR. A nascent peptide code for translational control of mRNA stability in human cells. Nat Commun 2022; 13:6829. [PMID: 36369503 PMCID: PMC9652226 DOI: 10.1038/s41467-022-34664-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/01/2021] [Accepted: 11/02/2022] [Indexed: 11/13/2022] Open
Abstract
Stability of eukaryotic mRNAs is associated with their codon, amino acid, and GC content. Yet, coding sequence motifs that predictably alter mRNA stability in human cells remain poorly defined. Here, we develop a massively parallel assay to measure mRNA effects of thousands of synthetic and endogenous coding sequence motifs in human cells. We identify several families of simple dipeptide repeats whose translation triggers mRNA destabilization. Rather than individual amino acids, specific combinations of bulky and positively charged amino acids are critical for the destabilizing effects of dipeptide repeats. Remarkably, dipeptide sequences that form extended β strands in silico and in vitro slow down ribosomes and reduce mRNA levels in vivo. The resulting nascent peptide code underlies the mRNA effects of hundreds of endogenous peptide sequences in the human proteome. Our work suggests an intrinsic role for the ribosome as a selectivity filter against the synthesis of bulky and aggregation-prone peptides.
Affiliation(s)
- Phillip C Burke
- Basic Sciences Division and Computational Biology Section of the Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Microbiology, University of Washington, Seattle, WA, 98195, USA
| | - Heungwon Park
- Basic Sciences Division and Computational Biology Section of the Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Arvind Rasi Subramaniam
- Basic Sciences Division and Computational Biology Section of the Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA.
- Department of Microbiology, University of Washington, Seattle, WA, 98195, USA.
|
29
|
Schwabe J, Pérez-Burgos M, Herfurth M, Glatter T, Søgaard-Andersen L. Evidence for a Widespread Third System for Bacterial Polysaccharide Export across the Outer Membrane Comprising a Composite OPX/β-Barrel Translocon. mBio 2022; 13:e0203222. [PMID: 35972145 PMCID: PMC9601211 DOI: 10.1128/mbio.02032-22] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/15/2022] [Accepted: 07/25/2022] [Indexed: 11/20/2022] Open
Abstract
In Gram-negative bacteria, secreted polysaccharides have multiple critical functions. In Wzx/Wzy- and ABC transporter-dependent pathways, an outer membrane (OM) polysaccharide export (OPX) type translocon exports the polysaccharide across the OM. The paradigm OPX protein Wza of Escherichia coli is an octamer in which the eight C-terminal domains form an α-helical OM pore and the eight copies of the three N-terminal domains (D1 to D3) form a periplasmic cavity. In synthase-dependent pathways, the OM translocon is a 16- to 18-stranded β-barrel protein. In Myxococcus xanthus, the secreted polysaccharide EPS (exopolysaccharide) is synthesized in a Wzx/Wzy-dependent pathway. Here, using experiments, phylogenomics, and computational structural biology, we identify and characterize EpsX as an OM 18-stranded β-barrel protein important for EPS synthesis and identify AlgE, a β-barrel translocon of a synthase-dependent pathway, as its closest structural homolog. We also find that EpsY, the OPX protein of the EPS pathway, consists only of the periplasmic D1 and D2 domains and completely lacks the domain for spanning the OM (herein termed a D1D2OPX protein). In vivo, EpsX and EpsY mutually stabilize each other and interact in in vivo pulldown experiments supporting their direct interaction. Based on these observations, we propose that EpsY and EpsX make up and represent a third type of translocon for polysaccharide export across the OM. Specifically, in this composite translocon, EpsX functions as the OM-spanning β-barrel translocon together with the periplasmic D1D2OPX protein EpsY. Based on computational genomics, similar composite systems are widespread in Gram-negative bacteria.
IMPORTANCE: Bacteria secrete a wide variety of polysaccharides that have critical functions in, e.g., fitness, surface colonization, and biofilm formation and in beneficial and pathogenic human-, animal-, and plant-microbe interactions. In Gram-negative bacteria, export of these chemically diverse polysaccharides across the outer membrane depends on two known translocons, i.e., an outer membrane OPX protein in Wzx/Wzy- and ABC transporter-dependent pathways and an outer membrane 16- to 18-stranded β-barrel protein in synthase-dependent pathways. Here, using a combination of experiments in Myxococcus xanthus, phylogenomics, and computational structural biology, we provide evidence supporting that a third type of translocon can export polysaccharides across the outer membrane. Specifically, in this translocon, an outer membrane-spanning β-barrel protein functions together with an entirely periplasmic OPX protein that completely lacks the domain for spanning the OM. Computational genomics support that similar composite systems are widespread in Gram-negative bacteria.
Affiliation(s)
- Johannes Schwabe
- Department of Ecophysiology, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
| | - María Pérez-Burgos
- Department of Ecophysiology, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
| | - Marco Herfurth
- Department of Ecophysiology, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
| | - Timo Glatter
- Core Facility for Mass Spectrometry & Proteomics, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
| | - Lotte Søgaard-Andersen
- Department of Ecophysiology, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
|
30
|
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contributes to the development of data-driven computational approaches. Here we propose CSM-peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti-angiogenic, anti-bacterial, anti-cancer, anti-inflammatory, anti-viral, cell-penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests, and consistent performance on cross-validation. We anticipate CSM-peptides to be of great value in helping to screen large libraries to identify novel peptides with therapeutic potential and have made it freely available as a user-friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H. M. Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| | - Anjali Garg
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - David Keizer
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E. V. Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| |
Collapse
|
31
|
Parmar M, Thumar R, Sheth J, Patel D. Designing multi-epitope based peptide vaccine targeting spike protein SARS-CoV-2 B1.1.529 (Omicron) variant using computational approaches. Struct Chem 2022; 33:2243-2260. [PMID: 36160688 PMCID: PMC9485025 DOI: 10.1007/s11224-022-02027-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/26/2022] [Accepted: 08/02/2022] [Indexed: 10/26/2022]
Abstract
Millions of people have been infected since the SARS-CoV-2 outbreak in 2019. The high human-to-human transmission rate has warranted a need for a vaccine to protect people. Although some vaccines are in use, the high mutation rate of SARS-CoV-2 and its multiple variants means that the current vaccines may not be sufficient to immunize people against new variant threats. One of the emerging variants of concern is B1.1.529 (Omicron), which carries ~30 mutations in the Spike protein (S) of SARS-CoV-2 and is predicted to evade antibody recognition even in vaccinated people. We used a structure-based approach and an epitope prediction server to develop a Multi-Epitope based Subunit Vaccine (MESV) involving the SARS-CoV-2 B1.1.529 variant spike glycoprotein. The predicted epitopes with better antigenicity and non-toxicity were used for designing the vaccine construct and predicting its features and structure models. In addition, in silico cloning of the MESV construct into the pET28a expression vector predicted the construct to be efficiently translated. The proposed MESV vaccine construct was also subjected to immune simulation prediction and was found to be highly antigenic and to elicit a cell-mediated immune response. Therefore, the MESV proposed in the present study has the potential to be evaluated further for vaccine production against the newly identified B1.1.529 (Omicron) variant of concern. Supplementary Information: The online version contains supplementary material available at 10.1007/s11224-022-02027-6.
Affiliation(s)
- Meet Parmar
  - Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Ritik Thumar
  - Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Jigar Sheth
  - Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Dhaval Patel
  - Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
  - Gujarat Biotechnology University, Gujarat International Finance Tec-City, Gandhinagar-382355, Gujarat, India
|
32
|
Canuti M, Pénzes JJ, Lang AS. A new perspective on the evolution and diversity of the genus Amdoparvovirus (family Parvoviridae) through genetic characterization, structural homology modeling, and phylogenetics. Virus Evol 2022; 8:veac056. [PMID: 35783582 PMCID: PMC9242002 DOI: 10.1093/ve/veac056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2022] [Revised: 05/13/2022] [Accepted: 06/13/2022] [Indexed: 12/11/2022] Open
Abstract
Amdoparvoviruses (genus Amdoparvovirus, family Parvoviridae) are primarily viruses of carnivorans, but recent studies have indicated that their host range might also extend to rodents and chiropterans. While their classification is based on the full sequence of the major nonstructural protein (NS1), several studies investigating amdoparvoviral diversity have been focused on partial sequences, leading to difficulties in accurately determining species demarcations and leaving several viruses unclassified. In this study, while reporting the complete genomic sequence of a novel amdoparvovirus identified in an American mink (British Columbia amdoparvovirus, BCAV), we studied the phylogenetic relationships of all amdoparvovirus-related sequences and provide a comprehensive reevaluation of their diversity and evolution. After excluding recombinant sequences, phylogenetic and pairwise sequence identity analyses allowed us to define fourteen different viruses, including the five currently classified species, BCAV, and four additional viruses that fulfill the International Committee on Taxonomy of Viruses criteria to be classified as species. We show that the group of viruses historically known as Aleutian mink disease virus (species Carnivore amdoparvovirus 1) should be considered as a cluster of at least four separate viral species that have been co-circulating in mink farms, facilitating the occurrence of inter-species recombination. Genome organization, splicing donor and acceptor sites, and protein sequence motifs were surprisingly conserved within the genus. The sequence of the major capsid protein virus protein 2 (VP2) was significantly more conserved between and within species compared to NS1, a phenomenon possibly linked to antibody-dependent enhancement (ADE). Homology models suggest a remarkably high degree of conservation of the spikes located near the icosahedral threefold axis of the capsid, comprising the surface region associated with ADE. 
A surprisingly high number of divergent amino acid positions were found in the luminal threefold and twofold axes of the capsid, regions of hitherto unknown function. We emphasize the importance of complete genome analyses and, given the marked phylogenetic inconsistencies across the genome, advise obtaining the complete coding sequences of divergent strains. Further studies on amdoparvovirus biology and structure, as well as epidemiological and virus discovery investigations, are required to better characterize the ecology and evolution of this important group of viruses.
Affiliation(s)
- Marta Canuti
  - Department of Biology, Memorial University of Newfoundland, 45 Arctic Ave., St. John’s NL A1C 5S7, Canada
- Judit J Pénzes
  - Institute for Quantitative Biomedicine, Rutgers the State University of New Jersey, 174 Frelinghuysen Rd, Piscataway, NJ 08854, USA
- Andrew S Lang
  - Department of Biology, Memorial University of Newfoundland, 45 Arctic Ave., St. John’s NL A1C 5S7, Canada
|
33
|
Berger TM, Michaelis C, Probst I, Sagmeister T, Petrowitsch L, Puchner S, Pavkov-Keller T, Gesslbauer B, Grohmann E, Keller W. Small Things Matter: The 11.6-kDa TraB Protein is Crucial for Antibiotic Resistance Transfer Among Enterococci. Front Mol Biosci 2022; 9:867136. [PMID: 35547396 PMCID: PMC9083827 DOI: 10.3389/fmolb.2022.867136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2022] [Accepted: 03/14/2022] [Indexed: 11/18/2022] Open
Abstract
Conjugative transfer is the most important means for spreading antibiotic resistance genes. It is used by Gram-positive and Gram-negative bacteria, as well as archaea. Conjugative transfer is mediated by molecular membrane-spanning nanomachines, so-called Type 4 Secretion Systems (T4SS). The T4SS of the broad-host-range inc18-plasmid pIP501 is organized in a single operon encoding 15 putative transfer proteins. pIP501 was originally isolated from a clinical Streptococcus agalactiae strain but is mainly found in Enterococci. In this study, we demonstrate that the small transmembrane protein TraB is essential for pIP501 transfer. Complementation of a markerless pIP501∆traB knockout by traB lacking its secretion signal sequence did not fully restore conjugative transfer. Pull-downs with Strep-tagged TraB demonstrated interactions of TraB with the putative mating pair formation proteins TraF, TraH, TraK, and TraM, and with the lytic transglycosylase TraG. As TraB is the only putative mating pair formation complex protein containing a secretion signal sequence, we speculate on its role as a T4SS recruitment factor. Moreover, structural features of TraB and TraB orthologs are presented, making an essential role of TraB-like proteins in antibiotic resistance transfer among Firmicutes likely.
Affiliation(s)
- Tamara M.I. Berger
  - Institute of Molecular Biosciences, Department of Structural Biology, University of Graz, Graz, Austria
- Claudia Michaelis
  - Faculty of Life Sciences and Technology, Department of Microbiology, Berliner Hochschule für Technik, Berlin, Germany
- Ines Probst
  - Division of Infectious Diseases, University Medical Center Freiburg, Freiburg, Germany
- Theo Sagmeister
  - Institute of Molecular Biosciences, Department of Structural Biology, University of Graz, Graz, Austria
- Lukas Petrowitsch
  - Institute of Molecular Biosciences, Department of Structural Biology, University of Graz, Graz, Austria
- Sandra Puchner
  - Faculty of Life Sciences and Technology, Department of Microbiology, Berliner Hochschule für Technik, Berlin, Germany
- Tea Pavkov-Keller
  - Institute of Molecular Biosciences, Department of Structural Biology, University of Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Bernd Gesslbauer
  - Institute of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, University of Graz, Graz, Austria
- Elisabeth Grohmann
  - Faculty of Life Sciences and Technology, Department of Microbiology, Berliner Hochschule für Technik, Berlin, Germany
- Walter Keller
  - Institute of Molecular Biosciences, Department of Structural Biology, University of Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
|
34
|
Yang W, Liu Y, Xiao C. Deep metric learning for accurate protein secondary structure prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|