201
Adelusi TI, Oyedele AQK, Boyenle ID, Ogunlana AT, Adeyemi RO, Ukachi CD, Idris MO, Olaoba OT, Adedotun IO, Kolawole OE, Xiaoxing Y, Abdul-Hammed M. Molecular modeling in drug discovery. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100880] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/27/2022]
202
Levine TP. Sequence Analysis and Structural Predictions of Lipid Transfer Bridges in the Repeating Beta Groove (RBG) Superfamily Reveal Past and Present Domain Variations Affecting Form, Function and Interactions of VPS13, ATG2, SHIP164, Hobbit and Tweek. Contact (Thousand Oaks) 2022; 5:251525642211343. [PMID: 36571082 PMCID: PMC7613979 DOI: 10.1177/25152564221134328] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/23/2022]
Abstract
Lipid transfer between organelles requires proteins that shield the hydrophobic portions of lipids as they cross the cytoplasm. In the last decade a new structural form of lipid transfer protein (LTP) has been found: long hydrophobic grooves made of beta-sheet that bridge between organelles at membrane contact sites. Eukaryotes have five families of bridge-like LTPs: VPS13, ATG2, SHIP164, Hobbit and Tweek. These are unified into a single superfamily because their bridges are composed of just one domain, called the repeating beta groove (RBG) domain, which builds into rod-shaped multimers with a hydrophobic-lined groove and hydrophilic exterior. Here, sequences and predicted structures of the RBG superfamily were analyzed in depth. Phylogenetics showed that the last eukaryotic common ancestor contained all five RBG proteins, with duplicated VPS13s. The current set of long RBG proteins appears to have arisen in even earlier ancestors from shorter forms with 4 RBG domains. The extreme ends of most RBG proteins have amphipathic helices that might be an adaptation for direct or indirect bilayer interaction, although this has yet to be tested. The one exception is the C-terminus of SHIP164, which instead has a coiled-coil. Finally, the exterior surfaces of the RBG bridges are shown to have conserved residues along most of their length, indicating sites for partner interactions, almost all of which are unknown. These findings can inform future cell biological and biochemical experiments.
203
Zhang Z, Zhao Y, Wang J, Guo M. DeepRCI: predicting ATP-binding proteins using the residue-residue contact information. IEEE J Biomed Health Inform 2021; 26:2822-2829. [PMID: 34941538 DOI: 10.1109/jbhi.2021.3137840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Adenosine 5'-triphosphate (ATP) is a direct energy source for various activities of tissues and cells in the body. The release of ATP energy requires the assistance of ATP-binding proteins. Therefore, the identification of ATP-binding proteins is of great significance for research on organisms. Several methods for predicting ATP-binding proteins already exist; however, their accuracy remains too low for their predictions to be reliable. Here, we designed a novel method, called DeepRCI (based on Deep convolutional neural network and Residue-residue Contact Information), for predicting ATP-binding proteins. DeepRCI achieved an accuracy of 93.61% on the test set, which was a significant improvement over the state-of-the-art methods.
204
Mise K, Masuda Y, Senoo K, Itoh H. Undervalued Pseudo-nifH Sequences in Public Databases Distort Metagenomic Insights into Biological Nitrogen Fixers. mSphere 2021; 6:e0078521. [PMID: 34787447 PMCID: PMC8597730 DOI: 10.1128/msphere.00785-21] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/26/2021] [Accepted: 11/03/2021] [Indexed: 12/16/2022]
Abstract
Nitrogen fixation, a distinctive process that incorporates inert atmospheric nitrogen into active biological processes, has been a major topic in biological and geochemical studies. Currently, insights into the diversity and distribution of nitrogen-fixing microbes depend on homology-based analyses of nitrogenase genes, especially the nifH gene, which is broadly conserved in nitrogen-fixing microbes. Here, we report a pitfall of using nifH as a marker of microbial nitrogen fixation. We exhaustively analyzed genomes in RefSeq (231,908 genomes) and KEGG (6,509 genomes) and the co-occurrence and gene order patterns of nitrogenase genes (including nifH) therein. Up to 20% of nifH-harboring genomes lacked nifD and nifK, which encode essential subunits of nitrogenase, within 10 coding sequences upstream or downstream of nifH or on the same genome. According to a phenotypic database of prokaryotes, no species or strains harboring only nifH possess nitrogen-fixing activity, which shows that these nifH genes are "pseudo"-nifH genes. Pseudo-nifH sequences mainly belong to anaerobic microbes, including members of the class Clostridia and methanogens. We also detected many pseudo-nifH reads in metagenomic sequences from anaerobic environments such as animal guts, wastewater, paddy soils, and sediments. In some samples, pseudo-nifH reads outnumbered "true" nifH reads by 50% or even 10-fold. Because of the high sequence similarity between pseudo- and true nifH, substantial numbers of nifH-like reads could not be confidently classified. Overall, our results encourage reconsideration of the conventional use of nifH for detecting nitrogen-fixing microbes and suggest that nifD or nifK would be a more reliable marker. IMPORTANCE Nitrogen-fixing microbes affect biogeochemical cycling, agricultural productivity, and microbial ecosystems, and their distributions have been investigated intensively using genomic and metagenomic sequencing.
Currently, insights into nitrogen fixers in the environment have been acquired by homology searches against nitrogenase genes, particularly the nifH gene, in public databases. Here, we report that public databases include a significant number of incorrectly annotated nifH sequences (pseudo-nifH). We exhaustively investigated the genomic structures of nifH-harboring genomes and found hundreds of pseudo-nifH sequences in RefSeq and KEGG. Over half of these pseudo-nifH sequences belonged to members of the class Clostridia, which is considered a prominent nitrogen-fixing clade. We also found that the abundance of nitrogen fixers in metagenomes could be overestimated by 1.5 to >10 times due to pseudo-nifH sequences recorded in public databases. Our results encourage reconsideration of the prevalent use of nifH as a marker of nitrogen-fixing microbes.
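The genome-context criterion in this abstract (a nifH copy is suspect when neither nifD nor nifK lies within 10 coding sequences on either side, or anywhere on the genome) can be sketched as a simple neighborhood scan. This is a minimal illustration, not the authors' pipeline: it assumes each genome is an ordered list of gene labels on a single contig, and the function names `flag_pseudo_nifh` and `genome_lacks_partners` are hypothetical. The study additionally cross-checked candidates against a phenotypic database.

```python
from typing import List

PARTNERS = {"nifD", "nifK"}


def flag_pseudo_nifh(genes: List[str], window: int = 10) -> List[int]:
    """Return indices of nifH genes with neither nifD nor nifK nearby.

    genes: ordered gene labels along one contig (hypothetical input format).
    window: number of coding sequences checked upstream and downstream.
    """
    flagged = []
    for i, gene in enumerate(genes):
        if gene != "nifH":
            continue
        lo = max(0, i - window)
        hi = min(len(genes), i + window + 1)
        # Neighbors on both sides, excluding the nifH gene itself.
        neighborhood = genes[lo:i] + genes[i + 1:hi]
        if not PARTNERS.intersection(neighborhood):
            flagged.append(i)
    return flagged


def genome_lacks_partners(genes: List[str]) -> bool:
    """True if the genome carries nifH but no nifD or nifK anywhere."""
    return "nifH" in genes and not PARTNERS.intersection(genes)
```

For a canonical operon such as `["nifH", "nifD", "nifK", "nifE"]` nothing is flagged, whereas a nifH separated from nifD by 12 unrelated genes is flagged by the window check but not by the genome-wide check.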
Affiliation(s)
- Kazumori Mise
- National Institute of Advanced Industrial Science and Technology (AIST) Hokkaido, Sapporo, Hokkaido, Japan
- Yoko Masuda
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Keishi Senoo
- Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, Japan
- Hideomi Itoh
- National Institute of Advanced Industrial Science and Technology (AIST) Hokkaido, Sapporo, Hokkaido, Japan
205
A High-Content Microscopy Screening Identifies New Genes Involved in Cell Width Control in Bacillus subtilis. mSystems 2021; 6:e0101721. [PMID: 34846166 PMCID: PMC8631317 DOI: 10.1128/msystems.01017-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
How cells control their shape and size is a fundamental question of biology. In most bacteria, cell shape is imposed by the peptidoglycan (PG) polymeric meshwork that surrounds the cell. Thus, bacterial cell morphogenesis results from the coordinated action of the proteins assembling and degrading the PG shell. Remarkably, during steady-state growth, most bacteria maintain a defined shape across generations, suggesting that error-proof mechanisms tightly control the process. In the rod-shaped Gram-positive model bacterium Bacillus subtilis, the average cell length varies as a function of the growth rate, but the cell diameter remains constant throughout the cell cycle and across growth conditions. Here, in an attempt to shed light on the cellular circuits controlling bacterial cell width, we developed a screen to identify genetic determinants of cell width in B. subtilis. Using high-content screening (HCS) fluorescence microscopy and semiautomated measurement of single-cell dimensions, we screened a library of ∼4,000 single knockout mutants. We identified 13 mutations, in genes belonging to several functional groups, that significantly altered cell diameter. In particular, our results indicate that metabolism plays a major role in cell width control in B. subtilis. IMPORTANCE Bacterial shape is primarily dictated by the external cell wall, a vital structure that, as such, is the target of countless antibiotics. How bacteria synthesize and maintain this structure is therefore a cardinal question for both basic and applied research. Bacteria usually multiply from generation to generation while producing progeny of rigorously identical shape. This implies that bacterial cells constantly monitor and maintain a set of parameters to ensure this perpetuation.
Here, our study uses a large-scale microscopy approach to identify, at the whole-genome level in a model bacterium, the genes involved in controlling one of the most tightly regulated cellular parameters: cell width.
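A screen like this ends with a hit-calling step that decides which of the ~4,000 mutants "significantly" differ in width. The abstract does not say which statistical test was used, so the sketch below substitutes one common, generic choice purely for illustration: a robust z-score of each mutant's median width against the library median, using the median absolute deviation (MAD). The threshold of 3 and the function name `robust_z_hits` are hypothetical.

```python
import statistics
from typing import Dict, List


def robust_z_hits(widths: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Flag mutants whose width deviates strongly from the library median.

    widths: mutant name -> median cell width (µm) from image analysis.
    The MAD is scaled by 1.4826 to approximate a standard deviation under
    normality, so a few extreme mutants do not inflate the spread estimate.
    """
    values = list(widths.values())
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        # No spread at all: nothing can be called as an outlier.
        return []
    return sorted(
        name for name, w in widths.items()
        if abs(w - center) / scale > threshold
    )
```

Median-based statistics are a deliberate design choice here: in a genome-wide screen most mutants resemble wild type, so the bulk of the library serves as its own reference and true hits barely shift the baseline.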
206
Lewis AJO, Hegde RS. A unified evolutionary origin for the ubiquitous protein transporters SecY and YidC. BMC Biol 2021; 19:266. [PMID: 34911545 PMCID: PMC8675477 DOI: 10.1186/s12915-021-01171-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/17/2021] [Accepted: 10/21/2021] [Indexed: 02/08/2023]
Abstract
BACKGROUND Protein transporters translocate hydrophilic segments of polypeptide across hydrophobic cell membranes. Two protein transporters are ubiquitous and date back to the last universal common ancestor: SecY and YidC. SecY consists of two pseudosymmetric halves, which together form a membrane-spanning protein-conducting channel. YidC is an asymmetric molecule with a protein-conducting hydrophilic groove that partially spans the membrane. Although both transporters mediate insertion of membrane proteins with short translocated domains, only SecY transports secretory proteins and membrane proteins with long translocated domains. The evolutionary origins of these ancient and essential transporters are not known. RESULTS The features conserved by the two halves of SecY indicate that their common ancestor was an antiparallel homodimeric channel. Structural searches with SecY's halves detect exceptional similarity with YidC homologs. The SecY halves and YidC share a fold comprising a three-helix bundle interrupted by a helical hairpin. In YidC, this hairpin is cytoplasmic and facilitates substrate delivery, whereas in SecY, it is transmembrane and forms the substrate-binding lateral gate helices. In both transporters, the three-helix bundle forms a protein-conducting hydrophilic groove delimited by a conserved hydrophobic residue. Based on these similarities, we propose that SecY originated as a YidC homolog which formed a channel by juxtaposing two hydrophilic grooves in an antiparallel homodimer. We find that archaeal YidC and its eukaryotic descendants use this same dimerisation interface to heterodimerise with a conserved partner. YidC's sufficiency for the function of simple cells is suggested by the results of reductive evolution in mitochondria and plastids, which tend to retain SecY only if they require translocation of large hydrophilic domains. 
CONCLUSIONS SecY and YidC share previously unrecognised similarities in sequence, structure, mechanism, and function. Our delineation of a detailed correspondence between these two essential and ancient transporters enables a deeper mechanistic understanding of how each functions. Furthermore, key differences between them help explain how SecY performs its distinctive function in the recognition and translocation of secretory proteins. The unified theory presented here explains the evolution of these features, and thus reconstructs a key step in the origin of cells.
Affiliation(s)
- Aaron J O Lewis
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK.
- Ramanujan S Hegde
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK.
207
Olo Ndela E, Enault F, Toussaint A. Transposable Prophages in Leptospira: An Ancient, Now Diverse, Group Predominant in Causative Agents of Weil's Disease. Int J Mol Sci 2021; 22:13434. [PMID: 34948244 PMCID: PMC8705779 DOI: 10.3390/ijms222413434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 12/24/2022]
Abstract
The virome associated with the corkscrew-shaped bacterium Leptospira, responsible for Weil's disease, is scarcely known, and genetic tools available for these bacteria remain limited. To address these two issues, potential transposable prophages were searched for in Leptospiraceae genomes. The 236 predicted transposable prophages were particularly abundant in the most pathogenic leptospiral clade and may be involved in the acquisition of virulence traits. According to genomic similarities and phylogenies, these prophages are distantly related to known transposable phages and are organized into six groups, one of which encompasses prophages with unusual TA-TA ends. Interestingly, structural and transposition proteins reconstruct different relationships between groups, suggesting ancestral recombinations. Based on the baseplate phylogeny, two large clades emerge, with specific gene contents and high sequence divergence reflecting their ancient origin. Despite their high divergence, the size and overall genomic organization of all prophages are very conserved, a testimony to the highly constrained nature of their genomes. Finally, similarities between these prophages and the three known non-transposable phages infecting L. biflexa suggest gene transfer between different Caudovirales inside their leptospiral host, and the possibility of using some of the transposable prophages in that model strain.
Affiliation(s)
- Eric Olo Ndela
- Laboratoire Microorganismes: Genome Environment (LMGE), Université Clermont Auvergne, CNRS, F-63000 Clermont-Ferrand, France
- François Enault
- Laboratoire Microorganismes: Genome Environment (LMGE), Université Clermont Auvergne, CNRS, F-63000 Clermont-Ferrand, France
- Ariane Toussaint
- Microbiologie Cellulaire et Moléculaire, Université Libre de Bruxelles, IBMM-DBM, 12 Rue des Professeurs Jeneer et Brachet, B-6041 Gosselies, Belgium
208
Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. Proc Natl Acad Sci U S A 2021; 118:2110828118. [PMID: 34873061 DOI: 10.1073/pnas.2110828118] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 10/27/2021] [Indexed: 12/26/2022]
Abstract
Information derived from metagenome sequences through deep-learning techniques has significantly improved the accuracy of template-free protein structure modeling. However, most deep learning-based modeling studies rely on blind sequence database searches and suffer from low efficiency in computational resource utilization and model construction, especially when the sequence library becomes prohibitively large. We proposed a MetaSource model built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil, and Fermentor) to decode the inherent linkage of microbial niches with protein homologous families. Large-scale protein family folding experiments on 8,700 unknown Pfam families showed that, compared with using combined metagenome datasets, a microbiome-targeted approach with multiple sequence alignments constructed from individual MetaSource biomes requires less than one-third of the computer memory and CPU (central processing unit) time, yet generates contact-map and three-dimensional structure models with significantly higher accuracy. These results demonstrate an avenue to bridge the gap between rapidly growing metagenome databases and limited computing resources for efficient genome-wide database mining, and provide a useful blueprint to guide future microbiome sequence database and modeling development for high-accuracy protein structure and function prediction.
209
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 free-modeling (FM) targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins, due to low accuracy in inter-domain distance prediction, and in modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may help address these issues.
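The TM-score behind the "96% and 112% higher" comparison weights each aligned residue by its deviation d_i from the native structure, relative to a length-dependent scale d_0(L) = 1.24 (L - 15)^(1/3) - 1.8, and normalizes by the target length. The sketch below evaluates that formula for a single, already fixed superposition. The reference TM-score program also searches over superpositions to maximize the score, so this is a simplified illustration; the 0.5 Å floor on d_0 is an assumption mirroring common implementations. A helper shows what a relative gain such as "96% higher" means arithmetically.

```python
from typing import Sequence


def d0(length: int) -> float:
    """Length-dependent distance scale d_0 (Å) from the TM-score definition."""
    if length <= 15:
        # The cube root below is undefined for L <= 15; use the floor.
        return 0.5
    # 0.5 Å floor: an assumption mirroring common TM-score implementations.
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: per-residue distances (Å) between aligned model and native
    residues after superposition; unaligned residues simply contribute 0.
    target_length: number of residues in the native structure (L_target).
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


def relative_gain(new: float, old: float) -> float:
    """Percentage improvement of new over old (96.0 means 96% higher)."""
    return 100.0 * (new - old) / old
```

A perfect model scores 1.0; a residue deviating by exactly d_0 contributes 0.5; and a model aligning only half of the target, even perfectly, scores at most 0.5.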
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
210
Su H, Wang W, Du Z, Peng Z, Gao S, Cheng M, Yang J. Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates. Adv Sci (Weinh) 2021; 8:e2102592. [PMID: 34719864 PMCID: PMC8693034 DOI: 10.1002/advs.202102592] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 06/19/2021] [Revised: 09/12/2021] [Indexed: 06/04/2023]
Abstract
The accuracy of de novo protein structure prediction has improved considerably in recent years, mostly due to the introduction of deep learning techniques. In this work, trRosettaX, an improved version of trRosetta for protein structure prediction, is presented. The improvements over trRosetta are twofold. The first is the application of a new multi-scale network, Res2Net, for improved prediction of inter-residue geometries, including distance and orientations. The second is an attention-based module that exploits multiple homologous templates to further increase accuracy. Compared with trRosetta, trRosettaX improves the contact precision by 6% and 8% on the free modeling targets of CASP13 and CASP14, respectively. A preliminary version of trRosettaX ranked as one of the top server groups in CASP14's blind test. An additional benchmark test on 161 targets from CAMEO (June to September 2020) shows that trRosettaX achieves an average TM-score of ≈0.8, outperforming the top groups in CAMEO. These data suggest the effectiveness of the multi-scale network and the benefit of incorporating homologous templates into the network. The trRosettaX algorithm has been incorporated into the trRosetta server since November 2020. The web server and the training and inference code are available at: https://yanglab.nankai.edu.cn/trRosetta/.
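Contact precision, the metric behind the 6% and 8% gains, is typically computed by ranking predicted residue pairs by contact probability, keeping the top fraction of L pairs (L = sequence length), and counting how many are true contacts. The sketch below assumes the conventional CASP definitions: a contact is a C-beta/C-beta distance below 8 Å, and long-range pairs have a sequence separation of at least 24. The abstract does not state which range or list size the reported numbers use, and the function name `top_l_precision` is hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_precision(
    probs: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    fraction: float = 1.0,
    min_separation: int = 24,
) -> float:
    """Precision of the top (fraction * L) predicted contacts.

    probs: predicted contact probability for residue pairs (i, j), i < j.
    true_contacts: pairs whose native C-beta distance is below 8 Å.
    length: sequence length L.
    min_separation: 24 selects long-range pairs (6 or 12 for short/medium).
    """
    candidates = [
        (p, pair) for pair, p in probs.items()
        if pair[1] - pair[0] >= min_separation
    ]
    # Highest-probability predictions first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(fraction * length))
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)
```

Passing `fraction=0.2` gives the common top-L/5 variant, which rewards confident predictions and is less sensitive to the long tail of low-probability pairs.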
Affiliation(s)
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Shang‐Hua Gao
- College of Computer Science, Nankai University, Tianjin 300071, China
- Ming‐Ming Cheng
- College of Computer Science, Nankai University, Tianjin 300071, China
- Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
211
Darrouzet E, Rinaldi C, Zambelli B, Ciurli S, Cavazza C. Revisiting the CooJ family, a potential chaperone for nickel delivery to [NiFe]‑carbon monoxide dehydrogenase. J Inorg Biochem 2021; 225:111588. [PMID: 34530332 DOI: 10.1016/j.jinorgbio.2021.111588] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 08/17/2021] [Accepted: 08/18/2021] [Indexed: 11/21/2022]
Abstract
Nickel insertion into nickel-dependent carbon monoxide dehydrogenase (CODH) represents a key step in enzyme activation. It is the last step of the biosynthesis of the active site, which contains an atypical heteronuclear NiFe4S4 cluster known as the C-cluster. Enzyme maturation is performed by three accessory proteins, namely CooC, CooT and CooJ. Among them, CooJ from Rhodospirillum rubrum is a histidine-rich protein containing two distinct and spatially separated Ni(II)-binding sites: an N-terminal high affinity site (HAS) and a histidine tail at the C-terminus. In 46 CooJ homologues, the HAS motif was found to be strictly conserved with a H(W/F)XXHXXXH sequence. Here, a proteome database search identified at least 150 CooJ homologues and revealed distinct HAS motifs featuring 2, 3 or 4 histidines. The purification and biophysical characterization of three representative members of this protein family showed that they are all homodimers able to bind Ni(II) ions via one or two independent binding sites. Although CooJ was initially thought to be present only in R. rubrum, this study strongly suggests that it could play a significant role in CODH maturation or in nickel homeostasis.
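The conserved high-affinity-site motif H(W/F)XXHXXXH, where X is any residue, translates directly into a regular expression. The sketch below scans a protein sequence for that classic motif only; the 2-, 3- and 4-histidine HAS variants reported in this study would need additional patterns. The example sequence in the tests is an invented fragment, not a real CooJ sequence, and `find_has_motifs` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# H, then W or F, two arbitrary residues, H, three arbitrary residues, H.
# Wrapping the pattern in a lookahead lets overlapping matches be reported.
HAS_MOTIF = re.compile(r"(?=(H[WF].{2}H.{3}H))")


def find_has_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched motif) for each HAS-like match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HAS_MOTIF.finditer(seq)]
```

Reporting 1-based positions matches the usual convention for residue numbering in protein sequences.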
Affiliation(s)
- Elisabeth Darrouzet
- University of Grenoble Alpes, CEA, CNRS, IRIG, CBM, F-38000 Grenoble, France
- Clara Rinaldi
- University of Grenoble Alpes, CEA, CNRS, IRIG, CBM, F-38000 Grenoble, France
- Barbara Zambelli
- Laboratory of Bioinorganic Chemistry, Department of Pharmacy and Biotechnology, University of Bologna, Via Giuseppe Fanin 40, I-40127 Bologna, Italy
- Stefano Ciurli
- Laboratory of Bioinorganic Chemistry, Department of Pharmacy and Biotechnology, University of Bologna, Via Giuseppe Fanin 40, I-40127 Bologna, Italy
- Christine Cavazza
- University of Grenoble Alpes, CEA, CNRS, IRIG, CBM, F-38000 Grenoble, France
212
Du Z, Su H, Wang W, Ye L, Wei H, Peng Z, Anishchenko I, Baker D, Yang J. The trRosetta server for fast and accurate protein structure prediction. Nat Protoc 2021; 16:5634-5651. [PMID: 34759384 DOI: 10.1038/s41596-021-00628-9] [Citation(s) in RCA: 344] [Impact Index Per Article: 86.0] [Received: 01/19/2021] [Accepted: 08/31/2021] [Indexed: 11/10/2022]
Abstract
The trRosetta (transform-restrained Rosetta) server is a web-based platform for fast and accurate protein structure prediction, powered by deep learning and Rosetta. Given a protein's amino acid sequence as input, a deep neural network first predicts the inter-residue geometries, including distance and orientations. The predicted geometries are then transformed into restraints to guide structure prediction by direct energy minimization, which is implemented within the Rosetta framework. The trRosetta server distinguishes itself from other similar structure prediction servers by rapid and accurate de novo structure prediction. As an illustration, trRosetta was applied to two Pfam families with unknown structures, for which the predicted de novo models were estimated to have high accuracy. In addition, to take advantage of homology modeling, homologous templates are automatically used as additional inputs to the network. In general, it takes ~1 h to predict the final structure for a typical protein with ~300 amino acids, using a maximum of 10 CPU cores in parallel on our cluster system. To enable large-scale structure modeling, a downloadable package of trRosetta with open-source code is also available, and this protocol provides detailed guidance for using it. The server and the package are available at https://yanglab.nankai.edu.cn/trRosetta/ and https://yanglab.nankai.edu.cn/trRosetta/download/, respectively.
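The step in which "predicted geometries are transformed into restraints" converts a binned probability distribution (a distogram) for each residue pair into an energy that is low at favored distances. The sketch below uses a generic log-ratio statistical potential, E_k = -ln(p_k / ref_k), with a caller-supplied background distribution. This is an assumption for illustration: trRosetta's actual reference state and the spline fitting used to pass restraints to Rosetta are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List


def distogram_to_energy(
    probs: List[float],
    reference: List[float],
    eps: float = 1e-4,
) -> List[float]:
    """Convert binned distance probabilities into a restraint energy per bin.

    probs: predicted probability for each distance bin (should sum to ~1).
    reference: background probability for the same bins.
    Returns E_k = -ln(p_k / ref_k); lower energy marks favored distances.
    eps avoids log(0) for empty bins.
    """
    if len(probs) != len(reference):
        raise ValueError("probs and reference must have the same number of bins")
    return [-math.log((p + eps) / (r + eps)) for p, r in zip(probs, reference)]


def best_bin(energies: List[float]) -> int:
    """Index of the lowest-energy (most favored) distance bin."""
    return min(range(len(energies)), key=energies.__getitem__)
```

Dividing by a reference distribution is what turns raw probabilities into an informative restraint: a bin that is merely common in any protein receives an energy near zero, while a bin the network specifically favors for this pair becomes strongly attractive.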
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Lisha Ye
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Hong Wei
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
213
Heo L, Janson G, Feig M. Physics-based protein structure refinement in the era of artificial intelligence. Proteins 2021; 89:1870-1887. [PMID: 34156124 PMCID: PMC8616793 DOI: 10.1002/prot.26161] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/07/2021] [Revised: 05/31/2021] [Accepted: 06/08/2021] [Indexed: 12/21/2022]
Abstract
Protein structure refinement is the last step in protein structure prediction pipelines. Physics-based refinement via molecular dynamics (MD) simulations has made significant progress in recent years. During CASP14, we tested a new refinement protocol based on an improved sampling strategy via MD simulations. MD simulations were carried out at an elevated temperature (360 K). An optimized use of biasing restraints and the use of multiple starting models led to enhanced sampling. The new protocol generally improved model quality and showed clear improvements over our previous protocols. Our approach was successful with most initial models, many of which were based on deep learning methods. However, we found that our approach was not able to refine machine-learning models from the AlphaFold2 group, often decreasing their already high initial quality. To better understand the role of refinement for these new types of machine-learning-based models, a detailed analysis via MD simulations and Markov state modeling is presented here. We continue to find that MD-based refinement has the potential to improve AI predictions. We also identified several practical issues that make it difficult to realize that potential. The consideration of inter-domain and oligomeric contacts in simulations is increasingly important, and the presence of large kinetic barriers in refinement pathways continues to present challenges. Finally, we provide a perspective on how physics-based refinement could continue to play a role in improving initial predictions from machine learning-based methods.
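Markov state modeling, used here to analyze refinement trajectories, starts from a discretized trajectory (each saved frame assigned to a conformational state) and estimates a transition matrix by counting transitions at a fixed lag time and normalizing each row. The sketch below shows only that simplest, non-reversible maximum-likelihood estimate. Dedicated MSM software typically adds detailed-balance constraints and lag-time validation, and the function name `transition_matrix` is hypothetical.

```python
from typing import List, Sequence


def transition_matrix(
    dtraj: Sequence[int], n_states: int, lag: int = 1
) -> List[List[float]]:
    """Row-stochastic transition matrix from a discrete state trajectory.

    dtraj: state index (0..n_states-1) for each saved frame.
    lag: lag time in frames; transitions are counted from t to t + lag.
    Rows for states never visited as a source are left as all zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Large kinetic barriers, as discussed above, show up in such a matrix as very small off-diagonal entries between groups of states, which is why long or enhanced sampling is needed to estimate them reliably.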
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
214
Simpkin AJ, Rodríguez FS, Mesdaghi S, Kryshtafovych A, Rigden DJ. Evaluation of model refinement in CASP14. Proteins 2021; 89:1852-1869. [PMID: 34288138 PMCID: PMC8616799 DOI: 10.1002/prot.26185] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/30/2021] [Revised: 06/19/2021] [Accepted: 07/11/2021] [Indexed: 12/15/2022]
Abstract
We report here an assessment of the model refinement category of the 14th round of Critical Assessment of Structure Prediction (CASP14). As before, predictors submitted up to five ranked refinements, along with associated residue-level error estimates, for targets that had a wide range of starting quality. The ability of groups to accurately rank their submissions and to predict coordinate error varied widely. Overall, only four groups out-performed a "naïve predictor" corresponding to the resubmission of the starting model. Among the top groups, there are interesting differences of approach and in the spread of improvements seen: some methods are more conservative, others more adventurous. Some targets were "double-barreled" for which predictors were offered a high-quality AlphaFold 2 (AF2)-derived prediction alongside another of lower quality. The AF2-derived models were largely unimprovable, many of their apparent errors being found to reside at domain and, especially, crystal lattice contacts. Refinement is shown to have a mixed impact overall on structure-based function annotation methods to predict nucleic acid binding, spot catalytic sites, and dock protein structures.
Affiliation(s)
- Adam J. Simpkin
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Filomeno Sánchez Rodríguez
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE, England
- Shahram Mesdaghi
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
215
Quezada-Rodríguez EH, Gómez-Velasco H, Arthikala MK, Lara M, Hernández-López A, Nanjareddy K. Exploration of Autophagy Families in Legumes and Dissection of the ATG18 Family with a Special Focus on Phaseolus vulgaris. PLANTS 2021; 10:plants10122619. [PMID: 34961093 PMCID: PMC8703869 DOI: 10.3390/plants10122619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/07/2021] [Revised: 11/03/2021] [Accepted: 11/03/2021] [Indexed: 11/16/2022]
Abstract
Macroautophagy/autophagy is a fundamental catabolic pathway that maintains cellular homeostasis in eukaryotic cells by forming double-membrane-bound vesicles named autophagosomes. The autophagy family genes remain largely unexplored except in some model organisms. Legumes are a large family of economically important crops, and knowledge of their important cellular processes is essential. Here, to first address the knowledge gaps, we identified 17 ATG families in Phaseolus vulgaris, Medicago truncatula and Glycine max based on Arabidopsis sequences and elucidated their phylogenetic relationships. Second, we dissected the ATG18 family into subfamilies across a total of 27 photosynthetic organisms, from early plant lineages (chlorophytes) to higher plants (legumes). Third, we focused on the ATG18 family in P. vulgaris to understand the protein structure and developed a 3D model for PvATG18b. Our results identified ATG homologs in the chosen legumes, and differential expression data revealed the nitrate-responsive nature of ATG genes. A multidimensional scaling analysis of 280 protein sequences from 27 photosynthetic organisms classified ATG18 homologs into three subfamilies that were not based on the BCAS3 domain alone. The domain structure, protein motifs (FRRG) and stable folded conformation of PvATG18b, which reveal possible lipid-binding sites and transmembrane helices, led us to propose PvATG18b as the functional homolog of AtATG18b. The findings of this study contribute to an in-depth understanding of the autophagy process in legumes and improve our knowledge of ATG18 subfamilies.
Affiliation(s)
- Elsa-Herminia Quezada-Rodríguez
- Ciencias Agrogenómicas, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México (UNAM), León C.P. 37684, Mexico
- Homero Gómez-Velasco
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad Universitaria, Ciudad de México C.P. 04510, Mexico
- Manoj-Kumar Arthikala
- Ciencias Agrogenómicas, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México (UNAM), León C.P. 37684, Mexico
- Miguel Lara
- Departamento de Biología Molecular de Plantas, Instituto de Biotecnología, Universidad Nacional Autónoma de México (UNAM), Cuernavaca C.P. 62271, Mexico
- Antonio Hernández-López
- Ciencias Agrogenómicas, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México (UNAM), León C.P. 37684, Mexico
- Kalpana Nanjareddy
- Ciencias Agrogenómicas, Escuela Nacional de Estudios Superiores Unidad León, Universidad Nacional Autónoma de México (UNAM), León C.P. 37684, Mexico
- Correspondence: Tel. +52-477-1940800 (ext. 43462)
216
Takahashi-Kariyazono S, Terai Y. Two divergent haplogroups of a sacsin-like gene in Acropora corals. Sci Rep 2021; 11:23018. [PMID: 34837037 PMCID: PMC8626496 DOI: 10.1038/s41598-021-02386-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Accepted: 11/08/2021] [Indexed: 11/24/2022]
Abstract
Reef-building corals are declining due to environmental changes. Sacsin is a member of the heat shock proteins and has been reported as a candidate protein associated with the stress response in Acropora corals. Recently, high nucleotide diversity and the persistence of two divergent haplogroups of sacsin-like genes in Acropora millepora have been reported. However, it was not clear when the two haplogroups split, or whether they have persisted only in A. millepora or also in other lineages of the genus Acropora. In this study, we analyzed a genomic region containing a sacsin-like gene from Acropora and Montipora species. Higher nucleotide diversity in the sacsin-like gene compared with that of surrounding regions was also observed in A. digitifera. This nucleotide diversity is derived from two divergent haplogroups of a sacsin-like gene, which are present in at least three Acropora species. The origin of these two haplogroups can be traced back before the divergence of Acropora and Montipora (119 Ma). Although the link between exceptionally high genetic variation in sacsin-like genes and functional differences in sacsin-like proteins is not clear, the divergent haplogroups may respond differently to environmental stressors and serve in the adaptive physiological ecology of these keystone species.
Affiliation(s)
- Shiho Takahashi-Kariyazono
- Department of Evolutionary Studies of Biosystems, SOKENDAI (The Graduate University for Advanced Studies), Shonan Village, Hayama, 240-0193, Japan.
- Yohey Terai
- Department of Evolutionary Studies of Biosystems, SOKENDAI (The Graduate University for Advanced Studies), Shonan Village, Hayama, 240-0193, Japan.
217
Harith-Fadzilah N, Lam SD, Haris-Hussain M, Ghani IA, Zainal Z, Jalinas J, Hassan M. Proteomics and Interspecies Interaction Analysis Revealed Abscisic Acid Signalling to Be the Primary Driver for Oil Palm's Response against Red Palm Weevil Infestation. PLANTS (BASEL, SWITZERLAND) 2021; 10:2574. [PMID: 34961045 PMCID: PMC8709180 DOI: 10.3390/plants10122574] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/09/2021] [Revised: 10/29/2021] [Accepted: 11/10/2021] [Indexed: 06/14/2023]
Abstract
The red palm weevil (RPW; Rhynchophorus ferrugineus Olivier (Coleoptera: Curculionidae)) is an invasive insect pest that is difficult to manage because it infests host palm trees from within. A holistic, molecular-based approach to identify proteins that correlate with RPW infestation could give useful insights into the vital processes involved in the host's infestation response and identify potential biomarkers for an early detection technique. Here, a shotgun proteomic analysis was performed on oil palm (Elaeis guineensis; OP) under untreated (control), wounding by drilling (wounded), and artificial larval infestation (infested) conditions at three different time points to characterise the RPW infestation response at three different stages. KEGG pathway enrichment analysis revealed many overlapping pathways between the control, wounded, and infested groups. Further analysis via literature searches narrowed down biologically relevant proteins into three categories: photosynthesis, growth, and stress response. Overall, the patterns of protein expression suggested abscisic acid (ABA) hormone signalling to be the primary driver of the insect herbivory response. Interspecies molecular docking analysis between RPW ligands and OP receptor proteins provided putative interactions that result in ABA signalling activation. Seven proteins were selected as candidate biomarkers for early infestation detection based on their relevance and association with ABA signalling. The MS data are available via ProteomeXchange with identifier PXD028986. This study provides a deeper insight into the mechanism of stress response in OP that could help develop a novel detection method or improve crop management.
Affiliation(s)
- Nazmi Harith-Fadzilah
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Su Datt Lam
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Mohammad Haris-Hussain
- Department of Biological Sciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Idris Abd Ghani
- Department of Biological Sciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Zamri Zainal
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Johari Jalinas
- Department of Biological Sciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Maizom Hassan
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
218
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/24/2023]
Abstract
INTRODUCTION While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA; C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
219
Abstract
The biological significance of proteins has drawn the scientific community to explore their characteristics. These studies shed light on the interaction patterns and functions of proteins in a living body. Because reliable experimental techniques are difficult to apply in practice, computational methods have been introduced for interaction prediction. Automated methods have reduced these difficulties but cannot yet replace experimental studies, as the field is still evolving. Because interaction prediction is a critical problem, it needs highly accurate results, yet no existing method offers reliable performance that matches experimental results. This article aims to assess the existing computational docking algorithms, their challenges, and future scope. Blind docking techniques are quite helpful when no information other than the individual structures is available. As more and more complex structures are added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, already dominant in many major fields, is expected to take over this domain very soon.
220
Cummings TFM, Gori K, Sanchez-Pulido L, Gavriilidis G, Moi D, Wilson AR, Murchison E, Dessimoz C, Ponting CP, Christophorou MA. Citrullination Was Introduced into Animals by Horizontal Gene Transfer from Cyanobacteria. Mol Biol Evol 2021; 39:6420225. [PMID: 34730808 PMCID: PMC8826395 DOI: 10.1093/molbev/msab317] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
Abstract
Protein posttranslational modifications add great sophistication to biological systems. Citrullination, a key regulatory mechanism in human physiology and pathophysiology, is enigmatic from an evolutionary perspective. Although the citrullinating enzymes peptidylarginine deiminases (PADIs) are ubiquitous across vertebrates, they are absent from yeast, worms, and flies. Based on this distribution PADIs were proposed to have been horizontally transferred, but this has been contested. Here, we map the evolutionary trajectory of PADIs into the animal lineage. We present strong phylogenetic support for a clade encompassing animal and cyanobacterial PADIs that excludes fungal and other bacterial homologs. The animal and cyanobacterial PADI proteins share functionally relevant primary and tertiary synapomorphic sequences that are distinct from a second PADI type present in fungi and actinobacteria. Molecular clock calculations and sequence divergence analyses using the fossil record estimate the last common ancestor of the cyanobacterial and animal PADIs to be less than 1 billion years old. Additionally, under an assumption of vertical descent, PADI sequence change during this evolutionary time frame is anachronistically low, even when compared with products of likely endosymbiont gene transfer, mitochondrial proteins, and some of the most highly conserved sequences in life. The consilience of evidence indicates that PADIs were introduced from cyanobacteria into animals by horizontal gene transfer (HGT). The ancestral cyanobacterial PADI is enzymatically active and can citrullinate eukaryotic proteins, suggesting that the PADI HGT event introduced a new catalytic capability into the regulatory repertoire of animals. This study reveals the unusual evolution of a pleiotropic protein modification.
Affiliation(s)
- Thomas F M Cummings
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom (corresponding author)
- Kevin Gori
- Transmissible Cancer Group, Department of Veterinary Medicine, Cambridge, United Kingdom
- Luis Sanchez-Pulido
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Gavriil Gavriilidis
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- David Moi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Abigail R Wilson
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Elizabeth Murchison
- Transmissible Cancer Group, Department of Veterinary Medicine, Cambridge, United Kingdom
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Genetics Evolution and Environment, University College London, London, United Kingdom; Department of Computer Science, University College London, London, United Kingdom
- Chris P Ponting
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Maria A Christophorou
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom; Epigenetics Department, The Babraham Institute, Cambridge, United Kingdom (corresponding author)
221
Villegas-Morcillo A, Gomez AM, Morales-Cordovilla JA, Sanchez V. Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2848-2854. [PMID: 32750896 DOI: 10.1109/tcbb.2020.3012732] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
The identification of a protein fold type from its amino acid sequence provides important insights about the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with the highly variable protein sequence lengths and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several state-of-the-art template-based and deep learning-based methods. The evaluation results over the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3012732, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/.
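The final step described in this abstract, transferring the fold type of the most similar template, can be read as a nearest-neighbour lookup over fixed-size embeddings. The sketch below illustrates only that idea, under stated assumptions: cosine similarity is assumed as the scoring function, toy three-dimensional vectors stand in for the CNN-GRU embeddings, and the function names `cosine` and `transfer_fold` are hypothetical, not taken from the authors' released code.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transfer_fold(
    query: Sequence[float],
    templates: Sequence[Sequence[float]],
    template_folds: Sequence[str],
) -> str:
    """Assign the query the fold label of its most similar template.

    Hypothetical illustration: similarity is assumed to be cosine similarity
    between fixed-size embeddings; the paper's own scoring may differ.
    """
    best = max(range(len(templates)), key=lambda i: cosine(query, templates[i]))
    return template_folds[best]


if __name__ == "__main__":
    # Toy embeddings and labels, invented purely for illustration.
    templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    folds = ["a.1", "b.1", "c.1"]
    print(transfer_fold([0.9, 0.1, 0.0], templates, folds))
```

Because the embedding has a fixed size regardless of sequence length, this comparison stays a simple vector operation even when query and template domains differ greatly in length.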
222
Gao M, Lund-Andersen P, Morehead A, Mahmud S, Chen C, Chen X, Giri N, Roy RS, Quadir F, Effler TC, Prout R, Abraham S, Elwasif W, Haas NQ, Skolnick J, Cheng J, Sedova A. High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS 2021; 2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to incorporate, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Affiliation(s)
- Mu Gao
- Georgia Institute of Technology, Atlanta, GA
- Chen Chen
- University of Missouri, Columbia, MO
- Xiao Chen
- University of Missouri, Columbia, MO
- Ryan Prout
- Oak Ridge National Laboratory, Oak Ridge, TN
- Ada Sedova
- Oak Ridge National Laboratory, Oak Ridge, TN
223
Naschberger A, Baradaran R, Rupp B, Carroni M. The structure of neurofibromin isoform 2 reveals different functional states. Nature 2021; 599:315-319. [PMID: 34707296 PMCID: PMC8580823 DOI: 10.1038/s41586-021-04024-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/14/2021] [Accepted: 09/13/2021] [Indexed: 01/20/2023]
Abstract
The autosomal dominant monogenetic disease neurofibromatosis type 1 (NF1) affects approximately one in 3,000 individuals and is caused by mutations in the NF1 tumour suppressor gene, leading to dysfunction in the protein neurofibromin (Nf1). As a GTPase-activating protein, a key function of Nf1 is repression of the Ras oncogene signalling cascade. We determined the human Nf1 dimer structure at an overall resolution of 3.3 Å. The cryo-electron microscopy structure reveals domain organization and structural details of the Nf1 exon 23a splicing isoform 2 in a closed, self-inhibited, Zn-stabilized state and an open state. In the closed conformation, HEAT/ARM core domains shield the GTPase-activating protein-related domain (GRD) so that Ras binding is sterically inhibited. In a distinctly different, open conformation of one protomer, a large-scale movement of the GRD occurs, which is necessary to access Ras, whereas Sec14-PH reorients to allow interaction with the cellular membrane. Zn incubation of Nf1 leads to reduced Ras-GAP activity with both protomers in the self-inhibited, closed conformation stabilized by a Zn binding site between the N-HEAT/ARM domain and the GRD-Sec14-PH linker. The transition between closed, self-inhibited states of Nf1 and open states provides guidance for targeted studies deciphering the complex molecular mechanism behind the widespread neurofibromatosis syndrome and Nf1 dysfunction in carcinogenesis.
Affiliation(s)
- Andreas Naschberger
- SciLifeLab, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Institute of Genetic Epidemiology, Medical University Innsbruck, Innsbruck, Austria
- Rozbeh Baradaran
- SciLifeLab, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Bernhard Rupp
- Institute of Genetic Epidemiology, Medical University Innsbruck, Innsbruck, Austria
- k.-k. Hofkristallamt, San Diego, CA, USA
- Marta Carroni
- SciLifeLab, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
224
Mobini S, Chizari M, Mafakher L, Rismani E, Rismani E. Structure-based study of immune receptors as eligible binding targets of coronavirus SARS-CoV-2 spike protein. J Mol Graph Model 2021; 108:107997. [PMID: 34343818 PMCID: PMC8317541 DOI: 10.1016/j.jmgm.2021.107997] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/30/2021] [Revised: 07/23/2021] [Accepted: 07/26/2021] [Indexed: 12/27/2022]
Abstract
One of the most important challenges in the battle against contagious SARS-CoV-2 is the precise identification of the virus pathogenesis. The broad range of COVID-19 clinical manifestations may reflect the diversity of virus host cells. Amongst key manifestations, especially in severe COVID-19 patients, reduction and/or exhaustion of lymphocytes, monocytes, basophils, and dendritic cells is seen; therefore, it is necessary to determine how the virus infects these cells. Interestingly, angiotensin-converting enzyme 2 (ACE2), the well-known receptor of SARS-CoV-2, is expressed at low levels or not at all in these cells. Using a computational approach, several receptor candidates, including leukocyte surface molecules and chemokine receptors that are expressed in most lineages of immune cells, were evaluated as feasible receptors of the spike receptor-binding domain (RBD) of SARS-CoV-2. The results revealed higher binding affinity of CD26, CD2, CD56, CD7, CCR9, CD150, CD4, CD50, XCR1 and CD106 than of ACE2. However, the modes of binding and the amino acids involved in the interactions with the spike RBD varied. Overall, the affinity of immune receptor candidates for SARS-CoV-2 RBD may offer insight into the recognition of novel therapeutic targets in association with COVID-19.
Affiliation(s)
- Saeed Mobini
- Department of Immunology, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Milad Chizari
- Department of Medical Biotechnology, School of Allied Medical Sciences, Iran University of Medical Sciences, Tehran, Iran
- Ladan Mafakher
- Thalassemia and Hemoglobinopathy Research Center, Health Research Institute, Ahvaz Jundishapur University of Medical Science, Ahvaz, Iran
- Elmira Rismani
- Payam Noor University, Biology Department, Tehran, Iran
- Elham Rismani
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
225
Sangeetha B, Krishnamoorthy AS, Sharmila DJS, Renukadevi P, Malathi VG, Amirtham D. Molecular modelling of coat protein of the Groundnut bud necrosis tospovirus and its binding with Squalene as an antiviral agent: In vitro and in silico docking investigations. Int J Biol Macromol 2021; 189:618-634. [PMID: 34437921 DOI: 10.1016/j.ijbiomac.2021.08.143] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/28/2021] [Accepted: 08/18/2021] [Indexed: 01/15/2023]
Abstract
Bud blight disease caused by groundnut bud necrosis virus (GBNV) is a serious constraint in the cultivation of agricultural crops such as legumes, tomato, chilies, potato and cotton. Owing to the significant damage caused by GBNV, an attempt was made to identify suitable organic antiviral agents through molecular modelling of the nucleocapsid coat protein of GBNV, followed by molecular docking and molecular dynamics, which disclosed the interaction of the ligands Squalene and Ganoderic acid-A with the GBNV coat protein. The in vitro inhibitory effect of Squalene and Ganoderic acid-A at different concentrations was examined against GBNV in cowpea plants under glasshouse conditions. The different concentrations of Squalene (50, 100, 150, 250 and 500 ppm) tested in vitro reduced lesion numbers (1.69 cm²) as well as virus titre in co-inoculation spray. The present study suggests the antiviral activity of Squalene, which fits effectively into the binding site of the GBNV coat protein with favourable hydrophilic as well as strong hydrophobic interactions, thereby challenging and blocking the binding of viral replication RNA to the coat protein and its propagation. These organic antiviral molecules will be helpful in developing suitable eco-friendly formulations to mitigate GBNV infection in plants.
Affiliation(s)
- B Sangeetha
- Department of Plant Pathology, Centre for Plant Protection Studies, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu 641003, India
- A S Krishnamoorthy
- Department of Plant Pathology, Centre for Plant Protection Studies, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu 641003, India
- D Jeya Sundara Sharmila
- Department of Nano Science and Technology, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu 641003, India
- P Renukadevi
- Department of Sericulture, Forest College and Research Institute, Mettupalayam, Tamil Nadu 641003, India
- V G Malathi
- Department of Plant Pathology, Centre for Plant Protection Studies, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu 641003, India
- D Amirtham
- Department of Food and Agricultural Process Engineering, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu 641003, India
226
TwinCons: Conservation score for uncovering deep sequence similarity and divergence. PLoS Comput Biol 2021; 17:e1009541. [PMID: 34714829 PMCID: PMC8580257 DOI: 10.1371/journal.pcbi.1009541] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/16/2021] [Revised: 11/10/2021] [Accepted: 10/06/2021] [Indexed: 11/19/2022] Open
Abstract
We have developed the program TwinCons to detect noisy signals of deep ancestry in proteins or nucleic acids. As input, the program uses a composite alignment containing predefined groups, and it mathematically determines a 'cost' of transforming one group into the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence, and TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, as indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
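To make the three position categories concrete, the toy sketch below labels each column of a two-group alignment by comparing each group's majority residue. It is not the TwinCons score (which the abstract describes as a transformation 'cost' between groups); the function `classify_columns`, the 0.8 threshold and the example sequences are all hypothetical.

```python
from collections import Counter

def consensus(column):
    """Return the most common residue in a column and its frequency (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)

def classify_columns(group_a, group_b, threshold=0.8):
    """Label each column 'conserved', 'signature' or 'variable' (hypothetical rule).

    group_a, group_b: lists of equal-length aligned sequences.
    threshold: minimum consensus frequency for a group to count as internally conserved.
    """
    labels = []
    for i in range(len(group_a[0])):
        res_a, freq_a = consensus([seq[i] for seq in group_a])
        res_b, freq_b = consensus([seq[i] for seq in group_b])
        both_conserved = freq_a >= threshold and freq_b >= threshold
        if both_conserved and res_a == res_b:
            labels.append("conserved")   # same residue dominates in both groups
        elif both_conserved:
            labels.append("signature")   # conserved within each group, differs between them
        else:
            labels.append("variable")    # at least one group is not conserved here
    return labels

# Made-up toy groups standing in for, e.g., bacterial and archaeal sequences.
bacteria = ["GKLV", "GKIV", "GKAV"]
archaea = ["GRLI", "GRWI", "GRLI"]
print(classify_columns(bacteria, archaea))
```

A single score, as TwinCons provides, would instead place these categories on one continuous scale rather than in discrete bins.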
227
Alvarez-Carreño C, Penev PI, Petrov AS, Williams LD. Fold Evolution before LUCA: Common Ancestry of SH3 Domains and OB Domains. Mol Biol Evol 2021; 38:5134-5143. [PMID: 34383917 PMCID: PMC8557408 DOI: 10.1093/molbev/msab240] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
SH3 and OB are the simplest, oldest, and most common protein domains within the translation system. SH3 and OB domains are β-barrels that are structurally similar but topologically distinct. To transform an OB domain into an SH3 domain, β-strands must be permuted by a multistep and evolutionarily implausible mechanism. Here, we explored relationships between SH3 and OB domains of ribosomal proteins, initiation factors, and elongation factors using a combined sequence- and structure-based approach. We detect a common core of SH3 and OB domains, as a region of significant structure and sequence similarity. The common core contains four β-strands and a loop, but omits the fifth β-strand, which is variable and is absent from some OB and SH3 domain proteins. The structure of the common core immediately suggests a simple permutation mechanism for interconversion between SH3 and OB domains, which appear to share an ancestor. The OB domain was formed by duplication and adaptation of the SH3 domain core, or vice versa, in a simple and probable transformation. Using the folding algorithm AlphaFold2, we demonstrated that an ancestral reconstruction of a permuted SH3 sequence folds into an OB structure, and an ancestral reconstruction of a permuted OB sequence folds into an SH3 structure. The tandem SH3 and OB domains in the universal ribosomal protein uL2 share a common ancestor, suggesting that the divergence of these two domains occurred before the last universal common ancestor.
Affiliation(s)
- Claudia Alvarez-Carreño
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Petar I Penev
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Anton S Petrov
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Loren Dean Williams
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
228
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022] Open
Abstract
Although genes carry information, proteins are the main players in providing all the functionalities of a living organism. Massive numbers of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into families and subfamilies depending on their evolutionary relatedness and similarities in structure or function. Protein characterization to identify structure and function is done accurately using laboratory experiments. With the rapidly increasing number of novel protein sequences, however, these experiments have become difficult to carry out, since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods have been introduced to classify proteins and predict their functional properties. As the performance of computational techniques has improved, deep learning has come to play a key role in many areas. Recently, deep learning models such as DeepFam and ProtCNN have been presented to classify proteins into their families. However, these models have been used only for non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam that classifies proteins hierarchically, into different levels simultaneously, with high accuracy. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families, achieving accuracies of 98.62% and 96.14% on the popular Pfam and COG datasets, respectively.
229
Villegas-Morcillo A, Sanchez V, Gomez AM. FoldHSphere: deep hyperspherical embeddings for protein fold recognition. BMC Bioinformatics 2021; 22:490. [PMID: 34641786 PMCID: PMC8507389 DOI: 10.1186/s12859-021-04419-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/12/2021] [Accepted: 09/29/2021] [Indexed: 12/01/2022] Open
Abstract
Background Current state-of-the-art deep learning approaches for protein fold recognition learn protein embeddings that improve prediction performance at the fold level. However, there still exists a performance gap between the fold level and the (relatively easier) family level, suggesting that it might be possible to learn an embedding space that better represents the protein folds. Results In this paper, we propose the FoldHSphere method to learn a better fold embedding space through a two-stage training procedure. We first obtain prototype vectors for each fold class that are maximally separated in hyperspherical space. We then train a neural network by minimizing the angular large margin cosine loss to learn protein embeddings clustered around the corresponding hyperspherical fold prototypes. Our network architectures, ResCNN-GRU and ResCNN-BGRU, process the input protein sequences by applying several residual-convolutional blocks followed by a gated recurrent unit-based recurrent layer. Evaluation results on the LINDAHL dataset indicate that the use of our hyperspherical embeddings effectively bridges the performance gap between the family and fold levels. Furthermore, our FoldHSpherePro ensemble method yields an accuracy of 81.3% at the fold level, outperforming all the state-of-the-art methods. Conclusions Our methodology is efficient in learning discriminative and fold-representative embeddings for the protein domains. The proposed hyperspherical embeddings are effective at identifying the protein fold class by pairwise comparison, even when amino acid sequence similarities are low. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04419-7.
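The large margin cosine loss mentioned above can be sketched for a single example as follows. This is a minimal illustration assuming the standard form of the loss (cosine similarities between the L2-normalised embedding and each class prototype, a margin `m` subtracted from the true-class cosine, and a scale `s`); the values s=30 and m=0.35, the 2-D prototypes and the function names are illustrative assumptions, not the paper's settings or code.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def lmcl_loss(embedding, prototypes, label, s=30.0, m=0.35):
    """Large margin cosine loss for one example (illustrative hyperparameters).

    embedding: feature vector produced by the network for one protein.
    prototypes: one vector per fold class, kept fixed here as hyperspherical prototypes.
    label: index of the true fold class.
    """
    e = normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, normalize(p))) for p in prototypes]
    # Subtract the margin from the true-class cosine only, then scale every logit by s.
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-softmax of the true-class logit.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    return -(logits[label] - log_denominator)

prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(lmcl_loss([0.9, 0.1], prototypes, label=0))  # near its own prototype: loss close to 0
print(lmcl_loss([0.1, 0.9], prototypes, label=0))  # near the wrong prototype: large loss
```

Because the margin is subtracted only from the true class, an embedding must be closer to its own prototype than to any other by at least `m` in cosine terms before the loss becomes small, which is what pulls same-fold embeddings into tight clusters.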
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain.
- Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
- Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
230
Hendriks IA, Buch-Larsen SC, Prokhorova E, Elsborg JD, Rebak AKLFS, Zhu K, Ahel D, Lukas C, Ahel I, Nielsen ML. The regulatory landscape of the human HPF1- and ARH3-dependent ADP-ribosylome. Nat Commun 2021; 12:5893. [PMID: 34625544 PMCID: PMC8501107 DOI: 10.1038/s41467-021-26172-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/06/2021] [Accepted: 09/21/2021] [Indexed: 11/08/2022] Open
Abstract
Despite the involvement of Poly(ADP-ribose) polymerase-1 (PARP1) in many important biological pathways, the target residues of PARP1-mediated ADP-ribosylation remain ambiguous. To explicate the ADP-ribosylation regulome, we analyze human cells depleted for key regulators of PARP1 activity, histone PARylation factor 1 (HPF1) and ADP-ribosylhydrolase 3 (ARH3). Using quantitative proteomics, we characterize 1,596 ADP-ribosylation sites, displaying up to 1000-fold regulation across the investigated knockout cells. We find that HPF1 and ARH3 inversely and homogeneously regulate the serine ADP-ribosylome on a proteome-wide scale with consistent adherence to lysine-serine motifs, suggesting that targeting is independent of HPF1 and ARH3. Notably, we do not detect an HPF1-dependent target residue switch from serine to glutamate/aspartate under the investigated conditions. Our data support the notion that serine ADP-ribosylation mainly exists as mono-ADP-ribosylation in cells, and reveal a remarkable degree of histone co-modification with serine ADP-ribosylation and other post-translational modifications.
Affiliation(s)
- Ivo A Hendriks
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark
- Sara C Buch-Larsen
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark
- Evgeniia Prokhorova
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK
- Jonas D Elsborg
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark
- Alexandra K L F S Rebak
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark
- Kang Zhu
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK
- Dragana Ahel
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK
- Claudia Lukas
- Protein Signaling Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark
- Ivan Ahel
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK
- Michael L Nielsen
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200, Copenhagen, Denmark.
231
Trinquier J, Uguzzoni G, Pagnani A, Zamponi F, Weigt M. Efficient generative modeling of protein sequences using simple autoregressive models. Nat Commun 2021; 12:5800. [PMID: 34608136 PMCID: PMC8490405 DOI: 10.1038/s41467-021-25756-4] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 02/24/2021] [Accepted: 08/23/2021] [Indexed: 02/08/2023] Open
Abstract
Generative models are emerging as promising candidates for novel sequence-data-driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 10² and 10³). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. 10⁶⁸ possible sequences, which nevertheless constitute only the astronomically small fraction 10⁻⁸⁰ of all amino-acid sequences of the same length. These findings illustrate both the potential and the difficulty of exploring sequence space via generative sequence models.
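Why sequence probabilities and entropies become "easy" in an autoregressive model can be seen in a toy sketch. It assumes each conditional P(a_i | a_1..a_{i-1}) is a softmax over position-specific fields plus couplings to earlier positions; the 3-letter alphabet, length 4, random parameters and all names are hypothetical, and this is not the authors' code. Because every conditional is normalised, log P(sequence) is just a sum of log-conditionals, and exp(entropy) gives an effective number of sequences.

```python
import itertools
import math
import random

ALPHABET = "ACD"   # toy 3-letter alphabet (real models use 20 amino acids plus gap)
L = 4              # toy sequence length

rng = random.Random(0)
# Hypothetical random parameters: fields h[i][a] and couplings J[i][j][a][b] for j < i.
h = [[rng.gauss(0, 1) for _ in ALPHABET] for _ in range(L)]
J = [[[[rng.gauss(0, 0.5) for _ in ALPHABET] for _ in ALPHABET] for _ in range(i)]
     for i in range(L)]

def conditional(i, prefix):
    """P(a_i = a | prefix) for every a: softmax of field plus couplings to earlier residues."""
    scores = [h[i][a] + sum(J[i][j][a][b] for j, b in enumerate(prefix))
              for a in range(len(ALPHABET))]
    top = max(scores)
    weights = [math.exp(sc - top) for sc in scores]
    total = sum(weights)
    return [w / total for w in weights]

def log_prob(seq):
    """log P(seq) as a sum of log-conditionals; no global partition function is needed."""
    idx = [ALPHABET.index(c) for c in seq]
    return sum(math.log(conditional(i, idx[:i])[a]) for i, a in enumerate(idx))

def sample():
    """Draw a sequence position by position from the conditionals."""
    idx = []
    for i in range(L):
        idx.append(rng.choices(range(len(ALPHABET)), weights=conditional(i, idx))[0])
    return "".join(ALPHABET[a] for a in idx)

# Exact check on the toy model: probabilities over all 3**4 sequences sum to 1.
all_seqs = ["".join(t) for t in itertools.product(ALPHABET, repeat=L)]
total = sum(math.exp(log_prob(t)) for t in all_seqs)
entropy = -sum(math.exp(log_prob(t)) * log_prob(t) for t in all_seqs)

# When enumeration is impossible, entropy can be estimated as the mean of -log P over samples.
samples = [sample() for _ in range(2000)]
mc = -sum(log_prob(t) for t in samples) / len(samples)
print(round(total, 6), round(math.exp(entropy), 2), round(mc, 3))
```

In this toy setting exp(entropy) lies between 1 and 3⁴ = 81; dividing it by 3⁴ gives the fraction of sequence space the model effectively occupies, the same kind of ratio the abstract reports for response regulators.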
Affiliation(s)
- Jeanne Trinquier
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Guido Uguzzoni
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
- Andrea Pagnani
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
- INFN Sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy
- Francesco Zamponi
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
232
Robson B. Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach. Comput Biol Med 2021; 138:104883. [PMID: 34598067 DOI: 10.1016/j.compbiomed.2021.104883] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/02/2021] [Revised: 09/05/2021] [Accepted: 09/17/2021] [Indexed: 01/05/2023]
Abstract
Many researchers have recently used the prediction of protein secondary structure (local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as neural-net deep learning. Protein secondary structure prediction continues to be a helpful tool in biomedical and life-science research, but it is also extremely enticing for testing predictive methods, such as neural nets, that are intended for different or more general purposes. A complication is highlighted here for researchers testing their methods for other applications. Modern protein databases inevitably contain important clues to the answer, so-called "strong buried clues", though often obscurely, and these are hard to avoid. This is because most proteins or parts of proteins in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, this can overstate the true quality of a predictive method and so confuse understanding of it. However, for researchers using the algorithms as tools, understanding strong buried clues is of great value, because they need to make maximum use of all available information. A simple method related to the GOR methods, but with some features of neural nets in the sense of progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions of weights (hence gigabytes), but they are learned stably by exhaustive sampling. The significance of the findings is discussed in the light of promising recent results from AlphaFold, developed by Google's DeepMind.
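One common way to limit such "buried clues" when benchmarking is to drop test sequences that are too similar to any training sequence. The sketch below is a hypothetical illustration of that idea, not Robson's procedure: it uses `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment-based percent identity (dedicated tools such as CD-HIT or MMseqs2 are normally used for this), and the 0.3 cut-off and toy sequences are assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based percent identity."""
    return SequenceMatcher(None, a, b).ratio()

def filter_test_set(train, test, max_similarity=0.3):
    """Keep only test sequences that are not too similar to any training sequence."""
    return [t for t in test if all(similarity(t, s) <= max_similarity for s in train)]

# Made-up toy sequences: the first test sequence is a near-duplicate of a training one.
train = ["MKTAYIAKQR", "GSHMLEDPVA"]
test = ["MKTAYIAKQL", "WWCPFYHNEE"]
print(filter_test_set(train, test))
```

Scoring a predictor on the filtered set rather than the raw set gives a less flattering but more honest estimate of how it generalizes beyond evolutionary relatives.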
Affiliation(s)
- Barry Robson
- Ingine Inc., Ohio, USA; and the Dirac Foundation, Oxfordshire, UK.
233
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
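The idea of "correlated substitutions" can be illustrated with mutual information (MI) between two alignment columns, one of the simplest co-variation statistics. This is a toy sketch only: the coevolution methods discussed in the review go well beyond raw MI (for example, by separating direct from indirect correlations), and the alignment and function names here are hypothetical.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """MI in nats between two alignment columns given as equal-length strings."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum((c / n) * math.log((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
               for (a, b), c in p_ij.items())

def columns(msa):
    """Transpose a list of aligned sequences into a list of column strings."""
    return ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]

# Made-up alignment: columns 0 and 1 co-vary (A pairs with K, G pairs with D).
msa = ["AKLE", "AKVE", "GDLE", "GDVE", "AKLE", "GDVE"]
cols = columns(msa)
print(mutual_information(cols[0], cols[1]))  # log(2), about 0.693: perfectly coupled
print(mutual_information(cols[0], cols[3]))  # 0.0: column 3 never varies
```

A high score for a column pair is the kind of signal that, after correction for phylogeny and indirect effects, is read as a likely three-dimensional contact.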
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
- Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
234
Moya-Beltrán A, Makarova KS, Acuña LG, Wolf YI, Covarrubias PC, Shmakov SA, Silva C, Tolstoy I, Johnson DB, Koonin EV, Quatrini R. Evolution of Type IV CRISPR-Cas Systems: Insights from CRISPR Loci in Integrative Conjugative Elements of Acidithiobacillia. CRISPR J 2021; 4:656-672. [PMID: 34582696 DOI: 10.1089/crispr.2021.0051] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/27/2022] Open
Abstract
Type IV CRISPR-Cas systems are a distinct variety of highly derived CRISPR-Cas systems that appear to have evolved from type III systems through the loss of the target-cleaving nuclease and partial deterioration of the large subunit of the effector complex. All known type IV CRISPR-Cas systems are encoded on plasmids, integrative and conjugative elements (ICEs), or prophages, and are thought to contribute to competition between these elements, although the mechanistic details of their function remain unknown. There is a clear parallel between the compositions and likely origins of type IV systems and of the type I systems that were recruited by Tn7-like transposons and mediate RNA-guided transposition. We investigated the diversity and evolutionary relationships of type IV systems, with a focus on those in Acidithiobacillia, where this variety of CRISPR is particularly abundant and always found on ICEs. Our analysis revealed remarkable evolutionary plasticity of type IV CRISPR-Cas systems, with adaptation and ancillary genes originating from different ancestral CRISPR-Cas varieties, and extensive gene shuffling within the type IV loci. The adaptation module and the CRISPR array apparently were lost in the type IV ancestor but were subsequently recaptured by type IV systems on several independent occasions. We demonstrate a high level of heterogeneity among the repeats within type IV CRISPR arrays, which far exceeds the heterogeneity of any other known CRISPR repeats and suggests a unique adaptation mechanism. The spacers in the type IV arrays for which protospacers could be identified match plasmid genes, in particular those encoding components of the conjugation apparatus. Both the biochemical mechanism of type IV CRISPR-Cas function and their role in the competition among mobile genetic elements remain to be investigated.
Affiliation(s)
- Ana Moya-Beltrán
- Fundación Ciencia y Vida, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- ANID-Millennium Science Initiative Program, Millennium Nucleus in the Biology of the Intestinal Microbiota, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA; Universidad San Sebastián, Santiago, Chile
- Lillian G Acuña
- Fundación Ciencia y Vida, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA; Universidad San Sebastián, Santiago, Chile
- Paulo C Covarrubias
- Fundación Ciencia y Vida, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- Sergey A Shmakov
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA; Universidad San Sebastián, Santiago, Chile
- Cristian Silva
- Fundación Ciencia y Vida, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA; Universidad San Sebastián, Santiago, Chile
- D Barrie Johnson
- School of Natural Sciences, Bangor University, Bangor, United Kingdom; Universidad San Sebastián, Santiago, Chile
- Faculty of Health and Life Sciences, Coventry University, Coventry, United Kingdom; Universidad San Sebastián, Santiago, Chile
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA; Universidad San Sebastián, Santiago, Chile
- Raquel Quatrini
- Fundación Ciencia y Vida, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- ANID-Millennium Science Initiative Program, Millennium Nucleus in the Biology of the Intestinal Microbiota, Santiago, Chile; Universidad San Sebastián, Santiago, Chile
- Facultad de Medicina y Ciencia, Universidad San Sebastián, Santiago, Chile
235
Cheng Y, Grueber C, Hogg CJ, Belov K. Improved high-throughput MHC typing for non-model species using long-read sequencing. Mol Ecol Resour 2021; 22:862-876. [PMID: 34551192 PMCID: PMC9293008 DOI: 10.1111/1755-0998.13511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/31/2021] [Revised: 08/26/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022]
Abstract
The major histocompatibility complex (MHC) plays a critical role in the vertebrate immune system. Accurate MHC typing is critical to understanding not only host fitness and disease susceptibility, but also the mechanisms underlying host‐pathogen co‐evolution. However, due to the high degree of gene duplication and diversification of MHC genes, it is often technically challenging to accurately characterise MHC genetic diversity in non‐model species. Here we conducted a systematic review to identify common issues associated with current widely used MHC typing approaches. Then to overcome these challenges, we developed a long‐read based MHC typing method along with a new analysis pipeline. Our approach enables the sequencing of fully phased MHC alleles spanning all key functional domains and the separation of highly similar alleles as well as the removal of technical artefacts such as PCR heteroduplexes and chimeras. Using this approach, we performed population‐scale MHC typing in the Tasmanian devil (Sarcophilus harrisii), revealing previously undiscovered MHC functional diversity in this endangered species. Our new method provides a better solution for addressing research questions that require high MHC typing accuracy. Since the method is not limited by species or the number of genes analysed, it will be applicable for studying not only the MHC but also other complex gene families.
Affiliation(s)
- Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Catherine Grueber
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Carolyn J Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- San Diego Zoo Wildlife Alliance, San Diego, California, USA
- Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
236
Makarova KS, Wolf YI, Karamycheva S, Koonin EV. A Unique Gene Module in Thermococcales Archaea Centered on a Hypervariable Protein Containing Immunoglobulin Domains. Front Microbiol 2021; 12:721392. [PMID: 34489912 PMCID: PMC8416519 DOI: 10.3389/fmicb.2021.721392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Accepted: 07/22/2021] [Indexed: 11/17/2022] Open
Abstract
Molecular mechanisms involved in biological conflicts and self vs nonself recognition in archaea remain poorly characterized. We apply phylogenomic analysis to identify a hypervariable gene module that is widespread among Thermococcales. These loci consist of an upstream gene coding for a large protein containing several immunoglobulin (Ig) domains and unique combinations of downstream genes, some of which also contain Ig domains. In the large Ig domain-containing protein, the C-terminal Ig domain sequence is hypervariable, apparently as a result of recombination between genes from different Thermococcales. To reflect the hypervariability, we denote this gene module VARTIG (VARiable Thermococcales IG). The overall organization of the VARTIG modules is similar to the organization of Polymorphic Toxin Systems (PTS). Archaeal genomes outside Thermococcales encode a variety of Ig domain proteins, but no counterparts to VARTIG and no Ig domains with comparable levels of variability. The specific functions of VARTIG remain unknown, but the identified features of this system imply three testable hypotheses: (i) involvement in inter-microbial conflicts analogous to PTS, (ii) a role in innate immunity analogous to the vertebrate complement system, and (iii) a function in self vs nonself discrimination analogous to the vertebrate Major Histocompatibility Complex. The latter two hypotheses seem to be of particular interest given the apparent analogy to vertebrate immunity.
Affiliation(s)
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, United States
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, United States
- Svetlana Karamycheva
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, United States
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, United States
237
Schierwater B, Osigus HJ, Bergmann T, Blackstone NW, Hadrys H, Hauslage J, Humbert PO, Kamm K, Kvansakul M, Wysocki K, DeSalle R. The enigmatic Placozoa part 2: Exploring evolutionary controversies and promising questions on earth and in space. Bioessays 2021; 43:e2100083. [PMID: 34490659 DOI: 10.1002/bies.202100083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2021] [Revised: 07/21/2021] [Accepted: 08/16/2021] [Indexed: 12/28/2022]
Abstract
The placozoan Trichoplax adhaerens has been bridging gaps between research disciplines like no other animal. As outlined in part 1, placozoans have been the subject of heated evolutionary debates and have challenged some fundamental evolutionary concepts. Here in part 2 we discuss the exceptional genetics of the phylum Placozoa and point out some challenging model system applications for the best known species, Trichoplax adhaerens.
Affiliation(s)
- Bernd Schierwater
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Hans-Jürgen Osigus
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Tjard Bergmann
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Neil W Blackstone
- Department of Biological Sciences, Northern Illinois University, DeKalb, Illinois, USA
- Heike Hadrys
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Jens Hauslage
- Gravitational Biology, Institute of Aerospace Medicine, German Aerospace Center (DLR), Cologne, Germany
- Patrick O Humbert
- Department of Biochemistry & Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria, Australia; Research Centre for Molecular Cancer Prevention, La Trobe University, Melbourne, Victoria, Australia
- Kai Kamm
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Marc Kvansakul
- Department of Biochemistry & Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria, Australia; Research Centre for Molecular Cancer Prevention, La Trobe University, Melbourne, Victoria, Australia
- Kathrin Wysocki
- Institute of Animal Ecology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany
- Rob DeSalle
- American Museum of Natural History, New York, New York, USA
238
Ferruz N, Michel F, Lobos F, Schmidt S, Höcker B. Fuzzle 2.0: Ligand Binding in Natural Protein Building Blocks. Front Mol Biosci 2021; 8:715972. [PMID: 34485385 PMCID: PMC8416435 DOI: 10.3389/fmolb.2021.715972] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/27/2021] [Accepted: 08/06/2021] [Indexed: 11/13/2022]
Abstract
Modern proteins have been shown to share evolutionary relationships via subdomain-sized fragments. The assembly of such fragments through duplication and recombination events led to the complex structures and functions we observe today. We previously implemented a pipeline that identified more than 1,000 of these fragments that are shared by different protein folds and developed a web interface to analyze and search for them. This resource, named Fuzzle, helps structural and evolutionary biologists to identify and analyze conserved parts of a protein, but it also provides protein engineers with building blocks, for example to design proteins by fragment combination. Here, we describe a new version of this web resource that was extended to include ligand information. This addition is a significant asset to the database, since protein fragments that bind specific ligands can now be identified and analyzed. The mode of ligand binding is often conserved in proteins, thereby supporting a common evolutionary origin. The same can now be explored for subdomain-sized fragments within this database. This ligand binding information can also be used in protein engineering to graft binding pockets into other protein scaffolds or to transfer functional sites via recombination of a specific fragment. Fuzzle 2.0 is freely available at https://fuzzle.uni-bayreuth.de/2.0.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Florian Michel
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Steffen Schmidt
- Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
239
Makarova KS, Wolf YI, Shmakov SA, Liu Y, Li M, Koonin EV. Unprecedented Diversity of Unique CRISPR-Cas-Related Systems and Cas1 Homologs in Asgard Archaea. CRISPR J 2021; 3:156-163. [PMID: 33555973 DOI: 10.1089/crispr.2020.0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022]
Abstract
The principal function of archaeal and bacterial CRISPR-Cas systems is antivirus adaptive immunity. However, recent genome analyses identified a variety of derived CRISPR-Cas variants, at least some of which appear to perform different functions. Here, we describe a unique repertoire of CRISPR-Cas-related systems that we discovered by searching archaeal metagenome-assembled genomes of the Asgard superphylum. Several of these variants contain extremely diverged homologs of Cas1, the integrase involved in CRISPR adaptation as well as casposon transposition. Strikingly, the diversity of Cas1 in Asgard archaea alone is greater than that detected so far among the rest of archaea and bacteria. The Asgard CRISPR-Cas derivatives also encode distinct forms of Cas4, Cas5, and Cas7 proteins, and/or additional nucleases. Some of these systems are predicted to perform defense functions, but possibly not programmable ones, whereas others are likely to represent previously unknown mobile genetic elements.
Affiliation(s)
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Sergey A Shmakov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yang Liu
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P.R. China
- Meng Li
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P.R. China
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
240
Pereira J, Alva V. How do I get the most out of my protein sequence using bioinformatics tools? Acta Crystallogr D Struct Biol 2021; 77:1116-1126. [PMID: 34473083 PMCID: PMC8411974 DOI: 10.1107/s2059798321007907] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/11/2021] [Accepted: 08/02/2021] [Indexed: 12/21/2022]
Abstract
Biochemical and biophysical experiments are essential for uncovering the three-dimensional structure and biological role of a protein of interest. However, meaningful predictions can frequently also be made using bioinformatics resources that transfer knowledge from a well-studied protein to an uncharacterized protein based on their evolutionary relatedness. These predictions are helpful in developing specific hypotheses to guide wet-laboratory experiments. Commonly used bioinformatics resources include methods to identify and predict conserved sequence motifs, protein domains, transmembrane segments, signal sequences, and secondary as well as tertiary structure. Here, several such methods available through the MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) are described, and it is demonstrated how their combined use can provide meaningful information on a protein of unknown function. The focus is on the identification of homologs of known structure using HHpred, internal repeats using HHrepID, coiled coils using PCOILS and DeepCoil, and transmembrane segments using Quick2D.
Affiliation(s)
- Joana Pereira
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
241
Senkevich TG, Yutin N, Wolf YI, Koonin EV, Moss B. Ancient Gene Capture and Recent Gene Loss Shape the Evolution of Orthopoxvirus-Host Interaction Genes. mBio 2021; 12:e0149521. [PMID: 34253028 PMCID: PMC8406176 DOI: 10.1128/mbio.01495-21] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 05/20/2021] [Accepted: 05/24/2021] [Indexed: 01/27/2023]
Abstract
The survival of viruses depends on their ability to resist host defenses and, of all animal virus families, the poxviruses have the most antidefense genes. Orthopoxviruses (ORPV), a genus within the subfamily Chordopoxvirinae, infect diverse mammals and include one of the most devastating human pathogens, the now eradicated smallpox virus. ORPV encode ∼200 genes, of which roughly half are directly involved in virus genome replication and expression as well as virion morphogenesis. The remaining ∼100 "accessory" genes are responsible for virus-host interactions, particularly counter-defense of innate immunity. Complete sequences are currently available for several hundred ORPV genomes isolated from a variety of mammalian hosts, providing a rich resource for comparative genomics and reconstruction of ORPV evolution. To identify the provenance and evolutionary trends of the ORPV accessory genes, we constructed clusters including the orthologs of these genes from all chordopoxviruses. Most of the accessory genes were captured in three major waves early in chordopoxvirus evolution, prior to the divergence of ORPV and the sister genus Centapoxvirus from their common ancestor. The capture of these genes from the host was followed by extensive gene duplication, yielding several paralogous gene families. In addition, nine genes were gained during the evolution of ORPV themselves. In contrast, nearly every accessory gene was lost, some on multiple, independent occasions in numerous lineages of ORPV, so that no ORPV retains them all. A variety of functional interactions could be inferred from examination of pairs of ORPV accessory genes that were either often or rarely lost concurrently.
IMPORTANCE: Orthopoxviruses (ORPV) include smallpox (variola) virus, one of the most devastating human pathogens, and vaccinia virus, comprising the vaccine used for smallpox eradication. Among roughly 200 ORPV genes, about half are essential for genome replication and expression as well as virion morphogenesis, whereas the remaining half consists of accessory genes counteracting the host immune response. We reannotated the accessory genes of ORPV, predicting the functions of uncharacterized genes, and reconstructed the history of their gain and loss during the evolution of ORPV. Most of the accessory genes were acquired in three major waves antedating the origin of ORPV from chordopoxviruses. The evolution of ORPV themselves was dominated by gene loss, with numerous genes lost at the base of each major group of ORPV. Examination of pairs of ORPV accessory genes that were either often or rarely lost concurrently during ORPV evolution allows prediction of different types of functional interactions.
Affiliation(s)
- Tatiana G. Senkevich
- Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Bernard Moss
- Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
242
Shao J, Chen J, Liu B. ProtRe-CN: Protein Remote Homology Detection by Combining Classification Methods and Network Methods via Learning to Rank. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; PP:1-1. [PMID: 34460380 DOI: 10.1109/tcbb.2021.3108168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Protein remote homology detection is a fundamental research task for downstream analyses such as protein structure and function prediction. Many advanced methods with complementary detection abilities have been proposed from different perspectives, such as classification methods, network methods, and ranking methods. A framework integrating these heterogeneous methods is urgently needed to reduce the false positive rate and predictive bias. We propose a novel ranking method called ProtRe-CN that fuses classification methods and network methods via Learning to Rank. Experimental results on the benchmark dataset and the independent dataset show that ProtRe-CN outperforms other existing state-of-the-art predictors. ProtRe-CN improves detection performance by correcting false positives in the ranking list through the combination of heterogeneous methods. The web server of ProtRe-CN can be accessed at http://bliulab.net/ProtRe-CN.
243
Gaber A, Pavšič M. Modeling and Structure Determination of Homo-Oligomeric Proteins: An Overview of Challenges and Current Approaches. Int J Mol Sci 2021; 22:9081. [PMID: 34445785 PMCID: PMC8396596 DOI: 10.3390/ijms22169081] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/30/2021] [Revised: 08/20/2021] [Accepted: 08/20/2021] [Indexed: 12/12/2022]
Abstract
Protein homo-oligomerization is a very common phenomenon: approximately half of proteins form homo-oligomeric assemblies composed of identical subunits. The vast majority of such assemblies possess internal symmetry, which can either be exploited to aid structure determination or pose challenges to it. Moreover, aspects of symmetry are critical in the modeling of protein homo-oligomers, either by docking or by homology-based approaches. Here, we first provide a brief overview of the nature of protein homo-oligomerization. Next, we describe how the symmetry of homo-oligomers is addressed by crystallographic and non-crystallographic symmetry operations, and how biologically relevant intermolecular interactions can be deciphered from the ordered array of molecules within protein crystals. Additionally, we describe the most important aspects of protein homo-oligomerization in structure determination by NMR. Finally, we give an overview of approaches aimed at modeling homo-oligomers using computational methods that specifically address their internal symmetry and allow the incorporation of other experimental data as spatial restraints to achieve higher model reliability.
244
Boucher L, Somani S, Negron C, Ma W, Jacobs S, Chan W, Malia T, Obmolova G, Teplyakov A, Gilliland GL, Luo J. Surface salt bridges contribute to the extreme thermal stability of an FN3-like domain from a thermophilic bacterium. Proteins 2021; 90:270-281. [PMID: 34405904 DOI: 10.1002/prot.26218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/25/2020] [Revised: 03/08/2021] [Accepted: 08/02/2021] [Indexed: 12/27/2022]
Abstract
This study uses differential scanning calorimetry, X-ray crystallography, and molecular dynamics simulations to investigate the structural basis for the high thermal stability (melting temperature 97.5°C) of an FN3-like protein domain from the thermophilic bacterium Thermoanaerobacter tengcongensis (FN3tt). FN3tt adopts a typical FN3 fold, with a three-stranded beta sheet packing against a four-stranded beta sheet. We identified three solvent-exposed arginine residues (R23, R25, and R72) that stabilize the protein through salt-bridge interactions with glutamic acid residues on adjacent strands. Alanine mutation of the three arginine residues reduced the melting temperature by up to 22°C. Crystal structures of the wild type (WT) and a thermally destabilized (∆Tm -19.7°C) triple mutant (R23L/R25T/R72I) were found to be nearly identical, suggesting that the destabilization is due to the loss of the interactions made by the arginine residues rather than to structural changes. Molecular dynamics simulations showed that the salt-bridge interactions in the WT were stable and provided a dynamical explanation for the cooperativity between R23 and R25 observed in the calorimetry measurements. In addition, folding free energy changes computed using free energy perturbation molecular dynamics simulations correlated strongly with melting temperature changes. This work is another example of surface salt bridges contributing to the enhanced thermal stability of thermophilic proteins. The molecular dynamics simulation methods employed in this study may be broadly useful for in silico surface charge engineering of proteins.
Affiliation(s)
- Lauren Boucher
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Sandeep Somani
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Wenting Ma
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Steven Jacobs
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Winnie Chan
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Thomas Malia
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Galina Obmolova
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Alexey Teplyakov
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Gary L Gilliland
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
- Jinquan Luo
- Janssen Research & Development, LLC, Spring House, Pennsylvania, USA
245
Intrinsic disorder and phase transitions: Pieces in the puzzling role of the prion protein in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2021; 183:1-43. [PMID: 34656326 DOI: 10.1016/bs.pmbts.2021.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Abstract
After four decades of prion protein research, the pressing questions in the literature remain similar to the common existential dilemmas. Who am I? Some structural characteristics of the cellular prion protein (PrPC) and scrapie PrP (PrPSc) remain unknown: there are no high-resolution atomic structures for either full-length endogenous human PrPC or isolated infectious PrPSc particles. Why am I here? It is not known why PrPC and PrPSc are found in specific cellular compartments such as the nucleus; while the physiological functions of PrPC are still being uncovered, the misfolding site remains obscure. Where am I going? The subcellular distribution of PrPC and PrPSc is wide (reported in 10 different locations in the cell). This complexity is further exacerbated by the eight different PrP fragments yielded from conserved proteolytic cleavages and by reversible post-translational modifications, such as glycosylation, phosphorylation, and ubiquitination. Moreover, about 55 pathological mutations and 16 polymorphisms in the PrP gene (PRNP) have been described. Prion diseases also share unique, challenging features: the strain phenomenon (associated with the heterogeneity of PrPSc conformations) and possible transmissibility between species, factors which contribute to PrP undruggability. However, two recent concepts in biochemistry, intrinsically disordered proteins and phase transitions, may shed light on the molecular basis of PrP's role in physiology and disease.
246
Moura de Sousa JA, Pfeifer E, Touchon M, Rocha EPC. Causes and Consequences of Bacteriophage Diversification via Genetic Exchanges across Lifestyles and Bacterial Taxa. Mol Biol Evol 2021; 38:2497-2512. [PMID: 33570565 PMCID: PMC8136500 DOI: 10.1093/molbev/msab044] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 12/15/2022]
Abstract
Bacteriophages (phages) evolve rapidly by acquiring genes from other phages, which results in mosaic genomes. Here, we identify numerous genetic transfers between distantly related phages and aim to understand their frequency, their consequences, and the conditions favoring them. Gene flow tends to occur between phages that are enriched for recombinases, transposases, and nonhomologous end joining, suggesting that both homologous and illegitimate recombination contribute to gene flow. Phage family and host phyla are strong barriers to gene exchange, but phage lifestyle is not. Although we observe four times more recent transfers between temperate phages than between other pairs, there is extensive gene flow between temperate and virulent phages, and among virulent phages themselves. These exchanges predominantly involve virulent phages with large genomes previously classed as low gene flux, and lead to the preferential transfer of genes encoding functions involved in cell energetics, nucleotide metabolism, DNA packaging and injection, and virion assembly. Such exchanges may contribute to the twofold larger genomes observed in virulent phages. Because genetic transfers occur upon coinfection of a host, we used them to compare phage host ranges. We found that virulent phages have broader host ranges and can mediate genetic exchanges between narrow-host-range temperate phages infecting distant bacterial hosts, thus contributing to gene flow between virulent phages as well as between temperate phages. This gene flow drastically expands the gene repertoires available for phage and bacterial evolution, including the transfer of functional innovations across taxa.
Affiliation(s)
- Eugen Pfeifer
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
- Marie Touchon
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
- Eduardo P C Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
247
Andreas MP, Giessen TW. Large-scale computational discovery and analysis of virus-derived microbial nanocompartments. Nat Commun 2021; 12:4748. [PMID: 34362927 PMCID: PMC8346489 DOI: 10.1038/s41467-021-25071-y] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/10/2021] [Accepted: 07/12/2021] [Indexed: 02/06/2023]
Abstract
Encapsulins are a class of microbial protein compartments defined by the viral HK97 fold of their capsid protein, self-assembly into icosahedral shells, and a dedicated cargo-loading mechanism for sequestering specific enzymes. Encapsulins are often misannotated, and traditional sequence-based searches yield many false-positive hits in the form of phage capsids. Here, we develop an integrated search strategy to carry out a large-scale computational analysis of prokaryotic genomes with the goal of discovering an exhaustive and curated set of all HK97-fold encapsulin-like systems. We find over 6,000 encapsulin-like systems in 31 bacterial and four archaeal phyla, including two novel encapsulin families. We formulate hypotheses about their potential biological functions and biomedical relevance, which range from natural product biosynthesis and stress resistance to carbon metabolism and anaerobic hydrogen production. An evolutionary analysis of encapsulins and related HK97-type virus families shows that they share a common ancestor, and we conclude that encapsulins likely evolved from HK97-type bacteriophages.
Affiliation(s)
- Michael P Andreas
- Department of Biomedical Engineering, University of Michigan Medical School, Ann Arbor, MI, USA
- Tobias W Giessen
- Department of Biomedical Engineering, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, USA
248
ASTE1 promotes shieldin-complex-mediated DNA repair by attenuating end resection. Nat Cell Biol 2021; 23:894-904. [PMID: 34354233 DOI: 10.1038/s41556-021-00723-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/29/2020] [Accepted: 06/25/2021] [Indexed: 12/23/2022]
Abstract
The shieldin complex functions as the downstream effector of 53BP1-RIF1 to promote DNA double-strand break (DSB) end-joining by restricting end resection. The SHLD2 subunit binds to single-stranded DNA ends and blocks end resection through its OB-fold domains. Beyond blocking end resection, it is unclear how the shieldin complex processes SHLD2-bound single-stranded DNA and promotes non-homologous end-joining. Here, we identify ASTE1 as a downstream effector of the shieldin complex and a structure-specific DNA endonuclease that specifically cleaves single-stranded DNA and 3' overhang DNA. ASTE1 localizes to DNA damage sites in a shieldin-dependent manner. Loss of ASTE1 impairs non-homologous end-joining, leads to hyper-resection and causes defective immunoglobulin class switch recombination. ASTE1 deficiency also causes resistance to poly(ADP-ribose) polymerase inhibitors in BRCA1-deficient cells owing to restoration of homologous recombination. These findings suggest that ASTE1-mediated cleavage of 3' single-stranded DNA ends contributes to the control of DSB repair choice by 53BP1, RIF1 and shieldin.
249
Terzian P, Olo Ndela E, Galiez C, Lossouarn J, Pérez Bucio RE, Mom R, Toussaint A, Petit MA, Enault F. PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genom Bioinform 2021; 3:lqab067. [PMID: 34377978 PMCID: PMC8341000 DOI: 10.1093/nargab/lqab067] [Citation(s) in RCA: 198] [Impact Index Per Article: 49.5] [Received: 03/21/2021] [Revised: 06/25/2021] [Accepted: 07/16/2021] [Indexed: 12/12/2022]
Abstract
Viruses are abundant, diverse and ancestral biological entities. Their diversity is high, both in terms of the number of different protein families encountered and in the sequence heterogeneity within each protein family. The recent increase in sequenced viral genomes constitutes a great opportunity to gain new insights into this diversity and consequently calls for the development of annotation resources to support functional and comparative analysis. Here, we introduce PHROG (Prokaryotic Virus Remote Homologous Groups), a library of viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. Considering 17 473 reference (pro)viruses of prokaryotes, 868 340 of the total 938 864 proteins were grouped into 38 880 clusters; this clustering proved to be twofold deeper than a classical strategy based on BLAST-like similarity searches, yet the clusters remained homogeneous. Manual inspection of similarities to various reference sequence databases led to the annotation of 5108 clusters (containing 50.6% of the total protein dataset) with 705 different annotation terms, grouped into 9 functional categories specifically designed for viruses. We hope PHROG will be a useful tool for better annotating future prokaryotic viral sequences, thus helping the scientific community to better understand the evolution and ecology of these entities.
Affiliation(s)
- Paul Terzian
- Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Eric Olo Ndela
- Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Clovis Galiez
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Julien Lossouarn
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- Robin Mom
- Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Ariane Toussaint
- Cellular and Molecular Microbiology, IBMM-DBM, Université libre de Bruxelles, 6041 Gosselies, Belgium
- Marie-Agnès Petit
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
- François Enault
- Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
250
Levine TP. TMEM106B in humans and Vac7 and Tag1 in yeast are predicted to be lipid transfer proteins. Proteins 2021; 90:164-175. [PMID: 34347309 DOI: 10.1002/prot.26201] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2021] [Revised: 07/11/2021] [Accepted: 07/23/2021] [Indexed: 11/05/2022]
Abstract
TMEM106B is an integral membrane protein of late endosomes and lysosomes involved in neuronal function, with its overexpression associated with familial frontotemporal lobar degeneration and a point mutation linked to hypomyelination. It has also been identified in multiple screens for host proteins required for productive SARS-CoV-2 infection. Because standard approaches to understanding TMEM106B at the sequence level find no homology to other proteins, it has remained a protein of unknown function. Here, the standard tool PSI-BLAST was used in a nonstandard way to show that the lumenal portion of TMEM106B is a member of the late embryogenesis abundant-2 (LEA-2) domain superfamily. More sensitive tools (HMMER, HHpred, and trRosetta) extended this to predict LEA-2 domains in two yeast proteins. One is Vac7, a regulator of PI(3,5)P2 production in the degradative vacuole (the yeast equivalent of the lysosome), which has a LEA-2 domain in its lumenal domain. The other is Tag1, another vacuolar protein, which signals to terminate autophagy and has three LEA-2 domains in its lumenal domain. Further analysis of LEA-2 structures indicated that LEA-2 domains have a long, conserved lipid-binding groove. This implies that TMEM106B, Vac7, and Tag1 may all be lipid transfer proteins in the lumen of late endocytic organelles.