1
Eckhart L, Sachslehner AP, Steinbinder J, Fischer H. Caspase Domain Duplication During the Evolution of Caspase-16. J Mol Evol 2025:10.1007/s00239-025-10252-w. [PMID: 40392285 DOI: 10.1007/s00239-025-10252-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2025] [Accepted: 05/05/2025] [Indexed: 05/22/2025]
Abstract
Caspases are cysteine-dependent aspartate-directed proteases which have critical functions in programmed cell death and inflammation. Their catalytic activity depends on a catalytic dyad of cysteine and histidine within a characteristic protein fold, the so-called caspase domain. Here, we investigated the evolution of caspase-16 (CASP16), an enigmatic member of the caspase family, for which only a partial human gene had been reported previously. The presence of CASP16 orthologs in placental mammals, marsupials and monotremes suggests that caspase-16 originated prior to the divergence of the main phylogenetic clades of mammals. Caspase-16 proteins of various species contain a carboxy-terminal caspase domain and an amino-terminal prodomain predicted to fold into a caspase domain-like structure, which is a unique feature among caspases known so far. Comparative sequence analysis indicates that the prodomain of caspase-16 has evolved by the duplication of exons encoding the caspase domain, whereby the catalytic site was lost in the amino-terminal domain and conserved in the carboxy-terminal domain of caspase-16. The murine and human orthologs of CASP16 contain frameshift mutations and therefore represent pseudogenes (CASP16P). CASP16 of the chimpanzee displays more than 98% nucleotide sequence identity with the human CASP16P gene but, like CASP16 genes of other primates, has an intact protein coding sequence. We conclude that caspase-16 structurally differs from other mammalian caspases, and the pseudogenization of CASP16 distinguishes humans from their phylogenetically closest relatives.
Affiliation(s)
- Leopold Eckhart
- Department of Dermatology, Medical University of Vienna, 1090, Vienna, Austria.
- Julia Steinbinder
- Department of Dermatology, Medical University of Vienna, 1090, Vienna, Austria
- Heinz Fischer
- Division of Cell and Developmental Biology, Center for Anatomy and Cell Biology, Medical University of Vienna, 1090, Vienna, Austria
2
Coban A, Bornberg-Bauer E, Kemena C. Tracing the paths of modular evolution by quantifying rearrangement events of protein domains. BMC Ecol Evol 2025; 25:6. [PMID: 39773110 PMCID: PMC11707847 DOI: 10.1186/s12862-024-02347-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2024] [Accepted: 12/27/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND Protein evolution is central to molecular adaptation and is largely characterized by modular rearrangements of domains, the evolutionary and structural building blocks of proteins. Genetic events underlying protein rearrangements are relatively rare compared with amino acid changes. Therefore, these events can be used to characterize and reconstruct major events of molecular adaptation by comparing large proteome data sets. RESULTS Here we determine, at unprecedented completeness, the rates of fusion, fission, emergence and loss of domains in five eukaryotic clades (monocots, eudicots, fungi, insects, vertebrates). By characterizing rearrangements that were previously considered "ambiguous" or "complex", we raise the fraction of resolved rearrangement events from ca. 60% to around 92%. We exemplify our method by analyzing the evolutionary histories of protein rearrangements in (i) the extracellular matrix, (ii) innate immunity across Eukaryota, Metazoa, and Vertebrata, and (iii) Toll-like receptors in the innate immune system of Eukaryota. In all three cases we find hotspots of rearrangement events in their phylogeny which (i) can be related to major events of adaptation and (ii) follow the emergence of new domains that become integrated into existing arrangements. CONCLUSION Our results demonstrate that, akin to change at the level of amino acids, domain rearrangements follow a clock-like dynamic which can be well quantified and supports the concept of evolutionary tinkering. While many novel domain emergence events are ancient, emerged domains are quickly incorporated into a great number of proteins. In parallel, the observed rates of emergence of new domains decrease over time.
Affiliation(s)
- Abdulbaki Coban
- Institute for Evolution and Biodiversity, University of Münster, Münster, 48159, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, 48159, Germany
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen, 72076, Germany
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Münster, 48159, Germany.
3
Szatkownik A, Zea DJ, Richard H, Laine E. Building alternative splicing and evolution-aware sequence-structure maps for protein repeats. J Struct Biol 2023; 215:107997. [PMID: 37453591 DOI: 10.1016/j.jsb.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/15/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]
Abstract
Alternative splicing of repeats in proteins provides a mechanism for rewiring and fine-tuning protein interaction networks. In this work, we developed a robust and versatile method, ASPRING, to identify alternatively spliced protein repeats from gene annotations. ASPRING leverages evolutionarily meaningful, alternative splicing-aware hierarchical graphs to provide maps between protein repeat sequences and 3D structures. We rethink the definition of repeats by explicitly accounting for transcript diversity across several genes and species. Using a stringent sequence-based similarity criterion, we detected over 5,000 evolutionarily conserved repeats by screening virtually all human protein-coding genes and their orthologs across a dozen species. Through a joint analysis of their sequences and structures, we extracted specificity-determining sequence signatures and assessed their involvement in experimentally resolved and modelled protein interactions. Our findings demonstrate the widespread alternative usage of protein repeats in modulating protein interactions and open avenues for targeting repeat-mediated interactions.
Affiliation(s)
- Antoine Szatkownik
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Diego Javier Zea
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Hugues Richard
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany.
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France.
4
Rapid molecular diversification and homogenization of clustered major ampullate silk genes in Argiope garden spiders. PLoS Genet 2022; 18:e1010537. [PMID: 36508456 PMCID: PMC9779670 DOI: 10.1371/journal.pgen.1010537] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2022] [Revised: 12/22/2022] [Accepted: 11/18/2022] [Indexed: 12/14/2022]
Abstract
The evolutionary diversification of orb-web weaving spiders is closely tied to the mechanical performance of dragline silk. This proteinaceous fiber provides the primary structural framework of orb web architecture, and its extraordinary toughness allows these structures to absorb the high energy of aerial prey impact. The dominant model of dragline silk molecular structure involves the combined function of two highly repetitive, spider-specific silk genes (spidroins), MaSp1 and MaSp2. Recent genomic studies, however, have suggested that this framework is overly simplistic, and our understanding of how MaSp genes evolve is limited. Here we present a comprehensive analysis of MaSp structural and evolutionary diversity across species of Argiope (garden spiders). This genomic analysis reveals the largest catalog of MaSp genes found in any spider, driven largely by an expansion of MaSp2 genes. The rapid diversification of Argiope MaSp genes, located primarily in a single genomic cluster, is associated with profound changes in silk gene structure. MaSp2 genes, in particular, have evolved complex, hierarchically organized repeat units (ensemble repeats) delineated by novel introns that exhibit remarkable evolutionary dynamics. These repetitive introns have arisen independently within the genus, are highly homogenized within a gene, but diverge rapidly between genes. In some cases, these iterated introns are organized in an alternating structure in which every other intron is nearly identical in sequence. We hypothesize that this intron structure has evolved to facilitate homogenization of the coding sequence. We also find evidence of intergenic gene conversion and identify a more diverse array of stereotypical amino acid repeats than previously recognized.
Overall, the extreme diversification found among MaSp genes requires changes in the structure-function model of dragline silk performance that focuses on the differential use and interaction among various MaSp paralogs as well as the impact of ensemble repeat structure and different amino acid motifs on mechanical behavior.
5
Betschart B, Bisoffi M, Alaeddine F. Identification and characterization of epicuticular proteins of nematodes sharing motifs with cuticular proteins of arthropods. PLoS One 2022; 17:e0274751. [PMID: 36301857 PMCID: PMC9612446 DOI: 10.1371/journal.pone.0274751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/01/2022] [Accepted: 09/05/2022] [Indexed: 11/18/2022]
Abstract
Specific collagens and insoluble proteins called cuticlins are major constituents of nematode cuticles. The epicuticle, which forms the outermost electron-dense layer of the cuticle, is composed of another category of insoluble proteins called epicuticlins. Epicuticlins are distinct from the insoluble cuticlins localized in the cortical layer and in the fibrous ribbon underneath the lateral alae. Our objective was to identify and characterize the genes and encoded proteins that form the epicuticle. Combining previously obtained laboratory results with data recently made available through the Whole-Genome Shotgun contigs (WGS) and Transcriptome Shotgun Assembly (TSA) sequencing projects of Ascaris suum allowed us to identify the first epicuticlin gene, Asu-epic-1, on chromosome VI. This gene consists of exon 1 (55 bp) and exon 2 (1067 bp), separated by an intron of 1593 bp. Exon 2 is formed of tandem repeats (TR) whose number varies among cDNA and genomic clones of Asu-epic-1. These variations could be due to polymerase slippage during DNA replication and RNA transcription, leading to insertions and deletions (indels). The deduced protein, Asu-EPIC-1, consists of a signal peptide of 20 amino acids followed by 353 amino acids composed of seven TR of 49 or 51 amino acids each. Each repeat is characterized by three highly conserved tyrosine motifs. The GYR motif is the Pfam motif PF02756, which is present in several cuticular proteins of arthropods. Asu-EPIC-1 is an intrinsically disordered protein (IDP) containing seven predicted molecular recognition features (MoRFs). This type of protein undergoes a disorder-to-order transition upon binding protein partners. Three epicuticular sequences have been identified in A. suum, Ascaris lumbricoides, and Toxocara canis. Homologous epicuticular proteins were identified in over 50 other nematode species.
The potential of this new category of proteins in forming the nematode cuticle through covalent interactions with other cuticular components, particularly with collagens, is discussed. Their localization in the outermost layer of the nematode body and their unique structure render them crucial candidates for biochemical and molecular interaction studies and targets for new biotechnological and biomedical applications.
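The sizes reported for Asu-epic-1 can be cross-checked with a little arithmetic. The Python sketch below goes beyond what the abstract states in two places, and both are assumptions: that exons 1 and 2 are entirely protein-coding (no untranslated regions) and end in a stop codon, and that the 353 residues after the signal peptide consist only of the seven repeats. Under those assumptions it tests whether the exon length matches the protein length and which mix of 49- and 51-residue repeats would sum to 353.

```python
from itertools import combinations_with_replacement

# Sizes as reported in the abstract
EXON1_BP, EXON2_BP = 55, 1067
SIGNAL_AA, MATURE_AA = 20, 353
N_REPEATS, REPEAT_LENGTHS = 7, (49, 51)


def coding_check():
    """Compare exon length with protein length.

    Assumption: both exons are fully coding and the reading frame ends in one stop codon.
    Returns (total exon length in bp, whether it equals (protein length + 1 stop) codons).
    """
    coding_bp = EXON1_BP + EXON2_BP
    protein_aa = SIGNAL_AA + MATURE_AA
    consistent = coding_bp % 3 == 0 and coding_bp // 3 == protein_aa + 1
    return coding_bp, consistent


def repeat_mixes(total, n_repeats, lengths):
    """Return every multiset of n_repeats repeat lengths that sums exactly to total."""
    return [combo for combo in combinations_with_replacement(lengths, n_repeats)
            if sum(combo) == total]


if __name__ == "__main__":
    print(coding_check())                                      # (1122, True)
    print(repeat_mixes(MATURE_AA, N_REPEATS, REPEAT_LENGTHS))  # [(49, 49, 51, 51, 51, 51, 51)]
```

Under these assumptions the figures are mutually consistent: 1122 bp is 373 codons plus a stop, and two 49-residue plus five 51-residue repeats give 353 residues. The actual repeat boundaries would have to be read from the sequence itself.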
Affiliation(s)
- Bruno Betschart
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Marco Bisoffi
- Chemistry and Biochemistry, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Ferial Alaeddine
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
6
Kruglikov A, Wei Y, Xia X. Proteins from Thermophilic Thermus thermophilus Often Do Not Fold Correctly in a Mesophilic Expression System Such as Escherichia coli. ACS OMEGA 2022; 7:37797-37806. [PMID: 36312379 PMCID: PMC9608423 DOI: 10.1021/acsomega.2c04786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
The majority of protein structure studies use Escherichia coli (E. coli) and other model organisms as expression systems for genes from other species. However, protein folding depends on factors of the cellular environment, such as chaperone proteins, cytoplasmic pH, temperature, and ionic concentrations. Because of differences in these factors, especially temperature and chaperones, native proteins of organisms such as extremophiles may fold improperly when they are expressed in mesophilic model organisms. Here we present a methodology for assessing the effects of using E. coli as the expression system on protein structures. We compared these effects between eight mesophilic bacteria and the thermophile Thermus thermophilus (T. thermophilus) and found that the differences are significantly larger for T. thermophilus. More specifically, helical secondary structures in T. thermophilus proteins are often replaced by coil structures in E. coli. Our results show a distinct directionality in misfolding when proteins from thermophiles are expressed in mesophiles. This indicates that extremophiles, such as thermophiles, require specific protein expression systems for protein folding studies.
Affiliation(s)
- Alibek Kruglikov
- Department of Biology, University of Ottawa, Ottawa, Canada K1N 6N5
- Yulong Wei
- Department of Biology, University of Ottawa, Ottawa, Canada K1N 6N5
- Xuhua Xia
- Department of Biology, University of Ottawa, Ottawa, Canada K1N 6N5
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Canada K1N 6N5
7
Abstract
Repeat proteins are made of tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins constitute excellent model systems to investigate how evolution relates to structure, folding, and function. Here, we propose a scheme to map evolutionary information at the sequence level to a coarse-grained model for repeat-protein folding and use it to investigate the folding of thousands of repeat proteins. We model the energetics by combining an inverse Potts-model scheme with an explicit mechanistic model of duplications and deletions of repeats to calculate the evolutionary parameters of the system at the single-residue level. These parameters are used to inform an Ising-like model that allows for the generation of folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that multiple folding mechanisms are possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, while noncooperative element-by-element intermittent folding arises if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.
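To make the idea of an Ising-like repeat-array model concrete, here is a minimal Python sketch of a generic nearest-neighbour model, not the authors' model: their parameters come from sequence data via an inverse Potts scheme, whereas every energy below is a hypothetical value. Each repeat is folded (1) or unfolded (0), carries an intrinsic free energy that shifts linearly with denaturant, and each pair of adjacent folded repeats gains a favourable coupling J. The partition function is computed by brute-force enumeration of all 2^N states, which is only practical for short arrays.

```python
import itertools
import math

RT = 0.593  # kcal/mol, approximately RT at 298 K


def fraction_folded(h, coupling, m, denaturant):
    """Boltzmann-averaged fraction of folded repeats in a 1D Ising-like array.

    h: hypothetical intrinsic folding free energy of each repeat (kcal/mol);
       a positive value means the repeat is unstable on its own.
    coupling: hypothetical stabilising energy J per pair of adjacent folded repeats.
    m: hypothetical denaturant sensitivity (kcal/mol per unit denaturant).
    denaturant: denaturant concentration (arbitrary units).
    """
    n = len(h)
    z = 0.0             # partition function
    folded_total = 0.0  # Boltzmann-weighted number of folded repeats
    for state in itertools.product((0, 1), repeat=n):  # 1 = folded, 0 = unfolded
        g = sum(s * (h_i + m * denaturant) for s, h_i in zip(state, h))
        g -= coupling * sum(state[i] * state[i + 1] for i in range(n - 1))
        w = math.exp(-g / RT)
        z += w
        folded_total += w * sum(state)
    return folded_total / (n * z)


if __name__ == "__main__":
    h = [1.0, 1.0, 1.0, 1.0]  # four identical, individually unstable repeats
    for d in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
        print(d, round(fraction_folded(h, coupling=3.0, m=1.0, denaturant=d), 3))
```

With a strong coupling the array folds and unfolds nearly as one unit; setting `coupling=0.0` makes every repeat independent, which loosely mirrors the contrast drawn above between cooperative all-or-none transitions and element-by-element folding.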
8
Cui X, Xue Y, McCormack C, Garces A, Rachman TW, Yi Y, Stolzer M, Durand D. Simulating domain architecture evolution. Bioinformatics 2022; 38:i134-i142. [PMID: 35758772 PMCID: PMC9236583 DOI: 10.1093/bioinformatics/btac242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Motivation Simulation is an essential technique for generating biomolecular data with a ‘known’ history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation. Results Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis–Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation. Availability and implementation DomArchov is written in Python 3 and is available at http://www.cs.cmu.edu/~durand/DomArchov. 
The data underlying this article are available via the same link. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyue Cui
- Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yifan Xue
- Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Collin McCormack
- Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alejandro Garces
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Thomas W Rachman
- Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yang Yi
- Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Maureen Stolzer
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Dannie Durand
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
9
Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Protein evolution or engineering studies are traditionally focused on amino acid substitutions and the way these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories have demonstrated that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi)-rational or computational protein engineering pipelines. Furthermore, we discuss the tolerance to indels at the structural level, address how domain indels can link the function of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Tom Desmet
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Jorick Franceus
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium.
10
Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021; 118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 08/10/2021] [Indexed: 01/20/2023]
Abstract
Exchanges of protein sequence modules support leaps in function during evolution that are unavailable through point mutations. Here we study the role of the two RAD51-interacting modules within the eight RAD51-binding BRC repeats of BRCA2. We created 64 chimeric repeats by shuffling these modules and measured their binding to RAD51. We found that certain shuffled module combinations were stronger binders than any of the module combinations in the natural repeats. Surprisingly, the contribution from the two modules was poorly correlated with the affinities of the natural repeats, with the weak BRC8 repeat containing the most effective N-terminal module. The binding of the strongest chimera, BRC8-2, to RAD51 was improved by 2.4 kcal/mol (ΔΔG = -2.4 kcal/mol) compared with the strongest natural repeat, BRC4. A crystal structure of the RAD51:BRC8-2 complex shows an improved interface fit and an extended β-hairpin in this repeat. BRC8-2 was shown to function in human cells, preventing the formation of nuclear RAD51 foci after ionizing radiation.
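A binding free-energy change can be translated into a fold change in dissociation constant using ΔΔG = RT ln(Kd,new / Kd,ref). The Python sketch below is a back-of-envelope conversion, not a value reported by the authors, and it assumes a temperature of 25 °C (298.15 K), which the abstract does not specify.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_improvement(ddg_kcal_per_mol, temperature_k=298.15):
    """Return Kd_ref / Kd_new for a binding free-energy change ddG = dG_new - dG_ref.

    A negative ddG (tighter binding) gives a factor greater than 1.
    """
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    print(round(kd_fold_improvement(-2.4)))  # 57
```

Under this assumption a ΔΔG of -2.4 kcal/mol corresponds to roughly a 57-fold lower Kd; at 37 °C, where RT is larger, the same ΔΔG corresponds to about a 49-fold change.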
Affiliation(s)
- Laurens H Lindenburg
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Teodors Pantelejevs
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Fabrice Gielen
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Living Systems Institute, University of Exeter, Exeter EX4 4QD, United Kingdom
- Pedro Zuazua-Villar
- Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maren Butz
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Eric Rees
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Clemens F Kaminski
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Jessica A Downs
- Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marko Hyvönen
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
11
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
12
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. FRONTIERS IN BIOINFORMATICS 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022]
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by two competing facts: 1) repeat proteins are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often, because more residues are heavily constrained by the need for the sequence to repeat; and 2) the symmetry of the repeats allows functional contributions to be distributed over a number of residues, so that sometimes no specific site is singularly responsible for function (unlike catalytic residues in enzyme active sites). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide (TPR) and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats, while mutations in identical positions in other repeats of the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
- Correspondence: Maria Wiktoria Górna; Matthew Merski
13
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022]
Abstract
Homopeptides (runs of one amino acid type) are evolutionarily important since they are prone to expansion and contraction during DNA replication, recombination and repair. To gain insight into the genomic and proteomic traits driving their variation, we analyzed how homopeptides and homocodons (pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are largely coupled to GC/AT bias, exhibiting a bifurcated correlation with the degree of AT or GC bias. Mid-GC/AT genomes tend to have markedly fewer, simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating that they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and homopeptide levels correlate positively with intrinsic disorder and negatively with structured-domain content. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, an evolutionarily highly variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine, whose homocodons are overwhelmingly made from the codon AAG.
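Homopeptides and homocodons, as defined above, can be located with a simple run-length scan. The Python sketch below is illustrative only: the minimum run length of five is an assumption (the study's own threshold is not given here), and homocodons are taken as runs of identical codons in the reading frame that starts at the first base.

```python
from itertools import groupby


def runs(items, min_len):
    """Return (start_index, item, run_length) for each run of identical items >= min_len."""
    found = []
    pos = 0
    for item, group in groupby(items):
        length = sum(1 for _ in group)
        if length >= min_len:
            found.append((pos, item, length))
        pos += length
    return found


def homopeptides(protein, min_len=5):
    """Runs of a single amino acid type in a protein sequence (min_len is an assumed threshold)."""
    return runs(protein.upper(), min_len)


def homocodons(cds, min_len=5):
    """Runs of one identical codon in frame 0 of a coding sequence; indices count codons."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return runs(codons, min_len)


if __name__ == "__main__":
    print(homopeptides("MAQQQQQQKKKKKLS"))       # [(2, 'Q', 6), (8, 'K', 5)]
    print(homocodons("AAGAAGAAGAAGAAGCAACAG"))  # [(0, 'AAG', 5)]
    # Poly-Q encoded by mixed CAA/CAG codons is a homopeptide but not a homocodon run:
    print(homocodons("CAACAGCAACAGCAA"))        # []
```

The last call shows why the two measures can diverge: a glutamine run encoded by alternating CAA and CAG codons counts as a homopeptide but contains no homocodon run.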
14
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They have contributed to the evolution of organism complexity, specializing in functions that require quick adaptability, such as immunity-related functions. To investigate the hypothesis that repeat proteins evolve through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments defined by exon/intron boundaries from Ensembl with the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and the comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio and HEAT repeats and the β-propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRP detection and classification.
Affiliation(s)
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
15
Galpern EA, Freiberger MI, Ferreiro DU. Large Ankyrin repeat proteins are formed with similar and energetically favorable units. PLoS One 2020; 15:e0233865. [PMID: 32579546 PMCID: PMC7314423 DOI: 10.1371/journal.pone.0233865] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/04/2020] [Accepted: 05/13/2020] [Indexed: 11/19/2022]
Abstract
Ankyrin containing proteins are one of the most abundant repeat protein families present in all extant organisms. They are made with tandem copies of similar amino acid stretches that fold into elongated architectures. Here, we built and curated a dataset of 200 thousand proteins that contain 1.2 million Ankyrin regions and characterize the abundance, structure and energetics of the repetitive regions in natural proteins. We found that there is a continuous roughly exponential variety of array lengths with an exceptional frequency at 24 repeats. We described that individual repeats are seldom interrupted with long insertions and accept few deletions, in line with the known tertiary structures. We found that longer arrays are made up of repeats that are more similar to each other than shorter arrays, and display more favourable folding energy, hinting at their evolutionary origin. The array distributions show that there is a physical upper limit to the size of an array of repeats of about 120 copies, consistent with the limit found in nature. The identity patterns within the arrays suggest that they may have originated by sequential copies of more than one Ankyrin unit.
Affiliation(s)
- Ezequiel A. Galpern
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
- María I. Freiberger
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
- Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
16
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022]
Abstract
BACKGROUND Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. RESULTS Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
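The dot-plot comparison summarized in this abstract can be illustrated with a minimal Python sketch. It assumes that a "dot" is an exact match between two k-residue windows of the same sequence, that the two proteins compared are already position-aligned, and that the Jaccard similarity is computed over the sets of dot coordinates. The function names `dot_plot` and `jaccard`, the exact-match criterion and the toy sequences are hypothetical illustrations, not the scoring scheme used by Merski et al.

```python
from __future__ import annotations


def dot_plot(seq: str, k: int = 1) -> set[tuple[int, int]]:
    """Return the self-similarity dot plot of `seq` as a set of (i, j) cells.

    Assumption: a cell is set when the k-residue windows starting at i and j
    are identical. Only the upper triangle (i < j) is kept, because the main
    diagonal always matches and the plot is symmetric.
    """
    n = len(seq) - k + 1
    cells: set[tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i:i + k] == seq[j:j + k]:
                cells.add((i, j))
    return cells


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty plots count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical toy sequences: a perfect 3-residue repeat and a variant
    # carrying a single substitution (R -> K) in the last repeat unit.
    repeat = "LRRLRRLRR"
    variant = "LRRLRRLKR"
    score = jaccard(dot_plot(repeat, k=2), dot_plot(variant, k=2))
    print(f"Jaccard similarity of dot plots: {score:.2f}")
```

In this toy case the single substitution removes every dot involving the two affected windows, so the Jaccard score drops well below the 8/9 sequence identity of the pair, which matches the qualitative point that dot-plot patterns decay faster than identity. The sketch is not meant to reproduce the 8.2% figure reported in the abstract.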
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Krzysztof Młynarczyk
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
- Jakub Skrzeczkowski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Stanisław Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
17
Shafee T, Bacic A, Johnson K. Evolution of Sequence-Diverse Disordered Regions in a Protein Family: Order within the Chaos. Mol Biol Evol 2020; 37:2155-2172. [DOI: 10.1093/molbev/msaa096] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Abstract
Approaches for studying the evolution of globular proteins are now well established yet are unsuitable for disordered sequences. Our understanding of the evolution of proteins containing disordered regions therefore lags that of globular proteins, limiting our capacity to estimate their evolutionary history, classify paralogs, and identify potential sequence–function relationships. Here, we overcome these limitations by using new analytical approaches that project representations of sequence space to dissect the evolution of proteins with both ordered and disordered regions, and the correlated changes between these. We use the fasciclin-like arabinogalactan proteins (FLAs) as a model family, since they contain a variable number of globular fasciclin domains as well as several distinct types of disordered regions: proline (Pro)-rich arabinogalactan (AG) regions and longer Pro-depleted regions.
Sequence space projections of fasciclin domains from 2019 FLAs from 78 species identified distinct clusters corresponding to different types of fasciclin domains. Clusters can be similarly identified in the seemingly random Pro-rich AG and Pro-depleted disordered regions. Sequence features of the globular and disordered regions clearly correlate with one another, implying coevolution of these distinct regions, as well as with the N-linked and O-linked glycosylation motifs. We reconstruct the overall evolutionary history of the FLAs, annotated with the changing domain architectures, glycosylation motifs, number and length of AG regions, and disordered region sequence features. Mapping these features onto the functionally characterized FLAs therefore enables their sequence–function relationships to be interrogated. These findings will inform research on the abundant disordered regions in protein families from all kingdoms of life.
Affiliation(s)
- Thomas Shafee
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Antony Bacic
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
- Kim Johnson
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
18
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022]
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins increased more than seven-fold and new TR prediction methods were published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. Positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
19
Dohmen E, Klasberg S, Bornberg-Bauer E, Perrey S, Kemena C. The modular nature of protein evolution: domain rearrangement rates across eukaryotic life. BMC Evol Biol 2020; 20:30. [PMID: 32059645 PMCID: PMC7023805 DOI: 10.1186/s12862-020-1591-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/12/2019] [Accepted: 01/31/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND Modularity is important for evolutionary innovation. The recombination of existing units to form larger complexes with new functionalities spares the need to create novel elements from scratch. In proteins, this principle can be observed at the level of protein domains, functional subunits which are regularly rearranged to acquire new functions. RESULTS In this study we analyse the mechanisms leading to new domain arrangements in five major eukaryotic clades (vertebrates, insects, fungi, monocots and eudicots) at unprecedented depth and breadth. This allows us, for the first time, to directly compare rates of rearrangements between different clades and identify both lineage specific and general patterns of evolution in the context of domain rearrangements. We analyse arrangement changes along phylogenetic trees by reconstructing ancestral domain content in combination with feasible single step events, such as fusion or fission. Using this approach we explain up to 70% of all rearrangements by tracing them back to their precursors. We find that rates in general, and the ratio between these rates for a given clade in particular, are highly consistent across all clades. In agreement with previous studies, fusions are the most frequent event leading to new domain arrangements. A lineage specific pattern in fungi reveals exceptionally high loss rates compared to other clades, supporting recent studies highlighting the importance of loss for evolutionary innovation. Furthermore, our methodology allows us to link domain emergences at specific nodes in the phylogenetic tree to important functional developments, such as the origin of hair in mammals. CONCLUSIONS Our results demonstrate that domain rearrangements are based on a canonical set of mutational events with rates which lie within a relatively narrow and consistent range.
In addition, the knowledge gained about these rates provides a basis for advanced domain-based methodologies for phylogenetics and homology analysis which complement current sequence-based methods.
Affiliation(s)
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany; Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
- Steffen Klasberg
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
- Sören Perrey
- Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen, 45665, Germany
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, 48149, Germany
20
Northover DE, Shank SD, Liberles DA. Characterizing lineage-specific evolution and the processes driving genomic diversification in chordates. BMC Evol Biol 2020; 20:24. [PMID: 32046633 PMCID: PMC7011509 DOI: 10.1186/s12862-020-1585-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Accepted: 01/16/2020] [Indexed: 11/21/2022]
Abstract
Background Understanding the origins of genome content has long been a goal of molecular evolution and comparative genomics. By examining genome evolution through the lens of lineage-specific evolution, it is possible to make inferences about the evolutionary events that have given rise to species-specific diversification. Here we characterize the evolutionary trends found in chordate species using The Adaptive Evolution Database (TAED). TAED is a database of phylogenetically indexed gene families designed to detect episodes of directional or diversifying selection across chordates. Gene families within the database have been assessed for lineage-specific estimates of dN/dS and have been reconciled to the chordate species tree to identify retained duplicates. Gene families have also been mapped to functional pathways, and amino acid changes that occurred on high-dN/dS lineages have been mapped to protein structures. Results An analysis of this exhaustive database has enabled a characterization of the processes of lineage-specific diversification in chordates. A pathway-level enrichment analysis of TAED determined that pathways most commonly found to have elevated rates of evolution included those involved in metabolism, immunity, and cell signaling. An analysis of protein fold presence on proteins, after normalizing for frequency in the database, found common folds such as Rossmann folds, Jelly Roll folds, and TIM barrels were overrepresented on proteins most likely to undergo directional selection. A set of gene families that experience increased numbers of duplications within short evolutionary times is associated with pathways involved in metabolism, olfactory reception, and signaling. An analysis of protein secondary structure indicated more relaxed constraint in β-sheets and stronger constraint on α-helices, amidst a general preference for substitutions at exposed sites.
Lastly, a detailed analysis of the ornithine decarboxylase gene family, which encodes a key enzyme in the pathway for polyamine synthesis, revealed lineage-specific evolution along the lineage leading to Cetacea through rapid sequence evolution in a duplicate gene with amino acid substitutions causing active site rearrangement. Conclusion Episodes of lineage-specific evolution are frequent throughout chordate species. Both duplication and directional selection have played large roles in the evolution of the phylum. TAED is a powerful tool for facilitating this understanding of lineage-specific evolution.
Affiliation(s)
- David E Northover
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Stephen D Shank
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
21
Thomas GWC, Dohmen E, Hughes DST, Murali SC, Poelchau M, Glastad K, Anstead CA, Ayoub NA, Batterham P, Bellair M, Binford GJ, Chao H, Chen YH, Childers C, Dinh H, Doddapaneni HV, Duan JJ, Dugan S, Esposito LA, Friedrich M, Garb J, Gasser RB, Goodisman MAD, Gundersen-Rindal DE, Han Y, Handler AM, Hatakeyama M, Hering L, Hunter WB, Ioannidis P, Jayaseelan JC, Kalra D, Khila A, Korhonen PK, Lee CE, Lee SL, Li Y, Lindsey ARI, Mayer G, McGregor AP, McKenna DD, Misof B, Munidasa M, Munoz-Torres M, Muzny DM, Niehuis O, Osuji-Lacy N, Palli SR, Panfilio KA, Pechmann M, Perry T, Peters RS, Poynton HC, Prpic NM, Qu J, Rotenberg D, Schal C, Schoville SD, Scully ED, Skinner E, Sloan DB, Stouthamer R, Strand MR, Szucsich NU, Wijeratne A, Young ND, Zattara EE, Benoit JB, Zdobnov EM, Pfrender ME, Hackett KJ, Werren JH, Worley KC, Gibbs RA, Chipman AD, Waterhouse RM, Bornberg-Bauer E, Hahn MW, Richards S. Gene content evolution in the arthropods. Genome Biol 2020; 21:15. [PMID: 31969194 PMCID: PMC6977273 DOI: 10.1186/s13059-019-1925-7] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Received: 11/06/2019] [Accepted: 12/26/2019] [Indexed: 01/22/2023]
Abstract
BACKGROUND Arthropods comprise the largest and most diverse phylum on Earth and play vital roles in nearly every ecosystem. Their diversity stems in part from variations on a conserved body plan, resulting from and recorded in adaptive changes in the genome. Dissection of the genomic record of sequence change enables broad questions regarding genome evolution to be addressed, even across hyper-diverse taxa within arthropods. RESULTS Using 76 whole genome sequences representing 21 orders spanning more than 500 million years of arthropod evolution, we document changes in gene and protein domain content and provide temporal and phylogenetic context for interpreting these innovations. We identify many novel gene families that arose early in the evolution of arthropods and during the diversification of insects into modern orders. We reveal unexpected variation in patterns of DNA methylation across arthropods and examples of gene family and protein domain evolution coincident with the appearance of notable phenotypic and physiological adaptations such as flight, metamorphosis, sociality, and chemoperception. CONCLUSIONS These analyses demonstrate how large-scale comparative genomics can provide broad new insights into the genotype to phenotype map and generate testable hypotheses about the evolution of animal diversity.
Affiliation(s)
- Gregg W. C. Thomas
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN USA
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany; Institute for Bioinformatics and Chemoinformatics, University of Hamburg, Hamburg, Germany; Westphalian University of Applied Sciences, 45665 Recklinghausen, Germany
- Daniel S. T. Hughes
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: Institute for Genomic Medicine, Columbia University, New York, NY 10032 USA
- Shwetha C. Murali
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: Howard Hughes Medical Institute, Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
- Monica Poelchau
- National Agricultural Library, USDA, Beltsville, MD 20705 USA
- Karl Glastad
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332 USA; Present Address: Penn Epigenetics Institute, Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
- Clare A. Anstead
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Nadia A. Ayoub
- Department of Biology, Washington and Lee University, 204 West Washington Street, Lexington, VA 24450 USA
- Phillip Batterham
- School of BioSciences Science Faculty, The University of Melbourne, Melbourne, VIC 3010 Australia
- Michelle Bellair
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA; Present Address: CooperGenomics, Houston, TX USA
- Greta J. Binford
- Department of Biology, Lewis & Clark College, Portland, OR 97219 USA
- Hsu Chao
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Yolanda H. Chen
- Department of Plant and Soil Sciences, University of Vermont, Burlington, USA
- Christopher Childers
- National Agricultural Library, USDA, Beltsville, MD 20705 USA
- Huyen Dinh
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Harsha Vardhan Doddapaneni
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Jian J. Duan
- Beneficial Insects Introduction Research Unit, United States Department of Agriculture, Agricultural Research Service, Newark, DE USA
- Shannon Dugan
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Lauren A. Esposito
- Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Drive, San Francisco, CA 94118 USA
- Markus Friedrich
- Department of Biological Sciences, Wayne State University, Detroit, MI 48202 USA
- Jessica Garb
- Department of Biological Sciences, University of Massachusetts Lowell, 198 Riverside Street, Lowell, MA 01854 USA
- Robin B. Gasser
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Michael A. D. Goodisman
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332 USA
- Dawn E. Gundersen-Rindal
- USDA-ARS Invasive Insect Biocontrol and Behavior Laboratory, Beltsville, MD USA
- Yi Han
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Alfred M. Handler
- USDA-ARS, Center for Medical, Agricultural, and Veterinary Entomology, 1700 S.W. 23rd Drive, Gainesville, FL 32608 USA
- Masatsugu Hatakeyama
- Division of Insect Sciences, National Institute of Agrobiological Sciences, Owashi, Tsukuba, 305-8634 Japan
- Lars Hering
- Department of Zoology, Institute of Biology, University of Kassel, 34132 Kassel, Germany
- Wayne B. Hunter
- USDA ARS, U. S. Horticultural Research Laboratory, Ft. Pierce, FL 34945 USA
- Panagiotis Ioannidis
- Department of Genetic Medicine and Development and Swiss Institute of Bioinformatics, University of Geneva, 1211 Geneva, Switzerland; Present Address: Foundation for Research and Technology Hellas, Institute of Molecular Biology and Biotechnology, Vassilika Vouton, 70013 Heraklion, Greece
- Joy C. Jayaseelan
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Divya Kalra
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Abderrahman Khila
- Université de Lyon, Institut de Génomique Fonctionnelle de Lyon, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, 46 allée d’Italie, 69364 Lyon, France
- Pasi K. Korhonen
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Carol Eunmi Lee
- Department of Integrative Biology, University of Wisconsin, Madison, WI 53706 USA
- Sandra L. Lee
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Yiyuan Li
- Department of Biological Sciences, University of Notre Dame, 109B Galvin Life Sciences, Notre Dame, IN 46556 USA
- Amelia R. I. Lindsey
- Department of Entomology, University of California Riverside, Riverside, CA USA; Present Address: Department of Biology, Indiana University, Bloomington, IN USA
- Georg Mayer
- Department of Zoology, Institute of Biology, University of Kassel, 34132 Kassel, Germany
- Alistair P. McGregor
- Department of Biological and Medical Sciences, Oxford Brookes University, Gipsy Lane, Oxford, OX3 0BP UK
- Duane D. McKenna
- Department of Biological Sciences, University of Memphis, 3700 Walker Ave, Memphis, TN 38152 USA
- Bernhard Misof
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Mala Munidasa
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Monica Munoz-Torres
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, USA; Present Address: Phoenix Bioinformatics, 39221 Paseo Padre Parkway, Ste. J., Fremont, CA 94538 USA
- Donna M. Muzny
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Oliver Niehuis
- Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University of Freiburg, 79104 Freiburg (Brsg.), Germany
- Nkechinyere Osuji-Lacy
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Subba R. Palli
- Department of Entomology, University of Kentucky, Lexington, KY 40546 USA
- Kristen A. Panfilio
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL UK
- Matthias Pechmann
- Cologne Biocenter, Zoological Institute, Department of Developmental Biology, University of Cologne, 50674 Cologne, Germany
- Trent Perry
- School of BioSciences Science Faculty, The University of Melbourne, Melbourne, VIC 3010 Australia
- Ralph S. Peters
- Centre of Taxonomy and Evolutionary Research, Arthropoda Department, Zoological Research Museum Alexander Koenig, Bonn, Germany
- Helen C. Poynton
- School for the Environment, University of Massachusetts Boston, Boston, MA 02125 USA
- Nikola-Michael Prpic
- Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Abteilung für Entwicklungsbiologie, Georg-August-Universität Göttingen, Göttingen, Germany; Göttingen Center for Molecular Biosciences (GZMB), Georg-August-Universität Göttingen, Göttingen, Germany
- Jiaxin Qu
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Dorith Rotenberg
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27606 USA
- Coby Schal
- Department of Entomology and W.M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695 USA
- Sean D. Schoville
- Department of Entomology, University of Wisconsin-Madison, Madison, USA
- Erin D. Scully
- Stored Product Insect and Engineering Research Unit, USDA-ARS Center for Grain and Animal Health Research, Manhattan, KS 66502 USA
- Evette Skinner
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Daniel B. Sloan
- Department of Biology, Colorado State University, Ft. Collins, CO USA
- Richard Stouthamer
- Department of Entomology, University of California Riverside, Riverside, CA USA
- Michael R. Strand
- Department of Entomology, University of Georgia, Athens, GA USA
- Nikolaus U. Szucsich
- Present Address: Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR USA
- Asela Wijeratne
- Department of Biological Sciences, University of Memphis, 3700 Walker Ave, Memphis, TN 38152 USA; Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
- Neil D. Young
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010 Australia
- Eduardo E. Zattara
- INIBIOMA, Univ. Nacional del Comahue – CONICET, Bariloche, Argentina
- Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221 USA
- Evgeny M. Zdobnov
- Department of Genetic Medicine and Development and Swiss Institute of Bioinformatics, University of Geneva, 1211 Geneva, Switzerland
- Michael E. Pfrender
- Department of Biological Sciences, University of Notre Dame, 109B Galvin Life Sciences, Notre Dame, IN 46556 USA
- Kevin J. Hackett
- Crop Production and Protection, U.S. Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705 USA
- John H. Werren
- Department of Biology, University of Rochester, Rochester, NY 14627 USA
- Kim C. Worley
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
- Richard A. Gibbs
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA
| | - Ariel D. Chipman
- 0000 0004 1937 0538grid.9619.7Department of Ecology, Evolution and Behavior, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, 91904 Jerusalem, Israel
| | - Robert M. Waterhouse
- 0000 0001 2165 4204grid.9851.5Department of Ecology & Evolution and Swiss Institute of Bioinformatics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münsterss, 48149 Münster, Germany ,0000 0001 2287 2617grid.9026.dInstitute for Bioinformatics and Chemoinformatics, University of Hamburg, Hamburg, Germany ,0000 0001 1014 8330grid.419495.4Department Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Matthew W. Hahn
- 0000 0001 0790 959Xgrid.411377.7Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN USA
| | - Stephen Richards
- 0000 0001 2160 926Xgrid.39382.33Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 USA ,0000 0004 1936 9684grid.27860.3bPresent Address: UC Davis Genome Center, University of California, Davis, CA 95616 USA
22.
Ntountoumi C, Vlastaridis P, Mossialos D, Stathopoulos C, Iliopoulos I, Promponas V, Oliver SG, Amoutzias GD. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Res 2019; 47:9998-10009. [PMID: 31504783 PMCID: PMC6821194 DOI: 10.1093/nar/gkz730] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 05/15/2019] [Revised: 07/16/2019] [Accepted: 08/15/2019] [Indexed: 01/27/2023]
Abstract
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs have a significant, persistent and highly conserved presence and role in many diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal ions and polysaccharides. In addition, LCRs have repeatedly been identified in very ancient and usually highly expressed proteins of the translation machinery. Finally, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they bind nucleic acids or metal ions, or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
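The abstract does not describe how LCRs are detected. A common generic approach, shown here as a stand-in and not as the authors' pipeline or their neural network server, slides a window along the sequence and flags windows whose Shannon entropy of residue composition falls below a cutoff, in the spirit of SEG. The function names, the window length of 12 and the 2.2-bit cutoff are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return (start, end) spans where window entropy drops below threshold.

    Window size and threshold are illustrative assumptions, not published
    parameters. Overlapping low-entropy windows are merged into one span.
    """
    spans = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Extend the previous span instead of starting a new one.
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical toy protein with a poly-Q tract between two mixed flanks.
    toy = "MKTAYIAKQR" + "Q" * 16 + "LVDEWHNPFS"
    print(low_complexity_windows(toy))
```

Entropy-based masking only finds compositionally biased stretches; predicting whether an LCR binds nucleic acids or metal ions, as the described server does, needs a separate classifier on top.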
Affiliation(s)
- Chrysa Ntountoumi
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Panayotis Vlastaridis
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Dimitris Mossialos
- Microbial Biotechnology-Molecular Bacteriology-Virology Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Vasilios Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, PO Box 20537, CY-1678 Nicosia, Cyprus
- Stephen G Oliver
- Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, CB2 1GA, UK
- Grigoris D Amoutzias
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
23.
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families. Methods Mol Biol 2019. [PMID: 30298401 DOI: 10.1007/978-1-4939-8736-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is further complicated by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve when insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Together, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, because such methods depend on accurate sequence alignment.
We present here a practical solution to this problem that combines graph clustering with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. By carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, we generate large sets of related proteins. Representing the relationships between proteins in these sets as graphs and then clustering them with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
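The clustering step named above is the Markov cluster (MCL) algorithm: repeatedly square a column-stochastic matrix (expansion), raise its entries to a power and renormalize (inflation), and read clusters off the attractor rows once the matrix stops changing. The pure-Python sketch below is for illustration only; in practice one would use the dedicated `mcl` program. The adjacency matrix is a hypothetical toy graph (in the described workflow, edge weights would come from HH-suite hits), and the inflation value of 2.0 and the pruning and tolerance values are assumptions, not parameters from the chapter.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Cluster a graph given as a symmetric weighted adjacency matrix.

    Parameter defaults are illustrative assumptions.
    """
    n = len(adj)
    # Add a self-loop to every node (standard MCL preprocessing), then normalize.
    m = normalize_columns(
        [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)  # expansion with power 2
        # Inflation: entrywise power, then column renormalization.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
        # Drop negligible entries and renormalize.
        m = normalize_columns([[v if v > prune else 0.0 for v in row] for row in m])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Non-zero rows are attractors; the non-zero columns of each row form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > prune)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy graph: two disconnected triangles of "related proteins".
    toy_adj = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print([sorted(c) for c in mcl(toy_adj)])
```

Raising the inflation value generally yields more, smaller clusters, which is the main knob for separating subfamilies at different granularity.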
24.
Banguera-Hinestroza E, Ferrada E, Sawall Y, Flot JF. Computational Characterization of the mtORF of Pocilloporid Corals: Insights into Protein Structure and Function in Stylophora Lineages from Contrasting Environments. Genes (Basel) 2019; 10:E324. [PMID: 31035578 PMCID: PMC6562464 DOI: 10.3390/genes10050324] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/21/2019] [Revised: 04/22/2019] [Accepted: 04/23/2019] [Indexed: 01/15/2023]
Abstract
More than a decade ago, a new mitochondrial Open Reading Frame (mtORF) was discovered in corals of the family Pocilloporidae, and it has since been used as an effective barcode for these corals. Recently, mtORF sequencing revealed the existence of two differentiated Stylophora lineages occurring in sympatry along the environmental gradient of the Red Sea (18.5°C to 33.9°C). In the endemic Red Sea lineage RS_LinB, the mtORF and the heat shock protein gene hsp70 revealed similar phylogeographic patterns that were strongly correlated with environmental variation. This suggests that the mtORF might also be involved in thermal adaptation. Here, we used computational analyses to explore the features and putative function of this mtORF. In particular, we tested the likelihood that this gene encodes a functional protein and whether it may play a role in adaptation. Analyses of full mitogenomes showed that the mtORF originated in the common ancestor of Madracis and other pocilloporids, and that it encodes a transmembrane protein differing in length and domain architecture among genera. Homology-based annotation and the relative conservation of metal-binding sites revealed traces of an ancient hydrolase catalytic activity. Furthermore, signals of pervasive purifying selection, the absence of stop codons in the 1830 sequences analyzed, and a codon-usage bias similar to that of other mitochondrial genes indicate that the mtORF encodes a functional protein, i.e., that it is not a pseudogene. Other features, such as intrinsically disordered regions, tandem repeats, and signals of positive selection, particularly in Stylophora RS_LinB populations, are consistent with a role of the mtORF in adaptive responses to environmental changes.
Affiliation(s)
- Eulalia Banguera-Hinestroza
- Evolutionary Biology and Ecology, Université libre de Bruxelles, B-1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels-(IB)2, 1050 Brussels, Belgium
- Evandro Ferrada
- Center for Genomics and Bioinformatics, Universidad Mayor, Santiago, Chile
- Yvonne Sawall
- Coral Reef Ecology, Bermuda Institute of Ocean Sciences (BIOS), St. George's GE 01, Bermuda
- Jean-François Flot
- Evolutionary Biology and Ecology, Université libre de Bruxelles, B-1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels-(IB)2, 1050 Brussels, Belgium
25.
Da Lage JL, Thomas GWC, Bonneau M, Courtier-Orgogozo V. Evolution of salivary glue genes in Drosophila species. BMC Evol Biol 2019; 19:36. [PMID: 30696414 PMCID: PMC6352337 DOI: 10.1186/s12862-019-1364-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/31/2018] [Accepted: 01/17/2019] [Indexed: 11/23/2022]
Abstract
Background: At the very end of the larval stage, Drosophila expectorate a glue secreted by their salivary glands to attach themselves to a substrate while pupariating. The glue is a mixture of apparently unrelated proteins, some of which are highly glycosylated and possess internal repeats. Because species adhere to distinct substrates (e.g., leaves, wood, rotten fruits), glue genes are expected to evolve rapidly.
Results: We used available genome sequences and PCR-sequencing of regions of interest to investigate the glue genes in 20 Drosophila species. We discovered a new gene in addition to the seven glue genes annotated in D. melanogaster. We also identified a phase 1 intron at a conserved position in five of the eight glue genes of D. melanogaster, suggesting a common origin for those genes. A slightly significant rate of gene turnover was inferred. Both the number of repeats and the repeat sequence were found to diverge rapidly, even between closely related species. We also detected high repeat number variation at the intrapopulation level in D. melanogaster.
Conclusion: The most conspicuous signs of accelerated evolution are found in the repeat regions of several glue genes.
Electronic supplementary material: The online version of this article (10.1186/s12862-019-1364-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jean-Luc Da Lage
- UMR 9191 Évolution, Génomes, Comportement, Écologie. CNRS, IRD, Université Paris-Sud. Université Paris-Saclay, F-91198, Gif-sur-Yvette, France
- Gregg W C Thomas
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Magalie Bonneau
- UMR 9191 Évolution, Génomes, Comportement, Écologie. CNRS, IRD, Université Paris-Sud. Université Paris-Saclay, F-91198, Gif-sur-Yvette, France
26.
Madio B, Peigneur S, Chin YKY, Hamilton BR, Henriques ST, Smith JJ, Cristofori-Armstrong B, Dekan Z, Boughton BA, Alewood PF, Tytgat J, King GF, Undheim EAB. PHAB toxins: a unique family of predatory sea anemone toxins evolving via intra-gene concerted evolution defines a new peptide fold. Cell Mol Life Sci 2018; 75:4511-4524. [PMID: 30109357 PMCID: PMC11105382 DOI: 10.1007/s00018-018-2897-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 06/12/2018] [Revised: 07/26/2018] [Accepted: 07/31/2018] [Indexed: 10/28/2022]
Abstract
Sea anemone venoms have long been recognized as a rich source of peptides with interesting pharmacological and structural properties, but they still contain many uncharacterized bioactive compounds. Here we report the discovery, three-dimensional structure, activity, tissue localization, and putative function of a novel sea anemone peptide toxin that constitutes a new, sixth type of voltage-gated potassium channel (KV) toxin from sea anemones. Composed of just 17 residues, κ-actitoxin-Ate1a (Ate1a) is the shortest sea anemone toxin reported to date, and it adopts a novel three-dimensional structure that we have named the Proline-Hinged Asymmetric β-hairpin (PHAB) fold. Mass spectrometry imaging and bioassays suggest that Ate1a serves a primarily predatory function by immobilising prey, and we show that this is achieved through inhibition of Shaker-type KV channels. Ate1a is encoded as a multi-domain precursor protein that yields multiple identical mature peptides, which likely evolved by multiple domain duplication events in an actinioidean ancestor. Despite this ancient evolutionary history, the PHAB-encoding gene family exhibits remarkable sequence conservation in the mature peptide domains. We demonstrate that this conservation is likely due to intra-gene concerted evolution, which, to our knowledge, has not previously been reported for toxin genes. We propose that the concerted evolution of toxin domains provides a hitherto unrecognised way to circumvent the effects of the costly evolutionary arms race considered to drive toxin gene evolution, by ensuring efficient secretion of ecologically important predatory toxins.
Affiliation(s)
- Bruno Madio
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Steve Peigneur
- Toxicology and Pharmacology, University of Leuven, Leuven, 3000, Belgium
- Yanni K Y Chin
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Brett R Hamilton
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, 4072, Australia
- Centre for Microscopy and Microanalysis, The University of Queensland, St Lucia, QLD, 4072, Australia
- Sónia Troeira Henriques
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Jennifer J Smith
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Ben Cristofori-Armstrong
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Zoltan Dekan
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Berin A Boughton
- Metabolomics Australia, School of Biosciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Paul F Alewood
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Jan Tytgat
- Toxicology and Pharmacology, University of Leuven, Leuven, 3000, Belgium
- Glenn F King
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- Eivind A B Undheim
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, 4072, Australia
27.
Maxwell M, Undheim EAB, Mobli M. Secreted Cysteine-Rich Repeat Proteins "SCREPs": A Novel Multi-Domain Architecture. Front Pharmacol 2018; 9:1333. [PMID: 30524283 PMCID: PMC6262176 DOI: 10.3389/fphar.2018.01333] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/18/2018] [Accepted: 10/29/2018] [Indexed: 01/12/2023]
Abstract
Peptide toxins isolated from animal venom secretions have proven to be useful pharmacological tools for probing the structure and function of a number of molecular receptors. Their molecular structures are stabilized by multiple disulfide bonds formed post-translationally between the side-chain thiols of cysteine residues, resulting in high thermal and chemical stability. Many of these peptides have been found to be potent modulators of ion channels, making them particularly influential in ion channel research. Recently, several peptide toxins have been described that have an unusual tandem repeat organization while also eliciting a unique pharmacological response toward ion channels. Most of these are two-domain peptide toxins from spider venoms, such as the double-knot toxin (DkTx), isolated from the Earth Tiger tarantula (Haplopelma schmidti). The unusual pharmacology of DkTx lies in its high avidity for its receptor (TRPV1), a property that has been attributed to a bivalent mode of action. DkTx has since proven a powerful tool for elucidating the structural basis of TRPV1 channel function. Interestingly, all tandem repeat peptides functionally characterized to date share this high avidity for their respective binding targets, suggesting that they comprise an unrecognized structural class of peptides with unique structural features that result in a characteristic set of pharmacological properties. In this article, we explore the prevalence of this emerging class of peptides, which we have named Secreted, Cysteine-rich REpeat Peptides, or “SCREPs.” To achieve this, we employed data mining techniques to extract SCREP-like sequences from the UniProtKB database, yielding approximately sixty thousand candidates. These results indicate that SCREPs exist within a diverse range of species, with greatly varying sizes and predicted fold types, and likely include peptides with novel structures and unique modes of action.
We present our approach to mining this database for discovery of novel ion-channel modulators and discuss a number of “hits” as promising leads for further investigation. Our database of SCREPs thus constitutes a novel resource for biodiscovery and highlights the value of a data-driven approach to the identification of new bioactive pharmacological tools and therapeutic lead molecules.
Affiliation(s)
- Michael Maxwell
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
- Eivind A B Undheim
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
- Mehdi Mobli
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
28.
Tohmonda T, Kamiya A, Ishiguro A, Iwaki T, Fujimi TJ, Hatayama M, Aruga J. Identification and Characterization of Novel Conserved Domains in Metazoan Zic Proteins. Mol Biol Evol 2018; 35:2205-2229. [DOI: 10.1093/molbev/msy122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Affiliation(s)
- Takahide Tohmonda
- Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
- Akiko Kamiya
- Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
- Akira Ishiguro
- Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
- Takashi Iwaki
- Meguro Parasitological Museum, Meguro-Ku, Tokyo, Japan
- Takahiko J Fujimi
- Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
- Minoru Hatayama
- Department of Medical Pharmacology, Nagasaki University Institute of Biomedical Sciences, Nagasaki, Japan
- Jun Aruga
- Laboratory for Behavioral and Developmental Disorders, RIKEN Brain Science Institute, Wako-Shi, Saitama, Japan
- Department of Medical Pharmacology, Nagasaki University Institute of Biomedical Sciences, Nagasaki, Japan
29.
Muthu Krishnan S. Using Chou's general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J Theor Biol 2018; 445:62-74. [DOI: 10.1016/j.jtbi.2018.02.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 11/29/2017] [Revised: 01/24/2018] [Accepted: 02/12/2018] [Indexed: 01/31/2023]
30.
Barik S. Amino acid repeats avert mRNA folding through conservative substitutions and synonymous codons, regardless of codon bias. Heliyon 2017; 3:e00492. [PMID: 29387823 PMCID: PMC5772840 DOI: 10.1016/j.heliyon.2017.e00492] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/22/2017] [Revised: 12/06/2017] [Accepted: 12/13/2017] [Indexed: 11/18/2022]
Abstract
A significant number of proteins in all living species contain amino acid repeats (AARs) of various lengths and compositions, many of which play important roles in protein structure and function. Here, I have surveyed selected homopolymeric single [(A)n] and double [(AB)n] AARs in the human proteome. A close examination of their codon patterns and an analysis of RNA structure propensity led to the following set of empirical rules: (1) one class of amino acid repeats (Class I) uses a mixture of synonymous codons, some of which approximate the codon bias ratio in the overall human proteome; (2) the second class (Class II) disregards the codon bias ratio and appears to have originated by simple repetition of the same codon (or just a few codons); and (3) in all AARs (including Class I, Class II, and intermediate cases), the codons are chosen in a manner that precludes the formation of RNA secondary structure. It appears that AAR genes have evolved by balancing codon usage against mRNA secondary structure. The insights gained here should provide a better understanding of AAR evolution and may assist in designing synthetic genes.
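Rules (1) and (2) above distinguish repeats by how many synonymous codons they use. A minimal sketch of that distinction, assuming an in-frame coding sequence for the repeat tract, measures the fraction of the tract taken by its most common codon. The function names and the 0.9 cutoff for calling a repeat "Class II-like" are hypothetical choices for illustration, not criteria from the study, and the sketch does not address rule (3) on RNA secondary structure.

```python
from collections import Counter


def codon_profile(cds: str):
    """Return (dominant codon, its fraction, number of distinct codons)
    for an in-frame coding sequence covering a repeat tract."""
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    dominant, n_dominant = counts.most_common(1)[0]
    return dominant, n_dominant / len(codons), len(counts)


def classify_repeat(cds: str, single_codon_cutoff: float = 0.9) -> str:
    """Label a repeat 'Class II-like' if one codon dominates, else 'Class I-like'.

    The 0.9 cutoff is a hypothetical threshold chosen for illustration.
    """
    _, fraction, _ = codon_profile(cds)
    return "Class II-like" if fraction >= single_codon_cutoff else "Class I-like"


if __name__ == "__main__":
    # Two hypothetical poly-glutamine tracts: pure CAG vs. alternating CAG/CAA.
    print(classify_repeat("CAG" * 10))
    print(classify_repeat("CAGCAA" * 5))
```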
31.
Inferring repeat-protein energetics from evolutionary information. PLoS Comput Biol 2017; 13:e1005584. [PMID: 28617812 PMCID: PMC5491312 DOI: 10.1371/journal.pcbi.1005584] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Revised: 06/29/2017] [Accepted: 05/21/2017] [Indexed: 11/19/2022]
Abstract
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold into specific structures, and it has been shown that the main native ensemble can be inferred by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description of the energetic variation caused by sequence modifications in repeat proteins, systems for which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pairwise interactions and treat higher-order correlations with a single term. We show that the resulting evolutionary field can be interpreted in structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows prediction of the folding free energy change for several mutants and can be used to generate synthetic sequences that are statistically indistinguishable from their natural counterparts.
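A model with single-site and pairwise terms of the kind described scores a sequence s as E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), and a mutant's predicted effect is the difference in E between mutant and wild type. The sketch below assumes toy, hand-written field and coupling tables (hypothetical values, not fitted to any repeat family), omits the single higher-order term the authors mention, and uses one common sign convention; papers differ on signs and gauges.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical parameter layouts: one dict of residue -> field per position,
# and a dict of coupling tables keyed by position pairs (i, j) with i < j.
Fields = Sequence[Dict[str, float]]
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def potts_energy(seq: str, h: Fields, J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); missing entries count as 0."""
    energy = 0.0
    for i, residue in enumerate(seq):
        energy -= h[i].get(residue, 0.0)
    for (i, j), table in J.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


def delta_energy(wild_type: str, mutant: str, h: Fields, J: Couplings) -> float:
    """Energy change on mutation; positive means the mutant scores as less favourable."""
    return potts_energy(mutant, h, J) - potts_energy(wild_type, h, J)


if __name__ == "__main__":
    toy_h = [{"A": 1.0}, {"L": 0.5}]
    toy_J = {(0, 1): {("A", "L"): 2.0}}
    print(potts_energy("AL", toy_h, toy_J))
    print(delta_energy("AL", "GL", toy_h, toy_J))
```

With these toy parameters, `potts_energy("AL", toy_h, toy_J)` is -3.5 and the A-to-G substitution at position 0 gives a change of 3.0, because the mutant loses both the field term for A and the A-L coupling.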
32.
Ramírez-Sánchez O, Pérez-Rodríguez P, Delaye L, Tiessen A. Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins. Genomics Proteomics Bioinformatics 2016; 14:357-370. [PMID: 27998811 PMCID: PMC5200936 DOI: 10.1016/j.gpb.2016.06.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/22/2016] [Revised: 06/03/2016] [Accepted: 06/03/2016] [Indexed: 01/27/2023]
Abstract
Protein size is an important biochemical feature, since longer proteins can harbor more domains and can therefore display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could not be explained by endosymbiosis, subcellular compartmentation, or exon size, but rather by exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms instead employ larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). On average, plant proteins are encoded by half as many exons and contain fewer domains than animal proteins. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plant proteins are larger than those of bacteria but smaller than those of animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and therefore deserve special attention with regard to the evolutionary forces acting on their genomes and proteomes.
Affiliation(s)
- Obed Ramírez-Sánchez
- Genetic Engineering Department, CINVESTAV Unidad Irapuato, Irapuato, CP 36821, Mexico
- Luis Delaye
- Genetic Engineering Department, CINVESTAV Unidad Irapuato, Irapuato, CP 36821, Mexico
- Axel Tiessen
- Genetic Engineering Department, CINVESTAV Unidad Irapuato, Irapuato, CP 36821, Mexico