1
Garber ME, Frank V, Kazakov AE, Incha MR, Nava AA, Zhang H, Valencia LE, Keasling JD, Rajeev L, Mukhopadhyay A. REC protein family expansion by the emergence of a new signaling pathway. mBio 2023; 14:e0262223. [PMID: 37991384 PMCID: PMC10746176 DOI: 10.1128/mbio.02622-23] [Received: 09/28/2023] [Accepted: 10/20/2023] [Indexed: 11/23/2023]
Abstract
IMPORTANCE We explore when and why large classes of proteins expand into new sequence space. We used an unsupervised machine learning approach to observe the sequence landscape of REC domains of bacterial response regulator proteins. We find that within-gene recombination can switch effector domains and, consequently, change the regulatory context of the duplicated protein.
Affiliation(s)
- Megan E. Garber
- Department of Comparative Biochemistry, University of California, Berkeley, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Vered Frank
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Alexey E. Kazakov
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Matthew R. Incha
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California, USA
- Alberto A. Nava
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Hanqiao Zhang
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Bioengineering, University of California, Berkeley, California, USA
- Luis E. Valencia
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Bioengineering, University of California, Berkeley, California, USA
- Jay D. Keasling
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California, USA
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Department of Bioengineering, University of California, Berkeley, California, USA
- Center for Biosustainability, Danish Technical University, Lyngby, Denmark
- Center for Synthetic Biochemistry, Shenzhen Institutes for Advanced Technologies, Shenzhen, China
- Lara Rajeev
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Aindrila Mukhopadhyay
- Department of Comparative Biochemistry, University of California, Berkeley, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
2
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
New genes, and with them novel functions, arise during evolution when existing genes duplicate, mutate, recombine, fuse, or split (fission), or when genes form de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how genes increase molecular complexity over time, for instance through protein domain organization. In contrast to global sequence similarity, protein domain architectures capture key structural and functional characteristics, making them better proxies for functional equivalence. In both prokaryotes and eukaryotes, domain architectures have been shown to be retained over large evolutionary distances. Protein domain architectures are now used to classify and distinguish evolutionarily related proteins and to find homologs among species that are distantly related. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to determine domain content, domain order, domain recurrence, and domain position, all of which contribute to accurate prediction of protein function. This review attempts to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
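Three of the architecture features listed in this abstract (domain content, domain order, and domain recurrence) can be made concrete with a small comparison routine. This is a minimal sketch, not a method from the review: it assumes each protein is represented as an N- to C-terminal list of domain names (for example Pfam names), and the function names and toy architectures are hypothetical.

```python
from collections import Counter


def collapse_repeats(arch: list[str]) -> list[str]:
    """Collapse tandem repeats so order can be compared independently of copy number."""
    collapsed: list[str] = []
    for domain in arch:
        if not collapsed or collapsed[-1] != domain:
            collapsed.append(domain)
    return collapsed


def compare_architectures(a: list[str], b: list[str]) -> dict:
    """Compare two domain architectures given as N- to C-terminal lists of domain names."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return {
        # Domain content: Jaccard index over the distinct domain types present.
        "content_jaccard": len(set_a & set_b) / len(union) if union else 1.0,
        # Domain order: same left-to-right order once tandem repeats are collapsed.
        "same_order": collapse_repeats(a) == collapse_repeats(b),
        # Domain recurrence: same number of copies of every domain type.
        "same_recurrence": Counter(a) == Counter(b),
        # Full match: the same domains at the same positions.
        "identical": a == b,
    }


# Hypothetical example: the second protein carries one SH2 copy fewer.
print(compare_architectures(["SH3", "SH2", "SH2", "Pkinase"], ["SH3", "SH2", "Pkinase"]))
```

Separating content, order, and recurrence lets two proteins with the same domain set but a different copy number be told apart, which a single similarity score would blur.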
Affiliation(s)
- Pavan Gollapalli
- Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
- Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
3
Mota MBS, Woods NT, Carvalho MA, Monteiro ANA, Mesquita RD. Evolution of the triplet BRCT domain. DNA Repair (Amst) 2023; 129:103532. [PMID: 37453244 DOI: 10.1016/j.dnarep.2023.103532] [Received: 02/06/2023] [Revised: 06/28/2023] [Accepted: 07/01/2023] [Indexed: 07/18/2023]
Abstract
Organisms have evolved a complex system, called the DNA damage response (DDR), that maintains genome integrity. The DDR identifies and repairs a variety of lesions and alterations in DNA. DDR proteins coordinate DNA damage detection, cell cycle arrest, and repair, and many of these events are regulated by protein phosphorylation. In the human proteome, 23 proteins contain the BRCT (BRCA1 C-terminus) domain, a modular signaling domain that can bind phosphopeptides and mediate protein-protein interactions. BRCTs occur as functional single units, tandems (tBRCT), triplets (tpBRCT), and quartets. Here we examine the evolution of the tpBRCT architecture present in TOPBP1 (DNA topoisomerase II binding protein 1) and ECT2 (epithelial cell transforming 2), and of their respective interaction partners RAD9 (cell cycle checkpoint control protein RAD9) and CYK-4 (Rac GTPase-activating protein 1), with a focus on the conservation of the phosphopeptide-binding residues. The TOPBP1-RAD9 pair arose with the eukaryotes, and the ECT2-CYK-4 pair with the eumetazoans. Triplet structural and functional characteristics were conserved in almost all organisms. The first unit of the triplet (BRCT0) differs from the other two BRCTs but is conserved between orthologs of both TOPBP1 and ECT2. Simulations of BRCT domain evolution suggest a tendency either to retain a single BRCT or to carry two or three BRCT copies per protein, consistent with the functional tBRCT and tpBRCT architectures. Our results shed light on the emergence of the function and architecture of multiple-BRCT domain organizations and on the evolution of the BRCT triplet. Knowledge of BRCT domain evolution can improve the understanding of DNA damage response mechanisms and signal transduction in the DDR.
Affiliation(s)
- M B S Mota
- Departamento de Bioquímica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- N T Woods
- Eppley Institute, Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE 68198, USA
- M A Carvalho
- Instituto Federal de Educação, Ciência e Tecnologia do Rio de Janeiro, RJ, Brazil; Instituto Nacional de Câncer, Rio de Janeiro, RJ, Brazil
- A N A Monteiro
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- R D Mesquita
- Departamento de Bioquímica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
4
Persson E, Sonnhammer ELL. InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. J Mol Biol 2023:168001. [PMID: 36764355 DOI: 10.1016/j.jmb.2023.168001] [Received: 11/30/2022] [Revised: 01/20/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023]
Abstract
Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and for evolutionary analyses. The InParanoid database is a well-known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. Here we present InParanoiDB 9, covering 640 species and providing orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, together with Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database in terms of the number of species and the new capability to operate at the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.
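InParanoid-style orthology inference starts from pairs of proteins that are each other's best hit between two proteomes (the "main orthologs"), to which in-paralogs are then added. The sketch below shows only that reciprocal-best-hit seeding step, assuming precomputed bitscores in both search directions (for example from DIAMOND searches). The dictionaries, scores, and function names are hypothetical, and the in-paralog clustering, score cutoffs, and confidence values of the actual InParanoid-DIAMOND pipeline are not reproduced.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query to its highest-scoring target (the first one seen wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bitscores between proteomes A and B, in both search directions.
a_vs_b = {("a1", "b1"): 300.0, ("a1", "b2"): 120.0, ("a2", "b2"): 250.0, ("a3", "b2"): 80.0}
b_vs_a = {("b1", "a1"): 290.0, ("b2", "a2"): 240.0, ("b2", "a1"): 100.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a3's best hit is b2, but b2's best hit is a2, so a3 does not seed an ortholog pair; in InParanoid such a protein could still be evaluated later as a candidate in-paralog.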
Affiliation(s)
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
5
Cortés E, Pak JS, Özkan E. Structure and evolution of neuronal wiring receptors and ligands. Dev Dyn 2023; 252:27-60. [PMID: 35727136 PMCID: PMC10084454 DOI: 10.1002/dvdy.512] [Received: 04/18/2022] [Revised: 06/13/2022] [Accepted: 06/14/2022] [Indexed: 01/04/2023]
Abstract
One of the fundamental properties of a neuronal circuit is the map of its connections. The cellular and developmental processes that allow for the growth of axons and dendrites, selection of synaptic targets, and formation of functional synapses use neuronal surface receptors and their interactions with other surface receptors, secreted ligands, and matrix molecules. Spatiotemporal regulation of the expression of these receptors and cues allows for specificity in the developmental pathways that wire stereotyped circuits. The families of molecules controlling axon guidance and synapse formation are generally conserved across animals, with some important exceptions, which have consequences for neuronal connectivity. Here, we summarize the distribution of such molecules across multiple taxa, with a focus on model organisms, evolutionary processes that led to the multitude of such molecules, and functional consequences for the diversification or loss of these receptors.
Affiliation(s)
- Elena Cortés
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA; The Neuroscience Institute, University of Chicago, Chicago, Illinois, USA
- Joseph S Pak
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA; The Neuroscience Institute, University of Chicago, Chicago, Illinois, USA
- Engin Özkan
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, USA; The Neuroscience Institute, University of Chicago, Chicago, Illinois, USA
6
Murcia-Garzón J, Méndez-Tenorio A. Promiscuous Domains in Eukaryotes and HAT Proteins in FUNGI Have Followed Different Evolutionary Paths. J Mol Evol 2022; 90:124-138. [PMID: 35084521 DOI: 10.1007/s00239-021-10046-w] [Received: 11/23/2020] [Accepted: 12/27/2021] [Indexed: 10/19/2022]
Abstract
Diverse studies have shown that the gene content of sequenced genomes does not seem to correlate with organismal complexity. However, various studies have shown that organismal complexity and proteome size are indeed significantly correlated. This observation suggests that some molecular mechanisms have given certain proteins greater functional diversity, increasing their contribution to the development of more complex organisms. Among those mechanisms, domain promiscuity, defined as the ability of a domain to combine with other distinct domains, is of great importance for the evolution of organisms. Previous work analyzing the degree of domain promiscuity of proteomes has shown that it appears to parallel the evolution of eukaryotic organisms. This motivated the present study, in which we analyzed domain promiscuity in a collection of 84 eukaryotic proteomes representative of all taxonomic groups of the tree of life. Using a grammar-based approach, we determined the architectures of 1,223,227 proteins, comprising 2,296,371 domains, which formed 839,184 bigram types. Phylogenetic reconstructions based on differences in the information content of proteome promiscuity measures confirm that the evolution of domain promiscuity in eukaryotes resembles the evolutionary history of the species. However, a close analysis of the PHD and RING domains, the most promiscuous domains found in fungi, which are functional components of chromatin-remodeling enzymes and important expression regulators, suggests that these domains evolved according to their function.
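Domain bigrams and a simple promiscuity score can be computed directly from ordered architectures. This is a hedged sketch: it assumes each protein is an N- to C-terminal list of domain names, and it uses "number of distinct adjacent partner domains" as a stand-in promiscuity measure; the study itself derived information-based measures of proteome promiscuity that are not reproduced here. Function names and the toy proteome are hypothetical.

```python
from collections import defaultdict


def domain_bigrams(architectures: list[list[str]]) -> set[tuple[str, str]]:
    """Distinct ordered pairs of adjacent domains (bigram types) across all proteins."""
    bigrams: set[tuple[str, str]] = set()
    for arch in architectures:
        bigrams.update(zip(arch, arch[1:]))
    return bigrams


def promiscuity(architectures: list[list[str]]) -> dict[str, int]:
    """Number of distinct other domains each domain is directly adjacent to, on either side."""
    partners: dict[str, set[str]] = defaultdict(set)
    for left, right in domain_bigrams(architectures):
        if left != right:  # tandem self-repeats do not count as new partners
            partners[left].add(right)
            partners[right].add(left)
    return {domain: len(p) for domain, p in partners.items()}


# Hypothetical toy proteome: each inner list is one protein, N- to C-terminal.
proteome = [["PHD", "SET"], ["RING", "PHD", "Bromo"], ["PHD", "PHD"], ["RING"]]
print(len(domain_bigrams(proteome)))  # 4 bigram types
print(promiscuity(proteome)["PHD"])   # 3: PHD pairs with SET, RING and Bromo
```

Single-domain proteins such as the lone RING contribute no bigrams, so a domain only receives a score once it appears next to a different domain somewhere in the proteome.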
Affiliation(s)
- Jazmín Murcia-Garzón
- Laboratorio de Biotecnología Vegetal, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Boulevard del Maestro S/N esq. Elías Piña, Col. Narciso Mendoza, 88710, Reynosa, Tamaulipas, Mexico
- Alfonso Méndez-Tenorio
- Laboratorio de Biotecnología y Bioinformática Genómica, Departamento de Bioquímica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Prol. de Carpio y Plan de Ayala s/n, Col. Santo Tomás, 11340, Mexico City, Mexico
7
A unique NLRC4 receptor from echinoderms mediates Vibrio phagocytosis via rearrangement of the cytoskeleton and polymerization of F-actin. PLoS Pathog 2021; 17:e1010145. [PMID: 34898657 PMCID: PMC8699970 DOI: 10.1371/journal.ppat.1010145] [Received: 07/29/2021] [Revised: 12/23/2021] [Accepted: 11/27/2021] [Indexed: 11/20/2022]
Abstract
Many members of the nucleotide-binding and oligomerization domain (NACHT)- and leucine-rich-repeat-containing protein (NLR) family play crucial roles in pathogen recognition and in regulating the innate immune response. In our previous work, a unique, Vibrio splendidus-inducible NLRC4 receptor comprising Ig and NACHT domains was identified in the sea cucumber Apostichopus japonicus; this receptor lacks the CARD and LRR domains typical of common cytoplasmic NLRs. To better understand the functional role of AjNLRC4, we confirmed that AjNLRC4 is a bona fide membrane pattern recognition receptor (PRR) with two transmembrane structures. AjNLRC4 was able to bind microbes and polysaccharides directly via its extracellular Ig domain and to agglutinate a variety of microbes in a Ca2+-dependent manner. Both knockdown of AjNLRC4 by RNA interference and blockade of AjNLRC4 by antibodies in coelomocytes significantly inhibited phagocytosis and elimination of V. splendidus. Conversely, overexpression of AjNLRC4 enhanced phagocytosis of V. splendidus, and this effect was specifically blocked by the actin-mediated endocytosis inhibitor cytochalasin D but not by other endocytosis inhibitors. Moreover, AjNLRC4-mediated phagocytosis depended on the interaction between the intracellular domain of AjNLRC4 and β-actin, which in turn regulated the Arp2/3 complex to mediate cytoskeletal rearrangement and F-actin polymerization. V. splendidus colocalized with lysosomes in coelomocytes, and bacterial numbers increased after injection of the lysosome inhibitor chloroquine. Collectively, these results suggest that AjNLRC4 serves as a novel membrane PRR that mediates coelomocyte phagocytosis and the subsequent clearance of intracellular Vibrio through the AjNLRC4-β-actin-Arp2/3 complex-lysosome pathway.
Vibrio splendidus is ubiquitous in marine environments and in or on many aquaculture species, and is considered an important opportunistic pathogen that has caused serious economic losses to the aquaculture industry worldwide. Phagocytosis is the first step of pathogen clearance and is triggered by specific interactions between host pattern recognition receptors (PRRs) and pathogen-associated molecular patterns (PAMPs) of invading bacteria. However, the mechanism underlying receptor-mediated phagocytosis of V. splendidus is poorly understood. In this study, an atypical AjNLRC4 receptor lacking LRR and CARD domains, rather than a common cytoplasmic NLR, was found to serve as the membrane receptor for V. splendidus. The Ig domain of AjNLRC4 takes the place of the conventional LRR domain in binding V. splendidus, and the intracellular domain of AjNLRC4 specifically interacts with β-actin to mediate V. splendidus endocytosis in an actin-dependent manner. Endocytosed V. splendidus is ultimately degraded in phagolysosomes. Our findings will contribute to the development of novel strategies for treating V. splendidus infection by modulating the actin-dependent endocytosis pathway.
8
Czubat B, Minias A, Brzostek A, Żaczek A, Struś K, Zakrzewska-Czerwińska J, Dziadek J. Functional Disassociation Between the Protein Domains of MSMEG_4305 of Mycolicibacterium smegmatis (Mycobacterium smegmatis) in vivo. Front Microbiol 2020; 11:2008. [PMID: 32973726 PMCID: PMC7466739 DOI: 10.3389/fmicb.2020.02008] [Received: 03/06/2020] [Accepted: 07/29/2020] [Indexed: 12/02/2022]
Abstract
MSMEG_4305 is a two-domain protein of Mycolicibacterium smegmatis (Mycobacterium smegmatis). The N-terminal domain of MSMEG_4305 is a type I RNase H. The C-terminal domain is a presumed CobC, predicted to be involved in the aerobic synthesis of vitamin B12. The two domains reach maximal activity at distinct pH values, approximately 8.5 and 4.5, respectively. In the homolog Rv2228c, the presence of the CobC domain influenced RNase activity in vitro. Here, we analyzed the role of MSMEG_4305 in vitamin B12 synthesis and the functional association between both domains in vivo in M. smegmatis. We used a knockout mutant of M. smegmatis deficient in MSMEG_4305. Whole-cell lysates of the mutant strain contained a lower concentration of vitamin B12, as determined by an immunoenzymatic assay. We observed growth deficits related to vitamin B12 production on media containing sulfamethazine and propionate. Removal of the CobC domain of MSMEG_4305 in a ΔrnhA background hardly affected the growth rate of M. smegmatis in vivo. The strain carrying the truncation showed no fitness deficit in a competitive assay and did not show an increased level of RNA/DNA hybrids in its genome. We show that homologs of MSMEG_4305 are present only in the Actinomycetales phylogenetic branch (according to the old classification system). The domains of MSMEG_4305 homologs accumulate mutations at different rates, while the linker region is highly variable. We conclude that MSMEG_4305 is a multidomain protein that was most probably fixed in the phylogenetic tree of life through genetic drift.
Affiliation(s)
- Bożena Czubat
- Department of Experimental and Clinical Pharmacology, University of Rzeszów, Rzeszów, Poland; Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Alina Minias
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Brzostek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Żaczek
- Institute of Medical Sciences, Medical College of Rzeszów University, Rzeszów, Poland
- Katarzyna Struś
- Department of Bioenergetics, Food Analysis and Microbiology, Institute of Food Technology and Nutrition, University of Rzeszów, Rzeszów, Poland
- Jarosław Dziadek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
9
Zhou J, Ren H, Hu M, Zhou J, Li B, Kong N, Zhang Q, Jin Y, Liang L, Yue J. Characterization of Burkholderia cepacia Complex Core Genome and the Underlying Recombination and Positive Selection. Front Genet 2020; 11:506. [PMID: 32528528 PMCID: PMC7253759 DOI: 10.3389/fgene.2020.00506] [Received: 12/10/2019] [Accepted: 04/24/2020] [Indexed: 11/13/2022]
Abstract
Recombination and positive selection are two key factors in the population adaptation and diversification of pathogenic microorganisms. The Burkholderia cepacia complex (Bcc) comprises closely related bacterial species that can cause severe infections in patients with chronic granulomatous disease and cystic fibrosis (CF). To date, no genome-wide study has investigated the core genome of the Bcc with respect to these two evolutionary forces, and the general characteristics of the Bcc core genome also remain poorly described. In this study, we explored the core orthologous genes of 116 Bcc strains using comparative genomic analysis and studied the two adaptive evolutionary forces: recombination and positive selection. We identified 1005 orthogroups consisting entirely of single-copy genes. These single-copy orthologous genes in some Clusters of Orthologous Groups (COG) categories differed significantly in several evolutionary properties, and the encoded proteins were relatively simple and compact. Our findings showed that 5.8% of the core orthologous genes strongly supported recombination, while 1.1% supported positive selection. Genes involved in protein synthesis and in material transport and metabolism were favored by selection pressure. More importantly, homologous recombination contributed more genetic variation to a large number of genes and largely maintained the genetic cohesion of the Bcc. This high level of recombination between Bcc species blurs their taxonomic boundaries, making Bcc species difficult or impossible to distinguish phenotypically and genotypically.
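Restricting a core-genome analysis to orthogroups made up entirely of single-copy genes amounts to a simple filter over the orthogroup table. A minimal sketch, assuming orthogroups have already been inferred (the study's clustering tool and parameters are not reproduced) and are stored as (strain, gene) pairs; the data layout, identifiers, and function name are hypothetical.

```python
from collections import Counter


def single_copy_orthogroups(orthogroups: dict[str, list[tuple[str, str]]],
                            strains: list[str]) -> list[str]:
    """IDs of orthogroups with exactly one gene from every strain (and no other strains)."""
    wanted = set(strains)
    kept = []
    for og_id, members in orthogroups.items():
        copies = Counter(strain for strain, _gene in members)
        # Every strain must be present, and each exactly once.
        if set(copies) == wanted and all(n == 1 for n in copies.values()):
            kept.append(og_id)
    return sorted(kept)


# Hypothetical orthogroups over three strains, as (strain, gene) pairs.
strains = ["S1", "S2", "S3"]
orthogroups = {
    "OG1": [("S1", "g1"), ("S2", "g7"), ("S3", "g4")],                # single copy everywhere
    "OG2": [("S1", "g2"), ("S1", "g3"), ("S2", "g8"), ("S3", "g5")],  # duplicated in S1
    "OG3": [("S1", "g9"), ("S2", "g6")],                              # absent from S3
}
print(single_copy_orthogroups(orthogroups, strains))
```

Requiring exactly one copy per strain keeps paralogs out of downstream recombination and selection tests, which typically assume one sequence per genome in each alignment.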
Affiliation(s)
- Jianglin Zhou
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Hongguang Ren
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Mingda Hu
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Jing Zhou
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Beiping Li
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Na Kong
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China; Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Qi Zhang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Yuan Jin
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Long Liang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
- Junjie Yue
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Biotechnology, Beijing, China
10
Persson E, Kaduk M, Forslund SK, Sonnhammer ELL. Domainoid: domain-oriented orthology inference. BMC Bioinformatics 2019; 20:523. [PMID: 31660857 PMCID: PMC6816169 DOI: 10.1186/s12859-019-3137-2] [Received: 06/20/2019] [Accepted: 10/09/2019] [Indexed: 11/18/2022]
Abstract
Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding, recurring regions called domains. The domain architecture of a protein is vital for its function, and recombination events mean that individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, which creates challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool that aims to overcome these challenges by inferring orthology at the domain level. It applies the InParanoid algorithm to single domains separately to infer groups of orthologous domains. Results: This domain-oriented approach allows detection of discordant domain orthologs, cases in which different domains of the same protein have different evolutionary histories. In addition to domain-level analysis, protein-level orthology can be inferred from the fraction of domains that are orthologous. Domainoid orthology assignments were compared with those of the conventional full-length approach InParanoid and were validated in a standard benchmark. Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches. Availability: https://bitbucket.org/sonnhammergroup/domainoid/
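The protein-level step mentioned in this abstract, deriving orthology from the fraction of domains that are orthologous, can be sketched as follows. This is an illustration under stated assumptions, not Domainoid's code: it assumes domain-level ortholog pairs are already available (in Domainoid these come from running InParanoid on each domain separately), that domain instance IDs are unique within a protein, and it applies a hypothetical 0.5 cutoff; Domainoid's actual definition and thresholds may differ.

```python
def orthologous_domain_fraction(domains_a: list[str], domains_b: list[str],
                                domain_orthologs: set[tuple[str, str]]) -> float:
    """Fraction of all domain instances in proteins A and B that belong to an orthologous pair."""
    covered = set()
    for da in domains_a:
        for db in domains_b:
            if (da, db) in domain_orthologs:
                covered.add(("A", da))  # tag by protein so identical IDs stay distinct
                covered.add(("B", db))
    total = len(domains_a) + len(domains_b)
    return len(covered) / total if total else 0.0


def protein_level_ortholog(fraction: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: call the proteins orthologous at or above a cutoff."""
    return fraction >= threshold


# Hypothetical domain instances; protein A has an extra SH3 with no ortholog in B.
a = ["A:SH3", "A:SH2", "A:Pkinase"]
b = ["B:SH2", "B:Pkinase"]
pairs = {("A:SH2", "B:SH2"), ("A:Pkinase", "B:Pkinase")}
frac = orthologous_domain_fraction(a, b, pairs)
print(frac, protein_level_ortholog(frac))
```

Four of the five domain instances take part in an orthologous pair, so the two proteins pass the illustrative cutoff even though their architectures differ, the kind of case a full-length comparison can miss.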
Affiliation(s)
- Emma Persson
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden
- Mateusz Kaduk
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden
- Sofia K Forslund
- Experimental and Clinical Research Center, a joint cooperation of Max-Delbrück Center for Molecular Medicine and Charité-Universitätsmedizin Berlin, 13125, Berlin, Germany; European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121, Solna, Sweden
11
Naveenkumar N, Kumar G, Sowdhamini R, Srinivasan N, Vishwanath S. Fold combinations in multi-domain proteins. Bioinformation 2019; 15:342-350. [PMID: 31249437 PMCID: PMC6589474 DOI: 10.6026/97320630015342] [Received: 05/05/2019] [Accepted: 05/07/2019] [Indexed: 01/21/2023]
Abstract
Domain-domain interactions in multi-domain proteins play an important role in combining the functions of individual domains into the overall biological activity of the protein. The functions of tethered domains are often coupled, and hence only a limited number of domain architectures with defined folds are known in nature. It is therefore of interest to document the available fold-fold combinations and their preferences in multi-domain proteins. We analyzed all multi-domain proteins with known structures in the Protein Data Bank and observed only about 860 fold-fold combinations among them. Analysis of multi-domain proteins represented in sequence databases results in the recognition of 29,860 fold-fold combinations, which account for only 2.8% of the theoretically possible 1,036,080 (1439C2) fold-fold combinations. The observed preference for particular fold-fold combinations in multi-domain proteins is interesting in the context of acquiring multiple functions through structural adaptation by gene fusion.
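The denominator quoted in this abstract can be checked with a few lines of arithmetic. Assuming 1,439 distinct folds, the number of unordered pairs of two different folds is C(1439, 2) = 1,034,641, whereas 1,036,080 equals C(1440, 2), the number of unordered pairs when a fold may also pair with itself (1,034,641 plus 1,439 same-fold pairs). The quoted total therefore appears to include same-fold combinations; that reading is an assumption, not a statement from the paper.

```python
from math import comb

n_folds = 1439      # number of folds implied by the "1439C2" in the abstract
observed = 29_860   # fold-fold combinations found in sequence databases

distinct_pairs = comb(n_folds, 2)         # pairs of two different folds
pairs_with_self = comb(n_folds + 1, 2)    # unordered pairs, same fold allowed twice
assert pairs_with_self == distinct_pairs + n_folds

print(distinct_pairs)                       # 1034641
print(pairs_with_self)                      # 1036080
print(f"{observed / pairs_with_self:.2%}")  # 2.88%
```

The ratio comes to about 2.88%, which the abstract reports, rounded down, as 2.8%.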
Affiliation(s)
- Nagarajan Naveenkumar
- National Center for Biological Science, GKVK Campus, Bengaluru, Karnataka, India - 560065; Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India; Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
- Gayatri Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
- Ramanathan Sowdhamini
- National Center for Biological Science, GKVK Campus, Bengaluru, Karnataka, India - 560065
- Sneha Vishwanath
- Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, Karnataka, India - 560012
12
Abstract
Genomes appear similar to natural language texts, and protein domains can be treated as analogs of words. To investigate the linguistic properties of genomes further, we calculated the complexity of the "protein languages" in all major branches of life and identified a nearly universal value of information gain associated with the transition from a random domain arrangement to the current protein domain architecture. An exploration of the evolutionary relationship of the protein languages identified the domain combinations that discriminate between the major branches of cellular life. We conclude that there exists a "quasi-universal grammar" of protein domains and that the nearly constant information gain we identified corresponds to the minimal complexity required to maintain a functional cell. From an abstract, informational perspective, protein domains appear analogous to words in natural languages, in which word association is governed by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the "proteome grammar", that is, the rules of association of domains that generate the various domain architectures of proteins. Comparison of the complexity measures of "protein languages" in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea whose cells appear to be simplified to the limit, and animals, which display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life.
The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
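The idea of information gain between observed and random domain arrangements can be illustrated with a minimal Python sketch. It is not the authors' procedure, and their exact entropy measure may differ: it assumes only bigrams (n = 2), treats each protein as an ordered list of domain names, and takes the random baseline to be independent pairing of domains drawn with their observed single-domain frequencies. The function name `bigram_information_gain` is hypothetical.

```python
import math
from collections import Counter


def bigram_information_gain(architectures):
    """Relative entropy (bits) of observed adjacent-domain pairs vs. random pairing.

    architectures: list of proteins, each an ordered (N- to C-terminal) list of
    domain names. The random baseline assumes domains pair independently, each
    drawn with its observed overall frequency (an illustrative assumption).
    """
    unigrams = Counter(d for arch in architectures for d in arch)
    bigrams = Counter(
        (a, b) for arch in architectures for a, b in zip(arch, arch[1:])
    )
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0:  # no multi-domain proteins, nothing to compare
        return 0.0
    gain = 0.0
    for (a, b), count in bigrams.items():
        p_obs = count / n_bi
        p_rand = (unigrams[a] / n_uni) * (unigrams[b] / n_uni)
        gain += p_obs * math.log2(p_obs / p_rand)
    return gain


toy = [["A", "B"], ["A", "B"]]
print(bigram_information_gain(toy))  # 2.0
```

A perfectly predictable pairing (every A followed by B) yields a large gain, whereas an arrangement that matches the random baseline yields zero; real proteomes fall in between.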
13
Li L, Bansal MS. An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:63-76. [PMID: 29994126 DOI: 10.1109/tcbb.2018.2846253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
The majority of genes in eukaryotes consist of one or more protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Yet, even though it is well understood that domains evolve inside genes and genes inside species, no existing computational framework simultaneously models the evolution of domains, genes, and species while accounting for their interdependence. Here, we develop an integrated model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees and species trees, by explicitly considering domain-level evolution and decoupling domain-level events from gene-level events. In this paper, we (i) introduce the new integrated reconciliation framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large biological dataset, and (v) demonstrate the impact of using our new computational framework compared to existing approaches. The implemented software is freely available from http://compbio.engr.uconn.edu/software/seadog/.
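For readers unfamiliar with the classical framework that this model extends, the Python sketch below shows the standard LCA (least common ancestor) mapping used in duplication-loss reconciliation: each internal gene-tree node is mapped to the LCA of its children's species, and it counts as a duplication when it maps to the same species node as one of its children. This covers only the classical gene-species step (losses are not counted); it does not implement the authors' integrated domain-level model or the SEADOG software, and the dict-based tree encoding and function names are hypothetical.

```python
def _depth(node, parent):
    """Number of edges from `node` up to the species-tree root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def species_lca(a, b, parent):
    """Least common ancestor of two species-tree nodes, given child->parent links."""
    da, db = _depth(a, parent), _depth(b, parent)
    while da > db:
        a, da = parent[a], da - 1
    while db > da:
        b, db = parent[b], db - 1
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, gene_to_species, parent):
    """LCA-map a binary gene tree onto a species tree and count duplication nodes.

    gene_tree: a leaf name (str) or a 2-tuple of subtrees.
    gene_to_species: maps each gene leaf to the species it was sampled from.
    parent: maps each species-tree node to its parent (None for the root).
    """
    def walk(node):
        if isinstance(node, str):
            return gene_to_species[node], 0
        left_map, left_dups = walk(node[0])
        right_map, right_dups = walk(node[1])
        mapped = species_lca(left_map, right_map, parent)
        # Duplication: the node maps to the same species node as one of its children.
        is_dup = mapped in (left_map, right_map)
        return mapped, left_dups + right_dups + int(is_dup)

    return walk(gene_tree)[1]


# Species tree ((A,B),C) encoded as child -> parent links.
species_parent = {"root": None, "AB": "root", "C": "root", "A": "AB", "B": "AB"}
gene_tree = (("a1", "b1"), ("a2", "c1"))
genes = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
print(count_duplications(gene_tree, genes, species_parent))  # 1
```

In the example, the gene-tree root maps to the species root, as does its right child (a2, c1), so the root is inferred as a duplication. The integrated framework described above adds a domain tree on top of this two-level mapping.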
14
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly determines which domain architectures can be formed in different species. Studies of domain family sizes have shown that their distributions generally follow power laws, both within genomes and across larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. We review the genome evolution models that have been proposed to explain the shape of these distributions, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and its effects have been studied using measures of domain versatility or promiscuity. Next, we examine the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architectures and the conclusions about the mechanisms of domain architecture evolution that can be drawn from them. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for the computational analysis or exploitation of protein domain architectures and their evolution.
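As a rough illustration of what a combinatorial-propensity measure counts, the Python sketch below records, for each domain, how many distinct domain families it is found directly adjacent to. It is a simplified proxy under stated assumptions (architectures given as ordered lists of domain names, only immediate neighbours counted, tandem repeats ignored). Published versatility and promiscuity indices typically also correct for how abundant a domain is, which this sketch does not, and the function name `neighbour_counts` is hypothetical.

```python
from collections import defaultdict


def neighbour_counts(architectures):
    """Number of distinct domain families found immediately adjacent to each domain.

    architectures: list of proteins, each an ordered list of domain names.
    Adjacency is treated as undirected; repeats of the same domain are ignored.
    """
    neighbours = defaultdict(set)
    for arch in architectures:
        for a, b in zip(arch, arch[1:]):
            if a != b:  # skip tandem repeats such as Kinase-Kinase
                neighbours[a].add(b)
                neighbours[b].add(a)
    # Domains that only ever occur alone still get an entry, with zero neighbours.
    for arch in architectures:
        for d in arch:
            neighbours.setdefault(d, set())
    return {d: len(ns) for d, ns in neighbours.items()}


archs = [["SH3", "SH2", "Kinase"], ["SH2", "Kinase"], ["PH"], ["Kinase", "Kinase"]]
print(neighbour_counts(archs))
```

Here SH2 has two distinct partners (SH3 and Kinase), while PH, which only occurs as a single-domain protein, has none.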
15
Bitard‐Feildel T, Lamiable A, Mornon J, Callebaut I. Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences. Proteomics 2018; 18:e1800054. [PMID: 30299594 PMCID: PMC7168002 DOI: 10.1002/pmic.201800054] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/31/2018] [Revised: 08/29/2018] [Indexed: 12/17/2022]
Abstract
Hydrophobic cluster analysis (HCA) is an original approach to protein sequence analysis that provides access to the foldable repertoire of the protein universe, including as-yet unannotated protein segments (the "dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) that undergo disorder-to-order transitions. This review illustrates how HCA can give insight into this last category of foldable segments, with examples matching known 3D structures. After a review of the HCA principles, examples are given of short foldable segments, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences that generally correspond to those observed in the 3D structures of the complexes. Such small foldable segments are sometimes longer than the segments seen in known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences that allow fuzziness. Cases of larger conditionally disordered domains are also presented; these have a lower density of hydrophobic clusters than well-folded globular domains, or exposed hydrophobic patches, and are stabilized by interaction with partners.
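To give a feel for what "hydrophobic clusters" are, the Python sketch below groups hydrophobic residues along a sequence into clusters. It is a crude one-dimensional proxy and not HCA itself, which identifies clusters on a two-dimensional helical net. The hydrophobic alphabet (V, I, L, F, M, Y, W), the treatment of proline as a cluster breaker, and the `max_gap` threshold are stated assumptions, and the function name is hypothetical.

```python
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet


def hydrophobic_clusters(seq, max_gap=3):
    """Group hydrophobic residues into 1-D clusters (a simplified proxy for HCA).

    Two hydrophobic residues join the same cluster when at most `max_gap`
    non-hydrophobic residues separate them and no proline lies between them.
    Returns (start, end) index pairs, 0-based and inclusive.
    """
    clusters = []
    start = end = None
    for i, aa in enumerate(seq.upper()):
        if aa == "P":
            # Proline is treated as a cluster breaker in this sketch.
            if start is not None:
                clusters.append((start, end))
                start = end = None
        elif aa in HYDROPHOBIC:
            if start is not None and i - end - 1 <= max_gap:
                end = i  # extend the current cluster
            else:
                if start is not None:
                    clusters.append((start, end))
                start = end = i  # open a new cluster
    if start is not None:
        clusters.append((start, end))
    return clusters


print(hydrophobic_clusters("LLKAVPGGWF"))  # [(0, 4), (8, 9)]
```

In the example, the proline at position 5 closes the first cluster (L, L, V) and W, F form a second one; a real HCA plot would also capture residues that are close in space on the helical net but distant in sequence.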
Affiliation(s)
- Tristan Bitard‐Feildel
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institute of Biology Paris‐Seine (IBPS), Centre national de la recherche scientifique (CNRS), Sorbonne Université, 75005 Paris, France
- Alexis Lamiable
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Jean‐Paul Mornon
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
- Isabelle Callebaut
- Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), Institut de recherche pour le développement (IRD), UMR CNRS 7590, Muséum National d'Histoire Naturelle, Sorbonne Université, 75005 Paris, France
16
Verma P, Patel GK, Kar B, Sharma AK. A case of neofunctionalization of a Putranjiva roxburghii PNP protein to trypsin inhibitor by disruption of PNP-UDP domain through an insert containing inhibitory site. Plant Science: An International Journal of Experimental Plant Biology 2017; 260:19-30. [PMID: 28554472 DOI: 10.1016/j.plantsci.2017.03.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/14/2016] [Revised: 03/11/2017] [Accepted: 03/25/2017] [Indexed: 05/24/2023]
Abstract
The attainment of a new function by a protein is achieved through convergent or divergent evolution. In the present work, sequence analysis of a 34 kDa protein from Putranjiva roxburghii, previously reported as a potent trypsin inhibitor, showed resemblance to some wound-inducible and vegetative storage proteins. A detailed sequence analysis revealed that these proteins belong to the PNP-UDP family. In the P. roxburghii protein, an insert of approximately 46 residues disrupts the PNP domain. A similar disruption of the PNP domain is observed in related plant proteins. Characterization of recombinant full-length and truncated (lacking the 46-residue insert) forms of the P. roxburghii PNP family protein (PRpnp) revealed that the trypsin inhibitory active site is located within the insert. The truncated form, containing an uninterrupted PNP domain, showed strong PNP enzymatic activity, hydrolyzing the N-glycosidic bond of inosine and guanosine. The full-length protein, however, showed weak PNP activity, which may be due to the presence of the insert. These results point to the neofunctionalization of PRpnp into a potent trypsin inhibitor, through an insert containing an inhibitory residue, to meet the needs of plant defense. The similar wound-inducible and vegetative storage proteins may also have evolved in response to such needs.
Affiliation(s)
- Preeti Verma
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247 667, India
- Girijesh K Patel
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247 667, India
- Bibekananda Kar
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247 667, India
- Ashwani K Sharma
- Department of Biotechnology, Indian Institute of Technology Roorkee, Roorkee, 247 667, India
17
Eggermont L, Verstraeten B, Van Damme EJM. Genome-Wide Screening for Lectin Motifs in Arabidopsis thaliana. The Plant Genome 2017; 10. [PMID: 28724081 DOI: 10.3835/plantgenome2017.02.0010] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 05/11/2023]
Abstract
For more than three decades, Arabidopsis thaliana has served as a model for plant biology research, yet only a few protein families have been studied in detail in this species. This study focused on all sequences with lectin motifs in the A. thaliana genome. Based on amino acid sequence similarity (BLASTp searches), 217 putative lectin genes were retrieved, belonging to 9 of the 12 lectin families. The domain organization and genomic distribution of each lectin family were analyzed. Domain architecture analysis revealed that most of these lectin gene sequences are linked to other domains, often belonging to protein families with catalytic activity. Many of the protein domains identified are known to play a role in stress signaling and defense, suggesting that the putative lectins contribute substantially to development and plant defense. This genome-wide screen for lectin motifs will help to unravel the functional characteristics of lectins. In addition, phylogenetic trees and WebLogos showed that most lectin sequences sharing the same domain architecture evolved together. Furthermore, the amino acids responsible for carbohydrate binding are largely conserved. Our results provide information about the evolutionary relationships and functional divergence of the lectin motifs in A. thaliana.
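A similarity-based screen of this kind is often post-processed from BLAST+ tabular output. The Python sketch below shows one way to collect candidate genes per lectin family from BLASTp `-outfmt 6` text, using the default twelve-column order of that format. The E-value cutoff, the query-to-family mapping, and the gene identifiers are hypothetical; the abstract does not state the thresholds the authors used.

```python
import csv
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_genes(blast_tsv, query_family, max_evalue=1e-5):
    """Collect subject IDs per family from BLASTp -outfmt 6 text.

    blast_tsv: the tabular BLAST report as a string.
    query_family: maps each query (seed lectin) ID to its family label.
    max_evalue: hypothetical significance cutoff.
    """
    hits = defaultdict(set)
    for row in csv.reader(blast_tsv.splitlines(), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) <= max_evalue:
            family = query_family.get(rec["qseqid"], "unassigned")
            hits[family].add(rec["sseqid"])
    return dict(hits)


report = (
    "seed1\tgeneA\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-30\t110\n"
    "seed1\tgeneB\t25.0\t80\t55\t3\t1\t80\t10\t89\t0.5\t30\n"
    "seed2\tgeneC\t60.0\t150\t50\t0\t1\t150\t1\t150\t2e-50\t200\n"
)
print(candidate_genes(report, {"seed1": "FamilyA", "seed2": "FamilyB"}))
```

Using a set per family means a gene hit by several seed sequences of the same family is counted once; a gene could still appear under two families, which a real pipeline would need to resolve.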
18
Valansi C, Moi D, Leikina E, Matveev E, Graña M, Chernomordik LV, Romero H, Aguilar PS, Podbilewicz B. Arabidopsis HAP2/GCS1 is a gamete fusion protein homologous to somatic and viral fusogens. J Cell Biol 2017; 216:571-581. [PMID: 28137780 PMCID: PMC5350521 DOI: 10.1083/jcb.201610093] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 10/26/2016] [Revised: 12/27/2016] [Accepted: 01/18/2017] [Indexed: 01/08/2023] [Open access]
Abstract
Cell-cell fusion is inherent to sexual reproduction. Loss of HAPLESS 2/GENERATIVE CELL SPECIFIC 1 (HAP2/GCS1) proteins results in gamete fusion failure in diverse organisms, but their exact role is unclear. In this study, we show that Arabidopsis thaliana HAP2/GCS1 is sufficient to promote mammalian cell-cell fusion. Hemifusion and complete fusion depend on HAP2/GCS1 presence in both fusing cells. Furthermore, expression of HAP2 on the surface of pseudotyped vesicular stomatitis virus results in homotypic virus-cell fusion. We demonstrate that the Caenorhabditis elegans Epithelial Fusion Failure 1 (EFF-1) somatic cell fusogen can replace HAP2/GCS1 in one of the fusing membranes, indicating that HAP2/GCS1 and EFF-1 share a similar fusion mechanism. Structural modeling of the HAP2/GCS1 protein family predicts that they are homologous to EFF-1 and viral class II fusion proteins (e.g., Zika virus). We name this superfamily Fusexins: fusion proteins essential for sexual reproduction and exoplasmic merger of plasma membranes. We suggest a common origin and evolution of sexual reproduction, enveloped virus entry into cells, and somatic cell fusion.
Affiliation(s)
- Clari Valansi
- Department of Biology, Technion- Israel Institute of Technology, Haifa 32000, Israel
- David Moi
- Laboratorio de Biología Celular de Membranas, Instituto de Investigaciones Biotecnologicas "Dr. Rodolfo A. Ugalde," Universidad Nacional de San Martin, Buenos Aires, CP1650, Argentina
- Evgenia Leikina
- Section on Membrane Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892
- Elena Matveev
- Department of Biology, Technion- Israel Institute of Technology, Haifa 32000, Israel
- Martín Graña
- Unidad de Bioinformática, Institut Pasteur Montevideo, Montevideo 11400, Uruguay
- Leonid V Chernomordik
- Section on Membrane Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892
- Héctor Romero
- Laboratorio de Organización y Evolución del Genoma, Unidad de Genómica Evolutiva, Departamento Ecología y Evolución, Facultad de Ciencias/Centro Universitario Regional del Este, Universidad de la República, Montevideo 11400, Uruguay
- Pablo S Aguilar
- Laboratorio de Biología Celular de Membranas, Instituto de Investigaciones Biotecnologicas "Dr. Rodolfo A. Ugalde," Universidad Nacional de San Martin, Buenos Aires, CP1650, Argentina
- Benjamin Podbilewicz
- Department of Biology, Technion- Israel Institute of Technology, Haifa 32000, Israel
19
Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016; 32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022] [Open access]
Abstract
Motivation: Over the last decades, vast numbers of sequences have been deposited in public databases. Bioinformatics tools allow homology, and consequently functional, inference for these sequences. New profile-based homology search tools have been introduced that allow reliable detection of remote homologs, but they have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile-based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics were calculated. We further measured the congruence of domain architecture assignments across the three domain databases. Results: CS-BLAST and PHMMER had the highest overall accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs, but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, at a still reasonable computational cost. Additionally, we show that fewer than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity. Availability and Implementation: Benchmark datasets and all scripts are available at http://sonnhammer.org/download/Homology_benchmark.
Contact: forslund@embl.de. Supplementary information: Supplementary data are available at Bioinformatics online.
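The abstract does not name its "standard performance metrics". As a hedged illustration of one common way to score such a benchmark, the Python sketch below computes the area under the ROC curve from a tool's scores for homologous (positive) and non-homologous (negative) pairs, as the probability that a random homologous pair outscores a random non-homologous one. It assumes higher scores mean stronger evidence of homology (negate E-values first); the function name is hypothetical, and the pairwise loop is O(n·m), which is fine for small test sets.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(score of a homologous pair > score of a non-homologous pair).

    Ties count as half a win. Assumes higher scores indicate homology.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


homologous = [0.9, 0.8, 0.4]      # e.g. bit scores for true homolog pairs
non_homologous = [0.5, 0.1]       # scores for unrelated pairs
print(roc_auc(homologous, non_homologous))
```

An AUC of 1.0 means the tool ranks every homologous pair above every unrelated pair, while 0.5 is no better than chance.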
Affiliation(s)
- Ganapathi Varma Saripella
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Erik L L Sonnhammer
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Kristoffer Forslund
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg 69117, Germany
20
Kurland CG, Harish A. The phylogenomics of protein structures: The backstory. Biochimie 2015; 119:284-302. [DOI: 10.1016/j.biochi.2015.07.027] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/11/2015] [Accepted: 07/28/2015] [Indexed: 12/11/2022]
21
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main machinery of the cell, play a major role in nearly every cellular process and have always been a central focus of biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing, universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing us to understand genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher ancient Egyptian hieroglyphs. In this review, we consider aspects of the repertoire of protein domain architectures that closely parallel those of human languages, and aim to provide some insight into the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
22
Stolzer M, Siewert K, Lai H, Xu M, Durand D. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 2015; 16 Suppl 14:S8. [PMID: 26451642 PMCID: PMC4610023 DOI: 10.1186/1471-2105-16-s14-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Reconstructing evolution provides valuable insights into the processes of gene evolution and function. However, while there have been great advances in algorithms and software to reconstruct the history of gene families, these tools do not model the domain shuffling events (domain duplication, insertion, transfer, and deletion) that drive the evolution of multidomain protein families. Protein evolution through domain shuffling events allows rapid exploration of functions by introducing new combinations of existing folds. This powerful mechanism was key to some significant evolutionary innovations, such as multicellularity and the vertebrate immune system. A method for reconstructing this important evolutionary process is urgently needed. RESULTS Here, we introduce a novel, event-based framework for studying multidomain evolution by reconciling a domain tree with a gene tree, with additional information provided by the species tree. In the context of this framework, we present the first reconciliation algorithms to infer domain shuffling events, while addressing the challenges inherent in inferring evolution across three levels of organization. CONCLUSIONS We apply these methods to the evolution of domains in the membrane-associated guanylate kinase (MAGUK) family. These case studies reveal a more vivid and detailed evolutionary history than previously provided. Our algorithms have been implemented in software, freely available at http://www.cs.cmu.edu/~durand/Notung.
23
Zakrzewski AC, Weigert A, Helm C, Adamski M, Adamska M, Bleidorn C, Raible F, Hausen H. Early divergence, broad distribution, and high diversity of animal chitin synthases. Genome Biol Evol 2015; 6:316-25. [PMID: 24443419 PMCID: PMC3942024 DOI: 10.1093/gbe/evu011] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 11/15/2022] [Open access]
Abstract
Even though chitin is one of the most abundant biopolymers in nature, current knowledge of chitin formation is largely based on data from fungi and insects. This study reveals an unexpectedly broad taxonomic distribution and extensive diversification of chitin synthases (CSs) in Metazoa, shedding new light on the relevance of chitin in animals and suggesting unforeseen complexity of chitin synthesis in many groups. We uncovered robust orthologs of insect-type CSs in several representatives of deuterostomes, which are generally not thought to possess chitin. This suggests a broader distribution and function of chitin in this branch of the animal kingdom. We characterize a new CS type present not only in basal metazoans such as sponges and cnidarians but also in several bilaterian representatives. The most extensive diversification of CSs took place during the emergence of lophotrochozoans, the third large group of protostomes alongside arthropods and nematodes, resulting in the coexistence of up to ten CS paralogs in molluscs. Independent fusions to different kinds of myosin motor domains in fungi and lophotrochozoans point toward a high relevance of CS interaction with the cytoskeleton for fine-tuned chitin secretion. Given the fundamental role that chitin plays in the morphology of many animals, the CS diversification presented here reveals many evolutionary complexities. Our findings strongly suggest a very broad and multifarious occurrence of chitin and call into question an ancestral role as a cuticular component. The molecular mechanisms underlying the regulation of animal chitin synthesis are most likely far more complex and diverse than existing data from insects suggest.
Affiliation(s)
- Anne-C Zakrzewski
- Sars International Centre for Marine Molecular Biology, Bergen, Norway
24
Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014; 30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023] [Open access]
Abstract
Given the rapid increase in the number of species with a sequenced genome, identifying orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION All such materials are available at http://questfororthologs.org.
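To make the quadratic scaling concrete, the Python sketch below implements reciprocal best hits (RBH), a simple and widely used orthology heuristic chosen here only for illustration; it is not any particular database's pipeline. Every pair of proteomes needs its own pair of directed similarity searches, so the number of proteome comparisons grows as n(n-1)/2. The dict-of-scores input format and the function names are hypothetical.

```python
def best_hits(scores):
    """Best-scoring subject per query. scores: {(query, subject): bitscore}."""
    best = {}
    for (q, s), score in scores.items():
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Ortholog candidates (a, b): each gene is the other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def n_pairwise_comparisons(n_proteomes):
    """Proteome pairs an all-against-all method must compare: n(n-1)/2."""
    return n_proteomes * (n_proteomes - 1) // 2


a_vs_b = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b2"): 80}
b_vs_a = {("b1", "a1"): 95, ("b2", "a1"): 60, ("b2", "a2"): 85}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
print(n_pairwise_comparisons(100))  # 4950
```

Going from 10 to 100 proteomes raises the number of proteome pairs from 45 to 4950, a 110-fold increase for a 10-fold growth in input, which is the pressure on ortholog database providers that the abstract describes.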
Affiliation(s)
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Toni Gabaldón
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Alan W Sousa da Silva
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Maria Martin
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Marc Robinson-Rechavi
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Brigitte Boeckmann
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Paul D Thomas
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
- Christophe Dessimoz
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
25
A phylogenomic census of molecular functions identifies modern thermophilic archaea as the most ancient form of cellular life. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2014; 2014:706468. [PMID: 25249790 PMCID: PMC4164138 DOI: 10.1155/2014/706468] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/06/2013] [Revised: 11/20/2013] [Accepted: 01/17/2014] [Indexed: 12/30/2022]
Abstract
The origins of diversified life remain mysterious despite considerable efforts devoted to untangling the roots of the universal tree of life. Here we reconstructed phylogenies that described the evolution of molecular functions and the evolution of species directly from a genomic census of gene ontology (GO) definitions. We sampled 249 free-living genomes spanning organisms in the three superkingdoms of life, Archaea, Bacteria, and Eukarya, and used the abundance of GO terms as molecular characters to produce rooted phylogenetic trees. Results revealed an early thermophilic origin of Archaea that was followed by genome reduction events in microbial superkingdoms. Eukaryal genomes displayed extraordinary functional diversity and were enriched with hundreds of novel molecular activities not detected in the akaryotic microbial cells. Remarkably, the majority of these novel functions appeared quite late in evolution, synchronized with the diversification of the eukaryal superkingdom. The distribution of GO terms in superkingdoms confirms that Archaea appears to be the simplest and most ancient form of cellular life, while Eukarya is the most diverse and recent.
26
Kim KM, Nasir A, Hwang K, Caetano-Anollés G. A tree of cellular life inferred from a genomic census of molecular functions. J Mol Evol 2014; 79:240-62. [PMID: 25128982 DOI: 10.1007/s00239-014-9637-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/30/2014] [Accepted: 08/05/2014] [Indexed: 10/24/2022]
Abstract
Phylogenomics aims to describe evolutionary relatedness between organisms by analyzing genomic data. The common practice is to produce phylogenomic trees from molecular information in the sequence, order, and content of genes in genomes. These phylogenies describe the evolution of life and become valuable tools for taxonomy. The recent availability of structural and functional data for hundreds of genomes now offers the opportunity to study evolution using more deep, conserved, and reliable sets of molecular features. Here, we reconstruct trees of life from the functions of proteins. We start by inferring rooted phylogenomic trees and networks of organisms directly from Gene Ontology annotations. Phylogenies and networks yield novel insights into the emergence and evolution of cellular life. The ancestor of Archaea originated earlier than the ancestors of Bacteria and Eukarya and was thermophilic. In contrast, basal bacterial lineages were non-thermophilic. A close relationship between Plants and Metazoa was also identified that disagrees with the traditional Fungi-Metazoa grouping. While measures of evolutionary reticulation were minimum in Eukarya and maximum in Bacteria, the massive role of horizontal gene transfer in microbes did not materialize in phylogenomic networks. Phylogenies and networks also showed that the best reconstructions were recovered when problematic taxa (i.e., parasitic/symbiotic organisms) and horizontally transferred characters were excluded from analysis. Our results indicate that functionomic data represent a useful addition to the set of molecular characters used for tree reconstruction and that trees of cellular life carry in deep branches considerable predictive power to explain the evolution of living organisms.
Affiliation(s)
- Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 305-806, Korea
27
28
Caetano-Anollés G, Nasir A, Zhou K, Caetano-Anollés D, Mittenthal JE, Sun FJ, Kim KM. Archaea: the first domain of diversified life. ARCHAEA (VANCOUVER, B.C.) 2014; 2014:590214. [PMID: 24987307 PMCID: PMC4060292 DOI: 10.1155/2014/590214] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 09/30/2013] [Revised: 02/15/2014] [Accepted: 03/25/2014] [Indexed: 01/23/2023]
Abstract
The study of the origin of diversified life has been plagued by technical and conceptual difficulties, controversy, and apriorism. It is now popularly accepted that the universal tree of life is rooted in the akaryotes and that Archaea and Eukarya are sister groups to each other. However, evolutionary studies have overwhelmingly focused on nucleic acid and protein sequences, which partially fulfill only two of the three main steps of phylogenetic analysis, formulation of realistic evolutionary models, and optimization of tree reconstruction. In the absence of character polarization, that is, the ability to identify ancestral and derived character states, any statement about the rooting of the tree of life should be considered suspect. Here we show that macromolecular structure and a new phylogenetic framework of analysis that focuses on the parts of biological systems instead of the whole provide both deep and reliable phylogenetic signal and enable us to put forth hypotheses of origin. We review over a decade of phylogenomic studies, which mine information in a genomic census of millions of encoded proteins and RNAs. We show how the use of process models of molecular accumulation that comply with Weston's generality criterion supports a consistent phylogenomic scenario in which the origin of diversified life can be traced back to the early history of Archaea.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Kaiyue Zhou
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jay E. Mittenthal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, Institute for Genomic Biology and Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Feng-Jie Sun
- School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA 30043, USA
- Kyung Mo Kim
- Microbial Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Republic of Korea
29
Frequent gene fissions associated with human pathogenic bacteria. Genomics 2014; 103:65-75. [DOI: 10.1016/j.ygeno.2014.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/08/2013] [Revised: 01/21/2014] [Accepted: 02/01/2014] [Indexed: 01/05/2023]
30
Hleap JS, Susko E, Blouin C. Defining structural and evolutionary modules in proteins: a community detection approach to explore sub-domain architecture. BMC STRUCTURAL BIOLOGY 2013; 13:20. [PMID: 24131821 PMCID: PMC4016585 DOI: 10.1186/1472-6807-13-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/20/2013] [Accepted: 10/11/2013] [Indexed: 12/23/2022]
Abstract
Background: Assessing protein modularity is important to understand protein evolution. Still, the question of the existence of a sub-domain modular architecture remains. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by optimizing the partition that maximizes the modularity score. Second, each cluster is tested for significance. Significant clusters are referred to as modules. Evolutionary modules are identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present here a methodology to identify sub-domain architecture that is robust, biologically meaningful, and statistically supported.
Results: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when there is a low correlation between landmarks within a module. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the Niemann-Pick C1 (NPC1) protein N-terminal domain. The α-amylase contains an (α/β)8 barrel (TIM barrel) with the polysaccharide cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in the intracellular lipid metabolism coordinating sterol trafficking. The NPC1 N-terminus is the first luminal domain which binds to cholesterol and its oxygenated derivatives. Our inferred dynamic modules in the protein NPC1 are also shown to match functional components of the protein related to the NPC1 disease.
Conclusions: A domain compartmentalization can be found and described in correlation space. To our knowledge, there is no other method attempting to identify sub-domain architecture from the correlation among residues. Most attempts focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservation. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in the α-amylase, and sterol opening-defining modules and disease-related residues in the NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.
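The first step of this approach optimizes a partition against a modularity score that the abstract does not define. As a hedged illustration only, the sketch below assumes the standard Newman-Girvan modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), on a weighted graph whose nodes could stand for residues or structural landmarks. It only scores a given partition; it does not reproduce the authors' optimization or their significance and power testing, and the function names are hypothetical.

```python
from itertools import product


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of a weighted, undirected graph.

    adj: symmetric n x n list of lists with non-negative edge weights
    labels: labels[i] is the cluster (candidate module) of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # weighted degree k_i of each node
    two_m = sum(degree)                 # 2m: every edge weight counted twice
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i, j in product(range(n), repeat=2):
        if labels[i] == labels[j]:
            # observed weight minus the weight expected from degrees alone
            q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def graph_from_edges(n, edges):
    """Build a symmetric, unweighted adjacency matrix from an edge list."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


# Toy example: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
two_triangles = graph_from_edges(
    6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
)
print(modularity(two_triangles, [0, 0, 0, 1, 1, 1]))  # about 0.357 (5/14)
print(modularity(two_triangles, [0] * 6))             # approximately 0
```

A higher Q means more within-cluster weight than expected from node degrees alone; an optimizer would search over `labels` to maximize it before any per-cluster significance test.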
Affiliation(s)
- Jose Sergio Hleap
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, B3H 4R2, Canada.
31
Caetano-Anollés G, Wang M, Caetano-Anollés D. Structural phylogenomics retrodicts the origin of the genetic code and uncovers the evolutionary impact of protein flexibility. PLoS One 2013; 8:e72225. [PMID: 23991065 PMCID: PMC3749098 DOI: 10.1371/journal.pone.0072225] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 03/28/2013] [Accepted: 07/07/2013] [Indexed: 11/18/2022]
Abstract
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the 'operational' RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
32
Hsu CH, Chen CK, Hwang MJ. The architectural design of networks of protein domain architectures. Biol Lett 2013; 9:20130268. [PMID: 23760167 DOI: 10.1098/rsbl.2013.0268] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
Affiliation(s)
- Chia-Hsin Hsu
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, Republic of China
33
Syamaladevi DP, Joshi A, Sowdhamini R. An alignment-free domain architecture similarity search (ADASS) algorithm for inferring homology between multi-domain proteins. Bioinformation 2013; 9:491-9. [PMID: 23861564 PMCID: PMC3705623 DOI: 10.6026/97320630009491] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/19/2012] [Revised: 01/01/2013] [Accepted: 01/02/2013] [Indexed: 11/23/2022]
Abstract
Annotations of genes and their products are largely guided by inferring homology. Sequence similarity is the primary measure used for annotation purposes; however, domain content and order have been given less importance, despite the fact that domain insertions, deletions, and positional changes can bring in functional variety. Of late, several methods have been developed to quantify domain architecture similarity, but they depend on alignments of the sequences and focus only on homologous proteins. We present an alignment-free domain architecture-similarity search (ADASS) algorithm that identifies proteins that share very poor sequence similarity yet have similar domain architectures. We introduce a “singlet matching-triplet comparison” method in ADASS, wherein a triplet of domains is compared with other triplets in a pair-wise comparison of two domain architectures. Different events in the triplet comparison are scored as per a scoring scheme, and an average pairwise distance score (Domain Architecture Distance score, DAD score) is calculated between protein domain architectures. We use domain architectures of a selected domain, termed the centric domain, and cluster them based on the DAD score. The algorithm has a high positive predictive value (PPV) with respect to the clustering of the sequences of selected domain architectures. A comparison of domain-architecture-based dendrograms using the ADASS method and an existing method revealed that ADASS can classify proteins depending on the extent of domain-architecture-level similarity. ADASS is more relevant in cases of proteins with tiny domains that contribute little to the overall sequence similarity but contribute significantly to the overall function.
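The abstract names a “singlet matching-triplet comparison” but does not give the ADASS scoring scheme, so the following is only a simplified, hypothetical stand-in. Each architecture is an ordered list of domain names, a pair must share at least one domain (a crude singlet filter), and the distance is the Jaccard distance between the sets of contiguous domain triplets. It is not the DAD score; the domain names and function names are illustrative assumptions.

```python
def triplets(arch):
    """Return the set of contiguous, order-preserving domain triplets.

    Architectures with fewer than three domains are kept whole as a single
    tuple so they can still be compared (an assumption, not an ADASS rule).
    """
    if len(arch) < 3:
        return {tuple(arch)}
    return {tuple(arch[i:i + 3]) for i in range(len(arch) - 2)}


def triplet_distance(a, b):
    """Jaccard distance between the triplet sets of two architectures.

    Pairs that share no single domain get the maximal distance 1.0, a crude
    stand-in for the 'singlet matching' step named in the abstract.
    """
    if not set(a) & set(b):
        return 1.0
    ta, tb = triplets(a), triplets(b)
    return 1.0 - len(ta & tb) / len(ta | tb)


src_like = ["SH3", "SH2", "Kinase"]
src_like_ph = ["SH3", "SH2", "Kinase", "PH"]
print(triplet_distance(src_like, src_like))     # 0.0
print(triplet_distance(src_like, src_like_ph))  # 0.5
print(triplet_distance(src_like, ["PDZ"]))      # 1.0
```

Because triplets preserve domain order, swapping two domains changes the score even when domain content is identical, which is the kind of signal the abstract argues sequence similarity alone can miss.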
Affiliation(s)
- Divya P Syamaladevi
- Sugarcane Breeding Institute Indian Council of Agricultural Research Coimbatore, India, PIN 641 007 ; National Center for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India
34
Kamneva OK, Knight SJ, Liberles DA, Ward NL. Analysis of genome content evolution in PVC bacterial super-phylum: assessment of candidate genes associated with cellular organization and lifestyle. Genome Biol Evol 2013; 4:1375-90. [PMID: 23221607 PMCID: PMC3542564 DOI: 10.1093/gbe/evs113] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 12/16/2022]
Abstract
The Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) super-phylum contains bacteria with either complex cellular organization or simple cell structure; it also includes organisms of different lifestyles (pathogens, mutualists, commensal, and free-living). Genome content evolution of this group has not been studied in a systematic fashion, which would reveal genes underlying the emergence of PVC-specific phenotypes. Here, we analyzed the evolutionary dynamics of 26 PVC genomes and several outgroup species. We inferred HGT, duplications, and losses by reconciliation of 27,123 gene trees with the species phylogeny. We showed that genome expansion and contraction have driven evolution within Planctomycetes and Chlamydiae, respectively, and balanced each other in Verrucomicrobia and Lentisphaerae. We also found that for a large number of genes in PVC genomes the most similar sequences are present in Acidobacteria, suggesting past and/or current ecological interaction between organisms from these groups. We also found evidence of shared ancestry between carbohydrate degradation genes in the mucin-degrading human intestinal commensal Akkermansia muciniphila and sequences from Acidobacteria and Bacteroidetes, suggesting that glycoside hydrolases are transferred laterally between gut microbes and that the process of carbohydrate degradation is crucial for microbial survival within the human digestive system. Further, we identified a highly conserved genetic module preferentially present in compartmentalized PVC species and possibly associated with the complex cell plan in these organisms. This conserved machinery is likely to be membrane targeted and involved in electron transport, although its exact function is unknown. These genes represent good candidates for future functional studies.
Affiliation(s)
- Olga K Kamneva
- Department of Molecular Biology, University of Wyoming, WY, USA
35
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
36
Bornberg-Bauer E, Albà MM. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol 2013; 23:459-66. [PMID: 23562500 DOI: 10.1016/j.sbi.2013.02.012] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/08/2013] [Revised: 02/15/2013] [Accepted: 02/15/2013] [Indexed: 11/29/2022]
Abstract
During protein evolution, novel domain arrangements are continuously formed. Rearrangements are important for the creation of molecular biodiversity and for functional molecular changes which underlie developmental shifts in the bauplan of organisms. Here we review the mechanisms by which new arrangements arise and the potential benefits of rearrangements. We concentrate on how new domains emerge and why they rapidly spread across genomes, gaining higher copy numbers than older, more established domains. This spread is most likely a consequence of their high adaptive potential but is unlikely to make up on its own for the drastic loss of domains, which is observed across different taxa. We show that a significant portion of the recently emerged domains, especially those in multidomain families, are highly disordered and speculate about the significance of these findings for the evolvability of novel genetic material.
Affiliation(s)
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Hüfferstrasse 1, D48149 Münster, Germany.
37
Bukhari SA, Caetano-Anollés G. Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLoS Comput Biol 2013; 9:e1003009. [PMID: 23555236 PMCID: PMC3610613 DOI: 10.1371/journal.pcbi.1003009] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/25/2012] [Accepted: 02/13/2013] [Indexed: 12/22/2022]
Abstract
The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural make up of proteins. Here we use phylogenomic methods and a census of CATH structures in hundreds of genomes to study the origin and diversification of protein architectures (A) and their associated topologies (T) and superfamilies (H). Phylogenies that describe the evolution of domain structures and proteomes were reconstructed from the structural census and used to generate timelines of domain discovery. Phylogenies of CATH domains at T and H levels of structural abstraction and associated chronologies revealed patterns of reductive evolution, the early rise of Archaea, three epochs in the evolution of the protein world, and patterns of structural sharing between superkingdoms. Phylogenies of proteomes confirmed the early appearance of Archaea. While these findings are in agreement with previous phylogenomic studies based on the SCOP classification, phylogenies unveiled sharing patterns between Archaea and Eukarya that are recent and can explain the canonical bacterial rooting typically recovered from sequence analysis. Phylogenies of CATH domains at A level uncovered general patterns of architectural origin and diversification. The tree of A structures showed that ancient structural designs such as the 3-layer (αβα) sandwich (3.40) or the orthogonal bundle (1.10) are comparatively simpler in their makeup and are involved in basic cellular functions. In contrast, modern structural designs such as prisms, propellers, 2-solenoid, super-roll, clam, trefoil and box are not widely distributed and were probably adopted to perform specialized functions. Our timelines therefore uncover a universal tendency towards protein structural complexity that is remarkable.
Affiliation(s)
- Syed Abbas Bukhari
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
38
Xing S, Li M, Liu P. Evolution of S-domain receptor-like kinases in land plants and origination of S-locus receptor kinases in Brassicaceae. BMC Evol Biol 2013; 13:69. [PMID: 23510165 PMCID: PMC3616866 DOI: 10.1186/1471-2148-13-69] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/08/2011] [Accepted: 03/12/2013] [Indexed: 01/31/2023]
Abstract
Background: The S-domain serine/threonine receptor-like kinases (SRLKs) comprise one of the largest and most rapidly expanding subfamilies in the plant receptor-like/Pelle kinase (RLKs) family. The founding member of this subfamily, the S-locus receptor kinase (SRK), functions as the female determinant of specificity in the self-incompatibility (SI) responses of crucifers. Two classes of proteins resembling the extracellular S domain (designated S-domain receptor-like proteins, SRLPs) or the intracellular kinase domain (designated S-domain receptor-like cytoplasmic kinases, SRLCKs) of SRK are also ubiquitous in land plants, indicating that the SRLKs are composite molecules that originated by domain fusion of the two component proteins. Here, we explored the origin and diversification of SRLKs by phylogenomic methods.
Results: Based on the distribution patterns of SRLKs and SRLCKs in a reconciled species-domain tree, a maximum parsimony model was established for simultaneously inferring and dating gene duplication/loss and fusion/fission events in SRLK evolution. Various SRK alleles from crucifer species were then included in our phylogenetic analyses to infer the origination of SRKs by identifying the proper outgroups.
Conclusions: Two gene fusion events were inferred, and the major gene fusion event, which occurred in the common ancestor of land plants, generated almost all extant SRLKs. The functional diversification of duplicated SRLKs was illustrated by molecular evolution analyses of SRKs. Our findings support that SRKs originated as two ancient haplotypes derived from a pair of tandem duplicate genes through random regulatory neo-/sub-functionalization in the common ancestor of the Brassicaceae.
Affiliation(s)
- Shilai Xing
- Department of Ecology, College of Resources and Environmental Sciences, China Agricultural University, Beijing 100193, People's Republic of China
39
Moore AD, Grath S, Schüler A, Huylmans AK, Bornberg-Bauer E. Quantification and functional analysis of modular protein evolution in a dense phylogenetic tree. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1834:898-907. [PMID: 23376183 DOI: 10.1016/j.bbapap.2013.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/19/2012] [Revised: 01/06/2013] [Accepted: 01/09/2013] [Indexed: 12/24/2022]
Abstract
Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signaling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Similarly, protein domains are the modules of proteins, and modular domain rearrangements can create diversity with seemingly few operations in turn allowing for swift changes to an organism's functional repertoire. Here, we assess the patterns and functional effects of modular rearrangements at high resolution. Using a well resolved and diverse group of pancrustaceans, we illustrate arrangement diversity within closely related organisms, estimate arrangement turnover frequency and establish, for the first time, branch-specific rate estimates for fusion, fission, domain addition and terminal loss. Our results show that roughly 16 new arrangements arise per million years and that between 64% and 81% of these can be explained by simple, single-step modular rearrangement events. We find evidence that the frequencies of fission and terminal deletion events increase over time, and that modular rearrangements impact all levels of the cellular signaling apparatus and thus may have strong adaptive potential. Novel arrangements that cannot be explained by simple modular rearrangements contain a significant amount of repeat domains that occur in complex patterns which we term "supra-repeats". Furthermore, these arrangements are significantly longer than those with a single-step rearrangement solution, suggesting that such arrangements may result from multi-step events. In summary, our analysis provides an integrated view and initial quantification of the patterns and functional impact of modular protein evolution in a well resolved phylogenetic tree. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Affiliation(s)
- Andrew D Moore
- Institute for Evolution and Biodiversity, Münster, Germany
40
Zmasek CM, Godzik A. This Déjà vu feeling--analysis of multidomain protein evolution in eukaryotic genomes. PLoS Comput Biol 2012; 8:e1002701. [PMID: 23166479 PMCID: PMC3499355 DOI: 10.1371/journal.pcbi.1002701] [Received: 02/24/2012] [Accepted: 07/27/2012]
Abstract
Evolutionary innovation in eukaryotes and especially animals is at least partially driven by genome rearrangements and the resulting emergence of proteins with new domain combinations, and thus potentially novel functionality. Given the random nature of such rearrangements, one could expect that proteins with particularly useful multidomain combinations may have been rediscovered multiple times by parallel evolution. However, existing reports suggest a minimal role of this phenomenon in the overall evolution of eukaryotic proteomes. We assembled a collection of 172 complete eukaryotic genomes that is not only the largest, but also the most phylogenetically complete set of genomes analyzed so far. By employing a maximum parsimony approach to compare repertoires of Pfam domains and their combinations, we show that independent evolution of domain combinations is significantly more prevalent than previously thought. Our results indicate that about 25% of all currently observed domain combinations have evolved multiple times. Interestingly, this percentage is even higher for sets of domain combinations in individual species, with, for instance, 70% of the domain combinations found in the human genome having evolved independently at least once in other species. We also show that previous, much lower estimates of this rate are most likely due to the small number and biased phylogenetic distribution of the genomes analyzed. The process of independent emergence of identical domain combinations is widespread, not limited to domains with specific functional categories. Besides data from large-scale analyses, we also present individual examples of independent domain combination evolution. The surprisingly large contribution of parallel evolution to the development of the domain combination repertoire in extant genomes has profound consequences for our understanding of the evolution of pathways and cellular processes in eukaryotes and for comparative functional genomics.
Most proteins in eukaryotes are composed of two or more domains, evolutionarily independent units with (often) their own individual functions. The specific repertoire of multidomain proteins in a given species defines the topology of pathways and networks that carry out its metabolic and regulatory processes. When proteins with new domain combinations emerge by gene fusion and fission, it directly affects the topology of cellular networks in this organism. To better understand the evolution of such networks we analyzed a large set of eukaryotic genomes for the evolutionary history of known domain combinations. Our analysis shows that 70% of all domain combinations present in the human genome independently appeared in at least one other eukaryotic genome. Overall, over 25% of all known multidomain architectures emerged independently several times in the history of life. The difference between a global and species specific picture can be explained by the existence of a core set of domain combinations that keeps reemerging in different species, which are accompanied by a smaller number of unique domain combinations that do not appear anywhere else.
Affiliation(s)
- Christian M. Zmasek
- Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Adam Godzik
- Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
41
Zhao S, Liang Z, Demko V, Wilson R, Johansen W, Olsen OA, Shalchian-Tabrizi K. Massive expansion of the calpain gene family in unicellular eukaryotes. BMC Evol Biol 2012; 12:193. [PMID: 23020305 PMCID: PMC3563603 DOI: 10.1186/1471-2148-12-193] [Received: 05/15/2012] [Accepted: 09/24/2012]
Abstract
Background Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, little is known about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Results Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total, we identify 41 different calpain domain architectures, 28 of which have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. Conclusions The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.
Affiliation(s)
- Sen Zhao
- Microbial Evolution Research Group (MERG), Department of Biology, University of Oslo, Oslo, N-0136, Norway
42
Caetano-Anollés G, Nasir A. Benefits of using molecular structure and abundance in phylogenomic analysis. Front Genet 2012; 3:172. [PMID: 22973296 PMCID: PMC3434437 DOI: 10.3389/fgene.2012.00172] [Received: 07/25/2012] [Accepted: 08/18/2012]
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana-Champaign, IL, USA
43
Suen S, Lu HHS, Yeang CH. Evolution of domain architectures and catalytic functions of enzymes in metabolic systems. Genome Biol Evol 2012; 4:976-93. [PMID: 22936075 PMCID: PMC3468959 DOI: 10.1093/gbe/evs072]
Abstract
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic systems evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, their evolution of domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
Affiliation(s)
- Summit Suen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
44
Nasir A, Kim KM, Caetano-Anolles G. Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evol Biol 2012; 12:156. [PMID: 22920653 PMCID: PMC3570343 DOI: 10.1186/1471-2148-12-156] [Received: 05/04/2012] [Accepted: 08/22/2012]
Abstract
Background The discovery of giant viruses with genome and physical size comparable to cellular organisms, remnants of protein translation machinery and virus-specific parasites (virophages) has raised intriguing questions about their origin. Evidence advocates for their inclusion into global phylogenomic studies and their consideration as a distinct and ancient form of life. Results Here we reconstruct phylogenies describing the evolution of proteomes and protein domain structures of cellular organisms and double-stranded DNA viruses with medium-to-very-large proteomes (giant viruses). Trees of proteomes define viruses as a ‘fourth supergroup’ along with superkingdoms Archaea, Bacteria, and Eukarya. Trees of domains indicate they have evolved via massive and primordial reductive evolutionary processes. The distribution of domain structures suggests giant viruses harbor a significant number of protein domains including those with no cellular representation. The genomic and structural diversity embedded in the viral proteomes is comparable to the cellular proteomes of organisms with parasitic lifestyles. Since viral domains are widespread among cellular species, we propose that viruses mediate gene transfer between cells and crucially enhance biodiversity. Conclusions Results call for a change in the way viruses are perceived. They likely represent a distinct form of life that either predated or coexisted with the last universal common ancestor (LUCA) and constitute a very crucial part of our planet’s biosphere.
Affiliation(s)
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Science, University of Illinois, Urbana, IL 61801, USA
45
Leclère L, Rentzsch F. Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins. Genome Biol Evol 2012; 4:883-99. [PMID: 22813778 PMCID: PMC3516229 DOI: 10.1093/gbe/evs061] [Accepted: 07/11/2012]
Abstract
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help in understanding protein evolution.
Affiliation(s)
- Lucas Leclère
- Sars International Centre for Marine Molecular Biology, University of Bergen, Norway.
46
Alvarez-Venegas R, Avramova Z. Evolution of the PWWP-domain encoding genes in the plant and animal lineages. BMC Evol Biol 2012; 12:101. [PMID: 22734652 PMCID: PMC3457860 DOI: 10.1186/1471-2148-12-101] [Received: 09/07/2011] [Accepted: 06/06/2012]
Abstract
BACKGROUND Conserved domains are recognized as the building blocks of eukaryotic proteins. Domains showing a tendency to occur in diverse combinations ('promiscuous' domains) are involved in versatile architectures in proteins with different functions. Current models, based on global-level analyses of domain combinations in multiple genomes, have suggested that the propensity of some domains to associate with other domains in high-level architectures increases with organismal complexity. Alternative models using domain-based phylogenetic trees propose that domains have become promiscuous independently in different lineages through convergent evolution and are, thus, random with no functional or structural preferences. Here we test whether complex protein architectures have occurred by accretion from simpler systems and whether the appearance of multidomain combinations parallels organismal complexity. As a model, we analyze the modular evolution of the PWWP domain and ask whether its appearance in combinations with other domains into multidomain architectures is linked with the occurrence of more complex life-forms. Whether high-level combinations of domains are conserved and transmitted as stable units (cassettes) through evolution is examined in the genomes of plant or metazoan species selected for their established position in the evolution of the respective lineages. RESULTS Using the domain-tree approach, we analyze the evolutionary origins and distribution patterns of the promiscuous PWWP domain to understand the principles of its modular evolution and its existence in combination with other domains in higher-level protein architectures. We found that as a single module the PWWP domain occurs only in proteins with a limited, mainly species-specific distribution. Earlier, it was suggested that domain promiscuity is a fast-changing (volatile) feature shaped by natural selection and that only a few domains retain their promiscuity status throughout evolution.
In contrast, our data show that most of the multidomain PWWP combinations in extant multicellular organisms (humans or land plants) are present in their unicellular ancestral relatives suggesting they have been transmitted through evolution as conserved linear arrangements ('cassettes'). Among the most interesting biologically relevant results is the finding that the genes of the two plant Trithorax family subgroups (ATX1/2 and ATX3/4/5) have different phylogenetic origins. The two subgroups occur together in the earliest land plants Physcomitrella patens and Selaginella moellendorffii. CONCLUSION Gain/loss of a single PWWP domain is observed throughout evolution reflecting dynamic lineage- or species-specific events. In contrast, higher-level protein architectures involving the PWWP domain have survived as stable arrangements driven by evolutionary descent. The association of PWWP domains with the DNA methyltransferases in O. tauri and in the metazoan lineage seems to have occurred independently consistent with convergent evolution. Our results do not support models wherein more complex protein architectures involving the PWWP domain occur with the appearance of more evolutionarily advanced life forms.
Affiliation(s)
- Raúl Alvarez-Venegas
- Department of Genetic Engineering, Centro de Investigación y de Estudios Avanzados, Unidad Irapuato, Irapuato Gto., 36821, Mexico
47
Kim KM, Caetano-Anollés G. The evolutionary history of protein fold families and proteomes confirms that the archaeal ancestor is more ancient than the ancestors of other superkingdoms. BMC Evol Biol 2012; 12:13. [PMID: 22284070 PMCID: PMC3306197 DOI: 10.1186/1471-2148-12-13] [Received: 10/23/2011] [Accepted: 01/27/2012]
Abstract
Background The entire evolutionary history of life can be studied using myriad sequences generated by genomic research. This includes the appearance of the first cells and of superkingdoms Archaea, Bacteria, and Eukarya. However, the use of molecular sequence information for deep phylogenetic analyses is limited by mutational saturation, differential evolutionary rates, lack of sequence site independence, and other biological and technical constraints. In contrast, protein structures are evolutionary modules that are highly conserved and diverse enough to enable deep historical exploration. Results Here we build phylogenies that describe the evolution of proteins and proteomes. These phylogenetic trees are derived from a genomic census of protein domains defined at the fold family (FF) level of structural classification. Phylogenomic trees of FF structures were reconstructed from genomic abundance levels of 2,397 FFs in 420 proteomes of free-living organisms. These trees defined timelines of domain appearance, with time spanning from the origin of proteins to the present. Timelines are divided into five different evolutionary phases according to patterns of sharing of FFs among superkingdoms: (1) a primordial protein world, (2) reductive evolution and the rise of Archaea, (3) the rise of Bacteria from the common ancestor of Bacteria and Eukarya and early development of the three superkingdoms, (4) the rise of Eukarya and widespread organismal diversification, and (5) eukaryal diversification. The relative ancestry of the FFs shows that reductive evolution by domain loss is dominant in the first three phases and is responsible for both the diversification of life from a universal cellular ancestor and the appearance of superkingdoms. On the other hand, domain gains are predominant in the last two phases and are responsible for organismal diversification, especially in Bacteria and Eukarya. 
Conclusions The evolution of functions that are associated with corresponding FFs along the timeline reveals that primordial metabolic domains evolved earlier than informational domains involved in translation and transcription, supporting the metabolism-first hypothesis rather than the RNA world scenario. In addition, phylogenomic trees of proteomes reconstructed from FFs appearing in each of the five phases of the protein world show that trees reconstructed from ancient domain structures were consistently rooted in archaeal lineages, supporting the proposal that the archaeal ancestor is more ancient than the ancestors of other superkingdoms.
Affiliation(s)
- Kyung Mo Kim
- Evolutionary Bioinformatics Laboratory, Department of Crop Science, University of Illinois, Urbana, IL 61801, USA
48
Kersting AR, Bornberg-Bauer E, Moore AD, Grath S. Dynamics and adaptive benefits of protein domain emergence and arrangements during plant genome evolution. Genome Biol Evol 2012; 4:316-29. [PMID: 22250127 PMCID: PMC3318442 DOI: 10.1093/gbe/evs004]
Abstract
Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet for plants, is modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangements of protein domains, their structural, functional, and evolutionary building blocks. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged more or less recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are disproportionately involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically impact metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing.
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for environmental adaptability of plants.
Affiliation(s)
- Anna R Kersting
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster (WWU), Germany
49
The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis. J Mol Evol 2012; 74:1-34. [PMID: 22210458 DOI: 10.1007/s00239-011-9480-1] [Received: 04/19/2011] [Accepted: 12/12/2011]
Abstract
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal that protocell membranes played a crucial role in early protein evolution and show that translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
50
Wu YC, Rasmussen MD, Kellis M. Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny. Mol Biol Evol 2011; 29:689-705. [PMID: 21900599 PMCID: PMC3258039 DOI: 10.1093/molbev/msr222]
Abstract
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.
Affiliation(s)
- Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, USA.