Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Brenner SE, Hubbard T, Murzin A, Chothia C. Gene duplications in H. influenzae. Nature 1995;378:140. [PMID: 7477316 DOI: 10.1038/378140a0] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.6] [Indexed: 01/25/2023]

Cited by Other Article(s)

Borriello E, Walker SI, Laubichler MD. Cell phenotypes as macrostates of the GRN dynamics. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2020;334:213-224. [DOI: 10.1002/jez.b.22938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/07/2018] [Revised: 02/16/2020] [Accepted: 02/17/2020] [Indexed: 01/04/2023]

Will WR, Brzovic P, Le Trong I, Stenkamp RE, Lawrenz MB, Karlinsey JE, Navarre WW, Main-Hester K, Miller VL, Libby SJ, Fang FC. The Evolution of SlyA/RovA Transcription Factors from Repressors to Countersilencers in Enterobacteriaceae. mBio 2019;10:e00009-19. [PMID: 30837332 PMCID: PMC6401476 DOI: 10.1128/mbio.00009-19] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/02/2019] [Accepted: 01/29/2019] [Indexed: 02/02/2023]

Abstract

Gene duplication and subsequent evolutionary divergence have allowed conserved proteins to develop unique roles. The MarR family of transcription factors (TFs) has undergone extensive duplication and diversification in bacteria, where they act as environmentally responsive repressors of genes encoding efflux pumps that confer resistance to xenobiotics, including many antimicrobial agents. We have performed structural, functional, and genetic analyses of representative members of the SlyA/RovA lineage of MarR TFs, which retain some ancestral functions, including repression of their own expression and that of divergently transcribed multidrug efflux pumps, as well as allosteric inhibition by aromatic carboxylate compounds. However, SlyA and RovA have acquired the ability to countersilence horizontally acquired genes, which has greatly facilitated the evolution of Enterobacteriaceae by horizontal gene transfer. SlyA/RovA TFs in different species have independently evolved novel regulatory circuits to provide the enhanced levels of expression required for their new role. Moreover, in contrast to MarR, SlyA is not responsive to copper. These observations demonstrate the ability of TFs to acquire new functions as a result of evolutionary divergence of both cis-regulatory sequences and in trans interactions with modulatory ligands.

IMPORTANCE

Bacteria primarily evolve via horizontal gene transfer, acquiring new traits such as virulence and antibiotic resistance in single transfer events. However, newly acquired genes must be integrated into existing regulatory networks to allow appropriate expression in new hosts. This is accommodated in part by the opposing mechanisms of xenogeneic silencing and countersilencing. An understanding of these mechanisms is necessary to understand the relationship between gene regulation and bacterial evolution. Here we examine the functional evolution of an important lineage of countersilencers belonging to the ancient MarR family of classical transcriptional repressors. We show that although members of the SlyA lineage retain some ancestral features associated with the MarR family, their cis-regulatory sequences have evolved significantly to support their new function. Understanding the mechanistic requirements for countersilencing is critical to understanding the pathoadaptation of emerging pathogens and also has practical applications in synthetic biology.

Strygina KV, Börner A, Khlestkina EK. Identification and characterization of regulatory network components for anthocyanin synthesis in barley aleurone. BMC PLANT BIOLOGY 2017;17:184. [PMID: 29143621 PMCID: PMC5688479 DOI: 10.1186/s12870-017-1122-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 05/08/2023]

Abstract

BACKGROUND

Natural populations of barley (Hordeum vulgare L.) show a range of grain colours. The colour of barley grains is directly related to the accumulation of different pigments in the aleurone layer, pericarp and lemma. Blue grain colour is due to the accumulation of anthocyanins in the aleurone layer, which depends on the presence of five Blx genes that have not yet been sequenced (Blx1, Blx3 and Blx4 clustering on chromosome 4HL, and Blx2 and Blx5 on 7HL). Because of the health benefits of anthocyanins, blue-grained barley can be considered as a dietary food source. The goal of the current study was to identify and characterize components of the anthocyanin synthesis regulatory network for the aleurone layer in barley.

RESULTS

The candidate genes for components of the regulatory complex MBW (consisting of transcription factors MYB, bHLH/MYC and WD40) for anthocyanin synthesis in barley aleurone were identified. These genes were designated HvMyc2 (4HL), HvMpc2 (4HL), and HvWD40 (6HL). HvMyc2 was expressed in aleurone cells only. A loss-of-function (frame shift) mutation in HvMyc2 was found in non-coloured barley but not in blue-grained barley. Unlike aleurone-specific HvMyc2, the HvMpc2 gene was expressed in different tissues; however, its activity was not detected in non-coloured aleurone, in contrast to coloured aleurone, and allele-specific mutations in its promoter region were found. The single-copy gene HvWD40, which encodes the required component of the regulatory MBW complex, was expressed constantly in coloured and non-coloured tissues and had no allelic differences. HvMyc2 and HvMpc2 were genetically mapped using newly developed allele-specific CAPS markers. HvMyc2 was mapped between SSR loci XGBS0875-4H (3.4 cM distal) and XGBM1048-4H (3.4 cM proximal), matching the region of chromosome 4HL where the Blx cluster was found. In this position, one of the anthocyanin biosynthesis structural genes (HvF3'5'H) was also mapped using an allele-specific CAPS marker developed in the current study.

CONCLUSIONS

The genes involved in anthocyanin synthesis in the barley aleurone layer were identified and characterized, including components of the regulatory complex MBW, from which the MYC-encoding gene (HvMyc2) appeared to be the main factor underlying variation of barley by aleurone colour.

Gene-Family Extension Measures and Correlations. Life (Basel) 2016;6:life6030030. [PMID: 27527218 PMCID: PMC5041006 DOI: 10.3390/life6030030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2016] [Revised: 07/18/2016] [Accepted: 07/18/2016] [Indexed: 12/28/2022]

Cai S, Liu Z, Lee HC. Mean field theory for biology inspired duplication-divergence network model. CHAOS (WOODBURY, N.Y.) 2015;25:083106. [PMID: 26328557 DOI: 10.1063/1.4928212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]

Espinoza-Valles I, Vora GJ, Lin B, Leekitcharoenphon P, González-Castillo A, Ussery D, Høj L, Gomez-Gil B. Unique and conserved genome regions in Vibrio harveyi and related species in comparison with the shrimp pathogen Vibrio harveyi CAIM 1792. MICROBIOLOGY-SGM 2015. [PMID: 26198743 DOI: 10.1099/mic.0.000141] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]

Yadav A, Jalan S. Origin and implications of zero degeneracy in networks spectra. CHAOS (WOODBURY, N.Y.) 2015;25:043110. [PMID: 25933658 DOI: 10.1063/1.4917286] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 05/11/2023]

Nunes A, Borrego MJ, Gomes JP. Genomic features beyond Chlamydia trachomatis phenotypes: what do we think we know? INFECTION GENETICS AND EVOLUTION 2013;16:392-400. [PMID: 23523596 DOI: 10.1016/j.meegid.2013.03.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 01/11/2013] [Revised: 02/25/2013] [Accepted: 03/13/2013] [Indexed: 10/27/2022]

Roach JM, Racioppi L, Jones CD, Masci AM. Phylogeny of Toll-like receptor signaling: adapting the innate response. PLoS One 2013;8:e54156. [PMID: 23326591 PMCID: PMC3543326 DOI: 10.1371/journal.pone.0054156] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 09/28/2012] [Accepted: 12/10/2012] [Indexed: 02/06/2023]

Early Career Research Award Lecture. Structure, evolution and dynamics of transcriptional regulatory networks. Biochem Soc Trans 2011;38:1155-78. [PMID: 20863280 DOI: 10.1042/bst0381155] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]

Soyer OS, Creevey CJ. Duplicate retention in signalling proteins and constraints from network dynamics. J Evol Biol 2010;23:2410-21. [DOI: 10.1111/j.1420-9101.2010.02101.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/09/2023]

Sleator RD. An overview of the processes shaping protein evolution. Sci Prog 2010;93:1-6. [PMID: 20222353 PMCID: PMC10365403 DOI: 10.3184/003685009x12605492662844] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]

Farré D, Albà MM. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol 2009;27:325-35. [PMID: 19822635 DOI: 10.1093/molbev/msp242] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022]

Genomic and structural aspects of protein evolution. Biochem J 2009;419:15-28. [PMID: 19272021 DOI: 10.1042/bj20090122] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Indexed: 01/06/2023]

Kummerfeld SK, Teichmann SA. Protein domain organisation: adding order. BMC Bioinformatics 2009;10:39. [PMID: 19178743 PMCID: PMC2657131 DOI: 10.1186/1471-2105-10-39] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 05/07/2008] [Accepted: 01/29/2009] [Indexed: 11/30/2022]

Abstract

Background

Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit.

Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved.

Results

We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance.

To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation.

Conclusion

Taking into account the order of domains, we have derived a novel picture of global protein organization. We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and reverse orientation in different proteins relative to random graphs with identical degree distributions. While these features were statistically over-represented, they are still fairly rare. Looking in detail at the proteins involved, we found strong functional relationships within each cluster. In addition, the domains tended to be involved in protein-protein interaction and are able to function as independent structural units. A particularly striking example was the human Jak-STAT signalling pathway which makes use of a set of domains in a range of orders and orientations to provide nuanced signaling functionality. This illustrated the importance of functional and structural constraints (or lack thereof) on domain organisation.
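The directed-graph representation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the protein identifiers, the domain architectures, and the helper names `build_domain_graph` and `bidirectional_pairs` are hypothetical, and linking only immediately adjacent domains is an assumption about how "relative order" is encoded.

```python
from collections import defaultdict

# Hypothetical architectures: protein id -> domains listed from N to C terminus.
ARCHITECTURES = {
    "P1": ["SH2", "SH3", "Kinase"],
    "P2": ["SH3", "SH2", "Kinase"],
    "P3": ["PH", "Kinase"],
}


def build_domain_graph(architectures):
    """Map each directed edge (a, b) to the proteins in which a directly precedes b."""
    edges = defaultdict(set)
    for protein, domains in architectures.items():
        for left, right in zip(domains, domains[1:]):
            # Assumption: tandem repeats of one domain are not drawn as self-loops.
            if left != right:
                edges[(left, right)].add(protein)
    return edges


def bidirectional_pairs(edges):
    """Return domain pairs seen in both orders across at least two different proteins."""
    pairs = set()
    for (a, b), forward in edges.items():
        reverse = edges.get((b, a), set())
        # a < b reports each unordered pair once, in alphabetical order.
        if a < b and reverse and len(forward | reverse) > 1:
            pairs.add((a, b))
    return pairs


edges = build_domain_graph(ARCHITECTURES)
print(sorted(edges))
print(bidirectional_pairs(edges))
```

Assessing over-representation, as the abstract does, would additionally require comparing these counts against random graphs with the same degree distribution; that step is omitted here.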

Sales-Pardo M, Chan AOB, Amaral LAN, Guimerà R. Evolution of protein families: is it possible to distinguish between domains of life? Gene 2007;402:81-93. [PMID: 17826006 PMCID: PMC2441766 DOI: 10.1016/j.gene.2007.07.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/03/2007] [Revised: 07/18/2007] [Accepted: 07/23/2007] [Indexed: 11/28/2022]

Zhang Z, Liu C, Skogerbø G, Zhu X, Lu H, Chen L, Shi B, Zhang Y, Wang J, Wu T, Chen R. Dynamic changes in subgraph preference profiles of crucial transcription factors. PLoS Comput Biol 2006;2:e47. [PMID: 16699597 PMCID: PMC1458966 DOI: 10.1371/journal.pcbi.0020047] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 09/06/2005] [Accepted: 03/24/2006] [Indexed: 12/05/2022]

Abstract

Transcription factors with a large number of target genes—transcription hub(s), or THub(s)—are usually crucial components of the regulatory system of a cell, and the different patterns through which they transfer the transcriptional signal to downstream cascades are of great interest. By profiling normalized abundances (A_N) of basic regulatory patterns of individual THubs in the yeast Saccharomyces cerevisiae transcriptional regulation network under five different cellular states and environmental conditions, we have investigated their preferences for different basic regulatory patterns. Subgraph-normalized abundances downstream of individual THubs often differ significantly from that of the network as a whole, and conversely, certain over-represented subgraphs are not preferred by any THub. The THub preferences changed substantially when the cellular or environmental conditions changed. This switching of regulatory pattern preferences suggests that a change in conditions does not only elicit a change in response by the regulatory network, but also a change in the mechanisms by which the response is mediated. The THub subgraph preference profile thus provides a novel tool for description of the structure and organization between the large-scale exponents and local regulatory patterns.

Transcription factors are proteins that bind to short segments of DNA, thereby controlling transcription and expression of other genes. Transcription factors may control a number of other genes, and in turn be controlled by other transcription factors, thus forming an extensive transcriptional network of control and counter-control, which acts through space and time in the cell. In transcriptional networks, transcription factors and their target genes form various patterns (called subgraphs or motifs) that are suspected of being of importance to how transcription factors exert their control of cellular processes. Zhang and colleagues have studied how a subset of transcription factors (called transcription hubs) utilizes such subgraphs in networks generated from yeast cells under various cellular states and environmental conditions. Their analyses show that different transcription hubs in the same network prefer different types of subgraphs, and that these preferences are not governed by subgraph frequencies in the network. They further show that when cellular conditions change, the transcription hubs frequently change their subgraph preferences, indicating that different modes of control require different types of subgraph use. These findings could have implications for our understanding of the mechanisms that underlie the fine-tuned control systems that govern a cell or an organism.

Affiliation(s)

Zhihua Zhang: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Changning Liu: Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Geir Skogerbø: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
Xiaopeng Zhu: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Hongchao Lu: Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Lan Chen: Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Baochen Shi: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Yong Zhang: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Jie Wang: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Tao Wu: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Graduate School of the Chinese Academy of Sciences, Beijing, China
Runsheng Chen: Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. * To whom correspondence should be addressed. E-mail:

Bonomo J, Warnecke T, Hume P, Marizcurrena A, Gill RT. A comparative study of metabolic engineering anti-metabolite tolerance in Escherichia coli. Metab Eng 2006;8:227-39. [PMID: 16497527 DOI: 10.1016/j.ymben.2005.12.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/09/2005] [Revised: 12/15/2005] [Accepted: 12/28/2005] [Indexed: 11/22/2022]

Price GA, Crooks GE, Green RE, Brenner SE. Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap. Bioinformatics 2005;21:3824-31. [PMID: 16105900 DOI: 10.1093/bioinformatics/bti627] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]

Li H, Pellegrini M, Eisenberg D. Detection of parallel functional modules by comparative analysis of genome sequences. Nat Biotechnol 2005;23:253-60. [PMID: 15696156 DOI: 10.1038/nbt1065] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]

Teichmann SA, Babu MM. Gene regulatory network growth by duplication. Nat Genet 2004;36:492-6. [PMID: 15107850 DOI: 10.1038/ng1340] [Citation(s) in RCA: 403] [Impact Index Per Article: 20.2] [Received: 12/12/2003] [Accepted: 03/01/2004] [Indexed: 11/09/2022]

Karev GP, Wolf YI, Rzhetsky AY, Berezovskaya FS, Koonin EV. Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol Biol 2002;2:18. [PMID: 12379152 PMCID: PMC137606 DOI: 10.1186/1471-2148-2-18] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.1] [Received: 09/03/2002] [Accepted: 10/14/2002] [Indexed: 11/17/2022]

Abstract

BACKGROUND

Power distributions appear in numerous biological, physical and other contexts, which appear to be fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.e. a greater likelihood of nodes being added to pre-existing hubs. Exploration of different types of evolutionary models in an attempt to determine which of them lead to power law distributions has the potential of revealing non-trivial aspects of genome evolution.

RESULTS

A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). The formulas for equilibrium frequencies of domain families of different size and the total number of families at equilibrium are derived for a general BDIM. All asymptotics of equilibrium frequencies of domain families possible for the given type of models are found and their appearance depending on model parameters is investigated. It is proved that the power law asymptotics appears if, and only if, the model is balanced, i.e. domain duplication and deletion rates are asymptotically equal up to the second order. It is further proved that any power asymptotic with the degree not equal to -1 can appear only if the hypothesis of independence of the duplication/deletion rates on the size of a domain family is rejected. Specific cases of BDIMs, namely simple, linear, polynomial and rational models, are considered in detail and the distributions of the equilibrium frequencies of domain families of different size are determined for each case. We apply the BDIM formalism to the analysis of the domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM. Calculation of the parameters of these models suggests surprisingly high innovation rates, comparable to the total domain birth (duplication) and elimination rates, particularly for prokaryotic genomes.

CONCLUSIONS

We show that a straightforward model of genome evolution, which does not explicitly include selection, is sufficient to explain the observed distributions of domain family sizes, in which power laws appear as asymptotic. However, for the model to be compatible with the data, there has to be a precise balance between domain birth, death and innovation rates, and this is likely to be maintained by selection. The developed approach is oriented at a mathematical description of evolution of domain composition of proteomes, but a simple reformulation could be applied to models of other evolving networks with preferential attachment.
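The link between balanced birth and death rates and power-law family-size distributions can be checked numerically with a short Python sketch. It is an illustration under stated assumptions, not the paper's derivation: it assumes that equilibrium frequencies of adjacent size classes satisfy the detailed-balance recurrence f(i+1) = f(i) * birth(i) / death(i+1), that family size is capped at `max_size`, and that a "linear" model has rates of the form i + a and i + b. The function names and the constants a = 1 and b = 2 are hypothetical.

```python
import math


def bdim_equilibrium(birth, death, max_size):
    """Normalized equilibrium frequencies for family sizes 1..max_size.

    Assumes the detailed-balance recurrence f(i+1) = f(i) * birth(i) / death(i+1).
    """
    freqs = [1.0]
    for size in range(1, max_size):
        freqs.append(freqs[-1] * birth(size) / death(size + 1))
    total = sum(freqs)
    return [f / total for f in freqs]


def loglog_slope(freqs, small, large):
    """Slope of log(frequency) against log(size) between two family sizes."""
    rise = math.log(freqs[large - 1]) - math.log(freqs[small - 1])
    run = math.log(large) - math.log(small)
    return rise / run


# Simple balanced model: per-family rates proportional to size, equal for birth and death.
simple = bdim_equilibrium(lambda i: 1.0 * i, lambda i: 1.0 * i, 1000)

# Linear balanced model with hypothetical offsets a = 1 (birth) and b = 2 (death).
linear = bdim_equilibrium(lambda i: i + 1.0, lambda i: i + 2.0, 1000)

print("simple model slope:", loglog_slope(simple, 10, 100))
print("linear model slope:", loglog_slope(linear, 100, 1000))
```

Under these assumptions the simple model gives frequencies proportional to 1/i, matching the statement that a degree other than -1 requires size-dependent rates, while the linear example decays roughly as i^(a-b-1).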

Mercereau-Puijalon O, Barale JC, Bischoff E. Three multigene families in Plasmodium parasites: facts and questions. Int J Parasitol 2002;32:1323-44. [PMID: 12350369 DOI: 10.1016/s0020-7519(02)00111-x] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]

Abstract

Multigene families optimise fitness by providing a set of related genes with possibly different temporal and/or topological expression patterns. We analyse here the structural organisation and sequence diversity of the rDNA, sera and var C Plasmodium falciparum families, and discuss their consequences for parasite biology. The low rDNA copy number, which reduces reshuffling, is probably the corollary of the need for functionally distinct rRNAs in the insect and in the vertebrate host. The unusual intra-genome and population rDNA sequence diversity results in cells equipped with mosaic ribosome sets. The functional constraints are such that ribosome compatibility could influence parasite fitness and contribute to population structuring. Unlike the dispersed rDNA units, the sera family is arranged as a tandem gene cluster, with seven contiguous similar genes, and one more distantly related paralog. We address the question of the inclusion criteria in family definition. We discuss the results concerning the SERA proteins expression and function in the context of the long overlooked multigene family. The var C module is shared by var genes, 'orphan' var C and var C pseudogenes. Analysis of 125 var C deduced protein sequences highlights a well-conserved framework, including putative phosphorylation sites, consistent with the proposed function of mediating interaction with cytoskeletal proteins. The 5' and 3' flanking sequences of the var C pseudogenes are heterogeneous. In contrast, the flanking sequences of the uninterrupted var C modules show remarkable conservation. This is interesting in view of the silencing activity of the var intronic sequence on var expression. The 5' flanking sequence dichotomy reported for internal and sub-telomeric var genes extends to the 3' flanking sequences. This has profound implications for transcription regulation and generation of diversity. 
The var C family suggests a role for pseudogenes as a diversity reservoir and in genome dynamics by promoting ectopic recombination.

Das R, Junker J, Greenbaum D, Gerstein MB. Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond. THE PHARMACOGENOMICS JOURNAL 2002;1:115-25. [PMID: 11911438 DOI: 10.1038/sj.tpj.6500021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]

Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001;134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]

Qian J, Stenger B, Wilson CA, Lin J, Jansen R, Teichmann SA, Park J, Krebs WG, Yu H, Alexandrov V, Echols N, Gerstein M. PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. Nucleic Acids Res 2001;29:1750-64. [PMID: 11292848 PMCID: PMC31319 DOI: 10.1093/nar/29.8.1750] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 11/15/2000] [Revised: 02/27/2001] [Accepted: 02/27/2001] [Indexed: 11/14/2022]

Abstract

As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing 'global views' of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein-protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein-protein interaction data with structural information is a particularly novel feature of our system. 
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many pre-selected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to V^(-b), for attribute value V and constant exponent b), with a few folds having large values and most having small values.
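
The power-law falloff mentioned in this abstract (frequency roughly proportional to V^(-b)) can be illustrated with a minimal sketch. This is not the PartsList authors' method: the function name `fit_power_law_exponent`, the synthetic data, and the choice of an ordinary least-squares fit on log-log values are all assumptions made for illustration.

```python
import math


def fit_power_law_exponent(values, frequencies):
    """Estimate b in frequency ~ C * V**(-b) via least squares on log-log data.

    Hypothetical helper for illustration; assumes all inputs are positive.
    """
    xs = [math.log(v) for v in values]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log(frequency) against log(V)
    return -slope      # a power law V**(-b) has slope -b


# Synthetic (made-up) data generated from an exact power law with b = 1.5.
values = [1, 2, 4, 8, 16, 32]
frequencies = [1000.0 * v ** -1.5 for v in values]
b_hat = fit_power_law_exponent(values, frequencies)
print(round(b_hat, 3))
```

On real attribute counts the points would scatter around the line, so the recovered exponent would only be approximate, and a straight-line fit on log-log axes is a rough diagnostic rather than a rigorous test of power-law behavior.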

Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res 2001;11:555-65. [PMID: 11282971 PMCID: PMC311027 DOI: 10.1101/gr.gr-1660r] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Lineage-Specific Gene Expansions in Bacterial and Archaeal Genomes. Genome Res 2001. [DOI: 10.1101/gr.166001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Yanai I, Camacho CJ, DeLisi C. Predictions of gene family distributions in microbial genomes: evolution by gene duplication and modification. PHYSICAL REVIEW LETTERS 2000;85:2641-2644. [PMID: 10978127 DOI: 10.1103/physrevlett.85.2641] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/02/2000] [Indexed: 05/23/2023]

Kihara D, Kanehisa M. Tandem clusters of membrane proteins in complete genome sequences. Genome Res 2000;10:731-43. [PMID: 10854407 DOI: 10.1101/gr.10.6.731] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Bischoff E, Guillotte M, Mercereau-Puijalon O, Bonnefoy S. A member of the Plasmodium falciparum Pf60 multigene family codes for a nuclear protein expressed by readthrough of an internal stop codon. Mol Microbiol 2000;35:1005-16. [PMID: 10712683 DOI: 10.1046/j.1365-2958.2000.01788.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D. A census of protein repeats. J Mol Biol 1999;293:151-60. [PMID: 10512723 DOI: 10.1006/jmbi.1999.3136] [Citation(s) in RCA: 316] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Pellegrini M, Marcotte EM, Yeates TO. A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(19990601)35:4<440::aid-prot7>3.0.co;2-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Teichmann SA, Chothia C, Gerstein M. Advances in structural genomics. Curr Opin Struct Biol 1999;9:390-9. [PMID: 10361097 DOI: 10.1016/s0959-440x(99)80053-0] [Citation(s) in RCA: 110] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Gerstein M. How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. FOLDING & DESIGN 1999;3:497-512. [PMID: 9889159 DOI: 10.1016/s1359-0278(98)00066-2] [Citation(s) in RCA: 100] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Abstract

BACKGROUND

Determining how representative the known structures are of the proteins encoded by a complete genome is important for assessing to what extent our current picture of protein stability and folding is overly influenced by biases in the structure databank (PDB). It is also important for improving database-based methods of structure prediction and genome annotation.

RESULTS

The known structures are compared to the proteins encoded by eight complete microbial genomes in terms of simple statistics such as sequence length, composition and secondary structure. The known structures are represented by a collection of nonhomologous domains from the PDB and a smaller list of 'biophysical proteins' on which folding experiments have concentrated. The proteins encoded by the genomes are considered as a whole and divided into various regions, such as known-structure homologue, low complexity (nonglobular), transmembrane or linker. Various tests are performed to assess the significance of the reported differences, in both a practical and a statistical sense.

CONCLUSIONS

The proteins encoded by the genomes are significantly different from those in the PDB. Their sequence lengths, which follow an extreme value distribution, are longer than the PDB proteins and much longer than the biophysical proteins. Their composition differs from the PDB proteins in having more Lys, Ile, Asn and Gln and less Cys and Trp. This is true overall and especially for the regions corresponding to soluble proteins of as yet unknown fold. Secondary-structure prediction on these uncharacterized regions indicates that they contain on average more helical structure than the PDB; differences about this mean are small, with yeast having slightly more sheet structure and Haemophilus influenzae and Helicobacter pylori more helical structure. Further information is available through the GeneCensus system at http://bioinfo.mbb.yale.edu/genome.

Teichmann SA, Park J, Chothia C. Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A 1998;95:14658-63. [PMID: 9843945 PMCID: PMC24505 DOI: 10.1073/pnas.95.25.14658] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Gerstein M. Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census. Proteins 1998;33:518-34. [PMID: 9849936 DOI: 10.1002/(sici)1097-0134(19981201)33:4<518::aid-prot5>3.0.co;2-j] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Stephens RS, Kalman S, Lammel C, Fan J, Marathe R, Aravind L, Mitchell W, Olinger L, Tatusov RL, Zhao Q, Koonin EV, Davis RW. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 1998;282:754-9. [PMID: 9784136 DOI: 10.1126/science.282.5389.754] [Citation(s) in RCA: 1133] [Impact Index Per Article: 43.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Fani R, Mori E, Tamburini E, Lazcano A. Evolution of the structure and chromosomal distribution of histidine biosynthetic genes. ORIGINS LIFE EVOL B 1998;28:555-70. [PMID: 9742729 DOI: 10.1023/a:1006531526299] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Gerstein M, Hegyi H. Comparing genomes in terms of protein structure: surveys of a finite parts list. FEMS Microbiol Rev 1998;22:277-304. [PMID: 10357579 DOI: 10.1111/j.1574-6976.1998.tb00371.x] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Abstract

We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we describe how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being comprised of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome are currently known and this subset is prone to sampling bias.
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid composition. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.

Nakatsu CH, Korona R, Lenski RE, de Bruijn FJ, Marsh TL, Forney LJ. Parallel and divergent genotypic evolution in experimental populations of Ralstonia sp. J Bacteriol 1998;180:4325-31. [PMID: 9721265 PMCID: PMC107437 DOI: 10.1128/jb.180.17.4325-4331.1998] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Huynen M, Doerks T, Eisenhaber F, Orengo C, Sunyaev S, Yuan Y, Bork P. Homology-based fold predictions for Mycoplasma genitalium proteins. J Mol Biol 1998;280:323-6. [PMID: 9665839 DOI: 10.1006/jmbi.1998.1884] [Citation(s) in RCA: 88] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Koonin EV, Tatusov RL, Galperin MY. Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol 1998;8:355-63. [PMID: 9666332 DOI: 10.1016/s0959-440x(98)80070-5] [Citation(s) in RCA: 114] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Ogata H, Goto S, Fujibuchi W, Kanehisa M. Computation with the KEGG pathway database. Biosystems 1998;47:119-28. [PMID: 9715755 DOI: 10.1016/s0303-2647(98)00017-3] [Citation(s) in RCA: 197] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Holm L. Unification of protein families. Curr Opin Struct Biol 1998;8:372-9. [PMID: 9666334 DOI: 10.1016/s0959-440x(98)80072-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Levitt M, Gerstein M. A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci U S A 1998;95:5913-20. [PMID: 9600892 PMCID: PMC34495 DOI: 10.1073/pnas.95.11.5913] [Citation(s) in RCA: 232] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Huynen M, Dandekar T, Bork P. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett 1998;426:1-5. [PMID: 9598967 DOI: 10.1016/s0014-5793(98)00276-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Gerstein M, Levitt M. Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins. Protein Sci 1998;7:445-56. [PMID: 9521122 PMCID: PMC2143933 DOI: 10.1002/pro.5560070226] [Citation(s) in RCA: 157] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Abstract

We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.

Gerstein M. A structural census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure. J Mol Biol 1997;274:562-76. [PMID: 9417935 DOI: 10.1006/jmbi.1997.1412] [Citation(s) in RCA: 124] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Abstract

Representative genomes from each of the three kingdoms of life are compared in terms of protein structure, in particular, those of Haemophilus influenzae (a bacterium), Methanococcus jannaschii (an archaeon), and yeast (a eukaryote). The comparison is in the form of a census (or comprehensive accounting) of the relative occurrence of secondary and tertiary structures in the genomes, with particular emphasis on patterns of supersecondary structure. Comparison of secondary structure shows that the three genomes have nearly the same overall secondary-structure content, although they differ markedly in amino acid composition. Comparison of super-secondary structure, using a novel "frequent-words" approach, shows that yeast has a preponderance of consecutive strands (e.g. beta-beta-beta patterns), Haemophilus, consecutive helices (alpha-alpha-alpha), and Methanococcus, alternating helix-strand structures (beta-alpha-beta). Yeast also has significantly more helical membrane proteins than the other two genomes, with most of the differences concentrated in proteins containing two transmembrane segments. Comparison of tertiary structure (by sequence matching and domain-level clustering) highlights the substantial duplication in each genome (approximately 30% to 50%), with the degree of duplication following similar patterns in all three. Many sequence families are shared among the genomes, with the degree of overlap between any two genomes being roughly similar. In total, the three genomes contain 148 of the approximately 300 known protein folds. Forty-five of these 148 that are present in all three genomes are especially enriched in mixed super-secondary structures (alpha/beta). Moreover, the five most common of these 45 (the "top-5") have a remarkably similar super-secondary structure architecture, containing a central sheet of parallel strands with helices packed onto at least one face and beta-alpha-beta connections between adjacent strands.
These most basic molecular parts, which, presumably, were present in the last common ancestor to the three Kingdoms, include the TIM-barrel, Rossmann, flavodoxin, thiamin-binding, and P-loop-hydrolase folds.

Koonin EV, Galperin MY. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr Opin Genet Dev 1997;7:757-63. [PMID: 9468784 DOI: 10.1016/s0959-437x(97)80037-8] [Citation(s) in RCA: 110] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]