1
Abstract
Microsatellites, or simple sequence repeats (SSRs), are among the most widely used genetic markers in research, with applications in fields such as genetic conservation, paternity testing, and molecular breeding. Although ordered draft genome assemblies of camels have been announced, including for the Arabian camel, systematic analysis of camel SSRs is still limited. The identification and development of informative and robust molecular SSR markers are essential for marker-assisted breeding programs and paternity testing. Here we searched for and compared perfect SSRs with 1–6 bp nucleotide motifs to characterize microsatellites in draft genome sequences of the Camelidae. We analyzed and compared the occurrence, relative abundance, relative density, and guanine-cytosine (GC) content in four taxonomically different camelid species: Camelus dromedarius, C. bactrianus, C. ferus, and Vicugna pacos. A total of 546,762, 544,494, 547,974, and 437,815 SSRs were mined, respectively. Mononucleotide SSRs were the most frequent in the four genomes, followed in descending order by di-, tetra-, tri-, penta-, and hexanucleotide SSRs. GC content was highest in dinucleotide SSRs and lowest in mononucleotide SSRs. Our results provide further evidence that SSRs are more abundant in noncoding regions than in coding regions. Similar distributions of microsatellites were found in all four species, which indicates that the pattern of microsatellites is conserved in the family Camelidae.
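As a rough illustration of what mining perfect SSRs with 1–6 bp motifs can look like, the following Python sketch scans a sequence with regular expressions and reports count, relative abundance, relative density, and GC content. It is not the pipeline used in the study: the minimum repeat thresholds in `MIN_REPEATS`, the function names, and the definitions of relative abundance (loci per Mb) and relative density (SSR bp per Mb) are assumptions chosen for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the abstract).
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as 'A'
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return hits


def summarize(seq, hits):
    """Summary statistics under the assumed per-Mb definitions."""
    seq = seq.upper()
    megabases = len(seq) / 1e6
    total_bp = sum(end - start for start, end, _, _ in hits)
    gc_bp = sum(seq[start:end].count("G") + seq[start:end].count("C")
                for start, end, _, _ in hits)
    return {
        "count": len(hits),
        "relative_abundance": len(hits) / megabases,  # loci per Mb
        "relative_density": total_bp / megabases,     # SSR bp per Mb
        "gc_content": gc_bp / total_bp if total_bp else 0.0,
    }


# Toy sequence with one mono-, one di-, and one trinucleotide SSR.
demo = "TTG" + "A" * 12 + "CCT" + "AC" * 7 + "GGT" + "ACG" * 5 + "T"
demo_hits = find_perfect_ssrs(demo)
print(demo_hits)
print(summarize(demo, demo_hits))
```

A real analysis would stream each scaffold from a FASTA file and would likely use a dedicated SSR-mining tool; the regular-expression scan here only shows the idea of counting exact tandem repeats per motif length.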
2
Greenlip Abalone (Haliotis laevigata) Genome and Protein Analysis Provides Insights into Maturation and Spawning. G3-GENES GENOMES GENETICS 2019; 9:3067-3078. [PMID: 31413154 PMCID: PMC6778792 DOI: 10.1534/g3.119.400388] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Abstract
Wild abalone (Family Haliotidae) populations have been severely affected by commercial fishing, poaching, anthropogenic pollution, and environmental and climate change. These issues have stimulated an increase in aquaculture production; however, production growth has been slow due to a lack of genetic knowledge and resources. We have sequenced a draft genome for the commercially important temperate Australian ‘greenlip’ abalone (Haliotis laevigata, Donovan 1808) and generated 11 tissue transcriptomes from a female adult abalone. Phylogenetic analysis of the greenlip abalone with reference to the Pacific abalone (Haliotis discus hannai) indicates that these abalone species diverged approximately 71 million years ago. This study presents an in-depth analysis of the features of reproductive dysfunction, where we provide the putative biochemical messenger components (neuropeptides) that may regulate reproduction, including gonad maturation and spawning. Indeed, we isolate the egg-laying hormone neuropeptide and under trial conditions induce spawning at 80% efficiency. Altogether, we provide a solid platform for further studies aimed at stimulating advances in abalone aquaculture production. The H. laevigata genome and resources are made available to the public on the abalone ‘omics website, http://abalonedb.org.
3
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
4
Alexeyenko A, Lindberg J, Pérez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014; 3:137-43. [PMID: 24980400 DOI: 10.1016/j.ddtec.2006.06.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
Abstract
Orthologs are an indispensable bridge for transferring biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is best suited to a given task.
Affiliation(s)
- Andrey Alexeyenko
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Julia Lindberg
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Asa Pérez-Bercoff
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91, Stockholm, Sweden
5
Liu G, Zou Y, Cheng Q, Zeng Y, Gu X, Su Z. Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations. Mol Genet Genomics 2013; 289:137-47. [PMID: 24322347 DOI: 10.1007/s00438-013-0799-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/30/2013] [Accepted: 12/03/2013] [Indexed: 12/13/2022]
Abstract
The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. 
This study revealed the whole picture of the evolution patterns of gene functional categories in the human genome.
Affiliation(s)
- Gangbiao Liu
- State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Biology Building II 113, Shanghai, 200433, China
6
Haggerty LS, Jachiet PA, Hanage WP, Fitzpatrick DA, Lopez P, O'Connell MJ, Pisani D, Wilkinson M, Bapteste E, McInerney JO. A pluralistic account of homology: adapting the models to the data. Mol Biol Evol 2013; 31:501-16. [PMID: 24273322 PMCID: PMC3935183 DOI: 10.1093/molbev/mst228] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 12/13/2022]
Abstract
Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as “public goods,” subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information. Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define “family resemblances” to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. 
We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.
Affiliation(s)
- Leanne S Haggerty
- Bioinformatics and Molecular Evolution Unit, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
7
Significance of population size on the fixation of nonsynonymous mutations in genes under varying levels of selection pressure. Genetics 2013; 193:995-1002. [PMID: 23307899 DOI: 10.1534/genetics.112.147900] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Previous studies observed a higher ratio of divergences at nonsynonymous and synonymous sites (ω = dN/dS) in species with a small population size than in those with a large population size. Here we examined the theoretical relationship between ω, effective population size (Ne), and selection coefficient (s). Our analysis revealed that when purifying selection is strong, the ω of species with small Ne is much higher than that of species with large Ne. However, the difference between the two ω values shrinks as selection pressure declines (s → 0). We examined this relationship using primate and rodent genes and found that the ω estimated for highly constrained genes of primates was up to 2.9 times higher than that obtained for their orthologous rodent genes. Conversely, for genes under weak purifying selection, the ω of primates was only 17% higher than that of rodents. When tissue specificity was used as a proxy for selection pressure, we found that the ω of broadly expressed genes of primates was up to 2.1-fold higher than that of their rodent counterparts, and this difference was only 27% for tissue-specific genes. Since most of the nonsynonymous mutations in constrained or broadly expressed genes are deleterious, fixation of these mutations is influenced by Ne. This results in a higher ω for these genes in primates than in rodents. Conversely, the majority of nonsynonymous mutations in less-constrained or tissue-specific genes are neutral or nearly neutral, and therefore their fixation is largely independent of Ne, which leads to the similarity of ω in primates and rodents.
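The qualitative dependence of ω on Ne and s described above can be sketched numerically. The abstract does not state which model the authors used, so the Python example below assumes a standard diffusion approximation in which the fixation rate of a new mutation relative to a neutral one is ω = S / (1 − e^(−S)), with S = 4·Ne·s for a diploid population. The function name and the two population sizes are illustrative assumptions, not values from the study.

```python
import math


def omega(ne, s):
    """Fixation rate relative to neutrality, omega = S / (1 - exp(-S)), S = 4*Ne*s.

    Assumed diffusion approximation; s < 0 denotes a deleterious mutation.
    """
    scaled = 4.0 * ne * s
    if abs(scaled) < 1e-12:
        return 1.0          # neutral limit
    if scaled < -700.0:
        return 0.0          # effectively never fixed; avoids float overflow
    return scaled / -math.expm1(-scaled)


# Illustrative (assumed) effective population sizes.
NE_SMALL = 1e4   # e.g. a primate-like population
NE_LARGE = 1e5   # e.g. a rodent-like population

for s in (-2e-5, -5e-6, -1e-6, -1e-7):
    w_small = omega(NE_SMALL, s)
    w_large = omega(NE_LARGE, s)
    print(f"s={s:.0e}  omega_small={w_small:.4f}  "
          f"omega_large={w_large:.4f}  ratio={w_small / w_large:.2f}")
```

Under these assumptions the ratio of the two ω values is large when selection is strong and approaches 1 as s → 0, which matches the direction of the trend reported in the abstract.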
8
Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012; 29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 01/18/2023]
Abstract
Insertions and deletions (indels) in protein-coding genes are important sources of genetic variation. Their role in creating new proteins may be especially important after gene duplication. However, little is known about how indels affect the divergence of duplicate genes. We here study thousands of duplicate genes in five fish (teleost) species with completely sequenced genomes. The ancestor of these species has been subject to a fish-specific genome duplication (FSGD) event that occurred approximately 350 Ma. We find that duplicate genes contain at least 25% more indels than single-copy genes. These indels accumulated preferentially in the first 40 my after the FSGD. A lack of widespread asymmetric indel accumulation indicates that both members of a duplicate gene pair typically experience relaxed selection. Strikingly, we observe a 30-80% excess of deletions over insertions that is consistent for indels of various lengths and across the five genomes. We also find that indels preferentially accumulate inside loop regions of protein secondary structure and in regions where amino acids are exposed to solvent. We show that duplicate genes with high indel density also show high DNA sequence divergence. Indel density, but not amino acid divergence, can explain a large proportion of the tertiary structure divergence between proteins encoded by duplicate genes. Our observations are consistent across all five fish species. Taken together, they suggest a general pattern of duplicate gene evolution in which indels are important driving forces of evolutionary change.
Affiliation(s)
- Baocheng Guo
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
9
Raimondi S, Barbarini N, Mangione P, Esposito G, Ricagno S, Bolognesi M, Zorzoli I, Marchese L, Soria C, Bellazzi R, Monti M, Stoppini M, Stefanelli M, Magni P, Bellotti V. The two tryptophans of β2-microglobulin have distinct roles in function and folding and might represent two independent responses to evolutionary pressure. BMC Evol Biol 2011; 11:159. [PMID: 21663612 PMCID: PMC3124429 DOI: 10.1186/1471-2148-11-159] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 01/15/2011] [Accepted: 06/10/2011] [Indexed: 01/06/2023]
Abstract
Background: We have recently discovered that the two tryptophans of human β2-microglobulin have distinctive roles within the structure and function of the protein. Deeply buried in the core, Trp95 is essential for folding stability, whereas Trp60, which is solvent-exposed, plays a crucial role in promoting the binding of β2-microglobulin to the heavy chain of the class I major histocompatibility complex (MHCI). We have previously shown that the thermodynamic disadvantage of having Trp60 exposed on the surface is counter-balanced by the perfect fit between it and a cavity within the MHCI heavy chain that contributes significantly to the functional stabilization of the MHCI. Therefore, based on the peculiar differences of the two tryptophans, we have analysed the evolution of β2-microglobulin with respect to these residues. Results: Having defined the β2-microglobulin protein family, we performed multiple sequence alignments and analysed the residue conservation in homologous proteins to generate a phylogenetic tree. Our results indicate that Trp60 is highly conserved, whereas some species have a Leu in position 95; the replacement of Trp95 with Leu destabilizes β2-microglobulin by 1 kcal/mol and accelerates the kinetics of unfolding. Both thermodynamic and kinetic data fit with the crystallographic structure of the Trp95Leu variant, which shows how the hydrophobic cavity of the wild-type protein is completely occupied by Trp95, but is only half filled by Leu95. Conclusions: We have established that the functional Trp60 has been present within the sequence of β2-microglobulin since the evolutionary appearance of proteins responsible for acquired immunity, whereas the structural Trp95 was selected and stabilized, most likely, for its capacity to fully occupy an internal cavity of the protein, thereby creating a better stabilization of its folded state.
Affiliation(s)
- Sara Raimondi
- Department of Biochemistry, University of Pavia, via Taramelli 3b, 27100 Pavia, Italy
10
Subramanian S. Fixation of deleterious mutations at critical positions in human proteins. Mol Biol Evol 2011; 28:2687-93. [PMID: 21498603 DOI: 10.1093/molbev/msr097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
Abstract
Deleterious mutations associated with human diseases are predominantly found in conserved positions and positions that are essential for the structure and/or function of proteins. However, these mutations are purged from the human population over time and prevented from being fixed. Contrary to this belief, here I show that high proportions of deleterious amino acid changing mutations are fixed at positions critical for the structure and/or function of proteins. Similarly, a high rate of fixation of deleterious mutations was observed in slow-evolving amino acid positions of human proteins. The fraction of deleterious substitutions was found to be two times higher in relatively conserved amino acid positions than in highly variable positions. This study also found fixation of a much higher proportion of radical amino acid changes in primates compared with rodents and artiodactyls in slow-evolving positions. Previous studies observed a higher proportion of nonsynonymous substitutions in humans compared with other mammals, which was taken as indirect evidence for the fixation of deleterious mutations in humans. However, the results of this investigation provide direct evidence for this prediction by suggesting that the excess nonsynonymous mutations fixed in humans are indeed deleterious in nature. Furthermore, these results suggest that studies on disease-associated mutations should consider that a significant fraction of such deleterious mutations has already been fixed in the human genome, and thus, the effects of new mutations at those amino acid positions may not necessarily be deleterious and might even result in reversion to benign phenotypes.
Affiliation(s)
- Sankar Subramanian
- Griffith School of Environment, Griffith University, Nathan, Queensland, Australia.
11
Yu S, Song Z, Luo J, Dai Y, Li N. Over-expression of RAD51 or RAD54 but not RAD51/4 enhances extra-chromosomal homologous recombination in the human sarcoma (HT-1080) cell line. J Biotechnol 2011; 154:21-4. [PMID: 21501635 DOI: 10.1016/j.jbiotec.2011.03.023] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/22/2010] [Revised: 03/28/2011] [Accepted: 03/29/2011] [Indexed: 10/18/2022]
Abstract
RAD51 and RAD54, members of the RAD52 epistasis group, play key roles in homologous recombination (HR). The efficiency of HR can be increased by over-expression of either of them. A vector that allows co-expression of RAD51 and RAD54 was constructed to investigate interactions between the two proteins during extra-chromosomal HR. The efficiency of extra-chromosomal HR, evaluated by GFP extra-chromosomal HR, was enhanced (110-245%) in different transfected human sarcoma (HT-1080) cell colonies. We observed that RAD51 clearly promotes extra-chromosomal HR; however, the actions of RAD54 in extra-chromosomal HR were weak. Our data suggest that RAD51 may function as a universal factor during HR, whereas RAD54 mainly functions in other types of HR (gene targeting or intra-chromosomal HR), which involve interaction with chromosomal DNA.
Affiliation(s)
- Shengli Yu
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, People's Republic of China.
12
Subramanian S, Huynen L, Millar CD, Lambert DM. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi. BMC Evol Biol 2010; 10:387. [PMID: 21156082 PMCID: PMC3009673 DOI: 10.1186/1471-2148-10-387] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/13/2010] [Accepted: 12/15/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history, we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. RESULTS: Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae, to be 132 million years, which is consistent with previous studies using mitochondrial genes. CONCLUSIONS: The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore, this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Affiliation(s)
- Sankar Subramanian
- Griffith School of Environment and the School of Biomolecular and Physical Sciences, Griffith University, 170 Kessels Road, Nathan, Qld 4111 Australia
- Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Auckland, New Zealand
- Leon Huynen
- Griffith School of Environment and the School of Biomolecular and Physical Sciences, Griffith University, 170 Kessels Road, Nathan, Qld 4111 Australia
- Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Auckland, New Zealand
- Craig D Millar
- Allan Wilson Centre for Molecular Ecology and Evolution, School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand
- David M Lambert
- Griffith School of Environment and the School of Biomolecular and Physical Sciences, Griffith University, 170 Kessels Road, Nathan, Qld 4111 Australia
- Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Auckland, New Zealand
13
Waterhouse RM, Zdobnov EM, Tegenfeldt F, Li J, Kriventseva EV. OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011. Nucleic Acids Res 2010; 39:D283-8. [PMID: 20972218 PMCID: PMC3013786 DOI: 10.1093/nar/gkq930] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 11/12/2022]
Abstract
The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs delineated at each radiation of the species phylogeny in an explicitly hierarchical manner of over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.
Affiliation(s)
- Robert M Waterhouse
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
14
McDonald LA, Gerrelli D, Fok Y, Hurst LD, Tickle C. Comparison of Iroquois gene expression in limbs/fins of vertebrate embryos. J Anat 2010; 216:683-91. [PMID: 20408909 PMCID: PMC2952381 DOI: 10.1111/j.1469-7580.2010.01233.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Accepted: 03/05/2010] [Indexed: 11/30/2022]
Abstract
In Drosophila, Iroquois (Irx) genes have various functions including the specification of the identity of wing veins. Vertebrate Iroquois (Irx) genes have been reported to be expressed in the developing digits of mouse limbs. Here we carry out a phylogenetic analysis of vertebrate Irx genes and compare expression in developing limbs of mouse, chick and human embryos and in zebrafish pectoral fin buds. We confirm that the six Irx gene families in vertebrates are well defined and that Clusters A and B are duplicates; in contrast, Irx1 and 3, Irx2 and 5, and Irx4 and 6 are paralogs. All Irx genes in mouse and chick are expressed in developing limbs. Detailed comparison of the expression patterns in mouse and chick shows that expression patterns of genes in the same cluster are generally similar but paralogous genes have different expression patterns. Mouse and chick Irx1 are expressed in digit condensations, whereas mouse and chick Irx6 are expressed interdigitally. The timing of Irx1 expression in individual digits in mouse and chick is different. Irx1 is also expressed in digit condensations in developing human limbs, thus showing conservation of expression of this gene in higher vertebrates. In zebrafish, Irx genes of all but six of the families are expressed in early stage pectoral fin buds but not at later stages, suggesting that these genes are not involved in patterning distal structures in zebrafish fins.
Affiliation(s)
- Laura A McDonald
- Department of Biology & Biochemistry, University of Bath, Somerset, UK
- Dianne Gerrelli
- Human Developmental Biology Resource, Neural Development Unit, UCL Institute of Child Health, London, UK
- Yvonne Fok
- Human Developmental Biology Resource, Neural Development Unit, UCL Institute of Child Health, London, UK
- Laurence D Hurst
- Department of Biology & Biochemistry, University of Bath, Somerset, UK
- Cheryll Tickle
- Department of Biology & Biochemistry, University of Bath, Somerset, UK
15
Schreiber F, Pick K, Erpenbeck D, Wörheide G, Morgenstern B. OrthoSelect: a protocol for selecting orthologous groups in phylogenomics. BMC Bioinformatics 2009; 10:219. [PMID: 19607672 PMCID: PMC2719630 DOI: 10.1186/1471-2105-10-219] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 10/22/2008] [Accepted: 07/16/2009] [Indexed: 11/29/2022]
Abstract
BACKGROUND Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically. RESULTS We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as to bioinformatic experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. CONCLUSION OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. 
Our tool was not only able to rebuild the data set with a specificity of 98%, but it detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X. The tool can be downloaded at (http://gobics.de/fabian/orthoselect.php).
Affiliation(s)
- Fabian Schreiber
  - Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
  - Department für Geo- und Umweltwissenschaften, Ludwig-Maximilians-Universität, Richard-Wagner-Str. 10, 80333 München, Germany
- Kerstin Pick
  - Department für Geo- und Umweltwissenschaften, Ludwig-Maximilians-Universität, Richard-Wagner-Str. 10, 80333 München, Germany
- Dirk Erpenbeck
  - Department für Geo- und Umweltwissenschaften, Ludwig-Maximilians-Universität, Richard-Wagner-Str. 10, 80333 München, Germany
- Gert Wörheide
  - Department für Geo- und Umweltwissenschaften, Ludwig-Maximilians-Universität, Richard-Wagner-Str. 10, 80333 München, Germany
- Burkhard Morgenstern
  - Abteilung Bioinformatik, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
|
16
|
Huang Y, Zheng Y, Su Z, Gu X. Differences in duplication age distributions between human GPCRs and their downstream genes from a network prospective. BMC Genomics 2009; 10 Suppl 1:S14. [PMID: 19594873 PMCID: PMC2709257 DOI: 10.1186/1471-2164-10-s1-s14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
BACKGROUND How gene duplication has influenced the evolution of gene networks is one of the core problems in evolution. Current duplication-divergence theories generally suggest that genes on the periphery of the networks were preferentially retained after gene duplication. However, previous studies were mostly based on gene networks in invertebrate species, and they had the inherent shortcoming of not being able to provide information on how the duplication-divergence process proceeded along the time axis during major speciation events. RESULTS In this study, we constructed a model system consisting of human G protein-coupled receptors (GPCRs) and their downstream genes in the GPCR pathways. These two groups of genes offered a natural partition of genes in the peripheral and the backbone layers of the network. Analysis of the age distributions of the duplication events in human GPCRs and "downstream genes" gene families indicated that they both experienced an explosive expansion at the time of early vertebrate emergence. However, we found that only GPCR families saw a continued expansion after early vertebrates, most prominently in several small subfamilies of GPCRs involved in immune responses and sensory responses. CONCLUSION In general, in the human GPCR model system, we found that the position of a gene in the gene networks has a significant influence on the likelihood of fixation of its duplicates. However, for a super gene family, the influence was not uniform among subfamilies. For super families, such as GPCRs, whose gene basis of expression diversity was well established in early vertebrates, continued expansions were most prominent in particular small subfamilies mainly involved in lineage-specific functions.
Affiliation(s)
- Yong Huang
- Department of Genetics, Development, and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA.
|
17
|
Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M, Perrière G. Databases of homologous gene families for comparative genomics. BMC Bioinformatics 2009; 10 Suppl 6:S3. [PMID: 19534752 PMCID: PMC2697650 DOI: 10.1186/1471-2105-10-s6-s3] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.5] [Indexed: 12/22/2022]
Abstract
Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users with high quality homologous families and sequence alignments as well as phylogenetic trees based on state of the art algorithms are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches allowing, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site.
Affiliation(s)
- Simon Penel
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd, du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
18
|
Ranwez V, Clairon N, Delsuc F, Pourali S, Auberval N, Diser S, Berry V. PhyloExplorer: a web server to validate, explore and query phylogenetic trees. BMC Evol Biol 2009; 9:108. [PMID: 19450253 PMCID: PMC2695458 DOI: 10.1186/1471-2148-9-108] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 06/03/2008] [Accepted: 05/18/2009] [Indexed: 11/11/2022]
Abstract
Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available online, and its source code can be downloaded.
Affiliation(s)
- Vincent Ranwez
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II, Place E, Bataillon - 34095 Montpellier Cedex 05, France.
|
19
|
Studer RA, Robinson-Rechavi M. Large-Scale Analyses of Positive Selection Using Codon Models. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
20
|
Roux J, Robinson-Rechavi M. Developmental constraints on vertebrate genome evolution. PLoS Genet 2008; 4:e1000311. [PMID: 19096706 PMCID: PMC2600815 DOI: 10.1371/journal.pgen.1000311] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Received: 07/15/2008] [Accepted: 11/17/2008] [Indexed: 12/17/2022]
Abstract
Constraints in embryonic development are thought to bias the direction of evolution by making some changes less likely, and others more likely, depending on their consequences on ontogeny. Here, we characterize the constraints acting on genome evolution in vertebrates. We used gene expression data from two vertebrates: zebrafish, using a microarray experiment spanning 14 stages of development, and mouse, using EST counts for 26 stages of development. We show that, in both species, genes expressed early in development (1) have a more dramatic effect of knock-out or mutation and (2) are more likely to revert to single copy after whole genome duplication, relative to genes expressed late. This supports high constraints on early stages of vertebrate development, making them less open to innovations (gene gain or gene loss). Results are robust to different sources of data—gene expression from microarrays, ESTs, or in situ hybridizations; and mutants from directed KO, transgenic insertions, point mutations, or morpholinos. We determine the pattern of these constraints, which differs from the model used to describe vertebrate morphological conservation (“hourglass” model). While morphological constraints reach a maximum at mid-development (the “phylotypic” stage), genomic constraints appear to decrease in a monotonous manner over developmental time. Because embryonic development must proceed correctly for an animal to survive, changes in evolution are constrained according to their effects on development. Changes that disrupt development too dramatically are thus rare in evolution. While this has been long observed at the morphological level, it has been more difficult to characterize the impact of such constraints on the genome. In this study, we investigate the effect of gene expression over vertebrate developmental time (from early to late development) on two main features: the gravity of mutation effects (i.e., is removal of the gene lethal?) 
and the propensity of the gene to remain in double copy after a duplication. We see that both features are consistent, in both zebrafish and mouse, in indicating a strong effect of constraints on the genome in early development, with constraints becoming progressively weaker towards late development.
Affiliation(s)
- Julien Roux
  - Université de Lausanne, Département d'Ecologie et d'Evolution, Quartier Sorge, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Université de Lausanne, Département d'Ecologie et d'Evolution, Quartier Sorge, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - * E-mail:
|
21
|
Lacroix V, Cottret L, Thébault P, Sagot MF. An introduction to metabolic networks and their structural analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:594-617. [PMID: 18989046 DOI: 10.1109/tcbb.2008.79] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Indexed: 05/27/2023]
Abstract
There has been a renewed interest in metabolism in the computational biology community, leading to an avalanche of papers coming from methodological network analysis as well as experimental and theoretical biology. This paper is meant to serve as an initial guide both for biologists interested in formal approaches and for mathematicians or computer scientists wishing to inject more realism into their models. The paper is focused on the structural aspects of metabolism only. The literature is vast enough already, and the thread through it difficult to follow even for the more experienced worker in the field. We explain methods for acquiring data and reconstructing metabolic networks, and review the various models that have been used for their structural analysis. Several concepts such as modularity are introduced, as are the controversies that have beset the field these past few years, for instance, on whether metabolic networks are small-world or scale-free, and on which model better explains the evolution of metabolism. Clarifying the work that has been done also helps in identifying open questions and in proposing relevant future directions in the field, which we do throughout the paper and in the conclusion.
Affiliation(s)
- Vincent Lacroix
- Genome Bioinformatics Research Group, Centre de Regulacio Genomica (CRG), PRBB, Aiguader 88, 08003 Barcelona, Spain.
|
22
|
The quest for orthologs: finding the corresponding gene across genomes. Trends Genet 2008; 24:539-51. [PMID: 18819722 DOI: 10.1016/j.tig.2008.08.009] [Citation(s) in RCA: 181] [Impact Index Per Article: 10.6] [Received: 09/12/2007] [Revised: 08/20/2008] [Accepted: 08/21/2008] [Indexed: 11/23/2022]
Abstract
Orthology is a key evolutionary concept in many areas of genomic research. It provides a framework for subjects as diverse as the evolution of genomes, gene functions, cellular networks and functional genome annotation. Although orthologous proteins usually perform equivalent functions in different species, establishing true orthologous relationships requires a phylogenetic approach, which combines both trees and graphs (networks) using reliable species phylogeny and available genomic data from more than two species, and an insight into the processes of molecular evolution. Here, we evaluate the available bioinformatics tools and provide a set of guidelines to aid researchers in choosing the most appropriate tool for any situation.
|
23
|
Liberles DA, Dittmar K. Characterizing gene family evolution. Biol Proced Online 2008; 10:66-73. [PMID: 19461954 PMCID: PMC2683547 DOI: 10.1251/bpo144] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/18/2007] [Revised: 03/17/2008] [Accepted: 04/07/2008] [Indexed: 11/23/2022]
Abstract
Gene families are widely used in comparative genomics, molecular evolution, and in systematics. However, they are constructed in different manners, their data analyzed and interpreted differently, with different underlying assumptions, leading to sometimes divergent conclusions. In systematics, concepts like monophyly and the dichotomy between homoplasy and homology have been central to the analysis of phylogenies. We critique the traditional use of such concepts as applied to gene families and give examples of incorrect inferences they may lead to. Operational definitions that have emerged within functional genomics are contrasted with the common formal definitions derived from systematics. Lastly, we question the utility of layers of homology and the meaning of homology at the character state level in the context of sequence evolution. From this, we move forward to present an idealized strategy for characterizing gene family evolution for both systematic and functional purposes, including recent methodological improvements.
|
24
|
Studer RA, Penel S, Duret L, Robinson-Rechavi M. Pervasive positive selection on duplicated and nonduplicated vertebrate protein coding genes. Genome Res 2008; 18:1393-402. [PMID: 18562677 DOI: 10.1101/gr.076992.108] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022]
Abstract
A stringent branch-site codon model was used to detect positive selection in vertebrate evolution. We show that the test is robust to the large evolutionary distances involved. Positive selection was detected in 77% of 884 genes studied. Most positive selection concerns a few sites on a single branch of the phylogenetic tree: Between 0.9% and 4.7% of sites are affected by positive selection depending on the branches. No functional category was overrepresented among genes under positive selection. Surprisingly, whole genome duplication had no effect on the prevalence of positive selection, whether the fish-specific genome duplication or the two rounds at the origin of vertebrates. Thus positive selection has not been limited to a few gene classes, or to specific evolutionary events such as duplication, but has been pervasive during vertebrate evolution.
Affiliation(s)
- Romain A Studer
- Department of Ecology and Evolution, Biophore, Lausanne University, CH-1015 Lausanne, Switzerland
|
25
|
Abstract
This unit provides a general introduction to phylogeny. It defines common terms and discusses the issue of rooting trees, in addition to comparing gene and species trees. Methods for inferring phylogenies, such as distance methods, parsimony methods, and maximum likelihood are also presented. The unit concludes with discussion of how to assess tree confidence.
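As an illustration of the distance methods mentioned above, the following is a minimal Python sketch, not taken from the unit itself. It computes the proportion of differing sites (p-distance) between aligned sequences and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). The function names and the toy alignment are hypothetical, and gap columns are simply skipped, which is an assumption of this sketch.

```python
import math
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Corrected pairwise distances for every unordered pair of taxa."""
    names = sorted(alignment)
    return {
        (a, b): jukes_cantor(p_distance(alignment[a], alignment[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"taxon_a": "ACGTACGT", "taxon_b": "ACGTACGA", "taxon_c": "ACTTACGA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

A matrix of this kind is the usual input to tree-building procedures such as neighbor joining; the tree construction step itself is left out here.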
|
26
|
Ho MR, Jang WJ, Chen CH, Ch'ang LY, Lin WC. Designating eukaryotic orthology via processed transcription units. Nucleic Acids Res 2008; 36:3436-42. [PMID: 18445630 PMCID: PMC2425467 DOI: 10.1093/nar/gkn227] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023]
Abstract
Orthology is a widely used concept in comparative and evolutionary genomics. In addition to prokaryotic orthology, delineating eukaryotic orthology has provided insight into the evolution of higher organisms. Indeed, many eukaryotic ortholog databases have been established for this purpose. However, unlike prokaryotes, alternative splicing (AS) has hampered eukaryotic orthology assignments. Therefore, existing databases likely contain ambiguous eukaryotic ortholog relationships and possibly misclassify alternatively spliced protein isoforms as in-paralogs, which are duplicated genes that arise following speciation. Here, we propose a new approach for designating eukaryotic orthology using processed transcription units, and we present an orthology database prototype using the human and mouse genomes. Currently existing programs cover less than 69% of the human reference sequences when assigning human/mouse orthologs. In contrast, our method encompasses up to 80% of the human reference sequences. Moreover, the ortholog database presented herein is more than 92% consistent with the existing databases. In addition to managing AS, this approach is capable of identifying orthologs of embedded genes and fusion genes using syntenic evidence. In summary, this new approach is sensitive, specific and can generate a more comprehensive and accurate compilation of eukaryotic orthologs.
Affiliation(s)
- Meng-Ru Ho
- Institute of Biomedical Informatics, National Yang-Ming University, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
|
27
|
Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. The human phylome. Genome Biol 2008; 8:R109. [PMID: 17567924 PMCID: PMC2394744 DOI: 10.1186/gb-2007-8-6-r109] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.9] [Received: 11/30/2006] [Revised: 03/16/2007] [Accepted: 06/13/2007] [Indexed: 01/09/2023]
Abstract
The human phylome, which includes evolutionary relationships of all human proteins and their homologs among thirty-nine fully sequenced eukaryotes, is reconstructed. Background: Phylogenomics analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use in large genomes still constitutes a technical challenge. The use of phylomes also requires the development of new methods that help us to interpret them. Results: We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing and maximum likelihood and Bayesian methods. Although differences with alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with Laurasiatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplications before vertebrate radiation. Using a novel algorithm that is independent from a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes. Conclusion: Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages. The pipeline implemented here can be easily adapted for use in other organisms.
Affiliation(s)
- Jaime Huerta-Cepas
  - Bioinformatics Department, Centro de Investigación Príncipe Felipe, Autopista del Saler, 46013 Valencia, Spain
- Hernán Dopazo
  - Bioinformatics Department, Centro de Investigación Príncipe Felipe, Autopista del Saler, 46013 Valencia, Spain
- Joaquín Dopazo
  - Bioinformatics Department, Centro de Investigación Príncipe Felipe, Autopista del Saler, 46013 Valencia, Spain
- Toni Gabaldón
  - Bioinformatics Department, Centro de Investigación Príncipe Felipe, Autopista del Saler, 46013 Valencia, Spain
|
28
|
Jost MC, Hillis DM, Lu Y, Kyle JW, Fozzard HA, Zakon HH. Toxin-resistant sodium channels: parallel adaptive evolution across a complete gene family. Mol Biol Evol 2008; 25:1016-24. [PMID: 18258611 DOI: 10.1093/molbev/msn025] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
Approximately 75% of vertebrate proteins belong to protein families encoded by multiple evolutionarily related genes, a pattern that emerged as a result of gene and genome duplications over the course of vertebrate evolution. In families of genes with similar or related functions, adaptation to a strong selective agent should involve multiple adaptive changes across the entire gene family. However, we know of no evolutionary studies that have explicitly addressed this point. Here, we show how 4 taxonomically diverse species of pufferfishes (Tetraodontidae) each evolved resistance to the guanidinium toxins tetrodotoxin (TTX) and saxitoxin (STX) via parallel amino acid replacements across all 8 sodium channels present in teleost fish genomes. This resulted in diverse suites of coexisting sodium channel types that all confer varying degrees of toxin resistance, yet show remarkable convergence among genes and phylogenetically diverse species. Using site-directed mutagenesis and expression of a vertebrate sodium channel, we also demonstrate that resistance to TTX/STX is enhanced up to 15-fold by single, frequently observed replacements at 2 sites that have not previously been implicated in toxin binding but show similar or identical replacements in pufferfishes and in distantly related vertebrate and nonvertebrate animals. This study presents an example of natural selection acting upon a complete gene family, repeatedly arriving at a diverse but limited number of adaptive changes within the same genome. To be maximally informative, we suggest that future studies of molecular adaptation should consider all functionally similar paralogs of the affected gene family.
Affiliation(s)
- Manda Clair Jost
- Sections of Integrative Biology and Neurobiology and Center for Computational Biology, School of Biological Sciences, University of Texas at Austin, USA.
|
29
|
Different functional classes of genes are characterized by different compositional properties. FEBS Lett 2007; 581:5819-24. [DOI: 10.1016/j.febslet.2007.11.052] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 08/13/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
|
30
|
Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res 2007; 36:D491-6. [PMID: 17962297 PMCID: PMC2238872 DOI: 10.1093/nar/gkm899] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022]
Abstract
The complete collection of evolutionary histories of all genes in a genome, also known as phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es
Affiliation(s)
- Jaime Huerta-Cepas
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, Avda. Autopista del Saler, 13 Valencia 46013, Spain
|
31
|
Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM. OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res 2007; 36:D271-5. [PMID: 17947323 PMCID: PMC2238902 DOI: 10.1093/nar/gkm845] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at http://cegg.unige.ch/orthodb
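Ortholog delineation from all-against-all comparisons commonly starts from reciprocal best hits between pairs of genomes. The Python sketch below illustrates only that general idea and is not the OrthoDB procedure. The score table, the gene identifiers, and the function names are hypothetical, and the scores are assumed to have been precomputed (for example by Smith-Waterman alignment) in both directions.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str]                  # (species, gene identifier)
Scores = Dict[Tuple[Gene, Gene], float]  # (query, target) -> alignment score


def best_hits(scores: Scores, query_species: str, target_species: str) -> Dict[Gene, Gene]:
    """Map each gene of query_species to its highest scoring gene in target_species."""
    best: Dict[Gene, Tuple[Gene, float]] = {}
    for (query, target), score in scores.items():
        if query[0] != query_species or target[0] != target_species:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: hit for query, (hit, _) in best.items()}


def reciprocal_best_hits(scores: Scores, species_a: str, species_b: str) -> List[Tuple[Gene, Gene]]:
    """Gene pairs that are each other's best hit across the two species."""
    a_to_b = best_hits(scores, species_a, species_b)
    b_to_a = best_hits(scores, species_b, species_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    h1, h2 = ("human", "H1"), ("human", "H2")
    m1, m2 = ("mouse", "M1"), ("mouse", "M2")
    table: Scores = {
        (h1, m1): 950, (h1, m2): 400, (h2, m1): 420, (h2, m2): 380,
        (m1, h1): 950, (m1, h2): 420, (m2, h1): 400, (m2, h2): 380,
    }
    print(reciprocal_best_hits(table, "human", "mouse"))
```

In this toy table only H1 and M1 choose each other, so H2 and M2 are left unpaired. Procedures such as the one described in the abstract go further, for instance by grouping genes across many species and levels of the phylogeny, which this sketch does not attempt.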
Affiliation(s)
- Evgenia V. Kriventseva
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, 1 rue Michel-Servet, Department of Structural Biology and Bioinformatics, University of Geneva Medical School, 1 rue Michel-Servet, 1211 Geneva, Switzerland and Imperial College London, South Kensington Campus, SW7 2AZ London, UK
- Nazim Rahman
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, 1 rue Michel-Servet, Department of Structural Biology and Bioinformatics, University of Geneva Medical School, 1 rue Michel-Servet, 1211 Geneva, Switzerland and Imperial College London, South Kensington Campus, SW7 2AZ London, UK
- Octavio Espinosa
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, 1 rue Michel-Servet, Department of Structural Biology and Bioinformatics, University of Geneva Medical School, 1 rue Michel-Servet, 1211 Geneva, Switzerland and Imperial College London, South Kensington Campus, SW7 2AZ London, UK
- Evgeny M. Zdobnov
  - Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, 1 rue Michel-Servet, Department of Structural Biology and Bioinformatics, University of Geneva Medical School, 1 rue Michel-Servet, 1211 Geneva, Switzerland and Imperial College London, South Kensington Campus, SW7 2AZ London, UK
  - *To whom correspondence should be addressed. +41 22 379 59 73; +41 22 379 57 06
|
32
|
Gouy M, Delmotte S. Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie 2007; 90:555-62. [PMID: 17825976 DOI: 10.1016/j.biochi.2007.07.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 06/02/2007] [Accepted: 07/03/2007] [Indexed: 10/23/2022]
Abstract
The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).
Affiliation(s)
- Manolo Gouy
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, 69622 Villeurbanne Cedex, France.
|
33
|
Kim SH, Elango N, Warden C, Vigoda E, Yi SV. Heterogeneous genomic molecular clocks in primates. PLoS Genet 2006; 2:e163. [PMID: 17029560 PMCID: PMC1592237 DOI: 10.1371/journal.pgen.0020163] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.7] [Received: 06/01/2006] [Accepted: 08/10/2006] [Indexed: 12/22/2022]
Abstract
Using data from primates, we show that molecular clocks in sites that have been part of a CpG dinucleotide in the recent past (CpG sites) and in non-CpG sites are of markedly different nature, reflecting differences in their molecular origins. Notably, single nucleotide substitutions at non-CpG sites show clear generation-time dependency, indicating that most of these substitutions occur by errors during DNA replication. On the other hand, substitutions at CpG sites occur relatively constantly over time, as expected from their primary origin due to methylation. Therefore, molecular clocks are heterogeneous even within a genome. Furthermore, we propose that varying frequencies of CpG dinucleotides in different genomic regions may have contributed significantly to conflicting earlier results on rate constancy of the mammalian molecular clock. Our conclusion that different regions of genomes follow different molecular clocks should be considered when inferring divergence times using molecular data and in phylogenetic analysis.
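The CpG versus non-CpG split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and it assumes a gap-free pairwise alignment in which a site counts as a CpG site when it lies in a CG dinucleotide of the ancestral sequence.

```python
def cpg_site_mask(seq: str) -> list[bool]:
    """Mark every position that is part of a CG dinucleotide."""
    s = seq.upper()
    mask = [False] * len(s)
    for i in range(len(s) - 1):
        if s[i] == "C" and s[i + 1] == "G":
            mask[i] = mask[i + 1] = True
    return mask


def split_substitutions(ancestral: str, derived: str) -> dict[str, int]:
    """Count substitutions at CpG and non-CpG sites.

    Assumption (illustrative only): both sequences are aligned without
    gaps, and site class is decided from the ancestral sequence.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must have equal length")
    mask = cpg_site_mask(ancestral)
    counts = {"cpg": 0, "non_cpg": 0}
    for a, d, is_cpg in zip(ancestral.upper(), derived.upper(), mask):
        if a != d:
            counts["cpg" if is_cpg else "non_cpg"] += 1
    return counts
```

Comparing the two counts across lineages with different generation times is the kind of contrast the abstract draws; the sketch only covers the site classification step.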
Affiliation(s)
- Seong-Ho Kim
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Navin Elango
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Charles Warden
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eric Vigoda
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Soojin V Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
34
|
Smith BD, Raines RT. Genetic selection for critical residues in ribonucleases. J Mol Biol 2006; 362:459-78. [PMID: 16920150 DOI: 10.1016/j.jmb.2006.07.020] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 05/09/2006] [Revised: 07/03/2006] [Accepted: 07/11/2006] [Indexed: 11/24/2022]
Abstract
Homologous mammalian proteins were subjected to an exhaustive search for residues that are critical to their structure/function. Error-prone polymerase chain reactions were used to generate random mutations in the genes of bovine pancreatic ribonuclease (RNase A) and human angiogenin, and a genetic selection based on the intrinsic cytotoxicity of ribonucleolytic activity was used to isolate inactive variants. Twenty-three of the 124 residues in RNase A were found to be intolerant to substitution with at least one particular amino acid. Twenty-nine of the 123 residues in angiogenin were likewise intolerant. In both RNase A and angiogenin, only six residues appeared to be wholly intolerant to substitution: two histidine residues involved in general acid/base catalysis and four cysteine residues that form two disulfide bonds. With few exceptions, the remaining critical residues were buried in the hydrophobic core of the proteins. Most of these residues were found to tolerate only conservative substitutions. The importance of a particular residue as revealed by this genetic selection correlated with its sequence conservation, though several non-conserved residues were found to be critical for protein structure/function. Despite voluminous research on RNase A, the importance of many residues identified herein was unknown, and those can now serve as targets for future work. Moreover, a comparison of the critical residues in RNase A and human angiogenin, which share only 35% amino acid sequence identity, provides a unique perspective on the molecular evolution of the RNase A superfamily, as well as an impetus for applying this methodology to other ribonucleases.
Affiliation(s)
- Bryan D Smith
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
|
35
|
A computational prediction of isochores based on hidden Markov models. Gene 2006; 385:41-9. [PMID: 17020791 DOI: 10.1016/j.gene.2006.04.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/30/2005] [Revised: 03/17/2006] [Accepted: 04/03/2006] [Indexed: 11/30/2022]
Abstract
Mammalian genomes are organised into a mosaic of regions (in general more than 300 kb in length), with differing, relatively homogeneous G+C contents. The G+C content is the basic characteristic of isochores, but they have also been associated with many other biological properties. For instance, the genes are more compact and their density is highest in G+C rich isochores. Various ways of locating isochores in the human genome have been developed, but such methods use only the base composition of the DNA sequences. The present paper proposes a new method, based on a hidden Markov model, which takes into account several of the biological properties associated with the isochore structure of a genome. This method leads to good segmentation of the human genome into isochores, and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
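The idea of segmenting a genome with a hidden Markov model can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the model proposed in the paper: it uses only two states (G+C poor and G+C rich), Gaussian emissions on per-window G+C fractions, and Viterbi decoding. The function name and every parameter value (`means`, `sd`, `stay`) are assumptions chosen for the example, whereas the published method also uses other biological properties of isochores.

```python
import math


def viterbi_gc_segments(gc_windows, means=(0.38, 0.52), sd=0.04, stay=0.95):
    """Label each window 0 (G+C poor) or 1 (G+C rich).

    Two-state HMM with Gaussian emissions, decoded with Viterbi in log
    space. All parameter values are illustrative assumptions.
    """
    if not gc_windows:
        return []

    def log_emit(x, mu):
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    # Uniform prior over the two states for the first window.
    score = [math.log(0.5) + log_emit(gc_windows[0], mu) for mu in means]
    back = []
    for x in gc_windows[1:]:
        new_score, pointers = [], []
        for j, mu in enumerate(means):
            cands = [score[i] + (log_stay if i == j else log_switch) for i in range(2)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emit(x, mu))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

A high `stay` probability discourages the decoder from switching state on a single deviant window, which is what gives long, relatively homogeneous segments.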
|
36
|
Abstract
Several studies of nucleotide substitution patterns in mammalian species suggested that GC-rich isochores might be vanishing in mammalian genomes. However, the number of genes and the number of genomes included in these studies might not have given a reliable broad view of the trend in GC change in mammals. It is therefore worth exploring this issue with a broader coverage of mammalian genomes using a reliable approach, the maximum likelihood approach. We have applied two maximum likelihood methods to infer the ancestral GC contents of 176 mammalian genes from representative eutherian species and at least one marsupial species. Except for a large GC decrease in marsupial genes, we found no general decreasing trend in GC content in GC-rich genes or in other genes among eutherian mammals; indeed, the GC content of GC-rich genes appears to have increased in recent times in some genomes, e.g., the rabbit. The large GC decrease in marsupials could be mainly due to the great reduction in chromosome number, which could lead to a large reduction in recombination rate and thus also a large reduction in the rate of gene conversion. Since many eutherian mammals still maintain a fairly large number of chromosomes, it is unlikely that GC-rich isochores are vanishing in these mammals.
Affiliation(s)
- Jianying Gu
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
|
37
|
Fortes GG, Bouza C, Martínez P, Sánchez L. Diversity in isochore structure among cold-blooded vertebrates based on GC content of coding and non-coding sequences. Genetica 2006; 129:281-9. [PMID: 16897446 DOI: 10.1007/s10709-006-0009-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 07/01/2005] [Accepted: 04/19/2006] [Indexed: 11/29/2022]
Abstract
To re-examine the general view that the genomes of warm- and cold-blooded vertebrates differ in compositional structure, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, deposited in the databases in recent years. The nucleotide distributions of third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. In addition, the relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns was found among different orders of fish, amphibians and reptiles. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, and this difference was larger than in the human genome. Because of the limited number of available sequences that include both exons and introns, the results derived from them must be treated with caution. However, the indication that coding sequences are GC-richer than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
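The two quantities compared in this abstract, GC3 of a coding sequence and the GC content of its introns (GCI), are simple to compute. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes the CDS is in frame with a length that is a multiple of three.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame CDS."""
    s = cds.upper()
    if not s or len(s) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    # Every third base, starting at index 2, is a third codon position.
    return gc_content(s[2::3])
```

For one gene, `gc3(cds)` would be paired with `gc_content` of its concatenated introns, and the correlation taken across genes.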
Affiliation(s)
- Gloria G Fortes
- Departamento de Genética, Facultad de Veterinaria, Universidad de Santiago de Compostela, Lugo, Spain
|
38
|
Brunet FG, Roest Crollius H, Paris M, Aury JM, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol 2006; 23:1808-16. [PMID: 16809621 DOI: 10.1093/molbev/msl049] [Citation(s) in RCA: 290] [Impact Index Per Article: 15.3] [Indexed: 11/13/2022]
Abstract
Teleost fishes provide the first unambiguous support for ancient whole-genome duplication in an animal lineage. Studies in yeast or plants have shown that the effects of such duplications can be mediated by a complex pattern of gene retention and changes in evolutionary pressure. To explore such patterns in fishes, we have determined by phylogenetic analysis the evolutionary origin of 675 Tetraodon duplicated genes assigned to chromosomes, using additional data from other species of actinopterygian fishes. The subset of genes, which was retained in double after the genome duplication, is enriched in development, signaling, behavior, and regulation functional categories. The evolutionary rate of duplicate fish genes appears to be determined by 3 forces: 1) fish proteins evolve faster than mammalian orthologs; 2) the genes kept in double after genome duplication represent the subset under strongest purifying selection; and 3) following duplication, there is an asymmetric acceleration of evolutionary rate in one of the paralogs. These results show that similar mechanisms are at work in fishes as in yeast or plants and provide a framework for future investigation of the consequences of duplication in fishes and other animals.
Affiliation(s)
- Frédéric G Brunet
- Laboratoire de Biologie Moléculaire de la Cellule, INRA LA 1237, CNRS UMR5161, IFR 128 BioSciences Lyon-Gerland, Ecole Normale Supérieure de Lyon, Lyon, France
|
39
|
Khelifi A, Meunier J, Duret L, Mouchiroud D. GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol 2006; 62:745-52. [PMID: 16752212 DOI: 10.1007/s00239-005-0186-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 07/26/2005] [Accepted: 02/02/2006] [Indexed: 01/27/2023]
Abstract
Processed pseudogenes are generated by reverse transcription of a functional gene. They are generally nonfunctional after their insertion and, as a consequence, are no longer subjected to the selective constraints associated with functional genes. Because of this property they can be used as neutral markers in molecular evolution. In this work, we investigated the relationship between the evolution of GC content in recently inserted processed pseudogenes and the local recombination pattern in two mammalian genomes (human and mouse). We confirmed, using original markers, that recombination drives GC content in the human genome and we demonstrated that this is also true for the mouse genome despite lower recombination rates. Finally, we discussed the consequences on isochores evolution and the contrast between the human and the mouse pattern.
Affiliation(s)
- Adel Khelifi
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard-Lyon 1, 16 rue Raphael Dubois, 69622 Villeurbanne Cedex, France.
|
40
|
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 2006; 6:29. [PMID: 16563161 PMCID: PMC1435933 DOI: 10.1186/1471-2148-6-29] [Citation(s) in RCA: 805] [Impact Index Per Article: 42.4] [Received: 12/19/2005] [Accepted: 03/24/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. RESULTS We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. CONCLUSION This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
Affiliation(s)
- Thomas M Keane
- Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland
- Melissa M Pentony
- Department of Computer Science, University College London, Gower Street, London, UK
- Thomas J Naughton
- Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland
- James O McInerney
- Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland
|
41
|
Whelan S, de Bakker PIW, Quevillon E, Rodriguez N, Goldman N. PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res 2006; 34:D327-31. [PMID: 16381879 PMCID: PMC1347450 DOI: 10.1093/nar/gkj087] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
PANDIT is a database of homologous sequence alignments accompanied by estimates of their corresponding phylogenetic trees. It provides a valuable resource to those studying phylogenetic methodology and the evolution of coding-DNA and protein sequences. Currently in version 17.0, PANDIT comprises 7738 families of homologous protein domains; for each family, DNA and corresponding amino acid sequence multiple alignments are available together with high quality phylogenetic tree estimates. Recent improvements include expanded methods for phylogenetic tree inference, assessment of alignment quality and a redesigned web interface, available at the URL .
Affiliation(s)
- Simon Whelan
- EMBL-European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
42
|
Abstract
Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication. Orthology and paralogy are key concepts of evolutionary genomics. A clear distinction between orthologs and paralogs is critical for the construction of a robust evolutionary classification of genes and reliable functional annotation of newly sequenced genomes. Genome comparisons show that orthologous relationships with genes from taxonomically distant species can be established for the majority of the genes from each sequenced genome. This review examines in depth the definitions and subtypes of orthologs and paralogs, outlines the principal methodological approaches employed for identification of orthology and paralogy, and considers evolutionary and functional implications of these concepts.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
|
43
|
Ezawa K, OOta S, Saitou N. Genome-Wide Search of Gene Conversions in Duplicated Genes of Mouse and Rat. Mol Biol Evol 2006; 23:927-40. [PMID: 16407460 DOI: 10.1093/molbev/msj093] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
Abstract
Gene conversion is considered to play important roles in shaping genomic makeup, such as the homogenization of multigene families and the diversification of alleles. We devised two statistical tests on quartets for detecting gene conversion events. Each "quartet" consists of two pairs of orthologous sequences presumed to have been generated by a duplication event followed by the speciation of two closely related species. As example data, EnsEMBL mouse and rat cDNA sequences were used to obtain a genome-wide picture of gene conversion events. We extensively sampled 2,641 quartets that appear to have resulted from duplications after the divergence of primates and rodents and before mouse-rat speciation. Combining our new tests with Sawyer's and Takahata's tests enhanced detection sensitivity while keeping false positives to a minimum. About 18% (488 quartets) were shown to be highly positive for gene conversion using this combined test. Of these, 340 (13% of the total) showed signs of gene conversion in mouse sequence pairs. The gene conversion-positive gene pairs are mostly linked on the same chromosome, with the proportion of positive pairs in the linked and unlinked categories being 15% and 1%, respectively. Statistical analyses showed that (1) susceptibility to gene conversion correlates negatively with physical distance; notably, a frequency of 29% was observed for gene pairs less than 55 kb apart; (2) the occurrence of gene conversions does not depend on the transcriptional direction; (3) small gene families consisting of between three and six contiguous genes are highly prone to gene conversion; and (4) the frequency of gene conversion varies greatly among functional categories: cadherins favor gene conversion, while vomeronasal receptors type 1 and immunoglobulin V-type proteins disfavor it. These findings will help deepen the understanding of the roles of gene conversion.
Affiliation(s)
- Kiyoshi Ezawa
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
|
44
|
Rayko E, Jabbari K, Bernardi G. The evolution of introns in human duplicated genes. Gene 2006; 365:41-7. [PMID: 16356663 DOI: 10.1016/j.gene.2005.09.038] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 04/22/2005] [Revised: 07/07/2005] [Accepted: 09/07/2005] [Indexed: 11/17/2022]
Abstract
In previous work [Jabbari, K., Rayko, E., Bernardi, G., 2003. The major shifts of human duplicated genes. Gene 317, 203-208], we investigated the fate of ancient duplicated genes after the compositional transitions that occurred between the genomes of cold- and warm-blooded vertebrates. We found that the majority of duplicated copies were transposed to the "ancestral genome core", the gene-dense genome compartment that underwent a GC enrichment at the compositional transitions. Here, we studied the consequences of the events just outlined on the introns of duplicated genes. We found that, while intron number was highly conserved, total intron size (the sum of intron sizes within any given gene) was smaller in the GC-rich copies compared to the GC-poor copies, especially in dispersed copies (i.e., copies located on different chromosomes or chromosome arms). GC-rich copies also showed higher densities of CpG islands and Alus, whereas GC-poor copies were characterized by higher densities of LINEs. The features of the copies that underwent the compositional transition and became GC-richer are suggestive of, or related to, functional changes.
Affiliation(s)
- Edda Rayko
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, 2 Place Jussieu, F-75005 Paris, France.
|
45
|
Hurst LD, Lercher MJ. Unusual linkage patterns of ligands and their cognate receptors indicate a novel reason for non-random gene order in the human genome. BMC Evol Biol 2005; 5:62. [PMID: 16277660 PMCID: PMC1309615 DOI: 10.1186/1471-2148-5-62] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/14/2005] [Accepted: 11/08/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Prior to the sequencing of the human genome it was typically assumed that, tandem duplication aside, gene order is for the most part random. Numerous observers, however, highlighted instances in which a ligand was linked to one of its cognate receptors, with some authors suggesting that this may be a general and/or functionally important pattern, possibly associated with recombination modification between epistatically interacting loci. Here we ask whether ligands are more closely linked to their receptors than expected by chance. RESULTS We find no evidence that ligands are linked to their receptors more closely than expected by chance. However, in the human genome there are approximately twice as many co-occurrences of ligand and receptor on the same human chromosome as expected by chance. Although a weak effect, the latter might be consistent with a past history of block duplication. Successful duplication of some ligands, we hypothesise, is more likely if the cognate receptor is duplicated at the same time, so ensuring appropriate titres of the two products. CONCLUSION While there is an excess of ligands and their receptors on the same human chromosome, this cannot be accounted for by classical models of non-random gene order, as the linkage of ligands/receptors is no closer than expected by chance. Alternative hypotheses for non-random gene order are hence worth considering.
MESH Headings
- Animals
- Chromosome Mapping
- Chromosomes/ultrastructure
- Chromosomes, Human
- Dose-Response Relationship, Drug
- Epistasis, Genetic
- Evolution, Molecular
- Gene Conversion
- Gene Duplication
- Genetic Linkage
- Genome, Human
- Humans
- Ligands
- Linkage Disequilibrium
- Mice
- Models, Genetic
- Models, Statistical
- Multigene Family
- Protein Binding
- Recombination, Genetic
- Selection, Genetic
- Sequence Analysis, DNA
- Species Specificity
- Synteny
Affiliation(s)
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Martin J Lercher
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
46
|
Raes J, Van de Peer Y. Functional divergence of proteins through frameshift mutations. Trends Genet 2005; 21:428-31. [PMID: 15951050 DOI: 10.1016/j.tig.2005.05.013] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 09/15/2004] [Revised: 04/21/2005] [Accepted: 05/26/2005] [Indexed: 11/21/2022]
Abstract
Frameshift mutations are generally considered to be deleterious and of little importance for the evolution of novel gene functions. However, by screening an exhaustive set of vertebrate gene families, we found that, when a second transcript encoding the original gene product compensates for this mutation, frameshift mutations can be retained for millions of years and enable new gene functions to be acquired.
Affiliation(s)
- Jeroen Raes
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
|
47
|
Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol 2005; 6:R75. [PMID: 16168082 PMCID: PMC1242210 DOI: 10.1186/gb-2005-6-9-r75] [Citation(s) in RCA: 238] [Impact Index Per Article: 11.9] [Received: 04/27/2005] [Revised: 06/08/2005] [Accepted: 07/20/2005] [Indexed: 12/24/2022]
Abstract
BACKGROUND In mammals, contrary to what is usually assumed, recent evidence suggests that synonymous mutations may not be selectively neutral. This position has proven contentious, not least because of the absence of a viable mechanism. Here we test whether synonymous mutations might be under selection owing to their effects on the thermodynamic stability of mRNA, mediated by changes in secondary structure. RESULTS We provide numerous lines of evidence that are all consistent with the above hypothesis. Most notably, by simulating evolution and reallocating the substitutions observed in the mouse lineage, we show that the location of synonymous mutations is non-random with respect to stability. Importantly, the preference for cytosine at 4-fold degenerate sites, diagnostic of selection, can be explained by its effect on mRNA stability. Likewise, by interchanging synonymous codons, we find naturally occurring mRNAs to be more stable than simulant transcripts. Housekeeping genes, whose proteins are under strong purifying selection, are also under the greatest pressure to maintain stability. CONCLUSION Taken together, our results provide evidence that, in mammals, synonymous sites do not evolve neutrally, at least in part owing to selection on mRNA stability. This has implications for the application of synonymous divergence in estimating the mutation rate.
Affiliation(s)
- JV Chamary
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
|
48
|
Abstract
Paralogy (common ancestry through gene duplication rather than speciation) is widely recognized as an important problem for molecular systematists. This chapter introduces the concepts of paralogy and orthology and explains why paralogy can complicate both systematic work and other studies of molecular evolution. The definition of paralogy is explicitly phylogenetic, and phylogenetic methods are crucial in elucidating the pattern of paralogy. In particular, knowledge of the species phylogeny is key. I introduce the theory behind methods for detecting paralogy and briefly discuss two particular software implementations of phylogenetic methods to detect paralogy from molecular data. I also introduce a statistical method for detecting paralogy and some future directions for work on paralogy detection.
Affiliation(s)
- James A Cotton
- Department of Zoology, The Natural History Museum, London SW7 5BD, United Kingdom
|
49
|
Pinto JP, Conceição NM, Viegas CSB, Leite RB, Hurst LD, Kelsh RN, Cancela ML. Identification of a new pebp2alphaA2 isoform from zebrafish runx2 capable of inducing osteocalcin gene expression in vitro. J Bone Miner Res 2005; 20:1440-53. [PMID: 16007341 DOI: 10.1359/jbmr.050318] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 10/01/2004] [Revised: 01/19/2005] [Accepted: 03/16/2005] [Indexed: 11/18/2022]
Abstract
The zebrafish runx2b transcription factor is an ortholog of RUNX2 and is highly conserved at the structural level. The runx2b pebp2alphaA2 isoform induces osteocalcin gene expression by binding to a specific region of the promoter and seems to have been selectively conserved in the teleost lineage. INTRODUCTION RUNX2 (also known as CBFA1/Osf2/AML3/PEBP2alphaA) is a transcription factor essential for bone formation in mammals, as well as for osteoblast and chondrocyte differentiation, through regulation of expression of several bone- and cartilage-related genes. Since its discovery, Runx2 has been the subject of intense studies, mainly focused on unveiling regulatory targets of this transcription factor in higher vertebrates. However, no single study has been published addressing the role of Runx2 in bone metabolism of lower vertebrates. While analyzing the zebrafish (Danio rerio) runx2 gene, we identified the presence of two orthologs of RUNX2, which we named runx2a and runx2b, and cloned a pebp2alphaA-like transcript of the runx2b gene, which we named pebp2alphaA2. MATERIALS AND METHODS Zebrafish runx2b gene and cDNA were isolated by RT-PCR and sequence data mining. The 3D structure of the runx2b runt domain was modeled using mouse Runx1 runt as template. The regulatory effect of pebp2alphaA2 on osteocalcin expression was analyzed by transient co-transfection experiments using a luciferase reporter gene. Phylogenetic analysis of available Runx sequences was performed with TREE-PUZZLE 5.2 and MrBayes. RESULTS AND CONCLUSIONS We showed that the runx2b gene structure is highly conserved between mammals and fish. Zebrafish runx2b has two promoter regions separated by a large intron. Sequence analysis suggested that the runx2b gene encodes three distinct isoforms, by a combination of alternative splicing and differential promoter activation, as described for the human gene. We have cloned a pebp2alphaA-like transcript of the runx2b gene, which we named pebp2alphaA2, and showed its high degree of sequence similarity with the mammalian pebp2alphaA. The cloned zebrafish osteocalcin promoter was found to contain three putative runx2-binding elements, and one of them, located at -221 from the ATG, was capable of mediating pebp2alphaA2 transactivation. In addition, cross-species transactivation was also confirmed because the mouse Cbfa1 was able to induce the zebrafish osteocalcin promoter, whereas the zebrafish pebp2alphaA2 activated the murine osteocalcin promoter. These results are consistent with the high degree of evolutionary conservation of these proteins. The 3D structure of the runx2b runt domain was modeled based on the runt domain of mouse Runx1. Results show a high degree of similarity in the 3D configuration of the DNA-binding regions from both domains, with significant differences only observed in non-DNA-binding regions or in DNA-binding regions known to accommodate considerable structural flexibility. Phylogenetic analysis was used to clarify the relationship between the isoforms of each of the two zebrafish Runx2 orthologs and other Runx proteins. Both zebrafish runx2 genes clustered with other Runx2 sequences. The duplication event seemed, however, to be so old that, whereas Runx2b clearly clusters with the other fish sequences, it is unclear whether Runx2a clusters with Runx2 from higher vertebrates or from other fish.
Affiliation(s)
- Jorge P Pinto
- CCMAR, University of Algarve, Campus de Gambelas, Faro, Portugal
|
50
|
Khelifi A, Duret L, Mouchiroud D. HOPPSIGEN: a database of human and mouse processed pseudogenes. Nucleic Acids Res 2005; 33:D59-66. [PMID: 15608268 PMCID: PMC540038 DOI: 10.1093/nar/gki084] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Processed pseudogenes result from reverse-transcribed mRNAs. In general, because processed pseudogenes lack promoters, they are no longer functional from the moment they are inserted into the genome. Subsequently, they freely accumulate substitutions, insertions and deletions. Moreover, the ancestral structure of processed pseudogenes can be easily inferred from the sequence of their functional homologous genes. Owing to these characteristics, processed pseudogenes are good neutral markers for studying genome evolution. Recently, there has been increasing interest in these markers, particularly to help gene prediction in genome annotation, functional genomics and genome evolution analysis (patterns of substitution). For these reasons, we have developed a method to annotate processed pseudogenes in complete genomes. To make them useful to different fields of research, we stored them in a nucleic acid database after annotating them. In this work, we screened both the mouse and human complete genomes from ENSEMBL to find processed pseudogenes generated from functional genes with introns. We used a conservative method to detect processed pseudogenes in order to minimize the rate of false positive sequences. Among the sequences detected, some still have a conserved open reading frame and some overlap gene locations. We designated all reverse-transcribed sequences as retroelements and, more strictly, designated as processed pseudogenes all retroelements not falling into the two former categories (having a conserved open reading frame or overlapping gene locations). We annotated 5823 retroelements (5206 processed pseudogenes) in the human genome and 3934 (3428 processed pseudogenes) in the mouse genome. Compared with previous estimates, the total number of processed pseudogenes is underestimated, but the aim of this procedure was to generate a high-quality dataset. To facilitate the use of processed pseudogenes in studying genome structure and evolution, DNA sequences from processed pseudogenes, and their functional reverse-transcribed homologs, are now stored in a nucleic acid database, HOPPSIGEN. HOPPSIGEN can be browsed on the PBIL (Pôle Bioinformatique Lyonnais) World Wide Web server (http://pbil.univ-lyon1.fr/) or fully downloaded for local installation.
Affiliation(s)
- Adel Khelifi
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard-Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|