51
|
Lam TTY, Hon CC, Tang JW. Use of phylogenetics in the molecular epidemiology and evolutionary studies of viral infections. Crit Rev Clin Lab Sci 2010; 47:5-49. [PMID: 20367503 DOI: 10.3109/10408361003633318] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Abstract
Since DNA sequencing techniques first became available almost 30 years ago, the amount of nucleic acid sequence data has increased enormously. Phylogenetics, which is widely applied to compare and analyze such data, is particularly useful for the analysis of genes from rapidly evolving viruses. It has been used extensively to describe the molecular epidemiology and transmission of the human immunodeficiency virus (HIV), the origins and subsequent evolution of the severe acute respiratory syndrome (SARS)-associated coronavirus (SCoV), and, more recently, the evolving epidemiology of avian influenza as well as seasonal and pandemic human influenza viruses. Recent advances in phylogenetic methods can infer more in-depth information about the patterns of virus emergence, adding to the conventional approaches in viral epidemiology. Examples of this information include estimations (with confidence limits) of the actual time of the origin of a new viral strain or its emergence in a new species, viral recombination and reassortment events, the rate of population size change in a viral epidemic, and how the virus spreads and evolves within a specific population and geographical region. Such sequence-derived information obtained from the phylogenetic tree can assist in the design and implementation of public health and therapeutic interventions. However, application of many of these advanced phylogenetic methods is currently limited to specialized phylogeneticists and statisticians, mainly because of their mathematical basis and their dependence on the use of a large number of computer programs. This review attempts to bridge this gap by presenting conceptual, technical, and practical aspects of applying phylogenetic methods in studies of influenza, HIV, and SCoV. It aims to provide, with minimal mathematics and statistics, a practical overview of how phylogenetic methods can be incorporated into virological studies by clinical and laboratory specialists.
Affiliation(s)
- Tommy Tsan-Yuk Lam
- School of Biological Sciences, The University of Hong Kong, Hong Kong Special Administrative Region, China
|
52
|
Kordík P, Koutník J, Drchal J, Kovářík O, Čepek M, Šnorek M. Meta-learning approach to neural network optimization. Neural Netw 2010; 23:568-82. [DOI: 10.1016/j.neunet.2010.02.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 09/07/2009] [Revised: 02/03/2010] [Accepted: 02/03/2010] [Indexed: 01/17/2023]
|
53
|
Pavlopoulos GA, Soldatos TG, Barbosa-Silva A, Schneider R. A reference guide for tree analysis and visualization. BioData Min 2010; 3:1. [PMID: 20175922 PMCID: PMC2844399 DOI: 10.1186/1756-0381-3-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 05/12/2009] [Accepted: 02/22/2010] [Indexed: 11/10/2022] Open
Abstract
The quantities of data obtained by the new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and the large-scale OMICS-approaches, such as genomics, proteomics and transcriptomics, are becoming vast. Sequencing technologies are becoming cheaper and easier to use and, thus, large-scale evolutionary studies of the origins of life for all species and their evolution are becoming more and more challenging. Databases holding information about how data are related and how they are hierarchically organized are expanding rapidly. Clustering analysis is becoming more and more difficult to apply to very large amounts of data, since the results of these algorithms cannot be efficiently visualized. Most of the available visualization tools that are able to represent such hierarchies project data in 2D and often lack the necessary user friendliness and interactivity. For example, the current phylogenetic tree visualization tools are not able to display large-scale trees with more than a few thousand nodes in an easy-to-understand way. In this study, we review tools that are currently available for the visualization and analysis of biological trees, mainly developed during the last decade. We describe the uniform and standard computer-readable formats used to represent tree hierarchies and we comment on the functionality and the limitations of these tools. We also discuss how these tools can be developed further and how they should become integrated with various data sources. Here we focus on freely available software that offers users various tree-representation methodologies for biological data analysis.
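The abstract above refers to standard computer-readable formats for tree hierarchies without naming them. As a hedged illustration only, the sketch below assumes the widely used Newick notation and parses a small tree string into nested dictionaries; the function names `parse_newick` and `leaf_names` are hypothetical, and the parser ignores quoted labels and comments, which full Newick allows.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (illustrative sketch).

    Assumption: unquoted labels, optional ':length' suffixes, no comments.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        # An opening parenthesis starts a comma-separated list of child nodes.
        if pos < len(text) and text[pos] == "(":
            pos += 1
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1
        # Read the (possibly empty) node label.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Read the optional branch length after a colon.
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """Return the leaf labels of a parsed tree in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


# Hypothetical three-taxon tree used purely as an example input.
tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print(leaf_names(tree))
```

A nested-dictionary representation like this is one simple way to hand a hierarchy to a visualization layer; real tools typically rely on a tested parser from an established library.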
Affiliation(s)
- Theodoros G Soldatos
- Structural and Computational Biology Unit, EMBL, Meyerhofstrasse 1, Heidelberg, Germany
- Adriano Barbosa-Silva
- Computational Biology and Data Mining Group, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Strasse, 10, D-13125, Berlin, Germany
- Reinhard Schneider
- Structural and Computational Biology Unit, EMBL, Meyerhofstrasse 1, Heidelberg, Germany
|
54
|
Irimia M, Maeso I, Gunning PW, Garcia-Fernàndez J, Roy SW. Internal and external paralogy in the evolution of tropomyosin genes in metazoans. Mol Biol Evol 2010; 27:1504-17. [PMID: 20147436 DOI: 10.1093/molbev/msq018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022] Open
Abstract
Nature contains a tremendous diversity of forms both at the organismal and genomic levels. This diversity motivates the twin central questions of molecular evolution: what are the molecular mechanisms of adaptation, and what are the functional consequences of genomic diversity. We report a 22-species comparative analysis of tropomyosin (TPM) genes, which exist in a variety of forms and are implicated in the emergence of a wealth of cellular functions, including the novel muscle functions integral to the functional diversification of bilateral animals. TPM genes encode either or both of long-form [284 amino acid (aa)] and short-form (approximately 248 aa) proteins. Consistent with a role of TPM diversification in the origins and radiation of bilaterians, we find evidence that the muscle-specific long-form protein arose in proximal bilaterian ancestors (the bilaterian 'stem'). Duplication of the 5' end of the gene led to alternative promoters encoding long- and short-form transcripts with distinct functions. This dual-function gene then underwent strikingly parallel evolution in different bilaterian lineages. In each case, recurrent tandem exon duplication and mutually exclusive alternative splicing of the duplicates, with further association between these alternatively spliced exons along the gene, led to long- and short-form-specific exons, allowing for gradual emergence of alternative "internal paralogs" within the same gene. We term these Mutually exclusively Alternatively spliced Tandemly duplicated Exon sets "MATEs". This emergence of internal paralogs in various bilaterians has employed every single TPM exon in at least one lineage and reaches striking levels of divergence with up to 77% of long- and short-form transcripts being transcribed from different genomic regions.
Interestingly, in some lineages, these internal alternatively spliced paralogs have subsequently been "externalized" by full gene duplication and reciprocal retention/loss of the two transcript isoforms, a particularly clear case of evolution by subfunctionalization. This parallel evolution of TPM genes in diverse metazoans attests to common selective forces driving divergence of different gene transcripts and represents a striking case of emergence of evolutionary novelty by alternative splicing.
Affiliation(s)
- Manuel Irimia
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
|
55
|
Covariation of branch lengths in phylogenies of functionally related genes. PLoS One 2009; 4:e8487. [PMID: 20041191 PMCID: PMC2793527 DOI: 10.1371/journal.pone.0008487] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/23/2009] [Accepted: 11/25/2009] [Indexed: 12/05/2022] Open
Abstract
Recent studies have shown evidence for the coevolution of functionally related genes. This coevolution is a result of constraints to maintain functional relationships between interacting proteins. The studies have focused on the correlation in gene tree branch lengths of proteins that are directly interacting with each other. We here hypothesize that the correlation in branch lengths is not limited to proteins that directly interact, but also extends to proteins that operate within the same pathway. Using generalized linear models as a basis for identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene based on its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the accuracy with which we could predict the function of the proteins from their gene tree varied substantially with different GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities, with an area under the receiver operating characteristic curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings as they relate to the use of phylogenetic methods in comparative genomics.
|
56
|
O'Connor TD, Mundy NI. Genotype-phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate. Bioinformatics 2009; 25:i94-100. [PMID: 19478022 PMCID: PMC2687985 DOI: 10.1093/bioinformatics/btp231] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Motivation: Mapping between genotype and phenotype is one of the primary goals of evolutionary genetics but one that has received little attention at the interspecies level. Recent developments in phylogenetics and statistical modelling have typically been used to examine molecular and phenotypic evolution separately. We have used this background to develop phylogenetic substitution models to test for associations between evolutionary rate of genotype and phenotype. We do this by creating hybrid rate matrices between genotype and phenotype. Results: Simulation results show our models to be accurate in detecting genotype–phenotype associations and robust for various factors that typically affect maximum likelihood methods, such as number of taxa, level of relevant signal, proportion of sites affected and length of evolutionary divergence. Further, simulations show that our method is robust to homogeneity assumptions. We apply the models to datasets of male reproductive system genes in relation to mating systems of primates. We show that evolution of semenogelin II is significantly associated with mating systems whereas two negative control genes (cytochrome b and peptidase inhibitor 3) show no significant association. This provides the first hybrid substitution model of which we are aware to directly test the association between genotype and phenotype using a phylogenetic framework. Availability: Perl and HYPHY scripts are available upon request from the authors. Contact: to252@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
|
57
|
Blitvich BJ, Lin M, Dorman KS, Soto V, Hovav E, Tucker BJ, Staley M, Platt KB, Bartholomay LC. Genomic sequence and phylogenetic analysis of Culex flavivirus, an insect-specific flavivirus, isolated from Culex pipiens (Diptera: Culicidae) in Iowa. J Med Entomol 2009; 46:934-41. [PMID: 19645300 PMCID: PMC2741316 DOI: 10.1603/033.046.0428] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 05/12/2023]
Abstract
Adult mosquitoes (Diptera: Culicidae) were collected in 2007 and tested for specific viruses, including West Nile virus, as part of the ongoing arbovirus surveillance efforts in the state of Iowa. A subset of these mosquitoes (6,061 individuals in 340 pools) was further tested by reverse transcription-polymerase chain reaction (RT-PCR) using flavivirus universal primers. Of the 211 pools of Culex pipiens (L.) tested, 50 were positive. One of 51 pools of Culex tarsalis Coquillet was also positive. The flavivirus minimum infection rates (expressed as the number of positive mosquito pools per 1,000 mosquitoes tested) for Cx. pipiens and Cx. tarsalis were 10.3 and 1.2, respectively. Flavivirus RNA was not detected in Aedes triseriatus (Say) (52 pools), Culex erraticus (Dyar & Knab) (25 pools), or Culex territans Walker (one pool). Sequence analysis of all RT-PCR products revealed that the mosquitoes had been infected with Culex flavivirus (CxFV), an insect-specific virus previously isolated in Japan, Indonesia, Texas, Mexico, Guatemala and Trinidad. The complete genome of one isolate was sequenced, as were the envelope protein genes of eight other isolates. Phylogenetic analysis revealed that CxFV isolates from the United States (Iowa and Texas) are more closely related to CxFV isolates from Asia than those from Mexico, Guatemala, and Trinidad.
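The minimum infection rate defined in the abstract above (positive pools per 1,000 mosquitoes tested) can be illustrated with a short calculation. This is a minimal sketch, not the authors' code: the function name `minimum_infection_rate` is hypothetical, and the input numbers are invented for illustration, since the abstract does not give the number of mosquitoes tested per species.

```python
def minimum_infection_rate(positive_pools: int, mosquitoes_tested: int) -> float:
    """Return the number of positive pools per 1,000 mosquitoes tested."""
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    if positive_pools < 0:
        raise ValueError("positive_pools must be non-negative")
    return 1000.0 * positive_pools / mosquitoes_tested


# Hypothetical inputs (not taken from the study): 5 positive pools
# among 1,000 mosquitoes tested.
rate = minimum_infection_rate(positive_pools=5, mosquitoes_tested=1000)
print(f"{rate:.1f}")  # prints "5.0"
```

The rate is called "minimum" because each positive pool is counted as a single infected mosquito, even though a pool may contain more than one.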
Affiliation(s)
- Bradley J Blitvich
- Department of Veterinary Microbiology and Preventive Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA 50011, USA.
|
58
|
Fourcade S, Ruiz M, Camps C, Schlüter A, Houten SM, Mooyer PAW, Pàmpols T, Dacremont G, Wanders RJA, Giròs M, Pujol A. A key role for the peroxisomal ABCD2 transporter in fatty acid homeostasis. Am J Physiol Endocrinol Metab 2009; 296:E211-21. [PMID: 18854420 DOI: 10.1152/ajpendo.90736.2008] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Indexed: 11/22/2022]
Abstract
Peroxisomes are essential organelles exerting key functions in fatty acid metabolism such as the degradation of very long-chain fatty acids (VLCFAs). VLCFAs accumulate in X-adrenoleukodystrophy (X-ALD), a disease caused by deficiency of the Abcd1 peroxisomal transporter. Its closest homologue, Abcd2, exhibits a high degree of functional redundancy on the catabolism of VLCFA, being able to prevent X-ALD-related neurodegeneration in the mouse. In the search for specific roles of Abcd2, we screened fatty acid profiles in organs and primary neurons of mutant knockout mice lacking Abcd2 in basal conditions and under dietary challenges. Our results indicate that ABCD2 plays a role in the degradation of long-chain saturated and omega9-monounsaturated fatty acids and in the synthesis of docosahexaenoic acid (DHA). Also, we demonstrated a defective VLCFA beta-oxidation ex vivo in brain slices of Abcd1 and Abcd2 knockouts, using radiolabeled hexacosanoic acid and the precursor of DHA as substrates. As DHA levels are inversely correlated with the incidence of Alzheimer's and several degenerative conditions, we suggest that ABCD2 may act as modulator/modifier gene and therapeutic target in rare and common human disorders.
Affiliation(s)
- Stéphane Fourcade
- Centre de Genètica Mèdica i Molecular, Institut d'Investigació Biomèdica de Bellvitge, L'Hospitalet de Llobregat, Barcelona, Spain
|
59
|
Maddison DR, Moore W, Baker MD, Ellis TM, Ober KA, Cannone JJ, Gutell RR. Monophyly of terrestrial adephagan beetles as indicated by three nuclear genes (Coleoptera: Carabidae and Trachypachidae). Zool Scr 2009; 38:43-62. [PMID: 19789725 PMCID: PMC2752903 DOI: 10.1111/j.1463-6409.2008.00359.x] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
The beetle suborder Adephaga is traditionally divided into two sections on the basis of habitat, terrestrial Geadephaga and aquatic Hydradephaga. Monophyly of both groups is uncertain, and the relationship of the two groups has implications for inferring habitat transitions within Adephaga. Here we examine phylogenetic relationships of these groups using evidence provided by DNA sequences from all four suborders of beetles, including 60 species of Adephaga, four Archostemata, three Myxophaga, and ten Polyphaga. We studied 18S ribosomal DNA and 28S ribosomal DNA, aligned with consideration of secondary structure, as well as the nuclear protein-coding gene wingless. Independent and combined Bayesian, likelihood, and parsimony analyses of all three genes supported placement of Trachypachidae in a monophyletic Geadephaga, although for analyses of 28S rDNA and some parsimony analyses only if Coleoptera is constrained to be monophyletic. Most analyses showed limited support for the monophyly of Hydradephaga. Outside of Adephaga, there is support from the ribosomal genes for a sister group relationship between Adephaga and Polyphaga. Within the small number of sampled Polyphaga, analyses of 18S rDNA, wingless, and the combined matrix support monophyly of Polyphaga exclusive of Scirtoidea. Unconstrained analyses of the evolution of habitat suggest that Adephaga was ancestrally aquatic with one transition to terrestrial. However, in analyses constrained to disallow changes from aquatic to terrestrial habitat, the phylogenies imply two origins of aquatic habit within Adephaga.
Affiliation(s)
- D R Maddison
- Department of Entomology, University of Arizona, Tucson, AZ, 85721
|
60
|
Ross HA, Murugan S, Li WLS. Testing the reliability of genetic methods of species identification via simulation. Syst Biol 2008; 57:216-30. [PMID: 18398767 DOI: 10.1080/10635150802032990] [Citation(s) in RCA: 203] [Impact Index Per Article: 11.9] [Indexed: 02/02/2023] Open
Abstract
Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.
Affiliation(s)
- Howard A Ross
- Bioinformatics Institute, University of Auckland, Private Bag 92019, Auckland Mail Centre, Auckland 1142, New Zealand.
|
61
|
Soria-Carrasco V, Castresana J. Estimation of Phylogenetic Inconsistencies in the Three Domains of Life. Mol Biol Evol 2008; 25:2319-29. [DOI: 10.1093/molbev/msn176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
|
62
|
Essoussi N, Boujenfa K, Limam M. A comparison of MSA tools. Bioinformation 2008; 2:452-5. [PMID: 18841241 PMCID: PMC2561165 DOI: 10.6026/97320630002452] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 05/15/2008] [Revised: 06/13/2008] [Accepted: 07/06/2008] [Indexed: 12/04/2022] Open
Abstract
Multiple sequence alignment (MSA) is essential in phylogenetic, evolutionary and functional analysis. Several MSA tools are available in the literature. Here, we use several MSA tools such as ClustalX, Align-m, T-Coffee, SAGA, ProbCons, MAFFT, MUSCLE and DIALIGN to illustrate comparative phylogenetic trees analysis for two datasets. Results show that there is no single MSA tool that consistently outperforms the rest in producing reliable phylogenetic trees.
Affiliation(s)
- Nadia Essoussi
- LARODEC, High Institute of Management, University of Tunis, Tunis, Tunisia
|
63
|
Becker B, Hoef-Emden K, Melkonian M. Chlamydial genes shed light on the evolution of photoautotrophic eukaryotes. BMC Evol Biol 2008; 8:203. [PMID: 18627593 PMCID: PMC2490706 DOI: 10.1186/1471-2148-8-203] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Received: 04/22/2008] [Accepted: 07/15/2008] [Indexed: 11/10/2022] Open
Abstract
Background Chlamydiae are obligate intracellular bacteria of protists, invertebrates and vertebrates, but have not been found to date in photosynthetic eukaryotes (algae and embryophytes). Genes of putative chlamydial origin, however, are present in significant numbers in sequenced genomes of photosynthetic eukaryotes. It has been suggested that such genes were acquired by an ancient horizontal gene transfer from Chlamydiae to the ancestor of photosynthetic eukaryotes. To further test this hypothesis, an extensive search for proteins of chlamydial origin was performed using several recently sequenced algal genomes and EST databases, and the proteins subjected to phylogenetic analyses. Results A total of 39 proteins of chlamydial origin were retrieved from the photosynthetic eukaryotes analyzed and their identity verified through phylogenetic analyses. The distribution of the chlamydial proteins among four groups of photosynthetic eukaryotes (Viridiplantae, Rhodoplantae, Glaucoplantae, Bacillariophyta) was complex suggesting multiple acquisitions and losses. Evidence is presented that all except one of the chlamydial genes originated from an ancient endosymbiosis of a chlamydial bacterium into the ancestor of the Plantae before their divergence into Viridiplantae, Rhodoplantae and Glaucoplantae, i.e. more than 1.1 BYA. The chlamydial proteins subsequently spread through secondary plastid endosymbioses to other eukaryotes. Of 20 chlamydial proteins recovered from the genomes of two Bacillariophyta, 10 were of rhodoplant, and 10 of viridiplant origin suggesting that they were acquired by two different secondary endosymbioses. Phylogenetic analyses of concatenated sequences demonstrated that the viridiplant secondary endosymbiosis likely occurred before the divergence of Chlorophyta and Streptophyta. 
Conclusion We identified 39 proteins of chlamydial origin in photosynthetic eukaryotes signaling an ancient invasion of the ancestor of the Plantae by a chlamydial bacterium accompanied by horizontal gene transfer. Subsequently, chlamydial proteins spread through secondary endosymbioses to other eukaryotes. We conclude that intracellular chlamydiae likely persisted throughout the early history of the Plantae donating genes to their hosts that replaced their cyanobacterial/plastid homologs thus shaping early algal/plant evolution before they eventually vanished.
Affiliation(s)
- Burkhard Becker
- Botanisches Institut, Universität zu Köln, Gyrhofstr. 15, 50931 Köln, Germany.
|
64
|
D'Aniello S, Irimia M, Maeso I, Pascual-Anaya J, Jiménez-Delgado S, Bertrand S, Garcia-Fernàndez J. Gene expansion and retention leads to a diverse tyrosine kinase superfamily in amphioxus. Mol Biol Evol 2008; 25:1841-54. [PMID: 18550616 DOI: 10.1093/molbev/msn132] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022] Open
Abstract
Tyrosine kinase (TK) proteins play a central role in cellular behavior and development of animals. The expansion of this superfamily is regarded as a key event in the evolution of the complex signaling pathways and gene networks of metazoans and is a prominent example of how shuffling of protein modules may generate molecular novelties. Using the intron/exon structure within the TK domain (TK intron code) as a complementary tool for the assignment of orthology and paralogy, we identified and studied the 118 TK proteins of the amphioxus Branchiostoma floridae genome to elucidate TK gene family evolution in metazoans and chordates in particular. Unlike all characterized metazoans to date, amphioxus has members of all known widespread TK families, with not a single loss. Putting amphioxus TKs in an evolutionary context, including new data from the cnidarian Nematostella vectensis, the echinoderm Strongylocentrotus purpuratus, and the ascidian Ciona intestinalis, we suggest new evolutionary histories for different TK families and draw a new global picture of gene loss/gain in the different phyla. Surprisingly, our survey also detected an unprecedented expansion of a group of closely related TK families, including TIE, FGFR, PDGFR, and RET, due most probably to massive gene duplication and exon shuffling. Based on their highly similar intron/exon structure at the TK domain, we suggest that this group of TK families constitutes a superfamily of TK proteins, which we termed EXpanding TK, after their seemingly unique propensity for gene duplication and exon shuffling, not only in amphioxus but also across all metazoan groups. Due to this extreme tendency toward both retention and expansion of TK genes, amphioxus harbors the richest and most diverse TK repertoire among all metazoans studied so far, retaining most of the gene complement of its ancestors, but having evolved its own repertoire of genetic novelties.
Affiliation(s)
- Salvatore D'Aniello
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
|
65
|
Goode M, Rodrigo AG. Using PEBBLE for the evolutionary analysis of serially sampled molecular sequences. Curr Protoc Bioinformatics 2008; Chapter 6:Unit 6.8. [PMID: 18428729 DOI: 10.1002/0471250953.bi0608s05] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
The PEBBLE (Phylogenetics, Evolutionary Biology, and Bioinformatics in a moduLar Environment) application is a relative newcomer to the field of phylogenetic applications. Although designed as a customizable generalist application, PEBBLE was initially developed to implement procedures for the analysis of sequences associated with different sampling times, e.g., rapidly evolving viral genes sampled over the course of infection, or ancient DNA sequences. The basic protocol describes the use of PEBBLE to infer a phylogenetic tree using the sUPGMA algorithm, and the inference of substitution rate parameters using maximum likelihood. The alternate and support protocols describe the simulation capabilities of PEBBLE, and general use of the PEBBLE application, respectively.
Affiliation(s)
- Matthew Goode
- Bioinformatics Institute and The Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
|
66
|
Peterson MW, Colosimo ME. TreeViewJ: An application for viewing and analyzing phylogenetic trees. Source Code Biol Med 2007; 2:7. [PMID: 17974028 PMCID: PMC2170439 DOI: 10.1186/1751-0473-2-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/09/2007] [Accepted: 10/31/2007] [Indexed: 11/28/2022]
Abstract
Background Phylogenetic trees are widely used to visualize evolutionary relationships between different organisms or samples of the same organism. A variety of both free and commercial tree visualization software is available, but limitations in these programs often require researchers to use multiple programs for analysis, annotation, and the production of publication-ready images. Results We present TreeViewJ, a Java tool for visualizing, editing and analyzing phylogenetic trees. The software allows researchers to color and change the width of branches that they wish to highlight, and add names to nodes. If collection dates are available for taxa, the software can map them onto a timeline, and sort the tree in ascending or descending date order. Conclusion TreeViewJ is a tool for researchers to visualize, edit, "decorate," and produce publication-ready images of phylogenetic trees. It is open-source, released under a GPL license, and available at .
Affiliation(s)
- Matthew W Peterson
- The MITRE Corporation, 202 Burlington Rd, Bedford, MA, USA; Department of Biomedical Engineering, Boston University, 44 Cummington St, Boston, MA, USA
|
67
|
Brachmann AO, Joyce SA, Jenke-Kodama H, Schwär G, Clarke DJ, Bode HB. A Type II Polyketide Synthase is Responsible for Anthraquinone Biosynthesis in Photorhabdus luminescens. Chembiochem 2007; 8:1721-8. [PMID: 17722122 DOI: 10.1002/cbic.200700300] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Indexed: 11/07/2022]
Abstract
Type II polyketide synthases are involved in the biosynthesis of numerous clinically relevant secondary metabolites with potent antibiotic or anticancer activity. Until recently the only known producers of type II PKSs were members of the Gram-positive actinomycetes, well-known producers of secondary metabolites in general. Here we present the second example of a type II PKS from Gram-negative bacteria. We have identified the biosynthesis gene cluster responsible for the production of anthraquinones (AQs) from the entomopathogenic bacterium Photorhabdus luminescens. This is the first example of AQ production in Gram-negative bacteria, and their heptaketide origin was confirmed by feeding experiments. Deletion of a cyclase/aromatase involved in AQ biosynthesis resulted in accumulation of mutactin and dehydromutactin, which have been described as shunt products of typical octaketide compounds from streptomycetes, and a pathway for AQ formation from octaketide intermediates is discussed.
Affiliation(s)
- Alexander O Brachmann
- Pharmazeutische Biotechnologie, Universität des Saarlandes, Postfach 151150, 66041 Saarbrücken, Germany
68
Ahlenstiel G, Roomp K, Däumer M, Nattermann J, Vogel M, Rockstroh JK, Beerenwinkel N, Kaiser R, Nischalke HD, Sauerbruch T, Lengauer T, Spengler U. Selective pressures of HLA genotypes and antiviral therapy on human immunodeficiency virus type 1 sequence mutation at a population level. Clin Vaccine Immunol 2007; 14:1266-73. [PMID: 17715334 PMCID: PMC2168106 DOI: 10.1128/cvi.00169-07] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
The objective of this study was a comprehensive analysis of the immune-driven evolution of viruses of human immunodeficiency virus type 1 (HIV-1) clade B in a large patient cohort treated at a single hospital in Germany and its implications for antiretroviral therapy. We examined the association of the HLA-A, HLA-B, and HLA-DRB1 alleles with the emergence of mutations in the complete protease gene and the first 330 codons of the reverse transcriptase (RT) gene of HIV-1, studying their distribution and persistence and their impact on antiviral drug therapy. The clinical data for 179 HIV-infected patients, the results of HLA genotyping, and virus sequences were analyzed using a variety of statistical approaches. We describe new HLA-associated mutations in both viral protease and RT, several of which are associated with HLA-DRB1. The mutations reported are remarkably persistent within our cohort, developing more slowly in a minority of patients. Interestingly, several HLA-associated mutations occur at the same positions as drug resistance mutations in patient viruses, where the viral sequence was acquired before exposure to these drugs. The influence of HLA on thymidine analogue mutation pathways was not observed. We were able to confirm immune-driven selection pressure by major histocompatibility complex (MHC) class I and II alleles through the identification of HLA-associated mutations. HLA-B alleles were involved in more associations (68%) than either HLA-A (23%) or HLA-DRB1 (9%). As several of the HLA-associated mutations lie at positions associated with drug resistance, our results indicate possible negative effects of HLA genotypes on the development of HIV-1 drug resistance.
Affiliation(s)
- Golo Ahlenstiel
- Department of Internal Medicine I, University of Bonn, 53105 Bonn, Germany
69
Abstract
OBJECTIVE: To quantify the similarity (or lack thereof) between the phylogenetic substructure of HIV-1 groups O and M. METHODS: Two phylogenetic tree statistics, the subtype diversity ratio (SDR) and the subtype diversity variance (SDV), were used in conjunction with bootstrap replicates on gag, pol and env sequence alignments of group O and M strains. Randomly generated phylogenetic trees were used as a control. RESULTS: We show that, as expected, the established global-group M subtypes have a high degree of phylogenetic symmetry in relation to each other in terms of inter- and intra-subtype diversification. They are significantly different from the substructure present amongst the random trees. In contrast, the group O diversification does not display this highly symmetrical substructure and is not significantly different from the substructure present on randomly generated trees. Phylogenies comprising group M strains from the epicentre of the HIV/AIDS pandemic, the Democratic Republic of Congo (DRC), exhibit a substructure more similar to group O than to global-group M. CONCLUSIONS: The substructure present within groups O and M is quantifiably different. The well-defined clades, the subtypes that characterize group M diversification, are not present in group O or amongst group M strains from the DRC. The group M subtypes are thus unique and a signature of pandemic HIV-1.
Affiliation(s)
- John Archer
- Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, UK
70
Talavera G, Castresana J. Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments. Syst Biol 2007; 56:564-77. [PMID: 17654362 DOI: 10.1080/10635150701472164] [Citation(s) in RCA: 3677] [Impact Index Per Article: 204.3] [Indexed: 10/23/2022]
Abstract
Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning or not. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignments, whereas a stringent selection is more adequate for longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap. This indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.
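To make the idea of "alignment cleaning" concrete, the following Python sketch drops gap-containing columns and then discards retained runs of columns that are too short. It is only an illustration of block selection in general, not the Gblocks algorithm, which applies additional conservation criteria not described in this abstract. The function name `clean_alignment` and the parameters `max_gap_fraction` and `min_block_length` are hypothetical.

```python
from typing import List


def clean_alignment(seqs: List[str],
                    max_gap_fraction: float = 0.0,
                    min_block_length: int = 2) -> List[str]:
    """Simplified column filter (NOT the Gblocks algorithm).

    Keeps columns whose gap fraction is at most max_gap_fraction, then
    drops retained runs ("blocks") shorter than min_block_length.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Step 1: flag columns that pass the gap criterion.
    keep = []
    for col in range(length):
        gaps = sum(1 for s in seqs if s[col] == "-")
        keep.append(gaps / len(seqs) <= max_gap_fraction)

    # Step 2: discard retained blocks that are too short.
    col = 0
    while col < length:
        if keep[col]:
            end = col
            while end < length and keep[end]:
                end += 1
            if end - col < min_block_length:
                for i in range(col, end):
                    keep[i] = False
            col = end
        else:
            col += 1

    # Rebuild each sequence from the surviving columns only.
    return ["".join(ch for ch, k in zip(s, keep) if k) for s in seqs]


if __name__ == "__main__":
    toy = ["MKT-AILV", "MKTGA-LV", "MK-GAILV"]
    print(clean_alignment(toy))                      # stringent: short blocks dropped
    print(clean_alignment(toy, min_block_length=1))  # relaxed: isolated columns kept
```

Lowering `min_block_length` plays the role of a more relaxed selection, which the abstract reports as preferable for short alignments.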
Affiliation(s)
- Gerard Talavera
- Department of Physiology, Institute of Molecular Biology of Barcelona, Barcelona, Spain
71
Navaud O, Dabos P, Carnus E, Tremousaygue D, Hervé C. TCP Transcription Factors Predate the Emergence of Land Plants. J Mol Evol 2007; 65:23-33. [PMID: 17568984 DOI: 10.1007/s00239-006-0174-z] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.2] [Received: 07/24/2006] [Accepted: 01/17/2007] [Indexed: 10/23/2022]
Abstract
TCP proteins are plant-specific transcription factors identified so far only in angiosperms and shown to be involved in specifying plant morphologies. However, the functions of these proteins remain largely unknown. Our study is the first phylogenetic analysis comparing the TCP genes from higher and lower plants, and it dates the emergence of the TCP family to before the split of the Zygnemophyta. EST database analysis and CODEHOP PCR amplification revealed TCP genes in basal land plant genomes and also in their close freshwater algal relatives. Based on an extensive survey of TCP genes, families of TCP proteins were characterized in the Arabidopsis thaliana, poplar, rice, club-moss, and moss genomes. The phylogenetic trees indicate a continuous expansion of the TCP family during the diversification of the Phragmoplastophyta and a similar degree of expansion in several angiosperm lineages. TCP paralogues were identified in all genomes studied, and Ks values indicate that TCP genes expanded during genome duplication events. MEME and SIMPLE analyses detected conserved motifs and low-complexity regions, respectively, outside of the TCP domain, which reinforced the previous description of a "mosaic" structure of TCP proteins.
Affiliation(s)
- Olivier Navaud
- CNRS UMR2594/INRA UMR441, Laboratoire des Interactions Plantes Microorganismes, BP 52627 Chemin de borde rouge, F-31326 Castanet-Tolosan, France
72
Keane TM, Naughton TJ, McInerney JO. MultiPhyl: a high-throughput phylogenomics webserver using distributed computing. Nucleic Acids Res 2007; 35:W33-7. [PMID: 17553837 PMCID: PMC1933173 DOI: 10.1093/nar/gkm359] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022]
Abstract
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php.
Affiliation(s)
- Thomas M Keane
- Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA Hinxton, UK.
73
Ruano-Rubio V, Fares MA. Artifactual phylogenies caused by correlated distribution of substitution rates among sites and lineages: the good, the bad, and the ugly. Syst Biol 2007; 56:68-82. [PMID: 17366138 DOI: 10.1080/10635150601175578] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 10/23/2022]
Abstract
Despite the advances in understanding molecular evolution, current phylogenetic methods barely take account of a fraction of the complexity of evolution. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and the limits of computational power. These limitations lead to the establishment of either biologically simplistic models that rarely account for a fraction of the complexity involved or overfitting models that add little resolution to the problem. Such oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used evolutionary models in phylogenetic studies. These account for heterogeneity in the evolutionary rates among sites but do not account for changing within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we show possible misleading effects in tree inference when the assumption of constant within-site rates across lineages is violated using maximum likelihood. Using a simulation study, we explore the ways in which gamma stationary models can lead to wrong topology or to deceptive bootstrap support values when the within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the model assumed is stationary. Finally, we propose a geometry-based approach to visualize and to test for the possible existence of bias due to heterotachy.
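The distinction between rate-across-site (RAS) models and heterotachy can be sketched in a few lines of Python. Under the RAS assumption each site gets one gamma-distributed rate that is shared by every lineage; in the deliberately extreme heterotachous case shown here, each site draws a separate rate in every lineage. This is an illustrative assumption, not the simulation design used by the authors, and the function names and the shape parameter `alpha` are hypothetical.

```python
import random
from typing import List


def ras_rates(n_sites: int, n_lineages: int, alpha: float,
              rng: random.Random) -> List[List[float]]:
    """RAS: one gamma rate per site (mean 1), identical across lineages."""
    # gammavariate(shape, scale) has mean shape * scale, so scale = 1/alpha gives mean 1.
    site_rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]
    return [[rate] * n_lineages for rate in site_rates]


def heterotachous_rates(n_sites: int, n_lineages: int, alpha: float,
                        rng: random.Random) -> List[List[float]]:
    """Extreme heterotachy: a site's rate is redrawn independently in each lineage."""
    return [[rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_lineages)]
            for _ in range(n_sites)]


if __name__ == "__main__":
    rng = random.Random(42)
    for row in ras_rates(3, 4, alpha=0.5, rng=rng):
        print("RAS site:", [round(r, 3) for r in row])
    for row in heterotachous_rates(3, 4, alpha=0.5, rng=rng):
        print("heterotachous site:", [round(r, 3) for r in row])
```

A model that assumes the first pattern while the data follow the second is the kind of misspecification the abstract associates with wrong topologies and deceptive bootstrap support.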
Affiliation(s)
- Valentin Ruano-Rubio
- Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland
74
Welsch C, Albrecht M, Maydt J, Herrmann E, Welker MW, Sarrazin C, Scheidig A, Lengauer T, Zeuzem S. Structural and functional comparison of the non-structural protein 4B in flaviviridae. J Mol Graph Model 2007; 26:546-57. [PMID: 17507273 DOI: 10.1016/j.jmgm.2007.03.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 10/06/2006] [Revised: 03/23/2007] [Accepted: 03/28/2007] [Indexed: 12/27/2022]
Abstract
Flaviviridae are evolutionarily related viruses, comprising the hepatitis C virus (HCV), with the non-structural protein 4B (NS4B) as one of the least characterized proteins. NS4B is located in the endoplasmic reticulum membrane and is assumed to be a multifunctional protein. However, detailed structure information is missing. The hydrophobic nature of NS4B is a major difficulty for many experimental techniques. We applied bioinformatics methods to analyse structural and functional properties of NS4B in different viruses. We distinguish a central non-globular membrane portion with four to five transmembrane regions from an N- and C-terminal part with non-transmembrane helical elements. We demonstrate high similarity in sequence and structure for the C-terminal part within the flaviviridae family. A palmitoylation site contained in the C-terminal part of HCV is equally conserved in GB virus B. Furthermore, we identify and characterize an N-terminal basic leucine zipper (bZIP) motif in HCV, which is suggestive of a functionally important interaction site. In addition, we model the interaction of the bZIP region with the recently identified interaction partner CREB-RP/ATF6beta, a human activating transcription factor involved in ER-stress. In conclusion, the versatile structure, together with functional sites and motifs, possibly enables NS4B to adopt a role as protein hub in the membranous web interaction network of virus and host proteins. Important structural and functional properties of NS4B are predicted with implications for ER-stress response, altered gene expression and replication efficacy.
Affiliation(s)
- Christoph Welsch
- Internal Medicine II, Saarland University Hospital, Kirrberger Strasse, 66421 Homburg/Saar, Germany.
75
Mulero MC, Aubareda A, Schlüter A, Pérez-Riba M. RCAN3, a novel calcineurin inhibitor that down-regulates NFAT-dependent cytokine gene expression. Biochim Biophys Acta Mol Cell Res 2007; 1773:330-41. [PMID: 17270291 DOI: 10.1016/j.bbamcr.2006.12.007] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 07/07/2006] [Revised: 10/31/2006] [Accepted: 12/05/2006] [Indexed: 02/05/2023]
Abstract
The regulators of calcineurin (RCAN) proteins, previously known as calcipressins, have been considered to be a well conserved family from yeast to human based on the conservation of their FLISPP motif. Here, after performing an RCAN comparative genomic analysis we propose the existence of a novel functionally closely related RCAN subfamily restricted to vertebrates, the other RCAN proteins being considered only as distantly related members of the family. In addition, while three paralogous RCAN genes are found in vertebrates, there is only one in the other members of Eukarya. Moreover, besides the FLISPP motif, these paralogous genes have two other conserved motifs, the Cn-inhibitor RCAN (CIC) and the PxIxxT, which are restricted to vertebrates. In humans, RCAN1 and RCAN2 bind and inhibit Cn through their C-terminal region. Given the high amino acid identity in this region among human RCANs, authors in the field have hypothesized a role for RCAN3 in inhibiting Cn activity. Here, we demonstrate for the first time that human RCAN3, encoded by the RCAN3 (also known as DSCR1L2) gene, interacts physically and functionally with Cn. This interaction takes place only through the RCAN3 CIC motif. Overexpression of this sequence inhibits Cn activity towards the nuclear factor of activated T cells (NFAT) transcription factors and down-regulates NFAT-dependent cytokine gene expression in activated human Jurkat T cells.
Affiliation(s)
- M Carme Mulero
- Medical and Molecular Genetics Center, Institut de Recerca Oncològica, IDIBELL, Gran Via s/n Km 2.7, 08907 L'Hospitalet de Llobregat, Barcelona, Spain
76
Chovancová E, Kosinski J, Bujnicki JM, Damborský J. Phylogenetic analysis of haloalkane dehalogenases. Proteins 2007; 67:305-16. [PMID: 17295320 DOI: 10.1002/prot.21313] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 01/30/2023]
Abstract
Haloalkane dehalogenases (HLDs) are enzymes that catalyze the cleavage of carbon-halogen bonds by a hydrolytic mechanism. Although comparative biochemical analyses have been published, no classification system has been proposed for HLDs, to date, that reconciles their phylogenetic and functional relationships. In the study presented here, we have analyzed all sequences and structures of genuine HLDs and their homologs detectable by database searches. Phylogenetic analyses revealed that the HLD family can be divided into three subfamilies denoted HLD-I, HLD-II, and HLD-III, of which HLD-I and HLD-III are predicted to be sister-groups. A mismatch between the HLD protein tree and the tree of species, as well as the presence of more than one HLD gene in a few genomes, suggest that horizontal gene transfers, and perhaps also multiple gene duplications and losses have been involved in the evolution of this family. Most of the biochemically characterized HLDs are found in the HLD-II subfamily. The dehalogenating activity of two members of the newly identified HLD-III subfamily has only recently been confirmed, in a study motivated by this phylogenetic analysis. A novel type of the catalytic pentad (Asp-His-Asp+Asn-Trp) was predicted for members of the HLD-III subfamily. Calculation of the evolutionary rates and lineage-specific innovations revealed a common conserved core as well as a set of residues that characterizes each HLD subfamily. The N-terminal part of the cap domain is one of the most variable regions within the whole family as well as within individual subfamilies, and serves as a preferential site for the location of relatively long insertions. The highest variability of discrete sites was observed among residues that are structural components of the access channels. 
Mutations at these sites modify the anatomy of the channels, which are important for the exchange of ligands between the buried active site and the bulk solvent, thus creating a structural basis for the molecular evolution of new substrate specificities. Our analysis sheds light on the evolutionary history of HLDs and provides a structural framework for designing enzymes with new specificities.
Affiliation(s)
- Eva Chovancová
- Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic
77
Chevenet F, Brun C, Bañuls AL, Jacq B, Christen R. TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 2006; 7:439. [PMID: 17032440 PMCID: PMC1615880 DOI: 10.1186/1471-2105-7-439] [Citation(s) in RCA: 776] [Impact Index Per Article: 40.8] [Received: 06/02/2006] [Accepted: 10/10/2006] [Indexed: 11/10/2022]
Abstract
Background: Analyses of biomolecules for biodiversity, phylogeny or structure/function studies often use graphical tree representations. Many powerful tree editors are now available, but existing tree visualization tools make little use of meta-information related to the entities under study such as taxonomic descriptions or gene functions that can hardly be encoded within the tree itself (if using popular tree formats). Consequently, a tedious manual analysis and post-processing of the tree graphics are required if one needs to use external information for displaying or investigating trees. Results: We have developed TreeDyn, a tool using annotations and dynamic graphical methods for editing and analyzing multiple trees. The main features of TreeDyn are 1) the management of multiple windows and multiple trees per window, 2) the export of graphics to several standard file formats with or without HTML encapsulation and a new format called TGF, which enables saving and restoring graphical analysis, 3) the projection of texts or symbols facing leaf labels or linked to nodes, through manual pasting or by using annotation files, 4) the highlight of graphical elements after querying leaf labels (or annotations) or by selection of graphical elements and information extraction, 5) the highlight of targeted trees according to a source tree browsed by the user, 6) powerful scripts for automating repetitive graphical tasks, 7) a command line interpreter enabling the use of TreeDyn through CGI scripts for online building of trees, 8) the inclusion of a library of packages dedicated to specific research fields involving trees. Conclusion: TreeDyn is a tree visualization and annotation tool which includes tools for tree manipulation and annotation and uses meta-information through dynamic graphical operators or scripting to help analyses and annotations of single trees or tree collections.
Affiliation(s)
- François Chevenet
- Laboratoire de Génétique et Evolution des Maladies Infectieuses, UMR CNRS/IRD 2724, IRD, 911 avenue Agropolis, BP 64501, 34394 Montpellier Cedex 5, France
- Christine Brun
- Institut de Biologie du Développement de Marseille-Luminy, CNRS UMR 6216, Parc Scientifique et Technologique de Luminy, Case 907, 13288 Marseille Cedex 9, France
- Anne-Laure Bañuls
- Laboratoire de Génétique et Evolution des Maladies Infectieuses, UMR CNRS/IRD 2724, IRD, 911 avenue Agropolis, BP 64501, 34394 Montpellier Cedex 5, France
- Bernard Jacq
- Institut de Biologie du Développement de Marseille-Luminy, CNRS UMR 6216, Parc Scientifique et Technologique de Luminy, Case 907, 13288 Marseille Cedex 9, France
- Richard Christen
- Laboratoire de Biologie Virtuelle, CNRS UMR 6543, Université de Nice Sophia Antipolis, Centre de Biochimie, Campus Valrose, 06108 Nice, France
78
Hon CC, Lam TY, Drummond A, Rambaut A, Lee YF, Yip CW, Zeng F, Lam PY, Ng PTW, Leung FCC. Phylogenetic analysis reveals a correlation between the expansion of very virulent infectious bursal disease virus and reassortment of its genome segment B. J Virol 2006; 80:8503-9. [PMID: 16912300 PMCID: PMC1563883 DOI: 10.1128/jvi.00585-06] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Indexed: 11/20/2022]
Abstract
Infectious bursal disease virus (IBDV) is a birnavirus causing immunosuppressive disease in chickens. Emergence of the very virulent form of IBDV (vvIBDV) in the late 1980s dramatically changed the epidemiology of the disease. In this study, we investigated the phylogenetic origins of its genome segments and estimated the time of emergence of their most recent common ancestors. Moreover, with recently developed coalescence techniques, we reconstructed the past population dynamics of vvIBDV and timed the onset of its expansion to the late 1980s. Our analysis suggests that genome segment A of vvIBDV emerged at least 20 years before its expansion, which argues against the hypothesis that mutation of genome segment A is the major contributing factor in the emergence and expansion of vvIBDV. Alternatively, the phylogeny of genome segment B suggests a possible reassortment event estimated to have taken place around the mid-1980s, which seems to coincide with its expansion within approximately 5 years. We therefore hypothesize that the reassortment of genome segment B initiated vvIBDV expansion in the late 1980s, possibly by enhancing the virulence of the virus synergistically with its existing genome segment A. This report reveals the possible mechanisms leading to the emergence and expansion of vvIBDV, which would certainly provide insights into the scope of surveillance and prevention efforts regarding the disease.
Affiliation(s)
- Chung-Chau Hon
- Department of Zoology, The University of Hong Kong, China
79
Zhang A, Tan S, Sota T. autoinfer1.0: a computer program to infer biogeographical events automatically. Mol Ecol Notes 2006. [DOI: 10.1111/j.1471-8286.2006.01376.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]
80
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res 2006; 16:1099-108. [PMID: 16899658 PMCID: PMC1557764 DOI: 10.1101/gr.5322306] [Citation(s) in RCA: 211] [Impact Index Per Article: 11.1] [Received: 03/25/2006] [Indexed: 11/25/2022]
Abstract
Using 1128 protein-coding gene families from 11 completely sequenced cyanobacterial genomes, we attempt to quantify horizontal gene transfer events within cyanobacteria, as well as between cyanobacteria and other phyla. A novel method of detecting and enumerating potential horizontal gene transfer events within a group of organisms based on analyses of "embedded quartets" allows us to identify phylogenetic signal consistent with a plurality of gene families, as well as to delineate cases of conflict to the plurality signal, which include horizontally transferred genes. To infer horizontal gene transfer events between cyanobacteria and other phyla, we added homologs from 168 available genomes. We screened phylogenetic trees reconstructed for each of these extended gene families for highly supported monophyly of cyanobacteria (or lack of it). Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes, although a single completely resolved phylogeny was recovered from the quartets' plurality signals. We find more conflicts within cyanobacteria than between cyanobacteria and other phyla. We also find that genes from all functional categories are subject to transfer. However, in interphylum as compared to intraphylum transfers, the proportion of metabolic (operational) gene transfers increases, while the proportion of informational gene transfers decreases.
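As a rough illustration of what an "embedded quartet" is, the Python sketch below enumerates every set of four taxa in a tree and reports the split the tree induces on them. It only shows the general idea of quartet decomposition; the authors' actual method, including how support is assessed and how plurality signals are combined across gene families, is not reproduced here. The nested-tuple tree encoding and all function names are assumptions made for this example.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return (leaves, clades) for a nested-tuple tree with string leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)  # leaf set below this internal node
    return leaves, clades


def quartet_topology(quartet, clades):
    """Split induced on four taxa, as a frozenset of two pairs, or None if unresolved."""
    q = frozenset(quartet)
    for clade in clades:
        inside = q & clade
        if len(inside) == 2:  # this branch separates two taxa from the other two
            return frozenset([inside, q - inside])
    return None


def embedded_quartets(tree):
    """Map every 4-taxon subset (sorted tuple) to its induced split."""
    leaves, clades = leaf_sets(tree)
    return {q: quartet_topology(q, clades)
            for q in combinations(sorted(leaves), 4)}


if __name__ == "__main__":
    toy_tree = ((("A", "B"), "C"), ("D", "E"))
    for quartet, split in embedded_quartets(toy_tree).items():
        pairs = sorted("".join(sorted(p)) for p in split) if split else None
        print(quartet, pairs)
```

With the 11 genomes mentioned in the abstract, a tree contains 330 such four-taxon subsets (11 choose 4). Comparing the induced splits across gene families is one way to see which quartets agree with the majority and which conflict with it.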
Affiliation(s)
- Olga Zhaxybayeva
- Genome Atlantic and Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 1X5, Canada.
81
Carter D, Durbin R. Vertebrate gene finding from multiple-species alignments using a two-level strategy. Genome Biol 2006; 7 Suppl 1:S6.1-12. [PMID: 16925840 PMCID: PMC1810555 DOI: 10.1186/gb-2006-7-s1-s6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Background: One way in which the accuracy of gene structure prediction in vertebrate DNA sequences can be improved is by analyzing alignments with multiple related species, since functional regions of genes tend to be more conserved. Results: We describe DOGFISH, a vertebrate gene finder consisting of a cleanly separated site classifier and structure predictor. The classifier scores potential splice sites and other features, using sequence alignments between multiple vertebrate species, while the structure predictor hypothesizes coding transcripts by combining these scores using a simple model of gene structure. This also identifies and assigns confidence scores to possible additional exons. Performance is assessed on the ENCODE regions. We predict transcripts and exons across the whole human genome, and identify over 10,000 high confidence new coding exons not in the Ensembl gene set. Conclusion: We present a practical multiple species gene prediction method. Accuracy improves as additional species, up to at least eight, are introduced. The novel predictions of the whole-genome scan should support efficient experimental verification.
Affiliation(s)
- David Carter
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
82
Abstract
The SMC (structural maintenance of chromosomes) proteins are a highly conserved and ubiquitous family of ATPases, found in nearly all living organisms examined, where they play crucial roles in transmission of the hereditary material. However, the extent to which efficient ATP hydrolysis is required for SMC function has been a matter of some debate. Here we investigate the potential functional significance of ATP binding and hydrolysis in different eukaryotic SMC proteins, both by comparing the conservation of conserved ATPase motifs and by exploring potential coevolution between associated domains. In this way, we have been able to account for the reduced requirement for ATPase activity in cohesin's SMC3 and demonstrate the greater apparent conservation requirements for such activity in condensin SMC proteins. Finally, we explore possible interactions between the SMC and non-SMC components of the condensin complex that are required for full condensin activity and may modulate ATPase activity in the holocomplex.
Affiliation(s)
- Neville Cobbe
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Michael Swann Building, King's Buildings, Edinburgh, United Kingdom.
83
Buendia P, Narasimhan G. Serial NetEvolve: a flexible utility for generating serially-sampled sequences along a tree or recombinant network. Bioinformatics 2006; 22:2313-4. [PMID: 16844708 DOI: 10.1093/bioinformatics/btl387] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Serial NetEvolve is a flexible simulation program that generates DNA sequences evolved along a tree or recombinant network. It offers a user-friendly Windows graphical interface and a Windows or Linux simulator with a diverse selection of parameters to control the evolutionary model. Serial NetEvolve is a modification of the Treevolve program with the following additional features: simulation of serially-sampled data, the choice of either a clock-like or a variable rate model of sequence evolution, sampling from the internal nodes and the output of the randomly generated tree or network in our newly proposed NeTwick format. AVAILABILITY From website http://biorg.cis.fiu.edu/SNE Contacts: giri@cis.fiu.edu SUPPLEMENTARY INFORMATION Manual and examples available from http://biorg.cis.fiu.edu/SNE.
Affiliation(s)
- Patricia Buendia
- Bioinformatics Research Group (BioRG), School of Computing and Information Science, Florida International University Miami, FL 33199, USA
84
Williams PD, Pollock DD, Blackburne BP, Goldstein RA. Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2006; 2:e69. [PMID: 16789817 PMCID: PMC1480538 DOI: 10.1371/journal.pcbi.0020069] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.5] [Received: 11/28/2005] [Accepted: 05/04/2006] [Indexed: 11/18/2022] Open
Abstract
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of "ancestral sequences" inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a "best guess" amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
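The contrast drawn in this abstract, between taking the single "best guess" residue at each site and drawing residues from the posterior distribution, can be illustrated with a minimal Python sketch. The per-site posterior probabilities below are hypothetical values chosen for illustration, and the two helper functions are not taken from the authors' software.

```python
import random

def best_guess(posterior):
    """Return the most probable residue at every site (ML-style point estimate)."""
    return "".join(max(site, key=site.get) for site in posterior)

def sample_ancestor(posterior, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    residues = []
    for site in posterior:
        amino_acids = list(site)
        weights = [site[aa] for aa in amino_acids]
        residues.append(rng.choices(amino_acids, weights=weights, k=1)[0])
    return "".join(residues)

# Hypothetical posterior probabilities for a three-site ancestral node.
posterior = [
    {"L": 0.7, "A": 0.3},
    {"V": 0.6, "G": 0.4},
    {"K": 1.0},
]

rng = random.Random(0)
print(best_guess(posterior))            # always the majority residue at each site
print(sample_ancestor(posterior, rng))  # occasionally includes minority residues
```

The point estimate never contains the less frequent residues, whereas repeated sampling retains them at roughly their posterior frequency, which is the behaviour the abstract credits with avoiding the overestimate of thermostability.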
Affiliation(s)
- Paul D Williams
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- David D Pollock
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Benjamin P Blackburne
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
- Richard A Goldstein
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
85
Robbertse B, Reeves JB, Schoch CL, Spatafora JW. A phylogenomic analysis of the Ascomycota. Fungal Genet Biol 2006; 43:715-25. [PMID: 16781175 DOI: 10.1016/j.fgb.2006.05.001] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Received: 12/21/2005] [Revised: 04/26/2006] [Accepted: 05/01/2006] [Indexed: 11/18/2022]
Abstract
An automated procedure was developed to extract orthologous sequences from fungal genomes and incorporate them into phylogenomic analyses in a timely and efficient manner. This approach involves parsing an all versus all BLASTP search of 17 proteomes and creating a similarity matrix from e-values, which is then used to cluster proteins into related groups by means of a Markov Clustering algorithm. After performing this analysis at different stringency levels, 854 single copy protein clusters, which were ubiquitously distributed in all 17 proteomes, were identified. These clusters were culled to include only those clusters where all proteins had best hits to and received hits from a protein within the same cluster. The final data set included gapless alignments for 781 clusters of orthologous sequences that were concatenated into one super alignment containing 195,664 amino acid characters. Neighbor-joining distance and maximum likelihood analyses resulted in identical topologies and all except one node received 100% bootstrap support. The node supporting Stagonospora nodorum's position received 83% support or higher; it was also the only taxon differentially resolved in the maximum parsimony analyses. All analyses resolved the two derived subphyla Pezizomycotina and Saccharomycotina, and Schizosaccharomyces pombe as an early diverging lineage of the Ascomycota. Importantly, these analyses resolved the Leotiomycetes as the sister group to the Sordariomycetes, a region of the Ascomycota phylogeny that has remained problematic in molecular phylogenetic studies of more limited character sampling. Additional phylogenetic analyses which included orthologous sequences from an unannotated ascomycotan genome (e.g., Coccidioides immitis) and subsets of orthologs with different characteristics supported this topology. Phylogenetic analyses of the 595 orthologs which included C. immitis resulted in an identical topology to the previous 781 ortholog analysis and correctly placed C. immitis in the Eurotiomycetes. This demonstrated the correct identification of orthologs and the ability to incorporate unannotated genomic data into a common phylogenetic analysis.
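The step that keeps only single-copy clusters present in every proteome can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the BLASTP search and Markov clustering have already been run, and it represents each cluster as a hypothetical list of (species, protein ID) pairs.

```python
from collections import Counter

def single_copy_clusters(clusters, n_proteomes):
    """Keep clusters holding exactly one protein from each of n_proteomes species."""
    kept = []
    for cluster in clusters:
        per_species = Counter(species for species, _protein_id in cluster)
        in_all_species = len(per_species) == n_proteomes
        one_copy_each = all(count == 1 for count in per_species.values())
        if in_all_species and one_copy_each:
            kept.append(cluster)
    return kept

# Toy input with three hypothetical proteomes (sp1, sp2, sp3).
clusters = [
    [("sp1", "a1"), ("sp2", "b1"), ("sp3", "c1")],                 # kept
    [("sp1", "a2"), ("sp1", "a3"), ("sp2", "b2"), ("sp3", "c2")],  # paralogs in sp1
    [("sp1", "a4"), ("sp2", "b3")],                                # missing sp3
]

print(len(single_copy_clusters(clusters, n_proteomes=3)))  # 1
```

The reciprocal best-hit culling described in the abstract would be a further filter applied to the clusters this function returns.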
Affiliation(s)
- Barbara Robbertse
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, 97333, USA.
86
Reugels AM, Boggetti B, Scheer N, Campos-Ortega JA. Asymmetric localization of Numb:EGFP in dividing neuroepithelial cells during neurulation in Danio rerio. Dev Dyn 2006; 235:934-48. [PMID: 16493689 DOI: 10.1002/dvdy.20699] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023] Open
Abstract
In the neural plate and tube of the zebrafish embryo, cells divide with their mitotic spindles oriented parallel to the plane of the neuroepithelium, whilst in the neural keel and rod, the spindle is oriented perpendicular to it. This change is achieved by a 90 degrees rotation of the mitotic spindle. We cloned zebrafish homologues of the gene for the Drosophila cell fate determinant Numb, and analyzed the localization of EGFP fusion proteins in vivo in dividing neuroepithelial cells during neurulation. Whereas Numb isoform 3 and the related protein Numblike are localized in the cytoplasm, Numb isoform 1 is localized to the cell membrane. Time-lapse analyses showed that Numb 1 is distributed uniformly around the cell cortex in dividing cells during plate and keel stages, but becomes localized at the basolateral membrane of some dividing cells during the transition from neural rod to tube. Using in vitro mutagenesis and Numb:EGFP deletion constructs, we showed that the first 196 amino acids of Numb are sufficient for this localization. Furthermore, we found that an 11-amino acid insertion in the PTB domain is essential for localization to the cortex, whereas amino acids 2-12 mediate the basolateral localization in the neural tube stage.
Affiliation(s)
- Alexander M Reugels
- Institut für Entwicklungsbiologie, Universität zu Köln, 50923 Köln, Germany.
87
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 2006; 6:29. [PMID: 16563161 PMCID: PMC1435933 DOI: 10.1186/1471-2148-6-29] [Citation(s) in RCA: 805] [Impact Index Per Article: 42.4] [Received: 12/19/2005] [Accepted: 03/24/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. RESULTS We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. CONCLUSION This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
Affiliation(s)
- Thomas M Keane
- Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland
- Melissa M Pentony
- Department of Computer Science, University College London, Gower Street, London, UK
- Thomas J Naughton
- Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland
- James O McInerney
- Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland
88
Zhaxybayeva O, Lapierre P, Gogarten JP. Ancient gene duplications and the root(s) of the tree of life. PROTOPLASMA 2005; 227:53-64. [PMID: 16389494 DOI: 10.1007/s00709-005-0135-1] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 03/18/2005] [Accepted: 05/31/2005] [Indexed: 05/06/2023]
Abstract
Tracing organismal histories on the timescale of the tree of life remains one of the challenging tasks in evolutionary biology. The hotly debated questions include the evolutionary relationship between the three domains of life (e.g., which of the three domains are sister domains, are the domains para-, poly-, or monophyletic) and the location of the root within the universal tree of life. For the latter, many different points of view have been considered but so far no consensus has been reached. The only widely accepted rationale to root the universal tree of life is to use anciently duplicated paralogous genes that are present in all three domains of life. To date only few anciently duplicated gene families useful for phylogenetic reconstruction have been identified. Here we present results from a systematic search for ancient gene duplications using twelve representative, completely sequenced, archaeal and bacterial genomes. Phylogenetic analyses of identified cases show that the majority of datasets support a root between Archaea and Bacteria; however, some datasets support alternative hypotheses, and all of them suffer from a lack of strong phylogenetic signal. The results are discussed with respect to the impact of horizontal gene transfer on the ability to reconstruct organismal evolution. The exchange of genetic information between divergent organisms gives rise to mosaic genomes, where different genes in a genome have different histories. Simulations show that even low rates of horizontal gene transfer dramatically complicate the reconstruction of organismal evolution, and that the different most recent common molecular ancestors likely existed at different times and in different lineages.
Affiliation(s)
- Olga Zhaxybayeva
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut 06269-31258, USA
89
Hoef-Emden K, Tran HD, Melkonian M. Lineage-specific variations of congruent evolution among DNA sequences from three genomes, and relaxed selective constraints on rbcL in Cryptomonas (Cryptophyceae). BMC Evol Biol 2005; 5:56. [PMID: 16232313 PMCID: PMC1285359 DOI: 10.1186/1471-2148-5-56] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/24/2005] [Accepted: 10/18/2005] [Indexed: 11/10/2022] Open
Abstract
Background Plastid-bearing cryptophytes like Cryptomonas contain four genomes in a cell, the nucleus, the nucleomorph, the plastid genome and the mitochondrial genome. Comparative phylogenetic analyses encompassing DNA sequences from three different genomes were performed on nineteen photosynthetic and four colorless Cryptomonas strains. Twenty-three rbcL genes and fourteen nuclear SSU rDNA sequences were newly sequenced to examine the impact of photosynthesis loss on codon usage in the rbcL genes, and to compare the rbcL gene phylogeny in terms of tree topology and evolutionary rates with phylogenies inferred from nuclear ribosomal DNA (concatenated SSU rDNA, ITS2 and partial LSU rDNA), and nucleomorph SSU rDNA. Results Largely congruent branching patterns and accelerated evolutionary rates were found in nucleomorph SSU rDNA and rbcL genes in a clade that consisted of photosynthetic and colorless species suggesting a coevolution of the two genomes. The extremely accelerated rates in the rbcL phylogeny correlated with a shift from selection to mutation drift in codon usage of two-fold degenerate NNY codons comprising the amino acids asparagine, aspartate, histidine, phenylalanine, and tyrosine. Cysteine was the sole exception. The shift in codon usage seemed to follow a gradient from early diverging photosynthetic to late diverging photosynthetic or heterotrophic taxa along the branches. In the early branching taxa, codon preferences were changed in one to two amino acids, whereas in the late diverging taxa, including the colorless strains, between four and five amino acids showed changes in codon usage. Conclusion Nucleomorph and plastid gene phylogenies indicate that loss of photosynthesis in the colorless Cryptomonas strains examined in this study possibly was the result of accelerated evolutionary rates that started already in photosynthetic ancestors. 
Shifts in codon usage are usually considered to be caused by changes in functional constraints and in gene expression levels. Thus, the increasing influence of mutation drift on codon usage along the clade may indicate gradually relaxed constraints and reduced expression levels on the rbcL gene, finally correlating with a loss of photosynthesis in the colorless Cryptomonas paramaecium strains.
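The codon-usage measure discussed in this abstract concerns the two-fold degenerate NNY codons, where the third position is either C or T. As a rough illustration only (the function and the example sequence are hypothetical and not the authors' analysis), the fraction of C-ending codons per amino acid in an in-frame coding sequence could be tallied like this:

```python
from collections import defaultdict

# First two bases of each two-fold degenerate NNY codon family (standard genetic code).
NNY_PREFIXES = {
    "AA": "Asn", "GA": "Asp", "CA": "His",
    "TT": "Phe", "TA": "Tyr", "TG": "Cys",
}

def nny_usage(cds):
    """Return {amino acid: fraction of its NNY codons that end in C}."""
    counts = defaultdict(lambda: {"C": 0, "T": 0})
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = cds[i:i + 3]
        amino_acid = NNY_PREFIXES.get(codon[:2])
        if amino_acid and codon[2] in "CT":
            counts[amino_acid][codon[2]] += 1
    return {aa: c["C"] / (c["C"] + c["T"]) for aa, c in counts.items()}

# Hypothetical in-frame fragment: AAC AAT AAC TTT GGG
usage = nny_usage("AACAATAACTTTGGG")
print(usage)
```

Comparing such fractions between early and late diverging taxa is one simple way to see a shift in codon preference; a real analysis would also need to control for base composition.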
Affiliation(s)
- Kerstin Hoef-Emden
- Universität zu Köln, Botanisches Institut, Lehrstuhl I; Gyrhofstr. 15, 50931 Köln, Germany
- Hoang-Dung Tran
- Universität zu Köln, Botanisches Institut, Lehrstuhl I; Gyrhofstr. 15, 50931 Köln, Germany
- Michael Melkonian
- Universität zu Köln, Botanisches Institut, Lehrstuhl I; Gyrhofstr. 15, 50931 Köln, Germany
90
Janke A, Gullberg A, Hughes S, Aggarwal RK, Arnason U. Mitogenomic Analyses Place the Gharial (Gavialis gangeticus) on the Crocodile Tree and Provide Pre-K/T Divergence Times for Most Crocodilians. J Mol Evol 2005; 61:620-6. [PMID: 16211427 DOI: 10.1007/s00239-004-0336-9] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.0] [Received: 11/26/2004] [Accepted: 06/14/2005] [Indexed: 10/25/2022]
Abstract
Based on morphological analyses, extant members of the order Crocodylia are divided into three families, Alligatoridae, Crocodylidae, and Gavialidae. Gavialidae includes one species, the gharial, Gavialis gangeticus. In this study we have examined crocodilian relationships in phylogenetic analyses of seven mitochondrial genomes that have been sequenced in their entirety. The analyses did not support the morphologically acknowledged separate position of the gharial in the crocodilian tree. Instead the gharial joined the false gharial (Tomistoma schlegelii) on a common branch that was shown to constitute a sister group to traditional Crocodylidae (less Tomistoma). Thus, the analyses suggest the recognition of only two Crocodylia families, Alligatoridae and Crocodylidae, with the latter encompassing traditional Crocodylidae plus Gavialis/Tomistoma. A molecular dating of the divergence between Alligatoridae and Crocodylidae suggests that this basal split among recent crocodilians took place approximately 140 million years before present, at the Jurassic/Cretaceous boundary. The results suggest that at least five crocodilian lineages survived the mass extinction at the KT boundary.
Affiliation(s)
- Axel Janke
- Department of Cell and Organism Biology, Division of Evolutionary Molecular Systematics, University of Lund, Sölvegatan 29, S-223 62 Lund, Sweden.
91
Otto H, Reche PA, Bazan F, Dittmar K, Haag F, Koch-Nolte F. In silico characterization of the family of PARP-like poly(ADP-ribosyl)transferases (pARTs). BMC Genomics 2005; 6:139. [PMID: 16202152 PMCID: PMC1266365 DOI: 10.1186/1471-2164-6-139] [Citation(s) in RCA: 197] [Impact Index Per Article: 9.9] [Received: 05/13/2005] [Accepted: 10/04/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND ADP-ribosylation is an enzyme-catalyzed posttranslational protein modification in which mono(ADP-ribosyl)transferases (mARTs) and poly(ADP-ribosyl)transferases (pARTs) transfer the ADP-ribose moiety from NAD onto specific amino acid side chains and/or ADP-ribose units on target proteins. RESULTS Using a combination of database search tools we identified the genes encoding recognizable pART domains in the public genome databases. In humans, the pART family encompasses 17 members. For 16 of these genes, an orthologue exists also in the mouse, rat, and pufferfish. Based on the degree of amino acid sequence similarity in the catalytic domain, conserved intron positions, and fused protein domains, pARTs can be divided into five major subgroups. All six members of groups 1 and 2 contain the H-Y-E trias of amino acid residues found also in the active sites of Diphtheria toxin and Pseudomonas exotoxin A, while the eleven members of groups 3 - 5 carry variations of this motif. The pART catalytic domain is found associated in Lego-like fashion with a variety of domains, including nucleic acid-binding, protein-protein interaction, and ubiquitylation domains. Some of these domain associations appear to be very ancient since they are observed also in insects, fungi, amoebae, and plants. The recently completed genome of the pufferfish T. nigroviridis contains recognizable orthologues for all pARTs except for pART7. The nearly completed albeit still fragmentary chicken genome contains recognizable orthologues for twelve pARTs. Simpler eucaryotes generally contain fewer pARTs: two in the fly D. melanogaster, three each in the mosquito A. gambiae, the nematode C. elegans, and the ascomycete microfungus G. zeae, six in the amoeba E. histolytica, nine in the slime mold D. discoideum, and ten in the cress plant A. thaliana. 
GenBank contains two pART homologues from the large double stranded DNA viruses Chilo iridescent virus and Bacteriophage Aeh1 and only a single entry (from V. cholerae) showing recognizable homology to the pART-like catalytic domains of Diphtheria toxin and Pseudomonas exotoxin A. CONCLUSION The pART family, which encompasses 17 members in the human and 16 members in the mouse, can be divided into five subgroups on the basis of sequence similarity, phylogeny, conserved intron positions, and patterns of genetically fused protein domains.
Affiliation(s)
- Helge Otto
- Institute of Immunology, University Hospital Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Pedro A Reche
- DNAX Research Institute, Palo Alto, CA 94304, USA
- Dana-Farber Cancer Institute, Harvard University, Boston, MA 02115, USA
- Fernando Bazan
- DNAX Research Institute, Palo Alto, CA 94304, USA
- Depts. of Molecular Biology and Protein Engineering, Genentech, SF, CA 94080, USA
- Katharina Dittmar
- Department of Integrative Biology, Brigham Young University, Provo, UT 84602, USA
- Friedrich Haag
- Institute of Immunology, University Hospital Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Friedrich Koch-Nolte
- Institute of Immunology, University Hospital Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
92
Butt D, Roger AJ, Blouin C. libcov: a C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny. BMC Bioinformatics 2005; 6:138. [PMID: 15938750 PMCID: PMC1175080 DOI: 10.1186/1471-2105-6-138] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/05/2004] [Accepted: 06/06/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time consuming task. RESULTS The bioinformatics library libcov is a collection of C++ classes that provides a high and low-level interface to maximum likelihood phylogenetics, sequence analysis and a data structure for structural biological methods. libcov can be used to compute likelihoods, search tree topologies, estimate site rates, cluster sequences, manipulate tree structures and compare phylogenies for a broad selection of applications. CONCLUSION Using this library, it is possible to rapidly prototype applications that use the sophistication of phylogenetic likelihoods without getting involved in a major software engineering project. libcov is thus a potentially valuable building block to develop in-house methodologies in the field of protein phylogenetics.
Affiliation(s)
- Davin Butt
- Faculty of Computer Science, Dalhousie University, 6050 University Ave. Halifax, NS, B3H 1W5, Canada
- Andrew J Roger
- Dept. of Biochemistry and Molecular Biology, Dalhousie University, Tupper Medical Building, Halifax, NS, B3H 1X5, Canada
- Canadian Institute for Advanced Research (CIAR)
- Christian Blouin
- Faculty of Computer Science, Dalhousie University, 6050 University Ave. Halifax, NS, B3H 1W5, Canada
- Dept. of Biochemistry and Molecular Biology, Dalhousie University, Tupper Medical Building, Halifax, NS, B3H 1X5, Canada
- Canadian Institute for Advanced Research (CIAR)
93
Abstract
SUMMARY Using an appropriate model of amino acid replacement is very important for the study of protein evolution and phylogenetic inference. We have built a tool for the selection of the best-fit model of evolution, among a set of candidate models, for a given protein sequence alignment. AVAILABILITY ProtTest is available under the GNU license from http://darwin.uvigo.es
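Selecting a best-fit model among candidates usually means comparing an information criterion across models fitted to the same alignment. The sketch below assumes the Akaike information criterion (AIC = 2k − 2 ln L), which the abstract does not name; the log-likelihoods and parameter counts are hypothetical numbers, and the code is not ProtTest itself.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def rank_models(candidates):
    """Sort candidate models by AIC; candidates maps name -> (lnL, n_params)."""
    scored = [(aic(lnl, k), name) for name, (lnl, k) in candidates.items()]
    return sorted(scored)

# Hypothetical fits of three replacement models to one protein alignment.
candidates = {
    "JTT": (-10250.0, 37),
    "WAG": (-10180.0, 37),
    "WAG+G": (-10020.0, 38),  # one extra parameter for rate heterogeneity
}

for score, name in rank_models(candidates):
    print(f"{name}\tAIC={score:.1f}")
```

With these made-up values the model with the extra rate parameter ranks first because its gain in likelihood outweighs the penalty for one additional parameter.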
Affiliation(s)
- Federico Abascal
- Department of Biochemistry, Genetics and Immunology, Universidad de Vigo, Spain.
94
Keane TM, Naughton TJ, Travers SAA, McInerney JO, McCormack GP. DPRml: distributed phylogeny reconstruction by maximum likelihood. Bioinformatics 2004; 21:969-74. [PMID: 15513992 DOI: 10.1093/bioinformatics/bti100] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs, which are severely limiting the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis. RESULTS We have identified the suitability of phylogenetic analysis to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood. It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while only using the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.
Affiliation(s)
- T M Keane
- Department of Computer Science, National University of Ireland Maynooth, Ireland
95
Abstract
The HyPhy package is designed to provide a flexible and unified platform for carrying out likelihood-based analyses on multiple alignments of molecular sequence data, with the emphasis on studies of rates and patterns of sequence evolution. AVAILABILITY http://www.hyphy.org CONTACT muse@stat.ncsu.edu SUPPLEMENTARY INFORMATION HyPhy documentation and tutorials are available at http://www.hyphy.org.
96
Nilsson MA, Arnason U, Spencer PBS, Janke A. Marsupial relationships and a timeline for marsupial radiation in South Gondwana. Gene 2004; 340:189-96. [PMID: 15475160 DOI: 10.1016/j.gene.2004.07.040] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.4] [Received: 03/22/2004] [Revised: 06/15/2004] [Accepted: 07/23/2004] [Indexed: 10/26/2022]
Abstract
Recent marsupials include about 280 species divided into 18 families and seven orders. Approximately 200 species live in Australia/New Guinea. The remaining species inhabit South America with some of these secondarily ranging into North America. In this study, we examine marsupial relationships and estimate their divergences times using complete mitochondrial (mt) genomes. The sampling, which includes nine new mtDNAs and a total number of 19 marsupial genomes, encompasses all extant orders and 14 families. The analysis identified a basal split between Didelphimorphia and remaining orders about 69 million years before present (MYBP), while other ordinal divergences were placed in Tertiary times. The monotypic South American order Microbiotheria (Dromiciops gliroides, Monito del Monte) was solidly nested among its Australian counterparts. The results suggest that marsupials colonized Australia twice from Antarctica/South America and that the divergence between Microbiotheria and its Australian relatives coincided with the geological separation of Antarctica and Australia. Within Australia itself, several of the deepest divergences were estimated to have taken place close to the Eocene/Oligocene transition.
Affiliation(s)
- Maria A Nilsson
- Department of Cell and Organism Biology, Division of Evolutionary Molecular Systematics, University of Lund, S-223 62 Lund, Sweden.
97
Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP. Visualization of the phylogenetic content of five genomes using dekapentagonal maps. Genome Biol 2004; 5:R20. [PMID: 15003123 PMCID: PMC395770 DOI: 10.1186/gb-2004-5-3-r20] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 11/04/2003] [Revised: 12/18/2003] [Accepted: 01/13/2004] [Indexed: 11/12/2022] Open
Abstract
The methods presented here summarize phylogenetic relationships of genomes in visually appealing and informative figures. Dekapentagonal maps depict phylogenetic information for orthologous genes present in five genomes, and provide a pre-screen for putatively horizontally transferred genes. If the majority of individual gene phylogenies are unresolved, bipartition histograms provide a means of uncovering and analyzing the plurality consensus. Analyses of genomes representing five photosynthetic bacterial phyla and of the prokaryotic contributions to the eukaryotic cell illustrate the utility of the methods.
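Five genomes admit fifteen distinct unrooted bifurcating tree topologies, which is presumably what the fifteen-sided ("dekapentagonal") layout reflects. The short Python sketch below works through that count using the standard formula (2n − 5)!!, together with the number of non-trivial bipartitions that a bipartition histogram for n taxa could contain; the function names are illustrative only.

```python
def unrooted_topologies(n_taxa):
    """Number of unrooted bifurcating trees on n_taxa leaves: (2n - 5)!! for n >= 3."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # product of odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count

def nontrivial_bipartitions(n_taxa):
    """Number of splits of n_taxa leaves with at least two leaves on each side."""
    return 2 ** (n_taxa - 1) - n_taxa - 1

print(unrooted_topologies(5))      # 15
print(nontrivial_bipartitions(5))  # 10
```

Because the number of topologies grows very quickly with more genomes (105 for six, 945 for seven), a per-topology map is practical only for small genome sets, while bipartition counts remain manageable somewhat longer.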
Collapse
Affiliation(s)
- Olga Zhaxybayeva
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
- Lutz Hamel
- Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI 02881, USA
- Jason Raymond
- Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ 85287-1604, USA
- J Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
|
98
|
Cobbe N, Heck MMS. The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications. Mol Biol Evol 2004; 21:332-47. [PMID: 14660695 DOI: 10.1093/molbev/msh023] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Indexed: 11/14/2022]
Abstract
The SMC proteins are found in nearly all living organisms examined, where they play crucial roles in mitotic chromosome dynamics, regulation of gene expression, and DNA repair. We have explored the phylogenetic relationships of SMC proteins from prokaryotes and eukaryotes, as well as their relationship to similar ABC ATPases, using maximum-likelihood analyses. We have also investigated the coevolution of different domains of eukaryotic SMC proteins and attempted to account for the evolutionary patterns we have observed in terms of available structural data. Based on our analyses, we propose that each of the six eukaryotic SMC subfamilies originated through a series of ancient gene duplication events, with the condensins evolving more rapidly than the cohesins. In addition, we show that the SMC5 and SMC6 subfamily members have evolved comparatively rapidly and suggest that these proteins may perform redundant functions in higher eukaryotes. Finally, we propose a possible structure for the SMC5/SMC6 heterodimer based on patterns of coevolution.
Affiliation(s)
- Neville Cobbe
- Wellcome Trust Centre for Cell Biology, Institute of Cell and Molecular Biology, University of Edinburgh, United Kingdom
|
99
|
Zhaxybayeva O, Gogarten JP. An improved probability mapping approach to assess genome mosaicism. BMC Genomics 2003; 4:37. [PMID: 12974984 PMCID: PMC222983 DOI: 10.1186/1471-2164-4-37] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Received: 06/19/2003] [Accepted: 09/15/2003] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Maximum likelihood and posterior probability mapping are useful visualization techniques that are used to ascertain the mosaic nature of prokaryotic genomes. However, posterior probabilities, especially when calculated for four-taxon cases, tend to overestimate the support for tree topologies. Furthermore, because of poor taxon sampling, four-taxon analyses are sensitive to the long-branch attraction artifact. Here we extend the probability mapping approach by improving taxon sampling of the analyzed datasets, and by using bootstrap support values, a more conservative tool to assess reliability. RESULTS: Quartets of orthologous proteins were complemented with homologs from selected reference genomes. The mapping of bootstrap support values from these extended datasets gives results similar to the original maximum likelihood and posterior probability mapping. The more conservative nature of the plotted support values allows further analyses to focus on those protein families that strongly disagree with the majority or plurality of genes present in the analyzed genomes. CONCLUSION: Posterior probability is a non-conservative measure of support, and posterior probability mapping only provides a quick estimation of the phylogenetic information content of four genomes. This approach can be utilized as a pre-screen to select genes that might have been horizontally transferred. Better taxon sampling combined with subtree analyses prevents the inconsistencies associated with four-taxon analyses, but retains the power of visual representation. Nevertheless, a case-by-case inspection of individual multi-taxon phylogenies remains necessary to differentiate unrecognized paralogy and shared phylogenetic reconstruction artifacts from horizontal gene transfer events.
Affiliation(s)
- Olga Zhaxybayeva
- Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Storrs, CT, 06269-3125, USA
- J Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Storrs, CT, 06269-3125, USA
|
100
|
Raymond J, Zhaxybayeva O, Gogarten JP, Blankenship RE. Evolution of photosynthetic prokaryotes: a maximum-likelihood mapping approach. Philos Trans R Soc Lond B Biol Sci 2003; 358:223-30. [PMID: 12594930 PMCID: PMC1693105 DOI: 10.1098/rstb.2002.1181] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Reconstructing the early evolution of photosynthesis has been guided in part by the geological record, but the complexity and great antiquity of these early events require molecular genetic techniques as the primary tools of inference. Recent genome sequencing efforts have made whole genome data available from representatives of each of the five phyla of bacteria with photosynthetic members, allowing extensive phylogenetic comparisons of these organisms. Here, we have undertaken whole genome comparisons using maximum likelihood to compare 527 unique sets of orthologous genes from all five photosynthetic phyla. Substantiating recent whole genome analyses of other prokaryotes, our results indicate that horizontal gene transfer (HGT) has played a significant part in the evolution of these organisms, resulting in genomes with mosaic evolutionary histories. A small plurality phylogenetic signal was observed, which may be a core of remnant genes not subject to HGT, or may result from a propensity for gene exchange between two or more of the photosynthetic organisms compared.
Affiliation(s)
- Jason Raymond
- Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ 85287-1604, USA
|