1
Todd BP, Downard KM. Structural Phylogenetics with Protein Mass Spectrometry: A Proof-of-Concept. Protein J 2024; 43:997-1008. [PMID: 39078529 DOI: 10.1007/s10930-024-10227-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/18/2024] [Indexed: 07/31/2024]
Abstract
It is demonstrated, for the first time, that a mass spectrometry approach (known as phylonumerics) can be successfully implemented for structural phylogenetics investigations to chart the evolution of a protein's structure and function. Illustrated for the compact globular protein myoglobin, peptide masses produced from the proteolytic digestion of the protein across animal species generate trees congruent to the sequence tree counterparts. Single point mutations calculated during the same mass tree building step can be followed along interconnected branches of the tree and represent a viable structural metric. A mass tree built for 15 diverse animal species easily resolves the birds from the mammal species, and the ruminant mammals from the remainder of the animals. Mutations within helix-spanning peptide segments alter both the mass and structure of the protein in these segments. Greater evolution is found in the B-helix over the A, E, F, G and H helices. A further mass tree study, of six more closely related primate species, resolves gorilla from the other primates based on a P22S mutation within the B-helix. The remaining five primates are resolved into two groups based on whether they contain a glycine or serine at position 23 in the same helix. The orangutan is resolved from the gibbon and siamang by its G-helix C110S mutation, while Homo sapiens is resolved from chimpanzee based on the Q116H mutation. All are associated with structural perturbations in such helices. These structure-altering mutations can be tracked along interconnecting branches of a mass tree, to follow the protein's structure and evolution, and ultimately the evolution of the species in which the proteins are expressed. Those that have the greatest impact on a protein's structure, its function, and ultimately the evolution of the species, can be selectively tracked or monitored.
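The link between a point mutation and a peptide mass shift can be illustrated with a short Python sketch. It is not the authors' phylonumerics algorithm: it only computes the monoisotopic mass difference for the three substitutions named in the abstract, using standard monoisotopic residue masses. The function name `mutation_mass_shift` and the `RESIDUE_MASS` table are hypothetical names chosen for this illustration.

```python
# Standard monoisotopic residue masses (Da) for the amino acids named in the
# abstract. The table is deliberately partial; it is an illustration only.
RESIDUE_MASS = {
    "G": 57.02146,
    "S": 87.03203,
    "P": 97.05276,
    "C": 103.00919,
    "Q": 128.05858,
    "H": 137.05891,
}


def mutation_mass_shift(mutation: str) -> float:
    """Return the mass change (Da) for a substitution written like 'P22S'.

    The first letter is the original residue and the last letter is the
    replacement; the position in between does not affect the mass shift.
    """
    original, replacement = mutation[0], mutation[-1]
    return RESIDUE_MASS[replacement] - RESIDUE_MASS[original]


# Mass shifts for the substitutions mentioned in the abstract.
for m in ("P22S", "C110S", "Q116H"):
    print(f"{m}: {mutation_mass_shift(m):+.5f} Da")
```

Each substitution changes the mass of the peptide that contains it by a characteristic amount, which is why a peptide mass difference between two species can point to a specific mutation.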
Affiliation(s)
- Benjamin P Todd
  - Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia
- Kevin M Downard
  - Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia
2
Doorenweerd C, San Jose M, Leblanc L, Barr N, Geib SM, Chung AYC, Dupuis JR, Ekayanti A, Fiegalan E, Hemachandra KS, Aftab Hossain M, Huang CL, Hsu YF, Morris KY, Maryani A Mustapeng A, Niogret J, Pham TH, Thi Nguyen N, Sirisena UGAI, Todd T, Rubinoff D. Towards a better future for DNA barcoding: Evaluating monophyly- and distance-based species identification using COI gene fragments of Dacini fruit flies. Mol Ecol Resour 2024; 24:e13987. [PMID: 38956928 DOI: 10.1111/1755-0998.13987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 05/14/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024]
Abstract
The utility of a universal DNA 'barcode' fragment (658 base pairs of the Cytochrome C Oxidase I [COI] gene) has been established as a useful tool for species identification, and widely criticized as one for understanding the evolutionary history of a group. Large amounts of COI sequence data have been produced that hold promise for rapid species identification, for example, for biosecurity. The fruit fly tribe Dacini holds about a thousand species, of which 80 are pests of economic concern. We generated a COI reference library for 265 species of Dacini containing 5601 sequences that span most of the COI gene using circular consensus sequencing. We compared distance metrics versus monophyly assessments for species identification and although we found a 'soft' barcode gap around 2% pairwise distance, the exceptions to this rule dictate that a monophyly assessment is the only reliable method for species identification. We found that all fragments regularly used for Dacini fruit fly identification >450 base pairs long provide similar resolution. 11.3% of the species in our dataset were non-monophyletic in a COI tree, which is mostly due to species complexes. We conclude with recommendations for the future generation and use of COI libraries. We revise the generic assignment of Dacus transversus stat. rev. Hardy 1982, and Dacus perpusillus stat. rev. Drew 1971 and we establish Dacus maculipterus White 1998 syn. nov. as a junior synonym of Dacus satanas Liang et al. 1993.
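The distance-based criterion mentioned in the abstract can be sketched in Python as follows. This is a minimal illustration, assuming pre-aligned sequences of equal length and an uncorrected p-distance; the abstract does not state which distance measure the authors used, and the names `p_distance` and `within_soft_gap` are hypothetical. The 2% threshold is the 'soft' barcode gap quoted in the abstract, which the authors caution is not reliable on its own.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected pairwise distance between two aligned DNA sequences.

    Only columns where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def within_soft_gap(a: str, b: str, threshold: float = 0.02) -> bool:
    """True if the pair falls below the ~2% 'soft' barcode gap.

    Assumption: a simple cutoff on p-distance. The abstract reports
    exceptions to this rule, so a result here is a hint, not an identification.
    """
    return p_distance(a, b) < threshold
```

A pair that falls under the threshold would still need a monophyly assessment in a COI tree before being treated as conspecific, in line with the conclusion of the abstract.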
Affiliation(s)
- Camiel Doorenweerd
  - Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
- Michael San Jose
  - Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
- Luc Leblanc
  - Department of Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, Idaho, USA
- Norman Barr
  - United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, Edinburg, Texas, USA
- Scott M Geib
  - Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, Hawaii, USA
- Arthur Y C Chung
  - Forest Research Centre, Sabah Forestry Department, Sandakan, Sabah, Malaysia
- Julian R Dupuis
  - Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
- Arni Ekayanti
  - Niogret Ecology Consulting LLC, Wotu, Luwu Timor, Sulawesi Seleaton, Indonesia
- Elaida Fiegalan
  - Department of Crop Protection, College of Agriculture, Central Luzon State University, Science City of Muñoz, Nueva Ecija, Philippines
- Mohammad Aftab Hossain
  - Insect Biotechnology Division, Institute of Food and Radiation Biology, Bangladesh Atomic Energy Commission, Dhaka, Bangladesh
- Chia-Lung Huang
  - Institute of Oceanography, Minjiang University, Fuzhou, Fujian, China
- Yu-Feng Hsu
  - Department of Life Science, National Taiwan Normal University, Taipei, Taiwan, ROC
- Kimberly Y Morris
  - Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, Hawaii, USA
- Jerome Niogret
  - Centre for Tropical Environmental & Sustainability Science, Nguma-Bada Campus, James Cook University, Smithfield, Queensland, Australia
- Thai Hong Pham
  - Mientrung Institute for Scientific Research, Vietnam Academy of Science and Technology (VAST), Hue, Vietnam
  - Vietnam National Museum of Nature & Graduate School of Science and Technology, VAST, Hanoi, Vietnam
- Nhien Thi Nguyen
  - Faculty of Biotechnology, Vietnam National University of Agriculture, Hanoi, Vietnam
- Uda G A I Sirisena
  - Department of Plant Sciences, Faculty of Agriculture, Rajarata University of Sri Lanka, Mihintale, Sri Lanka
- Terrence Todd
  - United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, Edinburg, Texas, USA
- Daniel Rubinoff
  - Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
3
Dysin AP, Shcherbakov YS, Nikolaeva OA, Terletskii VP, Tyshchenko VI, Dementieva NV. Salmonidae Genome: Features, Evolutionary and Phylogenetic Characteristics. Genes (Basel) 2022; 13:genes13122221. [PMID: 36553488 PMCID: PMC9778375 DOI: 10.3390/genes13122221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 10/19/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022]
Abstract
The salmon family is one of the most iconic and economically important fish families, primarily possessing meat of excellent taste as well as irreplaceable nutritional and biological value. One of the most common and, therefore, highly significant members of this family, the Atlantic salmon (Salmo salar L.), was not without reason one of the first fish species for which a high-quality reference genome assembly was produced and published. Genomic advancements are becoming increasingly essential in both the genetic enhancement of farmed salmon and the conservation of wild salmon stocks. The salmon genome has also played a significant role in influencing our comprehension of the evolutionary and functional ramifications of the ancestral whole-genome duplication event shared by all Salmonidae species. Here we provide an overview of the current state of research on the genomics and phylogeny of the various most studied subfamilies, genera, and individual salmonid species, focusing on those studies that aim to advance our understanding of salmonid ecology, physiology, and evolution, particularly for the purpose of improving aquaculture production. This review should make potential researchers pay attention to the current state of research on the salmonid genome, which should potentially attract interest in this important problem, and hence the application of new technologies (such as genome editing) in uncovering the genetic and evolutionary features of salmoniforms that underlie functional variation in traits of commercial and scientific importance.
Affiliation(s)
- Artem P. Dysin
  - Russian Research Institute of Farm Animal Genetics and Breeding-Branch of the L.K. Ernst Federal Research Center for Animal Husbandry, Pushkin, 196601 St. Petersburg, Russia
- Yuri S. Shcherbakov
  - Russian Research Institute of Farm Animal Genetics and Breeding-Branch of the L.K. Ernst Federal Research Center for Animal Husbandry, Pushkin, 196601 St. Petersburg, Russia
- Olga A. Nikolaeva
  - Russian Research Institute of Farm Animal Genetics and Breeding-Branch of the L.K. Ernst Federal Research Center for Animal Husbandry, Pushkin, 196601 St. Petersburg, Russia
- Valerii P. Terletskii
  - All-Russian Research Veterinary Institute of Poultry Science-Branch of the Federal Scientific Center, All-Russian Research and Technological Poultry Institute (ARRVIPS), Lomonosov, 198412 St. Petersburg, Russia
- Valentina I. Tyshchenko
  - Russian Research Institute of Farm Animal Genetics and Breeding-Branch of the L.K. Ernst Federal Research Center for Animal Husbandry, Pushkin, 196601 St. Petersburg, Russia
- Natalia V. Dementieva
  - Russian Research Institute of Farm Animal Genetics and Breeding-Branch of the L.K. Ernst Federal Research Center for Animal Husbandry, Pushkin, 196601 St. Petersburg, Russia
4
Pla-Díaz M, Sánchez-Busó L, Giacani L, Šmajs D, Bosshard PP, Bagheri HC, Schuenemann VJ, Nieselt K, Arora N, González-Candelas F. Evolutionary processes in the emergence and recent spread of the syphilis agent, Treponema pallidum. Mol Biol Evol 2021; 39:6427636. [PMID: 34791386 PMCID: PMC8789261 DOI: 10.1093/molbev/msab318] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/26/2022]
Abstract
The incidence of syphilis has risen worldwide in the last decade in spite of being an easily treated infection. The causative agent of this sexually transmitted disease is the bacterium Treponema pallidum subspecies pallidum (TPA), very closely related to subsp. pertenue (TPE) and endemicum (TEN), responsible for the human treponematoses yaws and bejel, respectively. Although much focus has been placed on the question of the spatial and temporal origins of TPA, the processes driving the evolution and epidemiological spread of TPA since its divergence from TPE and TEN are not well understood. Here, we investigate the effects of recombination and selection as forces of genetic diversity and differentiation acting during the evolution of T. pallidum subspecies. Using a custom-tailored procedure, named phylogenetic incongruence method, with 75 complete genome sequences, we found strong evidence for recombination among the T. pallidum subspecies, involving 12 genes and 21 events. In most cases, only one recombination event per gene was detected and all but one event corresponded to intersubspecies transfers, from TPE/TEN to TPA. We found a clear signal of natural selection acting on the recombinant genes, which is more intense in their recombinant regions. The phylogenetic location of the recombination events detected and the functional role of the genes with signals of positive selection suggest that these evolutionary processes had a key role in the evolution and recent expansion of the syphilis bacteria and significant implications for the selection of vaccine candidates and the design of a broadly protective syphilis vaccine.
Affiliation(s)
- Marta Pla-Díaz
  - Unidad Mixta Infección y Salud Pública FISABIO/Universidad de Valencia-I2SysBio, Spain
  - CIBER in Epidemiology and Public Health, Spain
- Leonor Sánchez-Busó
  - Genomics and Health Area, Foundation for the Promotion of Health and Biomedical Research in the Valencian Community (FISABIO-Public Health), Valencia, Spain
- Lorenzo Giacani
  - Department of Medicine, Division of Allergy and Infectious Diseases, and Department of Global Health, University of Washington, Seattle, WA, USA
- David Šmajs
  - Department of Biology, Faculty of Medicine, Masaryk University, Czech Republic
- Philipp P Bosshard
  - Department of Dermatology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Kay Nieselt
  - Center for Bioinformatics, University of Tübingen, Germany
- Natasha Arora
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland
  - Zurich Institute of Forensic Medicine, University of Zurich, Switzerland
- Fernando González-Candelas
  - Unidad Mixta Infección y Salud Pública FISABIO/Universidad de Valencia-I2SysBio, Spain
  - CIBER in Epidemiology and Public Health, Spain
  - Genomics and Health Area, Foundation for the Promotion of Health and Biomedical Research in the Valencian Community (FISABIO-Public Health), Valencia, Spain
5
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022]
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward the understanding of the function of the microbial community. However, the majority of the current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure that complicates ncRNA homology detection. Meanwhile, de novo assembly of the metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent reads into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve the performance and robustness over traditional approaches on the classification and quantification of a wide class of ncRNA families.
Affiliation(s)
- Ben Liu
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Sirisha Thippabhotla
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Jun Zhang
  - Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
  - Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
  - Bioengineering Program, The University of Kansas, Lawrence, KS, United States
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
6
Striedter GF. Variation across Species and Levels: Implications for Model Species Research. BRAIN, BEHAVIOR AND EVOLUTION 2019; 93:57-69. [PMID: 31416083 DOI: 10.1159/000499664] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/19/2019] [Accepted: 03/08/2019] [Indexed: 11/19/2022]
Abstract
The selection of model species tends to involve two typically unstated assumptions, namely: (1) that the similarity between species decreases steadily with phylogenetic distance, and (2) that similarities are greater at lower levels of biological organization. The first assumption holds on average, but species similarities tend to decrease with the square root of divergence time, rather than linearly, and lineages with short generation times (which includes most model species) tend to diverge faster than average, making the decrease in similarity non-monotonic. The second assumption is more difficult to test. Comparative molecular research has traditionally emphasized species similarities over differences, whereas comparative research at higher levels of organization frequently highlights the species differences. However, advances in comparative genomics have brought to light a great variety of species differences, not just in gene regulation but also in protein coding genes. Particularly relevant are cases in which homologous high-level characters are based on non-homologous genes. This phenomenon of non-orthologous gene displacement, or "deep non-homology," indicates that species differences at the molecular level can be surprisingly large. Given these observations, it is not surprising that some findings obtained in model species do not generalize across species as well as researchers had hoped, even if the research is molecular.
Affiliation(s)
- Georg F Striedter
  - Department of Neurobiology and Behavior, University of California Irvine, Irvine, California, USA
7
Vinaiphat A, Thongboonkerd V. Chaperonomics in leptospirosis. Expert Rev Proteomics 2018; 15:569-579. [PMID: 30004813 DOI: 10.1080/14789450.2018.1500901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Introduction: Knowledge of the function of molecular chaperones is required for a better understanding of cellular proteostasis. Nevertheless, such information is currently dispersed as most of previous studies investigated chaperones on a single-angle basis. Recently, a new subdiscipline of chaperonology, namely 'chaperonomics' (defined as 'systematic analysis of chaperone genes, transcripts, proteins, or their interaction networks using omics technologies'), has been emerging to better understand biological, physiological, and pathological roles of chaperones. Areas covered: This review provides broad overviews of bacterial chaperones, heat shock proteins (HSPs), and leptospirosis, and then focuses on recent progress of chaperonomics applied to define roles of HSPs in various pathogenic and saprophytic leptospiral species and serovars. Expert commentary: Comprehensive analysis of leptospiral chaperones/HSPs using a chaperonomics approach holds great promise for better understanding of functional roles of chaperones/HSPs in bacterial survival and disease pathogenesis. Moreover, this new approach may also lead to further development of chaperones/HSPs-based diagnostics and/or vaccine discovery for leptospirosis.
Affiliation(s)
- Arada Vinaiphat
  - Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Visith Thongboonkerd
  - Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
8
Gatherer D. Genome Signatures, Self-Organizing Maps and Higher Order Phylogenies: A Parametric Analysis. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. In general, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer trained ones for the analysis of genome signatures. These results may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.
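A genome signature of the kind described in the abstract can be sketched in Python as a vector of k-mer frequencies. This is a minimal illustration, assuming plain relative frequencies over the A/C/G/T alphabet; the abstract does not specify the normalisation or strand handling used in the paper, and the function name `genome_signature` is hypothetical. The default `k=4` follows the abstract's statement that a k-mer length of 4 or 5 is optimal.

```python
from collections import Counter
from itertools import product


def genome_signature(seq: str, k: int = 4) -> list[float]:
    """Return the relative frequency of every possible k-mer in `seq`.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    Windows containing other characters (e.g. N) are ignored.
    Assumption: raw relative frequencies, single strand only.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

Vectors of this form, one per sequence, are the kind of input a self-organizing map would be trained on; the SOM training itself is outside the scope of this sketch.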
Affiliation(s)
- Derek Gatherer
  - MRC Virology Unit, Institute of Virology. Church Street, Glasgow G11 5JR, UK
9
Hallas JM, Chichvarkhin A, Gosliner TM. Aligning evidence: concerns regarding multiple sequence alignments in estimating the phylogeny of the Nudibranchia suborder Doridina. ROYAL SOCIETY OPEN SCIENCE 2017; 4:171095. [PMID: 29134101 PMCID: PMC5666284 DOI: 10.1098/rsos.171095] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/09/2017] [Accepted: 09/20/2017] [Indexed: 06/07/2023]
Abstract
Molecular estimates of phylogenetic relationships rely heavily on multiple sequence alignment construction. There has been little consensus, however, on how to properly address issues pertaining to the alignment of variable regions. Here, we construct alignments from four commonly sequenced molecular markers (16S, 18S, 28S and cytochrome c oxidase subunit I) for the Nudibranchia using three different methodologies: (i) strict mathematical algorithm; (ii) exclusion of variable or divergent regions and (iii) manually curated, and examine how different alignment construction methods can affect phylogenetic signal and phylogenetic estimates for the suborder Doridina. Phylogenetic informativeness (PI) profiles suggest that the molecular markers tested lack the power to resolve relationships at the base of the Doridina, while being more robust at family-level classifications. This supports the lack of consistent resolution between the 19 families within the Doridina across all three alignments. Most of the 19 families were recovered as monophyletic, and instances of non-monophyletic families were consistently recovered between analyses. We conclude that the alignment of variable regions has some effect on phylogenetic estimates of the Doridina, but these effects can vary depending on the size and scope of the phylogenetic query and PI of molecular markers.
Affiliation(s)
- Joshua M. Hallas
  - Department of Biology, University of Nevada, Reno. 1664 N. Virginia St, Reno, NV 89557, USA
  - Department of Invertebrate Zoology and Geology, California Academy of Sciences, 55 Music Concourse Dr Golden Gate Park, San Francisco, CA 94118, USA
- Anton Chichvarkhin
  - National Scientific Center of Marine Biology, Far East Branch of Russian Academy of Sciences, Palchevskogo 17, Vladivostok 690041, Russia
  - Far Eastern Federal University, Sukhanova 8, Vladivostok 690950, Russia
- Terrence M. Gosliner
  - Department of Invertebrate Zoology and Geology, California Academy of Sciences, 55 Music Concourse Dr Golden Gate Park, San Francisco, CA 94118, USA
10
Dimond JL, Gamblewood SK, Roberts SB. Genetic and epigenetic insight into morphospecies in a reef coral. Mol Ecol 2017; 26:5031-5042. [DOI: 10.1111/mec.14252] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 03/21/2017] [Revised: 07/06/2017] [Accepted: 07/07/2017] [Indexed: 12/27/2022]
Affiliation(s)
- James L. Dimond
  - School of Aquatic and Fishery Sciences University of Washington Seattle WA USA
  - Shannon Point Marine Center Western Washington University Anacortes WA USA
- Steven B. Roberts
  - School of Aquatic and Fishery Sciences University of Washington Seattle WA USA
11
Akand EH, Downard KM. Mutational analysis employing a phylogenetic mass tree approach in a study of the evolution of the influenza virus. Mol Phylogenet Evol 2017; 112:209-217. [DOI: 10.1016/j.ympev.2017.04.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/05/2017] [Revised: 03/29/2017] [Accepted: 04/05/2017] [Indexed: 11/28/2022]
12
Zaucha J, Heddle JG. Resurrecting the Dead (Molecules). Comput Struct Biotechnol J 2017; 15:351-358. [PMID: 28652896 PMCID: PMC5472138 DOI: 10.1016/j.csbj.2017.05.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/15/2017] [Revised: 05/11/2017] [Accepted: 05/21/2017] [Indexed: 12/15/2022]
Abstract
Biological molecules, like organisms themselves, are subject to genetic drift and may even become "extinct". Molecules that are no longer extant in living systems are of high interest for several reasons including insight into how existing life forms evolved and the possibility that they may have new and useful properties no longer available in currently functioning molecules. Predicting the sequence/structure of such molecules and synthesizing them so that their properties can be tested is the basis of "molecular resurrection" and may lead not only to a deeper understanding of evolution, but also to the production of artificial proteins with novel properties and even to insight into how life itself began.
Affiliation(s)
- Jan Zaucha
  - Department of Computer Science, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, United Kingdom
- Jonathan G. Heddle
  - Bionanoscience and Biochemistry Laboratory, Jagiellonian University, Malopolska Centre of Biotechnology, Gronstajowa 7A, 30-387 Kraków, Poland
13
Fournier E, Giraud T, Albertini C, Brygoo Y. Partition of the Botrytis cinerea complex in France using multiple gene genealogies. Mycologia 2017. [DOI: 10.1080/15572536.2006.11832734] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 10/19/2022]
Affiliation(s)
- Elisabeth Fournier
  - PMDV, INRA Centre de Versailles, Route de Saint-Cyr, F-78026 Versailles cedex, France
- Tatiana Giraud
  - ESE, Bât. 360, UMR 8079 Université Paris Sud-CNRS, F-91405 Orsay cedex, France
- Catherine Albertini
  - Phytopharmacie et Médiateurs Chimiques, INRA Centre de Versailles, Route de Saint-Cyr, F-78026 Versailles cedex, France
- Yves Brygoo
  - PMDV, INRA Centre de Versailles, Route de Saint-Cyr, F-78026 Versailles cedex, France
14
Pratlong M, Rancurel C, Pontarotti P, Aurelle D. Monophyly of Anthozoa (Cnidaria): why do nuclear and mitochondrial phylogenies disagree? ZOOL SCR 2016. [DOI: 10.1111/zsc.12208] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Affiliation(s)
- Marine Pratlong
  - Aix Marseille Univ; Univ Avignon; CNRS; IRD; IMBE; Marseille France
  - Aix Marseille Univ; CNRS; Centrale Marseille, I2M, Equipe Evolution Biologique et Modélisation; Marseille France
- Corinne Rancurel
  - INRA; University Nice Sophia Antipolis; CNRS; UMR 1355-7254 Institut Sophia Agrobiotech; Sophia Antipolis France
- Pierre Pontarotti
  - Aix Marseille Univ; CNRS; Centrale Marseille, I2M, Equipe Evolution Biologique et Modélisation; Marseille France
- Didier Aurelle
  - Aix Marseille Univ; Univ Avignon; CNRS; IRD; IMBE; Marseille France
15
Yin M, Liu X, Xu B, Huang J, Zheng Q, Yang Z, Feng Z, Han ZG, Hu W. Genetic variation between Schistosoma japonicum lineages from lake and mountainous regions in China revealed by resequencing whole genomes. Acta Trop 2016; 161:79-85. [PMID: 27207135 DOI: 10.1016/j.actatropica.2016.05.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/28/2015] [Revised: 04/25/2016] [Accepted: 05/16/2016] [Indexed: 02/08/2023]
Abstract
Schistosoma infection is a major cause of morbidity and mortality worldwide. Schistosomiasis japonica is endemic in mainland China along the Yangtze River, typically distributed in two geographical categories of lake and mountainous regions. The study of schistosome genetic diversity is of interest for understanding parasite biology and transmission and for formulating control strategies. Certain genetic variations may be associated with adaptations to different ecological habitats. The aim of this study is to gain insight into Schistosoma japonicum genetic variation, evolutionary origin and associated causes of different geographic lineages by examining homozygous single nucleotide polymorphisms (SNPs) based on resequenced genome data. We collected S. japonicum samples from four sites, three in the lake regions (LR) of the mid-east (Guichi and Tonglin in Anhui province, Laogang in Hunan province) and one in the mountainous region (MR) (Xichang in Sichuan province) of south-west China, resequenced their genomes using next-generation sequencing (NGS) technology, and used the available S. japonicum draft genome sequence as a reference for genome mapping. A total of 14,575 SNPs from 2059 genes were identified in the four lineages. Phylogenetic analysis confirmed significant genetic variation between the different geographical lineages, and further revealed that the MR Xichang lineage is phylogenetically closer to the LR Guichi lineage than to the other two LR lineages, and that the MR lineage might have evolved from LR lineages. More than two thirds of detected SNPs were nonsynonymous; functional annotation of the SNP-containing genes showed that they are involved mainly in biological processes such as signaling and response to stimuli. Notably, unique nonsynonymous SNP variations were detected in 66 genes of the MR lineage, suggesting possible genetic adaptation to mountainous ecological conditions.
Collapse
|
16
|
Lamoury FMJ, Jacka B, Bartlett S, Bull RA, Wong A, Amin J, Schinkel J, Poon AF, Matthews GV, Grebely J, Dore GJ, Applegate TL. The Influence of Hepatitis C Virus Genetic Region on Phylogenetic Clustering Analysis. PLoS One 2015; 10:e0131437. [PMID: 26192190 PMCID: PMC4507989 DOI: 10.1371/journal.pone.0131437] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 02/26/2015] [Accepted: 06/01/2015] [Indexed: 02/06/2023]
Abstract
Sequencing is important for understanding the molecular epidemiology and viral evolution of hepatitis C virus (HCV) infection. To date, there is little standardisation among sequencing protocols, in part due to the high genetic diversity that is observed within HCV. This study aimed to develop a novel, practical sequencing protocol that covered both conserved and variable regions of the viral genome and to assess the influence of each subregion, sequence concatenation and unrelated reference sequences on phylogenetic clustering analysis. The Core to the hypervariable region 1 (HVR1) of envelope-2 (E2) and non-structural-5B (NS5B) regions of the HCV genome were amplified and sequenced from participants from the Australian Trial in Acute Hepatitis C (ATAHC), a prospective study of the natural history and treatment of recent HCV infection. Phylogenetic trees were constructed using a general time-reversible substitution model and sensitivity analyses were completed for every subregion. Pairwise distance, genetic distance and bootstrap support were computed to assess the impact of HCV region on clustering results as measured by the identification and percentage of participants falling within all clusters, cluster size, average patristic distance, and bootstrap value. The Robinson-Foulds metric was also used to compare phylogenetic trees among the different HCV regions. Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal method for clustering analysis to address a range of molecular epidemiological questions.
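As a rough illustration of the Robinson-Foulds comparison mentioned in this abstract, the sketch below counts the splits (bipartitions) found in one tree but not the other. It is not the authors' pipeline: the nested-tuple tree encoding and the function names `leaf_set`, `bipartitions` and `robinson_foulds` are assumptions made for this example, and branch lengths and bootstrap support are ignored.

```python
def leaf_set(tree):
    """Return the set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the tree, treating it as unrooted."""
    leaves = leaf_set(tree)
    reference = min(leaves)  # fixed leaf used to pick one canonical side per split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Skip trivial splits: those with a single leaf (or nothing) on one side.
        if 1 < len(below) < len(leaves) - 1:
            side = below if reference not in below else leaves - below
            splits.add(side)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical topologies for four sequences labelled A to D.
tree_core = (("A", "B"), ("C", "D"))
tree_ns5b = (("A", "C"), ("B", "D"))
print(robinson_foulds(tree_core, tree_core))
print(robinson_foulds(tree_core, tree_ns5b))
```

A distance of zero means the two topologies agree on every internal branch; larger values mean more conflicting groupings, which is the kind of region-to-region disagreement the abstract describes.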
Affiliation(s)
- François M. J. Lamoury
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Brendan Jacka
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Sofia Bartlett
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Rowena A. Bull
- Inflammation and Infection Research Centre, School of Medical Sciences, University of New South Wales Australia, Sydney, Australia
- Arthur Wong
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Janaki Amin
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Janke Schinkel
- Academic Medical Centre, Department of Medical Microbiology, Section of Clinical Virology, Amsterdam, The Netherlands
- Art F. Poon
- BC Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Department of Medicine, University of British Columbia, Vancouver, Canada
- Gail V. Matthews
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Jason Grebely
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- Gregory J. Dore
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
- HIV/Immunology/Infectious Diseases Clinical Services Unit, St Vincent’s Hospital, Sydney, Australia
- Tanya L. Applegate
- The Kirby Institute, University of New South Wales Australia, Sydney, Australia
|
17
|
Paparini A, McInnes LM, Di Placido D, Mackereth G, Tompkins DM, Clough R, Ryan UM, Irwin PJ. Piroplasms of New Zealand seabirds. Parasitol Res 2014; 113:4407-14. [PMID: 25204728 DOI: 10.1007/s00436-014-4118-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/05/2014] [Accepted: 08/27/2014] [Indexed: 12/16/2022]
Abstract
Blood and ectoparasitic ticks were collected from migratory seabirds in New Zealand, including Australasian gannets (n = 13) from two sites and red-billed gulls (n = 9) and white-fronted terns (n = 2) from a third location. Blood smears were screened for parasite presence by microscopy, while DNA from blood samples was subjected to PCR for the presence of tick-transmitted protozoan haemoparasites belonging to the order Piroplasmida. Parasites were identified by comparing small subunit ribosomal RNA (18S rDNA) gene sequences to related sequences on GenBank. Analyses indicated that nine birds were infected with unknown variants of a Babesia poelea-like parasite (recorded as genotypes I and II), while four harboured a piroplasm that was genetically similar to Babesia kiwiensis. There was no parasite stratification by bird species; both the gannets and gulls were positive for all three parasites, while the terns were positive for the B. kiwiensis-like and the B. poelea-like (genotype I) parasites. The B. kiwiensis-like parasite found in the birds was also found in two species of ticks: Carios capensis and Ixodes eudyptidis. This represents the first report of Babesia-positive ticks parasitising seabirds in New Zealand. The lack of host specificity and evidence of wide-ranging distributions of the three piroplasm genotypes suggest that there is a high degree of haemoparasite transmission occurring naturally between New Zealand seabird populations and species.
Affiliation(s)
- Andrea Paparini
- Vector and Waterborne Pathogen Research Group, School of Veterinary & Life Sciences, Murdoch University, 90 South Street, Murdoch, WA, 6150, Australia
|
18
|
Sleator RD. A beginner's guide to phylogenetics. Microb Ecol 2013; 66:1-4. [PMID: 23624570 DOI: 10.1007/s00248-013-0236-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/17/2012] [Accepted: 04/17/2013] [Indexed: 06/02/2023]
Abstract
Metagenomics and the development of high throughput next generation sequencing capabilities have forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Herein, I review the major tree-building strategies, challenges and opportunities which exist in this rapidly expanding field of evolutionary biology.
|
19
|
Pariselle A, Boeger WA, Snoeks J, Bilong Bilong CF, Morand S, Vanhove MPM. The monogenean parasite fauna of cichlids: a potential tool for host biogeography. Int J Evol Biol 2011; 2011:471480. [PMID: 21869935 PMCID: PMC3157826 DOI: 10.4061/2011/471480] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 12/15/2010] [Revised: 02/21/2011] [Accepted: 04/19/2011] [Indexed: 11/20/2022]
Abstract
We discuss geographical distribution and phylogeny of Dactylogyridea (Monogenea) parasitizing Cichlidae to elucidate their hosts' history. Although mesoparasitic Monogenea (Enterogyrus spp.) show typical vicariant distribution, ectoparasitic representatives from different continents are not considered sister taxa, hence their distribution cannot result from vicariance alone. Because of the close host-parasite relationship, this might indicate that present-day cichlid distribution may also reflect dispersal through coastal or brackish waters. Loss of ectoparasites during transoceanic migration, followed by lateral transfer from other fish families might explain extant host-parasite associations. Because of its mesoparasitic nature, hence not subject to salinity variations of the host's environment, Enterogyrus could have survived marine migrations, intolerable for ectoparasites. Host-switches and salinity transitions may be invoked to explain the pattern revealed by a preliminary morphological phylogeny of monogenean genera from Cichlidae and other selected Monogenea genera, rendering the parasite distribution explicable under both vicariance and dispersal. Testable hypotheses are put forward in this parasitological approach to cichlid biogeography. Along with more comprehensive in-depth morphological phylogeny, comparison with molecular data, clarifying dactylogyridean evolution on different continents and from various fish families, and providing temporal information on host-parasite history, are needed to discriminate between the possible scenarios.
Affiliation(s)
- Antoine Pariselle
- ISE-M, UMR5554 CNRS, UR226 IRD (ex-ORSTOM), Université Montpellier II, CC 063, 34095 Montpellier Cedex 5, France
- Walter A. Boeger
- Laboratório de Ecologia Molecular e Parasitologia Evolutiva, Grupo Integrado de Aquicultura e Estudos Ambientais, Universidade Federal do Paraná, Setor de Ciências Biológicas, Departamento de Zoologia, Caixa Postal 19073, CEP 81531-980, Curitiba, PR, Brazil
- Jos Snoeks
- Ichthyology Unit, African Zoology Department, Royal Museum for Central Africa, Leuvensesteenweg 13, 3080 Tervuren, Belgium
- Laboratory of Animal Diversity and Systematics, Biology Department, Katholieke Universiteit Leuven, Charles Deberiotstraat 32, 3000 Leuven, Belgium
- Charles F. Bilong Bilong
- Laboratoire de Parasitologie et d'Ecologie, Département de Biologie et Physiologie Animales, Université de Yaoundé I, BP 812, Yaoundé, Cameroon
- Serge Morand
- ISE-M, UMR5554 CNRS, UR226 IRD (ex-ORSTOM), Université Montpellier II, CC 063, 34095 Montpellier Cedex 5, France
- Maarten P. M. Vanhove
- Ichthyology Unit, African Zoology Department, Royal Museum for Central Africa, Leuvensesteenweg 13, 3080 Tervuren, Belgium
- Laboratory of Animal Diversity and Systematics, Biology Department, Katholieke Universiteit Leuven, Charles Deberiotstraat 32, 3000 Leuven, Belgium
|
20
|
Lespinats S, Grando D, Maréchal E, Hakimi MA, Tenaillon O, Bastien O. How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling. Evol Bioinform Online 2011; 7:61-85. [PMID: 21697992 PMCID: PMC3118699 DOI: 10.4137/ebo.s7048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
Whatever the phylogenetic method, genetic sequences are often described as strings of characters, thus molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the surprising features of high-dimensional spaces, such as the concentration of measure phenomenon. To study how these features might influence phylogeny reconstructions, we examined a particularly popular method: the Fitch-Margoliash algorithm, which belongs to the Least Squares methods. We show that the Least Squares methods are closely related to Multi Dimensional Scaling. Indeed, the criteria for Fitch-Margoliash and Sammon’s mapping are somewhat similar. However, the prolific research in Multi Dimensional Scaling has since produced methods that outclass Sammon’s mapping. Least Squares methods for tree reconstruction can now take advantage of these improvements. However, “false neighborhoods” and “tears” are the two main risks in the dimensionality reduction field: a “false neighborhood” corresponds to widely separated data in the original space that are found close together in the representation space, and neighboring data that are displayed in remote positions constitute a “tear”. To address this problem, we took advantage of the concepts of “continuity” and “trustworthiness” in the tree reconstruction field, which limit the risk of “false neighborhoods” and “tears”. We also point out the concentration of measure phenomenon as a source of error and introduce here new criteria to build phylogenies with improved preservation of distances and robustness. The authors and the Evolutionary Bioinformatics Journal dedicate this article to the memory of Professor W.M. Fitch (1929–2011).
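The similarity between the two criteria can be seen by writing them side by side in their standard textbook forms. The notation is an assumption made for this note and is not taken from the article: $D_{ij}$ is the observed distance between sequences $i$ and $j$, and $d_{ij}$ is the fitted distance (the path length in the tree for Fitch-Margoliash, the distance in the low-dimensional map for Sammon's mapping).

```latex
% Fitch-Margoliash weighted least-squares criterion
E_{\mathrm{FM}} = \sum_{i<j} \frac{\left(D_{ij} - d_{ij}\right)^{2}}{D_{ij}^{2}}

% Sammon's stress
E_{\mathrm{S}} = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{\left(D_{ij} - d_{ij}\right)^{2}}{D_{ij}}
```

Both sum squared residuals weighted by an inverse power of the observed distance, so short distances count for more; they differ in the exponent of that weight and in Sammon's global normalisation.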
Affiliation(s)
- Sylvain Lespinats
- UMR INSERM unité U722 and Université Denis Diderot-Paris 7, Faculté de médecine, site Xavier Bichat, 16 rue Henri Huchard, 75870 Paris cedex 18, France
|
21
|
Abstract
The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects.
|
22
|
Albayrak A, Otu HH, Sezerman UO. Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets. BMC Bioinformatics 2010; 11:428. [PMID: 20718947 PMCID: PMC2936399 DOI: 10.1186/1471-2105-11-428] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 01/21/2010] [Accepted: 08/18/2010] [Indexed: 11/30/2022]
Abstract
Background: Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods utilize multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not an automated procedure and requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in underlying sequences. To address this problem, we propose to use the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. Comparison with an alignment-based approach was also carried out to test the quality of the clustering. Results: We demonstrate the robustness of RCM with reduced alphabets in clustering of protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracies, respectively. Conclusions: The overall combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
Affiliation(s)
- Aydin Albayrak
- Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey
|
23
|
Cutiño-Jiménez AM, Martins-Pinheiro M, Lima WC, Martín-Tornet A, Morales OG, Menck CFM. Evolutionary placement of Xanthomonadales based on conserved protein signature sequences. Mol Phylogenet Evol 2009; 54:524-34. [PMID: 19786109 DOI: 10.1016/j.ympev.2009.09.026] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/14/2009] [Revised: 09/11/2009] [Accepted: 09/21/2009] [Indexed: 11/30/2022]
Abstract
Xanthomonadales comprises one of the largest phytopathogenic bacterial groups, and is currently classified within the gamma-proteobacteria. However, the phylogenetic placement of this group is not clearly resolved, and the results of different studies contradict one another. In this work, the evolutionary position of Xanthomonadales was determined by analyzing the presence of shared insertions and deletions (INDELs) in highly conserved proteins. Several distinctive insertions found in most of the members of the gamma-proteobacteria are absent in Xanthomonadales and in groups such as Legionellales, Chromatiales, Methylococcales, Thiotrichales and Cardiobacteriales. These INDELs were most likely introduced after the branching of Xanthomonadales from most of the gamma-proteobacteria and provide evidence for the phylogenetic placement of the early gamma-proteobacteria. Moreover, other proteins contain insertions exclusive to the Xanthomonadales order, confirming that this is a monophyletic group and providing important specific genetic markers. Thus, the data presented clearly support the Xanthomonadales group as an independent subdivision that constitutes one of the deepest-branching lineages within the gamma-proteobacteria clade.
Affiliation(s)
- Ania M Cutiño-Jiménez
- Department of Biology, Facultad de Ciencias Naturales, Universidad de Oriente, Ave. Patricio Lumumba s/n., Santiago de Cuba, CP 90 500, Cuba
|
24
|
Abstract
As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
|
25
|
Han D, Fan Y, Hu Z. An evaluation of four phylogenetic markers in Nostoc: implications for cyanobacterial phylogenetic studies at the intrageneric level. Curr Microbiol 2008; 58:170-6. [PMID: 18972163 DOI: 10.1007/s00284-008-9302-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 03/26/2006] [Accepted: 10/12/2006] [Indexed: 11/24/2022]
Abstract
The success of some phylogenetic markers in cyanobacteria is owed to the design of cyanobacteria-specific primers, but few studies have directly investigated the evolutionary "behavior" of these loci. In this study, we performed a case study in Nostoc to evaluate rpoC1, hetR, rbcLX, and the 16S rRNA-tRNA(Ile)-tRNA(Ala)-23S rRNA internal transcribed spacer (ITS) as phylogenetic markers. The results indicated that the gene trees of these loci are not congruent with the phylogeny based on the 16S rRNA gene. The mechanisms contributing to the incongruence include randomized variation and recombination. These results suggest that molecular markers for phylogenetic reconstruction at the intrageneric level in cyanobacteria should be chosen with care.
Affiliation(s)
- D Han
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
|
26
|
A simple, fast, and accurate method of phylogenomic inference. Genome Biol 2008; 9:R151. [PMID: 18851752 PMCID: PMC2760878 DOI: 10.1186/gb-2008-9-10-r151] [Citation(s) in RCA: 348] [Impact Index Per Article: 20.5] [Received: 08/12/2008] [Revised: 09/26/2008] [Accepted: 10/13/2008] [Indexed: 11/10/2022]
Abstract
The explosive growth of genomic data provides an opportunity to make increased use of protein markers for phylogenetic inference. We have developed an automated pipeline for phylogenomic analysis (AMPHORA) that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference. We demonstrated its high-throughput capabilities and high-quality results by constructing a genome tree of 578 bacterial species and by assigning phylotypes to 18,607 protein markers identified in metagenomic data collected from the Sargasso Sea.
|
27
|
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
|
28
|
Cobbett A, Wilkinson M, Wills MA. Fossils Impact as Hard as Living Taxa in Parsimony Analyses of Morphology. Syst Biol 2007; 56:753-66. [PMID: 17886145 DOI: 10.1080/10635150701627296] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 10/22/2022]
Abstract
Systematists disagree whether data from fossils should be included in parsimony analyses. In a handful of well-documented cases, the addition of fossil data radically overturns a hypothesis of relationships based on extant taxa alone. Fossils can break up long branches and preserve character combinations closer in time to deep splitting events. However, fossils usually require more interpretation than extant taxa, introducing greater potential for spurious codings. Moreover, because fossils often have more "missing" codings, they are frequently accused of increasing numbers of MPTs, frustrating resolution and reducing support. Despite the controversy, remarkably little is known about the effects of fossils more generally. Here we provide the first systematic study, investigating empirically the behavior of fossil and extant taxa in 45 published morphological data sets. First-order jackknifing is used to determine the effects that each terminal has on inferred relationships, on the number of MPTs, and on CI' and RI as measures of homoplasy. Bootstrap leaf stabilities provide a proxy for the contribution of individual taxa to the branch support in the rest of the tree. There is no significant difference in the impact of fossil versus extant taxa on relationships, numbers of MPTs, and CI' or RI. However, adding individual fossil taxa is more likely to reduce the total branch support of the tree than adding extant taxa. This must be weighed against the superior taxon sampling afforded by including judiciously coded fossils, providing data from otherwise unsampled regions of the tree. We therefore recommend that investigators should include fossils, in the absence of compelling and case specific reasons for their exclusion.
Affiliation(s)
- Andrea Cobbett
- Department of Biology and Biochemistry, The University of Bath, Claverton Down, Bath, UK
|
29
|
Murphy NP, Carey D, Castro LR, Dowton M, Austin AD. Phylogeny of the platygastroid wasps (Hymenoptera) based on sequences from the 18S rRNA, 28S rRNA and cytochrome oxidase I genes: implications for the evolution of the ovipositor system and host relationships. Biol J Linn Soc Lond 2007. [DOI: 10.1111/j.1095-8312.2007.00825.x] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/27/2022]
|
30
|
Navaud O, Dabos P, Carnus E, Tremousaygue D, Hervé C. TCP Transcription Factors Predate the Emergence of Land Plants. J Mol Evol 2007; 65:23-33. [PMID: 17568984 DOI: 10.1007/s00239-006-0174-z] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.6] [Received: 07/24/2006] [Accepted: 01/17/2007] [Indexed: 10/23/2022]
Abstract
TCP proteins are plant-specific transcription factors identified so far only in angiosperms and shown to be involved in specifying plant morphologies. However, the functions of these proteins remain largely unknown. Our study is the first phylogenetic analysis comparing the TCP genes from higher and lower plants, and it dates the emergence of the TCP family to before the split of the Zygnemophyta. EST database analysis and CODEHOP PCR amplification revealed TCP genes in basal land plant genomes and also in their close freshwater algal relatives. Based on an extensive survey of TCP genes, families of TCP proteins were characterized in the Arabidopsis thaliana, poplar, rice, club-moss, and moss genomes. The phylogenetic trees indicate a continuous expansion of the TCP family during the diversification of the Phragmoplastophyta and a similar degree of expansion in several angiosperm lineages. TCP paralogues were identified in all genomes studied, and Ks values indicate that TCP genes expanded during genome duplication events. MEME and SIMPLE analyses detected conserved motifs and low-complexity regions, respectively, outside of the TCP domain, which reinforced the previous description of a "mosaic" structure of TCP proteins.
Affiliation(s)
- Olivier Navaud
- CNRS UMR2594/INRA UMR441, Laboratoire des Interactions Plantes Microorganismes, BP 52627 Chemin de borde rouge, F-31326 Castanet-Tolosan, France
|
31
|
Kirzhner V, Paz A, Volkovich Z, Nevo E, Korol A. Different clustering of genomes across life using the A-T-C-G and degenerate R-Y alphabets: early and late signaling on genome evolution? J Mol Evol 2007; 64:448-56. [PMID: 17479343 DOI: 10.1007/s00239-006-0178-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 08/01/2006] [Accepted: 01/11/2007] [Indexed: 10/23/2022]
Abstract
In this study, we calculated distances between genomes based on our previously developed compositional spectra (CS) analysis. The study was conducted using genomes of 39 species of Eukarya, Eubacteria, and Archaea. Based on CS distances, we produced two different consensus dendrograms, one for the four-letter alphabet and one for the two-letter (purine-pyrimidine) alphabet. A comparison of the structure obtained using the purine-pyrimidine alphabet with the standard three-kingdom (3K) scheme reveals substantial similarity. Surprisingly, this is not the case when the same procedure is based on the four-letter alphabet. In this situation, we also found three main clusters, but they differ from those in the 3K scheme. In particular, one of the clusters includes Eukarya, thermophilic bacteria and a part of the considered Archaea species. We speculate that the key factor in the last classification (based on the A-T-G-C alphabet) is related to ecology: two ecological parameters, temperature and oxygen, distinctly explain the clustering revealed by compositional spectra in the four-letter alphabet. Therefore, we assume that this result reflects two interdependent processes: evolutionary divergence and superimposed ecological convergence of the genomes, although another process, horizontal transfer, cannot be excluded as an important contributing factor.
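To make the alphabet reduction concrete, the sketch below recodes DNA into the purine-pyrimidine (R-Y) alphabet and compares two sequences by their word frequencies. It is a deliberately simplified stand-in, not the authors' compositional spectra method: exact k-mer frequencies and a Euclidean distance are assumptions chosen for brevity, and the names `to_ry`, `kmer_spectrum` and `spectrum_distance` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Purines (A, G) become R; pyrimidines (C, T) become Y.
PURINE_PYRIMIDINE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})


def to_ry(sequence):
    """Recode a DNA string into the two-letter purine-pyrimidine alphabet."""
    return sequence.upper().translate(PURINE_PYRIMIDINE)


def kmer_spectrum(sequence, k, alphabet):
    """Relative frequency of every length-k word over the given alphabet."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    words = ["".join(letters) for letters in product(alphabet, repeat=k)]
    total = sum(counts[word] for word in words)
    return [counts[word] / total if total else 0.0 for word in words]


def spectrum_distance(seq_a, seq_b, k, alphabet):
    """Euclidean distance between the k-mer spectra of two sequences."""
    spec_a = kmer_spectrum(seq_a, k, alphabet)
    spec_b = kmer_spectrum(seq_b, k, alphabet)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))


# Two toy sequences that differ in the four-letter alphabet
# but become identical once recoded to R-Y.
first, second = "ACGT", "GTAC"
print(spectrum_distance(first, second, 2, "ACGT"))
print(spectrum_distance(to_ry(first), to_ry(second), 2, "RY"))
```

The toy pair shows why the two alphabets can yield different dendrograms: the recoding discards part of the compositional signal, so sequences that are distinct in A-T-C-G can coincide in R-Y.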
Affiliation(s)
- V Kirzhner
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel.
|
32
|
Swidan F, Ziv-Ukelson M, Pinter RY. On the repeat-annotated phylogenetic tree reconstruction problem. J Comput Biol 2007; 13:1397-418. [PMID: 17061918 DOI: 10.1089/cmb.2006.13.1397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/25/2022]
Abstract
A new problem in phylogenetic inference is presented, based on recent biological findings indicating a strong association between reversals (i.e., inversions) and repeats. These biological findings are formalized here in a new mathematical model, called repeat-annotated phylogenetic trees (RAPT). We show that, under RAPT, the evolutionary process, including both the tree topology and the internal node genome orders, is uniquely determined, a property that is of major significance both in theory and in practice. Furthermore, the repeats are employed to provide linear-time algorithms for reconstructing both the genomic orders and the phylogeny, which are NP-hard problems under the classical model of sorting by reversals (SBR).
Affiliation(s)
- Firas Swidan
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel.
|
33
|
Brocchieri L, Conway de Macario E, Macario AJL. Chaperonomics, a new tool to study ageing and associated diseases. Mech Ageing Dev 2006; 128:125-36. [PMID: 17123587 DOI: 10.1016/j.mad.2006.11.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
The participation of molecular chaperones in the process of senescence and in the mechanisms of age-related diseases is currently under investigation in many laboratories. However, accurate, complete information about the number and diversity of chaperone genes in any given genome is scarce. Consequently, the results of efforts aimed at elucidating the role of chaperones in ageing and disease are often confusing and contradictory. To remedy this situation, we have developed chaperonomics, including means to identify and characterize chaperone genes and their families applicable to humans and model organisms. The problem is difficult because in eukaryotic organisms chaperones have evolved into complex multi-gene families. For instance, the occurrence of multiple paralogs in a single genome makes it difficult to interpret results if consideration is not given to the fact that similar but distinct chaperone genes can be differentially expressed in separate cellular compartments, tissues, and developmental stages. The availability of complete genome sequences allows implementation of chaperonomics with the purpose of understanding the composition of chaperone families in all cell compartments, their evolutionary and functional relations and, ultimately, their role in pathogenesis. Here, we present a series of concatenated, complementary procedures for identifying, characterizing, and classifying chaperone genes in genomes and for elucidating evolutionary relations and structural features useful in predicting functional properties. We illustrate the procedures with applications to the complex family of hsp70 genes and show that the kind of data obtained can provide a solid basis for future research.
Affiliation(s)
- Luciano Brocchieri
- University of Florida, College of Medicine, Department of Molecular Genetics and Microbiology, UF Genetics Institute, P.O. Box 103610, Gainesville, FL 32610-3610, USA
|
34
|
Macario AJL, Brocchieri L, Shenoy AR, Conway de Macario E. Evolution of a Protein-Folding Machine: Genomic and Evolutionary Analyses Reveal Three Lineages of the Archaeal hsp70(dnaK) Gene. J Mol Evol 2006; 63:74-86. [PMID: 16788741 DOI: 10.1007/s00239-005-6207-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 08/30/2005] [Accepted: 03/14/2006] [Indexed: 11/27/2022]
Abstract
The stress chaperone protein Hsp70 (DnaK) (abbreviated DnaK) and its co-chaperones Hsp40(DnaJ) (or DnaJ) and GrpE are universal in bacteria and eukaryotes but occur only in some archaea clustered in the order 5'-grpE-dnaK-dnaJ-3' in a locus termed Locus I. Three structural varieties of Locus I, termed Types I, II, and III, were identified, respectively, in Methanosarcinales, in Thermoplasmatales and Methanothermobacter thermoautotrophicus, and in Halobacteriales. These Locus I types corresponded to three groups identified by phylogenetic trees of archaeal DnaK proteins including the same archaeal subdivisions. These archaeal DnaK groups were not significantly interrelated, clustering instead with DnaKs from three bacterial lineages, Methanosarcinales with Firmicutes, Thermoplasmatales and M. thermoautotrophicus with Thermotoga, and Halobacteriales with Actinobacteria, suggesting that the three archaeal types of Locus I were acquired by independent events of lateral gene transfer. These associations, however, lacked strong bootstrap support and were sensitive to dataset choice and tree-reconstruction method. Structural features of dnaK loci in bacteria revealed that Methanosarcinales and Firmicutes shared a similar structure, also common to most other bacterial groups. Structural differences were observed instead in Thermotoga compared to Thermoplasmatales and M. thermoautotrophicus, and in Actinobacteria compared to Halobacteriales. It was also found that the association between the DnaK sequences from Halobacteriales and Actinobacteria likely reflects common biases in their amino acid compositions. Although the loci structural features and the DnaK trees suggested the possibility of lateral gene transfer between Firmicutes and Methanosarcinales, the similarity between the archaeal and the ancestral bacterial loci favors the more parsimonious hypothesis that all archaeal sequences originated from a unique prokaryotic ancestor.
Affiliation(s)
- Alberto J L Macario
- Division of Molecular Medicine, Wadsworth Center, Room B-749, New York State Department of Health, Empire State Plaza, P.O. Box 509, Albany, NY 12201-0509, USA
|
35
|
Arnedo MA, Gillespie RG. Species diversification patterns in the Polynesian jumping spider genus Havaika Prószyński, 2001 (Araneae, Salticidae). Mol Phylogenet Evol 2006; 41:472-95. [PMID: 16837219 DOI: 10.1016/j.ympev.2006.05.012] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 03/30/2006] [Revised: 05/10/2006] [Accepted: 05/13/2006] [Indexed: 11/16/2022]
Abstract
Hotspot archipelagoes provide exceptional models for the study of the evolutionary process, due to the effects of isolation and topographical diversity in inducing the formation of unique biotic assemblages. In this paper, we examine the evolutionary patterns exhibited by the jumping spider genus Havaika Prószyński, 2001 in the Polynesian islands of the Hawaiian and Marquesas chains. To date, systematic research on Havaika has been seriously limited by the poor taxonomic knowledge of the group, which was based on a handful of specimens that showed continuous variability and lacked clear-cut diagnostic characters. Here, we circumvent this problem by inferring a phylogeny based on DNA sequences of several fragments including both mitochondrial (protein coding cytochrome oxidase I, NAD1 dehydrogenase, ribosomal 16S, and tRNA leu) and nuclear (internal transcribed spacer 2) genes, and statistical morphological analyses of a large sample of specimens. Results suggest that the Marquesan and Hawaiian Havaika may be the result of independent colonizations. Furthermore, data provide little support for the standard "progression rule" (evolution in the direction of older to younger islands) in the Hawaiian Islands. This may be explained by a recent arrival of the group: age estimates of the different lineages suggest that Havaika colonized the Hawaiian Islands after most of the extant islands were already formed. The lack of clear-cut diagnostic characters among species may also be explained by the recent origin of the group since molecular data do not provide any evidence of hybridization among lineages. Quantitative morphological data coupled with the phylogenetic information allow us to reevaluate the current limitation of Havaika taxonomy. Molecular data support the existence of at least four different evolutionary lineages that are further morphologically diagnosable. However, genealogical relationships are better predicted by geographical affinity (i.e. island) than by morphological characters used in the original descriptions of the species. A pattern of size segregation linked to largely overlapping distributions of some of the species hints at a potential involvement of competition in generating morphological diversity. This study contributes to our understanding of the origin and shaping of the biodiversity of oceanic islands and sets the stage for more detailed studies on particular aspects of these previously overlooked spiders.
Affiliation(s)
- Miquel A Arnedo
- Division of Insect Biology, University of California-Berkeley, ESPM 201 Wellman Hall, Berkeley, CA 94720-3112, USA.
|
36
|
Bradley ME, Benner SA. Integrating protein structures and precomputed genealogies in the Magnum database: examples with cellular retinoid binding proteins. BMC Bioinformatics 2006; 7:89. [PMID: 16504077 PMCID: PMC1475641 DOI: 10.1186/1471-2105-7-89] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/19/2005] [Accepted: 02/23/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. RESULTS: The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. CONCLUSION: We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural bioinformatics resources that are useful for identifying experimentally testable hypotheses about the molecular basis of protein behaviors and functions, as illustrated with the examples from the cellular retinoid binding proteins.
Affiliation(s)
- Michael E Bradley
- Department of Chemistry, University of Florida, P.O. Box 117200, Gainesville, FL, 32611, USA
- Division of Biological Sciences, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL, 60615, USA
- Steven A Benner
- Foundation for Applied Molecular Evolution, 1115 NW 14th Avenue, Gainesville, FL, 32601, USA
|
37
|
Kirzhner V, Bolshoy A, Volkovich Z, Korol A, Nevo E. Large-scale genome clustering across life based on a linguistic approach. Biosystems 2006; 81:208-22. [PMID: 15936870 DOI: 10.1016/j.biosystems.2005.04.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/10/2005] [Accepted: 04/13/2005] [Indexed: 11/24/2022]
Abstract
With the availability of genome sequences, the possibility of new phylogenetic reconstructions arises in order to reveal genomic relationships among organisms. According to the compositional-spectra (CS) approach proposed in our previous studies, any genomic sequence can be characterized by a distribution of frequencies of imperfect matching of words (oligonucleotides). In the current application of CS-analysis, we attempted to analyze the cluster structure of genomes across life. It appeared that compositional spectra show a clear three-group clustering of the compared prokaryotic and eukaryotic genomes. Unexpectedly, this grouping seriously differs from the classical Universal Tree of Life structure represented by common kingdoms known as Eubacteria, Archaebacteria, and Eukarya. The revealed CS-clustering displays high stability, putatively reflecting its objective nature, and still enigmatic biological significance that may result from convergent evolution driven by ecological selection. We believe that our approach provides a new and wider (compared to traditional methods) perspective of extracting genomic information of high evolutionary relevance.
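Illustrative sketch (not from the paper): the abstract characterizes a sequence by how often each word of a chosen set occurs under imperfect matching. The Python below shows one way such a spectrum could be computed. The word set, the mismatch threshold `max_mismatch`, the normalization, and the L1 distance in `spectrum_distance` are all assumptions made for illustration; the authors' actual parameters and distance measure are not given in this listing.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatch=1):
    """Relative frequency of each word in `words`, counting every window of
    `seq` that matches the word with at most `max_mismatch` mismatches.
    (Hypothetical helper; parameters are illustrative assumptions.)"""
    seq = seq.upper()
    counts = []
    for word in words:
        k = len(word)
        hits = sum(
            1
            for start in range(len(seq) - k + 1)
            if hamming(seq[start:start + k], word) <= max_mismatch
        )
        counts.append(hits)
    total = sum(counts)
    # Normalize so that spectra of sequences of different lengths are comparable.
    return [c / total if total else 0.0 for c in counts]


def spectrum_distance(s1, s2):
    """L1 distance between two spectra (an assumed, simple choice)."""
    return sum(abs(a - b) for a, b in zip(s1, s2))


if __name__ == "__main__":
    words = ["ACGT", "TTGA", "GGCC"]
    g1 = compositional_spectrum("ACGTACGTTTGAGGCCACGA", words)
    g2 = compositional_spectrum("GGCCGGCCGGACTTGATTGA", words)
    print(g1, g2, spectrum_distance(g1, g2))
```

A matrix of such pairwise distances could then be passed to any standard clustering routine to obtain a dendrogram.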
Affiliation(s)
- Valery Kirzhner
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel.
|
38
|
Chapus C, Dufraigne C, Edwards S, Giron A, Fertil B, Deschavanne P. Exploration of phylogenetic data using a global sequence analysis method. BMC Evol Biol 2005; 5:63. [PMID: 16280081 PMCID: PMC1310607 DOI: 10.1186/1471-2148-5-63] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 04/15/2005] [Accepted: 11/09/2005] [Indexed: 11/13/2022] Open
Abstract
Background: Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results: Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion: The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis.
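Illustrative sketch (not from the paper): a genomic signature, as defined in the abstract, is the set of frequencies of short oligonucleotides in a DNA sequence. The Python below computes such a frequency vector without any alignment and compares two signatures. The word length `k=3` and the Euclidean distance are assumptions chosen for brevity; the listing does not state which word length or distance the authors used.

```python
import math
from collections import Counter
from itertools import product


def signature(seq, k=3):
    """Frequencies of all 4**k oligonucleotides of length k in `seq`.
    Windows containing characters other than A, C, G, T are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def signature_distance(a, b):
    """Euclidean distance between two signatures (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    s1 = signature("ACGTACGTAGCTAGCTAGGATCCA")
    s2 = signature("TTTTAAAATTTTAAAACCCCGGGG")
    print(signature_distance(s1, s2))
```

Because no homology statement is needed, the two input sequences may differ in length or be unalignable, which is the use case the abstract emphasizes.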
Affiliation(s)
- Charles Chapus
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U 726, Case 7113, Tour 53-54, 2 place Jussieu, 75005 Paris, France
- Current address: Dept. of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138 USA
- Scott Edwards
- Dept. of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138 USA
- Alain Giron
- Inserm U494, 91 bd de l'Hopital 75634 Paris CEDEX 13, France
- Bernard Fertil
- Inserm U494, 91 bd de l'Hopital 75634 Paris CEDEX 13, France
- Patrick Deschavanne
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U 726, Case 7113, Tour 53-54, 2 place Jussieu, 75005 Paris, France
|
39
|
Abstract
The Arthur M. Sackler Colloquium of the National Academy of Sciences, "Frontiers in Bioinformatics: Unsolved Problems and Challenges," organized by David Eisenberg, Russ Altman, and myself, was held October 15-17, 2004, to provide a forum for discussing concepts and methods in bioinformatics serving the biological and medical sciences. The deluge of genomic and proteomic data in the last two decades has driven the creation of tools that search and analyze biomolecular sequences and structures. Bioinformatics is highly interdisciplinary, using knowledge from mathematics, statistics, computer science, biology, medicine, physics, chemistry, and engineering.
Affiliation(s)
- Samuel Karlin
- Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA.
|
40
|
Paz A, Kirzhner V, Nevo E, Korol A. Coevolution of DNA-interacting proteins and genome "dialect". Mol Biol Evol 2005; 23:56-64. [PMID: 16151189 DOI: 10.1093/molbev/msj007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Several species-specific characteristics of genome organization that are superimposed on its coding aspects were proposed earlier, including genome signature (GS), genome accent, and compositional spectrum (CS). These notions could be considered as representatives of genome dialect (GD). We measured within the Proteobacteria some GD representatives, the relative abundance of dinucleotides or GS, the profiles of occurrence of 10-nucleotide words (CS), and the profiles of occurrence of 20-nucleotide words, using a degenerate two-letter alphabet (purine-pyrimidine compositional spectra [PPCS]). Here, we show that the evolutionary distances between DNA repair and recombination orthologous enzymes (especially those of the nucleotide excision repair system) are highly correlated with PPCS and GS distances. Orthologous proteins involved in structural or metabolic processes (control group) have significantly lower correlations of their evolutionary distances with the PPCS and GS distances. We hypothesize that the high correlation of the evolutionary distances of the DNA repair orthologous enzymes with their GD is a result of the coevolution of the DNA repair enzymes' structures and GDs. Species GDs could be substantially influenced by the function of DNA polymerase I (the bacterial major DNA repair polymerase). This might cause the correlation of species GDs differentiation with evolutionary changes of species DNA polymerase I. Simultaneously, the structures of DNA repair-recombination enzymes might be evolutionarily sensitive and responsive to changes in the structure of their substrate, the DNA (including those that are represented by GD differentiation). We further discuss the rationale and mechanisms of the hypothesized coevolution. We suggest that stress might be an important cause of changes in the repair-recombination genes and the GD and the trigger of the aforementioned coevolution process. Other triggers might be massive horizontal gene transfer and ecological selection.
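Illustrative sketch (not from the paper): the genome signature (GS) mentioned in the abstract is the relative abundance of dinucleotides, commonly written as the ratio of an observed dinucleotide frequency to the product of its two mononucleotide frequencies. The Python below computes that ratio on a single strand and compares two signatures by mean absolute difference. Published formulations often pool the sequence with its reverse complement first; that step is omitted here, and both function names are hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq):
    """Return rho[xy] = f(xy) / (f(x) * f(y)) for all 16 dinucleotides.
    Single-strand version; `seq` must contain at least two bases."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        # A ratio is undefined if either base is absent; report 0.0 then.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def gs_distance(rho_a, rho_b):
    """Mean absolute difference over the 16 dinucleotides."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in rho_a) / len(rho_a)


if __name__ == "__main__":
    a = dinucleotide_relative_abundance("ATGCGCGCATATCGCGGCTA")
    b = dinucleotide_relative_abundance("ATATATTTAAATATTAACGT")
    print(gs_distance(a, b))
```

Values of rho near 1 indicate a dinucleotide that occurs about as often as its base composition predicts; values well above or below 1 indicate over- or under-representation.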
Affiliation(s)
- A Paz
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel
|
41
|
Aagesen L. Direct optimization, affine gap costs, and node stability. Mol Phylogenet Evol 2005; 36:641-53. [PMID: 15935703 DOI: 10.1016/j.ympev.2005.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/27/2004] [Revised: 04/11/2005] [Accepted: 04/12/2005] [Indexed: 11/27/2022]
Abstract
The outcome of a phylogenetic analysis based on DNA sequence data is highly dependent on the homology-assignment step and may vary with alignment parameter costs. Robustness to changes in parameter costs is therefore a desired quality of a data set because the final conclusions will be less dependent on selecting a precise optimal cost set. Here, node stability is explored in relationship to separate versus combined analysis in three different data sets, all including several data partitions. Robustness to changes in cost sets is measured as number of successive changes that can be made in a given cost set before a specific clade is lost. The changes are in all cases base change cost, gap penalties, and adding/removing/changing affine gap costs. When combining data partitions, the number of clades that appear in the entire parameter space is not remarkably increased, in some cases this number even decreased. However, when combining data partitions the trees from cost sets including affine gap costs were always more similar than the trees were from cost sets without affine gap costs. This was not the case when the data partitions were analyzed independently. When data sets were combined approximately 80% of the clades found under cost sets including affine gap costs resisted at least one change to the cost set.
Affiliation(s)
- Lone Aagesen
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024-5192, USA.
|
42
|
Bell-Pedersen D, Cassone VM, Earnest DJ, Golden SS, Hardin PE, Thomas TL, Zoran MJ. Circadian rhythms from multiple oscillators: lessons from diverse organisms. Nat Rev Genet 2005; 6:544-56. [PMID: 15951747 PMCID: PMC2735866 DOI: 10.1038/nrg1633] [Citation(s) in RCA: 1006] [Impact Index Per Article: 50.3] [Indexed: 11/08/2022]
Abstract
The organization of biological activities into daily cycles is universal in organisms as diverse as cyanobacteria, fungi, algae, plants, flies, birds and man. Comparisons of circadian clocks in unicellular and multicellular organisms using molecular genetics and genomics have provided new insights into the mechanisms and complexity of clock systems. Whereas unicellular organisms require stand-alone clocks that can generate 24-hour rhythms for diverse processes, organisms with differentiated tissues can partition clock function to generate and coordinate different rhythms. In both cases, the temporal coordination of a multi-oscillator system is essential for producing robust circadian rhythms of gene expression and biological activity.
Affiliation(s)
- Deborah Bell-Pedersen
- Center for Research on Biological Clocks, Department of Biology, Texas A&M University, College Station, Texas 77843-3258, USA.
|
43
|
Karlin S, Mrázek J, Ma J, Brocchieri L. Predicted highly expressed genes in archaeal genomes. Proc Natl Acad Sci U S A 2005; 102:7303-8. [PMID: 15883368 PMCID: PMC1129124 DOI: 10.1073/pnas.0502313102] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022] Open
Abstract
Based primarily on 16S rRNA sequence comparisons, life has been broadly divided into the three domains of Bacteria, Archaea, and Eukarya. Archaea is further classified into Crenarchaea and Euryarchaea. Archaea generally thrive in extreme environments as assessed by temperature, pH, and salinity. For many prokaryotic organisms, ribosomal proteins (RP), transcription/translation factors, and chaperone genes tend to be highly expressed. A gene is predicted highly expressed (PHX) if its codon usage is rather similar to the average codon usage of at least one of the RP, transcription/translation factors, and chaperone gene classes and deviates strongly from the average gene of the genome. The thermosome (Ths) chaperonin family represents the most salient PHX genes among Archaea. The chaperones Trigger factor and HSP70 have overlapping functions in the folding process, but both of these proteins are lacking in most archaea where they may be substituted by the chaperone prefoldin. Other distinctive PHX proteins of Archaea, absent from Bacteria, include the proliferating cell nuclear antigen PCNA, a replication auxiliary factor responsible for tethering the catalytic unit of DNA polymerase to DNA during high-speed replication, and the acidic RP P0, which helps to initiate mRNA translation at the ribosome. Other PHX genes feature Cell division control protein 48 (Cdc48), whereas the bacterial septation proteins FtsZ and minD are lacking in Crenarchaea. RadA is a major DNA repair and recombination protein of Archaea. Archaeal genomes feature a strong Shine-Dalgarno ribosome-binding motif more pronounced in Euryarchaea compared with Crenarchaea.
Affiliation(s)
- Samuel Karlin
- Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA.
|
44
|
Aagesen L, Petersen G, Seberg O. Sequence length variation, indel costs, and congruence in sensitivity analysis. Cladistics 2005; 21:15-30. [DOI: 10.1111/j.1096-0031.2005.00053.x] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022] Open
|
45
|
A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities. BMC Bioinformatics 2005; 6:49. [PMID: 15757521 PMCID: PMC555736 DOI: 10.1186/1471-2105-6-49] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 11/25/2004] [Accepted: 03/10/2005] [Indexed: 11/15/2022] Open
Abstract
Background: Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. Results: We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. Conclusion: The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
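Illustrative sketch (not from the paper): the abstract describes Z-scores as Monte Carlo methods applied to pair-wise comparisons, that is, the observed alignment score of two sequences is compared with the scores obtained after shuffling one of them. The Python below shows that idea in its simplest form. The scoring scheme (match 2, mismatch -1, linear gap -2), the number of shuffles, and the function names are assumptions for illustration; a real analysis would use a substitution matrix and the authors' own procedure, which this listing does not detail.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a toy scoring scheme."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b.
    Shuffling preserves the composition and length of b."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = local_alignment_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(local_alignment_score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.pstdev(shuffled_scores)
    return (observed - mu) / sigma if sigma > 0 else 0.0


if __name__ == "__main__":
    p = "MKTAYIAKQRQISFVKSHFSRQ"
    q = "MKTAYIAKQRNISFVKSHFARQ"
    print(z_score(p, q))
```

A large positive Z-score means the two sequences align far better than composition alone would explain; pair-wise Z-scores of this kind are the input on which the abstract's tree reconstruction is based.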
|
46
|
O'Malley MA, Boucher Y. Paradigm change in evolutionary microbiology. Stud Hist Philos Biol Biomed Sci 2005; 36:183-208. [PMID: 16120264 DOI: 10.1016/j.shpsc.2004.12.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 04/22/2004] [Revised: 07/19/2004] [Indexed: 05/04/2023]
Abstract
Thomas Kuhn had little to say about scientific change in biological science, and biologists are ambivalent about how applicable his framework is for their disciplines. We apply Kuhn's account of paradigm change to evolutionary microbiology, where key Darwinian tenets are being challenged by two decades of findings from molecular phylogenetics. The chief culprit is lateral gene transfer, which undermines the role of vertical descent and the representation of evolutionary history as a tree of life. To assess Kuhn's relevance to this controversy, we add a social analysis of the scientists involved to the historical and philosophical debates. We conclude that while Kuhn's account may capture aspects of the pattern (or outcome) of an episode of scientific change, he has little to say about how the process of generating new understandings is occurring in evolutionary microbiology. Once Kuhn's application is limited to that of an initial investigative probe into how scientific problem-solving occurs, his disciplinary scope becomes broader.
Affiliation(s)
- Maureen A O'Malley
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 1X5.
|
47
|
Silver MR, Kawauchi H, Nozaki M, Sower SA. Cloning and analysis of the lamprey GnRH-III cDNA from eight species of lamprey representing the three families of Petromyzoniformes. Gen Comp Endocrinol 2004; 139:85-94. [PMID: 15474539 DOI: 10.1016/j.ygcen.2004.07.011] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 04/07/2004] [Revised: 07/16/2004] [Accepted: 07/21/2004] [Indexed: 11/24/2022]
Abstract
The lamprey, which are divided into three families, including the Petromyzonidae, Geotriidae, and Mordaciidae, have been shown to regulate the reproductive axis through a functional hypothalamic-pituitary-gonadal axis. To date, two forms of gonadotropin-releasing hormone (GnRH) have been identified in the sea lamprey (Petromyzon marinus), lamprey GnRH-I (decapeptide and cDNA) and lamprey GnRH-III (decapeptide), both of which have been shown to be expressed in the preoptic-anterior hypothalamic region and both forms have been demonstrated to regulate reproductive function (i.e. steroidogenesis and gametogenesis). The objective of this study was to isolate the cDNA encoding the prepro-lamprey GnRH-III from eight species of lamprey using a PCR based subcloning procedure. A degenerate primer designed to the lamprey GnRH-III decapeptide was used to amplify the 3' end of each transcript, while gene specific primers were used to amplify the 5' ends. Phylogenetic analysis using the prepro-lamprey GnRH-III amino acid sequences was performed, in which the lamprey GnRH-III sequences divided into three groups, supporting the current view of the lamprey lineage at the family level. Finally, a phylogenetic analysis of these newly identified deduced amino acid sequences together with 64 previously described GnRH sequences suggests that the lamprey GnRHs are unique, as they group together separately from the three previously described paralogous lineages of the GnRH family.
Affiliation(s)
- Matthew R Silver
- Department of Biochemistry and Molecular Biology, University of New Hampshire, 46 College Road, Durham 03824, USA
|
48
|
Reimann A, Nurhayati N, Backenköhler A, Ober D. Repeated evolution of the pyrrolizidine alkaloid-mediated defense system in separate angiosperm lineages. Plant Cell 2004; 16:2772-84. [PMID: 15466410 PMCID: PMC520970 DOI: 10.1105/tpc.104.023176] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.4] [Indexed: 05/04/2023]
Abstract
Species of several unrelated families within the angiosperms are able to constitutively produce pyrrolizidine alkaloids as a defense against herbivores. In pyrrolizidine alkaloid (PA) biosynthesis, homospermidine synthase (HSS) catalyzes the first specific step. HSS was recruited during angiosperm evolution from deoxyhypusine synthase (DHS), an enzyme involved in the posttranslational activation of eukaryotic initiation factor 5A. Phylogenetic analysis of 23 cDNA sequences coding for HSS and DHS of various angiosperm species revealed at least four independent recruitments of HSS from DHS: one within the Boraginaceae, one within the monocots, and two within the Asteraceae family. Furthermore, sequence analyses indicated elevated substitution rates within HSS-coding sequences after each gene duplication, with an increased level of nonsynonymous mutations. However, the contradiction between the polyphyletic origin of the first enzyme in PA biosynthesis and the structural identity of the final biosynthetic PA products needs clarification.
Affiliation(s)
- Andreas Reimann
- Institut für Pharmazeutische Biologie der Technischen Universität, 38106 Braunschweig, Germany
|
49
|
Morales ME, Kalinna BH, Heyers O, Mann VH, Schulmeister A, Copeland CS, Loukas A, Brindley PJ. Genomic organization of the Schistosoma mansoni aspartic protease gene, a platyhelminth orthologue of mammalian lysosomal cathepsin D. Gene 2004; 338:99-109. [PMID: 15302411 DOI: 10.1016/j.gene.2004.05.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 03/15/2004] [Revised: 05/06/2004] [Accepted: 05/17/2004] [Indexed: 10/26/2022]
Abstract
Schistosomes are considered the most important of the helminth parasites of humans in terms of morbidity and mortality. Schistosomes employ proteolytic enzymes to digest host hemoglobin from ingested human blood, including a cathepsin D-like aspartic protease that is overexpressed in the gut of the adult female schistosome. Because of its key role in parasite nutrition, this enzyme represents a potential intervention target. To continue exploration of this potential, we have determined the sequence, structure and genomic organization of the cathepsin D gene locus of Schistosoma mansoni. Using the cDNA encoding S. mansoni cathepsin D as a probe, we isolated several positive bacterial artificial chromosomes (BACs) from a BAC library that represents an approximately 8-fold coverage of the schistosome genome. Sequencing of BAC clone 25-J-24 revealed that the cathepsin D gene locus was approximately 13 kb in length, and included seven exons interrupted by six introns. The exons ranged in length from 49 to 294 bp, and the introns from 30 to 5025 bp. The genomic organization of schistosome cathepsin D was similar in sequence, structure and complexity to human cathepsin D, including to a greater or lesser extent the conservation of all six exon/intron boundaries of the schistosome gene. It was less similar to aspartic protease genes of the nematodes Caenorhabditis elegans and Haemonchus contortus, and dissimilar to those of plasmepsins from malarial parasites. Examination of the introns revealed the presence of endogenous mobile genetic elements including SR2, the ASL-associated retrotransposon, and the SINE-like element, SMalpha. Phylogenetically, schistosome cathepsin D appeared to be more closely related to mammalian cathepsin D than to other sub-families of eukaryotic aspartic proteases known from mammals. Taken together, these features indicated that schistosome cathepsin D is a platyhelminth orthologue of mammalian lysosomal cathepsin D.
Affiliation(s)
- Maria E Morales
- Department of Tropical Medicine, School of Public Health and Tropical Medicine, Tulane University Health Sciences Center, New Orleans, LA 70112, USA
|
50
|
Haibin W, Ji Q, Bailin H. Prokaryote phylogeny based on ribosomal proteins and aminoacyl tRNA synthetases by using the compositional distance approach. SCIENCE IN CHINA. SERIES C, LIFE SCIENCES 2004; 47:313-21. [PMID: 15493472 PMCID: PMC7088628 DOI: 10.1360/03yc0137] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Received: 06/26/2003] [Revised: 12/19/2003] [Indexed: 11/11/2022]
Abstract
To show that the newly developed K-string composition distance method for inferring phylogenetic relations of prokaryotes, which is based on counting oligopeptide frequencies, works equally well without whole-proteome data, we used all ribosomal proteins and the set of aminoacyl tRNA synthetases for each species. The latter group is known to yield inconsistent trees when used individually. Our trees are obtained without any sequence alignment. Altogether, 16 Archaea, 105 Bacteria and 2 Eucarya are represented on the tree. Most of the lower branchings agree well with the latest (2003) Outline of the second edition of Bergey's Manual of Systematic Bacteriology, and the trees also suggest some relationships among higher taxa.
Affiliation(s)
- Wei Haibin
- College of Life Sciences, Zhejiang University, 310027 Hangzhou, China
- Hangzhou Branch, Beijing Genomics Institute, Chinese Academy of Sciences, 310008 Hangzhou, China
- Qi Ji
- T-Life Research Center, Fudan University, 200433 Shanghai, China
- Institute of Theoretical Physics, Academia Sinica, 100080 Beijing, China
- Hao Bailin
- Hangzhou Branch, Beijing Genomics Institute, Chinese Academy of Sciences, 310008 Hangzhou, China
- T-Life Research Center, Fudan University, 200433 Shanghai, China
|