1
Avram O, Durmus B, Rakocz N, Corradetti G, An U, Nitalla MG, Rudas Á, Wakatsuki Y, Hirabayashi K, Velaga S, Tiosano L, Corvi F, Verma A, Karamat A, Lindenberg S, Oncel D, Almidani L, Hull V, Fasih-Ahmad S, Esmaeilkhanian H, Wykoff CC, Rahmani E, Arnold CW, Zhou B, Zaitlen N, Gronau I, Sankararaman S, Chiang JN, Sadda SR, Halperin E. SLIViT: a general AI framework for clinical-feature diagnosis from limited 3D biomedical-imaging data. Res Sq 2023:rs.3.rs-3044914. [PMID: 38045283 PMCID: PMC10690310 DOI: 10.21203/rs.3.rs-3044914/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
We present SLIViT, a deep-learning framework that accurately measures disease-related risk factors in volumetric biomedical imaging, such as magnetic resonance imaging (MRI) scans, optical coherence tomography (OCT) scans, and ultrasound videos. To evaluate SLIViT, we applied it to five different datasets of these three different data modalities tackling seven learning tasks (including both classification and regression) and found that it consistently and significantly outperforms domain-specific state-of-the-art models, typically improving performance (ROC AUC or correlation) by 0.1-0.4. Notably, compared to existing approaches, SLIViT can be applied even when only a small number of annotated training samples is available, which is often a constraint in medical applications. When trained on fewer than 700 annotated volumes, SLIViT obtained accuracy comparable to trained clinical specialists while reducing annotation time by a factor of 5,000, demonstrating its utility to automate and expedite ongoing research and other practical clinical scenarios. *Oren Avram and Berkin Durmus equally contributed to this work. **Srinivas R. Sadda and Eran Halperin jointly supervised this study.
2
Lauterbur ME, Cavassim MIA, Gladstein AL, Gower G, Pope NS, Tsambos G, Adrion J, Belsare S, Biddanda A, Caudill V, Cury J, Echevarria I, Haller BC, Hasan AR, Huang X, Iasi LNM, Noskova E, Obsteter J, Pavinato VAC, Pearson A, Peede D, Perez MF, Rodrigues MF, Smith CCR, Spence JP, Teterina A, Tittes S, Unneberg P, Vazquez JM, Waples RK, Wohns AW, Wong Y, Baumdicker F, Cartwright RA, Gorjanc G, Gutenkunst RN, Kelleher J, Kern AD, Ragsdale AP, Ralph PL, Schrider DR, Gronau I. Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 2023; 12:RP84874. [PMID: 37342968 DOI: 10.7554/elife.84874] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/23/2023]
Abstract
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
Affiliation(s)
- M Elise Lauterbur
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Maria Izabel A Cavassim
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Graham Gower
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Nathaniel S Pope
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Georgia Tsambos
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Jeffrey Adrion
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Ancestry DNA, San Francisco, United States
- Saurabh Belsare
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Victoria Caudill
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jean Cury
  - Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique, Orsay, France
- Benjamin C Haller
  - Department of Computational Biology, Cornell University, Ithaca, United States
- Ahmed R Hasan
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Department of Biology, University of Toronto Mississauga, Mississauga, Canada
- Xin Huang
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Ekaterina Noskova
  - Computer Technologies Laboratory, ITMO University, St Petersburg, Russian Federation
- Jana Obsteter
  - Agricultural Institute of Slovenia, Department of Animal Science, Ljubljana, Slovenia
- Alice Pearson
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- David Peede
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, United States
  - Center for Computational Molecular Biology, Brown University, Providence, United States
- Manolo F Perez
  - Department of Genetics and Evolution, Federal University of Sao Carlos, Sao Carlos, Brazil
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jeffrey P Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Anastasia Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Silas Tittes
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Per Unneberg
  - Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Juan Manuel Vazquez
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Ryan K Waples
  - Department of Biostatistics, University of Washington, Seattle, United States
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Franz Baumdicker
  - Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Reed A Cartwright
  - School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, United States
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, United States
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ilan Gronau
  - Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
3
Chavez DE, Gronau I, Hains T, Dikow RB, Frandsen PB, Figueiró HV, Garcez FS, Tchaicka L, de Paula RC, Rodrigues FHG, Jorge RSP, Lima ES, Songsasen N, Johnson WE, Eizirik E, Koepfli KP, Wayne RK. Comparative genomics uncovers the evolutionary history, demography, and molecular adaptations of South American canids. Proc Natl Acad Sci U S A 2022; 119:e2205986119. [PMID: 35969758 PMCID: PMC9407222 DOI: 10.1073/pnas.2205986119] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/11/2022] [Accepted: 06/28/2022] [Indexed: 11/18/2022]
Abstract
The remarkable radiation of South American (SA) canids produced 10 extant species distributed across diverse habitats, including disparate forms such as the short-legged, hypercarnivorous bush dog and the long-legged, largely frugivorous maned wolf. Despite considerable research spanning nearly two centuries, many aspects of their evolutionary history remain unknown. Here, we analyzed 31 whole genomes encompassing all extant SA canid species to assess phylogenetic relationships, interspecific hybridization, historical demography, current genetic diversity, and the molecular bases of adaptations in the bush dog and maned wolf. We found that SA canids originated from a single ancestor that colonized South America 3.9 to 3.5 Mya, followed by diversification east of the Andes and then a single colonization event and radiation of Lycalopex species west of the Andes. We detected extensive historical gene flow between recently diverged lineages and observed distinct patterns of genomic diversity and demographic history in SA canids, likely induced by past climatic cycles compounded by human-induced population declines. Genome-wide scans of selection showed that disparate limb proportions in the bush dog and maned wolf may derive from mutations in genes regulating chondrocyte proliferation and enlargement. Further, frugivory in the maned wolf may have been enabled by variants in genes associated with energy intake from short-chain fatty acids. In contrast, unique genetic variants detected in the bush dog may underlie interdigital webbing and dental adaptations for hypercarnivory. Our analyses shed light on the evolution of a unique carnivoran radiation and how it was shaped by South American topography and climate change.
Affiliation(s)
- Daniel E. Chavez
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
  - Biodesign Institute, School of Life Sciences, Arizona State University, Tempe, AZ 85287
- Ilan Gronau
  - Efi Arazi School of Computer Science, Reichman University, Herzliya 46150, Israel
- Taylor Hains
  - Committee on Evolutionary Biology, University of Chicago, Chicago, IL 60637
- Rebecca B. Dikow
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20560
- Paul B. Frandsen
  - Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20560
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602
- Henrique V. Figueiró
  - Smithsonian’s National Zoo and Conservation Biology Institute, Center for Species Survival, Front Royal, VA 22630
  - School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, 90619-900, Brazil
- Fabrício S. Garcez
  - School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, 90619-900, Brazil
- Ligia Tchaicka
  - Rede de Biodiversidade e Biotecnologia da Amazônia, Curso de Pós-Graduação em Recursos Aquáticos e Pesca, Universidade Estadual do Maranhão, São Luis, 2016-8100, Brazil
- Rogério C. de Paula
  - Centro Nacional de Pesquisa e Conservação de Mamíferos Carnívoros, Instituto Chico Mendes de Conservação da Biodiversidade, 12952-011, Atibaia, Brazil
- Flávio H. G. Rodrigues
  - Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Rodrigo S. P. Jorge
  - Centro Nacional de Avaliação da Biodiversidade e de Pesquisa e Conservação do Cerrado, Instituto Chico Mendes de Conservação da Biodiversidade, Brasilia, 70670-350, Brazil
- Edson S. Lima
  - Private address, Nova Xavantina, MT, 78690-000, Brazil
- Nucharin Songsasen
  - Smithsonian’s National Zoo and Conservation Biology Institute, Center for Species Survival, Front Royal, VA 22630
- Warren E. Johnson
  - Smithsonian’s National Zoo and Conservation Biology Institute, Center for Species Survival, Front Royal, VA 22630
- Eduardo Eizirik
  - School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, 90619-900, Brazil
  - Instituto Pró-Carnívoros, Atibaia, 12945-010, Brazil
  - Instituto Nacional de Ciência e Tecnologia em Ecologia Evolução Conservação da Biodiversidade, Universidade Federal de Goiás, Goiânia, 74690-900, Brazil
- Klaus-Peter Koepfli
  - Smithsonian’s National Zoo and Conservation Biology Institute, Center for Species Survival, Front Royal, VA 22630
  - Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095
4
Leibovich Z, Gronau I. Optimal Design of Synthetic DNA Sequences Without Unwanted Binding Sites. J Comput Biol 2022; 29:974-986. [PMID: 35648072 DOI: 10.1089/cmb.2021.0417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Synthesizing DNA molecules by design has become an essential tool in molecular biology and is expected to become ubiquitous in the coming decade. Successful design of a synthetic DNA molecule often requires satisfying multiple objectives, some of which may conflict with others. One particularly important objective is the elimination of unwanted protein binding sites, which may interfere with the desired function of the synthesized molecule. While most design tools offer this fundamental capability, they do not follow a systematic approach that guarantees elimination of all unwanted sites whenever a feasible solution exists. Furthermore, the algorithms these tools use (when published) are often quite naive and inefficient. We present a formal description of the binding site elimination problem and suggest several efficient algorithms that eliminate unwanted patterns with minimum interference to the desired function of the synthesized sequence. These algorithms are simple, efficient, and flexible and, therefore, can be easily incorporated in all existing DNA design tools, enhancing their design capabilities.
Affiliation(s)
- Zehavit Leibovich
  - Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
- Ilan Gronau
  - Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
5
Adrion JR, Cole CB, Dukler N, Galloway JG, Gladstein AL, Gower G, Kyriazis CC, Ragsdale AP, Tsambos G, Baumdicker F, Carlson J, Cartwright RA, Durvasula A, Gronau I, Kim BY, McKenzie P, Messer PW, Noskova E, Ortega-Del Vecchyo D, Racimo F, Struck TJ, Gravel S, Gutenkunst RN, Lohmueller KE, Ralph PL, Schrider DR, Siepel A, Kelleher J, Kern AD. A community-maintained standard library of population genetic models. eLife 2020; 9:e54967. [PMID: 32573438 PMCID: PMC7438115 DOI: 10.7554/elife.54967] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 01/07/2020] [Accepted: 06/15/2020] [Indexed: 12/18/2022]
Abstract
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Affiliation(s)
- Jeffrey R Adrion
  - Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Christopher B Cole
  - Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom
- Noah Dukler
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jared G Galloway
  - Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Ariella L Gladstein
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Graham Gower
  - Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Christopher C Kyriazis
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Georgia Tsambos
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Franz Baumdicker
  - Department of Mathematical Stochastics, University of Freiburg, Freiburg, Germany
- Jedidiah Carlson
  - Department of Genome Sciences, University of Washington, Seattle, United States
- Reed A Cartwright
  - The Biodesign Institute and The School of Life Sciences, Arizona State University, Tempe, United States
- Arun Durvasula
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Ilan Gronau
  - The Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Herzliya, Israel
- Bernard Y Kim
  - Department of Biology, Stanford University, Stanford, United States
- Patrick McKenzie
  - Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, United States
- Philipp W Messer
  - Department of Computational Biology, Cornell University, Ithaca, United States
- Ekaterina Noskova
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russian Federation
- Diego Ortega-Del Vecchyo
  - International Laboratory for Human Genome Research, National Autonomous University of Mexico, Juriquilla, Mexico
- Fernando Racimo
  - Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Travis J Struck
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, Canada
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
- Peter L Ralph
  - Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, United States
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
  - Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, United States
6
Chavez DE, Gronau I, Hains T, Kliver S, Koepfli KP, Wayne RK. Comparative genomics provides new insights into the remarkable adaptations of the African wild dog (Lycaon pictus). Sci Rep 2019; 9:8329. [PMID: 31171819 PMCID: PMC6554312 DOI: 10.1038/s41598-019-44772-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/07/2019] [Accepted: 05/22/2019] [Indexed: 12/02/2022]
Abstract
Within the Canidae, the African wild dog (Lycaon pictus) is the most specialized with regards to cursorial adaptations (specialized for running), having only four digits on their forefeet. In addition, this species is one of the few canids considered to be an obligate meat-eater, possessing a robust dentition for taking down large prey, and displays one of the most variable coat colorations amongst mammals. Here, we used comparative genomic analysis to investigate the evolutionary history and genetic basis for adaptations associated with cursoriality, hypercarnivory, and coat color variation in African wild dogs. Genome-wide scans revealed unique amino acid deletions that suggest a mode of evolutionary digit loss through expanded apoptosis in the developing first digit. African wild dog-specific signals of positive selection also uncovered a putative mechanism of molar cusp modification through changes in genes associated with the sonic hedgehog (SHH) signaling pathway, required for spatial patterning of teeth, and three genes associated with pigmentation. Divergence time analyses suggest the suite of genomic changes we identified evolved ~1.7 Mya, coinciding with the diversification of large-bodied ungulates. Our results show that comparative genomics is a powerful tool for identifying the genetic basis of evolutionary changes in Canidae.
Affiliation(s)
- Daniel E Chavez
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
- Ilan Gronau
  - Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center (IDC), Herzliya, 46150, Israel
- Taylor Hains
  - Environmental Science and Policy, Johns Hopkins University, Washington, D.C., 20036, USA
- Sergei Kliver
  - Institute of Molecular and Cellular Biology, Novosibirsk, 630090, Russian Federation
- Klaus-Peter Koepfli
  - Smithsonian Conservation Biology Institute, National Zoological Park, Washington, D.C., 20008, USA
  - Theodosius Dobzhansky Center for Genome Bioinformatics, Saint Petersburg State University, Saint Petersburg, 199034, Russian Federation
- Robert K Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
7
Abstract
A response to Hohenlohe et al.
Affiliation(s)
- Bridgett M. vonHoldt
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- James A. Cahill
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY 10065, USA
- Ilan Gronau
  - Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 46510, Israel
- Beth Shapiro
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Jeff Wall
  - Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90094, USA
8
vonHoldt BM, Cahill JA, Fan Z, Gronau I, Robinson J, Pollinger JP, Shapiro B, Wall J, Wayne RK. Whole-genome sequence analysis shows that two endemic species of North American wolf are admixtures of the coyote and gray wolf. Sci Adv 2016; 2:e1501714. [PMID: 29713682 PMCID: PMC5919777 DOI: 10.1126/sciadv.1501714] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 11/25/2015] [Accepted: 06/28/2016] [Indexed: 05/22/2023]
Abstract
Protection of populations comprising admixed genomes is a challenge under the Endangered Species Act (ESA), which is regarded as the most powerful species protection legislation ever passed in the United States but lacks specific provisions for hybrids. The eastern wolf is a newly recognized wolf-like species that is highly admixed and inhabits the Great Lakes and eastern United States, a region previously thought to be included in the geographic range of only the gray wolf. The U.S. Fish and Wildlife Service has argued that the presence of the eastern wolf, rather than the gray wolf, in this area is grounds for removing ESA protection (delisting) from the gray wolf across its geographic range. In contrast, the red wolf from the southeastern United States was one of the first species protected under the ESA and was protected despite admixture with coyotes. We use whole-genome sequence data to demonstrate a lack of unique ancestry in eastern and red wolves that would not be expected if they represented long divergent North American lineages. These results suggest that arguments for delisting the gray wolf are not valid. Our findings demonstrate how a strict designation of a species under the ESA that does not consider admixture can threaten the protection of endangered entities. We argue for a more balanced approach that focuses on the ecological context of admixture and allows for evolutionary processes to potentially restore historical patterns of genetic variation.
Affiliation(s)
- Bridgett M. vonHoldt
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- James A. Cahill
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Zhenxin Fan
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu 610064, People’s Republic of China
- Ilan Gronau
  - Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 46150, Israel
- Jacqueline Robinson
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095–1606, USA
- John P. Pollinger
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095–1606, USA
- Beth Shapiro
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Jeff Wall
  - Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA 90095–1606, USA
9
Freedman AH, Schweizer RM, Ortega-Del Vecchyo D, Han E, Davis BW, Gronau I, Silva PM, Galaverni M, Fan Z, Marx P, Lorente-Galdos B, Ramirez O, Hormozdiari F, Alkan C, Vilà C, Squire K, Geffen E, Kusak J, Boyko AR, Parker HG, Lee C, Tadigotla V, Siepel A, Bustamante CD, Harkins TT, Nelson SF, Marques-Bonet T, Ostrander EA, Wayne RK, Novembre J. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs. PLoS Genet 2016; 12:e1005851. [PMID: 26943675 PMCID: PMC4778760 DOI: 10.1371/journal.pgen.1005851] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 08/26/2015] [Accepted: 01/18/2016] [Indexed: 12/31/2022]
Abstract
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
Affiliation(s)
- Adam H. Freedman
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Rena M. Schweizer
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Diego Ortega-Del Vecchyo
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Eunjung Han
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Brian W. Davis
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Zhenxin Fan
  - Key Laboratory of Bioresources and Ecoenvironment, Sichuan University, Chengdu, China
- Peter Marx
  - Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
- Belen Lorente-Galdos
  - ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Oscar Ramirez
  - ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Farhad Hormozdiari
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Carles Vilà
  - Estación Biológica de Doñana EBD-CSIC, Sevilla, Spain
- Kevin Squire
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Eli Geffen
  - Department of Zoology, Tel Aviv University, Tel Aviv, Israel
- Josip Kusak
  - Department of Biology, University of Zagreb, Zagreb, Croatia
- Adam R. Boyko
  - Department of Biomedical Sciences, Cornell University, Ithaca, New York, United States of America
- Heidi G. Parker
  - ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Clarence Lee
  - Life Technologies, Foster City, California, United States of America
- Vasisht Tadigotla
  - Life Technologies, Foster City, California, United States of America
- Adam Siepel
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Stanley F. Nelson
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Tomas Marques-Bonet
  - ICREA at Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
  - Centro Nacional de Analisis Genomico (CNAG/PCB), Baldiri Reixach 4–8, Barcelona, Spain
- Elaine A. Ostrander
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- John Novembre
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
|
10
|
Kuhlwilm M, Gronau I, Hubisz MJ, de Filippo C, Prado-Martinez J, Kircher M, Fu Q, Burbano HA, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajkovic D, Kucan Ž, Gušic I, Marques-Bonet T, Andrés AM, Viola B, Pääbo S, Meyer M, Siepel A, Castellano S. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature 2016; 530:429-33. [PMID: 26886800 DOI: 10.1038/nature16544] [Citation(s) in RCA: 208] [Impact Index Per Article: 26.0] [Received: 07/28/2015] [Accepted: 12/17/2015] [Indexed: 12/11/2022]
Abstract
It has been shown that Neanderthals contributed genetically to modern humans outside Africa 47,000-65,000 years ago. Here we analyse the genomes of a Neanderthal and a Denisovan from the Altai Mountains in Siberia together with the sequences of chromosome 21 of two Neanderthals from Spain and Croatia. We find that a population that diverged early from other modern humans in Africa contributed genetically to the ancestors of Neanderthals from the Altai Mountains roughly 100,000 years ago. By contrast, we do not detect such a genetic contribution in the Denisovan or the two European Neanderthals. We conclude that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.
Affiliation(s)
- Martin Kuhlwilm
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Ilan Gronau
  - Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center (IDC), Herzliya 46150, Israel
- Melissa J Hubisz
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Cesare de Filippo
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Martin Kircher
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Qiaomei Fu
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
  - Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing 100044, China
- Hernán A Burbano
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Marco de la Rasilla
  - Área de Prehistoria, Departamento de Historia, Universidad de Oviedo, 33011 Oviedo, Spain
- Antonio Rosas
  - Departamento de Paleobiología, Museo Nacional de Ciencias Naturales, CSIC, 28006 Madrid, Spain
- Pavao Rudan
  - Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Dejana Brajkovic
  - Croatian Academy of Sciences and Arts, Institute for Quaternary Paleontology and Geology, 10000 Zagreb, Croatia
- Željko Kucan
  - Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Ivan Gušic
  - Anthropology Center of the Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), 08003 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
  - Centro Nacional de Análisis Genómico (CRG-CNAG), 08028 Barcelona, Spain
- Aida M Andrés
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Bence Viola
  - Department of Anthropology, University of Toronto, Toronto, Ontario M5S 2S2, Canada
  - Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Svante Pääbo
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Matthias Meyer
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Adam Siepel
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sergi Castellano
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
|
11
|
Fan Z, Silva P, Gronau I, Wang S, Armero AS, Schweizer RM, Ramirez O, Pollinger J, Galaverni M, Ortega Del-Vecchyo D, Du L, Zhang W, Zhang Z, Xing J, Vilà C, Marques-Bonet T, Godinho R, Yue B, Wayne RK. Worldwide patterns of genomic variation and admixture in gray wolves. Genome Res 2015; 26:163-73. [PMID: 26680994 PMCID: PMC4728369 DOI: 10.1101/gr.197517.115] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 07/29/2015] [Accepted: 12/15/2015] [Indexed: 12/25/2022]
Abstract
The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog was domesticated in the Old World. However, no single wolf population is more closely related to dogs, supporting the hypothesis that dogs were derived from an extinct wolf population. All extant wolves have a surprisingly recent common ancestry and experienced a dramatic population decline beginning at least ∼30 thousand years ago (kya). We suggest this crisis was related to the colonization of Eurasia by modern human hunter–gatherers, who competed with wolves for limited prey but also domesticated them, leading to a compensatory population expansion of dogs. We found extensive admixture between dogs and wolves, with up to 25% of Eurasian wolf genomes showing signs of dog ancestry. Dogs have influenced the recent history of wolves through admixture and vice versa, potentially enhancing adaptation. Simple scenarios of dog domestication are confounded by admixture, and studies that do not take admixture into account with specific demographic models are problematic.
Affiliation(s)
- Zhenxin Fan
  - Key Laboratory of Bioresources and Ecoenvironment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu 610064, People's Republic of China
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
- Pedro Silva
  - CIBIO-UP, University of Porto, Vairão, 4485-661, Portugal
- Ilan Gronau
  - Efi Arazi School of Computer Science, the Herzliya Interdisciplinary Center (IDC), Herzliya 46150, Israel
- Shuoguo Wang
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
- Rena M Schweizer
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
- Oscar Ramirez
  - ICREA at Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
- John Pollinger
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
- Diego Ortega Del-Vecchyo
  - Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California 90095-1606, USA
- Lianming Du
  - Key Laboratory of Bioresources and Ecoenvironment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu 610064, People's Republic of China
- Wenping Zhang
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, Sichuan Province, People's Republic of China, 610081
- Zhihe Zhang
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, Sichuan Province, People's Republic of China, 610081
- Jinchuan Xing
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
  - Human Genetics Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
- Carles Vilà
  - Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, 08028 Barcelona, Spain
- Tomas Marques-Bonet
  - ICREA at Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
  - Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, 08028 Barcelona, Spain
- Raquel Godinho
  - CIBIO-UP, University of Porto, Vairão, 4485-661, Portugal
- Bisong Yue
  - Key Laboratory of Bioresources and Ecoenvironment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu 610064, People's Republic of China
- Robert K Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA
|
12
|
Campagna L, Gronau I, Silveira LF, Siepel A, Lovette IJ. Distinguishing noise from signal in patterns of genomic divergence in a highly polymorphic avian radiation. Mol Ecol 2015; 24:4238-51. [DOI: 10.1111/mec.13314] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 05/07/2015] [Accepted: 07/03/2015] [Indexed: 01/10/2023]
Affiliation(s)
- Leonardo Campagna
  - Fuller Evolutionary Biology Program; Cornell Laboratory of Ornithology; 159 Sapsucker Woods Road Ithaca NY 14850 USA
  - Department of Ecology and Evolutionary Biology; Cornell University; 215 Tower Road Ithaca NY 14853 USA
- Ilan Gronau
  - Efi Arazi School of Computer Science; Herzliya Interdisciplinary Center (IDC); P.O. Box 167, Kanfei Nesharim St. Herzliya 46150 Israel
- Luís Fábio Silveira
  - Seção de Aves; Museu de Zoologia, Universidade de São Paulo (MZUSP); Caixa Postal 42.494 CEP 04218-970 São Paulo SP Brazil
- Adam Siepel
  - Watson School of Biological Sciences; Simons Center for Quantitative Biology; Cold Spring Harbor Laboratory; One Bungtown Road Cold Spring Harbor NY 11724 USA
- Irby J. Lovette
  - Fuller Evolutionary Biology Program; Cornell Laboratory of Ornithology; 159 Sapsucker Woods Road Ithaca NY 14850 USA
  - Department of Ecology and Evolutionary Biology; Cornell University; 215 Tower Road Ithaca NY 14853 USA
|
13
|
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 173] [Impact Index Per Article: 19.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
14
|
Mohammed J, Bortolamiol-Becet D, Flynt AS, Gronau I, Siepel A, Lai EC. Adaptive evolution of testis-specific, recently evolved, clustered miRNAs in Drosophila. RNA 2014; 20:1195-209. [PMID: 24942624 PMCID: PMC4105746 DOI: 10.1261/rna.044644.114] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/31/2014] [Accepted: 04/11/2014] [Indexed: 05/09/2023]
Abstract
The propensity of animal miRNAs to regulate targets bearing modest complementarity, most notably via pairing with miRNA positions ∼2-8 (the "seed"), is believed to drive major aspects of miRNA evolution. First, minimal targeting requirements have allowed most conserved miRNAs to acquire large target cohorts, thus imposing strong selection on miRNAs to maintain their seed sequences. Second, the modest pairing needed for repression suggests that evolutionarily nascent miRNAs may generally induce net detrimental, rather than beneficial, regulatory effects. Hence, levels and activities of newly emerged miRNAs are expected to be limited to preserve the status quo of gene expression. In this study, we unexpectedly show that Drosophila testes specifically express a substantial miRNA population that contravenes these tenets. We find that multiple genomic clusters of testis-restricted miRNAs harbor recently evolved miRNAs, whose experimentally verified orthologs exhibit divergent sequences, even within seed regions. Moreover, this class of miRNAs exhibits higher expression and greater phenotypic capacities in transgenic misexpression assays than do non-testis-restricted miRNAs of similar evolutionary age. These observations suggest that these testis-restricted miRNAs may be evolving adaptively, and several methods of evolutionary analysis provide strong support for this notion. Consistent with this, proof-of-principle tests show that orthologous miRNAs with divergent seeds can distinguish target sensors in a species-cognate manner. Finally, we observe that testis-restricted miRNA clusters exhibit extraordinary dynamics of miRNA gene flux in other Drosophila species. Altogether, our findings reveal a surprising tissue-directed influence of miRNA evolution, involving a distinct mode of miRNA function connected to adaptive gene regulation in the testis.
Affiliation(s)
- Jaaved Mohammed
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA
- Diane Bortolamiol-Becet
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Alex S Flynt
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Adam Siepel
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Eric C Lai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
|
15
|
Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. Genome-wide inference of ancestral recombination graphs. PLoS Genet 2014; 10:e1004342. [DOI: 10.1371/journal.pgen.1004342]
Abstract
The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
Affiliation(s)
- Matthew D. Rasmussen
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Melissa J. Hubisz
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Adam Siepel
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs, United Kingdom
|
16
|
Freedman AH, Gronau I, Schweizer RM, Ortega-Del Vecchyo D, Han E, Silva PM, Galaverni M, Fan Z, Marx P, Lorente-Galdos B, Beale H, Ramirez O, Hormozdiari F, Alkan C, Vilà C, Squire K, Geffen E, Kusak J, Boyko AR, Parker HG, Lee C, Tadigotla V, Siepel A, Bustamante CD, Harkins TT, Nelson SF, Ostrander EA, Marques-Bonet T, Wayne RK, Novembre J. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet 2014; 10:e1004016. [PMID: 24453982 PMCID: PMC3894170 DOI: 10.1371/journal.pgen.1004016] [Citation(s) in RCA: 323] [Impact Index Per Article: 32.3] [Received: 06/14/2013] [Accepted: 10/28/2013] [Indexed: 11/18/2022] Open
Abstract
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.
The process of dog domestication is still poorly understood, largely because no studies thus far have leveraged deeply sequenced whole genomes from wolves and dogs to simultaneously evaluate support for the proposed source regions: East Asia, the Middle East, and Europe. To investigate dog origins, we sequence three wolf genomes from the putative centers of origin, two basal dog breeds (Basenji and Dingo), and a golden jackal as an outgroup. We find that none of the wolf lineages from the hypothesized domestication centers is supported as the source lineage for dogs, and that dogs and wolves diverged 11,000–16,000 years ago in a process involving extensive admixture and that was followed by a bottleneck in wolves. In addition, we investigate the amylase (AMY2B) gene family expansion in dogs, which has recently been suggested as being critical to domestication in response to increased dietary starch. We find standing variation in AMY2B copy number in wolves and show that some breeds, such as Dingo and Husky, lack the AMY2B expansion. This suggests that, at the beginning of the domestication process, dogs may have been characterized by a more carnivorous diet than their modern-day counterparts, a diet held in common with early hunter-gatherers.
Affiliation(s)
- Adam H. Freedman
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Rena M. Schweizer
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Diego Ortega-Del Vecchyo
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Eunjung Han
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- Zhenxin Fan
  - Key Laboratory of Bioresources and Ecoenvironment, Sichuan University, Chengdu, China
- Peter Marx
  - Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
- Holly Beale
  - National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Oscar Ramirez
  - Institut de Biologia Evolutiva (CSIC-Univ Pompeu Fabra), Barcelona, Spain
- Farhad Hormozdiari
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States of America
- Carles Vilà
  - Estación Biológica de Doñana EBD-CSIC, Sevilla, Spain
- Kevin Squire
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Eli Geffen
  - Department of Zoology, Tel Aviv University, Tel Aviv, Israel
- Adam R. Boyko
  - Department of Veterinary Medicine, Cornell University, Ithaca, New York, United States of America
- Heidi G. Parker
  - National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Clarence Lee
  - Life Technologies, Foster City, California, United States of America
- Vasisht Tadigotla
  - Life Technologies, Foster City, California, United States of America
- Adam Siepel
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Stanley F. Nelson
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, United States of America
- Elaine A. Ostrander
  - National Institutes of Health/NHGRI, Bethesda, Maryland, United States of America
- Tomas Marques-Bonet
  - Institut de Biologia Evolutiva (CSIC-Univ Pompeu Fabra), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA). 08010, Barcelona, Spain
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
- John Novembre
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, United States of America
|
17
|
Arbiza L, Gronau I, Aksoy BA, Hubisz MJ, Gulko B, Keinan A, Siepel A. Genome-wide inference of natural selection on human transcription factor binding sites. Nat Genet 2013; 45:723-9. [PMID: 23749186 DOI: 10.1038/ng.2658] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 01/18/2013] [Accepted: 05/08/2013] [Indexed: 11/09/2022]
Abstract
For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4-6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.
Affiliation(s)
- Leonardo Arbiza
- Department of Biological Statistics & Computational Biology, Cornell University, Ithaca, NY, USA
|
18
|
Gronau I, Arbiza L, Mohammed J, Siepel A. Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Mol Biol Evol 2013; 30:1159-71. [PMID: 23386628 DOI: 10.1093/molbev/mst019] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 12/25/2022] Open
Abstract
Complete genome sequences contain valuable information about natural selection, but this information is difficult to access for short, widely scattered noncoding elements such as transcription factor binding sites or small noncoding RNAs. Here, we introduce a new computational method, called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT), for measuring the influence of natural selection on such elements. INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and coalescent times. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. It obtains information about divergence from multiple outgroup genomes using a general statistical phylogenetic approach. The INSIGHT model is efficiently fitted to genome-wide data using an approximate expectation maximization algorithm. Using simulations, we show that the method can accurately estimate the parameters of interest even in complex demographic scenarios, and that it significantly improves on methods based on summary statistics describing polymorphism and divergence. To demonstrate the usefulness of INSIGHT, we apply it to several classes of human noncoding RNAs and to GATA2-binding sites in the human genome.
Affiliation(s)
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, USA
|
19
|
Doerr D, Gronau I, Moran S, Yavneh I. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. Algorithms Mol Biol 2012; 7:22. [PMID: 22938153 PMCID: PMC3538584 DOI: 10.1186/1748-7188-7-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/23/2011] [Accepted: 06/28/2012] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method. RESULTS: This paper continues the line of research which attempts to adjust to each given set of input sequences a distance function which maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but consider also the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case, and a detailed simulation study on quartet trees. CONCLUSIONS: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
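The model mismatch described in this abstract can be illustrated numerically. The sketch below is not the authors' code and does not reproduce their formal definition of deviation from additivity. It assumes the standard closed-form expected transition and transversion proportions under Kimura's two-parameter model, rescales rates so that a branch length equals the expected number of substitutions per site, and uses hypothetical function names. It shows that the Jukes-Cantor distance recovers the true distance only when there is no transition-transversion bias, and that otherwise it is no longer additive along a path.

```python
import math

def k2p_expected_differences(d, ratio):
    """Expected transition (P) and transversion (Q) proportions under K2P.

    d     : true distance in expected substitutions per site
    ratio : transition/transversion ratio R = alpha / (2 * beta);
            R = 0.5 means no bias (K2P reduces to Jukes-Cantor).
    Rates are scaled so that alpha + 2 * beta = 1 (assumption of this sketch).
    """
    beta = 1.0 / (2.0 * (ratio + 1.0))
    alpha = ratio / (ratio + 1.0)
    P = 0.25 + 0.25 * math.exp(-4 * beta * d) - 0.5 * math.exp(-2 * (alpha + beta) * d)
    Q = 0.5 - 0.5 * math.exp(-4 * beta * d)
    return P, Q

def jukes_cantor_distance(p):
    """Jukes-Cantor distance from a proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_on_k2p(d, ratio):
    """Jukes-Cantor distance computed on (infinitely long) K2P-generated data."""
    P, Q = k2p_expected_differences(d, ratio)
    return jukes_cantor_distance(P + Q)

if __name__ == "__main__":
    d1, d2, R = 0.3, 0.5, 2.0
    whole = jc_on_k2p(d1 + d2, R)
    parts = jc_on_k2p(d1, R) + jc_on_k2p(d2, R)
    print("JC distance over the whole path:", whole)
    print("sum of JC distances over the two segments:", parts)
```

With a biased ratio, the distance over the whole path comes out smaller than the sum over its two segments, which is one simple way to see the loss of additivity.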
Affiliation(s)
- Daniel Doerr
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
- Shlomo Moran
- Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
- Irad Yavneh
- Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
|
20
|
Choi SC, Rasmussen MD, Hubisz MJ, Gronau I, Stanhope MJ, Siepel A. Replacing and additive horizontal gene transfer in Streptococcus. Mol Biol Evol 2012; 29:3309-20. [PMID: 22617954 DOI: 10.1093/molbev/mss138] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022] Open
Abstract
The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage ("replacing HGT") and events that result in the addition of substantial new genomic material ("additive HGT"). Herein, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY-SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY-SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seem to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes.
Affiliation(s)
- Sang Chul Choi
- Department of Biological Statistics and Computational Biology, Cornell University
|
21
|
Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nat Genet 2011; 43:1031-4. [PMID: 21926973 PMCID: PMC3245873 DOI: 10.1038/ng.937] [Citation(s) in RCA: 369] [Impact Index Per Article: 28.4] [Received: 02/22/2011] [Accepted: 08/16/2011] [Indexed: 11/26/2022]
Abstract
Besides their value for biomedicine, individual genome sequences are a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters from sequences for six individuals from diverse human populations. We use a Bayesian, coalescent-based approach to extract information about ancestral population sizes, divergence times, and migration rates from inferred genealogies at many neutrally evolving loci from across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San of Southern Africa diverged from other human populations 108–157 thousand years ago (kya), that Eurasians diverged from an ancestral African population 38–64 kya, and that the effective population size of the ancestors of all modern humans was ~9,000.
Affiliation(s)
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
22
|
Abstract
Distance-based phylogenetic reconstruction methods use the evolutionary distances between species in order to reconstruct the tree spanning them. The evolutionary distance between two species, which is computed from their DNA (or protein) sequences, is typically considered as a fixed function of these sequences, predetermined by the assumed model of evolution. This article continues the line of research that attempts to adjust to each given set of input sequences a distance function which maximizes the expected accuracy of the reconstructed tree. Specifically, we present methods for selecting distance functions that considerably improve the accuracy of quartets constructed by the four-point method in Kimura's 2-parameter model, where special emphasis is given to the case of non-homogenous quartets.
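For readers unfamiliar with the four-point method mentioned above, here is a minimal sketch of the standard rule: among the three ways of pairing four taxa, choose the pairing with the smallest sum of within-pair distances. The function name and the dictionary-of-frozensets input format are assumptions made for this illustration; the distance-function selection studied in the article is not reproduced here.

```python
def four_point_method(dist, taxa):
    """Return the quartet split ((x, y), (z, w)) with the minimal pair-sum.

    dist : dict mapping frozenset({x, y}) -> estimated distance between x and y
    taxa : sequence of exactly four taxon labels
    """
    a, b, c, d = taxa

    def D(x, y):
        return dist[frozenset((x, y))]

    # The three possible pairings of four taxa and their within-pair sums.
    pair_sums = {
        ((a, b), (c, d)): D(a, b) + D(c, d),
        ((a, c), (b, d)): D(a, c) + D(b, d),
        ((a, d), (b, c)): D(a, d) + D(b, c),
    }
    return min(pair_sums, key=pair_sums.get)

if __name__ == "__main__":
    # Additive distances on the quartet ab|cd with all edge lengths equal to 1.
    dist = {
        frozenset(("a", "b")): 2.0, frozenset(("c", "d")): 2.0,
        frozenset(("a", "c")): 3.0, frozenset(("a", "d")): 3.0,
        frozenset(("b", "c")): 3.0, frozenset(("b", "d")): 3.0,
    }
    print(four_point_method(dist, ["a", "b", "c", "d"]))
```

With exact additive distances the rule always recovers the true split; the accuracy question addressed in the abstract arises because the input distances are noisy estimates.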
Affiliation(s)
- Ilan Gronau
- Department of Computer Science, Technion, Haifa, Israel.
|
23
|
Linshiz G, Yehezkel TB, Kaplan S, Gronau I, Ravid S, Adar R, Shapiro E. Recursive construction of perfect DNA molecules from imperfect oligonucleotides. Mol Syst Biol 2008; 4:191. [PMID: 18463615 PMCID: PMC2424292 DOI: 10.1038/msb.2008.26] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 01/08/2008] [Accepted: 03/13/2008] [Indexed: 11/24/2022] Open
Abstract
Making faultless complex objects from potentially faulty building blocks is a fundamental challenge in computer engineering, nanotechnology and synthetic biology. Here, we show for the first time how recursion can be used to address this challenge and demonstrate a recursive procedure that constructs error-free DNA molecules and their libraries from error-prone oligonucleotides. Divide and Conquer (D&C), the quintessential recursive problem-solving technique, is applied in silico to divide the target DNA sequence into overlapping oligonucleotides short enough to be synthesized directly, albeit with errors; error-prone oligonucleotides are recursively combined in vitro, forming error-prone DNA molecules; error-free fragments of these molecules are then identified, extracted and used as new, typically longer and more accurate, inputs to another iteration of the recursive construction procedure; the entire process repeats until an error-free target molecule is formed. Our recursive construction procedure surpasses existing methods for de novo DNA synthesis in speed, precision, amenability to automation, ease of combining synthetic and natural DNA fragments, and ability to construct designer DNA libraries. It thus provides a novel and robust foundation for the design and construction of synthetic biological molecules and organisms.
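The in silico division step described above can be sketched as a simple recursion. This is only an illustration under stated assumptions: the function name, the maximum oligonucleotide length, and the overlap length are hypothetical parameters chosen for the example, and the sketch ignores the sequence-design constraints and the in vitro error-correction steps of the actual procedure.

```python
def divide_into_oligos(target, max_len=60, overlap=20):
    """Recursively split `target` into overlapping fragments of length <= max_len.

    max_len and overlap are illustrative values, not taken from the paper.
    Adjacent fragments share 2 * (overlap // 2) characters.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    if len(target) <= max_len:
        return [target]  # short enough to be synthesized directly
    mid = len(target) // 2
    half = overlap // 2
    left = target[:mid + half]    # extends `half` bases past the midpoint
    right = target[mid - half:]   # starts `half` bases before the midpoint
    return (divide_into_oligos(left, max_len, overlap)
            + divide_into_oligos(right, max_len, overlap))

if __name__ == "__main__":
    # Arbitrary deterministic test sequence, not a real gene.
    target = "".join("ACGT"[(i * i + i // 3) % 4] for i in range(200))
    fragments = divide_into_oligos(target)
    print(len(fragments), [len(f) for f in fragments])
```

The recursion mirrors the Divide and Conquer structure: each split yields two shorter overlapping targets, and the leaves are the pieces short enough for direct synthesis.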
Affiliation(s)
- Gregory Linshiz
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
|
24
|
|
25
|
Abstract
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou and Nei's NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the l∞ (infinity) norm. A byproduct of our work is a novel technique which yields a time-optimal O(n²) implementation of common clustering algorithms such as UPGMA.
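The idea of characterizing a tree by distances to LCAs can be illustrated with a small sketch. For an additive tree metric rooted at r, the depth of the LCA of leaves i and j is (d(r,i) + d(r,j) - d(i,j)) / 2, a quantity often called the Farris transform. That this formula and the "join the pair with the deepest LCA" rule match the article's exact formulation is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations

def lca_depth(dist, root, i, j):
    """Distance from `root` to the LCA of i and j, from pairwise distances.

    dist : dict mapping frozenset({x, y}) -> distance between x and y
    """
    def D(x, y):
        return dist[frozenset((x, y))]
    return (D(root, i) + D(root, j) - D(i, j)) / 2.0

def deepest_lca_pair(dist, root, leaves):
    """Return the pair of leaves whose LCA is farthest from the root."""
    return max(combinations(leaves, 2),
               key=lambda pair: lca_depth(dist, root, pair[0], pair[1]))

if __name__ == "__main__":
    # Rooted tree: a and b hang (edge length 1 each) from an internal node at
    # depth 2; c hangs directly from the root with edge length 3.
    dist = {
        frozenset(("r", "a")): 3.0, frozenset(("r", "b")): 3.0,
        frozenset(("r", "c")): 3.0, frozenset(("a", "b")): 2.0,
        frozenset(("a", "c")): 6.0, frozenset(("b", "c")): 6.0,
    }
    print(lca_depth(dist, "r", "a", "b"))
    print(deepest_lca_pair(dist, "r", ["a", "b", "c"]))
```

Because LCA depths behave like an ultrametric similarity, pair selection of this kind lets ultrametric clustering ideas be applied to trees that are not themselves ultrametric.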
Affiliation(s)
- Ilan Gronau
- Department of Computer Science, Technion, Haifa, Israel
|