501
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol 2015; 33:623-30. [DOI: 10.1038/nbt.3238] [Citation(s) in RCA: 687] [Impact Index Per Article: 76.3] [Received: 08/14/2014] [Accepted: 04/08/2015] [Indexed: 02/07/2023]
502
Abstract
The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them, as well as the challenges facing current sequencing platforms and their clinical application.
Affiliation(s)
- Jason A Reuter
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Damek V Spacek
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
503
Abstract
Human genomes are diploid and, for their complete description and interpretation, it is necessary not only to discover the variation they contain but also to arrange it onto chromosomal haplotypes. Although whole-genome sequencing is becoming increasingly routine, nearly all such individual genomes are mostly unresolved with respect to haplotype, particularly for rare alleles, which remain poorly resolved by inferential methods. Here, we review emerging technologies for experimentally resolving (that is, 'phasing') haplotypes across individual whole-genome sequences. We also discuss computational methods relevant to their implementation, metrics for assessing their accuracy and completeness, and the relevance of haplotype information to applications of genome sequencing in research and clinical medicine.
504
Flot JF, Marie-Nelly H, Koszul R. Contact genomics: scaffolding and phasing (meta)genomes using chromosome 3D physical signatures. FEBS Lett 2015; 589:2966-74. [PMID: 25935414 DOI: 10.1016/j.febslet.2015.04.034] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 03/19/2015] [Revised: 04/17/2015] [Accepted: 04/17/2015] [Indexed: 12/12/2022]
Abstract
High-throughput DNA sequencing technologies are fuelling an accelerating trend to assemble de novo or resequence the genomes of numerous species as well as to complete unfinished assemblies. While current DNA sequencing technologies remain limited to reading stretches of a few hundreds or thousands of base pairs, experimental and computational methods are continuously improving with the goal of assembling entire genomes from large numbers of short DNA sequences. However, the algorithms that piece together DNA strands face important limitations due, notably, to the presence of repeated sequences or of multiple haplotypes within one genome, thus leaving many assemblies incomplete. Recently, the realization that the physical contacts experienced by a portion of a DNA molecule could be used as a robust and quantitative assay to determine its genomic position has led to the emerging field of contact genomics, which promises to revolutionize current genome assembly approaches by exploiting the flexible polymer properties of chromosomes. Here we review the current applications of contact genomics to genome scaffolding, haplotyping and metagenomic assembly, then outline the future developments we envision.
Affiliation(s)
- Jean-François Flot
- Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
- Hervé Marie-Nelly
- Institut Pasteur, Department of Genomes and Genetics, Groupe Régulation Spatiale des Génomes, 75015 Paris, France; CNRS, UMR 3525, 75015 Paris, France.
- Romain Koszul
- Institut Pasteur, Department of Genomes and Genetics, Groupe Régulation Spatiale des Génomes, 75015 Paris, France; CNRS, UMR 3525, 75015 Paris, France.
505
Kremkow BG, Baik JY, MacDonald ML, Lee KH. CHOgenome.org 2.0: Genome resources and website updates. Biotechnol J 2015; 10:931-8. [DOI: 10.1002/biot.201400646] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/19/2015] [Revised: 02/20/2015] [Accepted: 04/01/2015] [Indexed: 12/18/2022]
506
Kehr B, Melsted P, Halldórsson BV. PopIns: population-scale detection of novel sequence insertions. Bioinformatics 2015; 32:961-7. [PMID: 25926346 DOI: 10.1093/bioinformatics/btv273] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 03/18/2015] [Accepted: 04/22/2015] [Indexed: 01/05/2023]
Abstract
MOTIVATION: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions.
RESULTS: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.
AVAILABILITY AND IMPLEMENTATION: The source code of PopIns is available from http://github.com/bkehr/popins
CONTACT: birte.kehr@decode.is
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Birte Kehr
- deCODE genetics/Amgen, Reykjavík, Iceland
- Páll Melsted
- deCODE genetics/Amgen, Reykjavík, Iceland; Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
- Bjarni V Halldórsson
- deCODE genetics/Amgen, Reykjavík, Iceland; Institute of Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland
507
Madoui MA, Engelen S, Cruaud C, Belser C, Bertrand L, Alberti A, Lemainque A, Wincker P, Aury JM. Genome assembly using Nanopore-guided long and error-free DNA reads. BMC Genomics 2015; 16:327. [PMID: 25927464 PMCID: PMC4460631 DOI: 10.1186/s12864-015-1519-z] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Received: 01/15/2015] [Accepted: 04/10/2015] [Indexed: 11/10/2022]
Abstract
Background: Long-read sequencing technologies were launched a few years ago and, in contrast with short-read sequencing technologies, offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.
Results: The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging, as existing assemblers were not designed to deal with long reads exhibiting error rates close to 30%. Here, we presented a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.
Conclusions: We described the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads, respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time, even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
Affiliation(s)
- Mohammed-Amin Madoui
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Stefan Engelen
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Corinne Cruaud
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Caroline Belser
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Laurie Bertrand
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Adriana Alberti
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Arnaud Lemainque
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
- Patrick Wincker
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France; Université d'Evry Val d'Essonne, UMR 8030, CP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, 91057, Evry, France.
- Jean-Marc Aury
- Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, BP5706, 91057, Evry, France.
508
Risca VI, Greenleaf WJ. Unraveling the 3D genome: genomics tools for multiscale exploration. Trends Genet 2015; 31:357-72. [PMID: 25887733 DOI: 10.1016/j.tig.2015.03.010] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 02/05/2015] [Revised: 03/16/2015] [Accepted: 03/24/2015] [Indexed: 12/15/2022]
Abstract
A decade of rapid method development has begun to yield exciting insights into the 3D architecture of the metazoan genome and the roles it may play in regulating transcription. Here we review core methods and new tools in the modern genomicist's toolbox at three length scales, ranging from single base pairs to megabase-scale chromosomal domains, and discuss the emerging picture of the 3D genome that these tools have revealed. Blind spots remain, especially at intermediate length scales spanning a few nucleosomes, but thanks in part to new technologies that permit targeted alteration of chromatin states and time-resolved studies, the next decade holds great promise for hypothesis-driven research into the mechanisms that drive genome architecture and transcriptional regulation.
Affiliation(s)
- Viviana I Risca
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- William J Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
509
Parks MM, Lawrence CE, Raphael BJ. Detecting non-allelic homologous recombination from high-throughput sequencing data. Genome Biol 2015; 16:72. [PMID: 25886137 PMCID: PMC4425883 DOI: 10.1186/s13059-015-0633-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/23/2015] [Accepted: 03/16/2015] [Indexed: 12/27/2022]
Abstract
Non-allelic homologous recombination (NAHR) is a common mechanism for generating genome rearrangements and is implicated in numerous genetic disorders, but its detection in high-throughput sequencing data poses a serious challenge. We present a probabilistic model of NAHR and demonstrate its ability to find NAHR in low-coverage sequencing data from 44 individuals. We identify NAHR-mediated deletions or duplications in 109 of 324 potential NAHR loci in at least one of the individuals. These calls segregate by ancestry, are more common in closely spaced repeats, often result in duplicated genes or pseudogenes, and affect highly studied genes such as GBA and CYP2E1.
Affiliation(s)
- Matthew M Parks
- Division of Applied Mathematics, Brown University, Providence, USA.
- Charles E Lawrence
- Division of Applied Mathematics, Brown University, Providence, USA; Center for Computational Molecular Biology, Brown University, Providence, USA.
- Benjamin J Raphael
- Center for Computational Molecular Biology, Brown University, Providence, USA; Department of Computer Science, Brown University, Providence, USA.
510
Fungtammasan A, Ananda G, Hile SE, Su MSW, Sun C, Harris R, Medvedev P, Eckert K, Makova KD. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res 2015; 25:736-49. [PMID: 25823460 PMCID: PMC4417121 DOI: 10.1101/gr.185892.114] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 10/15/2014] [Accepted: 03/16/2015] [Indexed: 11/24/2022]
Abstract
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution.
Affiliation(s)
- Arkarachai Fungtammasan
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Guruprasad Ananda
- Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA
- Suzanne E Hile
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Marcia Shu-Wei Su
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Chen Sun
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Robert Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA; Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Kristin Eckert
- Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, Pennsylvania State University College of Medicine, Hershey, Pennsylvania 17033, USA
- Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; Center for Medical Genomics, Pennsylvania State University, University Park, Pennsylvania 16802, USA; The Genome Science Institute at the Huck Institutes of Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA
511
Henikoff JG, Thakur J, Kasinathan S, Henikoff S. A unique chromatin complex occupies young α-satellite arrays of human centromeres. Sci Adv 2015; 1:e1400234. [PMID: 25927077 PMCID: PMC4410388 DOI: 10.1126/sciadv.1400234] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 05/03/2023]
Abstract
The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100-base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C-containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes.
Affiliation(s)
- Jorja G. Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Jitendra Thakur
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Sivakanthan Kasinathan
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Medical Scientist Training Program, University of Washington School of Medicine, Seattle, WA 98195, USA
- Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
- Corresponding author.
512
Carlson KD, Sudmant PH, Press MO, Eichler EE, Shendure J, Queitsch C. MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals. Genome Res 2015; 25:750-61. [PMID: 25659649 PMCID: PMC4417122 DOI: 10.1101/gr.182212.114] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 07/25/2014] [Accepted: 02/05/2015] [Indexed: 12/21/2022]
Abstract
Short tandem repeats (STRs) are highly mutable genetic elements that often reside in regulatory and coding DNA. The cumulative evidence of genetic studies on individual STRs suggests that STR variation profoundly affects phenotype and contributes to trait heritability. Despite recent advances in sequencing technology, STR variation has remained largely inaccessible across many individuals compared to single nucleotide variation or copy number variation. STR genotyping with short-read sequence data is confounded by (1) the difficulty of uniquely mapping short, low-complexity reads; and (2) the high rate of STR amplification stutter. Here, we present MIPSTR, a robust, scalable, and affordable method that addresses these challenges. MIPSTR uses targeted capture of STR loci by single-molecule Molecular Inversion Probes (smMIPs) and a unique mapping strategy. Targeted capture and our mapping strategy resolve the first challenge; the use of single molecule information resolves the second challenge. Unlike previous methods, MIPSTR is capable of distinguishing technical error due to amplification stutter from somatic STR mutations. In proof-of-principle experiments, we use MIPSTR to determine germline STR genotypes for 102 STR loci with high accuracy across diverse populations of the plant A. thaliana. We show that putatively functional STRs may be identified by deviation from predicted STR variation and by association with quantitative phenotypes. Using DNA mixing experiments and a mutant deficient in DNA repair, we demonstrate that MIPSTR can detect low-frequency somatic STR variants. MIPSTR is applicable to any organism with a high-quality reference genome and is scalable to genotyping many thousands of STR loci in thousands of individuals.
Affiliation(s)
- Keisha D Carlson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Peter H Sudmant
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
513
Affiliation(s)
- Alan C Ward
- Department of Microbiology, Chung Ang University, College of Medicine, Seoul 06974, Korea
- School of Biology, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Wonyong Kim
- Department of Microbiology, Chung Ang University, College of Medicine, Seoul 06974, Korea
514
Yoder AD, Larsen PA. The molecular evolutionary dynamics of the vomeronasal receptor (class 1) genes in primates: a gene family on the verge of a functional breakdown. Front Neuroanat 2014; 8:153. [PMID: 25565978 PMCID: PMC4264469 DOI: 10.3389/fnana.2014.00153] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 10/16/2014] [Accepted: 11/25/2014] [Indexed: 01/24/2023]
Abstract
Olfaction plays a critical role in both survival of the individual and in the propagation of species. Studies from across the mammalian clade have found a remarkable correlation between organismal lifestyle and molecular evolutionary properties of receptor genes in both the main olfactory system (MOS) and the vomeronasal system (VNS). When a large proportion of intact (and putatively functional) copies is observed, the inference is made that a particular mode of chemoreception is critical for an organism’s fit to its environment and is thus under strong positive selection. Conversely, when the receptors in question show a disproportionately large number of pseudogene copies, this contraction is interpreted as evidence of relaxed selection potentially leading to gene family extinction. Notably, it appears that a risk factor for gene family extinction is a high rate of nonsynonymous substitution. A survey of intact vs. pseudogene copies among primate vomeronasal receptor Class one genes (V1Rs) appears to substantiate this hypothesis. Molecular evolutionary complexities in the V1R gene family combine rapid rates of gene duplication, gene conversion, lineage-specific expansions, deletions, and/or pseudogenization. An intricate mix of phylogenetic footprints and current adaptive landscapes has left its mark on primate V1Rs, suggesting that the primate clade offers an ideal model system for exploring the molecular evolutionary and functional properties of the VNS of mammals. Primate V1Rs tell a story of ancestral function and divergent selection as species have moved into ever diversifying adaptive regimes. The sensitivity to functional collapse in these genes, consequent to their precariously high rates of nonsynonymous substitution, confers a remarkable capacity to reveal the lifestyles of the genomes that they presently occupy as well as those of their ancestors.
Affiliation(s)
- Anne D Yoder
- Department of Biology, Duke University, Durham, NC, USA