1
|
Noh HJ, Turner-Maier J, Schulberg SA, Fitzgerald ML, Johnson J, Allen KN, Hückstädt LA, Batten AJ, Alfoldi J, Costa DP, Karlsson EK, Zapol WM, Buys ES, Lindblad-Toh K, Hindle AG. The Antarctic Weddell seal genome reveals evidence of selection on cardiovascular phenotype and lipid handling. Commun Biol 2022; 5:140. [PMID: 35177770 PMCID: PMC8854659 DOI: 10.1038/s42003-022-03089-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/10/2020] [Accepted: 01/31/2022] [Indexed: 12/24/2022] Open
Abstract
The Weddell seal (Leptonychotes weddellii) thrives in its extreme Antarctic environment. We generated the Weddell seal genome assembly and a high-quality annotation to investigate genome-wide evolutionary pressures that underlie its phenotype and to study genes implicated in hypoxia tolerance and a lipid-based metabolism. Genome-wide analyses included gene family expansion/contraction, positive selection, and diverged sequence (acceleration) compared to other placental mammals, identifying selection in coding and non-coding sequence in five pathways that may shape cardiovascular phenotype. Lipid metabolism as well as hypoxia genes contained more accelerated regions in the Weddell seal compared to genomic background. Top-significant genes were SUMO2 and EP300; both regulate hypoxia inducible factor signaling. Liver expression of four genes with the strongest acceleration signals differs between Weddell seals and a terrestrial mammal, sheep. We also report a high-density lipoprotein-like particle in Weddell seal serum not present in other mammals, including the shallow-diving harbor seal.
|
2
|
Xu Z, Dixon JR. Genome reconstruction and haplotype phasing using chromosome conformation capture methodologies. Brief Funct Genomics 2021; 19:139-150. [PMID: 31875884 DOI: 10.1093/bfgp/elz026] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 12/22/2022] Open
Abstract
Genomic analysis of individuals or organisms is predicated on the availability of high-quality reference and genotype information. With the rapidly dropping costs of high-throughput DNA sequencing, this is becoming readily available for diverse organisms and for increasingly large populations of individuals. Despite these advances, there are still aspects of genome sequencing that remain challenging for existing sequencing methods. This includes the generation of long-range contiguity during genome assembly, identification of structural variants in both germline and somatic tissues, the phasing of haplotypes in diploid organisms and the resolution of genome sequence for organisms derived from complex samples. These types of information are valuable for understanding the role of genome sequence and genetic variation on genome function, and numerous approaches have been developed to address them. Recently, chromosome conformation capture (3C) experiments, such as the Hi-C assay, have emerged as powerful tools to aid in these challenges for genome reconstruction. We will review the current use of Hi-C as a tool for aiding in genome sequencing, addressing the applications, strengths, limitations and potential future directions for the use of 3C data in genome analysis. We argue that unique features of Hi-C experiments make this data type a powerful tool to address challenges in genome sequencing, and that future integration of Hi-C data with alternative sequencing assays will facilitate the continuing revolution in genomic analysis and genome sequencing.
|
3
|
Han X, Yan G, Ma Y, Miao W, Wang G. Sequencing and characterization of the macronuclear rDNA minichromosome of the protozoan Tetrahymena pyriformis. Int J Biol Macromol 2020; 147:576-581. [PMID: 31931068 DOI: 10.1016/j.ijbiomac.2020.01.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2019] [Revised: 12/06/2019] [Accepted: 01/06/2020] [Indexed: 10/25/2022]
Abstract
Tetrahymena ribosomal DNA (rDNA) is an ideal system for studying eukaryotic DNA replication and gene transcription. In this study, we developed a new method to isolate rDNA from Tetrahymena cells and used it to sequence and annotate the complete 19,670 bp macronuclear rDNA minichromosome of Tetrahymena pyriformis, a species that lacks the germ-line micronucleus and is unable to undergo sexual reproduction. The key features of T. pyriformis and Tetrahymena thermophila rDNA sequences were then compared. Our results showed (i) the short inverted repeats (M repeats) essential for formation of rDNA minichromosome palindromic structure during sexual reproduction in Tetrahymena are highly conserved in T. pyriformis; (ii) in contrast to T. thermophila, which has two tandem domains that coordinately regulate rDNA replication, T. pyriformis has only a single domain; (iii) the 35S pre-rRNA precursor has 80.25% similarity between the two species; and (iv) the G + C content is higher in the transcribed region than the non-transcribed region in both species, but the GC-skew is more stable in T. pyriformis. The new isolation method and annotated information for the T. pyriformis rDNA minichromosome will provide a useful resource for studying DNA replication and chromosome copy number control in Tetrahymena.
Affiliation(s)
- Xiaojie Han
- College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China; Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guanxiong Yan
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Ma
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Miao
- College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China; Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; CAS Center for Excellence in Animal Evolution and Genetics, Kunming 650223, China
| | - Guangying Wang
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
| |
|
4
|
Yang X, Yang Y, Ling J, Guan J, Guo X, Dong D, Jin L, Huang S, Liu J, Li G. A high-throughput BAC end analysis protocol (BAC-anchor) for profiling genome assembly and physical mapping. Plant Biotechnology Journal 2020; 18:364-372. [PMID: 31254434 PMCID: PMC6953197 DOI: 10.1111/pbi.13203] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/13/2019] [Revised: 06/20/2019] [Accepted: 06/25/2019] [Indexed: 06/09/2023]
Abstract
Traditional approaches for sequencing insertion ends of bacterial artificial chromosome (BAC) libraries are laborious and expensive, which are currently some of the bottlenecks limiting a better understanding of the genomic features of auto- or allopolyploid species. Here, we developed a highly efficient and low-cost BAC end analysis protocol, named BAC-anchor, to identify paired-end reads containing large internal gaps. Our approach mainly focused on the identification of high-throughput sequencing reads carrying restriction enzyme cutting sites and searching for large internal gaps based on the mapping locations of both ends of the reads. We sequenced and analysed eight libraries containing over 3 200 000 BAC end clones derived from the BAC library of the tetraploid potato cultivar C88 digested with two restriction enzymes, Cla I and Mlu I. About 25% of the BAC end reads carrying cutting sites generated a 60-100 kb internal gap in the potato DM reference genome, which was consistent with the mapping results of Sanger sequencing of the BAC end clones and indicated large differences between autotetraploid and haploid genotypes in potato. A total of 5341 Cla I- and 165 Mlu I-derived unique reads were distributed on different chromosomes of the DM reference genome and could be used to establish a physical map of target regions and assemble the C88 genome. The reads that matched different chromosomes are especially significant for the further assembly of complex polyploid genomes. Our study provides an example of analysing high-coverage BAC end libraries with low sequencing cost and is a resource for further genome sequencing studies.
Affiliation(s)
- Xiaohui Yang
- Vegetable and Flower Research Institute of Shandong Academy of Agricultural Sciences, Molecular Biology Key Laboratory of Shandong Facility Vegetable, National Vegetable Improvement Center Shandong Sub‐Center, Huang‐Huai‐Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Tuber and Root Crop, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Yu Yang
- Vegetable and Flower Research Institute of Shandong Academy of Agricultural Sciences, Molecular Biology Key Laboratory of Shandong Facility Vegetable, National Vegetable Improvement Center Shandong Sub‐Center, Huang‐Huai‐Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
| | - Jian Ling
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Tuber and Root Crop, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Jiantao Guan
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, National Key Facility for Crop Resources and Genetic Improvement, Beijing, China
| | - Xiao Guo
- Vegetable and Flower Research Institute of Shandong Academy of Agricultural Sciences, Molecular Biology Key Laboratory of Shandong Facility Vegetable, National Vegetable Improvement Center Shandong Sub‐Center, Huang‐Huai‐Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
| | - Daofeng Dong
- Vegetable and Flower Research Institute of Shandong Academy of Agricultural Sciences, Molecular Biology Key Laboratory of Shandong Facility Vegetable, National Vegetable Improvement Center Shandong Sub‐Center, Huang‐Huai‐Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
| | - Liping Jin
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Tuber and Root Crop, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Sanwen Huang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Tuber and Root Crop, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Jun Liu
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, National Key Facility for Crop Resources and Genetic Improvement, Beijing, China
| | - Guangcun Li
- Vegetable and Flower Research Institute of Shandong Academy of Agricultural Sciences, Molecular Biology Key Laboratory of Shandong Facility Vegetable, National Vegetable Improvement Center Shandong Sub‐Center, Huang‐Huai‐Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Tuber and Root Crop, Ministry of Agriculture and Rural Affairs, Beijing, China
| |
|
5
|
Dai Z, Li T, Li J, Han Z, Pan Y, Tang S, Diao X, Luo M. High-throughput long paired-end sequencing of a Fosmid library by PacBio. Plant Methods 2019; 15:142. [PMID: 31788019 PMCID: PMC6878638 DOI: 10.1186/s13007-019-0525-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/22/2019] [Accepted: 11/12/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND: Large insert paired-end sequencing technologies are important tools for assembling genomes, delineating associated breakpoints and detecting structural rearrangements. To facilitate the comprehensive detection of inter- and intra-chromosomal structural rearrangements or variants (SVs) and complex genome assembly with long repeats and segmental duplications, we developed a new method based on single-molecule real-time synthesis sequencing technology for generating long paired-end sequences of large insert DNA libraries.
RESULTS: A Fosmid vector, pHZAUFOS3, was developed with the following new features: (1) two 18-bp non-palindromic I-SceI sites flank the cloning site, and another two sites are present in the skeleton of the vector, allowing long DNA inserts (and the long paired-ends in this paper) to be recovered as single fragments and the vector (~8 kb) to be fragmented into 2-3 kb fragments by I-SceI digestion and therefore was effectively removed from the long paired-ends (5-10 kb); (2) the chloramphenicol (Cm) resistance gene and replicon (oriV), necessary for colony growth, are located near the two sides of the cloning site, helping to increase the proportion of the paired-end fragments to single-end fragments in the paired-end libraries. Paired-end libraries were constructed by ligating the size-selected, mechanically sheared pooled Fosmid DNA fragments to the Ampicillin (Amp) resistance gene fragment and screening the colonies with Cm and Amp. We tested this method on yeast and Setaria italica Yugu1. Fosmid-size paired-ends with an average length longer than 2 kb for each end were generated. The N50 scaffold lengths of the de novo assemblies of the yeast and S. italica Yugu1 genomes were significantly improved. Five large and five small structural rearrangements or assembly errors spanning tens of bp to tens of kb were identified in S. italica Yugu1 including deletions, inversions, duplications and translocations.
CONCLUSIONS: We developed a new method for long paired-end sequencing of large insert libraries, which can efficiently improve the quality of de novo genome assembly and identify large and small structural rearrangements or assembly errors.
Affiliation(s)
- Zhaozhao Dai
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Tong Li
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Jiadong Li
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Zhifei Han
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Yonglong Pan
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| | - Sha Tang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 10081 China
| | - Xianmin Diao
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 10081 China
| | - Meizhong Luo
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070 China
| |
|
6
|
Ying H, Hayward DC, Cooke I, Wang W, Moya A, Siemering KR, Sprungala S, Ball EE, Forêt S, Miller DJ. The Whole-Genome Sequence of the Coral Acropora millepora. Genome Biol Evol 2019; 11:1374-1379. [PMID: 31059562 PMCID: PMC6501875 DOI: 10.1093/gbe/evz077] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Accepted: 04/26/2019] [Indexed: 12/17/2022] Open
Affiliation(s)
- Hua Ying
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Acton, Australian Capital Territory, Australia
| | - David C Hayward
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Acton, Australian Capital Territory, Australia
| | - Ira Cooke
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, Queensland, Australia
| | - Weiwen Wang
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Acton, Australian Capital Territory, Australia
| | - Aurelie Moya
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
| | - Kirby R Siemering
- Australian Genome Research Facility Ltd, Level 13, Victorian Comprehensive Cancer Centre, Melbourne, Victoria, Australia
| | - Susanne Sprungala
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, Queensland, Australia
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
| | - Eldon E Ball
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Acton, Australian Capital Territory, Australia
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
| | - Sylvain Forêt
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Acton, Australian Capital Territory, Australia
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
| | - David J Miller
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, Queensland, Australia
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia
| |
|
7
|
Zhou B, Ho SS, Greer SU, Zhu X, Bell JM, Arthur JG, Spies N, Zhang X, Byeon S, Pattni R, Ben-Efraim N, Haney MS, Haraksingh RR, Song G, Ji HP, Perrin D, Wong WH, Abyzov A, Urban AE. Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562. Genome Res 2019; 29:472-484. [PMID: 30737237 PMCID: PMC6396411 DOI: 10.1101/gr.234948.118] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 01/22/2018] [Accepted: 12/28/2018] [Indexed: 11/24/2022]
Abstract
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high-resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
Affiliation(s)
- Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Steve S Ho
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Stephanie U Greer
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Xiaowei Zhu
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - John M Bell
- Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304, USA
| | - Joseph G Arthur
- Department of Statistics, Stanford University, Stanford, California 94305, USA
| | - Noah Spies
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA; Genome-Scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Xianglong Zhang
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Seunggyu Byeon
- School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
| | - Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Noa Ben-Efraim
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Michael S Haney
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Rajini R Haraksingh
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Giltae Song
- School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA; Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304, USA
| | - Dimitri Perrin
- Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD 4001, Australia
| | - Wing H Wong
- Department of Statistics, Stanford University, Stanford, California 94305, USA; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA; Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, California 94305, USA
| |
|
8
|
Lu FH, McKenzie N, Kettleborough G, Heavens D, Clark MD, Bevan MW. Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries. Gigascience 2018; 7:4995264. [PMID: 29762659 PMCID: PMC5967450 DOI: 10.1093/gigascience/giy053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/14/2017] [Accepted: 05/04/2018] [Indexed: 12/20/2022] Open
Abstract
Background: The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics.
Results: Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values.
Conclusions: Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
Affiliation(s)
- Fu-Hao Lu
- John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | - Neil McKenzie
- John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| | | | - Darren Heavens
- The Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Matthew D Clark
- The Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Michael W Bevan
- John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
| |
|
9
|
Di Genova A, Ruz GA, Sagot MF, Maass A. Fast-SG: an alignment-free algorithm for hybrid assembly. Gigascience 2018; 7:4993155. [PMID: 29741627 PMCID: PMC6007556 DOI: 10.1093/gigascience/giy048] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/08/2017] [Revised: 03/01/2018] [Accepted: 04/19/2018] [Indexed: 12/01/2022] Open
Abstract
Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes.
Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).
Conclusions: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
Affiliation(s)
- Alex Di Genova
- Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Santiago, Chile
- Mathomics Bioinformatics Laboratory, Center for Mathematical Modeling, University of Chile, Av. Beauchef 851, 7th floor, Santiago, Chile
- Inria Grenoble Rhône-Alpes, 655, Avenue de l’Europe, 38334 Montbonnot, France
- CNRS, UMR5558, Université Claude Bernard Lyon 1, 43, Boulevard du 11 Novembre 1918, 69622 Villeurbanne, France
- Fondap Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, Santiago, Chile
| | - Gonzalo A Ruz
- Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Santiago, Chile
- Center of Applied Ecology and Sustainability (CAPES), Santiago, Chile
| | - Marie-France Sagot
- Inria Grenoble Rhône-Alpes, 655, Avenue de l’Europe, 38334 Montbonnot, France
- CNRS, UMR5558, Université Claude Bernard Lyon 1, 43, Boulevard du 11 Novembre 1918, 69622 Villeurbanne, France
| | - Alejandro Maass
- Mathomics Bioinformatics Laboratory, Center for Mathematical Modeling, University of Chile, Av. Beauchef 851, 7th floor, Santiago, Chile
- Fondap Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, Santiago, Chile
- Department of Mathematical Engineering, University of Chile, Av. Beauchef 851, 5th floor, Santiago, Chile
| |
|
10
|
Zhu BH, Xiao J, Xue W, Xu GC, Sun MY, Li JT. P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics 2018; 19:175. [PMID: 29499650 PMCID: PMC5834899 DOI: 10.1186/s12864-018-4567-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 01/03/2018] [Accepted: 02/22/2018] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Obtaining complete gene structures is one major goal of genome assembly. Some gene regions are fragmented in low-quality and high-quality assemblies. Therefore, new approaches are needed to recover gene regions. Genomes are widely transcribed, generating messenger and non-coding RNAs. These widespread transcripts can be used to scaffold genomes and complete transcribed regions. RESULTS We present P_RNA_scaffolder, a fast and accurate tool using paired-end RNA-sequencing reads to scaffold genomes. This tool aims to improve the completeness of both protein-coding and non-coding genes. After this tool was applied to scaffolding human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and equivalent to those in a complete genome, especially for long proteins and long circular RNAs. Tested in various species, P_RNA_scaffolder exhibited higher speed and efficiency than the existing state-of-the-art scaffolders. This tool also improved the contiguity of genome assemblies generated by current mate-pair scaffolding and third-generation single-molecule sequencing assembly. CONCLUSIONS The P_RNA_scaffolder can improve the contiguity of genome assembly and benefit gene prediction. This tool is available at http://www.fishbrowser.org/software/P_RNA_scaffolder .
Affiliation(s)
- Bai-Han Zhu
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China.,College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, 201306, China
| | - Jun Xiao
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China.,College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, 201306, China
| | - Wei Xue
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China
| | - Gui-Cai Xu
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China.,College of Marine Science, Zhejiang Ocean University, Zhoushan, 316022, China
| | - Ming-Yuan Sun
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China.,College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, 201306, China
| | - Jiong-Tang Li
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, Beijing, 100141, China.
| |
|
11
|
Wei X, Xu Z, Wang G, Hou J, Ma X, Liu H, Liu J, Chen B, Luo M, Xie B, Li R, Ruan J, Liu X. pBACode: a random-barcode-based high-throughput approach for BAC paired-end sequencing and physical clone mapping. Nucleic Acids Res 2017; 45:e52. [PMID: 27980066 PMCID: PMC5397170 DOI: 10.1093/nar/gkw1261] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/15/2016] [Accepted: 12/09/2016] [Indexed: 12/14/2022] Open
Abstract
Applications that use Bacterial Artificial Chromosome (BAC) libraries often require paired-end sequences and knowledge of the physical location of each clone in plates. To facilitate obtaining this information in high-throughput, we generated pBACode vectors: a pool of BAC cloning vectors, each with a pair of random barcodes flanking its cloning site. In a pBACode BAC library, the BAC ends and their linked barcodes can be sequenced in bulk. Barcode pairs are determined by sequencing the empty pBACode vectors, which allows BAC ends to be paired according to their barcodes. For physical clone mapping, the barcodes are used as unique markers for their linked genomic sequence. After multi-dimensional pooling of BAC clones, the barcodes are sequenced and deconvoluted to locate each clone. We generated a pBACode library of 94,464 clones for the flounder Paralichthys olivaceus and obtained paired-end sequence from 95.4% of the clones. Incorporating BAC paired-ends into the genome preassembly improved its continuity by over 10-fold. Furthermore, we were able to use the barcodes to map the physical locations of each clone in just 50 pools, with up to 11,808 clones per pool. Our physical clone mapping located 90.2% of BAC clones, enabling targeted characterization of chromosomal rearrangements.
Affiliation(s)
- Xiaolin Wei
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.,PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China.,School of Life Sciences, Peking University, Beijing 100084, China
| | - Zhichao Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.,PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
| | - Guixing Wang
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
| | - Jilun Hou
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
| | - Xiaopeng Ma
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.,PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
| | - Haijin Liu
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
| | - Jiadong Liu
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Bo Chen
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Meizhong Luo
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Bingyan Xie
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing 100083, China
| | - Jue Ruan
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Xiao Liu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| |
|
12
|
Whole-Genome Restriction Mapping by "Subhaploid"-Based RAD Sequencing: An Efficient and Flexible Approach for Physical Mapping and Genome Scaffolding. Genetics 2017; 206:1237-1250. [PMID: 28468906 PMCID: PMC5500127 DOI: 10.1534/genetics.117.200303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/18/2017] [Accepted: 04/17/2017] [Indexed: 11/18/2022] Open
Abstract
Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based "in vitro" linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of "subhaploid" fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6-14 kb), with up to 15-fold improvement of N50 (∼816 kb-3.7 Mb) and high scaffolding accuracy (98.1-98.5%). RadMap outperforms BioNano and Hi-C when input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies.
|
13
|
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, Aiden EL. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 2017; 356:92-95. [PMID: 28336562 PMCID: PMC5635820 DOI: 10.1126/science.aal3327] [Citation(s) in RCA: 1186] [Impact Index Per Article: 169.4] [Received: 11/04/2016] [Accepted: 03/13/2017] [Indexed: 01/04/2023]
Abstract
The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species.
Affiliation(s)
- Olga Dudchenko
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
- Center for Theoretical and Biological Physics, Rice University, Houston, TX 77030, USA
| | - Sanjit S Batra
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Arina D Omer
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Sarah K Nyquist
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Marie Hoeger
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Neva C Durand
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Muhammad S Shamim
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Ido Machol
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
| | - Eric S Lander
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA
- Department of Biology, MIT, Cambridge, MA 02139, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Aviva Presser Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Department of Pediatrics, Texas Children's Hospital, Houston, TX 77030, USA
| | - Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Departments of Computer Science and Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
- Center for Theoretical and Biological Physics, Rice University, Houston, TX 77030, USA
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA
| |
|
14
|
Comparative Analysis Highlights Variable Genome Content of Wheat Rusts and Divergence of the Mating Loci. G3-GENES GENOMES GENETICS 2017; 7:361-376. [PMID: 27913634 PMCID: PMC5295586 DOI: 10.1534/g3.116.032797] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]
Abstract
Three members of the Puccinia genus, Puccinia triticina (Pt), P. striiformis f. sp. tritici (Pst), and P. graminis f. sp. tritici (Pgt), cause the most common and often most significant foliar diseases of wheat. While similar in biology and life cycle, each species is uniquely adapted and specialized. The genomes of Pt and Pst were sequenced and compared to that of Pgt to identify common and distinguishing gene content, to determine gene variation among wheat rust pathogens, other rust fungi, and basidiomycetes, and to identify genes of significance for infection. Pt had the largest genome of the three, estimated at 135 Mb with expansion due to mobile elements and repeats encompassing 50.9% of contig bases; in comparison, repeats occupy 31.5% for Pst and 36.5% for Pgt. We find all three genomes are highly heterozygous, with Pst [5.97 single nucleotide polymorphisms (SNPs)/kb] nearly twice the level detected in Pt (2.57 SNPs/kb) and that previously reported for Pgt. Of 1358 predicted effectors in Pt, 784 were found expressed across diverse life cycle stages including the sexual stage. Comparison to related fungi highlighted the expansion of gene families involved in transcriptional regulation and nucleotide binding, protein modification, and carbohydrate degradation enzymes. Two allelic homeodomain pairs, HD1 and HD2, were identified in each dikaryotic Puccinia species along with three pheromone receptor (STE3) mating-type genes, two of which are likely representing allelic specificities. The HD proteins were active in a heterologous Ustilago maydis mating assay and host-induced gene silencing (HIGS) of the HD and STE3 alleles reduced wheat host infection.
|
15
|
Peng Z, Froula JL, Cheng JF. Preparing Fosmid Mate-Paired Libraries Using Cre-LoxP Recombination. Methods Mol Biol 2017; 1642:263-284. [PMID: 28815506 DOI: 10.1007/978-1-4939-7169-5_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Fosmid end sequencing has been widely utilized in genome sequence assemblies and genome structural variation studies. We have developed a new approach to construct fosmid paired-end libraries that is suitable for the Illumina sequencing platform. This approach employs a newly modified fosmid vector (pFosClip) which contains two loxP sites with identical orientation and two inverse Illumina adaptor priming sites flanking the cloning site. DNA prepared from the fosmid library constructed with pFosClip can be treated with the Cre recombinase to remove most of the vector DNA, leaving only 107 bp of the vector sequence with insert DNA. Frequent cutting restriction enzymes and ligase are used to digest the fosmid DNA to small (less than 1 Kb) fragments and recircularize the fosmid ends and all the internal fragments. Finally, an inverse PCR step with the Illumina primers is used to enrich the fosmid paired ends (PEs) for sequencing. The advantages of this approach are the following: (1) the circularization of short fragments with sticky ends is efficient, so the success rate is higher than that of other approaches that attempt to join both blunt ends of large fosmid vectors; (2) the restriction enzyme cutting generates an identifiable junction tag for splitting the paired reads; and (3) multiple restriction enzymes can be used to overcome possible enzyme-cutting bias. Our results have shown that this approach has produced mostly fosmid size (30-40 Kb) pairs from the targeted fungi and plant genomes and has drastically increased the scaffold sizes in the assembled genomes.
Affiliation(s)
- Ze Peng
- United States Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA.
| | - Jeff L Froula
- United States Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
| | - Jan-Fang Cheng
- United States Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
| |
|
16
|
Mostovoy Y, Levy-Sakin M, Lam J, Lam ET, Hastie AR, Marks P, Lee J, Chu C, Lin C, Džakula Ž, Cao H, Schlebusch SA, Giorda K, Schnall-Levin M, Wall JD, Kwok PY. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 2016; 13:587-90. [PMID: 27159086 PMCID: PMC4927370 DOI: 10.1038/nmeth.3865] [Citation(s) in RCA: 149] [Impact Index Per Article: 18.6] [Received: 01/12/2016] [Accepted: 04/08/2016] [Indexed: 12/11/2022]
Abstract
Despite tremendous progress in genome sequencing, the basic goal of producing phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe a new approach to perform de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics Linked-Read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.
Affiliation(s)
- Yulia Mostovoy
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
| | - Michal Levy-Sakin
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
| | - Jessica Lam
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
| | - Ernest T Lam
- BioNano Genomics, Inc., San Diego, California, USA
| | | | | | - Joyce Lee
- BioNano Genomics, Inc., San Diego, California, USA
| | - Catherine Chu
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
| | - Chin Lin
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA
| | | | - Han Cao
- BioNano Genomics, Inc., San Diego, California, USA
| | - Stephen A Schlebusch
- Department of Molecular and Cell Biology, University of Cape Town, Cape Town, South Africa
| | | | | | - Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
| | - Pui-Yan Kwok
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California, USA.,Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA.,Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
| |
|
17
|
Martin G, Baurens FC, Droc G, Rouard M, Cenci A, Kilian A, Hastie A, Doležel J, Aury JM, Alberti A, Carreel F, D'Hont A. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 2016; 17:243. [PMID: 26984673 PMCID: PMC4793746 DOI: 10.1186/s12864-016-2579-4] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 08/04/2015] [Accepted: 03/08/2016] [Indexed: 12/04/2022] Open
Abstract
Background Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). Results We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80 %), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5 % of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70 %. Unknown sites (N) were reduced from 17.3 to 10.0 %. Conclusion The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
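The abstract above reports assembly contiguity as an N50 together with a scaffold count (for example, 3.0 Mb reached with 26 scaffolds) and a percentage reduction in scaffold number. As a minimal sketch of how such figures are conventionally computed, and not the authors' pipeline, the following assumes the usual definition of N50 (the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half the assembly size). The function names and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp.

    N50 is the length of the scaffold at which the cumulative sum of
    lengths (longest first) first reaches half the total assembly size;
    L50 is how many scaffolds were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


def percent_reduction(before, after):
    """Percentage by which a count dropped from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical toy assembly, not data from the paper.
toy_lengths = [50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))

# Scaffold counts quoted in the abstract: 7513 before, 1532 after.
print(round(percent_reduction(7513, 1532)))
```

Under this definition, the scaffold counts in parentheses next to each N50 in the abstract correspond to the L50 value returned by the sketch.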
Affiliation(s)
- Guillaume Martin
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Franc-Christophe Baurens
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Gaëtan Droc
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Mathieu Rouard
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France
| | - Alberto Cenci
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France
| | - Andrzej Kilian
- Diversity Arrays Technology, Yarralumla, Australian Capital Territory, 2600, Australia
| | - Alex Hastie
- BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA, 92121, USA
| | - Jaroslav Doležel
- Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371, Olomouc, Czech Republic
| | - Jean-Marc Aury
- Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France
| | - Adriana Alberti
- Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France
| | - Françoise Carreel
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Angélique D'Hont
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
| |
|
18
|
Love RR, Weisenfeld NI, Jaffe DB, Besansky NJ, Neafsey DE. Evaluation of DISCOVAR de novo using a mosquito sample for cost-effective short-read genome assembly. BMC Genomics 2016; 17:187. [PMID: 26944054 PMCID: PMC4779211 DOI: 10.1186/s12864-016-2531-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 11/25/2015] [Accepted: 02/24/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND De novo reference assemblies that are affordable, practical to produce, and of sufficient quality for most downstream applications, remain an unattained goal for many taxa. Insects, which may yield too little DNA from individual specimens for long-read sequencing library construction and often have highly heterozygous genomes, can be particularly hard to assemble using inexpensive short-read sequencing data. The large number of insect species with medical or economic importance makes this a critical problem to address. RESULTS Using the assembler DISCOVAR de novo, we assembled the genome of the African malaria mosquito Anopheles arabiensis using 250 bp reads from a single library. The resulting assembly had a contig N50 of 22,433 bp, and recovered the gene set nearly as well as the ALLPATHS-LG AaraD1 An. arabiensis assembly produced with reads from three sequencing libraries and much greater resources. DISCOVAR de novo appeared to perform better than ALLPATHS-LG in regions of low complexity. CONCLUSIONS DISCOVAR de novo performed well assembling the genome of an insect of medical importance, using simpler sequencing input than previous anopheline assemblies. We have shown that this program is a viable tool for cost-effective assembly of a modestly-sized insect genome.
Affiliation(s)
- R Rebecca Love
- Eck Institute for Global Health, University of Notre Dame, South Bend, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, South Bend, IN, 46556, USA.
| | - Neil I Weisenfeld
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
| | - David B Jaffe
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
| | - Nora J Besansky
- Eck Institute for Global Health, University of Notre Dame, South Bend, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, South Bend, IN, 46556, USA.
| | - Daniel E Neafsey
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA.
| |
|
19
|
Putnam NH, O'Connell BL, Stites JC, Rice BJ, Blanchette M, Calef R, Troll CJ, Fields A, Hartley PD, Sugnet CW, Haussler D, Rokhsar DS, Green RE. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res 2016; 26:342-50. [PMID: 26848124 PMCID: PMC4772016 DOI: 10.1101/gr.193474.115] [Citation(s) in RCA: 456] [Impact Index Per Article: 57.0] [Received: 04/23/2015] [Accepted: 12/21/2015] [Indexed: 12/19/2022]
Abstract
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem, dramatically increasing the scaffold contiguity of assemblies. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago data sets with human DNA and developed a statistical model and a new software pipeline (“HiRise”) that can identify poor quality joins and produce accurate, long-range sequence scaffolds. We used these to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 20 Mbp. We also demonstrated the utility of Chicago for improving existing assemblies by reassembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kbp to 10 Mbp.
Affiliation(s)
| | - Brendan L O'Connell
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95066, USA
| | | | - Brandon J Rice
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA
| | | | - Robert Calef
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA
| | | | - Andrew Fields
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA
| | - Paul D Hartley
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA
| | | | - David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95066, USA; UC Santa Cruz Genomics Institute and Howard Hughes Medical Institute, University of California, Santa Cruz, California 95066, USA
| | - Daniel S Rokhsar
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA; Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA
| | - Richard E Green
- Dovetail Genomics LLC, Santa Cruz, California 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95066, USA
| |
|
20
|
Borrill P, Adamski N, Uauy C. Genomics as the key to unlocking the polyploid potential of wheat. New Phytol 2015; 208:1008-22. [PMID: 26108556 DOI: 10.1111/nph.13533] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/27/2015] [Accepted: 05/31/2015] [Indexed: 05/19/2023]
Abstract
Polyploidy has played a central role in plant genome evolution and in the formation of new species such as tetraploid pasta wheat and hexaploid bread wheat. Until recently, the high sequence conservation between homoeologous genes, together with the large genome size of polyploid wheat, had hindered genomic analyses in this important crop species. In the past 5 years, however, the advent of next-generation sequencing has radically changed the wheat genomics landscape. Here, we review a series of advances in genomic resources and tools for functional genomics that are shifting the paradigm of what is possible in wheat molecular genetics and breeding. We discuss how understanding the relationship between homoeologues can inform approaches to modulate the response of quantitative traits in polyploid wheat; we also argue that functional redundancy has 'locked up' a wide range of phenotypic variation in wheat. We explore how genomics provides key tools to inform targeted manipulation of multiple homoeologues, thereby allowing researchers and plant breeders to unlock the full polyploid potential of wheat.
Affiliation(s)
| | - Nikolai Adamski
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Cristobal Uauy
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| |
|
21
|
Chaisson MJP, Wilson RK, Eichler EE. Genetic variation and the de novo assembly of human genomes. Nat Rev Genet 2015; 16:627-40. [PMID: 26442640 DOI: 10.1038/nrg3933] [Citation(s) in RCA: 222] [Impact Index Per Article: 24.7] [Indexed: 12/31/2022]
Abstract
The discovery of genetic variation and the assembly of genome sequences are both inextricably linked to advances in DNA-sequencing technology. Short-read massively parallel sequencing has revolutionized our ability to discover genetic variation but is insufficient to generate high-quality genome assemblies or resolve most structural variation. Full resolution of variation is only guaranteed by complete de novo assembly of a genome. Here, we review approaches to genome assembly, the nature of gaps or missing sequences, and biases in the assembly process. We describe the challenges of generating a complete de novo genome assembly using current technologies and the impact that being able to perfectly sequence the genome would have on understanding human disease and evolution. Finally, we summarize recent technological advances that improve both contiguity and accuracy and emphasize the importance of complete de novo assembly as opposed to read mapping as the primary means to understanding the full range of human genetic variation.
Affiliation(s)
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington, Foege Building S-413A, Box 355065, 3720 15th Ave NE, Seattle, Washington 98195, USA
| | - Richard K Wilson
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Foege Building S-413A, Box 355065, 3720 15th Ave NE, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
|
22
|
Weckselblatt B, Rudd MK. Human Structural Variation: Mechanisms of Chromosome Rearrangements. Trends Genet 2015; 31:587-599. [PMID: 26209074 PMCID: PMC4600437 DOI: 10.1016/j.tig.2015.05.010] [Citation(s) in RCA: 158] [Impact Index Per Article: 17.6] [Received: 04/01/2015] [Revised: 05/26/2015] [Accepted: 05/27/2015] [Indexed: 01/05/2023]
Abstract
Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation.
Affiliation(s)
- Brooke Weckselblatt
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - M Katharine Rudd
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
| |
|
24
|
Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 2015; 16:26. [PMID: 25637298 PMCID: PMC4373400 DOI: 10.1186/s13059-015-0582-8] [Citation(s) in RCA: 162] [Impact Index Per Article: 18.0] [Received: 09/12/2014] [Accepted: 01/06/2015] [Indexed: 11/10/2022] Open
Abstract
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Affiliation(s)
- Jarrod A Chapman
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Aydın Buluç
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Kerrie Barry
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Evangelos Georganas
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Adam Session
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
| | - Veronika Strnadova
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA.
| | - Jerry Jenkins
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Sunish Sehgal
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA; Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57007, USA.
| | - Leonid Oliker
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Jeremy Schmutz
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Katherine A Yelick
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Robbie Waugh
- Division of Plant Sciences, University of Dundee & The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK.
| | - Jesse A Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA.
| | - Gary J Muehlbauer
- Departments of Agronomy and Plant Genetics, and Plant Biology, University of Minnesota, St Paul, MN, 55108, USA.
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Daniel S Rokhsar
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
| |
|
25
|
Foote AD, Liu Y, Thomas GWC, Vinař T, Alföldi J, Deng J, Dugan S, van Elk CE, Hunter ME, Joshi V, Khan Z, Kovar C, Lee SL, Lindblad-Toh K, Mancia A, Nielsen R, Qin X, Qu J, Raney BJ, Vijay N, Wolf JBW, Hahn MW, Muzny DM, Worley KC, Gilbert MTP, Gibbs RA. Convergent evolution of the genomes of marine mammals. Nat Genet 2015; 47:272-5. [PMID: 25621460 DOI: 10.1038/ng.3198] [Citation(s) in RCA: 262] [Impact Index Per Article: 29.1] [Received: 04/22/2014] [Accepted: 12/29/2014] [Indexed: 12/13/2022]
Abstract
Marine mammals from different mammalian orders share several phenotypic traits adapted to the aquatic environment and therefore represent a classic example of convergent evolution. To investigate convergent evolution at the genomic level, we sequenced and performed de novo assembly of the genomes of three species of marine mammals (the killer whale, walrus and manatee) from three mammalian orders that share independently evolved phenotypic adaptations to a marine existence. Our comparative genomic analyses found that convergent amino acid substitutions were widespread throughout the genome and that a subset of these substitutions were in genes evolving under positive selection and putatively associated with a marine phenotype. However, we found higher levels of convergent amino acid substitutions in a control set of terrestrial sister taxa to the marine mammals. Our results suggest that, whereas convergent molecular evolution is relatively common, adaptive molecular convergence linked to phenotypic convergence is comparatively rare.
Affiliation(s)
- Andrew D Foote
- [1] Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark. [2] Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Yue Liu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Gregg W C Thomas
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
| | - Tomáš Vinař
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| | - Jessica Alföldi
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Jixin Deng
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Shannon Dugan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | | | - Margaret E Hunter
- Sirenia Project, Southeast Ecological Science Center, US Geological Survey, Gainesville, Florida, USA
| | - Vandita Joshi
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Ziad Khan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Christie Kovar
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Sandra L Lee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Kerstin Lindblad-Toh
- [1] Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Annalaura Mancia
- [1] Marine Biomedicine and Environmental Science Center, Medical University of South Carolina, Charleston, South Carolina, USA. [2] Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Rasmus Nielsen
- Center for Theoretical Evolutionary Genomics, University of California, Berkeley, Berkeley, California, USA
| | - Xiang Qin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Jiaxin Qu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Brian J Raney
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, California, USA
| | - Nagarjun Vijay
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Jochen B W Wolf
- [1] Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden. [2] Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Matthew W Hahn
- [1] School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA. [2] Department of Biology, Indiana University, Bloomington, Indiana, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - Kim C Worley
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| | - M Thomas P Gilbert
- [1] Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark. [2] Trace and Environmental DNA Laboratory, Department of Environment and Agriculture, Curtin University, Perth, Western Australia, Australia
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
| |
|
26
|
Neafsey DE, Waterhouse RM, Abai MR, Aganezov SS, Alekseyev MA, Allen JE, Amon J, Arcà B, Arensburger P, Artemov G, Assour LA, Basseri H, Berlin A, Birren BW, Blandin SA, Brockman AI, Burkot TR, Burt A, Chan CS, Chauve C, Chiu JC, Christensen M, Costantini C, Davidson VLM, Deligianni E, Dottorini T, Dritsou V, Gabriel SB, Guelbeogo WM, Hall AB, Han MV, Hlaing T, Hughes DST, Jenkins AM, Jiang X, Jungreis I, Kakani EG, Kamali M, Kemppainen P, Kennedy RC, Kirmitzoglou IK, Koekemoer LL, Laban N, Langridge N, Lawniczak MKN, Lirakis M, Lobo NF, Lowy E, MacCallum RM, Mao C, Maslen G, Mbogo C, McCarthy J, Michel K, Mitchell SN, Moore W, Murphy KA, Naumenko AN, Nolan T, Novoa EM, O'Loughlin S, Oringanje C, Oshaghi MA, Pakpour N, Papathanos PA, Peery AN, Povelones M, Prakash A, Price DP, Rajaraman A, Reimer LJ, Rinker DC, Rokas A, Russell TL, Sagnon N, Sharakhova MV, Shea T, Simão FA, Simard F, Slotman MA, Somboon P, Stegniy V, Struchiner CJ, Thomas GWC, Tojo M, Topalis P, Tubio JMC, Unger MF, Vontas J, Walton C, Wilding CS, Willis JH, Wu YC, Yan G, Zdobnov EM, Zhou X, Catteruccia F, Christophides GK, Collins FH, Cornman RS, Crisanti A, Donnelly MJ, Emrich SJ, Fontaine MC, Gelbart W, Hahn MW, Hansen IA, Howell PI, Kafatos FC, Kellis M, Lawson D, Louis C, Luckhart S, Muskavitch MAT, Ribeiro JM, Riehle MA, Sharakhov IV, Tu Z, Zwiebel LJ, Besansky NJ. Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 2014; 347:1258522. [PMID: 25554792 DOI: 10.1126/science.1258522] [Citation(s) in RCA: 369] [Impact Index Per Article: 36.9] [Indexed: 11/02/2022]
Abstract
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
Affiliation(s)
- Daniel E Neafsey
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA.
| | - Robert M Waterhouse
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA. Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Mohammad R Abai
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Sergey S Aganezov
- George Washington University, Department of Mathematics and Computational Biology Institute, 45085 University Drive, Ashburn, VA 20147, USA
| | - Max A Alekseyev
- George Washington University, Department of Mathematics and Computational Biology Institute, 45085 University Drive, Ashburn, VA 20147, USA
| | - James E Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - James Amon
- National Vector Borne Disease Control Programme, Ministry of Health, Tafea Province, Vanuatu
| | - Bruno Arcà
- Department of Public Health and Infectious Diseases, Division of Parasitology, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185 Rome, Italy
| | - Peter Arensburger
- Department of Biological Sciences, California State Polytechnic-Pomona, 3801 West Temple Avenue, Pomona, CA 91768, USA
| | - Gleb Artemov
- Tomsk State University, 36 Lenina Avenue, Tomsk, Russia
| | - Lauren A Assour
- Department of Computer Science and Engineering, Eck Institute for Global Health, 211B Cushing Hall, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Hamidreza Basseri
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Aaron Berlin
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Bruce W Birren
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Stephanie A Blandin
- Inserm, U963, F-67084 Strasbourg, France. CNRS, UPR9022, IBMC, F-67084 Strasbourg, France
| | - Andrew I Brockman
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Thomas R Burkot
- Faculty of Medicine, Health and Molecular Science, Australian Institute of Tropical Health Medicine, James Cook University, Cairns 4870, Australia
| | - Austin Burt
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Clara S Chan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Joanna C Chiu
- Department of Entomology and Nematology, One Shields Avenue, University of California-Davis, Davis, CA 95616, USA
| | - Mikkel Christensen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlo Costantini
- Institut de Recherche pour le Développement, Unités Mixtes de Recherche Maladies Infectieuses et Vecteurs Écologie, Génétique, Évolution et Contrôle, 911, Avenue Agropolis, BP 64501 Montpellier, France
| | - Victoria L M Davidson
- Division of Biology, Kansas State University, 271 Chalmers Hall, Manhattan, KS 66506, USA
| | - Elena Deligianni
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece
| | - Tania Dottorini
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Vicky Dritsou
- Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Stacey B Gabriel
- Genomics Platform, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Wamdaogo M Guelbeogo
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou 01 BP 2208, Burkina Faso
| | - Andrew B Hall
- Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Mira V Han
- School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
| | - Thaung Hlaing
- Department of Medical Research, No. 5 Ziwaka Road, Dagon Township, Yangon 11191, Myanmar
| | - Daniel S T Hughes
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Adam M Jenkins
- Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
| | - Xiaofang Jiang
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Irwin Jungreis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Evdoxia G Kakani
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA. Dipartimento di Medicina Sperimentale e Scienze Biochimiche, Università degli Studi di Perugia, Perugia, Italy
| | - Maryam Kamali
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Petri Kemppainen
- Computational Evolutionary Biology Group, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Ryan C Kennedy
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Ioannis K Kirmitzoglou
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, CY 1678 Nicosia, Cyprus
| | - Lizette L Koekemoer
- Wits Research Institute for Malaria, Faculty of Health Sciences, and Vector Control Reference Unit, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham 2131, Johannesburg, South Africa
| | - Njoroge Laban
- National Museums of Kenya, P.O. Box 40658-00100, Nairobi, Kenya
| | - Nicholas Langridge
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mara K N Lawniczak
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Manolis Lirakis
- Department of Biology, University of Crete, 700 13 Heraklion, Greece
| | - Neil F Lobo
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert M MacCallum
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Chunhong Mao
- Virginia Bioinformatics Institute, 1015 Life Science Circle, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Charles Mbogo
- Kenya Medical Research Institute-Wellcome Trust Research Programme, Centre for Geographic Medicine Research - Coast, P.O. Box 230-80108, Kilifi, Kenya
| | - Jenny McCarthy
- Department of Biological Sciences, California State Polytechnic-Pomona, 3801 West Temple Avenue, Pomona, CA 91768, USA
| | - Kristin Michel
- Division of Biology, Kansas State University, 271 Chalmers Hall, Manhattan, KS 66506, USA
| | - Sara N Mitchell
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA
| | - Wendy Moore
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Katherine A Murphy
- Department of Entomology and Nematology, One Shields Avenue, University of California-Davis, Davis, CA 95616, USA
| | - Anastasia N Naumenko
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Tony Nolan
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Eva M Novoa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Samantha O'Loughlin
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Chioma Oringanje
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Mohammad A Oshaghi
- Department of Medical Entomology and Vector Control, School of Public Health and Institute of Health Researches, Tehran University of Medical Sciences, Tehran, Iran
| | - Nazzy Pakpour
- Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
| | - Philippos A Papathanos
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Ashley N Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Michael Povelones
- Department of Pathobiology, University of Pennsylvania School of Veterinary Medicine, 3800 Spruce Street, Philadelphia, PA 19104, USA
| | - Anil Prakash
- Regional Medical Research Centre NE, Indian Council of Medical Research, P.O. Box 105, Dibrugarh-786 001, Assam, India
| | - David P Price
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Molecular Biology Program, New Mexico State University, Las Cruces, NM 88003, USA
| | - Ashok Rajaraman
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| | - Lisa J Reimer
- Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
| | - David C Rinker
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37235, USA
| | - Antonis Rokas
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37235, USA. Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
| | - Tanya L Russell
- Faculty of Medicine, Health and Molecular Science, Australian Institute of Tropical Health Medicine, James Cook University, Cairns 4870, Australia
| | - N'Fale Sagnon
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou 01 BP 2208, Burkina Faso
| | - Maria V Sharakhova
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Terrance Shea
- Genome Sequencing and Analysis Program, Broad Institute, 415 Main Street, Cambridge, MA 02142, USA
| | - Felipe A Simão
- Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Frederic Simard
- Institut de Recherche pour le Développement, Unités Mixtes de Recherche Maladies Infectieuses et Vecteurs Écologie, Génétique, Évolution et Contrôle, 911, Avenue Agropolis, BP 64501 Montpellier, France
| | - Michel A Slotman
- Department of Entomology, Texas A&M University, College Station, TX 77807, USA
| | - Pradya Somboon
- Department of Parasitology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| | | | - Claudio J Struchiner
- Fundação Oswaldo Cruz, Avenida Brasil 4365, RJ Brazil. Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Gregg W C Thomas
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
| | - Marta Tojo
- Department of Physiology, School of Medicine, Center for Research in Molecular Medicine and Chronic Diseases, Instituto de Investigaciones Sanitarias, University of Santiago de Compostela, Santiago de Compostela, A Coruña, Spain
| | - Pantelis Topalis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece
| | - José M C Tubio
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Maria F Unger
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - John Vontas
- Department of Biology, University of Crete, 700 13 Heraklion, Greece
| | - Catherine Walton
- Computational Evolutionary Biology Group, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Craig S Wilding
- School of Natural Sciences and Psychology, Liverpool John Moores University, Liverpool L3 3AF, UK
| | - Judith H Willis
- Department of Cellular Biology, University of Georgia, Athens, GA 30602, USA
| | - Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA. Department of Computer Science, Harvey Mudd College, Claremont, CA 91711, USA
| | - Guiyun Yan
- Program in Public Health, College of Health Sciences, University of California, Irvine, Hewitt Hall, Irvine, CA 92697, USA
| | - Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, Rue Michel-Servet 1, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland
| | - Xiaofan Zhou
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
| | - Flaminia Catteruccia
- Harvard School of Public Health, Department of Immunology and Infectious Diseases, Boston, MA 02115, USA. Dipartimento di Medicina Sperimentale e Scienze Biochimiche, Università degli Studi di Perugia, Perugia, Italy
| | - George K Christophides
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Frank H Collins
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA
| | - Robert S Cornman
- Department of Cellular Biology, University of Georgia, Athens, GA 30602, USA
| | - Andrea Crisanti
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Martin J Donnelly
- Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK. Malaria Programme, Wellcome Trust Sanger Institute, Cambridge CB10 1SJ, UK
| | - Scott J Emrich
- Department of Computer Science and Engineering, Eck Institute for Global Health, 211B Cushing Hall, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Michael C Fontaine
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA. Centre of Evolutionary and Ecological Studies (Marine Evolution and Conservation group), University of Groningen, Nijenborgh 7, NL-9747 AG Groningen, Netherlands
| | - William Gelbart
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA. School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
| | - Immo A Hansen
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Molecular Biology Program, New Mexico State University, Las Cruces, NM 88003, USA
| | - Paul I Howell
- Centers for Disease Control and Prevention, 1600 Clifton Road NE MSG49, Atlanta, GA 30329, USA
| | - Fotis C Kafatos
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA. The Broad Institute of Massachusetts Institute of Technology and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christos Louis
- Department of Biology, University of Crete, 700 13 Heraklion, Greece. Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Nikolaou Plastira 100 GR-70013, Heraklion, Crete, Greece. Centre of Functional Genomics, University of Perugia, Perugia, Italy
| | - Shirley Luckhart
- Department of Medical Microbiology and Immunology, School of Medicine, University of California Davis, One Shields Avenue, Davis, CA 95616, USA
| | - Marc A T Muskavitch
- Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA. Biogen Idec, 14 Cambridge Center, Cambridge, MA 02142, USA
| | - José M Ribeiro
- Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, 12735 Twinbrook Parkway, Rockville, MD 20852, USA
| | - Michael A Riehle
- Department of Entomology, 1140 East South Campus Drive, Forbes 410, University of Arizona, Tucson, AZ 85721, USA
| | - Igor V Sharakhov
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Zhijian Tu
- Program of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA. Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
| | - Laurence J Zwiebel
- Departments of Biological Sciences and Pharmacology, Institutes for Chemical Biology, Genetics and Global Health, Vanderbilt University and Medical Center, Nashville, TN 37235, USA
| | - Nora J Besansky
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, 317 Galvin Life Sciences Building, Notre Dame, IN 46556, USA.
| |
|
27
|
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014; 9:e112963. [PMID: 25409509 PMCID: PMC4237348 DOI: 10.1371/journal.pone.0112963] [Citation(s) in RCA: 5127] [Impact Index Per Article: 512.7] [Received: 08/25/2014] [Accepted: 10/16/2014] [Indexed: 02/06/2023] Open
Abstract
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
Affiliation(s)
- Bruce J. Walker
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (BJW); (AME)
| | - Thomas Abeel
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- VIB Department of Plant Systems Biology, Ghent University, Ghent, Belgium
| | - Terrance Shea
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Margaret Priest
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Amr Abouelliel
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Sharadha Sakthikumar
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Christina A. Cuomo
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Qiandong Zeng
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Jennifer Wortman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Sarah K. Young
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Ashlee M. Earl
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (BJW); (AME)
| |
|
28
|
Abstract
Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
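For readers unfamiliar with the N50 statistic quoted in this abstract, the following minimal Python sketch shows the conventional definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are illustrative assumptions and are not taken from the cited study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Conventional definition: sort lengths in descending order and return
    the length at which the cumulative sum first reaches at least half
    of the total. Hypothetical helper, not code from the cited paper.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [10, 20, 30, 40]  # made-up lengths for illustration
    print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about correctness; a mis-joined scaffold raises N50 just as a correct join does.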
|
29
|
Keane M, Craig T, Alföldi J, Berlin AM, Johnson J, Seluanov A, Gorbunova V, Di Palma F, Lindblad-Toh K, Church GM, de Magalhães JP. The Naked Mole Rat Genome Resource: facilitating analyses of cancer and longevity-related adaptations. Bioinformatics 2014; 30:3558-60. [PMID: 25172923 PMCID: PMC4253829 DOI: 10.1093/bioinformatics/btu579] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Indexed: 12/29/2022] Open
Abstract
Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at http://www.naked-mole-rat.org. This resource is open source and the source code is available at https://github.com/maglab/naked-mole-rat-portal. Contact: jp@senescence.info
Affiliation(s)
- Michael Keane
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Thomas Craig
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Jessica Alföldi
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Aaron M Berlin
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Jeremy Johnson
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Andrei Seluanov
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Vera Gorbunova
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Federica Di Palma
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Kerstin Lindblad-Toh
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - George M Church
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK, Broad Institute of MIT and Harvard, Cambridge, MA, USA, Department of Biology, University of Rochester, NY, USA, Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK, Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden and Department of Genetics, Harvard Medical School, Boston, MA, USA
| |
|
30
|
Braasch I, Peterson SM, Desvignes T, McCluskey BM, Batzel P, Postlethwait JH. A new model army: Emerging fish models to study the genomics of vertebrate Evo-Devo. J Exp Zool B Mol Dev Evol 2014; 324:316-41. [PMID: 25111899 DOI: 10.1002/jez.b.22589] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 03/16/2014] [Revised: 06/19/2014] [Accepted: 06/25/2014] [Indexed: 01/08/2023]
Abstract
Many fields of biology, including vertebrate Evo-Devo research, are facing an explosion of genomic and transcriptomic sequence information, and a multitude of fish species are now swimming in this "genomic tsunami." Here, we first give an overview of recent developments in sequencing fish genomes and transcriptomes that identify properties of fish genomes requiring particular attention and propose strategies to overcome common challenges in fish genomics. We suggest that the generation of chromosome-level genome assemblies (for which we introduce the term "chromonome") should be a key component of genomic investigations in fish because they enable large-scale conserved synteny analyses that inform orthology detection, a process critical for connectivity of genomes. Orthology calls in vertebrates, especially in teleost fish, are complicated by divergent evolution of gene repertoires and functions following two rounds of genome duplication in the ancestor of vertebrates and a third round at the base of teleost fish. Second, using examples of spotted gar, basal teleosts, zebrafish-related cyprinids, cavefish, livebearers, icefish, and lobefin fish, we illustrate how next generation sequencing technologies liberate emerging fish systems from genomic ignorance and transform them into a new model army to answer longstanding questions on the genomic and developmental basis of their biodiversity. Finally, we discuss recent progress in the genetic toolbox for the major fish models for functional analysis, zebrafish and medaka, that can be transferred to many other fish species to study in vivo the functional effect of evolutionary genomic change as Evo-Devo research enters the postgenomic era.
Affiliation(s)
- Ingo Braasch
- Institute of Neuroscience, University of Oregon, Eugene, Oregon
| | | | | | | | - Peter Batzel
- Institute of Neuroscience, University of Oregon, Eugene, Oregon
| | | |
|
31
|
Jiang Y, Xu P, Liu Z. Generation of physical map contig-specific sequences. Front Genet 2014; 5:243. [PMID: 25101119 PMCID: PMC4105628 DOI: 10.3389/fgene.2014.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/17/2014] [Accepted: 07/07/2014] [Indexed: 12/13/2022] Open
Abstract
Rapid advances in next-generation sequencing technologies have allowed whole genome sequencing of many species. However, with the current sequencing technologies, whole genome sequence assemblies often fall short in one of the four quality measures: accuracy, contiguity, connectivity, and completeness. In particular, small-sized contigs and scaffolds limit the applicability of whole genome sequences for genetic analysis. To enhance the quality of whole genome sequence assemblies, particularly the scaffolding capabilities, additional genomic resources are required. Among these, sequences derived from known physical locations offer great power for scaffolding. In this mini-review, we describe the principles, procedures and applications of physical-map-derived sequences, with a focus on physical map contig-specific sequences.
Affiliation(s)
- Yanliang Jiang
- Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, China
| | - Peng Xu
- Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, China
| | - Zhanjiang Liu
- Aquatic Genomics Unit, The Fish Molecular Genetics and Biotechnology Laboratory, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, AL, USA
| |
|
32
|
Alexeyenko A, Nystedt B, Vezzi F, Sherwood E, Ye R, Knudsen B, Simonsen M, Turner B, de Jong P, Wu CC, Lundeberg J. Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools. BMC Genomics 2014; 15:439. [PMID: 24906298 PMCID: PMC4070561 DOI: 10.1186/1471-2164-15-439] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/09/2013] [Accepted: 05/28/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. CONCLUSIONS By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
Affiliation(s)
| | | | | | | | | | | | | | | | | | | | - Joakim Lundeberg
- School of Biotechnology, Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 171 21 Solna, Sweden.
| |
|
33
|
Li W, Freudenberg J, Miramontes P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics 2014; 15:2. [PMID: 24386976 PMCID: PMC3927684 DOI: 10.1186/1471-2105-15-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 08/25/2013] [Accepted: 12/17/2013] [Indexed: 11/10/2022] Open
Abstract
Background The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read lengths on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 bp to 1000 bp. Results We observe that the proportion of non-singleton k-mers decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different ranges of k. A slower decay at greater values of k indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of k-mers exhibit long tails with a power-law-like trend, and rank frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications. Conclusion Read mappability as measured by the proportion of singletons increases steadily up to the length scale around 200 bp. When read length increases above 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read length much more slowly than linearly. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed for efficiently producing high-quality data that cover the complete human genome.
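The quantity at the centre of this abstract, the proportion of non-singleton k-mers as a function of k, can be illustrated with a minimal Python sketch. It counts k-mers on a single strand of a short toy string and makes no attempt to reproduce the authors' pipeline, which operated on the full human reference genome; the function name `non_singleton_fraction` and the toy sequence are assumptions introduced here for illustration only.

```python
from collections import Counter


def non_singleton_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions in seq whose k-mer occurs more than once.

    Simplifying assumptions (not from the cited paper): forward strand only,
    exact matching, and the whole sequence held in memory.
    """
    n_positions = len(seq) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be between 1 and len(seq)")
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # Sum the occurrences of every k-mer seen at least twice.
    repeated_positions = sum(c for c in counts.values() if c > 1)
    return repeated_positions / n_positions


if __name__ == "__main__":
    toy = "ACGTACGTTTGACCAACGTACG"  # made-up sequence for illustration
    for k in (2, 4, 6, 8):
        print(k, round(non_singleton_fraction(toy, k), 3))
```

In this sketch a read of length k placed at a non-singleton position could not be mapped uniquely, which is the sense in which the abstract ties the non-singleton proportion to mappability.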
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA.
|
34
|
Valouev A, Weng Z, Sweeney RT, Varma S, Le QT, Kong C, Sidow A, West RB. Discovery of recurrent structural variants in nasopharyngeal carcinoma. Genome Res 2013; 24:300-9. [PMID: 24214394 PMCID: PMC3912420 DOI: 10.1101/gr.156224.113] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
We present the discovery of genes recurrently involved in structural variation in nasopharyngeal carcinoma (NPC) and the identification of a novel type of somatic structural variant. We identified the variants with high complexity mate-pair libraries and a novel computational algorithm specifically designed for tumor-normal comparisons, SMASH. SMASH combines signals from split reads and mate-pair discordance to detect somatic structural variants. We demonstrate a >90% validation rate and a breakpoint reconstruction accuracy of 3 bp by Sanger sequencing. Our approach identified three in-frame gene fusions (YAP1-MAML2, PTPLB-RSRC1, and SP3-PTK2) that had strong levels of expression in corresponding NPC tissues. We found two cases of a novel type of structural variant, which we call “coupled inversion,” one of which produced the YAP1-MAML2 fusion. To investigate whether the identified fusion genes are recurrent, we performed fluorescent in situ hybridization (FISH) to screen 196 independent NPC cases. We observed recurrent rearrangements of MAML2 (three cases), PTK2 (six cases), and SP3 (two cases), corresponding to a combined rate of structural variation recurrence of 6% among tested NPC tissues.
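The combined recurrence rate quoted above can be checked with a short arithmetic sketch. It assumes the three groups of FISH-positive cases do not overlap, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Case counts taken from the abstract; non-overlap between genes is an assumption.
rearranged_cases = {"MAML2": 3, "PTK2": 6, "SP3": 2}
screened_cases = 196

total_rearranged = sum(rearranged_cases.values())
recurrence_rate = total_rearranged / screened_cases

print(f"{total_rearranged} of {screened_cases} cases = {recurrence_rate:.1%}")
```

Under that assumption, 11 of 196 cases is about 5.6%, consistent with the rounded 6% reported.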
Affiliation(s)
- Anton Valouev
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, California 90087, USA
|
35
|
Xue W, Li JT, Zhu YP, Hou GY, Kong XF, Kuang YY, Sun XW. L_RNA_scaffolder: scaffolding genomes with transcripts. BMC Genomics 2013; 14:604. [PMID: 24010822 PMCID: PMC3846640 DOI: 10.1186/1471-2164-14-604] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 08/28/2013] [Accepted: 09/03/2013] [Indexed: 11/25/2022] Open
Abstract
Background Generation of large mate-pair libraries is necessary for de novo genome assembly but the procedure is complex and time-consuming. Furthermore, in some complex genomes, it is hard to increase the N50 length even with large mate-pair libraries, which leads to low transcript coverage. Thus, it is necessary to develop other simple scaffolding approaches, to at least solve the elongation of transcribed fragments. Results We describe L_RNA_scaffolder, a novel genome scaffolding method that uses long transcriptome reads to order, orient and combine genomic fragments into larger sequences. To demonstrate the accuracy of the method, the zebrafish genome was scaffolded. With expanded human transcriptome data, the N50 of the human genome was doubled and L_RNA_scaffolder outperformed most scaffolding results from existing scaffolders that employ mate-pair libraries. In these two examples, the transcript coverage was almost complete, especially for long transcripts. We applied L_RNA_scaffolder to the highly polymorphic pearl oyster draft genome and the gene model length significantly increased. Conclusions The simplicity and high throughput of RNA-seq data make this approach suitable for genome scaffolding. L_RNA_scaffolder is available at http://www.fishbrowser.org/software/L_RNA_scaffolder.
Affiliation(s)
- Wei Xue
- The Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing 100141, China.
|
36
|
Chen GQ, Zhuang QY, Wang KC, Liu S, Shao JZ, Jiang WM, Hou GY, Li JP, Yu JM, Li YP, Chen JM. Identification and survey of a novel avian coronavirus in ducks. PLoS One 2013; 8:e72918. [PMID: 24023656 PMCID: PMC3758261 DOI: 10.1371/journal.pone.0072918] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 04/17/2013] [Accepted: 07/16/2013] [Indexed: 01/08/2023] Open
Abstract
The rapid discovery of novel viruses using next generation sequencing (NGS) technologies, including DNA-Seq and RNA-Seq, has greatly expanded our understanding of viral diversity in recent years. The timely identification of novel viruses using NGS technologies is also important for controlling emerging infectious diseases caused by novel viruses. In this study, we identified a novel duck coronavirus (CoV), distinct from chicken infectious bronchitis virus (IBV), using RNA-Seq. The novel duck-specific CoV was a potential novel species within the genus Gammacoronavirus, as indicated by sequences of three regions in the viral 1b gene. We also performed a survey of CoVs in domestic fowls in China using reverse-transcription polymerase chain reaction (RT-PCR), targeting the viral nucleocapsid (N) gene. A total of 102 CoV positives were identified through the survey. Phylogenetic analysis of the viral N sequences suggested that CoVs in domestic fowls have diverged into several region-specific or host-specific clades or subclades worldwide, and that IBVs can infect ducks, geese and pigeons, although they mainly circulate in chickens. Moreover, this study provided novel data supporting the notion that some host-specific CoVs other than IBVs circulate in ducks, geese and pigeons, and indicated that the novel duck-specific CoV identified through RNA-Seq in this study is genetically closer to some CoVs circulating in wild water fowls. Taken together, this study provides new insight into the diversity, distribution, evolution and control of avian CoVs.
Affiliation(s)
- Gui-Qian Chen
  - Institute of Cell Biology and Genetics, College of Life Sciences, Zhejiang University, Hangzhou, China
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Qing-Ye Zhuang
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Kai-Cheng Wang
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Shuo Liu
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Jian-Zhong Shao
  - Institute of Cell Biology and Genetics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Wen-Ming Jiang
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Guang-Yu Hou
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Jin-Ping Li
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Jian-Min Yu
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
- Yi-Ping Li
  - Institute of Cell Biology and Genetics, College of Life Sciences, Zhejiang University, Hangzhou, China
  - * E-mail: (JMC); (YPL)
- Ji-Ming Chen
  - The Laboratory of Avian Disease Surveillance, China Animal Health and Epidemiology Center, Qingdao, China
  - * E-mail: (JMC); (YPL)
|
37
|
Abstract
We report the imminent completion of a set of reference genome assemblies for 16 species of Anopheles mosquitoes. In addition to providing a generally useful resource for comparative genomic analyses, these genome sequences will greatly facilitate exploration of the capacity exhibited by some Anopheline mosquito species to serve as vectors for malaria parasites. A community analysis project will commence soon to perform a thorough comparative genomic investigation of these newly sequenced genomes. Completion of this project via the use of short next-generation sequence reads required innovation in both the bioinformatic and laboratory realms, and the resulting knowledge gained could prove useful for genome sequencing projects targeting other unconventional genomes.
|
38
|
Abstract
The sequencing of large and complex genomes of crop species, facilitated by new sequencing technologies and bioinformatic approaches, has provided new opportunities for crop improvement. Current challenges include understanding how genetic variation translates into phenotypic performance in the field.
Affiliation(s)
- Michael W Bevan
  - John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Cristobal Uauy
  - John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
|
39
|
Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing. BMC Genomics 2013; 14:257. [PMID: 23590730 PMCID: PMC3648348 DOI: 10.1186/1471-2164-14-257] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 11/22/2012] [Accepted: 04/12/2013] [Indexed: 11/28/2022] Open
Abstract
Background Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. Results Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. Conclusions We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes.
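Several abstracts in this list report scaffolding gains as changes in N50. As a reminder of what that statistic measures, here is a minimal Python sketch using the common definition (the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total). The function name `n50` and the toy lengths are illustrative and are not taken from any of the cited studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half the assembly.
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


if __name__ == "__main__":
    contigs = [10, 20, 30, 40]   # toy contig lengths (arbitrary units)
    scaffolds = [30, 70]         # the same bases after joining contigs pairwise
    print(n50(contigs), n50(scaffolds))
```

Because joining contigs leaves the total length unchanged while concentrating it in fewer, longer sequences, N50 rises after scaffolding even though no new sequence is added.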
|
40
|
Ozturk F, Li Y, Zhu X, Guda C, Nawshad A. Systematic analysis of palatal transcriptome to identify cleft palate genes within TGFβ3-knockout mice alleles: RNA-Seq analysis of TGFβ3 Mice. BMC Genomics 2013; 14:113. [PMID: 23421592 PMCID: PMC3618314 DOI: 10.1186/1471-2164-14-113] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 10/12/2012] [Accepted: 02/13/2013] [Indexed: 12/19/2022] Open
Abstract
Background In humans, cleft palate (CP) is one of the most common birth defects, with a complex genetic and environmental etiology. TGFβ3 has been established as an important regulator of palatal fusion in mice and it has been shown that TGFβ3-null mice exhibit CP without any other major deformities. However, the genes that regulate cellular decisions and molecular mechanisms maintained by the TGFβ3 pathway throughout palatogenesis are predominantly unexplored. Our objective in this study was to analyze global transcriptome changes within the palate at different gestational ages in TGFβ3 knockout mice to identify TGFβ3-associated genes previously unknown to be associated with the development of cleft palate. We used deep sequencing technology, RNA-Seq, to analyze the transcriptome of TGFβ3 knockout mice at crucial stages of palatogenesis, including palatal growth (E14.5), adhesion (E15.5), and fusion (E16.5). Results The overall transcriptome analysis of TGFβ3 wildtype mice (C57BL/6) reveals that almost 6000 genes were upregulated during the transition from E14.5 to E15.5 and more than 2000 were downregulated from E15.5 to E16.5. Using bioinformatics tools and databases, we identified the most comprehensive list of CP genes (n = 322) in which mutations cause CP either in humans or mice, and analyzed their expression patterns. The expression motifs of CP genes between TGFβ3+/− and TGFβ3−/− were not significantly different from each other, and the expression of the majority of CP genes remained unchanged from E14.5 to E16.5. Using these patterns, we identified 8 unique genes within TGFβ3−/− mice (Chrng, Foxc2, H19, Kcnj13, Lhx8, Meox2, Shh, and Six3), which may function as the primary contributors to the development of cleft palate in TGFβ3−/− mice. When the significantly altered CP genes were overlaid with TGFβ signaling, all of these genes followed the Smad-dependent pathway.
Conclusions Our study represents the first analysis of the palatal transcriptome of the mouse, as well as TGFβ3 knockout mice, using deep sequencing methods. In this study, we characterized the critical regulation of palatal transcripts that may play key regulatory roles through crucial stages of palatal development. We identified potential causative CP genes in a TGFβ3 knockout model, which may lead to a better understanding of the genetic mechanisms of palatogenesis and provide novel potential targets for gene therapy approaches to treat cleft palate.
Affiliation(s)
- Ferhat Ozturk
- Department of Oral Biology, College of Dentistry, University of Nebraska Medical Center, 40th and Holdrege St, Lincoln, NE 68583, USA
|
41
|
HTS-PEG: a method for high throughput sequencing of the paired-ends of genomic libraries. PLoS One 2013; 7:e52257. [PMID: 23284958 PMCID: PMC3527410 DOI: 10.1371/journal.pone.0052257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2012] [Accepted: 11/09/2012] [Indexed: 11/19/2022] Open
Abstract
Second generation sequencing has been widely used to sequence whole genomes. Though various paired-end sequencing methods have been developed to construct long scaffolds from contigs derived from shotgun sequencing, the classical paired-end sequencing of Bacterial Artificial Chromosome (BAC) or fosmid libraries by the Sanger method still plays an important role in genome assembly. However, sequencing libraries with the Sanger method is expensive and time-consuming. Here we report a new strategy to sequence the paired-ends of genomic libraries with parallel pyrosequencing, using a Chinese amphioxus (Branchiostoma belcheri) BAC library as an example. In total, approximately 12,670 non-redundant paired-end sequences were generated. Mapping them to the primary scaffolds of Chinese amphioxus, we obtained 413 ultra-scaffolds from 1,182 primary scaffolds, and the N50 scaffold length was increased by approximately 55 kb, which is about a 10% improvement. We provide a universal and cost-effective method for sequencing the ultra-long paired-ends of genomic libraries. This method can be very easily implemented in other second generation sequencing platforms.
|
42
|
Capturing native long-range contiguity by in situ library construction and optical sequencing. Proc Natl Acad Sci U S A 2012; 109:18749-54. [PMID: 23112150 DOI: 10.1073/pnas.1202680109] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022] Open
Abstract
The relatively short read lengths associated with the most cost-effective DNA sequencing technologies have limited their use in de novo genome assembly, structural variation detection, and haplotype-resolved genome sequencing. Consequently, there is a strong need for methods that capture various scales of contiguity information at a throughput commensurate with the current scale of massively parallel sequencing. We propose in situ library construction and optical sequencing on the flow cells of currently available massively parallel sequencing platforms as an efficient means of capturing both contiguity information and primary sequence with a single technology. In this proof-of-concept study, we demonstrate basic feasibility by generating >30,000 Escherichia coli paired-end reads separated by 1, 2, or 3 kb using in situ library construction on standard Illumina flow cells. We also show that it is possible to stretch single molecules ranging from 3 to 8 kb on the surface of a flow cell before in situ library construction, thereby enabling the production of clusters whose physical relationship to one another on the flow cell is related to genomic distance.
|