51
|
Kumar P, Choudhary M, Jat BS, Kumar B, Singh V, Kumar V, Singla D, Rakshit S. Skim sequencing: an advanced NGS technology for crop improvement. J Genet 2021. [DOI: 10.1007/s12041-021-01285-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
|
52
|
Bai Y, Lin W, Xu J, Song J, Yang D, Chen YE, Li L, Li Y, Wang Z, Zhang J. Improving the genome assembly of rabbits with long-read sequencing. Genomics 2021; 113:3216-3223. [PMID: 34051323 DOI: 10.1016/j.ygeno.2021.05.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/09/2020] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 10/21/2022]
Abstract
The European rabbit (Oryctolagus cuniculus) is important as a biomedical model given its unique features in immunity and metabolism. The current reference genome, OryCun2.0, established with whole-genome shotgun sequencing, was quite fragmented and had not been updated for ten years. In this work, we provide a new rabbit genome assembly, UM_NZW_1.0, which improves on OryCun2.0 by leveraging the contig lengths obtained from long-read sequencing and a wealth of available Illumina paired-end sequence data. UM_NZW_1.0 showed a remarkable increase in continuity compared with OryCun2.0, with a contig N50 5 times longer and approximately 75% of gaps closed. Many of the closed gaps overlapped with protein-coding genes or transcriptional features, resulting in improved gene annotations. In particular, UM_NZW_1.0 presents a more complete landscape of the MHC region and the IGH locus, and therefore provides a valuable resource for future research on rabbits.
Affiliation(s)
- Yiqin Bai
- State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Weili Lin
- Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jie Xu
- Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
| | - Jun Song
- Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
| | - Dongshan Yang
- Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
| | - Y Eugene Chen
- Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
| | - Lin Li
- State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China; School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
| | - Yixue Li
- Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China.
| | - Zhen Wang
- Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
| | - Jifeng Zhang
- Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA.
| |
|
53
|
Development of 79 SNP markers to individually genotype and sex-type endangered mountain gorillas (Gorilla beringei beringei). Conserv Genet Resour 2021. [DOI: 10.1007/s12686-021-01217-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Abstract
The mountain gorilla (Gorilla beringei beringei) is one of two endangered subspecies of eastern gorilla. The principal approach to monitoring the two extant mountain gorilla populations has been to use fecal surveys to obtain DNA profiles for individuals that are then used for capture-recapture-based estimates of abundance. To date, 11 to 14 microsatellites have been used for this purpose. To adapt to ongoing changes in genotyping technologies and to facilitate the analysis of fecal DNA samples by multiple laboratories, we developed a panel of single nucleotide polymorphism (SNP) markers that can be used for future gorilla monitoring. We used published short-read data sets for 3 individuals to develop a suite of 79 SNPs, including two sex markers, for a Fluidigm platform. This marker set provided high resolution to differentiate individuals and will facilitate future monitoring, leaving room for additional SNPs to be included in a 96-assay format.
|
54
|
Robinson JA, Bowie RCK, Dudchenko O, Aiden EL, Hendrickson SL, Steiner CC, Ryder OA, Mindell DP, Wall JD. Genome-wide diversity in the California condor tracks its prehistoric abundance and decline. Curr Biol 2021; 31:2939-2946.e5. [PMID: 33989525 DOI: 10.1016/j.cub.2021.04.035] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/21/2020] [Revised: 04/05/2021] [Accepted: 04/14/2021] [Indexed: 02/06/2023]
Abstract
Due to their small population sizes, threatened and endangered species frequently suffer from a lack of genetic diversity, potentially leading to inbreeding depression and reduced adaptability [1]. During the latter half of the twentieth century, North America's largest soaring bird [2], the California condor (Gymnogyps californianus; Critically Endangered [3]), briefly went extinct in the wild. Though condors once ranged throughout North America, by 1982 only 22 individuals remained. Following decades of captive breeding and release efforts, there are now >300 free-flying wild condors and ∼200 in captivity. The condor's recent near-extinction from lead poisoning, poaching, and loss of habitat is well documented [4], but much about its history remains obscure. To fill this gap and aid future management of the species, we produced a high-quality chromosome-length genome assembly for the California condor and analyzed its genome-wide diversity. For comparison, we also examined the genomes of two close relatives: the Andean condor (Vultur gryphus; Vulnerable [3]) and the turkey vulture (Cathartes aura; Least Concern [3]). The genomes of all three species show evidence of historic population declines. Interestingly, the California condor genome retains a high degree of variation, which our analyses reveal is a legacy of its historically high abundance. Correlations between genome-wide diversity and recombination rate further suggest a history of purifying selection against linked deleterious alleles, boding well for future restoration. We show how both long-term evolutionary forces and recent inbreeding have shaped the genome of the California condor, and provide crucial genomic resources to enable future research and conservation.
Affiliation(s)
- Jacqueline A Robinson
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| | - Rauri C K Bowie
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA; Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, CA, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Center for Theoretical and Biological Physics, Rice University, Houston, TX, USA
| | - Erez Lieberman Aiden
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Center for Theoretical and Biological Physics, Rice University, Houston, TX, USA; Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong, China; Faculty of Science, UWA School of Agriculture and Environment, University of Western Australia, Perth, WA, Australia
| | | | - Cynthia C Steiner
- San Diego Zoo Wildlife Alliance, Beckman Center for Conservation Research, Escondido, CA, USA
| | - Oliver A Ryder
- San Diego Zoo Wildlife Alliance, Beckman Center for Conservation Research, Escondido, CA, USA; Department of Evolution, Behavior, and Ecology, University of California, San Diego, San Diego, CA, USA
| | - David P Mindell
- Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, CA, USA
| | - Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| |
|
55
|
Zheng S, Shao F, Tao W, Liu Z, Long J, Wang X, Zhang S, Zhao Q, Carleton KL, Kocher TD, Jin L, Wang Z, Peng Z, Wang D, Zhang Y. Chromosome-level assembly of southern catfish (Silurus meridionalis) provides insights into visual adaptation to nocturnal and benthic lifestyles. Mol Ecol Resour 2021; 21:1575-1592. [PMID: 33503304 DOI: 10.1111/1755-0998.13338] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/09/2020] [Revised: 01/13/2021] [Accepted: 01/22/2021] [Indexed: 01/07/2023]
Abstract
The Southern catfish (Silurus meridionalis) is a nocturnal and benthic freshwater fish endemic to the Yangtze River and its tributaries. In this study, we constructed a chromosome-level draft genome of S. meridionalis using 69.7-Gb Nanopore long reads and 49.5-Gb Illumina short reads. The genome assembly was 741.2 Mb in size with a contig N50 of 13.19 Mb. An additional 116.4 Gb of Bionano and 77.4 Gb of Hi-C data were applied to assemble contigs into scaffolds and further into 29 chromosomes, resulting in a 738.9-Mb genome with a scaffold N50 of 28.04 Mb. A total of 22,965 protein-coding genes were predicted from the genome with 22,519 (98.06%) genes functionally annotated. Comparative genomic and transcriptomic analyses revealed a rod-dominated visual system which was responsible for scotopic vision. The absence of cone opsins SWS1 and SWS2 resulted in the lack of ultraviolet and blue violet sensitivity. Mutations at key amino acid sites of RH1.1, RH1.2 and RH2 resulted in spectral tuning good for dim light vision and narrow colour vision. A higher expression level of rod phototransduction genes than that of cone genes and higher rod-to-cone ratio led to higher optical sensitivity under dim light conditions. In addition, analysis of the genes involved in eye morphogenesis and development revealed the loss of some conserved noncoding elements, which might be associated with the small eyes in catfish. Together, our study provides important clues for the adaptation of the catfish visual system to the nocturnal and benthic lifestyles. The draft genome of S. meridionalis represents a valuable resource for studies of the molecular mechanisms of ecological adaptation.
Affiliation(s)
- Shuqing Zheng
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Feng Shao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Wenjing Tao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Zhilong Liu
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Juan Long
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Xiaoshuang Wang
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Shuai Zhang
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Qingyuan Zhao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Karen L Carleton
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Thomas D Kocher
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Li Jin
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Zhijian Wang
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Zuogang Peng
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Deshou Wang
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| | - Yaoguang Zhang
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), School of Life Sciences, Key Laboratory of Aquatic Science of Chongqing, Southwest University, Chongqing, P. R. China
| |
|
56
|
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, Montinaro F, Gordon DS, Storer JM, Haukness M, Fiddes IT, Murali SC, Dishuck PC, Hsieh P, Harvey WT, Audano PA, Mercuri L, Piccolo I, Antonacci F, Munson KM, Lewis AP, Baker C, Underwood JG, Hoekzema K, Huang TH, Sorensen M, Walker JA, Hoffman J, Thibaud-Nissen F, Salama SR, Pang AWC, Lee J, Hastie AR, Paten B, Batzer MA, Diekhans M, Ventura M, Eichler EE. A high-quality bonobo genome refines the analysis of hominid evolution. Nature 2021; 594:77-81. [PMID: 33953399 PMCID: PMC8172381 DOI: 10.1038/s41586-021-03519-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/21/2020] [Accepted: 04/07/2021] [Indexed: 12/17/2022]
Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3–5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome. A high-quality bonobo genome assembly provides insights into incomplete lineage sorting in hominids and its relevance to gene evolution and the genetic relationship among living hominids.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - LaDeana W Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jason D Fernandes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Francesco Montinaro
- Department of Biology, University of Bari, Bari, Italy; Estonian Biocentre, Institute of Genomics, Tartu, Estonia
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | | | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Shwetha Canchi Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tzu-Hsueh Huang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jerilyn A Walker
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R Salama
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA; Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mario Ventura
- Department of Biology, University of Bari, Bari, Italy.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
|
57
|
Nattestad M, Aboukhalil R, Chin CS, Schatz MC. Ribbon: intuitive visualization for complex genomic variation. Bioinformatics 2021; 37:413-415. [PMID: 32766814 DOI: 10.1093/bioinformatics/btaa680] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 01/20/2020] [Revised: 06/15/2020] [Accepted: 07/21/2020] [Indexed: 01/08/2023] Open
Abstract
Summary: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. Availability and implementation: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maria Nattestad
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | | | - Michael C Schatz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
58
|
Zhou R, Li ST, Yao WY, Xie CD, Chen Z, Zeng ZJ, Wang D, Xu K, Shen ZJ, Mu Y, Bao W, Jiang W, Li R, Liang Q, Li K. The Meishan pig genome reveals structural variation-mediated gene expression and phenotypic divergence underlying Asian pig domestication. Mol Ecol Resour 2021; 21:2077-2092. [PMID: 33825319 DOI: 10.1111/1755-0998.13396] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/03/2020] [Revised: 03/17/2021] [Accepted: 03/29/2021] [Indexed: 01/27/2023]
Abstract
There are wide genomic and phenotypic differences between Asian and European pig breeds, yet the current reference genome is the European Duroc pig genome. A high-quality pig genome is lacking for genetic analysis of agricultural traits in Asian pigs. Here, using a hybrid approach, a high-quality reference genome (MSCAAS v1) for the Asian Meishan breed is assembled with a contig N50 size of 48.05 Mb. MSCAAS v1 outperforms the Duroc genome as a reference genome for Asian breeds. Genomic comparison reveals 49,103 structural variations (SVs) between Meishan and Duroc, 4.02% of which are Asian-specific SVs (AP-SVs). Notably, a 30-Mb hotspot for AP-SVs on chromosome X enriched for genes associated with Asian-pig-specific phenotypes is present in Asian domestic pig breeds, but absent in Asian wild boars, suggesting that Asian domestic breeds share a common ancestor. Interbreed transcriptomics reveals transcriptional suppression roles of AP-SVs in multiple tissues. Finally, transcriptional regulation in the intron of IGF2R is reported, as a genomic SV (274-bp deletion) in the Tibetan pig limits its growth compared to domestic pig breeds. In summary, this study provides insights regarding the genetic changes underlying pig domestication and presents a benchmark-setting resource for the utilization of agriculturally valuable loci in Asian pigs.
Affiliation(s)
- Rong Zhou
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Shang-Tong Li
- National Institute of Biological Sciences (NIBS), Beijing, China
| | - Wen-Ye Yao
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Chun-Di Xie
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | | | - Zhi-Jie Zeng
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China; College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
| | - Di Wang
- Novogene Bioinformatics Institute, Beijing, China
| | - Kui Xu
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Zhao-Ji Shen
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan, China; Fulcrum gene science and technology (Beijing) Ltd, Beijing, China
| | - Yulian Mu
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Wenbin Bao
- College of Animal Science and Technology, Yangzhou University, Yangzhou, China
| | - Wenkai Jiang
- Novogene Bioinformatics Institute, Beijing, China
| | - Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing, China
| | - Qiqi Liang
- Novogene Bioinformatics Institute, Beijing, China
| | - Kui Li
- State Key Laboratory of Animal Nutrition, Key Laboratory of Animal Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| |
|
59
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
|
60
|
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 401] [Impact Index Per Article: 100.3] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
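The contiguity figure quoted in this abstract (the minimum contig length needed to cover 50% of the genome, commonly called N50) can be illustrated with a minimal sketch. It assumes the total is the sum of the supplied contig lengths rather than an independent genome-size estimate; the function name `n50` and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the summed contig length.
    Assumption: the total is the sum of the given lengths, not an
    external genome-size estimate.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))  # 8: the contigs 10 + 9 + 8 = 27 reach half of the total 54
```

Sorting from longest to shortest means the first contig at which the running sum reaches half the total is, by construction, the shortest contig needed to reach that coverage.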
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
| | - Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
| | - Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
61
|
Sharma V, Hecker N, Walther F, Stuckas H, Hiller M. Convergent Losses of TLR5 Suggest Altered Extracellular Flagellin Detection in Four Mammalian Lineages. Mol Biol Evol 2021; 37:1847-1854. [PMID: 32145026 DOI: 10.1093/molbev/msaa058] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
Toll-like receptors (TLRs) play an important role for the innate immune system by detecting pathogen-associated molecular patterns. TLR5 encodes the major extracellular receptor for bacterial flagellin and frequently evolves under positive selection, consistent with coevolutionary arms races between the host and pathogens. Furthermore, TLR5 is inactivated in several vertebrates and a TLR5 stop codon polymorphism is widespread in human populations. Here, we analyzed the genomes of 120 mammals and discovered that TLR5 is convergently lost in four independent lineages, comprising guinea pigs, Yangtze river dolphin, pinnipeds, and pangolins. Validated inactivating mutations, absence of protein-coding transcript expression, and relaxed selection on the TLR5 remnants confirm these losses. PCR analysis further confirmed the loss of TLR5 in the pinniped stem lineage. Finally, we show that TLR11, encoding a second extracellular flagellin receptor, is also absent in these four lineages. Independent losses of TLR5 and TLR11 suggest that a major pathway for detecting flagellated bacteria is not essential for different mammals and predicts an impaired capacity to sense extracellular flagellin.
Affiliation(s)
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Center for Systems Biology Dresden, Dresden, Germany; CRTD-DFG Center for Regenerative Therapies Dresden, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden; Paul Langerhans Institute Dresden (PLID) of the Helmholtz Center Munich at University Hospital Carl Gustav Carus and Faculty of Medicine, Technische Universität Dresden, Dresden; German Center for Diabetes Research (DZD), Munich, Neuherberg, Germany
| | - Nikolai Hecker
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Center for Systems Biology Dresden, Dresden, Germany
| | - Felix Walther
- Senckenberg Natural History Collections Dresden, Senckenberg - Leibniz Institution for Biodiversity and Earth System Research, Dresden, Germany
| | - Heiko Stuckas
- Senckenberg Natural History Collections Dresden, Senckenberg - Leibniz Institution for Biodiversity and Earth System Research, Dresden, Germany
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Center for Systems Biology Dresden, Dresden, Germany
|
62
|
Blom MPK. Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Mol Ecol 2021; 30:5935-5948. [PMID: 33786900 DOI: 10.1111/mec.15909] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/31/2020] [Revised: 03/06/2021] [Accepted: 03/22/2021] [Indexed: 12/11/2022]
Abstract
The technological ability to characterize genetic variation at a genome-wide scale provides an unprecedented opportunity to study the genetic underpinnings and evolutionary mechanisms that promote and sustain biodiversity. The transition from short- to long-read sequencing is particularly promising and allows a more holistic view on any changes in genetic diversity across time and space. Long-read sequencing has tremendous potential but sequencing success strongly depends on the long-range integrity of DNA molecules and therefore on the availability of high-quality tissue samples. With the scope of genomic experiments expanding and wild populations simultaneously disappearing at an unprecedented rate, access to high-quality samples may soon be a major concern for many projects. The need for high-quality biodiversity tissue archives is therefore urgent but sampling and preserving high-quality samples is not a trivial exercise. In this review, I will briefly outline how long-read sequencing can benefit the study of molecular ecology, how this will substantially increase the demand for high-quality tissues and why it is challenging to preserve DNA integrity. I will then provide an overview of preservation approaches and end with a call for support to acknowledge the efforts needed to assemble high-quality tissue archives. In doing so, I hope to simultaneously motivate field biologists to expand sampling practices and molecular biologists to develop (cost) efficient guidelines for the sampling and long-term storage of tissues. A concerted, interdisciplinary, effort is needed to catalogue the genetic variation underlying contemporary biodiversity and will eventually provide a critical resource for future studies.
Affiliation(s)
- Mozes P K Blom
- Leibniz Institut für Evolutions- und Biodiversitätsforschung, Museum für Naturkunde, Berlin, Germany
|
63
|
Feng X, Li H. Higher Rates of Processed Pseudogene Acquisition in Humans and Three Great Apes Revealed by Long-Read Assemblies. Mol Biol Evol 2021; 38:2958-2966. [PMID: 33681998 DOI: 10.1093/molbev/msab062] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
LINE-1-mediated retrotransposition of protein-coding mRNAs is an active process in modern humans for both germline and somatic genomes. Prior works that surveyed human data mostly relied on detecting discordant mappings of paired-end short reads, or exon junctions contained in short reads. Moreover, there have been few genome-wide comparisons between gene retrocopies in great apes and humans. In this study, we introduced a more sensitive and accurate method to identify processed pseudogenes. Our method utilizes long-read assemblies, and more importantly, is able to provide full-length retrocopy sequences as well as flanking regions which are missed by short-read based methods. From 22 human individuals, we pinpointed 40 processed pseudogenes that are not present in the human reference genome GRCh38 and identified 17 pseudogenes that are in GRCh38 but absent from some input individuals. This represents a significantly higher discovery rate than previous reports (39 pseudogenes not in the reference genome out of 939 individuals). We also provided an overview of lineage-specific retrocopies in chimpanzee, gorilla, and orangutan genomes.
Affiliation(s)
- Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
64
|
Wang C, Wallerman O, Arendt ML, Sundström E, Karlsson Å, Nordin J, Mäkeläinen S, Pielberg GR, Hanson J, Ohlsson Å, Saellström S, Rönnberg H, Ljungvall I, Häggström J, Bergström TF, Hedhammar Å, Meadows JRS, Lindblad-Toh K. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun Biol 2021; 4:185. [PMID: 33568770 PMCID: PMC7875987 DOI: 10.1038/s42003-021-01698-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 08/12/2020] [Accepted: 12/17/2020] [Indexed: 12/13/2022]
Abstract
We present GSD_1.0, a high-quality domestic dog reference genome with chromosome length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with generated and existing long and short read RNA-seq, miRNA-seq and ATAC-seq, revealed that 32.1% of lifted over CanFam3.1 gaps harboured previously hidden functional elements, including promoters, genes and miRNAs in GSD_1.0. A catalogue of canine "dark" regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein coding genes showed that 1.4% of these could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.
Affiliation(s)
- Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Ola Wallerman
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Maja-Louise Arendt
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Veterinary Clinical Sciences, University of Copenhagen, Frederiksberg D, Denmark
| | - Elisabeth Sundström
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jessika Nordin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Suvi Mäkeläinen
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Gerli Rosengren Pielberg
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jeanette Hanson
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åsa Ohlsson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Sara Saellström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Henrik Rönnberg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Ingrid Ljungvall
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jens Häggström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Tomas F Bergström
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åke Hedhammar
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
65
|
Joubran SS, Cassin-Sackett L. Genomic resources for an ecologically important rodent, Gunnison’s prairie dogs (Cynomys gunnisoni). Conserv Genet Resour 2021. [DOI: 10.1007/s12686-021-01192-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
66
|
Zhuo X, Du AY, Pehrsson EC, Li D, Wang T. Epigenomic differences in the human and chimpanzee genomes are associated with structural variation. Genome Res 2021; 31:279-290. [PMID: 33303495 PMCID: PMC7849402 DOI: 10.1101/gr.263491.120] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/15/2020] [Accepted: 12/03/2020] [Indexed: 12/15/2022]
Abstract
Structural variation (SV), including insertions and deletions (indels), is a primary mechanism of genome evolution. However, the mechanism by which SV contributes to epigenome evolution is poorly understood. In this study, we characterized the association between lineage-specific indels and epigenome differences between human and chimpanzee to investigate how SVs might have shaped the epigenetic landscape. By intersecting medium-to-large human-chimpanzee indels (20 bp-50 kb) with putative promoters and enhancers in cranial neural crest cells (CNCCs) and repressed regions in induced pluripotent cells (iPSCs), we found that 12% of indels overlap putative regulatory and repressed regions (RRRs), and 15% of these indels are associated with lineage-biased RRRs. Indel-associated putative enhancer and repressive regions are approximately 1.3 times and approximately three times as likely to be lineage-biased, respectively, as those not associated with indels. We found a twofold enrichment of medium-sized indels (20-50 bp) in CpG island (CGI)-containing promoters than expected by chance. Lastly, from human-specific transposable element insertions, we identified putative regulatory elements, including NR2F1-bound putative CNCC enhancers derived from SVAs and putative iPSC promoters derived from LTR5s. Our results show that different types of indels are associated with specific epigenomic diversity between human and chimpanzee.
Affiliation(s)
- Xiaoyu Zhuo
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Alan Y Du
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Erica C Pehrsson
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Daofeng Li
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
|
67
|
Margres MJ, Rautsaw RM, Strickland JL, Mason AJ, Schramer TD, Hofmann EP, Stiers E, Ellsworth SA, Nystrom GS, Hogan MP, Bartlett DA, Colston TJ, Gilbert DM, Rokyta DR, Parkinson CL. The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype. Proc Natl Acad Sci U S A 2021; 118:e2014634118. [PMID: 33468678 PMCID: PMC7848695 DOI: 10.1073/pnas.2014634118] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 12/13/2022]
Abstract
Variation in gene regulation is ubiquitous, yet identifying the mechanisms producing such variation, especially for complex traits, is challenging. Snake venoms provide a model system for studying the phenotypic impacts of regulatory variation in complex traits because of their genetic tractability. Here, we sequence the genome of the Tiger Rattlesnake, which possesses the simplest and most toxic venom of any rattlesnake species, to determine whether the simple venom phenotype is the result of a simple genotype through gene loss or a complex genotype mediated through regulatory mechanisms. We generate the most contiguous snake-genome assembly to date and use this genome to show that gene loss, chromatin accessibility, and methylation levels all contribute to the production of the simplest, most toxic rattlesnake venom. We provide the most complete characterization of the venom gene-regulatory network to date and identify key mechanisms mediating phenotypic variation across a polygenic regulatory network.
Affiliation(s)
- Mark J Margres
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
| | - Rhett M Rautsaw
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
| | - Jason L Strickland
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Biology, University of South Alabama, Mobile, AL 36688
| | - Andrew J Mason
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
| | - Tristan D Schramer
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
| | - Erich P Hofmann
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
| | - Erin Stiers
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
| | - Schyler A Ellsworth
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Gunnar S Nystrom
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Michael P Hogan
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Daniel A Bartlett
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Timothy J Colston
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Darin R Rokyta
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Christopher L Parkinson
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Forestry and Environmental Conservation, Clemson University, Clemson, SC 29634
|
68
|
Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, Liachko I, Haryoko T, Jønsson KA, Zhou Q, Irestedt M, Suh A. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour 2021; 21:263-286. [PMID: 32937018 PMCID: PMC7757076 DOI: 10.1111/1755-0998.13252] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 04/14/2020] [Revised: 08/21/2020] [Accepted: 08/26/2020] [Indexed: 01/09/2023]
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Affiliation(s)
- Valentina Peona
- Department of Ecology and Genetics, Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology, Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
| | - Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany
| | - Luohao Xu
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
| | - Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich-Schiller-University Jena, Jena, Germany
| | - Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
| | - Tri Haryoko
- Research Centre for Biology, Museum Zoologicum Bogoriense, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
| | - Knud A. Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Qi Zhou
- Department of Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Alexander Suh
- Department of Ecology and Genetics, Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology, Systematic Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- School of Biological Sciences, Organisms and the Environment, University of East Anglia, Norwich, UK
| |
Collapse
|
69
|
Xavier MJ, Salas-Huetos A, Oud MS, Aston KI, Veltman JA. Disease gene discovery in male infertility: past, present and future. Hum Genet 2021; 140:7-19. [PMID: 32638125 PMCID: PMC7864819 DOI: 10.1007/s00439-020-02202-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 01/16/2020] [Accepted: 06/26/2020] [Indexed: 12/13/2022]
Abstract
Identifying the genes causing male infertility is important to increase our biological understanding as well as the diagnostic yield and clinical relevance of genetic testing in this disorder. While significant progress has been made in some areas, mainly in our knowledge of the genes underlying rare qualitative sperm defects, the same cannot be said for the genetics of quantitative sperm defects. Technological advances and approaches in genomics are critical for the process of disease gene identification. In this review we highlight the impact of various technological developments on male infertility gene discovery as well as functional validation, going from the past to the present and the future. In particular, we draw attention to the use of unbiased genomics approaches, the development of increasingly relevant functional assays and the importance of large-scale international collaboration to advance disease gene identification in male infertility.
Affiliation(s)
- M J Xavier
  - Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle-upon-Tyne, UK
- A Salas-Huetos
  - Andrology and IVF Laboratory, Department of Surgery (Urology), University of Utah, Salt Lake City, USA
- M S Oud
  - Department of Human Genetics, Radboud University Medical Centre, Nijmegen, Netherlands
- K I Aston
  - Andrology and IVF Laboratory, Department of Surgery (Urology), University of Utah, Salt Lake City, USA
- J A Veltman
  - Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle-upon-Tyne, UK
|
70
|
Fontsere C, Alvarez-Estape M, Lester J, Arandjelovic M, Kuhlwilm M, Dieguez P, Agbor A, Angedakin S, Ayuk Ayimisin E, Bessone M, Brazzola G, Deschner T, Eno-Nku M, Granjon AC, Head J, Kadam P, Kalan AK, Kambi M, Langergraber K, Lapuente J, Maretti G, Jayne Ormsby L, Piel A, Robbins MM, Stewart F, Vergnes V, Wittig RM, Kühl HS, Marques-Bonet T, Hughes DA, Lizano E. Maximizing the acquisition of unique reads in noninvasive capture sequencing experiments. Mol Ecol Resour 2020; 21:745-761. [PMID: 33217149 DOI: 10.1111/1755-0998.13300] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/10/2020] [Revised: 10/15/2020] [Accepted: 11/13/2020] [Indexed: 11/30/2022]
Abstract
Noninvasive samples as a source of DNA are gaining interest in genomic studies of endangered species. However, their complex nature and low endogenous DNA content hamper the recovery of good quality data. Target capture has become a productive method to enrich the endogenous fraction of noninvasive samples, such as faeces, but its sensitivity has not yet been extensively studied. Coping with faecal samples with an endogenous DNA content below 1% is a common problem when prior selection of samples from a large collection is not possible. However, samples classified as unfavourable for target capture sequencing might be the only representatives of specific geographical locations, or the only ones able to answer the question of interest. To explore how library complexity may be increased without repeating DNA extractions and generating new libraries, in this study we captured the exome of 60 chimpanzees (Pan troglodytes) using faecal samples with very low proportions of endogenous content (<1%). Our results indicate that by performing additional hybridizations of the same libraries, the molecular complexity can be maintained to achieve higher coverage. Also, whenever possible, the starting DNA material for capture should be increased. Finally, we specifically calculated the sequencing effort needed to avoid exhausting the library complexity of enriched faecal samples with low endogenous DNA content. This study provides guidelines, schemes and tools for laboratories facing the challenges of working with noninvasive samples containing extremely low amounts of endogenous DNA.
Affiliation(s)
- Claudia Fontsere
  - Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Barcelona, Catalonia, Spain
- Marina Alvarez-Estape
  - Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Barcelona, Catalonia, Spain
- Jack Lester
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Mimi Arandjelovic
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Martin Kuhlwilm
  - Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Barcelona, Catalonia, Spain
- Paula Dieguez
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anthony Agbor
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Samuel Angedakin
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Mattia Bessone
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Gregory Brazzola
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Tobias Deschner
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Josephine Head
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Parag Kadam
  - School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Liverpool, UK
- Ammie K Kalan
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Mohamed Kambi
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Kevin Langergraber
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
  - Institute of Human Origins, Arizona State University, Tempe, AZ, USA
- Juan Lapuente
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Comoé Chimpanzee Conservation Project, Kakpin, Comoé National Park, Ivory Coast, Côte d'Ivoire
- Giovanna Maretti
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Lucy Jayne Ormsby
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Alex Piel
  - Department of Anthropology, University College London, London, UK
- Martha M Robbins
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Fiona Stewart
  - School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Liverpool, UK
  - Department of Anthropology, University College London, London, UK
- Roman M Wittig
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Taï Chimpanzee Project, Centre Suisse de Recherches Scientifiques, Abidjan, Côte d'Ivoire
- Hjalmar S Kühl
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Tomas Marques-Bonet
  - Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Barcelona, Catalonia, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- David A Hughes
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Esther Lizano
  - Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, PRBB, Barcelona, Catalonia, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
|
71
|
Lang D, Zhang S, Ren P, Liang F, Sun Z, Meng G, Tan Y, Li X, Lai Q, Han L, Wang D, Hu F, Wang W, Liu S. Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore. Gigascience 2020; 9:giaa123. [PMID: 33319909 PMCID: PMC7736813 DOI: 10.1093/gigascience/giaa123] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 02/28/2020] [Revised: 07/02/2020] [Accepted: 10/08/2020] [Indexed: 02/07/2023] [Open Access]
Abstract
BACKGROUND: The availability of reference genomes has revolutionized the study of biology. Multiple competing technologies have been developed to improve the quality and robustness of genome assemblies during the past decade. The 2 widely used long-read sequencing providers, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PacBio enables high-throughput HiFi reads with base-level resolution of >99%, and ONT generated reads as long as 2 Mb. We applied the 2 up-to-date platforms to a single rice individual and then compared the 2 assemblies to investigate the advantages and limitations of each.
RESULTS: ONT ultralong reads delivered higher contiguity, producing a total of 18 contigs, of which 10 were assembled into a single chromosome, compared to 394 contigs and 3 chromosome-level contigs for the PacBio assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions: in the PacBio assembly we observed 44 falsely redundant genes and 10 falsely lost genes, leading to over- or underestimation of the gene families in those long repetitive regions. We also noted that the PacBio HiFi reads generated assemblies with considerably fewer errors at the level of single nucleotides and small insertions and deletions than the ONT assembly, which contained an average of 1.06 errors per kb and ultimately produced 1,475 incorrect gene annotations via altered or truncated protein predictions.
CONCLUSIONS: Both PacBio HiFi reads and ONT ultralong reads have their own merits. Future reference genome constructions could leverage both techniques to lessen the impact of assembly errors and subsequent annotation mistakes rooted in each.
Affiliation(s)
- Dandan Lang
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Shilai Zhang
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, No.2, North Cuihu Road, Kunming, Yunnan 650091, China
- Pingping Ren
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Fan Liang
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Zongyi Sun
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Guanliang Meng
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Yuntao Tan
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Xiaokang Li
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Qihua Lai
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Lingling Han
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Depeng Wang
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
- Fengyi Hu
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, No.2, North Cuihu Road, Kunming, Yunnan 650091, China
- Wen Wang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, No.32, East Jiaochang Road, Kunming, Yunnan 650223, China
  - Center for Ecological and Environmental Sciences, Key Laboratory for Space Bioscience & Biotechnology, Northwestern Polytechnical University, No.127, West Youyi Road, Xi'an, Shanxi 710072, China
- Shanlin Liu
  - GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China
  - Department of Entomology, College of Plant Protection, China Agricultural University, No.2, West Yuanmingyuan Road, Beijing 100193, China
|
72
|
Dayama G, Zhou W, Prado-Martinez J, Marques-Bonet T, Mills RE. Characterization of nuclear mitochondrial insertions in the whole genomes of primates. NAR Genom Bioinform 2020; 2:lqaa089. [PMID: 33575633 PMCID: PMC7671390 DOI: 10.1093/nargab/lqaa089] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/21/2020] [Revised: 05/04/2020] [Accepted: 10/15/2020] [Indexed: 12/30/2022] [Open Access]
Abstract
The transfer and integration of whole and partial mitochondrial genomes into the nuclear genomes of eukaryotes is an ongoing process that has facilitated the transfer of genes and contributed to the evolution of various cellular pathways. Many previous studies have explored the impact of these insertions, referred to as NumtS, but have focused primarily on older events that have become fixed and are therefore present in all individual genomes for a given species. We previously developed an approach to identify novel Numt polymorphisms from next-generation sequence data and applied it to thousands of human genomes. Here, we extend this analysis to 79 individuals of other great ape species including chimpanzee, bonobo, gorilla, orang-utan and also an old world monkey, macaque. We show that recent Numt insertions are prevalent in each species though at different apparent rates, with chimpanzees exhibiting a significant increase in both polymorphic and fixed Numt sequences as compared to other great apes. We further assessed positional effects in each species in terms of evolutionary time and rate of insertion and identified putative hotspots on chromosome 5 for Numt integration, providing insight into both recent polymorphic and older fixed reference NumtS in great apes in comparison to human events.
Affiliation(s)
- Gargi Dayama
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Weichen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Ryan E Mills
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
73
|
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Nicole M Foley
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- Kevin R Bredemeyer
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Mark S Springer
  - Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
|
74
|
Gillingham MAF, Montero BK, Wihelm K, Grudzus K, Sommer S, Santos PSC. A novel workflow to improve genotyping of multigene families in wildlife species: An experimental set-up with a known model system. Mol Ecol Resour 2020; 21:982-998. [PMID: 33113273 DOI: 10.1111/1755-0998.13290] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/24/2020] [Revised: 10/19/2020] [Accepted: 10/22/2020] [Indexed: 12/30/2022]
Abstract
Genotyping complex multigene families in novel systems is particularly challenging. Target primers frequently amplify simultaneously multiple loci leading to high PCR and sequencing artefacts such as chimeras and allele amplification bias. Most genotyping pipelines have been validated in nonmodel systems whereby the real genotype is unknown and the generation of artefacts may be highly repeatable. Further hindering accurate genotyping, the relationship between artefacts and genotype complexity (i.e. number of alleles per genotype) within a PCR remains poorly described. Here, we investigated the latter by experimentally combining multiple known major histocompatibility complex (MHC) haplotypes of a model organism (chicken, Gallus gallus, 43 artificial genotypes with 2-13 alleles per amplicon). In addition to well-defined 'optimal' primers, we simulated a nonmodel species situation by designing 'cross-species' primers based on sequence data from closely related Galliform species. We applied a novel open-source genotyping pipeline (ACACIA; https://gitlab.com/psc_santos/ACACIA), and compared its performance with another, previously published pipeline (AmpliSAS). Allele calling accuracy was higher when using ACACIA (98.5% versus 97% and 77.8% versus 75% for the 'optimal' and 'cross-species' data sets, respectively). Systematic allele dropout of three alleles owing to primer mismatch in the 'cross-species' data set explained high allele calling repeatability (100% when using ACACIA) despite low accuracy, demonstrating that repeatability can be misleading when evaluating genotyping workflows. Genotype complexity was positively associated with nonchimeric artefacts, chimeric artefacts (nonlinearly by levelling when amplifying more than 4-6 alleles) and allele amplification bias. Our study exemplifies and demonstrates pitfalls researchers should avoid to reliably genotype complex multigene families.
Affiliation(s)
- Mark A F Gillingham
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
- B Karina Montero
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
  - Zoological Institute, Animal Ecology and Conservation, Biocenter Grindel, Universität Hamburg, Hamburg, Germany
- Kerstin Wihelm
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
- Kara Grudzus
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
- Simone Sommer
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
- Pablo S C Santos
  - Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany
|
75
|
Li Y, Struwe WB, Kukura P. Single molecule mass photometry of nucleic acids. Nucleic Acids Res 2020; 48:e97. [PMID: 32756898 PMCID: PMC7515692 DOI: 10.1093/nar/gkaa632] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/14/2020] [Accepted: 07/29/2020] [Indexed: 12/12/2022] [Open Access]
Abstract
Mass photometry is a recently developed methodology capable of measuring the mass of individual proteins under solution conditions. Here, we show that this approach is equally applicable to nucleic acids, enabling their facile, rapid and accurate detection and quantification using sub-picomoles of sample. The ability to count individual molecules directly measures relative concentrations in complex mixtures without need for separation. Using a dsDNA ladder, we find a linear relationship between the number of bases per molecule and the associated imaging contrast for up to 1200 bp, enabling us to quantify dsDNA length with up to 2 bp accuracy. These results introduce mass photometry as an accurate, rapid and label-free single molecule method complementary to existing DNA characterization techniques.
Affiliation(s)
- Yiwen Li
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, UK
- Weston B Struwe
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, UK
- Philipp Kukura
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, South Parks Road, Oxford OX1 3QZ, UK
|
76
|
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, Johnson J, Marinescu VD, Alföldi J, Harris RS, Lindblad-Toh K, Haussler D, Karlsson E, Jarvis ED, Zhang G, Paten B. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020; 587:246-251. [PMID: 33177663 PMCID: PMC7673649 DOI: 10.1038/s41586-020-2871-y] [Citation(s) in RCA: 278] [Impact Index Per Article: 55.6] [Received: 08/24/2019] [Accepted: 07/27/2020] [Indexed: 12/11/2022]
Abstract
New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
Affiliation(s)
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Adam M Novak
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Alden Deran
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Qi Fang
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Duo Xie
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
- Shaohong Feng
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Josefin Stiller
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Diane Genereux
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Jeremy Johnson
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Voichita Dana Marinescu
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Jessica Alföldi
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Robert S Harris
  - Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Kerstin Lindblad-Toh
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- David Haussler
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Elinor Karlsson
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
  - Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Erich D Jarvis
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Guojie Zhang
  - Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Benedict Paten
  - UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
|
77
|
Buckley RM, Davis BW, Brashear WA, Farias FHG, Kuroki K, Graves T, Hillier LW, Kremitzki M, Li G, Middleton RP, Minx P, Tomlinson C, Lyons LA, Murphy WJ, Warren WC. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. PLoS Genet 2020; 16:e1008926. [PMID: 33090996 PMCID: PMC7581003 DOI: 10.1371/journal.pgen.1008926] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 03/02/2020] [Accepted: 06/10/2020] [Indexed: 12/30/2022] [Open Access]
Abstract
The domestic cat (Felis catus) numbers over 94 million in the USA alone, occupies households as a companion animal, and, like humans, suffers from cancer and common and rare diseases. However, genome-wide sequence variant information is limited for this species. To empower trait analyses, a new cat genome reference assembly was developed from PacBio long sequence reads that significantly improve sequence representation and assembly contiguity. The whole genome sequences of 54 domestic cats were aligned to the reference to identify single nucleotide variants (SNVs) and structural variants (SVs). Across all cats, 16 SNVs predicted to have deleterious impacts and in a singleton state were identified as high priority candidates for causative mutations. One candidate was a stop gain in the tumor suppressor FBXW7. The SNV is found in cats segregating for feline mediastinal lymphoma and is a candidate for inherited cancer susceptibility. SV analysis revealed a complex deletion coupled with a nearby potential duplication event that was shared privately across three unrelated cats with dwarfism and is found within a known dwarfism associated region on cat chromosome B1. This SV interrupted UDP-glucose 6-dehydrogenase (UGDH), a gene involved in the biosynthesis of glycosaminoglycans. Importantly, UGDH has not yet been associated with human dwarfism and should be screened in undiagnosed patients. The new high-quality cat genome reference and the compilation of sequence variation demonstrate the importance of these resources when searching for disease causative alleles in the domestic cat and for identification of feline biomedical models.
The practice of genomic medicine is predicated on the availability of a high quality reference genome and an understanding of the impact of genome variation. Such resources have led to countless discoveries in humans; however, by working exclusively within the framework of human genetics, our potential for understanding disease biology is limited, as similar analyses in other species have often led to novel insights. The generation of Felis_catus_9.0, a new high quality reference genome for the domestic cat, helps facilitate the expansion of genomic medicine into the Felis lineage. Using Felis_catus_9.0 we analyze the landscape of genomic variation from a collection of 54 cats within the context of human gene constraint. The distribution of variant impacts in cats is correlated with patterns of gene constraint in humans, indicating the utility of this reference for identifying novel mutations that cause phenotypes relevant to human and cat health. Moreover, structural variant analysis revealed a novel variant for feline dwarfism in UGDH, a gene that has not been associated with dwarfism in any other species, suggesting a role for UGDH in cases of undiagnosed dwarfism in humans.
Affiliation(s)
- Reuben M. Buckley
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Brian W. Davis
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley A. Brashear
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Fabiana H. G. Farias
- Department of Psychiatry, Washington University, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics, Washington University, St. Louis, Missouri, United States of America
- Kei Kuroki
- Veterinary Medical Diagnostic Laboratory, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- Tina Graves
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- LaDeana W. Hillier
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Milinn Kremitzki
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Gang Li
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Patrick Minx
- Donald Danforth Plant Science Center, St Louis, Missouri, United States of America
- Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri, United States of America
- Leslie A. Lyons
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, Missouri, United States of America
- William J. Murphy
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, College of Veterinary Medicine, Texas A&M University, College Station, Texas, United States of America
- Wesley C. Warren
- Division of Animal Sciences, School of Medicine, University of Missouri, Columbia, Missouri, United States of America
|
78
|
Li H, Feng X, Chu C. The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020; 21:265. [PMID: 33066802 PMCID: PMC7568353 DOI: 10.1186/s13059-020-02168-z] [Citation(s) in RCA: 218] [Impact Index Per Article: 43.6] [Received: 03/12/2020] [Accepted: 09/23/2020] [Indexed: 12/21/2022] Open
Abstract
The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.
Affiliation(s)
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, 02215, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02215, MA, USA.
- Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, 02215, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02215, MA, USA
- Chong Chu
- Department of Biomedical Informatics, Harvard Medical School, Boston, 02215, MA, USA
|
79
|
Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020; 21:597-614. [PMID: 32504078 PMCID: PMC7877196 DOI: 10.1038/s41576-020-0236-x] [Citation(s) in RCA: 582] [Impact Index Per Article: 116.4] [Accepted: 03/31/2020] [Indexed: 12/27/2022]
Abstract
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Affiliation(s)
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
|
80
|
Liang P, Saqib HSA, Ni X, Shen Y. Long-read sequencing and de novo genome assembly of marine medaka (Oryzias melastigma). BMC Genomics 2020; 21:640. [PMID: 32938378 PMCID: PMC7493909 DOI: 10.1186/s12864-020-07042-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/03/2020] [Accepted: 08/31/2020] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Marine medaka (Oryzias melastigma) is considered an important ecotoxicological indicator for studying the biochemical, physiological and molecular responses of marine organisms to increasing amounts of pollutants in marine and estuarine waters. RESULTS In this study, we report a high-quality and accurate de novo genome assembly of marine medaka through the integration of single-molecule sequencing, Illumina paired-end sequencing, and 10X Genomics linked-reads. The 844.17 Mb assembly is estimated to cover more than 98% of the genome and is more continuous, with fewer gaps and errors, than the previous genome assembly. Comparison of O. melastigma with closely related species showed significant expansion of gene families associated with DNA repair and ATP-binding cassette (ABC) transporter pathways. We identified 274 genes that appear to be under significant positive selection and are involved in DNA repair, cellular transportation processes, conservation and stability of the genome. The positive selection of genes and the considerable expansion in gene numbers, especially those related to stimulus responses, provide strong support for adaptations of O. melastigma under varying environmental stresses. CONCLUSIONS The highly contiguous marine medaka genome and comparative genomic analyses will increase our understanding of the mechanisms underlying its extraordinary adaptation capability and accelerate ongoing and future investigations in marine ecotoxicology.
Affiliation(s)
- Pingping Liang
- College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China
- Hafiz Sohaib Ahmed Saqib
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xiaomin Ni
- College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China
- Fudan University, Shanghai, 200240, China
- Yingjia Shen
- College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China.
|
81
|
Soifer I, Fong NL, Yi N, Ireland AT, Lam I, Sooknah M, Paw JS, Peluso P, Concepcion GT, Rank D, Hastie AR, Jojic V, Ruby JG, Botstein D, Roy MA. Fully Phased Sequence of a Diploid Human Genome Determined de Novo from the DNA of a Single Individual. G3 (Bethesda) 2020; 10:2911-2925. [PMID: 32631951 PMCID: PMC7466960 DOI: 10.1534/g3.119.400995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/12/2019] [Accepted: 06/26/2020] [Indexed: 12/17/2022]
Abstract
In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for the understanding of phenotypic trait inheritance in the cases of compound heterozygosity, allele-specific expression or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods to use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. The final genome assembly was almost fully (94%) phased with the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.
Affiliation(s)
- Ilya Soifer
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Nicole L Fong
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Nelda Yi
- Calico Life Sciences LLC, South San Francisco, CA 94080
- Irene Lam
- Calico Life Sciences LLC, South San Francisco, CA 94080
- David Rank
- Pacific Biosciences, Menlo Park, CA 94025
- J Graham Ruby
- Calico Life Sciences LLC, South San Francisco, CA 94080
|
82
|
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res 2020; 30:1291-1305. [PMID: 32801147 PMCID: PMC7545148 DOI: 10.1101/gr.263566.120] [Citation(s) in RCA: 419] [Impact Index Per Article: 83.8] [Received: 03/14/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Robert Grothe
- Pacific Biosciences, Menlo Park, California 94025, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
|
83
|
Liu J, Ali M, Zhou Q. Establishment and evolution of heterochromatin. Ann N Y Acad Sci 2020; 1476:59-77. [PMID: 32017156 PMCID: PMC7586837 DOI: 10.1111/nyas.14303] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/29/2019] [Revised: 10/31/2019] [Accepted: 01/02/2020] [Indexed: 12/12/2022]
Abstract
The eukaryotic genome is packaged into transcriptionally active euchromatin and silent heterochromatin, with most studies focused on the former encompassing the majority of protein-coding genes. The recent development of various sequencing techniques has refined this classic dichromatic partition and has better illuminated the composition, establishment, and evolution of this genomic and epigenomic "dark matter" in the context of topologically associated domains and phase-separated droplets. Heterochromatin includes genomic regions that can be densely stained by chemical dyes, which have been shown to be enriched for repetitive elements and epigenetic marks, including H3K9me2/3 and H3K27me3. Heterochromatin is usually replicated late, concentrated at the nuclear periphery or around nucleoli, and usually lacks highly expressed genes; it is now considered neither genetically inert nor developmentally static. Heterochromatin guards genome integrity against transposon activities and exerts important regulatory functions by targeting beyond its contained genes. Both its nucleotide sequences and regulatory proteins exhibit rapid coevolution between species. In addition, there are dynamic transitions between euchromatin and heterochromatin during developmental and evolutionary processes. We summarize here the ever-changing characteristics of heterochromatin and propose models and principles for the evolutionary transitions of heterochromatin that have been mainly learned from studies of Drosophila and yeast. Finally, we highlight the role of sex chromosomes in studying heterochromatin evolution.
Affiliation(s)
- Jing Liu
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
- Mujahid Ali
- Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
- Qi Zhou
- MOE Laboratory of Biosystems Homeostasis & Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
|
84
|
Kerdoncuff E, Lambert A, Achaz G. Testing for population decline using maximal linkage disequilibrium blocks. Theor Popul Biol 2020; 134:171-181. [DOI: 10.1016/j.tpb.2020.03.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/09/2019] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 02/02/2023]
|
85
|
Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, Chen Y, Hurles ME, Tyler-Smith C, Xue Y. Population Structure, Stratification, and Introgression of Human Structural Variation. Cell 2020; 182:189-199.e15. [PMID: 32531199 PMCID: PMC7369638 DOI: 10.1016/j.cell.2020.05.024] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 01/10/2020] [Revised: 03/04/2020] [Accepted: 05/12/2020] [Indexed: 02/07/2023]
Abstract
Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
Affiliation(s)
- Anders Bergström
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; The Francis Crick Institute, London NW1 1AT, UK
- Beiyuan Fu
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Alistair S Dunham
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK; EMBL-EBI, Hinxton CB10 1SD, UK
- Yuan Chen
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Yali Xue
- Wellcome Sanger Institute, Hinxton CB10 1SA, UK.
|
86
|
Near-chromosome level genome assembly of the fruit pest Drosophila suzukii using long-read sequencing. Sci Rep 2020; 10:11227. [PMID: 32641717 PMCID: PMC7343843 DOI: 10.1038/s41598-020-67373-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 01/19/2020] [Accepted: 06/02/2020] [Indexed: 12/31/2022] Open
Abstract
Over the past decade, the spotted wing Drosophila, Drosophila suzukii, has invaded Europe and America and has become a major agricultural pest in these areas, thereby prompting intense research activities to better understand its biology. Two draft genome assemblies already exist for this species but contain pervasive assembly errors and are highly fragmented, which limits their value. Our purpose here was to improve the assembly of the D. suzukii genome and to annotate it in a way that facilitates comparisons with D. melanogaster. For this, we generated PacBio long-read sequencing data and produced a novel, high-quality D. suzukii genome assembly. It is one of the largest Drosophila genomes, notably because of the expansion of its repeatome. We found that despite 16 rounds of full-sib crossings the D. suzukii strain that we sequenced has maintained high levels of polymorphism in some regions of its genome. As a consequence, the quality of the assembly of these regions was reduced. We explored possible origins of this high residual diversity, including the presence of structural variants and a possible heterogeneous admixture pattern of North American and Asian ancestry. Overall, our assembly and annotation constitute a high-quality genomic resource that can be used both for high-throughput sequencing approaches and for manipulative genetic technologies to study D. suzukii.
|
87
|
Safdar LB, Almas F, Sarfraz S, Ejaz M, Ali Z, Mahmood Z, Yang L, Tehseen MM, Ikram M, Liu S, Quraishi UM. Genome-wide association study identifies five new cadmium uptake loci in wheat. Plant Genome 2020; 13:e20030. [PMID: 33016603 DOI: 10.1002/tpg2.20030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/04/2019] [Revised: 04/26/2020] [Accepted: 04/28/2020] [Indexed: 05/28/2023]
Abstract
Cadmium (Cd) toxicity is a serious threat to future food security and health safety. To identify genetic factors contributing to Cd uptake in wheat, we conducted a genome-wide association study with genotyping from a 90K SNP array. A spring wheat diversity panel was planted under normal conditions and Cd stress (50 mg Cd/kg soil). The impact of Cd stress on agronomic traits ranged from a reduction of 16% in plant height to 93% in grain iron content. Individual genotypes showed considerable variation in Cd uptake and translocation, subdividing the panel into three groups: (1) hyper-accumulators (i.e. high Leaf_Cd and low Seed_Cd), (2) hyper-translocators (i.e. low Leaf_Cd and high Seed_Cd), and (3) moderate lines (i.e. low Leaf_Cd and low Seed_Cd). Two lines (SKD-1 and TD-1) maintained an optimum grain yield under Cd stress and were therefore considered Cd-resistant lines. Genome-wide association identified 179 SNP-trait associations for various traits, including 16 for Cd uptake at a significance level of P < .001. However, only five SNPs were significant after applying multiple testing correction. These loci were associated with seed-cadmium, grain-iron, and grain-zinc: qSCd-1A, qSCd-1D, qZn-2B1, qZn-2B2, and qFe-6D. These five loci had not been identified in previous studies of Cd uptake in wheat. These loci and the underlying genes should be further investigated using molecular biology techniques to identify Cd resistance genes in wheat.
Affiliation(s)
- Luqman Bin Safdar
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
- Key Laboratory of Biology and Genetic Improvement of Oil Crops, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Wuhan, 430062, China
- Fakhrah Almas
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
- Sidra Sarfraz
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
- Muhammad Ejaz
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
- Zeshan Ali
- Plant Physiology Program, Crop Sciences Institute, National Agricultural Research Centre, Park Road, Islamabad, PO 45500, Pakistan
- Zahid Mahmood
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
- Wheat Programme, Crop Sciences Institute, National Agricultural Research Centre, Park Road, Islamabad, PO 45500, Pakistan
- Li Yang
- Key Laboratory of Biology and Genetic Improvement of Oil Crops, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Wuhan, 430062, China
- Muhammad Ikram
- Statistical Genomics Lab, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Shengyi Liu
- Key Laboratory of Biology and Genetic Improvement of Oil Crops, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Wuhan, 430062, China
- Umar Masood Quraishi
- Department of Plant Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan
|
88
|
Housman G, Gilad Y. Prime time for primate functional genomics. Curr Opin Genet Dev 2020; 62:1-7. [PMID: 32544775 DOI: 10.1016/j.gde.2020.04.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2020] [Revised: 04/21/2020] [Accepted: 04/24/2020] [Indexed: 12/14/2022]
Abstract
Functional genomics research is continually improving our understanding of genotype-phenotype relationships in humans, and comparative genomics perspectives can provide additional insight into the evolutionary histories of such relationships. To specifically identify conservation or species-specific divergence in humans, we must look to our closest extant evolutionary relatives. Primate functional genomics research has been steadily advancing and expanding, in spite of several limitations and challenges that this field faces. New technologies and cheaper sequencing provide a unique opportunity to enhance and expand primate comparative studies, and we outline possible paths going forward. The potential human-specific insights that can be gained from primate functional genomics research are substantial, and we propose that now is a prime time to expand such endeavors.
Affiliation(s)
- Genevieve Housman
- Section of Genetic Medicine, Department of Medicine, University of Chicago, 5841 S. Maryland Ave., N417, MC6091, Chicago, IL 60637 USA.
- Yoav Gilad
- Section of Genetic Medicine, Department of Medicine, University of Chicago, 5841 S. Maryland Ave., N417, MC6091, Chicago, IL 60637 USA; Department of Human Genetics, University of Chicago, Cummings Life Science Center, 928 E. 58th St., Chicago, IL 60637 USA
|
89
|
Branching out: what omics can tell us about primate evolution. Curr Opin Genet Dev 2020; 62:65-71. [DOI: 10.1016/j.gde.2020.06.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/30/2020] [Revised: 06/02/2020] [Accepted: 06/04/2020] [Indexed: 12/25/2022]
|
90
|
Termignoni-Garcia F, Louder MIM, Balakrishnan CN, O’Connell L, Edwards SV. Prospects for sociogenomics in avian cooperative breeding and parental care. Curr Zool 2020; 66:293-306. [PMID: 32440290 PMCID: PMC7233861 DOI: 10.1093/cz/zoz057] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/02/2019] [Accepted: 11/20/2019] [Indexed: 01/08/2023] Open
Abstract
For the last 40 years, the study of cooperative breeding (CB) in birds has proceeded primarily in the context of discovering the ecological, geographical, and behavioral drivers of helping. The advent of molecular tools in the early 1990s assisted in clarifying the relatedness of helpers to those helped, in some cases, confirming predictions of kin selection theory. Methods for genome-wide analysis of sequence variation, gene expression, and epigenetics promise to add new dimensions to our understanding of avian CB, primarily in the area of molecular and developmental correlates of delayed breeding and dispersal, as well as the ontogeny of achieving parental status in nature. Here, we outline key ways in which modern -omics approaches, in particular genome sequencing, transcriptomics, and epigenetic profiling such as ATAC-seq, can be used to add a new level of analysis of avian CB. Building on recent and ongoing studies of avian social behavior and sociogenomics, we review how high-throughput sequencing of a focal species or clade can provide a robust foundation for downstream, context-dependent destructive and non-destructive sampling of specific tissues or physiological states in the field for analysis of gene expression and epigenetics. -Omics approaches have the potential to inform not only studies of the diversification of CB over evolutionary time, but real-time analyses of behavioral interactions in the field or lab. Sociogenomics of birds represents a new branch in the network of methods used to study CB, and can help clarify ways in which the different levels of analysis of CB ultimately interact in novel and unexpected ways.
Affiliation(s)
- Flavia Termignoni-Garcia
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
- Matthew I M Louder
- International Research Center for Neurointelligence, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Lauren O’Connell
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
|
91
|
Kuhl H, Li L, Wuertz S, Stöck M, Liang XF, Klopp C. CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes. Gigascience 2020; 9:giaa034. [PMID: 32449778 PMCID: PMC7247394 DOI: 10.1093/gigascience/giaa034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/31/2019] [Revised: 01/29/2020] [Accepted: 03/24/2020] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects.
Affiliation(s)
- Heiner Kuhl
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
| | - Ling Li
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
- College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China
| | - Sven Wuertz
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
| | - Matthias Stöck
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
| | - Xu-Fang Liang
- College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China
| | - Christophe Klopp
- Sigenae, Bioinfo Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRAe, 24 Chemin de Borde Rouge, 31320 Auzeville-Tolosane, Castanet Tolosan, France
|
92
|
Li M, Zhang W, Zhou X. Identification of genes involved in the evolution of human intelligence through combination of inter-species and intra-species genetic variations. PeerJ 2020; 8:e8912. [PMID: 32337102 PMCID: PMC7167246 DOI: 10.7717/peerj.8912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2019] [Accepted: 03/15/2020] [Indexed: 11/20/2022] Open
Abstract
Understanding the evolution of human intelligence is an important undertaking in the science of human genetics. A great deal of biological research has been conducted to search for genes which are related to the significant increase in human brain volume and cerebral cortex complexity during hominid evolution. However, genetic changes affecting intelligence in hominid evolution have remained elusive. We hypothesized that a subset of intelligence-related genes, which harbor intra-species variations in human populations, may also be evolution-related genes which harbor inter-species variations between humans (Homo sapiens) and great apes (including Pan troglodytes and Pongo abelii). Here we combined inter-species and intra-species genetic variations to discover genes involved in the evolution of human intelligence. Information was collected from published GWAS works on intelligence, and a total of 549 genes located within the intelligence-associated loci were identified. The intelligence-related genes containing human-specific variations were detected based on the latest high-quality genome assemblies of three of the species most closely related to humans. Finally, we identified 40 strong candidates involved in human intelligence evolution. Expression analysis using RNA-Seq data revealed that most of the genes displayed a relatively high expression in the cerebral cortex. For these genes, there is a distinct expression pattern between humans and other species, especially in neocortex tissues. Our work provided a list of strong candidates for the evolution of human intelligence, and also implied that some intelligence-related genes may undergo inter-species evolution and contain intra-species variation.
Affiliation(s)
- Mengjie Li
- College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Wenting Zhang
- College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Xiaoyi Zhou
- College of Life Sciences, Shanghai Normal University, Shanghai, China
|
93
|
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020; 20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023] Open
Abstract
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
|
94
|
Jayakumar V, Ishii H, Seki M, Kumita W, Inoue T, Hase S, Sato K, Okano H, Sasaki E, Sakakibara Y. An improved de novo genome assembly of the common marmoset genome yields improved contiguity and increased mapping rates of sequence data. BMC Genomics 2020; 21:243. [PMID: 32241258 PMCID: PMC7114785 DOI: 10.1186/s12864-020-6657-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in the public databases are highly fragmented and filled with sequence gaps, hindering research advances related to marmoset genomics and transcriptomics. RESULTS Here we utilize single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is 2.79 Gb in size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, representing the most contiguous and high-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, with an approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome. CONCLUSIONS Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over the previous versions of the marmoset genome assemblies. This will allow researchers working on primate genomics to apply the genome more efficiently to their genomic and transcriptomic sequence data.
Affiliation(s)
- Vasanthan Jayakumar
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Hiromi Ishii
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Misato Seki
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Wakako Kumita
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Takashi Inoue
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Sumitaka Hase
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Kengo Sato
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
| | - Hideyuki Okano
- Department of Physiology, Keio University School of Medicine, Shinjuku, Tokyo, 160-8582 Japan
- Laboratory for Marmoset Neural Architecture, RIKEN Center for Brain Science, Wako-shi, Saitama, 351-0198 Japan
| | - Erika Sasaki
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa 210-0821 Japan
| | - Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522 Japan
|
95
|
Teterina AA, Willis JH, Phillips PC. Chromosome-Level Assembly of the Caenorhabditis remanei Genome Reveals Conserved Patterns of Nematode Genome Organization. Genetics 2020; 214:769-780. [PMID: 32111628 PMCID: PMC7153949 DOI: 10.1534/genetics.119.303018] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/30/2019] [Accepted: 02/24/2020] [Indexed: 12/23/2022] Open
Abstract
The nematode Caenorhabditis elegans is one of the key model systems in biology, including possessing the first fully assembled animal genome. Whereas C. elegans is a self-reproducing hermaphrodite with fairly limited within-population variation, its relative C. remanei is an outcrossing species with much more extensive genetic variation, making it an ideal parallel model system for evolutionary genetic investigations. Here, we greatly improve on previous assemblies by generating a chromosome-level assembly of the entire C. remanei genome (124.8 Mb of total size) using long-read sequencing and chromatin conformation capture data. Like other fully assembled genomes in the genus, we find that the C. remanei genome displays a high degree of synteny with C. elegans despite multiple within-chromosome rearrangements. Both genomes have high gene density in central regions of chromosomes relative to chromosome ends and the opposite pattern for the accumulation of repetitive elements. C. elegans and C. remanei also show similar patterns of interchromosome interactions, with the central regions of chromosomes appearing to interact with one another more than the distal ends. The new C. remanei genome presented here greatly augments the use of the Caenorhabditis as a platform for comparative genomics and serves as a basis for molecular population genetics within this highly diverse species.
Affiliation(s)
- Anastasia A Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
- Center of Parasitology, A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow 117071, Russia
| | - John H Willis
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
| | - Patrick C Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
|
96
|
Jasinska AJ. Resources for functional genomic studies of health and development in nonhuman primates. Am J Phys Anthropol 2020; 171 Suppl 70:174-194. [PMID: 32221967 DOI: 10.1002/ajpa.24051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/01/2019] [Revised: 01/22/2020] [Accepted: 02/26/2020] [Indexed: 01/01/2023]
Abstract
Primates display a wide range of phenotypic variation underlaid by complex genetically regulated mechanisms. The links among DNA sequence, gene function, and phenotype have been of interest from an evolutionary perspective, to understand functional genome evolution and its phenotypic consequences, and from a biomedical perspective to understand the shared and human-specific roots of health and disease. Progress in methods for characterizing genetic, transcriptomic, and DNA methylation (DNAm) variation is driving the rapid development of extensive omics resources, which are now increasingly available from humans as well as a growing number of nonhuman primates (NHPs). The fast growth of large-scale genomic data is driving the emergence of integrated tools and databases, thus facilitating studies of gene functionality across primates. This review describes NHP genomic resources that can aid in exploration of how genes shape primate phenotypes. It focuses on the gene expression trajectories across development in different tissues, the identification of functional genetic variation (including variants deleterious for protein function and regulatory variants modulating gene expression), and DNAm profiles as an emerging tool to understand the process of aging. These resources enable comparative functional genomics approaches to identify species-specific and primate-shared gene functionalities associated with health and development.
Affiliation(s)
- Anna J Jasinska
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, California, USA; Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland; Eye on Primates, Los Angeles, California, USA
|
97
|
O'Neill K, Brocks D, Hammell MG. Mobile genomics: tools and techniques for tackling transposons. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190345. [PMID: 32075565 PMCID: PMC7061981 DOI: 10.1098/rstb.2019.0345] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Accepted: 11/20/2019] [Indexed: 12/22/2022] Open
Abstract
Next-generation sequencing approaches have fundamentally changed the types of questions that can be asked about gene function and regulation. With the goal of approaching truly genome-wide quantifications of all the interaction partners and downstream effects of particular genes, these quantitative assays have allowed for an unprecedented level of detail in exploring biological interactions. However, many challenges remain in our ability to accurately describe and quantify the interactions that take place in those hard to reach and extremely repetitive regions of our genome comprised mostly of transposable elements (TEs). Tools dedicated to TE-derived sequences have lagged behind, making the inclusion of these sequences in genome-wide analyses difficult. Recent improvements, both computational and experimental, allow for the better inclusion of TE sequences in genomic assays and a renewed appreciation for the importance of TE biology. This review will discuss the recent improvements that have been made in the computational analysis of TE-derived sequences as well as the areas where such analysis still proves difficult. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Affiliation(s)
- Kathryn O'Neill
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - David Brocks
- Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel
| | - Molly Gale Hammell
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
|
98
|
Liu Y, Cheng J, Siejka-Zielińska P, Weldon C, Roberts H, Lopopolo M, Magri A, D'Arienzo V, Harris JM, McKeating JA, Song CX. Accurate targeted long-read DNA methylation and hydroxymethylation sequencing with TAPS. Genome Biol 2020; 21:54. [PMID: 32127008 PMCID: PMC7053107 DOI: 10.1186/s13059-020-01969-6] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/03/2019] [Accepted: 02/21/2020] [Indexed: 12/17/2022] Open
Abstract
We present long-read Tet-assisted pyridine borane sequencing (lrTAPS) for targeted base-resolution sequencing of DNA methylation and hydroxymethylation in regions up to 10 kb from nanogram-level input. Compatible with both Oxford Nanopore and PacBio Single-Molecule Real-Time (SMRT) sequencing, lrTAPS detects methylation with accuracy comparable to short-read Illumina sequencing but with long-range epigenetic phasing. We applied lrTAPS to sequence difficult-to-map regions in mouse embryonic stem cells and to identify distinct methylation events in the integrated hepatitis B virus genome.
Affiliation(s)
- Yibin Liu
- Nuffield Department of Medicine, Ludwig Institute for Cancer Research, University of Oxford, Oxford, OX3 7FZ, UK
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Jingfei Cheng
- Nuffield Department of Medicine, Ludwig Institute for Cancer Research, University of Oxford, Oxford, OX3 7FZ, UK
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Paulina Siejka-Zielińska
- Nuffield Department of Medicine, Ludwig Institute for Cancer Research, University of Oxford, Oxford, OX3 7FZ, UK
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Carika Weldon
- Oxford Genomics Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
| | - Hannah Roberts
- Oxford Genomics Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
| | - Maria Lopopolo
- Oxford Genomics Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
| | - Andrea Magri
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Valentina D'Arienzo
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - James M Harris
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Jane A McKeating
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK
| | - Chun-Xiao Song
- Nuffield Department of Medicine, Ludwig Institute for Cancer Research, University of Oxford, Oxford, OX3 7FZ, UK.
- Nuffield Department of Medicine, Target Discovery Institute, University of Oxford, Oxford, OX3 7FZ, UK.
|
99
|
Vollger MR, Logsdon GA, Audano PA, Sulovari A, Porubsky D, Peluso P, Wenger AM, Concepcion GT, Kronenberg ZN, Munson KM, Baker C, Sanders AD, Spierings DCJ, Lansdorp PM, Surti U, Hunkapiller MW, Eichler EE. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann Hum Genet 2020; 84:125-140. [PMID: 31711268 PMCID: PMC7015760 DOI: 10.1111/ahg.12364] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 08/14/2019] [Revised: 10/17/2019] [Accepted: 10/18/2019] [Indexed: 01/14/2023]
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
Affiliation(s)
- MITCHELL R. VOLLGER
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- These authors contributed equally to this work
| | - GLENNIS A. LOGSDON
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- These authors contributed equally to this work
| | - PETER A. AUDANO
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - ARVIS SULOVARI
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - DAVID PORUBSKY
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - PAUL PELUSO
- Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
| | - AARON M. WENGER
- Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
| | | | | | - KATHERINE M. MUNSON
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - CARL BAKER
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - ASHLEY D. SANDERS
- European Molecular Biology Laboratory, Genome Biology Unit, 69117, Heidelberg, Germany
| | - DIANA C.J. SPIERINGS
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
| | - PETER M. LANSDORP
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - URVASHI SURTI
- Department of Pathology, University of Pittsburgh School of Medicine, and University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
| | | | - EVAN E. EICHLER
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
100
|
Wang L, Wu J, Liu X, Di D, Liang Y, Feng Y, Zhang S, Li B, Qi XG. A high-quality genome assembly for the endangered golden snub-nosed monkey (Rhinopithecus roxellana). Gigascience 2020; 8:5553376. [PMID: 31437279 PMCID: PMC6705546 DOI: 10.1093/gigascience/giz098] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/12/2019] [Revised: 05/05/2019] [Accepted: 07/26/2019] [Indexed: 01/19/2023] Open
Abstract
Background The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China, which has several distinct traits including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology. Thus, important information such as genome structural variation and repeat sequences may be absent. Findings To obtain a high-quality chromosomal assembly for R. roxellana qinlingensis, we used 5 methods: Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, BioNano optical maps, 10X Genomics linked-reads, and high-throughput chromosome conformation capture. The assembled genome was ∼3.04 Gb, with a contig N50 of 5.72 Mb and a scaffold N50 of 144.56 Mb. This represented a 100-fold improvement over the previously published genome. In the new genome, 22,497 protein-coding genes were predicted, of which 22,053 were functionally annotated. Gene family analysis showed that 993 and 2,745 gene families were expanded and contracted, respectively. The reconstructed phylogeny recovered a close relationship between R. roxellana and Macaca mulatta, and these 2 species diverged ∼13.4 million years ago. Conclusion We constructed a high-quality genome assembly of the Qinling golden snub-nosed monkey; it had superior continuity and accuracy, which might be useful for future genetic studies in this species and as a new standard reference genome for colobine primates. In addition, the updated genome assembly might improve our understanding of this species and could assist conservation efforts.
Affiliation(s)
- Lu Wang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Jinwei Wu
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Xiaomei Liu
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Dandan Di
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Yuhong Liang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Yifei Feng
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Suyun Zhang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
| | - Baoguo Li
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
| | - Xiao-Guang Qi
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
|