1
|
Liu M, Zhang F, Lu H, Xue H, Dong X, Li Z, Xu J, Wang W, Wei C. PPanG: a precision pangenome browser enabling nucleotide-level analysis of genomic variations in individual genomes and their graph-based pangenome. BMC Genomics 2024; 25:405. [PMID: 38658835 PMCID: PMC11044437 DOI: 10.1186/s12864-024-10302-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Accepted: 04/11/2024] [Indexed: 04/26/2024] Open
Abstract
Graph-based pangenome is gaining more popularity than linear pangenome because it stores more comprehensive information of variations. However, traditional linear genome browser has its own advantages, especially the tremendous resources accumulated historically. With the fast-growing number of individual genomes and their annotations available, the demand for a genome browser to visualize genome annotation for many individuals together with a graph-based pangenome is getting higher and higher. Here we report a new pangenome browser PPanG, a precise pangenome browser enabling nucleotide-level comparison of individual genome annotations together with a graph-based pangenome. Nine rice genomes with annotations were provided by default as potential references, and any individual genome can be selected as the reference. Our pangenome browser provides unprecedented insights on genome variations at different levels from base to gene, and reveals how the structures of a gene could differ for individuals. PPanG can be applied to any species with multiple individual genomes available and it is available at https://cgm.sjtu.edu.cn/PPanG .
Collapse
Affiliation(s)
- Mingwei Liu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Fan Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100081, China
- College of Agronomy, Anhui Agricultural University, Hefei, 230036, China
| | - Huimin Lu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Hongzhang Xue
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Xiaorui Dong
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
| | - Zhikang Li
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100081, China
- College of Agronomy, Anhui Agricultural University, Hefei, 230036, China
| | - Jianlong Xu
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100081, China
| | - Wensheng Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100081, China.
- College of Agronomy, Anhui Agricultural University, Hefei, 230036, China.
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, 572024, China.
| | - Chaochun Wei
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
| |
Collapse
|
2
|
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Collapse
Affiliation(s)
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
| |
Collapse
|
3
|
Naithani S, Deng CH, Sahu SK, Jaiswal P. Exploring Pan-Genomes: An Overview of Resources and Tools for Unraveling Structure, Function, and Evolution of Crop Genes and Genomes. Biomolecules 2023; 13:1403. [PMID: 37759803 PMCID: PMC10527062 DOI: 10.3390/biom13091403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 08/29/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open
Abstract
The availability of multiple sequenced genomes from a single species made it possible to explore intra- and inter-specific genomic comparisons at higher resolution and build clade-specific pan-genomes of several crops. The pan-genomes of crops constructed from various cultivars, accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations and allow researchers to search for the novel genes and alleles that were inadvertently lost in domesticated crops during the historical process of crop domestication or in the process of extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits like disease resistance, abiotic stress tolerance, plant architecture, and nutrition qualities exist in landraces, ancestral species, and crop wild relatives. The novel genes from the wild ancestors and landraces can be introduced back to high-yielding varieties of modern crops by implementing classical plant breeding, genomic selection, and transgenic/gene editing approaches. Thus, pan-genomic represents a great leap in plant research and offers new avenues for targeted breeding to mitigate the impact of global climate change. Here, we summarize the tools used for pan-genome assembly and annotations, web-portals hosting plant pan-genomes, etc. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and future potential of this emerging field of study.
Collapse
Affiliation(s)
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA;
| | - Cecilia H. Deng
- Molecular & Digital Breeing Group, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, Private Bag 92169, Auckland 1142, New Zealand;
| | - Sunil Kumar Sahu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China;
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA;
| |
Collapse
|
4
|
Secomandi S, Gallo GR, Sozzoni M, Iannucci A, Galati E, Abueg L, Balacco J, Caprioli M, Chow W, Ciofi C, Collins J, Fedrigo O, Ferretti L, Fungtammasan A, Haase B, Howe K, Kwak W, Lombardo G, Masterson P, Messina G, Møller AP, Mountcastle J, Mousseau TA, Ferrer Obiol J, Olivieri A, Rhie A, Rubolini D, Saclier M, Stanyon R, Stucki D, Thibaud-Nissen F, Torrance J, Torroni A, Weber K, Ambrosini R, Bonisoli-Alquati A, Jarvis ED, Gianfranceschi L, Formenti G. A chromosome-level reference genome and pangenome for barn swallow population genomics. Cell Rep 2023; 42:111992. [PMID: 36662619 PMCID: PMC10044405 DOI: 10.1016/j.celrep.2023.111992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Revised: 07/20/2022] [Accepted: 01/04/2023] [Indexed: 01/20/2023] Open
Abstract
Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome.
Collapse
Affiliation(s)
- Simona Secomandi
- Department of Biosciences, University of Milan, Milan, Italy; Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Guido R Gallo
- Department of Biosciences, University of Milan, Milan, Italy
| | | | - Alessio Iannucci
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | - Elena Galati
- Department of Biosciences, University of Milan, Milan, Italy
| | - Linelle Abueg
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Jennifer Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Manuela Caprioli
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | | | - Claudio Ciofi
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | | | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Luca Ferretti
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | | | - Bettina Haase
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | | | - Woori Kwak
- Department of Medical and Biological Sciences, The Catholic University of Korea, Bucheon 14662, Korea
| | - Gianluca Lombardo
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Anders P Møller
- Ecologie Systématique Evolution, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Orsay Cedex, France
| | | | - Timothy A Mousseau
- Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
| | - Joan Ferrer Obiol
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | - Anna Olivieri
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Diego Rubolini
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | | | - Roscoe Stanyon
- Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Antonio Torroni
- Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
| | | | - Roberto Ambrosini
- Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
| | - Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University - Pomona, Pomona, CA, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA; The Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA.
| |
Collapse
|
5
|
Wang S, Qian YQ, Zhao RP, Chen LL, Song JM. Graph-based pan-genomes: increased opportunities in plant genomics. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:24-39. [PMID: 36255144 DOI: 10.1093/jxb/erac412] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
Due to the development of sequencing technology and the great reduction in sequencing costs, an increasing number of plant genomes have been assembled, and numerous genomes have revealed large amounts of variations. However, a single reference genome does not allow the exploration of species diversity, and therefore the concept of pan-genome was developed. A pan-genome is a collection of all sequences available for a species, including a large number of consensus sequences, large structural variations, and small variations including single nucleotide polymorphisms and insertions/deletions. A simple linear pan-genome does not allow these structural variations to be intuitively characterized, so graph-based pan-genomes have been developed. These pan-genomes store sequence and structural variation information in the form of nodes and paths to store and display species variation information in a more intuitive manner. The key role of graph-based pan-genomes is to expand the coordinate system of the linear reference genome to accommodate more regions of genetic diversity. Here, we review the origin and development of graph-based pan-genomes, explore their application in plant research, and further highlight the application of graph-based pan-genomes for future plant breeding.
Collapse
Affiliation(s)
- Shuo Wang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, 530004, China
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yong-Qing Qian
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, 530004, China
| | - Ru-Peng Zhao
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, 530004, China
| | - Ling-Ling Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, 530004, China
| | - Jia-Ming Song
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning, 530004, China
| |
Collapse
|
6
|
Mastromatteo S, Chen A, Gong J, Lin F, Thiruvahindrapuram B, Sung WW, Whitney J, Wang Z, Patel RV, Keenan K, Halevy A, Panjwani N, Avolio J, Wang C, Côté-Maurais G, Bégin S, Adam D, Brochiero E, Bjornson C, Chilvers M, Price A, Parkins M, van Wylick R, Mateos-Corral D, Hughes D, Smith MJ, Morrison N, Tullis E, Stephenson AL, Wilcox P, Quon BS, Leung WM, Solomon M, Sun L, Ratjen F, Strug LJ. High-quality read-based phasing of cystic fibrosis cohort informs genetic understanding of disease modification. HGG ADVANCES 2023; 4:100156. [PMID: 36386424 PMCID: PMC9647008 DOI: 10.1016/j.xhgg.2022.100156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022] Open
Abstract
Phasing of heterozygous alleles is critical for interpretation of cis-effects of disease-relevant variation. We sequenced 477 individuals with cystic fibrosis (CF) using linked-read sequencing, which display an average phase block N50 of 4.39 Mb. We use these samples to construct a graph representation of CFTR haplotypes, demonstrating its utility for understanding complex CF alleles. These are visualized in a Web app, CFTbaRcodes, that enables interactive exploration of CFTR haplotypes present in this cohort. We perform fine-mapping and phasing of the chr7q35 trypsinogen locus associated with CF meconium ileus, an intestinal obstruction at birth associated with more severe CF outcomes and pancreatic disease. A 20-kb deletion polymorphism and a PRSS2 missense variant p.Thr8Ile (rs62473563) are shown to independently contribute to meconium ileus risk (p = 0.0028, p = 0.011, respectively) and are PRSS2 pancreas eQTLs (p = 9.5 × 10−7 and p = 1.4 × 10−4, respectively), suggesting the mechanism by which these polymorphisms contribute to CF. The phase information from linked reads provides a putative causal explanation for variation at a CF-relevant locus, which also has implications for the genetic basis of non-CF pancreatitis, to which this locus has been reported to contribute.
Collapse
|
7
|
Singh V, Pandey S, Bhardwaj A. From the reference human genome to human pangenome: Premise, promise and challenge. Front Genet 2022; 13:1042550. [PMID: 36437921 PMCID: PMC9684177 DOI: 10.3389/fgene.2022.1042550] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Accepted: 10/21/2022] [Indexed: 11/11/2022] Open
Abstract
The Reference Human Genome remains the single most important resource for mapping genetic variations and assessing their impact. However, it is monophasic, incomplete and not representative of the variation that exists in the population. Given the extent of ethno-geographic diversity and the consequent diversity in clinical manifestations of these variations, population specific references were developed overtime. The dramatically plummeting cost of sequencing whole genomes and the advent of third generation long range sequencers allowing accurate, error free, telomere-to-telomere assemblies of human genomes present us with a unique and unprecedented opportunity to develop a more composite standard reference consisting of a collection of multiple genomes that capture the maximal variation existing in the population, with the deepest annotation possible, enabling a realistic, reliable and actionable estimation of clinical significance of specific variations. The Human Pangenome Project thus is a logical next step promising a more accurate and global representation of genomic variations. The pangenome effort must be reciprocally complemented with precise variant discovery tools and exhaustive annotation to ensure unambiguous clinical assessment of the variant in ethno-geographical context. Here we discuss a broad roadmap, the challenges and way forward in developing a universal pangenome reference including data visualization techniques and integration of prior knowledge base in the new graph based architecture and tools to submit, compare, query, annotate and retrieve relevant information from the pangenomes. The biggest challenge, however, will be the ethical, legal and social implications and the training of human resource to the new reference paradigm.
Collapse
Affiliation(s)
- Vipin Singh
- University Institute of Biotechnology, Chandigarh University, Mohali, India
| | - Shweta Pandey
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Anshu Bhardwaj
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- *Correspondence: Anshu Bhardwaj,
| |
Collapse
|
8
|
Guarracino A, Heumos S, Nahnsen S, Prins P, Garrison E. ODGI: understanding pangenome graphs. BIOINFORMATICS (OXFORD, ENGLAND) 2022; 38:3319-3326. [PMID: 35552372 DOI: 10.1101/2021.11.10.467921] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Revised: 03/18/2022] [Indexed: 05/24/2023]
Abstract
MOTIVATION Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. RESULTS We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. AVAILABILITY AND IMPLEMENTATION ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| |
Collapse
|
9
|
Guarracino A, Heumos S, Nahnsen S, Prins P, Garrison E. ODGI: understanding pangenome graphs. Bioinformatics 2022; 38:3319-3326. [PMID: 35552372 PMCID: PMC9237687 DOI: 10.1093/bioinformatics/btac308] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Revised: 03/18/2022] [Indexed: 11/13/2022] Open
Abstract
Motivation Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. Results We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability and implementation ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Andrea Guarracino
- Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini 1, Milan, 20157, Italy
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, 72076, Germany.,Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, 72076, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, 72076, Germany.,Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, 72076, Germany
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, 38163, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, 38163, USA
| |
Collapse
|
10
|
Goel M, Schneeberger K. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics 2022; 38:2922-2926. [PMID: 35561173 PMCID: PMC9113368 DOI: 10.1093/bioinformatics/btac196] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Revised: 03/15/2022] [Accepted: 04/11/2022] [Indexed: 02/03/2023] Open
Abstract
SUMMARY Third-generation genome sequencing technologies have led to a sharp increase in the number of high-quality genome assemblies. This allows the comparison of multiple assembled genomes of individual species and demands new tools for visualizing their structural properties. Here, we present plotsr, an efficient tool to visualize structural similarities and rearrangements between genomes. It can be used to compare genomes on chromosome level or to zoom in on any selected region. In addition, plotsr can augment the visualization with regional identifiers (e.g. genes or genomic markers) or histogram tracks for continuous features (e.g. GC content or polymorphism density). AVAILABILITY AND IMPLEMENTATION plotsr is implemented as a python package and uses the standard matplotlib library for plotting. It is freely available under the MIT license at GitHub (https://github.com/schneebergerlab/plotsr) and bioconda (https://anaconda.org/bioconda/plotsr). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Manish Goel
- Faculty of Biology, LMU Munich, Planegg-Martinsried 82152, Germany
- Department of Genetics, Faculty of Biology, LMU Munich, Germany
| | | |
Collapse
|
11
|
The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022; 604:437-446. [PMID: 35444317 DOI: 10.1038/s41586-022-04601-8] [Citation(s) in RCA: 141] [Impact Index Per Article: 70.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 03/01/2022] [Indexed: 12/20/2022]
Abstract
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
Collapse
|
12
|
Zanini SF, Bayer PE, Wells R, Snowdon RJ, Batley J, Varshney RK, Nguyen HT, Edwards D, Golicz AA. Pangenomics in crop improvement-from coding structural variations to finding regulatory variants with pangenome graphs. THE PLANT GENOME 2022; 15:e20177. [PMID: 34904403 DOI: 10.1002/tpg2.20177] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Accepted: 10/07/2021] [Indexed: 05/15/2023]
Abstract
Since the first reported crop pangenome in 2014, advances in high-throughput and cost-effective DNA sequencing technologies facilitated multiple such studies including the pangenomes of oilseed rape (Brassica napus L.), soybean [Glycine max (L.) Merr.], rice (Oryza sativa L.), wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.). Compared with single-reference genomes, pangenomes provide a more accurate representation of the genetic variation present in a species. By combining the genomic data of multiple accessions, pangenomes allow for the detection and annotation of complex DNA polymorphisms such as structural variations (SVs), one of the major determinants of genetic diversity within a species. In this review we summarize the current literature on crop pangenomics, focusing on their application to find candidate SVs involved in traits of agronomic interest. We then highlight the potential of pangenomes in the discovery and functional characterization of noncoding regulatory sequences and their variations. We conclude with a summary and outlook on innovative data structures representing the complete content of plant pangenomes including annotations of coding and noncoding elements and outcomes of transcriptomic and epigenomic experiments.
Collapse
Affiliation(s)
- Silvia F Zanini
- Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany
| | - Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
| | - Rachel Wells
- Dep. of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR47UH, UK
| | - Rod J Snowdon
- Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
| | - Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- State Agricultural Biotechnology Centre, Centre for Crop Food Innovation, Food Futures Institute, Murdoch Univ., Murdoch, WA, Australia
| | - Henry T Nguyen
- Division of Plant Sciences, Univ. of Missouri, Columbia, MO, USA
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
| | - Agnieszka A Golicz
- Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany
| |
Collapse
|
13
|
Sirén J, Monlong J, Chang X, Novak AM, Eizenga JM, Markello C, Sibbesen JA, Hickey G, Chang PC, Carroll A, Gupta N, Gabriel S, Blackwell TW, Ratan A, Taylor KD, Rich SS, Rotter JI, Haussler D, Garrison E, Paten B. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 2021; 374:abg8871. [PMID: 34914532 PMCID: PMC9365333 DOI: 10.1126/science.abg8871] [Citation(s) in RCA: 94] [Impact Index Per Article: 31.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe maps sequencing reads to thousands of human genomes at a speed comparable to that of standard methods mapping to a single reference genome. The increased mapping accuracy enables downstream improvements in genome-wide genotyping pipelines for both small variants and larger structural variants. We used Giraffe to genotype 167,000 structural variants, discovered in long-read studies, in 5202 diverse human genomes that were sequenced using short reads. We conclude that pangenomics facilitates a more comprehensive characterization of variation and, as a result, has the potential to improve many genomic analyses.
Collapse
Affiliation(s)
- Jouni Sirén
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Xian Chang
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Adam M. Novak
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | | | | | - Glenn Hickey
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pi-Chuan Chang
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Namrata Gupta
- Genomics Platform, Broad Institute, Cambridge, MA, USA
| | - Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
| | | | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | |
Collapse
|
14
|
Durant É, Sabot F, Conte M, Rouard M. Panache: a Web Browser-Based Viewer for Linearized Pangenomes. Bioinformatics 2021; 37:4556-4558. [PMID: 34601567 PMCID: PMC8652104 DOI: 10.1093/bioinformatics/btab688] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Revised: 07/28/2021] [Accepted: 09/24/2021] [Indexed: 11/15/2022] Open
Abstract
Motivation Pangenomics evolved since its first applications on bacteria, extending from the study of genes for a given population to the study of all of its sequences available. While multiple methods are being developed to construct pangenomes in eukaryotic species there is still a gap for efficient and user-friendly visualization tools. Emerging graph representations come with their own challenges, and linearity remains a suitable option for user-friendliness. Results We introduce Panache, a tool for the visualization and exploration of linear representations of gene-based and sequence-based pangenomes. It uses a layout similar to genome browsers to display presence absence variations and additional tracks along a linear axis with a pangenomics perspective. Availability and implementation Panache is available at github.com/SouthGreenPlatform/panache under the MIT License.
Collapse
Affiliation(s)
- Éloi Durant
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France.,Syngenta Seeds SAS, Saint-Sauveur, 31790, France.,Bioversity International, Parc Scientifique Agropolis II, Montpellier, 34397, France.,French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, 34398, France
| | - François Sabot
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France.,French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, 34398, France
| | | | - Mathieu Rouard
- Bioversity International, Parc Scientifique Agropolis II, Montpellier, 34397, France
| |
Collapse
|
15
|
Lei L, Goltsman E, Goodstein D, Wu GA, Rokhsar DS, Vogel JP. Plant Pan-Genomics Comes of Age. ANNUAL REVIEW OF PLANT BIOLOGY 2021; 72:411-435. [PMID: 33848428 DOI: 10.1146/annurev-arplant-080720-105454] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
A pan-genome is the nonredundant collection of genes and/or DNA sequences in a species. Numerous studies have shown that plant pan-genomes are typically much larger than the genome of any individual and that a sizable fraction of the genes in any individual are present in only some genomes. The construction and interpretation of plant pan-genomes are challenging due to the large size and repetitive content of plant genomes. Most pan-genomes are largely focused on nontransposable element protein coding genes because they are more easily analyzed and defined than noncoding and repetitive sequences. Nevertheless, noncoding and repetitive DNA play important roles in determining the phenotype and genome evolution. Fortunately, it is now feasible to make multiple high-quality genomes that can be used to construct high-resolution pan-genomes that capture all the variation. However, assembling, displaying, and interacting with such high-resolution pan-genomes will require the development of new tools.
Collapse
Affiliation(s)
- Li Lei
- DOE Joint Genome Institute, Berkeley, California 94720, USA;
| | - Eugene Goltsman
- DOE Joint Genome Institute, Berkeley, California 94720, USA;
| | - David Goodstein
- DOE Joint Genome Institute, Berkeley, California 94720, USA;
| | | | - Daniel S Rokhsar
- DOE Joint Genome Institute, Berkeley, California 94720, USA;
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
| | - John P Vogel
- DOE Joint Genome Institute, Berkeley, California 94720, USA;
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
| |
Collapse
|
16
|
Martiniano R, Garrison E, Jones ER, Manica A, Durbin R. Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph. Genome Biol 2020; 21:250. [PMID: 32943086 PMCID: PMC7499850 DOI: 10.1186/s13059-020-02160-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2019] [Accepted: 08/27/2020] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for aDNA and compare with existing methods. RESULTS We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genome Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels. CONCLUSIONS Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.
Collapse
Affiliation(s)
- Rui Martiniano
- Department of Genetics, University of Cambridge, Cambridge, CB3 0DH UK
| | - Erik Garrison
- Wellcome Sanger Institute, Cambridge, CB10 1SA UK
- Genomics Institute, University of California, Santa Cruz, CA 95064 USA
| | - Eppie R. Jones
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ UK
| | - Andrea Manica
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ UK
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, CB3 0DH UK
- Wellcome Sanger Institute, Cambridge, CB10 1SA UK
| |
Collapse
|
17
|
Crysnanto D, Pausch H. Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery. Genome Biol 2020; 21:184. [PMID: 32718320 PMCID: PMC7385871 DOI: 10.1186/s13059-020-02105-0%0a%0a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/14/2020] [Indexed: 09/28/2023] Open
Abstract
BACKGROUND The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. RESULTS We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. CONCLUSIONS We develop the first variation-aware reference graph for an agricultural animal ( https://doi.org/10.5281/zenodo.3759712 ). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
Collapse
|
18
|
Crysnanto D, Pausch H. Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery. Genome Biol 2020; 21:184. [PMID: 32718320 PMCID: PMC7385871 DOI: 10.1186/s13059-020-02105-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/14/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. RESULTS We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. CONCLUSIONS We develop the first variation-aware reference graph for an agricultural animal ( https://doi.org/10.5281/zenodo.3759712 ). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
Collapse
|
19
|
Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020; 21:139-162. [PMID: 32453966 DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linearreference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
Collapse
Affiliation(s)
- Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Jonas A Sibbesen
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Simon Heumos
- Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
| | - Ali Ghaffaari
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Xian Chang
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Josiah D Seaman
- Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom.,School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
| | - Robin Rounthwaite
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Jana Ebler
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Mikko Rautiainen
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany.,Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
| | - Shilpa Garg
- Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA.,Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Erik Garrison
- Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| |
Collapse
|
20
|
Moustafa AM, Lal A, Planet PJ. Comparative genomics in infectious disease. Curr Opin Microbiol 2020; 53:61-70. [PMID: 32248056 DOI: 10.1016/j.mib.2020.02.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2020] [Revised: 02/23/2020] [Accepted: 02/24/2020] [Indexed: 02/07/2023]
Abstract
With more than one million bacterial genome sequences uploaded to public databases in the last 25 years, genomics has become a powerful tool for studying bacterial biology. Here, we review recent approaches that leverage large numbers of whole genome sequences to decipher the spread and pathogenesis of bacterial infectious diseases.
Collapse
Affiliation(s)
- Ahmed M Moustafa
- Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Arnav Lal
- School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Paul J Planet
- Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman College of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA.
| |
Collapse
|
21
|
Yokoyama TT, Sakamoto Y, Seki M, Suzuki Y, Kasahara M. MoMI-G: modular multi-scale integrated genome graph browser. BMC Bioinformatics 2019; 20:548. [PMID: 31690272 PMCID: PMC6833150 DOI: 10.1186/s12859-019-3145-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2019] [Accepted: 10/09/2019] [Indexed: 01/30/2023] Open
Abstract
Background Genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variants visualization tools. To this end, visualization method for large genome graphs such as human cancer genome graphs is demanded. Results We developed MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidences such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view, if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time. Software availability MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.
Collapse
Affiliation(s)
- Toshiyuki T Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Yoshitaka Sakamoto
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahiro Kasahara
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan.
| |
Collapse
|