Reference Citation Analysis

For: Ghaffaari A, Marschall T. Fully-sensitive seed finding in sequence graphs using a hybrid index. Bioinformatics 2019;35:i81-i89. [PMID: 31510650 PMCID: PMC6612829 DOI: 10.1093/bioinformatics/btz341] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023]

Cited by Other Article(s)

Cui Y, Peng C, Xia Z, Yang C, Guo Y. A survey of sequence-to-graph mapping algorithms in the pangenome era. Genome Biol 2025;26:138. [PMID: 40405275 PMCID: PMC12096488 DOI: 10.1186/s13059-025-03606-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 05/06/2025] [Indexed: 05/24/2025]

Öztürk Ü, Mattavelli M, Ribeca P. GIN-TONIC: non-hierarchical full-text indexing for graph genomes. NAR Genom Bioinform 2024;6:lqae159. [PMID: 39664816 PMCID: PMC11632618 DOI: 10.1093/nargab/lqae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 10/08/2024] [Accepted: 11/01/2024] [Indexed: 12/13/2024]

Joudaki A, Meterez A, Mustafa H, Groot Koerkamp R, Kahles A, Rätsch G. Aligning distant sequences to graphs using long seed sketches. Genome Res 2023;33:1208-1217. [PMID: 37072187 PMCID: PMC10538362 DOI: 10.1101/gr.277659.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/16/2023] [Indexed: 04/20/2023]

Quan C, Lu H, Lu Y, Zhou G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput Struct Biotechnol J 2022;20:2639-2647. [PMID: 35685364 PMCID: PMC9163579 DOI: 10.1016/j.csbj.2022.05.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022]

Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, Popejoy AB, Asri M, Carson C, Chaisson MJP, Chang X, Cook-Deegan R, Felsenfeld AL, Fulton RS, Garrison EP, Garrison NA, Graves-Lindsay TA, Ji H, Kenny EE, Koenig BA, Li D, Marschall T, McMichael JF, Novak AM, Purushotham D, Schneider VA, Schultz BI, Smith MW, Sofia HJ, Weissman T, Flicek P, Li H, Miga KH, Paten B, Jarvis ED, Hall IM, Eichler EE, Haussler D. The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022;604:437-446. [PMID: 35444317 PMCID: PMC9402379 DOI: 10.1038/s41586-022-04601-8] [Citation(s) in RCA: 237] [Impact Index Per Article: 79.0] [Received: 08/30/2021] [Accepted: 03/01/2022] [Indexed: 12/20/2022]

Affiliation(s)

Ting Wang Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA. Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
Lucinda Antonacci-Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Kerstin Howe Wellcome Sanger Institute, Cambridge, UK
Heather A Lawson Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Julian K Lucas UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
Adam M Phillippy Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
Alice B Popejoy Epidemiology Division, Department of Public Health Sciences, University of California, Davis, CA, USA
Mobin Asri UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
Caryn Carson Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA. Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
Mark J P Chaisson Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Xian Chang UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
Robert Cook-Deegan Arizona State University, Barrett & O'Connor Washington Center, Washington DC, USA
Adam L Felsenfeld National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Robert S Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Erik P Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Nanibaa' A Garrison Institute for Society & Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA. Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA. Division of General Internal Medicine & Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
Tina A Graves-Lindsay McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Hanlee Ji Department of Medicine, Stanford University, School of Medicine, Stanford, CA, USA
Eimear E Kenny Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA. Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA. Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Barbara A Koenig Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
Daofeng Li Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA. Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
Tobias Marschall Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
Joshua F McMichael McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Adam M Novak UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
Deepak Purushotham Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA. Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
Valerie A Schneider National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, MD, USA
Baergen I Schultz National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Michael W Smith National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Heidi J Sofia National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Tsachy Weissman Department of Electrical Engineering, Stanford University, Stanford, CA, USA
Paul Flicek European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
Heng Li Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
Karen H Miga UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
Erich D Jarvis Vertebrate Genome Lab and Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA. Howard Hughes Medical Institute, Chevy Chase, MD, USA.
Ira M Hall Yale School of Medicine, New Haven, CT, USA.
Evan E Eichler Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
David Haussler UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA. Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA.

Sirén J, Monlong J, Chang X, Novak AM, Eizenga JM, Markello C, Sibbesen JA, Hickey G, Chang PC, Carroll A, Gupta N, Gabriel S, Blackwell TW, Ratan A, Taylor KD, Rich SS, Rotter JI, Haussler D, Garrison E, Paten B. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 2021;374:abg8871. [PMID: 34914532 PMCID: PMC9365333 DOI: 10.1126/science.abg8871] [Citation(s) in RCA: 167] [Impact Index Per Article: 41.8] [Indexed: 12/12/2022]

Affiliation(s)

Jouni Sirén UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Jean Monlong UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Xian Chang UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Adam M. Novak UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Jordan M. Eizenga UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Charles Markello UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Jonas A. Sibbesen UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Glenn Hickey UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Pi-Chuan Chang Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
Andrew Carroll Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
Namrata Gupta Genomics Platform, Broad Institute, Cambridge, MA, USA
Stacey Gabriel Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
Thomas W. Blackwell Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
Aakrosh Ratan Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
Kent D. Taylor The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Stephen S. Rich Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
Jerome I. Rotter The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
David Haussler UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA. Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Benedict Paten UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA

Jain C, Tavakoli N, Aluru S. A variant selection framework for genome graphs. Bioinformatics 2021;37:i460-i467. [PMID: 34252945 PMCID: PMC8336592 DOI: 10.1093/bioinformatics/btab302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

Richmond PA, Kaye AM, Kounkou GJ, Av-Shalom TV, Wasserman WW. Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper. PLoS Comput Biol 2021;17:e1008815. [PMID: 33750951 PMCID: PMC8016220 DOI: 10.1371/journal.pcbi.1008815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2020] [Revised: 04/01/2021] [Accepted: 02/17/2021] [Indexed: 11/26/2022]

Abstract

Across the life sciences, processing next-generation sequencing data commonly relies upon a computationally expensive process in which reads are mapped onto a reference sequence. Prior to such processing, however, a vast amount of information can be ascertained from the reads, potentially obviating the need for processing or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper, which facilitates a “reverse mapping” approach in which high-throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage and for accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and by biology-informed collections of reference queries. In the long term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach.
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.

In the past 15 years, next-generation sequencing technology has revolutionized our capacity to process and analyze DNA sequencing data. From agriculture to medicine, this technology is enabling a deeper understanding of the blueprint of life. Next-generation sequencing data is composed of short sequences of DNA, referred to as “reads”, which are often shorter than 200 base pairs, making them many orders of magnitude smaller than an entire human genome. Gaining insights from this data has typically relied on a reference-guided mapping approach, in which the reads are aligned to a reference genome and then post-processed to obtain actionable information such as the presence or absence of genomic sequence, or variation between the reference genome and the sequenced sample. Many experts in the field of genomics have concluded that selecting a single, linear reference genome against which to map reads is limiting, and several current research efforts are exploring improved analysis methods to unlock the full utility of sequencing data. Among these improvements are the use of sex-matched genomes, population-specific reference genomes, and emerging graph-based reference pan-genomes. However, advanced methods are needed that use raw DNA sequencing data to inform the choice of reference genome and to guide the alignment of reads to enriched reference genomes. Here we develop a method termed FlexTyper, which creates a searchable index of the short-read data and enables flexible, user-guided queries that provide valuable insights without the need for reference-guided mapping. We demonstrate the utility of our method by identifying sample ancestry and sex in human whole-genome sequencing data, and by detecting viral pathogen reads in RNA-seq data, African-enriched genome regions absent from the global reference, and killer-cell immune receptor alleles that are difficult to discern using standard read mapping.
We anticipate early adoption of FlexTyper within analysis pipelines as a pre-mapping component, and further envision that the bioinformatics and genomics community will leverage the tool for creative sequence queries against unmapped data.
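
The “reverse mapping” idea in the FlexTyper abstract amounts to counting k-mer occurrences in an index built over the reads, rather than aligning reads to a reference. The following is a minimal Python sketch of that idea, assuming a toy in-memory FM-index (naive suffix-array construction and a full occurrence table) over reads joined by a hypothetical `#` separator. It is not FlexTyper's implementation. All function names, the both-strand counting helper, and the allele-fraction thresholds in `call_genotype` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Toy FM-index: (C table, occurrence table, BWT length).
FMIndex = Tuple[Dict[str, int], Dict[str, List[int]], int]


def build_fm_index(reads: List[str]) -> FMIndex:
    """Naive FM-index over reads joined by '#' and terminated by '$'.

    The '#' separator keeps DNA queries from matching across read
    boundaries. Suffix sorting is naive, which is fine only for a toy.
    """
    text = "#".join(reads) + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is '$').
    bwt = "".join(text[i - 1] for i in sa)

    alphabet = sorted(set(text))
    # C[c] = number of characters in text that sort before c.
    c_table: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return c_table, occ, len(bwt)


def count_occurrences(fm: FMIndex, pattern: str) -> int:
    """Backward search: occurrences of pattern across all indexed reads."""
    c_table, occ, n = fm
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_both_strands(fm: FMIndex, kmer: str) -> int:
    """Count a k-mer on both strands; palindromic k-mers are counted once."""
    rc = reverse_complement(kmer)
    fwd = count_occurrences(fm, kmer)
    return fwd if rc == kmer else fwd + count_occurrences(fm, rc)


def call_genotype(ref_count: int, alt_count: int, min_depth: int = 4,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Hypothetical biallelic SNP call from allele-specific k-mer counts."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return "no-call"
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"


if __name__ == "__main__":
    fm = build_fm_index(["ACGTACGTTT", "GGACGTAC", "TTTACG"])
    print(count_occurrences(fm, "ACG"))    # 4
    print(count_occurrences(fm, "TTTGG"))  # 0 (would span a read boundary)
    print(count_both_strands(fm, "ACG"))   # 7 (4 x ACG + 3 x CGT)
    print(call_genotype(3, 5))             # 0/1
```

Counting is independent of read length and needs no reference: each query costs one backward-search step per k-mer character. A real read index would use a compressed suffix-array construction and sampled occurrence counts instead of the full tables used here.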

Rautiainen M, Marschall T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253. [PMID: 32972461 PMCID: PMC7513500 DOI: 10.1186/s13059-020-02157-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 10/18/2019] [Accepted: 08/26/2020] [Indexed: 02/07/2023]

Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020;21:139-162. [PMID: 32453966 DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 136] [Impact Index Per Article: 27.2] [Indexed: 12/12/2022]

Affiliation(s)

Jordan M Eizenga Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Adam M Novak Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Jonas A Sibbesen Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Simon Heumos Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
Ali Ghaffaari Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany. Max Planck Institute for Informatics, 66123 Saarbrücken, Germany. Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Glenn Hickey Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Xian Chang Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Josiah D Seaman Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom. School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
Robin Rounthwaite Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Jana Ebler Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany. Max Planck Institute for Informatics, 66123 Saarbrücken, Germany. Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Mikko Rautiainen Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany. Max Planck Institute for Informatics, 66123 Saarbrücken, Germany. Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
Shilpa Garg Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
Benedict Paten Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Tobias Marschall Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany. Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
Jouni Sirén Genomics Institute, University of California, Santa Cruz, California 95064, USA;
Erik Garrison Genomics Institute, University of California, Santa Cruz, California 95064, USA;

Mokveld T, Linthorst J, Al-Ars Z, Holstege H, Reinders M. CHOP: haplotype-aware path indexing in population graphs. Genome Biol 2020;21:65. [PMID: 32160922 PMCID: PMC7066762 DOI: 10.1186/s13059-020-01963-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/06/2019] [Accepted: 02/18/2020] [Indexed: 12/20/2022]