Total Articles: 40 (from Reference Citation Analysis)

Article PDFs: 19

Cited by > 0: 34

Searched Name: Aleksey V Zimin

Neale DB, Zimin AV, Meltzer A, Bhattarai A, Amee M, Figueroa Corona L, Allen BJ, Puiu D, Wright J, De La Torre AR, McGuire PE, Timp W, Salzberg SL, Wegrzyn JL. A genome sequence for the threatened whitebark pine. G3 (Bethesda) 2024;14:jkae061. [PMID: 38526344 DOI: 10.1093/g3journal/jkae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 02/29/2024] [Accepted: 03/12/2024] [Indexed: 03/26/2024]

Affiliation(s)

David B Neale Department of Plant Sciences, University of California, Davis, CA 95616, USA Whitebark Pine Ecosystem Foundation, Missoula, MT 59808, USA
Aleksey V Zimin Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
Amy Meltzer Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
Akriti Bhattarai Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
Maurice Amee Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
Laura Figueroa Corona School of Forestry, Northern Arizona University, Flagstaff, AZ 86011, USA
Brian J Allen Department of Plant Sciences, University of California, Davis, CA 95616, USA University of California Cooperative Extension, Central Sierra, Jackson, CA 95642, USA
Daniela Puiu Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
Jessica Wright USDA Forest Service, Pacific Southwest Research Station, Davis, CA 95618, USA
Amanda R De La Torre School of Forestry, Northern Arizona University, Flagstaff, AZ 86011, USA
Patrick E McGuire Department of Plant Sciences, University of California, Davis, CA 95616, USA
Winston Timp Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
Steven L Salzberg Department of Biomedical Engineering and Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA Departments of Computer Science and Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA
Jill L Wegrzyn Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA

Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024;42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]

Affiliation(s)

Glenn Hickey UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA. These authors contributed equally: Glenn Hickey, Jean Monlong.
Jean Monlong UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA. These authors contributed equally: Glenn Hickey, Jean Monlong.
Jana Ebler Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Adam M. Novak UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Jordan M. Eizenga UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Yan Gao Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Human Pangenome Reference Consortium
Tobias Marschall Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Heng Li Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Human Pangenome Reference Consortium
Haley J. Abel Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
Lucinda L. Antonacci-Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Mobin Asri UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Gunjan Baid Google LLC, Mountain View, CA, USA
Carl A. Baker Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Anastasiya Belyaeva Google LLC, Mountain View, CA, USA
Konstantinos Billis European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Guillaume Bourque Department of Human Genetics, McGill University, Montreal, QC, Canada Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
Silvia Buonaiuto Institute of Genetics and Biophysics, National Research Council, Naples, Italy
Andrew Carroll Google LLC, Mountain View, CA, USA
Mark J. P. Chaisson Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Pi-Chuan Chang Google LLC, Mountain View, CA, USA
Xian H. Chang UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Haoyu Cheng Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Justin Chu Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
Sarah Cody McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Vincenza Colonna Institute of Genetics and Biophysics, National Research Council, Naples, Italy Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Daniel E. Cook Google LLC, Mountain View, CA, USA
Robert M. Cook-Deegan Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
Omar E. Cornejo Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
Mark Diekhans UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Daniel Doerr Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Peter Ebert Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Jana Ebler Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Evan E. Eichler Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA
Jordan M. Eizenga UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Susan Fairley European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Olivier Fedrigo Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Adam L. Felsenfeld National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
Xiaowen Feng Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Christian Fischer Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Paul Flicek European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Giulio Formenti Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Robert S. Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Yan Gao Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Shilpa Garg Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Nanibaa’ A. Garrison Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Carlos Garcia Giron Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Richard E. Green Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA Dovetail Genomics, Scotts Valley, CA, USA
Cristian Groza Quantitative Life Sciences, McGill University, Montreal, QC, Canada
Andrea Guarracino Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA Genomics Research Centre, Human Technopole, Milan, Italy
Leanne Haggerty European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Ira M. Hall Department of Genetics, Yale University School of Medicine, New Haven, CT, USA Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
William T. Harvey Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Marina Haukness UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
David Haussler UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA
Simon Heumos Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
Glenn Hickey UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA. These authors contributed equally: Glenn Hickey, Jean Monlong.
Kendra Hoekzema Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Thibaut Hourlier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
Miten Jain Northeastern University, Boston, MA, USA
Erich D. Jarvis Howard Hughes Medical Institute, Chevy Chase, MD, USA Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
Hanlee P. Ji Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Eimear E. Kenny Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Barbara A. Koenig Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
Alexey Kolesnikov Google LLC, Mountain View, CA, USA
Jan O. Korbel European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
Jennifer Kordosky Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Sergey Koren Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
HoJoon Lee Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Alexandra P. Lewis Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Heng Li Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Wen-Wei Liao Department of Genetics, Yale University School of Medicine, New Haven, CT, USA Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
Shuangjia Lu Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
Tsung-Yu Lu Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Julian K. Lucas UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Hugo Magalhães Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Santiago Marco-Sola Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
Pierre Marijon Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Charles Markello UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Tobias Marschall Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Fergal J. Martin European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Ann McCartney Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Jennifer McDaniel Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Karen H. Miga UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Matthew W. Mitchell Coriell Institute for Medical Research, Camden, NJ, USA
Jean Monlong UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA. These authors contributed equally: Glenn Hickey, Jean Monlong.
Jacquelyn Mountcastle Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Katherine M. Munson Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Moses Njagi Mwaniki Department of Computer Science, University of Pisa, Pisa, Italy
Maria Nattestad Google LLC, Mountain View, CA, USA
Adam M. Novak UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Sergey Nurk Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Hugh E. Olsen UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Nathan D. Olson Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Trevor Pesout UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Adam M. Phillippy Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Alice B. Popejoy Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
David Porubsky Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Pjotr Prins Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Daniela Puiu Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Mikko Rautiainen Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Allison A. Regier McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Arang Rhie Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Samuel Sacco Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
Ashley D. Sanders Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
Valerie A. Schneider National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Baergen I. Schultz National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
Kishwar Shafin Google LLC, Mountain View, CA, USA
Jonas A. Sibbesen Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
Jouni Sirén UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Michael W. Smith National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
Heidi J. Sofia National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
Ahmad N. Abou Tayoun Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
Françoise Thibaud-Nissen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Chad Tomlinson McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Francesca Floriana Tricomi European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Flavia Villani Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Mitchell R. Vollger Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
Justin Wagner Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Brian Walenz Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Ting Wang McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Jonathan M. D. Wood Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
Aleksey V. Zimin Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
Justin M. Zook Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA

Neale DB, Zimin AV, Meltzer A, Bhattarai A, Amee M, Corona LF, Allen BJ, Puiu D, Wright J, Torre ARDL, McGuire PE, Timp W, Salzberg SL, Wegrzyn JL. A Genome Sequence for the Threatened Whitebark Pine. bioRxiv 2023:2023.11.16.567420. [PMID: 38014212 PMCID: PMC10680812 DOI: 10.1101/2023.11.16.567420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Reinhardt JA, Baker RH, Zimin AV, Ladias C, Paczolt KA, Werren JH, Hayashi CY, Wilkinson GS. Impacts of Sex Ratio Meiotic Drive on Genome Structure and Function in a Stalk-Eyed Fly. Genome Biol Evol 2023;15:evad118. [PMID: 37364298 PMCID: PMC10319772 DOI: 10.1093/gbe/evad118] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/13/2022] [Revised: 06/02/2023] [Accepted: 06/15/2023] [Indexed: 06/28/2023]

Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023;617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 170] [Impact Index Per Article: 170.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]

Affiliation(s)

Wen-Wei Liao Department of Genetics, Yale University School of Medicine, New Haven, CT, USA Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
Mobin Asri Genomics Institute, University of California, Santa Cruz, CA, USA
Jana Ebler Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Daniel Doerr Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Marina Haukness Genomics Institute, University of California, Santa Cruz, CA, USA
Glenn Hickey Genomics Institute, University of California, Santa Cruz, CA, USA
Shuangjia Lu Department of Genetics, Yale University School of Medicine, New Haven, CT, USA Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
Julian K Lucas Genomics Institute, University of California, Santa Cruz, CA, USA
Jean Monlong Genomics Institute, University of California, Santa Cruz, CA, USA
Haley J Abel Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
Silvia Buonaiuto Institute of Genetics and Biophysics, National Research Council, Naples, Italy
Xian H Chang Genomics Institute, University of California, Santa Cruz, CA, USA
Haoyu Cheng Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Justin Chu Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
Vincenza Colonna Institute of Genetics and Biophysics, National Research Council, Naples, Italy Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jordan M Eizenga Genomics Institute, University of California, Santa Cruz, CA, USA
Xiaowen Feng Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Christian Fischer Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Robert S Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Shilpa Garg Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
Cristian Groza Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
Andrea Guarracino Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA Genomics Research Centre, Human Technopole, Milan, Italy
William T Harvey Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Simon Heumos Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
Miten Jain Northeastern University, Boston, MA, USA
Tsung-Yu Lu Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Charles Markello Genomics Institute, University of California, Santa Cruz, CA, USA
Fergal J Martin European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Matthew W Mitchell Coriell Institute for Medical Research, Camden, NJ, USA
Katherine M Munson Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Moses Njagi Mwaniki Department of Computer Science, University of Pisa, Pisa, Italy
Adam M Novak Genomics Institute, University of California, Santa Cruz, CA, USA
Hugh E Olsen Genomics Institute, University of California, Santa Cruz, CA, USA
Trevor Pesout Genomics Institute, University of California, Santa Cruz, CA, USA
David Porubsky Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Pjotr Prins Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Jonas A Sibbesen Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
Jouni Sirén Genomics Institute, University of California, Santa Cruz, CA, USA
Chad Tomlinson McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Flavia Villani Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Mitchell R Vollger Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
Lucinda L Antonacci-Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Gunjan Baid Google, Mountain View, CA, USA
Carl A Baker Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Anastasiya Belyaeva Google, Mountain View, CA, USA
Konstantinos Billis European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Andrew Carroll Google, Mountain View, CA, USA
Pi-Chuan Chang Google, Mountain View, CA, USA
Sarah Cody McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Daniel E Cook Google, Mountain View, CA, USA
Robert M Cook-Deegan Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
Omar E Cornejo Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
Mark Diekhans Genomics Institute, University of California, Santa Cruz, CA, USA
Peter Ebert Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
Susan Fairley European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Olivier Fedrigo Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Adam L Felsenfeld National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Giulio Formenti Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Yan Gao Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
Nanibaa' A Garrison Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
Carlos Garcia Giron European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Richard E Green Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA Dovetail Genomics, Scotts Valley, CA, USA
Leanne Haggerty European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Kendra Hoekzema Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Thibaut Hourlier European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Hanlee P Ji Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Eimear E Kenny Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Barbara A Koenig Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
Alexey Kolesnikov Google, Mountain View, CA, USA
Jan O Korbel European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Jennifer Kordosky Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Sergey Koren Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
HoJoon Lee Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Alexandra P Lewis Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Hugo Magalhães Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Santiago Marco-Sola Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
Pierre Marijon Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Ann McCartney Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Jennifer McDaniel Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Jacquelyn Mountcastle Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
Maria Nattestad Google, Mountain View, CA, USA
Sergey Nurk Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Nathan D Olson Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Alice B Popejoy Department of Public Health Sciences, University of California, Davis, CA, USA
Daniela Puiu Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Mikko Rautiainen Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Allison A Regier McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
Arang Rhie Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Samuel Sacco Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
Ashley D Sanders Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
Valerie A Schneider National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Baergen I Schultz National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Kishwar Shafin Google, Mountain View, CA, USA
Michael W Smith National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Heidi J Sofia National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
Ahmad N Abou Tayoun Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
Françoise Thibaud-Nissen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Francesca Floriana Tricomi European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Justin Wagner Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Brian Walenz Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Jonathan M D Wood Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
Aleksey V Zimin Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
Guillaume Bourque Department of Human Genetics, McGill University, Montréal, Québec, Canada Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
Mark J P Chaisson Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Paul Flicek European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Adam M Phillippy Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Justin M Zook Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
Evan E Eichler Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA
David Haussler Genomics Institute, University of California, Santa Cruz, CA, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA
Ting Wang McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Erich D Jarvis Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
Karen H Miga Genomics Institute, University of California, Santa Cruz, CA, USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Tobias Marschall Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Ira M Hall Department of Genetics, Yale University School of Medicine, New Haven, CT, USA Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
Heng Li Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Benedict Paten Genomics Institute, University of California, Santa Cruz, CA, USA

Chao KH, Zimin AV, Pertea M, Salzberg SL. The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual. G3 (Bethesda) 2023;13:jkac321. [PMID: 36630290 PMCID: PMC9997556 DOI: 10.1093/g3journal/jkac321] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/08/2022] [Revised: 10/27/2022] [Accepted: 11/03/2022] [Indexed: 01/12/2023]

Guo A, Salzberg SL, Zimin AV. JASPER: A fast genome polishing tool that improves accuracy of genome assemblies. PLoS Comput Biol 2023;19:e1011032. [PMID: 37000853 PMCID: PMC10096238 DOI: 10.1371/journal.pcbi.1011032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 04/12/2023] [Accepted: 03/16/2023] [Indexed: 04/03/2023] Open

Miller J, Zimin AV, Gordus A. Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus. Gigascience 2022;12:giad002. [PMID: 36762707 PMCID: PMC9912274 DOI: 10.1093/gigascience/giad002] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 11/18/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023] Open

Pockrandt C, Zimin AV, Salzberg SL. Metagenomic classification with KrakenUniq on low-memory computers. J Open Source Softw 2022;7:4908. [PMID: 37602140 PMCID: PMC10438097 DOI: 10.21105/joss.04908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/22/2023]

Abstract

Kraken and KrakenUniq are widely-used tools for classifying metagenomics sequences. A key requirement for these systems is a database containing all k-mers from all genomes that the users want to be able to detect, where k = 31 by default. This database can be very large, easily exceeding 100 gigabytes (GB) and sometimes 400 GB. Previously, Kraken and KrakenUniq required loading the entire database into main memory (RAM), and if RAM was insufficient, they used memory mapping, which significantly increased the running time for large datasets. We have implemented a new algorithm in KrakenUniq that allows it to load and process the database in chunks, with only a modest increase in running time. This enhancement now makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system.

Statement of need

The KrakenUniq software classifies reads from metagenomic samples to establish which organisms are present in the samples and estimate their abundance. The software is widely used by researchers and clinicians in medical diagnostics, microbiome and environmental studies. Typical databases used by KrakenUniq are tens to hundreds of gigabytes in size. The original KrakenUniq code required loading the entire database in RAM, which demanded expensive high-memory servers to run it efficiently. If a user did not have enough physical RAM to load the entire database, KrakenUniq resorted to memory-mapping the database, which significantly increased run times, frequently by a factor of more than 100. The new functionality described in this paper enables users who do not have access to high-memory servers to run KrakenUniq efficiently, with CPU time increasing only 3- to 4-fold instead of 100-fold or more.
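
The chunked processing described in this abstract can be illustrated with a toy sketch. This is not KrakenUniq's code or database format: the function names (`kmers`, `iter_chunks`, `classify_in_chunks`), the list-of-pairs database, and the simple majority vote over k-mer hits are all assumptions made for illustration (the real tools use a taxonomy-aware scheme). The point is only the access pattern: every read's k-mers are checked against one database chunk at a time, so no more than one chunk needs to be held in memory.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def iter_chunks(db: List[Tuple[str, str]], chunk_size: int) -> Iterator[Dict[str, str]]:
    """Yield the database one chunk at a time as a k-mer -> taxon dict.

    Hypothetical stand-in for reading a chunk from disk: here the list is
    already in memory, so this only mimics the access pattern.
    """
    for start in range(0, len(db), chunk_size):
        yield dict(db[start:start + chunk_size])


def classify_in_chunks(
    reads: Dict[str, str],
    db: List[Tuple[str, str]],
    k: int,
    chunk_size: int,
) -> Dict[str, Optional[str]]:
    """Assign each read the taxon with the most k-mer hits (toy majority vote)."""
    hits: Dict[str, Counter] = defaultdict(Counter)
    # Precompute each read's k-mers once; they are reused for every chunk.
    read_kmers = {name: list(kmers(seq, k)) for name, seq in reads.items()}
    for chunk in iter_chunks(db, chunk_size):
        for name, ks in read_kmers.items():
            for km in ks:
                taxon = chunk.get(km)
                if taxon is not None:
                    hits[name][taxon] += 1
    return {
        name: (hits[name].most_common(1)[0][0] if hits[name] else None)
        for name in reads
    }
```

In this sketch the result does not depend on `chunk_size`; only peak memory and the number of passes over the reads change, which mirrors the trade-off the abstract describes (a modest increase in running time in exchange for a much smaller memory footprint).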

Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin CS, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, Miga KH. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022;611:519-531. [PMID: 36261518 PMCID: PMC9668749 DOI: 10.1038/s41586-022-05325-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Received: 12/29/2021] [Accepted: 09/06/2022] [Indexed: 01/01/2023]

Affiliation(s)

Erich D. Jarvis Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA Howard Hughes Medical Institute, Chevy Chase, MD USA
Giulio Formenti Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
Arang Rhie Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
Andrea Guarracino Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy
Chentao Yang BGI-Shenzhen, Shenzhen, China
Jonathan Wood Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Alan Tracey Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Francoise Thibaud-Nissen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
Mitchell R. Vollger Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
David Porubsky Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
Haoyu Cheng Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
Mobin Asri UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Glennis A. Logsdon Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
Paolo Carnevali Chan Zuckerberg Initiative, Redwood City, CA USA
Mark J. P. Chaisson Quantitative and Computational Biology, University of Southern California, Los Angeles, CA USA
Chen-Shan Chin Foundation for Biological Data Science, Belmont, CA USA
Sarah Cody McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
Joanna Collins Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Peter Ebert Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
Merly Escalona Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA USA
Olivier Fedrigo Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
Robert S. Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
Lucinda L. Fulton McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
Shilpa Garg Department of Biology, University of Copenhagen, Copenhagen, Denmark
Jennifer L. Gerton Stowers Institute for Medical Research, Kansas City, MO USA
Jay Ghurye Dovetail Genomics, Scotts Valley, CA USA
Anastasiya Granat Illumina, Inc., San Diego, CA USA
Richard E. Green UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
William Harvey Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
Patrick Hasenfeld European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
Alex Hastie Bionano Genomics, San Diego, CA USA
Marina Haukness UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Erich B. Jaeger Illumina, Inc., San Diego, CA USA
Miten Jain UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Melanie Kirsche Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
Mikhail Kolmogorov Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA USA
Jan O. Korbel European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
Sergey Koren Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
Jonas Korlach Pacific Biosciences, Menlo Park, CA USA
Joyce Lee Bionano Genomics, San Diego, CA USA
Daofeng Li Department of Genetics, Washington University School of Medicine, St. Louis, MO USA The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
Tina Lindsay McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
Julian Lucas UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Feng Luo School of Computing, Clemson University, Clemson, SC USA
Tobias Marschall Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
Matthew W. Mitchell Coriell Institute for Medical Research, Camden, NJ USA
Jennifer McDaniel Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
Fan Nie Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
Hugh E. Olsen UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Nathan D. Olson Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
Trevor Pesout UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Tamara Potapova Stowers Institute for Medical Research, Kansas City, MO USA
Daniela Puiu Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
Allison Regier DNAnexus, Mountain View, CA USA
Jue Ruan Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
Steven L. Salzberg Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
Ashley D. Sanders Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
Michael C. Schatz Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
Anthony Schmitt Arima Genomics, San Diego, CA USA
Valerie A. Schneider National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
Siddarth Selvaraj Arima Genomics, San Diego, CA USA
Kishwar Shafin UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Alaina Shumate Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
Nathan O. Stitziel McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA Department of Genetics, Washington University School of Medicine, St. Louis, MO USA Cardiovascular Division, John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, USA
Catherine Stober European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
James Torrance Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Justin Wagner Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
Jianxin Wang Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
Aaron Wenger Pacific Biosciences, Menlo Park, CA USA
Chuanle Xiao State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
Aleksey V. Zimin Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
Guojie Zhang Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
Ting Wang McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA Department of Genetics, Washington University School of Medicine, St. Louis, MO USA The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
Heng Li Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
Erik Garrison Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN USA
David Haussler Howard Hughes Medical Institute, Chevy Chase, MD USA Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA USA
Ira Hall Yale School of Medicine, New Haven, CT USA
Justin M. Zook Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
Evan E. Eichler Howard Hughes Medical Institute, Chevy Chase, MD USA Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
Adam M. Phillippy Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
Benedict Paten UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Kerstin Howe Tree of Life, Wellcome Sanger Institute, Cambridge, UK
Karen H. Miga UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
Human Pangenome Reference Consortium

Sork VL, Cokus SJ, Fitz-Gibbon ST, Zimin AV, Puiu D, Garcia JA, Gugger PF, Henriquez CL, Zhen Y, Lohmueller KE, Pellegrini M, Salzberg SL. High-quality genome and methylomes illustrate features underlying evolutionary success of oaks. Nat Commun 2022;13:2047. [PMID: 35440538 PMCID: PMC9018854 DOI: 10.1038/s41467-022-29584-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/12/2021] [Accepted: 03/11/2022] [Indexed: 02/01/2023] Open

Affiliation(s)

Victoria L Sork Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA Institute of the Environment and Sustainability, University of California, Los Angeles, CA, 90095, USA
Shawn J Cokus Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095-7239, USA
Sorel T Fitz-Gibbon Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA
Aleksey V Zimin Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
Daniela Puiu Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
Jesse A Garcia Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA
Paul F Gugger Appalachian Laboratory, University of Maryland Center for Environmental Science, Frostburg, MD, 21532, USA
Claudia L Henriquez Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA
Ying Zhen Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA
Kirk E Lohmueller Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-1438, USA Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
Matteo Pellegrini Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, CA, 90095-7239, USA
Steven L Salzberg Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, 21218, USA

Zimin AV, Shumate A, Shinder I, Heinz J, Puiu D, Pertea M, Salzberg SL. A reference-quality, fully annotated genome from a Puerto Rican individual. Genetics 2022;220:iyab227. [PMID: 34897437 PMCID: PMC9097244 DOI: 10.1093/genetics/iyab227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Accepted: 11/05/2021] [Indexed: 11/12/2022] Open

Zimin AV, Salzberg SL. The SAMBA tool uses long reads to improve the contiguity of genome assemblies. PLoS Comput Biol 2022;18:e1009860. [PMID: 35120119 PMCID: PMC8849508 DOI: 10.1371/journal.pcbi.1009860] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/22/2021] [Revised: 02/16/2022] [Accepted: 01/24/2022] [Indexed: 01/03/2023] Open

Abstract

Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.

The DNA molecule that is in almost every cell in a living organism can be represented as a sequence of four different nucleotides, or bases, denoted by the letters A, C, G, and T. Current sequencing technologies require breaking the DNA molecule into short fragments, sequencing them to find the corresponding sequence of letters (producing "reads"), and assembly, which recovers the DNA sequence from the reads. Repeats in genome sequences typically prevent one from recovering the full contiguous genome sequence, because any repeat that is longer than the read length cannot be reliably resolved. Third-generation sequencing technologies can generate very long reads, albeit with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using previous-generation reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. Here we introduce a tool called SAMBA that is designed to upgrade existing assemblies using additional coverage with long-read data, resulting in substantially greater contiguity. We compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.
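
The idea of filling a spanned gap with long-read sequence can be shown with a toy sketch. This is not SAMBA's algorithm: the function name `fill_gap`, the `anchor` parameter, and the use of exact string matching are assumptions made for illustration. Real long reads contain errors, so a practical tool has to rely on alignment rather than the exact matching used here. The sketch assumes a single error-free read that contains the end of one contig, the missing sequence, and the start of the next contig, in that order.

```python
from typing import Optional


def fill_gap(left: str, right: str, read: str, anchor: int = 5) -> Optional[str]:
    """Join two contigs using a read that spans the gap between them.

    Returns the merged sequence, or None if the read does not contain the
    last `anchor` bases of `left` followed later by the first `anchor`
    bases of `right`.
    """
    if len(left) < anchor or len(right) < anchor:
        return None
    left_end = left[-anchor:]
    right_start = right[:anchor]
    # Locate the end of the left contig inside the read.
    i = read.find(left_end)
    if i < 0:
        return None
    # Locate the start of the right contig downstream of the left anchor.
    j = read.find(right_start, i + anchor)
    if j < 0:
        return None
    # Everything between the two anchors is the sequence missing from the assembly.
    gap_seq = read[i + anchor:j]
    return left + gap_seq + right


if __name__ == "__main__":
    merged = fill_gap("ACGTACGGTCA", "TTGACCATGG", "ACGGTCACCCAAATTGACCA")
    print(merged)
```

The design choice worth noting is that the two contigs are kept unchanged and only the bases between the anchors are taken from the read, which reflects the abstract's point that existing curated assemblies are upgraded rather than re-assembled.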

Neale DB, Zimin AV, Zaman S, Scott AD, Shrestha B, Workman RE, Puiu D, Allen BJ, Moore ZJ, Sekhwal MK, De La Torre AR, McGuire PE, Burns E, Timp W, Wegrzyn JL, Salzberg SL. Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin. G3 (Bethesda) 2022;12:6460957. [PMID: 35100403 PMCID: PMC8728005 DOI: 10.1093/g3journal/jkab380] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/26/2021] [Accepted: 10/25/2021] [Indexed: 12/15/2022]

Affiliation(s)

David B Neale Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
Aleksey V Zimin Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
Sumaira Zaman Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
Alison D Scott Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
Bikash Shrestha Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
Rachael E Workman Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA
Daniela Puiu Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
Brian J Allen Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
Zane J Moore Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
Manoj K Sekhwal School of Forestry, Northern Arizona University, Flagstaff, AZ 86011, USA
Amanda R De La Torre School of Forestry, Northern Arizona University, Flagstaff, AZ 86011, USA
Patrick E McGuire Department of Plant Sciences, University of California, Davis, Davis, CA 95616, USA
Emily Burns Save the Redwoods League, San Francisco, CA 94104, USA
Winston Timp Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA
Jill L Wegrzyn Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA; Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
Steven L Salzberg Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA

Polinski JM, Zimin AV, Clark KF, Kohn AB, Sadowski N, Timp W, Ptitsyn A, Khanna P, Romanova DY, Williams P, Greenwood SJ, Moroz LL, Walt DR, Bodnar AG. The American lobster genome reveals insights on longevity, neural, and immune adaptations. Sci Adv 2021;7:7/26/eabe8290. [PMID: 34162536 PMCID: PMC8221624 DOI: 10.1126/sciadv.abe8290] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/22/2020] [Accepted: 05/07/2021] [Indexed: 05/30/2023]

Affiliation(s)

Jennifer M Polinski Gloucester Marine Genomics Institute, Gloucester, MA 01930, USA
Aleksey V Zimin Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA
K Fraser Clark Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
Andrea B Kohn The Whitney Laboratory for Marine Bioscience and Department of Neuroscience, University of Florida, Gainesville and St. Augustine, FL 32080-8623, USA
Norah Sadowski Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
Winston Timp Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
Andrey Ptitsyn Gloucester Marine Genomics Institute, Gloucester, MA 01930, USA
Prarthana Khanna Genetics Program, Tufts University School of Medicine, Boston, MA 02111, USA
Daria Y Romanova Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow 117485, Russia
Peter Williams The Whitney Laboratory for Marine Bioscience and Department of Neuroscience, University of Florida, Gainesville and St. Augustine, FL 32080-8623, USA
Spencer J Greenwood Department of Biomedical Sciences, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, Prince Edward Island C1A 4P3, Canada
Leonid L Moroz The Whitney Laboratory for Marine Bioscience and Department of Neuroscience, University of Florida, Gainesville and St. Augustine, FL 32080-8623, USA
David R Walt Gloucester Marine Genomics Institute, Gloucester, MA 01930, USA; Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA 02115, USA
Andrea G Bodnar Gloucester Marine Genomics Institute, Gloucester, MA 01930, USA

Alonge M, Shumate A, Puiu D, Zimin AV, Salzberg SL. Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies. Genetics 2020;216:599-608. [PMID: 32796007 PMCID: PMC7536849 DOI: 10.1534/genetics.120.303501] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/04/2020] [Accepted: 08/10/2020] [Indexed: 11/18/2022] Open

Zimin AV, Salzberg SL. The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies. PLoS Comput Biol 2020;16:e1007981. [PMID: 32589667 PMCID: PMC7347232 DOI: 10.1371/journal.pcbi.1007981] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 01/21/2020] [Revised: 07/09/2020] [Accepted: 05/25/2020] [Indexed: 11/18/2022] Open

Shumate A, Zimin AV, Sherman RM, Puiu D, Wagner JM, Olson ND, Pertea M, Salit ML, Zook JM, Salzberg SL. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol 2020;21:129. [PMID: 32487205 PMCID: PMC7265644 DOI: 10.1186/s13059-020-02047-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/06/2020] [Accepted: 05/15/2020] [Indexed: 01/23/2023] Open

Marrano A, Britton M, Zaini PA, Zimin AV, Workman RE, Puiu D, Bianco L, Pierro EAD, Allen BJ, Chakraborty S, Troggio M, Leslie CA, Timp W, Dandekar A, Salzberg SL, Neale DB. High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome. Gigascience 2020;9:giaa050. [PMID: 32432329 PMCID: PMC7238675 DOI: 10.1093/gigascience/giaa050] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 10/19/2019] [Revised: 03/13/2020] [Accepted: 04/20/2020] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes.

FINDINGS

Here, we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, with the 16 chromosomal pseudomolecules assembled and representing 95% of its total length. Using full-length transcripts from single-molecule real-time sequencing, we predicted 37,554 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) present both start and stop codons, which represents a significant improvement compared with Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during male flower development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars.

CONCLUSION

Overall, Chandler v2.0 will serve as a valuable resource to better understand and explore walnut biology.

Affiliation(s)

Annarita Marrano Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Monica Britton Bioinformatics Core Facility, Genome Center, University of California, One Shields Avenue, Davis, CA 95616, USA
Paulo A Zaini Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Aleksey V Zimin Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA; Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA
Rachael E Workman Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA
Daniela Puiu Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA
Luca Bianco Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach, 1 38010 S. Michele all'Adige (TN) 38010, Italy
Erica Adele Di Pierro Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach, 1 38010 S. Michele all'Adige (TN) 38010, Italy
Brian J Allen Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Sandeep Chakraborty Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Michela Troggio Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach, 1 38010 S. Michele all'Adige (TN) 38010, Italy
Charles A Leslie Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Winston Timp Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA; Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA
Abhaya Dandekar Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
Steven L Salzberg Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA; Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA; Departments of Computer Science and Biostatistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
David B Neale Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA

Giordano R, Donthu RK, Zimin AV, Julca Chavez IC, Gabaldon T, van Munster M, Hon L, Hall R, Badger JH, Nguyen M, Flores A, Potter B, Giray T, Soto-Adames FN, Weber E, Marcelino JAP, Fields CJ, Voegtlin DJ, Hill CB, Hartman GL. Soybean aphid biotype 1 genome: Insights into the invasive biology and adaptive evolution of a major agricultural pest. Insect Biochem Mol Biol 2020;120:103334. [PMID: 32109587 DOI: 10.1016/j.ibmb.2020.103334] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/22/2019] [Revised: 01/07/2020] [Accepted: 02/10/2020] [Indexed: 05/12/2023]

Abstract

The soybean aphid, Aphis glycines Matsumura (Hemiptera: Aphididae) is a serious pest of the soybean plant, Glycine max, a major world-wide agricultural crop. We assembled a de novo genome sequence of Ap. glycines Biotype 1, from a culture established shortly after this species invaded North America. 20.4% of the Ap. glycines proteome is duplicated. These in-paralogs are enriched with Gene Ontology (GO) categories mostly related to apoptosis, a possible adaptation to plant chemistry and other environmental stressors. Approximately one-third of these genes show parallel duplication in other aphids. But Ap. gossypii, its closest related species, has the lowest number of these duplicated genes. An Illumina GoldenGate assay of 2380 SNPs was used to determine the world-wide population structure of Ap. glycines. China and South Korean aphids are the closest to those in North America. China is the likely origin of other Asian aphid populations. The most distantly related aphids to those in North America are from Australia. The diversity of Ap. glycines in North America has decreased over time since its arrival. The genetic diversity of Ap. glycines North American population sampled shortly after its first detection in 2001 up to 2012 does not appear to correlate with geography. However, aphids collected on soybean Rag experimental varieties in Minnesota (MN), Iowa (IA), and Wisconsin (WI), closer to high density Rhamnus cathartica stands, appear to have higher capacity to colonize resistant soybean plants than aphids sampled in Ohio (OH), North Dakota (ND), and South Dakota (SD). Samples from the former states have SNP alleles with high F_ST values and frequencies, that overlap with genes involved in iron metabolism, a crucial metabolic pathway that may be affected by the Rag-associated soybean plant response. The Ap. glycines Biotype 1 genome will provide needed information for future analyses of mechanisms of aphid virulence and pesticide resistance as well as facilitate comparative analyses between aphids with differing natural history and host plant range.

Affiliation(s)

Rosanna Giordano Puerto Rico Science, Technology and Research Trust, San Juan, PR, USA; Know Your Bee, Inc. San Juan, PR, USA
Ravi Kiran Donthu Puerto Rico Science, Technology and Research Trust, San Juan, PR, USA; Know Your Bee, Inc. San Juan, PR, USA
Aleksey V Zimin Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Irene Consuelo Julca Chavez Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain; Institute for Research in Biomedicine, Barcelona, Spain
Toni Gabaldon Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain; Institute for Research in Biomedicine, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
Manuella van Munster CIRAD-INRA-Montpellier SupAgro, TA A54/K, Campus International de Baillarguet, Montpellier, France
Lawrence Hon Color Genomics, Burlingame, CA, USA
Richard Hall Pacific Biosciences, Menlo Park, CA, USA
Jonathan H Badger Cancer and Inflammation Program, Center for Cancer Research, National Cancer Institute, National Institute of Health, DHHS, Bethesda, MD, USA
Minh Nguyen Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
Alejandra Flores College of Liberal Arts and Sciences, School of Molecular and Cellular Biology, University of Illinois, Urbana, IL, USA
Bruce Potter University of Minnesota, Southwest Research and Outreach Center, Lamberton, MN, USA
Tugrul Giray Department of Biology, University of Puerto Rico, San Juan, PR, USA
Felipe N Soto-Adames Florida Department of Agriculture and Consumer Services, Division of Plant Industry, Entomology, Gainesville, FL, USA
Everett Weber Know Your Bee, Inc. San Juan, PR, USA
Jose A P Marcelino Puerto Rico Science, Technology and Research Trust, San Juan, PR, USA; Know Your Bee, Inc. San Juan, PR, USA; Department of Entomology and Nematology, University of Florida, Gainesville, FL, USA
Christopher J Fields HPCBio, Roy J. Carver Biotechnology Center, University of Illinois, Urbana, IL, USA
David J Voegtlin Illinois Natural History Survey, University of Illinois, Urbana, IL, USA
Curt B Hill Agricen Sciences, Pilot Point, TX, USA
Glen L Hartman USDA-ARS and Department of Crop Sciences, University of Illinois, Urbana, IL, USA

Marrano A, Britton M, Zaini PA, Zimin AV, Workman RE, Puiu D, Bianco L, Pierro EAD, Allen BJ, Chakraborty S, Troggio M, Leslie CA, Timp W, Dandekar A, Salzberg SL, Neale DB. High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome. Gigascience 2020. [PMID: 32432329 DOI: 10.1101/80979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2023] Open

Read AC, Moscou MJ, Zimin AV, Pertea G, Meyer RS, Purugganan MD, Leach JE, Triplett LR, Salzberg SL, Bogdanove AJ. Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing. PLoS Genet 2020;16:e1008571. [PMID: 31986137 PMCID: PMC7004385 DOI: 10.1371/journal.pgen.1008571] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 07/05/2019] [Revised: 02/06/2020] [Accepted: 12/16/2019] [Indexed: 12/26/2022] Open

Abstract

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety.

Plants lack adaptive immunity, and instead contain repeat-rich, disease resistance genes that evolve rapidly through duplication, recombination, and transposition. The number, variation, and often clustered arrangement of these genes make them challenging to sequence and catalog. The US heirloom rice variety Carolina Gold Select has resistance to two important bacterial diseases. Toward identifying the responsible gene(s), we combined long- and short-read sequencing technologies to assemble the whole genome and identify the resistance gene repertoire. We previously narrowed the location of the gene(s) to a region on chromosome four. The region in Carolina Gold Select is larger than in the rice reference genome (Nipponbare) and contains twice as many resistance genes. One shares unusual features with a known bacterial disease resistance gene, suggesting that it confers the resistance. Across diverse varieties and related species, we identified two widely-distributed groups of such genes. The results are an important step toward mechanistic characterization and deployment of the bacterial disease resistance. The genome assembly also identified a resistance gene for a fungal disease and predicted a marker phenotype used in breeding for resistance. Thus, the Carolina Gold Select genome assembly can be expected to aid in the identification and deployment of other valuable traits.

Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019;20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 656] [Impact Index Per Article: 131.2] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022] Open

Breitwieser FP, Pertea M, Zimin AV, Salzberg SL. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res 2019;29:954-960. [PMID: 31064768 PMCID: PMC6581058 DOI: 10.1101/gr.245373.118] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 10/24/2018] [Accepted: 05/03/2019] [Indexed: 01/22/2023]

Zimin AV, Puiu D, Hall R, Kingan S, Clavijo BJ, Salzberg SL. The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. Gigascience 2018;6:1-7. [PMID: 29069494 PMCID: PMC5691383 DOI: 10.1093/gigascience/gix097] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Received: 07/05/2017] [Accepted: 09/28/2017] [Indexed: 01/17/2023] Open

Zimin AV, Stevens KA, Crepeau MW, Puiu D, Wegrzyn JL, Yorke JA, Langley CH, Neale DB, Salzberg SL. Erratum to: An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 2017;6:1. [PMID: 29020755 PMCID: PMC5632297 DOI: 10.1093/gigascience/gix072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022] Open

Zimin AV, Stevens KA, Crepeau MW, Puiu D, Wegrzyn JL, Yorke JA, Langley CH, Neale DB, Salzberg SL. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 2017;6:1-4. [PMID: 28369353 PMCID: PMC5437942 DOI: 10.1093/gigascience/giw016] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 10/04/2016] [Accepted: 12/21/2016] [Indexed: 11/30/2022] Open

Zimin AV, Puiu D, Luo MC, Zhu T, Koren S, Marçais G, Yorke JA, Dvořák J, Salzberg SL. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res 2017. [PMID: 28130360 DOI: 10.1101/gr.213405.116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]

Zimin AV, Puiu D, Luo MC, Zhu T, Koren S, Marçais G, Yorke JA, Dvořák J, Salzberg SL. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res 2017;27:787-792. [PMID: 28130360 PMCID: PMC5411773 DOI: 10.1101/gr.213405.116] [Citation(s) in RCA: 240] [Impact Index Per Article: 34.3] [Received: 07/26/2016] [Accepted: 01/18/2017] [Indexed: 01/12/2023]

Zimin AV, Cornish AS, Maudhoo MD, Gibbs RM, Zhang X, Pandey S, Meehan DT, Wipfler K, Bosinger SE, Johnson ZP, Tharp GK, Marçais G, Roberts M, Ferguson B, Fox HS, Treangen T, Salzberg SL, Yorke JA, Norgren RB. A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biol Direct 2014;9:20. [PMID: 25319552 PMCID: PMC4214606 DOI: 10.1186/1745-6150-9-20] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 07/18/2014] [Accepted: 10/03/2014] [Indexed: 12/13/2022] Open

Abstract

Background

The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.

Results

We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Conclusions

The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers

This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

Neale DB, Wegrzyn JL, Stevens KA, Zimin AV, Puiu D, Crepeau MW, Cardeno C, Koriabine M, Holtz-Morris AE, Liechty JD, Martínez-García PJ, Vasquez-Gross HA, Lin BY, Zieve JJ, Dougherty WM, Fuentes-Soriano S, Wu LS, Gilbert D, Marçais G, Roberts M, Holt C, Yandell M, Davis JM, Smith KE, Dean JFD, Lorenz WW, Whetten RW, Sederoff R, Wheeler N, McGuire PE, Main D, Loopstra CA, Mockaitis K, deJong PJ, Yorke JA, Salzberg SL, Langley CH. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol 2014;15:R59. [PMID: 24647006 PMCID: PMC4053751 DOI: 10.1186/gb-2014-15-3-r59] [Citation(s) in RCA: 274] [Impact Index Per Article: 27.4] [Received: 02/11/2014] [Accepted: 03/04/2014] [Indexed: 11/30/2022]

Dalloul RA, Zimin AV, Settlage RE, Kim S, Reed KM. Next-generation sequencing strategies for characterizing the turkey genome. Poult Sci 2014;93:479-84. [DOI: 10.3382/ps.2013-03560] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/19/2023]

Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics 2013;29:2669-77. [PMID: 23990416 DOI: 10.1093/bioinformatics/btt476] [Citation(s) in RCA: 810] [Impact Index Per Article: 73.6] [Indexed: 12/20/2022]

Abstract

MOTIVATION

Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. In particular, multiple algorithms based on de Bruijn graphs have been shown to be effective for the assembly problem. In this article, we describe a new hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies, and which allows variable read lengths while tolerating a significant level of sequencing error. Our method transforms large numbers of paired-end reads into a much smaller number of longer 'super-reads'. The use of super-reads allows us to assemble combinations of Illumina reads of differing lengths together with longer reads from 454 and Sanger sequencing technologies, making it one of the few assemblers capable of handling such mixtures. We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced 'mazurka').
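The reduction of many reads to a few super-reads can be illustrated with a toy sketch. It assumes, as a simplification, that a read is extended one base at a time in both directions for as long as the k-mer set built from all reads allows exactly one continuation. This is not MaSuRCA's implementation: the function names, the value of k and the example reads are hypothetical, and the sketch ignores reverse complements, sequencing errors, k-mer counts and paired-end information.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in any read."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def extend(read, kmers, k, max_len=10_000):
    """Extend a read left and right while the extension is unique."""
    seq = read
    seen = set()  # k-mers already used for extension (guards against cycles)
    # Extend to the right: look for a single base b with suffix + b in kmers.
    while len(seq) < max_len:
        suffix = seq[-(k - 1):]
        nxt = [suffix + b for b in "ACGT" if suffix + b in kmers]
        if len(nxt) != 1 or nxt[0] in seen:
            break
        seen.add(nxt[0])
        seq += nxt[0][-1]
    # Extend to the left: look for a single base b with b + prefix in kmers.
    while len(seq) < max_len:
        prefix = seq[:k - 1]
        prv = [b + prefix for b in "ACGT" if b + prefix in kmers]
        if len(prv) != 1 or prv[0] in seen:
            break
        seen.add(prv[0])
        seq = prv[0][0] + seq
    return seq


# Three hypothetical overlapping reads from the toy sequence ATGGCGTACCTGA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
kmers = kmer_set(reads, k=4)
super_reads = {extend(r, kmers, k=4) for r in reads}
print(super_reads)
```

In this toy case all three reads extend to the same sequence, so three reads collapse into one longer super-read; extension stops wherever the k-mer set offers zero or several continuations, which is where repeats or missing coverage would leave the assembly ambiguous.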

RESULTS

We evaluate the performance of MaSuRCA against two of the most widely used assemblers for Illumina data, Allpaths-LG and SOAPdenovo2, on two datasets from organisms for which high-quality assemblies are available: the bacterium Rhodobacter sphaeroides and chromosome 16 of the mouse genome. We show that MaSuRCA performs on par with or better than Allpaths-LG and significantly better than SOAPdenovo on these data, when evaluated against the finished sequence. We then show that MaSuRCA can significantly improve its assemblies when the original data are augmented with long reads.

AVAILABILITY

MaSuRCA is available as open-source code at ftp://ftp.genome.umd.edu/pub/MaSuRCA/. Previous (pre-publication) releases have been publicly available for over a year.

CONTACT

alekseyz@ipst.umd.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Zimin AV, Kelley DR, Roberts M, Marçais G, Salzberg SL, Yorke JA. Mis-assembled "segmental duplications" in two versions of the Bos taurus genome. PLoS One 2012;7:e42680. [PMID: 22880081 PMCID: PMC3411808 DOI: 10.1371/journal.pone.0042680] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 04/01/2011] [Accepted: 07/11/2012] [Indexed: 01/06/2023]

Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Ann Blomberg L, Bouffard P, Burt DW, Crasta O, Crooijmans RPMA, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MAM, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, Kim KW, Kim S, Langenberger D, Lee MK, Lee T, Mane S, Marcais G, Marz M, McElroy AP, Modise T, Nefedov M, Notredame C, Paton IR, Payne WS, Pertea G, Prickett D, Puiu D, Qioa D, Raineri E, Ruffier M, Salzberg SL, Schatz MC, Scheuring C, Schmidt CJ, Schroeder S, Searle SMJ, Smith EJ, Smith J, Sonstegard TS, Stadler PF, Tafer H, Tu Z(J), Van Tassell CP, Vilella AJ, Williams KP, Yorke JA, Zhang L, Zhang HB, Zhang X, Zhang Y, Reed KM. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 2010;8:e1000475. [PMID: 20838655 PMCID: PMC2935454 DOI: 10.1371/journal.pbio.1000475] [Citation(s) in RCA: 320] [Impact Index Per Article: 22.9] [Received: 12/21/2009] [Accepted: 07/27/2010] [Indexed: 12/11/2022]

Affiliation(s)

Rami A. Dalloul Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
Julie A. Long Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
Aleksey V. Zimin Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
Luqman Aslam Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
Kathryn Beal European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Le Ann Blomberg Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
Pascal Bouffard Roche Applied Science, Indianapolis, Indiana, United States of America
David W. Burt The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
Oswald Crasta Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America Chromatin Inc., Champaign, Illinois, United States of America
Richard P. M. A. Crooijmans Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
Kristal Cooper Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
Roger A. Coulombe Department of Veterinary Sciences, Utah State University, Logan, Utah, United States of America
Supriyo De Gene Expression and Genomics Unit, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
Mary E. Delany Department of Animal Science, University of California, Davis, California, United States of America
Jerry B. Dodgson Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
Jennifer J. Dong Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Clive Evans Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
Karin M. Frederickson Roche Applied Science, Indianapolis, Indiana, United States of America
Paul Flicek European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Liliana Florea Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
Otto Folkerts Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America Chromatin Inc., Champaign, Illinois, United States of America
Martien A. M. Groenen Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
Tim T. Harkins Roche Applied Science, Indianapolis, Indiana, United States of America
Javier Herrero European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Steve Hoffmann Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany LIFE Project, University of Leipzig, Leipzig, Germany
Hendrik-Jan Megens Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
Andrew Jiang Department of Animal Science, University of California, Davis, California, United States of America
Pieter de Jong Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
Pete Kaiser Institute for Animal Health, Compton, Berkshire, United Kingdom
Heebal Kim Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
Kyu-Won Kim Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
Sungwon Kim Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
David Langenberger Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
Mi-Kyung Lee Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Taeheon Lee Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
Shrinivasrao Mane Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
Guillaume Marcais Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
Manja Marz Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany Philipps-Universität Marburg, Pharmazeutische Chemie, Marburg, Germany
Audrey P. McElroy Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
Thero Modise Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
Mikhail Nefedov Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
Cédric Notredame Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain
Ian R. Paton The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
William S. Payne Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
Geo Pertea Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
Dennis Prickett Institute for Animal Health, Compton, Berkshire, United Kingdom
Daniela Puiu Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
Dan Qioa Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
Emanuele Raineri Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra, Barcelona, Spain
Magali Ruffier Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Steven L. Salzberg Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
Michael C. Schatz Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
Chantel Scheuring Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Carl J. Schmidt Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
Steven Schroeder Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
Stephen M. J. Searle Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Edward J. Smith Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
Jacqueline Smith The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
Tad S. Sonstegard Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
Peter F. Stadler Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Fraunhofer Institut für Zelltherapie und Immunologie, Leipzig, Germany Department of Theoretical Chemistry University of Vienna, Vienna, Austria Santa Fe Institute, Santa Fe, New Mexico, United States of America
Hakim Tafer Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany Department of Theoretical Chemistry University of Vienna, Vienna, Austria
Zhijian (Jake) Tu Department of Biochemistry, Virginia Tech, Blacksburg, Virginia, United States of America
Curtis P. Van Tassell Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America Animal Improvement Programs Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
Albert J. Vilella European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Kelly P. Williams Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
James A. Yorke Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
Liqing Zhang Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
Hong-Bin Zhang Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Xiaojun Zhang Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Yang Zhang Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
Kent M. Reed Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, St. Paul, Minnesota, United States of America

Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, Marçais G, Roberts M, Subramanian P, Yorke JA, Salzberg SL. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol 2009;10:R42. [PMID: 19393038 PMCID: PMC2688933 DOI: 10.1186/gb-2009-10-4-r42] [Citation(s) in RCA: 827] [Impact Index Per Article: 55.1] [Received: 01/07/2009] [Revised: 02/06/2009] [Accepted: 04/24/2009] [Indexed: 12/02/2022]

Roberts M, Zimin AV, Hayes W, Hunt BR, Ustun C, White JR, Havlak P, Yorke J. Improving Phrap-based assembly of the rat using "reliable" overlaps. PLoS One 2008;3:e1836. [PMID: 18350171 PMCID: PMC2266800 DOI: 10.1371/journal.pone.0001836] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/15/2007] [Accepted: 02/09/2008] [Indexed: 12/02/2022]

Zimin AV, Smith DR, Sutton G, Yorke JA. Assembly reconciliation. Bioinformatics 2007;24:42-5. [PMID: 18057021 DOI: 10.1093/bioinformatics/btm542] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]

Zimin AV, Hunt BR, Ott E. Bifurcation scenarios for bubbling transition. Phys Rev E Stat Nonlin Soft Matter Phys 2003;67:016204. [PMID: 12636582 DOI: 10.1103/physreve.67.016204] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/11/2002] [Indexed: 05/24/2023]

Chukina EA, Lapshin VP, Kliukvin II, Okhotskiĭ VP, Zvezdina MV, Larionov KS, Zimin AV. [Millimeter wavelength electromagnetic irradiation in the complex treatment of patients with extensive bite wounds]. Vopr Kurortol Fizioter Lech Fiz Kult 2001:45-7. [PMID: 11785340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]