1
Ziemski M, Wisanwanichthan T, Bokulich NA, Kaehler BD. Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences. Front Microbiol 2021; 12:644487. [PMID: 34220738 PMCID: PMC8249850 DOI: 10.3389/fmicb.2021.644487] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Accepted: 05/31/2021] [Indexed: 12/28/2022] Open
Abstract
Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
Affiliation(s)
- Michal Ziemski
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland
- Nicholas A. Bokulich
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland
2
Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020; 18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Affiliation(s)
- Nicholas A. Bokulich
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michal Ziemski
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michael S. Robeson
  - University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, AR, USA
3
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, Caporaso JG. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 2019; 37:852-857. [PMID: 31341288 DOI: 10.1038/s41587-019-0209-9] [Citation(s) in RCA: 8152] [Impact Index Per Article: 1630.4] [Indexed: 02/06/2023]
Affiliation(s)
- Evan Bolyen
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Jai Ram Rideout
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Matthew R Dillon
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Christian C Abnet
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Gabriel A Al-Ghalith
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Harriet Alexander
  - Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - Department of Population Health and Reproduction, University of California, Davis, Davis, CA, USA
- Eric J Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Manimozhiyan Arumugam
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Yang Bai
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - Centre of Excellence for Plant and Microbial Sciences (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences & John Innes Centre, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jordan E Bisanz
  - Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA
- Kyle Bittinger
  - Division of Gastroenterology and Nutrition, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Hepatology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Asker Brejnrod
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Colin J Brislawn
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- C Titus Brown
  - Department of Population Health and Reproduction, University of California, Davis, Davis, CA, USA
- Benjamin J Callahan
  - Department of Population Health & Pathobiology, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Andrés Mauricio Caraballo-Rodríguez
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- John Chase
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Emily K Cope
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Ricardo Da Silva
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Gavin M Douglas
  - Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
- Daniel M Durall
  - Irving K. Barber School of Arts and Sciences, University of British Columbia, Kelowna, British Columbia, Canada
- Claire Duvallet
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Christian F Edwardson
  - A. Watson Armour III Center for Animal Health and Welfare, Aquarium Microbiome Project, John G. Shedd Aquarium, Chicago, IL, USA
- Madeleine Ernst
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Department of Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Mehrbod Estaki
  - Department of Biology, University of British Columbia Okanagan, Okanagan, British Columbia, Canada
- Jennifer Fouquier
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Julia M Gauglitz
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Sean M Gibbons
  - Institute for Systems Biology, Seattle, WA, USA
  - eScience Institute, University of Washington, Seattle, WA, USA
- Deanna L Gibson
  - Irving K. Barber School of Arts and Sciences, Department of Biology, University of British Columbia, Kelowna, British Columbia, Canada
  - Department of Medicine, University of British Columbia, Kelowna, British Columbia, Canada
- Antonio Gonzalez
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Kestrel Gorlick
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Jiarong Guo
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA
- Benjamin Hillmann
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Susan Holmes
  - Statistics Department, Stanford University, Palo Alto, CA, USA
- Hannes Holste
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Curtis Huttenhower
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gavin A Huttley
  - Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
- Stefan Janssen
  - Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine University Dusseldorf, Dusseldorf, Germany
- Alan K Jarmusch
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Lingjing Jiang
  - Department of Family Medicine and Public Health, University of California San Diego, La Jolla, CA, USA
- Benjamin D Kaehler
  - Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
  - School of Science, University of New South Wales, Canberra, Australian Capital Territory, Australia
- Kyo Bin Kang
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - College of Pharmacy, Sookmyung Women's University, Seoul, Republic of Korea
- Christopher R Keefe
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Paul Keim
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Scott T Kelley
  - Department of Biology, San Diego State University, San Diego, CA, USA
- Dan Knights
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Biotechnology Institute, University of Minnesota, Saint Paul, MN, USA
- Irina Koester
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Tomasz Kosciolek
  - Department of Pediatrics, University of California San Diego, La Jolla, California, USA
- Jorden Kreps
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Morgan G I Langille
  - Department of Pharmacology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joslynn Lee
  - Science Education, Howard Hughes Medical Institute, Ashburn, VA, USA
- Ruth Ley
  - Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Yong-Xin Liu
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - Centre of Excellence for Plant and Microbial Sciences (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences & John Innes Centre, Beijing, China
- Erikka Loftfield
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Catherine Lozupone
  - Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Massoud Maher
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA
- Clarisse Marotz
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Bryan D Martin
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Lauren J McIver
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alexey V Melnik
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Jessica L Metcalf
  - Department of Animal Science, Colorado State University, Fort Collins, CO, USA
- Sydney C Morgan
  - Irving K. Barber School of Arts and Sciences, Unit 2 (Biology), University of British Columbia, Kelowna, British Columbia, Canada
- Jamie T Morton
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA
- Ahmad Turan Naimey
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Jose A Navas-Molina
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA
  - Google LLC, Mountain View, CA, USA
- Louis Felix Nothias
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Stephanie B Orchanian
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- Talima Pearson
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Samuel L Peoples
  - School of Information Studies, Syracuse University, Syracuse, NY, USA
  - School of STEM, University of Washington Bothell, Bothell, WA, USA
- Daniel Petras
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Mary Lai Preuss
  - Department of Biological Sciences, Webster University, St. Louis, MO, USA
- Elmar Pruesse
  - Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lasse Buur Rasmussen
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Adam Rivers
  - Agricultural Research Service, Genomics and Bioinformatics Research Unit, United States Department of Agriculture, Gainesville, FL, USA
- Michael S Robeson
  - College of Medicine, Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Patrick Rosenthal
  - Department of Biological Sciences, Webster University, St. Louis, MO, USA
- Nicola Segata
  - Centre for Integrative Biology, University of Trento, Trento, Italy
- Michael Shaffer
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Arron Shiffer
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Rashmi Sinha
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Se Jin Song
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- John R Spear
  - Department of Civil and Environmental Engineering, Colorado School of Mines, Golden, CO, USA
- Austin D Swafford
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- Luke R Thompson
  - Department of Biological Sciences and Northern Gulf Institute, University of Southern Mississippi, Hattiesburg, MS, USA
  - Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, La Jolla, CA, USA
- Pedro J Torres
  - Department of Biology, San Diego State University, San Diego, CA, USA
- Pauline Trinh
  - Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
- Anupriya Tripathi
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Division of Biological Sciences, University of California San Diego, San Diego, CA, USA
- Peter J Turnbaugh
  - Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA, USA
- Sabah Ul-Hasan
  - Quantitative and Systems Biology Graduate Program, University of California Merced, Merced, CA, USA
- Fernando Vargas
  - Division of Biological Sciences, University of California San Diego, San Diego, CA, USA
- Emily Vogtmann
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Max von Hippel
  - Department of Mathematics, University of Arizona, Tucson, AZ, USA
- William Walters
  - Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Yunhu Wan
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Mingxun Wang
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Jonathan Warren
  - National Laboratory Service, Environment Agency, Starcross, UK
- Kyle C Weber
  - Agricultural Research Service, Genomics and Bioinformatics Research Unit, United States Department of Agriculture, Gainesville, FL, USA
  - College of Agriculture and Life Sciences, University of Florida, Gainesville, FL, USA
- Amy D Willis
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Zhenjiang Zech Xu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Jesse R Zaneveld
  - School of STEM, Division of Biological Sciences, University of Washington Bothell, Bothell, WA, USA
- Qiyun Zhu
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- J Gregory Caporaso
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
4
Kaehler BD, Bokulich NA, McDonald D, Knight R, Caporaso JG, Huttley GA. Species abundance information improves sequence taxonomy classification accuracy. Nat Commun 2019; 10:4643. [PMID: 31604942 PMCID: PMC6789115 DOI: 10.1038/s41467-019-12669-6] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 02/27/2019] [Accepted: 09/19/2019] [Indexed: 12/12/2022] Open
Abstract
Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments. Taxonomy classification of amplicon sequences is an important step in investigating microbial communities in microbiome analysis. Here, the authors show incorporating environment-specific taxonomic abundance information can lead to improved species-level classification accuracy across common sample types.
Affiliation(s)
- Benjamin D Kaehler
  - Research School of Biology, Australian National University, Canberra, Australia
  - School of Science, University of New South Wales, Canberra, Australia
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- J Gregory Caporaso
  - Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Gavin A Huttley
  - Research School of Biology, Australian National University, Canberra, Australia
5
Kaehler BD, Bokulich NA, McDonald D, Knight R, Caporaso JG, Huttley GA. Species abundance information improves sequence taxonomy classification accuracy. Nat Commun 2019. [PMID: 31604942 DOI: 10.1101/406611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023] Open
Abstract
Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
Affiliation(s)
- Benjamin D Kaehler
  - Research School of Biology, Australian National University, Canberra, Australia
  - School of Science, University of New South Wales, Canberra, Australia
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- J Gregory Caporaso
  - Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Gavin A Huttley
  - Research School of Biology, Australian National University, Canberra, Australia
6
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, Caporaso JG. Author Correction: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 2019; 37:1091. [PMID: 31399723 DOI: 10.1038/s41587-019-0252-6] [Citation(s) in RCA: 281] [Impact Index Per Article: 56.2] [Indexed: 11/09/2022]
Abstract
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Affiliation(s)
- Evan Bolyen
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Jai Ram Rideout
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Matthew R Dillon
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Nicholas A Bokulich
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Christian C Abnet
  - Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
- Gabriel A Al-Ghalith
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Harriet Alexander
  - Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - Department of Population Health and Reproduction, University of California, Davis, Davis, CA, USA
- Eric J Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Manimozhiyan Arumugam
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Yang Bai
  - State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - Centre of Excellence for Plant and Microbial Sciences (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences & John Innes Centre, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jordan E Bisanz
  - Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA
- Kyle Bittinger
  - Division of Gastroenterology and Nutrition, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Hepatology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Asker Brejnrod
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Colin J Brislawn
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- C Titus Brown
  - Department of Population Health and Reproduction, University of California, Davis, Davis, CA, USA
- Benjamin J Callahan
  - Department of Population Health & Pathobiology, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Andrés Mauricio Caraballo-Rodríguez
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- John Chase
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Emily K Cope
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Ricardo Da Silva
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Gavin M Douglas
  - Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
- Daniel M Durall
  - Irving K. Barber School of Arts and Sciences, University of British Columbia, Kelowna, British Columbia, Canada
- Claire Duvallet
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Christian F Edwardson
  - A. Watson Armour III Center for Animal Health and Welfare, Aquarium Microbiome Project, John G. Shedd Aquarium, Chicago, IL, USA
- Madeleine Ernst
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Department of Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
| | - Mehrbod Estaki
- Department of Biology, University of British Columbia Okanagan, Okanagan, British Columbia, Canada
| | - Jennifer Fouquier
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.,Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Julia M Gauglitz
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Sean M Gibbons
- Institute for Systems Biology, Seattle, WA, USA.,eScience Institute, University of Washington, Seattle, WA, USA
| | - Deanna L Gibson
- Irving K. Barber School of Arts and Sciences, Department of Biology, University of British Columbia, Kelowna, British Columbia, Canada.,Department of Medicine, University of British Columbia, Kelowna, British Columbia, Canada
| | - Antonio Gonzalez
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Kestrel Gorlick
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Jiarong Guo
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA
| | - Benjamin Hillmann
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
| | - Susan Holmes
- Statistics Department, Stanford University, Palo Alto, CA, USA
| | - Hannes Holste
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gavin A Huttley
- Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
| | - Stefan Janssen
- Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine University Dusseldorf, Dusseldorf, Germany
| | - Alan K Jarmusch
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Lingjing Jiang
- Department of Family Medicine and Public Health, University of California San Diego, La Jolla, CA, USA
| | - Benjamin D Kaehler
- Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia.,School of Science, University of New South Wales, Canberra, Australian Capital Territory, Australia
| | - Kyo Bin Kang
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA.,College of Pharmacy, Sookmyung Women's University, Seoul, Republic of Korea
| | - Christopher R Keefe
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Paul Keim
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Scott T Kelley
- Department of Biology, San Diego State University, San Diego, CA, USA
| | - Dan Knights
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA.,Biotechnology Institute, University of Minnesota, Saint Paul, MN, USA
| | - Irina Koester
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA.,Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Tomasz Kosciolek
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Jorden Kreps
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Morgan G I Langille
- Department of Pharmacology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Joslynn Lee
- Science Education, Howard Hughes Medical Institute, Ashburn, VA, USA
| | - Ruth Ley
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany.,Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
| | - Yong-Xin Liu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China.,Centre of Excellence for Plant and Microbial Sciences (CEPAMS), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences & John Innes Centre, Beijing, China
| | - Erikka Loftfield
- Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
| | - Catherine Lozupone
- Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Massoud Maher
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA
| | - Clarisse Marotz
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Bryan D Martin
- Department of Statistics, University of Washington, Seattle, WA, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Lauren J McIver
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Alexey V Melnik
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Jessica L Metcalf
- Department of Animal Science, Colorado State University, Fort Collins, CO, USA
| | - Sydney C Morgan
- Irving K. Barber School of Arts and Sciences, Unit 2 (Biology), University of British Columbia, Kelowna, British Columbia, Canada
| | - Jamie T Morton
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.,Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA
| | - Ahmad Turan Naimey
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Jose A Navas-Molina
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.,Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA.,Google LLC, Mountain View, CA, USA
| | - Louis Felix Nothias
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Stephanie B Orchanian
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Talima Pearson
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Samuel L Peoples
- School of Information Studies, Syracuse University, Syracuse, NY, USA.,School of STEM, University of Washington Bothell, Bothell, WA, USA
| | - Daniel Petras
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Mary Lai Preuss
- Department of Biological Sciences, Webster University, St. Louis, MO, USA
| | - Elmar Pruesse
- Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lasse Buur Rasmussen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Adam Rivers
- Agricultural Research Service, Genomics and Bioinformatics Research Unit, United States Department of Agriculture, Gainesville, FL, USA
| | - Michael S Robeson
- College of Medicine, Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Patrick Rosenthal
- Department of Biological Sciences, Webster University, St. Louis, MO, USA
| | - Nicola Segata
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Michael Shaffer
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.,Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Arron Shiffer
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Rashmi Sinha
- Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
| | - Se Jin Song
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - John R Spear
- Department of Civil and Environmental Engineering, Colorado School of Mines, Golden, CO, USA
| | - Austin D Swafford
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Luke R Thompson
- Department of Biological Sciences and Northern Gulf Institute, University of Southern Mississippi, Hattiesburg, MS, USA.,Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, La Jolla, CA, USA
| | - Pedro J Torres
- Department of Biology, San Diego State University, San Diego, CA, USA
| | - Pauline Trinh
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, USA
| | - Anupriya Tripathi
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA.,Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.,Division of Biological Sciences, University of California San Diego, San Diego, CA, USA
| | - Peter J Turnbaugh
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA, USA
| | - Sabah Ul-Hasan
- Quantitative and Systems Biology Graduate Program, University of California Merced, Merced, CA, USA
| | | | - Fernando Vargas
- Division of Biological Sciences, University of California San Diego, San Diego, CA, USA
| | | | - Emily Vogtmann
- Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
| | - Max von Hippel
- Department of Mathematics, University of Arizona, Tucson, AZ, USA
| | - William Walters
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Yunhu Wan
- Metabolic Epidemiology Branch, National Cancer Institute, Rockville, MD, USA
| | - Mingxun Wang
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Jonathan Warren
- National Laboratory Service, Environment Agency, Starcross, UK
| | - Kyle C Weber
- Agricultural Research Service, Genomics and Bioinformatics Research Unit, United States Department of Agriculture, Gainesville, FL, USA.,College of Agriculture and Life Sciences, University of Florida, Gainesville, FL, USA
| | | | - Amy D Willis
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Zhenjiang Zech Xu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Jesse R Zaneveld
- School of STEM, Division of Biological Sciences, University of Washington Bothell, Bothell, WA, USA
| | | | - Qiyun Zhu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.,Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - J Gregory Caporaso
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA.,Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA.
| |
|
7
|
Abstract
Many of the challenges we currently face as an advanced society have been solved in unique ways by biological systems. One such challenge is developing strategies to avoid microbial infection. Social aculeates (wasps, bees and ants) mitigate the risk of infection to their colonies using a wide range of adaptations and mechanisms. These adaptations and mechanisms are reliant on intricate social structures and are energetically costly for the colony. It seems likely that these species must have had alternative and simpler mechanisms in place to ensure the maintenance of hygienic domicile conditions prior to the evolution of these complex behaviours. Features of the aculeate coiled-coil silk proteins are reminiscent of those of naturally occurring α-helical antimicrobial peptides (AMPs). In this study, we demonstrate that peptides derived from the aculeate silk proteins have antimicrobial activity. We reconstruct the predicted ancestral silk sequences of an aculeate ancestor that pre-dates the evolution of sociality and demonstrate that these ancestral sequences also contained peptides with antimicrobial properties. It is possible that the silks evolved as an antifouling material and facilitated the evolution of sociality. These materials serve as model materials for consideration in future biomaterial development.
Affiliation(s)
- Tara D. Sutherland
- CSIRO (The Commonwealth Scientific and Industrial Research Organisation), Health and Biosecurity, Canberra, Australian Capital Territory, Australia
| | - Alagacone Sriskantha
- CSIRO (The Commonwealth Scientific and Industrial Research Organisation), Health and Biosecurity, Canberra, Australian Capital Territory, Australia
| | - Trevor D. Rapson
- CSIRO (The Commonwealth Scientific and Industrial Research Organisation), Health and Biosecurity, Canberra, Australian Capital Territory, Australia
| | - Benjamin D. Kaehler
- Research School of Biology, Australian National University, Australian Capital Territory, Australia
| | - Gavin A. Huttley
- Research School of Biology, Australian National University, Australian Capital Territory, Australia
| |
|
8
|
Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Caporaso JG. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome 2018; 6:90. [PMID: 29773078 PMCID: PMC5956843 DOI: 10.1186/s40168-018-0470-z] [Citation(s) in RCA: 2268] [Impact Index Per Article: 378.0] [Indexed: 05/11/2023]
Abstract
BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
Affiliation(s)
- Nicholas A Bokulich
- The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA.
| | - Benjamin D Kaehler
- Research School of Biology, Australian National University, 46 Sullivans Creek Road, Acton ACT, 2601, Australia.
| | - Jai Ram Rideout
- The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA
| | - Matthew Dillon
- The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA
| | - Evan Bolyen
- The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA
| | - Rob Knight
- Departments of Pediatrics and Computer Science and Engineering, and Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Gavin A Huttley
- Research School of Biology, Australian National University, 46 Sullivans Creek Road, Acton ACT, 2601, Australia.
| | - J Gregory Caporaso
- The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA.
- Department of Biological Sciences, Northern Arizona University, 1298 S Knoles Drive, Building 56, 3rd Floor, Flagstaff, AZ, USA.
| |
|
9
|
Kaehler BD, Yap VB, Huttley GA. Standard Codon Substitution Models Overestimate Purifying Selection for Nonstationary Data. Genome Biol Evol 2017; 9:134-149. [PMID: 28175284 PMCID: PMC5381540 DOI: 10.1093/gbe/evw308] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 01/02/2017] [Indexed: 01/28/2023] Open
Abstract
Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.
Affiliation(s)
- Benjamin D Kaehler
- Research School of Biology, College of Medicine, Biology, and Environment, Australian National University, Canberra, ACT, Australia
| | - Von Bing Yap
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
| | - Gavin A Huttley
- Research School of Biology, College of Medicine, Biology, and Environment, Australian National University, Canberra, ACT, Australia
| |
|
10
|
Bokulich NA, Dillon MR, Bolyen E, Kaehler BD, Huttley GA, Caporaso JG. q2-sample-classifier: machine-learning tools for microbiome classification and regression. J Open Source Softw 2018; 3:934. [PMID: 31552137 PMCID: PMC6759219 DOI: 10.21105/joss.00934] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Indexed: 05/20/2023]
Abstract
q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.
Affiliation(s)
- Nicholas A Bokulich
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Matthew R Dillon
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Evan Bolyen
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Benjamin D Kaehler
- Research School of Biology, Australian National University, Canberra, Australia
| | - Gavin A Huttley
- Research School of Biology, Australian National University, Canberra, Australia
| | - J Gregory Caporaso
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
| |
|
11
|
Kaehler BD. Full reconstruction of non-stationary strand-symmetric models on rooted phylogenies. J Theor Biol 2017; 420:144-151. [PMID: 28286217 DOI: 10.1016/j.jtbi.2017.03.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/14/2016] [Revised: 03/06/2017] [Accepted: 03/08/2017] [Indexed: 10/20/2022]
Abstract
Understanding the evolutionary relationship among species is of fundamental importance to the biological sciences. The location of the root in any phylogenetic tree is critical as it gives an order to evolutionary events. None of the popular models of nucleotide evolution currently used in likelihood or Bayesian methods are able to infer the location of the root without exogenous information. It is known that the most general Markov models of nucleotide substitution also cannot identify the location of the root or be fitted to multiple sequence alignments with fewer than three sequences. We prove that the location of the root and the full model can be identified and statistically consistently estimated for a non-stationary, strand-symmetric substitution model given a multiple sequence alignment with two or more sequences. We also generalise earlier work to provide a practical means of overcoming the computationally intractable problem of labelling hidden states in a phylogenetic model.
Affiliation(s)
- Benjamin D Kaehler
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia.
| |
|
12
|
Maitip J, Trueman HE, Kaehler BD, Huttley GA, Chantawannakul P, Sutherland TD. Folding behavior of four silks of giant honey bee reflects the evolutionary conservation of aculeate silk proteins. Insect Biochem Mol Biol 2015; 59:72-79. [PMID: 25712559 DOI: 10.1016/j.ibmb.2015.02.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/22/2014] [Revised: 02/12/2015] [Accepted: 02/13/2015] [Indexed: 06/04/2023]
Abstract
Multiple gene duplication events in the precursor of the Aculeata (bees, ants, hornets) gave rise to four silk genes. Whilst these homologs encode proteins with similar amino acid composition and coiled coil structure, the retention of all four homologs implies they each are important. In this study we identified, produced and characterized the four silk proteins from Apis dorsata, the giant Asian honeybee. The proteins were readily purified, allowing us to investigate the folding behavior of solutions of individual proteins in comparison to mixtures of all four proteins at concentrations where they assemble into their native coiled coil structure. In contrast to solutions of any one protein type, solutions of a mixture of the four proteins formed coiled coils that were stable against dilution and detergent denaturation. The results are consistent with the formation of a heteromeric coiled coil protein complex. The mechanism of silk protein coiled coil formation and evolution is discussed in light of these results.
Affiliation(s)
- Jakkrawut Maitip
- Bee Protection Laboratory, Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Holly E Trueman
- CSIRO, (The Commonwealth Scientific and Industrial Research Organization), Food and Nutrition Flagship, Canberra, Australian Capital Territory, Australia
| | - Benjamin D Kaehler
- John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Australia
| | - Gavin A Huttley
- John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Australia
| | - Panuwan Chantawannakul
- Bee Protection Laboratory, Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand.
| | - Tara D Sutherland
- CSIRO, (The Commonwealth Scientific and Industrial Research Organization), Food and Nutrition Flagship, Canberra, Australian Capital Territory, Australia
| |
|
13
|
Abstract
The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
Affiliation(s)
- Benjamin D Kaehler
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2600, Australia
| | - Von Bing Yap
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, Singapore
| | - Rongli Zhang
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, Singapore
| | - Gavin A Huttley
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2600, Australia
| |
|