1
Acuña-Amador L, Barloy-Hubler F. In silico analysis of Ffp1, an ancestral Porphyromonas spp. fimbrillin, shows differences with Fim and Mfa. Access Microbiol 2024; 6:000771.v3. [PMID: 39130734 PMCID: PMC11316588 DOI: 10.1099/acmi.0.000771.v3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 05/08/2024] [Indexed: 08/13/2024] Open
Abstract
Background. Scant information is available regarding fimbrillins within the genus Porphyromonas, with the notable exception of those belonging to Porphyromonas gingivalis, which have been extensively researched for several years. Besides fim and mfa, a third P. gingivalis adhesin called filament-forming protein 1 (Ffp1) has recently been described and seems to be pivotal for outer membrane vesicle (OMV) production. Objective. We aimed to investigate the distribution and diversity of type V fimbrillin, particularly Ffp1, in the genus Porphyromonas. Methods. A bioinformatics phylogenomic analysis was conducted using all accessible Porphyromonas genomes to generate a domain search for fimbriae, using hidden Markov model profiles. Results. Ffp1 was identified as the sole fimbrillin present in all analysed genomes. After manual verification (i.e. biocuration) of both structural and functional annotations and 3D modelling, this protein was determined to be a type V fimbrillin, with a closer structural resemblance to a Bacteroides ovatus fimbrillin than to FimA or Mfa1 from P. gingivalis. Conclusion. It appears that Ffp1 is an ancestral fimbria, transmitted through vertical inheritance and present across all Porphyromonas species. Additional investigations are necessary to elucidate the biogenesis of Ffp1 fimbriae and their potential role in OMV production and niche adaptation.
Affiliation(s)
- Luis Acuña-Amador
  - Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Frederique Barloy-Hubler
  - Université de Rennes 1, CNRS, UMR 6553 ECOBIO (Écosystèmes, Biodiversité, Évolution), 35042 Rennes, France
2
Hess MK, Hodgkinson HE, Hess AS, Zetouni L, Budel JCC, Henry H, Donaldson A, Bilton TP, van Stijn TC, Kirk MR, Dodds KG, Brauning R, McCulloch AF, Hickey SM, Johnson PL, Jonker A, Morton N, Hendy S, Oddy VH, Janssen PH, McEwan JC, Rowe SJ. Large-scale analysis of sheep rumen metagenome profiles captured by reduced representation sequencing reveals individual profiles are influenced by the environment and genetics of the host. BMC Genomics 2023; 24:551. [PMID: 37723422 PMCID: PMC10506323 DOI: 10.1186/s12864-023-09660-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2022] [Accepted: 09/07/2023] [Indexed: 09/20/2023] Open
Abstract
BACKGROUND Producing animal protein while reducing the animal's impact on the environment, e.g., through improved feed efficiency and lowered methane emissions, has gained interest in recent years. Genetic selection is one possible path to reduce the environmental impact of livestock production, but these traits are difficult and expensive to measure on many animals. The rumen microbiome may serve as a proxy for these traits due to its role in feed digestion. Restriction enzyme-reduced representation sequencing (RE-RRS) is a high-throughput and cost-effective approach to rumen metagenome profiling, but the systematic (e.g., sequencing) and biological factors influencing the resulting reference based (RB) and reference free (RF) profiles need to be explored before widespread industry adoption is possible. RESULTS Metagenome profiles were generated by RE-RRS of 4,479 rumen samples collected from 1,708 sheep, and assigned to eight groups based on diet, age, time off feed, and country (New Zealand or Australia) at the time of sample collection. Systematic effects were found to have minimal influence on metagenome profiles. Diet was a major driver of differences between samples, followed by time off feed, then age of the sheep. The RF approach resulted in more reads being assigned per sample and afforded greater resolution when distinguishing between groups than the RB approach. Normalizing relative abundances within the sampling Cohort abolished structures related to age, diet, and time off feed, allowing a clear signal based on methane emissions to be elucidated. Genus-level abundances of rumen microbes showed low-to-moderate heritability and repeatability and were consistent between diets. CONCLUSIONS Variation in rumen metagenomic profiles was influenced by diet, age, time off feed and genetics. Not accounting for environmental factors may limit the ability to associate the profile with traits of interest. However, these differences can be accounted for by adjusting for Cohort effects, revealing robust biological signals. The abundances of some genera were consistently heritable and repeatable across different environments, suggesting that metagenomic profiles could be used to predict an individual's future performance, or performance of its offspring, in a range of environments. These results highlight the potential of using rumen metagenomic profiles for selection purposes in a practical, agricultural setting.
Affiliation(s)
- Melanie K Hess
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Hannah E Hodgkinson
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Andrew S Hess
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
  - Agriculture, Veterinary & Rangeland Sciences, University of Nevada-Reno, 1664 N. Virginia St. Mail stop 202, Reno, NV, 89557, USA
- Larissa Zetouni
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
  - Wageningen University & Research, P.O. Box 338, 6700, AH, Wageningen, The Netherlands
- Juliana C C Budel
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
  - Graduate Program in Animal Science, Universidade Federal do Pará (UFPa), Castanhal, Brazil
- Hannah Henry
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Alistair Donaldson
  - NSW Department of Primary Industries, University of New England, Armidale, 2351, Australia
- Timothy P Bilton
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Tracey C van Stijn
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Michelle R Kirk
  - AgResearch Ltd., Grasslands Research Centre, Private Bag 11,008, Palmerston North, 4410, New Zealand
- Ken G Dodds
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Rudiger Brauning
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Alan F McCulloch
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Sharon M Hickey
  - AgResearch Ltd., Ruakura Research Centre, Private Bag 3115, Hamilton, 3214, New Zealand
- Patricia L Johnson
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Arjan Jonker
  - AgResearch Ltd., Grasslands Research Centre, Private Bag 11,008, Palmerston North, 4410, New Zealand
- Nickolas Morton
  - Te Pūnaha Matatini, University of Auckland, Auckland, 1010, New Zealand
- Shaun Hendy
  - Te Pūnaha Matatini, University of Auckland, Auckland, 1010, New Zealand
- V Hutton Oddy
  - NSW Department of Primary Industries, University of New England, Armidale, 2351, Australia
- Peter H Janssen
  - AgResearch Ltd., Grasslands Research Centre, Private Bag 11,008, Palmerston North, 4410, New Zealand
- John C McEwan
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
- Suzanne J Rowe
  - AgResearch Ltd., Invermay Agricultural Centre, Private Bag 50034, Mosgiel, 9053, New Zealand
3
Odom AR, Faits T, Castro-Nallar E, Crandall KA, Johnson WE. Metagenomic profiling pipelines improve taxonomic classification for 16S amplicon sequencing data. Sci Rep 2023; 13:13957. [PMID: 37633998 PMCID: PMC10460424 DOI: 10.1038/s41598-023-40799-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/14/2022] [Accepted: 08/16/2023] [Indexed: 08/28/2023] Open
Abstract
Most experiments studying bacterial microbiomes rely on the PCR amplification of all or part of the gene for the 16S rRNA subunit, which serves as a biomarker for identifying and quantifying the various taxa present in a microbiome sample. Several computational methods exist for analyzing 16S amplicon sequencing. However, the most-used bioinformatics tools cannot produce high quality genus-level or species-level taxonomic calls and may underestimate the potential accuracy of these calls. We used 16S sequencing data from mock bacterial communities to evaluate the sensitivity and specificity of several bioinformatics pipelines and genomic reference libraries used for microbiome analyses, concentrating on measuring the accuracy of species-level taxonomic assignments of 16S amplicon reads. We evaluated the tools DADA2, QIIME 2, Mothur, PathoScope 2, and Kraken 2 in conjunction with reference libraries from Greengenes, SILVA, Kraken 2, and RefSeq. Profiling tools were compared using publicly available mock community data from several sources, comprising 136 samples with varied species richness and evenness, several different amplified regions within the 16S rRNA gene, and both DNA spike-ins and cDNA from collections of plated cells. PathoScope 2 and Kraken 2, both tools designed for whole-genome metagenomics, outperformed DADA2, QIIME 2 using the DADA2 plugin, and Mothur, which are theoretically specialized for 16S analyses. Evaluations of reference libraries identified the SILVA and RefSeq/Kraken 2 Standard libraries as superior in accuracy compared to Greengenes. These findings support PathoScope and Kraken 2 as fully capable, competitive options for genus- and species-level 16S amplicon sequencing data analysis, whole genome sequencing, and metagenomics data tools.
Affiliation(s)
- Aubrey R Odom
  - Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- Tyler Faits
  - Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- Eduardo Castro-Nallar
  - Departamento de Microbiología, Facultad de Ciencias de la Salud, Universidad de Talca, Campus Talca, Avda. Lircay S/N, Talca, Chile
  - Centro de Ecología Integrativa, Universidad de Talca, Campus Talca, Avda. Lircay S/N, Talca, Chile
- Keith A Crandall
  - Department of Biostatistics & Bioinformatics, Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- W Evan Johnson
  - Division of Infectious Disease, Center for Data Science, Rutgers University - New Jersey Medical School, Newark, NJ, USA
4
Rosenboom I, Scheithauer T, Friedrich FC, Pörtner S, Hollstein L, Pust MM, Sifakis K, Wehrbein T, Rosenhahn B, Wiehlmann L, Chhatwal P, Tümmler B, Davenport CF. Wochenende - modular and flexible alignment-based shotgun metagenome analysis. BMC Genomics 2022; 23:748. [PMID: 36368923 PMCID: PMC9650795 DOI: 10.1186/s12864-022-08985-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/04/2022] [Accepted: 11/02/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Shotgun metagenome analysis provides a robust and verifiable method for comprehensive microbiome analysis of fungal, viral, archaeal and bacterial taxonomy, particularly with regard to visualization of read mapping location, normalization options, growth dynamics and functional gene repertoires. Current read classification tools use non-standard output formats, or do not fully show information on mapping location. As reference datasets are not perfect, portrayal of mapping information is critical for judging results effectively. RESULTS Our alignment-based pipeline, Wochenende, incorporates flexible quality control, trimming, mapping, various filters and normalization. Results are completely transparent and filters can be adjusted by the user. We observe stringent filtering of mismatches and use of mapping quality sharply reduces the number of false positives. Further modules allow genomic visualization and the calculation of growth rates, as well as integration and subsequent plotting of pipeline results as heatmaps or heat trees. Our novel normalization approach additionally allows calculation of absolute abundance profiles by comparison with reads assigned to the human host genome. CONCLUSION Wochenende has the ability to find and filter alignments to all kingdoms of life using both short and long reads, and requires only good quality reference genomes. Wochenende automatically combines multiple available modules ranging from quality control and normalization to taxonomic visualization. Wochenende is available at https://github.com/MHH-RCUG/nf_wochenende .
Affiliation(s)
- Ilona Rosenboom
  - Clinical Research Group Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics, Clinic for Pediatric Pneumology, Allergology and Neonatology, Hannover Medical School, Hannover, Germany
- Sophia Pörtner
  - Research Core Unit Genomics, Hannover Medical School, Hannover, Germany
- Lisa Hollstein
  - Research Core Unit Genomics, Hannover Medical School, Hannover, Germany
- Marie-Madlen Pust
  - Clinical Research Group Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics, Clinic for Pediatric Pneumology, Allergology and Neonatology, Hannover Medical School, Hannover, Germany
- Tom Wehrbein
  - Institut Fuer Informationsverarbeitung (TNT), Leibniz University Hannover, Hannover, Germany
- Bodo Rosenhahn
  - Institut Fuer Informationsverarbeitung (TNT), Leibniz University Hannover, Hannover, Germany
- Lutz Wiehlmann
  - Research Core Unit Genomics, Hannover Medical School, Hannover, Germany
- Patrick Chhatwal
  - Department of Microbiology, Hannover Medical School, Hannover, Germany
- Burkhard Tümmler
  - Clinical Research Group Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics, Clinic for Pediatric Pneumology, Allergology and Neonatology, Hannover Medical School, Hannover, Germany
- Colin F Davenport
  - Research Core Unit Genomics, Hannover Medical School, Hannover, Germany
5
Daumann LJ, Pol A, Op den Camp HJM, Martinez-Gomez NC. A perspective on the role of lanthanides in biology: Discovery, open questions and possible applications. Adv Microb Physiol 2022; 81:1-24. [PMID: 36167440 DOI: 10.1016/bs.ampbs.2022.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/27/2023]
Abstract
Because of their use in high technologies like computers, smartphones and renewable energy applications, lanthanides (belonging to the group of rare earth elements) are essential for our daily lives. A range of applications in medicine and biochemical research made use of their photo-physical properties. The discovery of a biological role for lanthanides has boosted research in this new field. Several methanotrophs and methylotrophs are strictly dependent on the presence of lanthanides in the growth medium while others show a regulatory response. After the first demonstration of a lanthanide in the active site of the XoxF-type pyrroloquinoline quinone methanol dehydrogenases, follow-up studies showed the same for other pyrroloquinoline quinone-containing enzymes. In addition, research focused on the effect of lanthanides on regulation of gene expression and uptake mechanism into bacterial cells. This review briefly describes the discovery of the role of lanthanides in biology and focuses on open questions in biological lanthanide research and possible application of lanthanide-containing bacteria and enzymes in recovery of these special elements.
Affiliation(s)
- Lena J Daumann
  - Department of Chemistry, Ludwig-Maximilians-Universität München, Munich, Germany
- Arjan Pol
  - Department of Microbiology, RIBES, Radboud University, Nijmegen, The Netherlands
- Huub J M Op den Camp
  - Department of Microbiology, RIBES, Radboud University, Nijmegen, The Netherlands
- N Cecilia Martinez-Gomez
  - Department of Plant and Microbial Biology, University of California, Berkeley, California, United States
6
Cremers G, Jetten MSM, Op den Camp HJM, Lücker S. Metascan: METabolic Analysis, SCreening and ANnotation of Metagenomes. Frontiers in Bioinformatics 2022; 2:861505. [PMID: 36304333 PMCID: PMC9580885 DOI: 10.3389/fbinf.2022.861505] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2022] [Accepted: 05/30/2022] [Indexed: 12/03/2022] Open
Abstract
Large scale next generation metagenomic sequencing of complex environmental samples paves the way for detailed analysis of nutrient cycles in ecosystems. For such an analysis, large scale unequivocal annotation is a prerequisite, which however is increasingly hampered by growing databases and analysis time. Hereto, we created a hidden Markov model (HMM) database by clustering proteins according to their KEGG indexing. HMM profiles for key genes of specific metabolic pathways and nutrient cycles were organized in subsets to be able to analyze each important elemental cycle separately. An important motivation behind the clustered database was to enable a high degree of resolution for annotation, while decreasing database size and analysis time. Here, we present Metascan, a new tool that can fully annotate and analyze deeply sequenced samples with an average analysis time of 11 min per genome for a publicly available dataset containing 2,537 genomes, and 1.1 min per genome for nutrient cycle analysis of the same sample. Metascan easily detected general proteins like cytochromes and ferredoxins, and additional pmoCAB operons were identified that were overlooked in previous analyses. For a mock community, the BEACON (F1) score was 0.72–0.93 compared to the information in NCBI GenBank. In combination with the accompanying database, Metascan provides a fast and useful annotation and analysis tool, as demonstrated by our proof-of-principle analysis of a complex mock community metagenome.
7
Yarlagadda K, Zachwieja AJ, de Flamingh A, Phungviwatnikul T, Rivera-Colón AG, Roseman C, Shackelford L, Swanson KS, Malhi RS. Geographically diverse canid sampling provides novel insights into pre-industrial microbiomes. Proc Biol Sci 2022; 289:20220052. [PMID: 35506233 PMCID: PMC9065982 DOI: 10.1098/rspb.2022.0052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022] Open
Abstract
Canine microbiome studies are often limited in the geographic and temporal scope of samples studied. This results in a paucity of data on the canine microbiome around the world, especially in contexts where dogs may not be pets or human associated. Here, we present the shotgun sequences of fecal microbiomes of pet dogs from South Africa, shelter and stray dogs from India, and stray village dogs in Laos. We additionally performed a dietary experiment with dogs housed in a veterinary medical school, attempting to replicate the diet of the sampled dogs from Laos. We analyse the taxonomic diversity in these populations and identify the underlying functional redundancy of these microbiomes. Our results show that diet alone is not sufficient to recapitulate the higher diversity seen in the microbiome of dogs from Laos. Comparisons to previous studies and ancient dog fecal microbiomes highlight the need for greater population diversity in studies of canine microbiomes, as modern analogues can provide better comparisons to ancient microbiomes. We identify trends in microbial diversity and industrialization in dogs that mirror results of human studies, suggesting future research can make use of these companion animals as substitutes for humans in studying the effects of industrialization on the microbiome.
Affiliation(s)
- K Yarlagadda
  - Department of Anthropology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- A J Zachwieja
  - Department of Biomedical Sciences, University of Minnesota Medical School Duluth, Duluth, Minnesota, USA
- A de Flamingh
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- T Phungviwatnikul
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA
- A G Rivera-Colón
  - Department of Evolution, Ecology, and Behavior, University of Illinois Urbana-Champaign, Urbana, IL, USA
- C Roseman
  - School of Integrative Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- L Shackelford
  - Department of Anthropology, University of Illinois Urbana-Champaign, Urbana, IL, USA
- K S Swanson
  - Department of Animal Sciences, University of Illinois Urbana-Champaign, Urbana, IL, USA
- R S Malhi
  - Department of Anthropology, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Department of Evolution, Ecology, and Behavior, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - School of Integrative Biology, University of Illinois Urbana-Champaign, Urbana, IL, USA
8
Zafeiropoulos H, Gargan L, Hintikka S, Pavloudi C, Carlsson J. The Dark mAtteR iNvestigator (DARN) tool: getting to know the known unknowns in COI amplicon data. Metabarcoding and Metagenomics 2021. [DOI: 10.3897/mbmg.5.69657] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023] Open
Abstract
The mitochondrial cytochrome C oxidase subunit I gene (COI) is commonly used in environmental DNA (eDNA) metabarcoding studies, especially for assessing metazoan diversity. Yet, a great number of COI operational taxonomic units (OTUs) or/and amplicon sequence variants (ASVs) retrieved from such studies do not get a taxonomic assignment with a reference sequence. To assess and investigate such sequences, we have developed the Dark mAtteR iNvestigator (DARN) software tool. For this purpose, a reference COI-oriented phylogenetic tree was built from 1,593 consensus sequences covering all the three domains of life. With respect to eukaryotes, consensus sequences at the family level were constructed from 183,330 sequences retrieved from the Midori reference 2 database, which represented 70% of the initial number of reference sequences. Similarly, sequences from 431 bacterial and 15 archaeal taxa at the family level (29% and 1% of the initial number of reference sequences respectively) were retrieved from the BOLD and the PFam databases. DARN makes use of this phylogenetic tree to investigate COI pre-processed sequences of amplicon samples to provide both a tabular and a graphical overview of their phylogenetic assignments. To evaluate DARN, both environmental and bulk metabarcoding samples from different aquatic environments using various primer sets were analysed. We demonstrate that a large proportion of non-target prokaryotic organisms, such as bacteria and archaea, are also amplified in eDNA samples and we suggest prokaryotic COI sequences to be included in the reference databases used for the taxonomy assignment to allow for further analyses of dark matter. DARN source code is available on GitHub at https://github.com/hariszaf/darn and as a Docker image at https://hub.docker.com/r/hariszaf/darn.
9
Wang Y, Yuan H, Huang J, Li C. Inline index helped in cleaning up data contamination generated during library preparation and the subsequent steps. Mol Biol Rep 2021; 49:385-392. [PMID: 34716505 DOI: 10.1007/s11033-021-06884-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/23/2021] [Accepted: 09/23/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND High-throughput sequencing involves library preparation and amplification steps, which may induce contamination across samples or between samples and the environment. METHODS We tested the effect of applying an inline-index strategy, in which DNA indices of 6 bp were added to both ends of the inserts at the ligation step of library prep for resolving the data contamination problem. RESULTS Our results showed that the contamination ranged from 0.29 to 1.25% in one experiment and from 0.83 to 27.01% in the other. We also found that contamination could be environmental or from reagents besides cross-contamination between samples. CONCLUSIONS Inline-index method is a useful experimental design to clean up the data and address the contamination problem which has been plaguing high-throughput sequencing data in many applications.
Affiliation(s)
- Ying Wang
  - Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, 201306, China
  - Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, 201306, China
- Hao Yuan
  - Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, 201306, China
  - Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, 201306, China
- Junman Huang
  - Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, 201306, China
  - Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, 201306, China
- Chenhong Li
  - Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, 201306, China
  - Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, 201306, China
10
Steinegger M, Salzberg SL. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol 2020; 21:115. [PMID: 32398145 PMCID: PMC7218494 DOI: 10.1186/s13059-020-02023-1] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 01/27/2020] [Accepted: 04/16/2020] [Indexed: 12/20/2022] Open
Abstract
Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the RefSeq, GenBank, and NR databases, respectively, spanning the whole range from draft to “complete” model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3): https://github.com/martin-steinegger/conterminator
Affiliation(s)
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, 08826, South Korea
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, 21218, Maryland, USA
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul, 08826, South Korea
- Steven L Salzberg
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, 21218, Maryland, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21218, Maryland, USA
  - Departments of Computer Science and Biostatistics, Johns Hopkins University, Baltimore, 21218, Maryland, USA