1
Buetas E, Jordán-López M, López-Roldán A, D'Auria G, Martínez-Priego L, De Marco G, Carda-Diéguez M, Mira A. Full-length 16S rRNA gene sequencing by PacBio improves taxonomic resolution in human microbiome samples. BMC Genomics 2024; 25:310. [PMID: 38528457 DOI: 10.1186/s12864-024-10213-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 03/11/2024] [Indexed: 03/27/2024] Open
Abstract
BACKGROUND Sequencing variable regions of the 16S rRNA gene (≃300 bp) with Illumina technology is commonly used to study the composition of the human microbiota. Unfortunately, short reads cannot differentiate between highly similar species. Because species from the same genus can be associated with either health or disease, it is important to identify them at the lowest possible taxonomic rank. Third-generation sequencing platforms such as PacBio SMRT increase read lengths, allowing the whole gene to be sequenced with maximum taxonomic resolution. Despite its potential, full-length 16S rRNA gene sequencing is not yet widely used. The aim of the current study was to compare the sequencing output and taxonomic annotation performance of the two approaches (Illumina short-read sequencing and PacBio long-read sequencing of the 16S rRNA gene) in different human microbiome samples. DNA was isolated from saliva, oral biofilms (subgingival plaque) and faeces of 9 volunteers. Regions V3-V4 and V1-V9 were amplified and sequenced on Illumina MiSeq and PacBio Sequel II sequencers, respectively. RESULTS With both platforms, a similar percentage of reads was assigned to the genus level (94.79% and 95.06% respectively), but with PacBio a higher proportion of reads was further assigned to the species level (74.14% vs 55.23% with Illumina). Regarding overall bacterial composition, samples clustered by niche and not by sequencing platform. In addition, all genera with > 0.1% abundance were detected by both platforms in all sample types. Although some genera such as Streptococcus tended to be observed at higher frequency with PacBio than with Illumina (20.14% vs 14.12% in saliva, 10.63% vs 6.59% in subgingival plaque biofilm samples), none of the differences were statistically significant after correcting for multiple testing. CONCLUSIONS The results presented in the current manuscript suggest that samples sequenced using Illumina and PacBio are mostly comparable. Considering that PacBio reads were assigned at the species level with higher accuracy than Illumina reads, our data support the use of PacBio technology for future microbiome studies, although a higher cost is currently required to obtain an equivalent number of reads per sample.
Affiliation(s)
- Elena Buetas
  - Genomics & Health Department, FISABIO Foundation, Valencia, Spain
- Marta Jordán-López
  - Department of Periodontics, Faculty of Medicine and Dentistry, University of Valencia, Valencia, Spain
- Andrés López-Roldán
  - Department of Periodontics, Faculty of Medicine and Dentistry, University of Valencia, Valencia, Spain
- Giuseppe D'Auria
  - Sequencing and Bioinformatics Service, Fundació Per Al Foment de La Investigació Sanitària I Biomèdica de La Comunitat Valenciana (FISABIO-Salut Pública), València, Spain
- Llucia Martínez-Priego
  - Sequencing and Bioinformatics Service, Fundació Per Al Foment de La Investigació Sanitària I Biomèdica de La Comunitat Valenciana (FISABIO-Salut Pública), València, Spain
- Griselda De Marco
  - Sequencing and Bioinformatics Service, Fundació Per Al Foment de La Investigació Sanitària I Biomèdica de La Comunitat Valenciana (FISABIO-Salut Pública), València, Spain
- Alex Mira
  - Genomics & Health Department, FISABIO Foundation, Valencia, Spain
  - CIBER Center for Epidemiology and Public Health, Madrid, Spain
2
Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023; 30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023] Open
Abstract
Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields but has led to computational challenges processing large sequencing experiments. The minimizer sketch is a popular method for sequence sketching that underlies core steps in computational genomics such as read mapping, sequence assembling, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of few classical approaches. More recently, efforts have been put into building minimizer sketches with desirable properties compared with the classical constructions. In this survey, we review the history of the minimizer sketch, the theories developed around the concept, and the plethora of applications taking advantage of such sketches. We aim to provide the readers a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of better fusion of theory and application in the future.
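As an illustration of the idea the survey covers, the following is a minimal sketch of one classical construction: for every window of `w` consecutive k-mers, keep the smallest k-mer. The lexicographic ordering, the leftmost tie-break, and the function name `minimizer_sketch` are assumptions made for this example; they are not taken from the article, and practical tools often use hashed or randomized orderings instead.

```python
def minimizer_sketch(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs chosen as window minimizers.

    Illustrative assumptions: k-mers are compared lexicographically and
    ties are broken by taking the leftmost occurrence in the window.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")

    # All k-mers of the sequence, indexed by start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]

    selected: list[tuple[int, str]] = []
    # Slide a window over w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        # Smallest k-mer in the window; the index breaks ties to the left.
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        # Consecutive windows often share a minimizer; record it once.
        if not selected or selected[-1][0] != best:
            selected.append((best, kmers[best]))
    return selected


if __name__ == "__main__":
    print(minimizer_sketch("ACGTACGT", k=3, w=3))
```

Because adjacent windows overlap in all but one k-mer, they frequently select the same position, which is why the sketch is much smaller than the full k-mer list.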
Affiliation(s)
- Hongyu Zheng
  - Computer Science Department, Princeton University, Princeton, New Jersey, USA
- Guillaume Marçais
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Carl Kingsford
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
3
Runtuwene LR, Sathirapongsasuti N, Srisawat R, Komalamisra N, Tuda JSB, Mongan AE, Aboge GO, Shabardina V, Makalowski W, Nesti DR, Artama WT, Nguyen-Thi LA, Wan KL, Na BK, Hall W, Pain A, Eshita Y, Maeda R, Yamagishi J, Suzuki Y. Global research alliance in infectious disease: a collaborative effort to combat infectious diseases through dissemination of portable sequencing. BMC Res Notes 2022; 15:44. [PMID: 35151353 PMCID: PMC8840504 DOI: 10.1186/s13104-022-05927-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Accepted: 01/26/2022] [Indexed: 11/25/2022] Open
Abstract
Objective To disseminate the portable sequencer MinION in developing countries, mainly to combat infectious diseases, we founded a consortium called the Global Research Alliance in Infectious Diseases (GRAID). By holding training courses and inviting researchers from both developed and developing countries, we aim to train participants in operating the MinION and to foster collaboration in infectious disease research. As a real-life example from a resource-limited setting, we describe here a result from a training course: a metagenomic analysis of two blood samples collected during routine cattle surveillance in Kulon Progo District, Yogyakarta Province, Indonesia, in 2019. Results One of the samples was successfully sequenced with enough sequencing yield for further analysis. After depleting the reads mapped to host DNA, the remaining reads were shown to map to Theileria orientalis using BLAST and OneCodex. Although some reads also mapped to Clostridium botulinum, these were found to be artifacts derived from the cow genome. A consensus sequence was successfully constructed using a reference-based approach with Pomoxis. Hence, we concluded that the asymptomatic cow might be infected with T. orientalis and showed the usefulness of sequencing technology, specifically the MinION platform, in a developing country. Supplementary Information The online version contains supplementary material available at 10.1186/s13104-022-05927-2.
Affiliation(s)
- Lucky R Runtuwene
  - Division 1, AIDS Research Center, National Institute of Infectious Diseases, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Nuankanya Sathirapongsasuti
  - Section of Translational Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Raweewan Srisawat
  - Department of Medical Entomology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Narumon Komalamisra
  - Department of Medical Entomology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Josef S B Tuda
  - Faculty of Medicine, Sam Ratulangi University, Manado, Indonesia
- Arthur E Mongan
  - Faculty of Medicine, Sam Ratulangi University, Manado, Indonesia
- Gabriel O Aboge
  - Department of Public Health, Faculty of Veterinary Medicine, University of Nairobi, Nairobi, Kenya
- Victoria Shabardina
  - Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Wojciech Makalowski
  - Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Muenster, Germany
- Dela Ria Nesti
  - Department of Bioresources Technology and Veterinary, Vocational College, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Wayan T Artama
  - Department of Biochemistry, Faculty of Veterinary Medicine, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Lan Anh Nguyen-Thi
  - Center of Biomedical Research, National Institute of Hygiene and Epidemiology, Hanoi, Vietnam
- Kiew-Lian Wan
  - Department of Biological Sciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Byoung-Kuk Na
  - Department of Parasitology and Tropical Medicine, College of Medicine, Gyeongsang National University, Jinju, South Korea
- William Hall
  - Centre for Research in Infectious Diseases, University College Dublin, Dublin, Ireland
- Arnab Pain
  - Pathogen Genomics Laboratory, Biological and Environmental Sciences and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Yuki Eshita
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Ryuichiro Maeda
  - Division of Biomedical Science, Department of Basic Veterinary Medicine, Obihiro University of Agriculture and Veterinary, Obihiro, Japan
- Junya Yamagishi
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
4
Martin S, Leggett RM. Alvis: a tool for contig and read ALignment VISualisation and chimera detection. BMC Bioinformatics 2021; 22:124. [PMID: 33726674 DOI: 10.1186/s12859-021-04056-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/17/2020] [Accepted: 03/01/2021] [Indexed: 01/18/2023] Open
Abstract
Background The analysis of long reads or the assessment of assembly or target capture data often necessitates running alignments against reference genomes or gene sets. The aligner outputs are often parsed automatically by scripts, but many kinds of analysis can benefit from the understanding that can follow human inspection of individual alignments. Additionally, diagrams are a useful means of communicating assembly results to others. Results We developed Alvis, a simple command line tool that can generate visualisations for a number of common alignment analysis tasks. Alvis is a fast and portable tool that accepts input in a variety of alignment formats and will output production ready vector images. Additionally, Alvis will highlight potentially chimeric reads or contigs, a common source of misassemblies. Conclusion Alvis diagrams facilitate improved understanding of assembly quality, enable read coverage to be visualised and potential errors to be identified. Additionally, we found that splitting chimeric reads using the output provided by Alvis can improve the contiguity of assemblies, while maintaining correctness.
5
Abstract
Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8 and 91.9 % of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100 % of the reads support the variant allele; considered a proxy of false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call. 
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming is still routinely performed prior to variant calling, likely out of concern that doing otherwise would have negative consequences. While historically this may have been the case, the data in this study suggest that read trimming is not always a practical necessity.
Affiliation(s)
- Stephen J Bush
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
6
Yu X, Sharma B, Gregory BD. The impact of epitranscriptomic marks on post-transcriptional regulation in plants. Brief Funct Genomics 2020; 20:113-124. [PMID: 33274735 DOI: 10.1093/bfgp/elaa021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/06/2020] [Revised: 11/01/2020] [Accepted: 11/05/2020] [Indexed: 12/17/2022] Open
Abstract
Ribonucleotides within the various RNA molecules in eukaryotes are marked with more than 160 distinct covalent chemical modifications. These modifications include those that occur internally in messenger RNA (mRNA) molecules such as N6-methyladenosine (m6A) and 5-methylcytosine (m5C), as well as those that occur at the ends of the modified RNAs like the non-canonical 5' end nicotinamide adenine dinucleotide (NAD+) cap modification of specific mRNAs. Recent findings have revealed that covalent RNA modifications can impact the secondary structure, translatability, functionality, stability and degradation of the RNA molecules in which they are included. Many of these covalent RNA additions have also been found to be dynamically added and removed through writer and eraser complexes, respectively, providing a new layer of epitranscriptome-mediated post-transcriptional regulation that regulates RNA quality and quantity in eukaryotic transcriptomes. Thus, it is not surprising that the regulation of RNA fate mediated by these epitranscriptomic marks has been demonstrated to have widespread effects on plant development and the responses of these organisms to abiotic and biotic stresses. In this review, we highlight recent progress focused on the study of the dynamic nature of these epitranscriptome marks and their roles in post-transcriptional regulation during plant development and response to environmental cues, with an emphasis on the mRNA modifications of non-canonical 5' end NAD+ capping, m6A and several other internal RNA modifications.
Affiliation(s)
- Xiang Yu
  - Research Associate in the lab of Dr Brian D. Gregory
- Brian D Gregory
  - Associate Professor and a Graduate Chair in the Department of Biology at the University of Pennsylvania
7
Meyer N, Janot JM, Lepoitevin M, Smietana M, Vasseur JJ, Torrent J, Balme S. Machine Learning to Improve the Sensing of Biomolecules by Conical Track-Etched Nanopore. Biosensors (Basel) 2020; 10:bios10100140. [PMID: 33028025 PMCID: PMC7601669 DOI: 10.3390/bios10100140] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/07/2020] [Revised: 09/26/2020] [Accepted: 09/30/2020] [Indexed: 12/23/2022]
Abstract
The single nanopore is a powerful platform to detect, discriminate and identify biomacromolecules. Among the different devices, conical nanopores obtained by the track-etched technique on a polymer film are stable and easy to functionalize. However, these advantages are offset by their high aspect ratio, which prevents the discrimination of similar samples. Using machine learning, we demonstrate an improved resolution that makes it possible to identify short single- and double-stranded DNA (10- and 40-mers). We characterized each current blockade event by its relative intensity, dwell time, surface area and both the right and left slopes. We show that the distributions of relative current blockade amplitude and dwell time overlap, which prevents identification of the samples from these two parameters alone. We define the different parameters that characterize the events as features and the type of DNA sample as the target. By applying support-vector machines to discriminate each sample, we obtain an accuracy between 50% and 72% when using two features that distinctly classify the data points. Finally, we achieved an increased accuracy (up to 82%) when five features were used.
Affiliation(s)
- Nathan Meyer
  - Institut Européen des Membranes, UMR5635, UM, ENSCM, CNRS, 34095 Montpellier, France
  - Mécanismes Moléculaires dans les Démences Neurodégénératives, U1198, UM, EPHE, INSERM, 34095 Montpellier, France
- Jean-Marc Janot
  - Institut Européen des Membranes, UMR5635, UM, ENSCM, CNRS, 34095 Montpellier, France
- Mathilde Lepoitevin
  - Institut des Matériaux Poreux de Paris UMR8004, CNRS, ENS, ESPCI, 75005 Paris, France
- Michaël Smietana
  - Institut des Biomolécules Max Mousseron, Université de Montpellier, CNRS, ENSCM, 34095 Montpellier, France
- Jean-Jacques Vasseur
  - Institut des Biomolécules Max Mousseron, Université de Montpellier, CNRS, ENSCM, 34095 Montpellier, France
- Joan Torrent
  - Mécanismes Moléculaires dans les Démences Neurodégénératives, U1198, UM, EPHE, INSERM, 34095 Montpellier, France
- Sébastien Balme
  - Institut Européen des Membranes, UMR5635, UM, ENSCM, CNRS, 34095 Montpellier, France
8
Kraft F, Kurth I. Long-read sequencing to understand genome biology and cell function. Int J Biochem Cell Biol 2020; 126:105799. [PMID: 32629027 DOI: 10.1016/j.biocel.2020.105799] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/02/2020] [Revised: 06/29/2020] [Accepted: 07/02/2020] [Indexed: 02/08/2023]
Abstract
Determining the sequence of DNA and RNA molecules has a huge impact on the understanding of cell biology and function. Recent advancements in next-generation short-read sequencing (NGS) technologies, drops in cost and a resolution down to the single-cell level shaped our current view on genome structure and function. Third-generation sequencing (TGS) methods further complete the knowledge about these processes based on long reads and the ability to analyze DNA or RNA at single molecule level. Long-read sequencing provides additional possibilities to study genome architecture and the composition of highly complex regions and to determine epigenetic modifications of nucleotide bases at a genome-wide level. We discuss the principles and advancements of long-read sequencing and its applications in genome biology.
Affiliation(s)
- Florian Kraft
  - Institute of Human Genetics, Medical Faculty, RWTH Aachen University, Aachen, Germany
- Ingo Kurth
  - Institute of Human Genetics, Medical Faculty, RWTH Aachen University, Aachen, Germany