1
Sirasani JP, Gardner C, Jung G, Lee H, Ahn TH. Bioinformatic approaches to blood and tissue microbiome analyses: challenges and perspectives. Brief Bioinform 2025; 26:bbaf176. [PMID: 40269515 PMCID: PMC12018304 DOI: 10.1093/bib/bbaf176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Revised: 03/05/2025] [Accepted: 03/25/2025] [Indexed: 04/25/2025]
Abstract
Advances in next-generation sequencing have deepened our understanding of the microbiome and its role in human health. Unlike traditional microbiome analysis, blood and tissue microbiome analyses focus on detecting and characterizing microbial DNA in blood and tissue, environments previously considered sterile. In this review, we discuss the challenges and methodologies associated with analyzing these samples. Key preprocessing steps, including the removal of ribosomal RNA, host DNA, and other contaminants, are critical to reducing noise and accurately capturing microbial evidence. We also explore how taxonomic profiling tools, machine learning, and advanced normalization techniques address contamination and low microbial biomass, thereby improving reliability. Although this methodology offers the potential to identify microbial involvement in systemic diseases that traditional methods could not detect, it also carries risks and lacks universal acceptance because of concerns over reliability and interpretation errors. This paper critically reviews these factors, highlighting both the promise and the pitfalls of blood and tissue microbiome analyses as a tool for biomarker discovery.
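The preprocessing and normalization steps named above can be illustrated with a minimal Python sketch. It is not the review's pipeline. It assumes that per-sample taxon counts come from some profiler and that reads labelled `Homo sapiens` are host. It flags a taxon as a contaminant when the taxon is at least as prevalent in negative-control samples as in biological samples, which is a deliberately simple, hypothetical rule. Finally, it applies a centered log-ratio (CLR) transform, one common normalization for compositional count data. All function and taxon names are illustrative.

```python
import math

# Hypothetical label for reads the profiler assigned to the host.
HOST_TAXA = {"Homo sapiens"}


def prevalence(samples, taxon):
    """Fraction of samples in which `taxon` has at least one read."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s.get(taxon, 0) > 0) / len(samples)


def flag_contaminants(biological, negative_controls):
    """Flag taxa seen in negative controls at least as often as in real samples.

    Illustrative rule only; published tools use more careful statistics.
    """
    taxa = {t for s in biological + negative_controls for t in s} - HOST_TAXA
    flagged = set()
    for t in taxa:
        in_controls = prevalence(negative_controls, t)
        if in_controls > 0 and in_controls >= prevalence(biological, t):
            flagged.add(t)
    return flagged


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount avoids log(0) for absent taxa."""
    if not counts:
        return {}
    logs = {t: math.log(c + pseudocount) for t, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {t: v - mean_log for t, v in logs.items()}


def clean_and_normalize(biological, negative_controls):
    """Drop host and flagged taxa, then CLR-transform each sample over a shared taxon list."""
    drop = flag_contaminants(biological, negative_controls) | HOST_TAXA
    kept = sorted({t for s in biological for t in s} - drop)
    return [clr({t: s.get(t, 0) for t in kept}) for s in biological]
```

For example, `clean_and_normalize([{"Homo sapiens": 1000, "TaxonA": 5, "TaxonB": 20}], [{"TaxonA": 7}])` removes both the host and `TaxonA` before the transform. CLR values are log-ratios to each sample's geometric mean, so they sum to zero within each sample. The pseudocount is an additional assumption that mainly affects rare taxa.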
Affiliation(s)
- Jammi Prasanthi Sirasani
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO, United States
- Cory Gardner
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
- Gihwan Jung
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
- Hyunju Lee
- AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
- Tae-Hyuk Ahn
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO, United States
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
2
Ramos Lopez D, Flores FJ, Espindola AS. MeStanG-Resource for High-Throughput Sequencing Standard Data Sets Generation for Bioinformatic Methods Evaluation and Validation. Biology (Basel) 2025; 14:69. [PMID: 39857299 PMCID: PMC11762867 DOI: 10.3390/biology14010069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 01/10/2025] [Accepted: 01/11/2025] [Indexed: 01/27/2025]
Abstract
Metagenomic analysis has enabled the measurement of microbiome diversity in environmental samples without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved using high-throughput sequencing (HTS) platforms have advanced from detecting known organisms and discovering unknown species to applications in disease diagnostics. Robust validation processes are essential for test reliability. They require standard samples and databases derived from real samples, as well as artificial controls generated in silico. We propose MeStanG as a resource for generating HTS Nanopore data sets to evaluate present and emerging bioinformatics pipelines. MeStanG allows samples to be designed with user-defined organism abundances expressed as numbers of reads, user-supplied reference sequences, and predetermined or custom sequencing error profiles. The simulator pipeline was evaluated by analyzing its output mock metagenomic samples, which contained known read abundances. The analysis used read mapping, genome assembly, and taxonomic classification in three scenarios: a bacterial community composed of nine different organisms, samples resembling pathogen-infected wheat plants, and a serial dilution series of a viral pathogen. The evaluation consistently reported the same organisms and read abundances as specified in the mock metagenomic sample design. Based on this performance and its novel capacity to generate an exact number of reads, MeStanG can be used to develop mock metagenomic samples (artificial HTS data sets) for assessing the diagnostic performance metrics of bioinformatic pipelines. Users can choose predetermined or customized models for research and training.
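The following hedged Python sketch illustrates the idea of a mock sample with an exact number of reads per organism. It is not MeStanG's code or interface. `simulate_mock_sample` and its parameters are hypothetical, reads have a fixed length, and errors are uniform substitutions rather than a Nanopore error profile.

```python
import random

BASES = "ACGT"


def add_substitutions(seq, error_rate, rng):
    """Replace each base with a different base with probability `error_rate`.

    A uniform substitution model is a simplifying assumption; real Nanopore
    profiles also include insertions, deletions, and read-length variation.
    """
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_mock_sample(references, read_counts, read_length, error_rate=0.05, seed=1):
    """Return exactly read_counts[name] reads drawn from references[name]."""
    rng = random.Random(seed)
    reads = []
    for name, n_reads in read_counts.items():
        ref = references[name]
        if len(ref) < read_length:
            raise ValueError(f"reference {name!r} is shorter than read_length")
        for i in range(n_reads):
            start = rng.randrange(len(ref) - read_length + 1)
            fragment = ref[start:start + read_length]
            # The organism name is kept in the read ID, so the ground truth stays known.
            reads.append((f"{name}_read{i}", add_substitutions(fragment, error_rate, rng)))
    rng.shuffle(reads)  # interleave organisms as in a real run
    return reads
```

Because every read ID carries its source organism, a pipeline's output can be scored directly against the requested counts. A fixed seed makes the mock sample reproducible.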
Affiliation(s)
- Daniel Ramos Lopez
- Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA
- Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA
- Francisco J. Flores
- Departamento de Ciencias de la Vida y la Agricultura, Universidad de las Fuerzas Armadas-ESPE, Sangolquí 171103, Ecuador
- Centro de Investigación de Alimentos, CIAL, Facultad de Ciencias de la Ingeniería e Industrias, Universidad UTE, Quito 170527, Ecuador
- Andres S. Espindola
- Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA
- Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA
3
Defazio G, Tangaro MA, Pesole G, Fosso B. kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes. Brief Bioinform 2024; 26:bbae680. [PMID: 39749666 PMCID: PMC11695915 DOI: 10.1093/bib/bbae680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 11/25/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025]
Abstract
The advent of high-throughput sequencing (HTS) technologies unlocked the complexity of the microbial world through the development of metagenomics. Metagenomics now provides an unprecedented and comprehensive overview of the taxonomic and functional contributions of microbes in a huge variety of macro- and micro-ecosystems. In particular, shotgun metagenomics allows microbial genomes to be reconstructed by assembling reads into MAGs (metagenome-assembled genomes). MAGs are an information-rich proxy for inferring the taxonomic composition and functional contribution of microbiomes, although the relevant analytical approaches are non-trivial and can still be improved. Tools such as CAMITAX and GTDBtk implement complex approaches that rely on marker gene identification and sequence alignment, and they therefore require long processing times. With the aim of deploying an effective tool for fast and reliable MAG taxonomic classification, we present kMetaShot, a taxonomy classifier based on k-mer/minimizer counting. We benchmarked kMetaShot against CAMITAX and GTDBtk using both in silico and real mock communities. Although kMetaShot implements a fast and concise algorithm, it outperformed the other tools in terms of classification accuracy. Additionally, kMetaShot is easy to install and use, making it suitable for researchers with limited command-line skills. It is available and documented at https://github.com/gdefazio/kMetaShot.
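The following Python sketch does not reproduce kMetaShot's own algorithm. It only illustrates the general principle of minimizer-based MAG classification, under these assumptions:

- (w, k)-minimizers are taken in lexicographic order.
- An index maps each minimizer of a reference genome to its taxa.
- A MAG is assigned to the taxon that collects the most taxon-specific minimizers across its contigs.

Function names are hypothetical.

```python
from collections import Counter


def minimizers(seq, k, w):
    """(w, k)-minimizers: the smallest k-mer in each window of w consecutive k-mers.

    Lexicographic order is used for simplicity; practical tools usually order
    k-mers by a hash and use canonical (strand-independent) k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def build_index(references, k, w):
    """Map each minimizer to the set of taxa whose reference genome contains it."""
    index = {}
    for taxon, genome in references.items():
        for m in minimizers(genome, k, w):
            index.setdefault(m, set()).add(taxon)
    return index


def classify_mag(contigs, index, k, w):
    """Vote with taxon-specific minimizers from all contigs; return None if nothing matches."""
    votes = Counter()
    for contig in contigs:
        for m in minimizers(contig, k, w):
            taxa = index.get(m, set())
            if len(taxa) == 1:  # ignore minimizers shared by several taxa
                votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Keeping only one minimizer per window shrinks the index by roughly a factor of (w + 1) / 2 compared with storing every k-mer. A contig still reports the same minimizers as its source genome for every window it fully contains.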
Affiliation(s)
- Giuseppe Defazio
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via E. Orabona 4, 70126, Bari, Italy
- Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70125, Bari, Italy
- Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via E. Orabona 4, 70126, Bari, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70125, Bari, Italy
- Consorzio Interuniversitario Biotecnologie, BIC Incubatori, Via Flavia 23/1, 34148, Trieste, Italy
- Bruno Fosso
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Via E. Orabona 4, 70126, Bari, Italy
4
Zhang H, Fu L, Leiliang X, Qu C, Wu W, Wen R, Huang N, He Q, Cheng Q, Liu G, Cheng Y. Beyond the Gut: The intratumoral microbiome's influence on tumorigenesis and treatment response. Cancer Commun (Lond) 2024; 44:1130-1167. [PMID: 39087354 PMCID: PMC11483591 DOI: 10.1002/cac2.12597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 06/25/2024] [Accepted: 07/13/2024] [Indexed: 08/02/2024]
Abstract
The intratumoral microbiome (TM) refers to the microorganisms that reside in tumor tissues, including bacteria, fungi, and viruses. It is distinct from the gut microbiome and the circulating microbiota. TM is strongly associated with tumorigenesis, progression, metastasis, and response to therapy. This paper highlights the current status of TM research. The main origins of TM are tract sources, adjacent normal tissue, the circulatory system, and concomitant tumor co-metastasis. The advanced techniques used in TM analysis are comprehensively summarized. In addition, TM contributes to tumor progression through several mechanisms:

- DNA damage.
- Activation of oncogenic signaling pathways: phosphoinositide 3-kinase [PI3K], signal transducer and activator of transcription [STAT], WNT/β-catenin, and extracellular signal-regulated kinase [ERK].
- Modulation of cytokines and induction of inflammatory responses.
- Interaction with the tumor microenvironment: anti-tumor immunity, pro-tumor immunity, and microbe-derived metabolites.

Moreover, promising directions for TM in tumor therapy include immunotherapy, chemotherapy, radiotherapy, probiotics/prebiotics/synbiotics, fecal microbiota transplantation, engineered microbiota, phage therapy, and oncolytic virus therapy. The inherent challenges of clinical application are also summarized. This review provides a comprehensive landscape for analyzing TM, especially TM-related mechanisms and TM-based cancer treatments.
Affiliation(s)
- Hao Zhang
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Li Fu
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Department of Gastroenterology, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Xinwen Leiliang
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Chunrun Qu
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, Hunan, P. R. China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, P. R. China
- Wantao Wu
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, P. R. China
- Rong Wen
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Ning Huang
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Qiuguang He
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Quan Cheng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, Hunan, P. R. China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, P. R. China
- Guodong Liu
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
- Yuan Cheng
- Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, P. R. China
5
Robinson W, Stone JK, Schischlik F, Gasmi B, Kelly MC, Seibert C, Dadkhah K, Gertz EM, Lee JS, Zhu K, Ma L, Wang XW, Sahinalp SC, Patro R, Leiserson MDM, Harris CC, Schäffer AA, Ruppin E. Identification of intracellular bacteria from multiple single-cell RNA-seq platforms using CSI-Microbes. Sci Adv 2024; 10:eadj7402. [PMID: 38959321 PMCID: PMC11221508 DOI: 10.1126/sciadv.adj7402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 05/29/2024] [Indexed: 07/05/2024]
Abstract
The study of the tumor microbiome has been garnering increased attention. We developed a computational pipeline (CSI-Microbes) for identifying microbial reads in single-cell RNA sequencing (scRNA-seq) data and for analyzing the differential abundance of taxa. Using a series of controlled experiments and analyses, we performed the first systematic evaluation of how efficiently multiple scRNA-seq technologies recover microbial unique molecular identifiers (UMIs). This evaluation identified the newer 10x chemistries (3' v3 and 5') as the best-suited approach. We analyzed patient esophageal and colorectal carcinomas and found that reads from distinct genera tend to co-occur in the same host cells, pointing to possible intracellular polymicrobial interactions. Microbial reads are disproportionately abundant within myeloid cells that up-regulate proinflammatory cytokines such as IL1B and CXCL8, while infected tumor cells up-regulate antigen processing and presentation pathways. These results show that myeloid cells that have engulfed bacteria are a major source of bacterial RNA within the tumor microenvironment (TME) and may inflame the TME and influence immunotherapy response.
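The UMI-level counting and same-cell co-occurrence analysis described above can be sketched as follows. This is not the CSI-Microbes implementation. The sketch assumes that reads have already been assigned a cell barcode, a UMI, and a microbial genus, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def count_microbial_umis(records):
    """Count unique molecules per cell and genus.

    `records` are (cell_barcode, umi, genus) tuples for reads already assigned to a
    microbial genus. Reads sharing all three fields are PCR duplicates of one
    molecule, so they are counted once. Exact UMI matching is a simplification;
    dedicated tools also merge UMIs that differ by sequencing errors.
    """
    molecules = set(records)
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, genus in molecules:
        counts[cell][genus] += 1
    return {cell: dict(per_genus) for cell, per_genus in counts.items()}


def genus_cooccurrence(umi_counts):
    """Number of cells in which each pair of genera is detected together."""
    pairs = defaultdict(int)
    for per_genus in umi_counts.values():
        for a, b in combinations(sorted(per_genus), 2):
            pairs[(a, b)] += 1
    return dict(pairs)
```

Counting molecules rather than reads keeps PCR amplification from inflating abundance. Comparing pair counts against what independent placement of genera across cells would predict is one way to test whether co-occurrence exceeds chance. That comparison is left out of this sketch.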
Affiliation(s)
- Welles Robinson
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20910, USA
- Department of Computer Science, University of Maryland, College Park, MD 20910, USA
- Surgery Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Tumour Immunogenomics and Immunosurveillance Laboratory, Department of Oncology, University College London, London, UK
- Joshua K. Stone
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Fiorella Schischlik
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Billel Gasmi
- Surgery Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Michael C. Kelly
- Center for Cancer Research Single Cell Analysis Facility, Frederick National Laboratory for Cancer Research, Bethesda, MD 20701, USA
- Charlie Seibert
- Center for Cancer Research Single Cell Analysis Facility, Frederick National Laboratory for Cancer Research, Bethesda, MD 20701, USA
- Kimia Dadkhah
- Center for Cancer Research Single Cell Analysis Facility, Frederick National Laboratory for Cancer Research, Bethesda, MD 20701, USA
- E. Michael Gertz
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Joo Sang Lee
- Department of Artificial Intelligence and Department of Precision Medicine, School of Medicine, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Kaiyuan Zhu
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- Lichun Ma
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Xin Wei Wang
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- S. Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20910, USA
- Department of Computer Science, University of Maryland, College Park, MD 20910, USA
- Mark D. M. Leiserson
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20910, USA
- Department of Computer Science, University of Maryland, College Park, MD 20910, USA
- Curtis C. Harris
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Alejandro A. Schäffer
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Eytan Ruppin
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
6
Khan J, Rubel T, Molloy E, Dhulipala L, Patro R. Fast, parallel, and cache-friendly suffix array construction. Algorithms Mol Biol 2024; 19:16. [PMID: 38679714 PMCID: PMC11056320 DOI: 10.1186/s13015-024-00263-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 03/21/2024] [Indexed: 05/01/2024]
Abstract
PURPOSE String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their practical importance, few scalable parallel algorithms for constructing them are known, and the existing algorithms can be highly non-trivial to implement and parallelize.

METHODS In this paper we present CAPS-SA, a simple and scalable parallel algorithm for constructing these string indexes. It is inspired by samplesort and uses an LCP-informed mergesort. Owing to its design, CAPS-SA has excellent memory locality. It therefore incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies.

RESULTS We show that despite its simple design, CAPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CAPS-SA can easily be extended to exploit this structure to obtain further speedups. We make our code publicly available at https://github.com/jamshed/CaPS-SA.
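The following small, sequential Python sketch illustrates the two indexes, a suffix array (SA) and its LCP array. It is not CAPS-SA. It borrows only the samplesort idea: sampled pivot suffixes split all suffixes into ordered buckets that can be sorted independently. It computes the LCP array with Kasai et al.'s linear-time algorithm instead of an LCP-informed mergesort. Suffix comparisons use Python string slicing, so the sketch is meant for illustration, not for large genomes.

```python
import bisect
import random


def naive_suffix_array(text):
    """Reference SA: sort start positions by the suffixes they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def samplesort_suffix_array(text, num_buckets=4, seed=0):
    """Samplesort-style SA construction (sequential sketch).

    Sampled suffixes act as pivots; every suffix falls into the bucket between two
    pivots, so buckets are already in order and can be sorted independently (the
    step a parallel implementation would distribute across cores).
    """
    n = len(text)
    rng = random.Random(seed)
    sampled = rng.sample(range(n), min(n, max(num_buckets - 1, 0)))
    pivots = sorted(text[i:] for i in sampled)
    buckets = [[] for _ in range(len(pivots) + 1)]
    for i in range(n):
        # The number of pivots <= suffix i gives its bucket index.
        buckets[bisect.bisect_right(pivots, text[i:])].append(i)
    sa = []
    for bucket in buckets:
        sa.extend(sorted(bucket, key=lambda i: text[i:]))
    return sa


def lcp_array(text, sa):
    """Kasai et al. LCP array: lcp[r] is the LCP of suffixes sa[r - 1] and sa[r]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order so h drops by at most 1 per step
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp
```

For `banana`, the SA is `[5, 3, 1, 0, 4, 2]` and the LCP array is `[0, 1, 3, 0, 0, 2]`. For example, the adjacent suffixes `ana` and `anana` share a prefix of length 3.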
Affiliation(s)
- Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Tobias Rubel
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Laxman Dhulipala
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
7
Wan X, Yang Q, Wang X, Bai Y, Liu Z. Isolation and Cultivation of Human Gut Microorganisms: A Review. Microorganisms 2023; 11:1080. [PMID: 37110502 PMCID: PMC10141110 DOI: 10.3390/microorganisms11041080] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/30/2023] [Revised: 04/12/2023] [Accepted: 04/19/2023] [Indexed: 04/29/2023]
Abstract
Microbial resources from the human gut may find use in various applications, such as empirical research on the microbiome, the development of probiotic products, and bacteriotherapy. Owing to the development of "culturomics", the number of pure bacterial cultures obtained from the human gut has increased significantly since 2012. However, a considerable number of human gut microbes remain to be isolated and cultured. To improve the efficiency of obtaining microbial resources from the human gut, several constraints of current methods still need to be addressed, such as labor burden, culture conditions, and microbial targetability. Here, we provide an overview of the general knowledge and recent developments in culturomics for human gut microorganisms. Furthermore, we discuss how several steps of culturomics could be optimized to improve current strategies, including sample collection, sample processing, isolation, and cultivation.
Affiliation(s)
- Yun Bai
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; (X.W.); (Q.Y.); (X.W.)
- Zhi Liu
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; (X.W.); (Q.Y.); (X.W.)
8
Das A, Schatz MC. Sketching and sampling approaches for fast and accurate long read classification. BMC Bioinformatics 2022; 23:452. [PMID: 36316646 PMCID: PMC9624007 DOI: 10.1186/s12859-022-05014-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 10/27/2022] [Indexed: 11/05/2022]
Abstract
BACKGROUND In modern sequencing experiments, quickly and accurately identifying the sources of reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species a read comes from. In other settings, it is important to distinguish which reads come from the targeted sample and which come from potential contaminants. In both cases, identifying the correct source of a read enables further investigation of relevant reads while minimizing wasted work. This task is particularly challenging for long reads, whose substantial error rate can obscure their origins.

RESULTS Existing tools for the read classification problem are often alignment- or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling- and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm generates a reduced representation (a "screen") of the potential source genomes for a query read set. Reads are then streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited preprocessing, stores and works with only a subset of the input data, and can classify reads with a high degree of accuracy.

CONCLUSIONS The sampling and sketching approaches investigated include:

- uniform sampling;
- methods based on MinHash and its weighted and order variants;
- a minimizer-based technique;
- a novel clustering-based sketching approach.

We demonstrate the effectiveness of these techniques in two tasks. The first is identifying the source microbial genomes of reads from a metagenomic long-read sequencing experiment. The second is distinguishing long reads from organisms of interest from potential contaminant reads. We then compare these approaches with existing alignment-, index-, and sketching-based tools for read classification and demonstrate that such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
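The following minimal Python sketch illustrates the screen-based classification described above, using bottom-s MinHash sketches as the reduced representation. It is not the authors' reference implementation. It rests on these assumptions:

- k-mers must match exactly.
- blake2b is used as the hash.
- A read is assigned to the genome whose sketch it hits most often.
- `min_hits` is a hypothetical threshold below which no source is reported.

```python
import hashlib


def kmer_hashes(seq, k):
    """Set of 64-bit hashes of every k-mer in `seq` (blake2b chosen as a deterministic hash)."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest k-mer hash values of `seq`."""
    return set(sorted(kmer_hashes(seq, k))[:s])


def build_screen(genomes, k, s):
    """Reduced representation ("screen") of each candidate source genome."""
    return {name: bottom_sketch(seq, k, s) for name, seq in genomes.items()}


def classify_read(read, screen, k, min_hits=1):
    """Predict the genome whose sketch shares the most k-mer hashes with the read.

    Returns None when no genome reaches `min_hits`, e.g. for a likely contaminant.
    """
    read_hashes = kmer_hashes(read, k)
    best, best_hits = None, 0
    for name, sketch in screen.items():
        hits = len(sketch & read_hashes)
        if hits > best_hits:
            best, best_hits = name, hits
    return best if best_hits >= min_hits else None
```

Each genome is reduced to s hashes. An error-free read of length r from a genome of length L is therefore expected to hit roughly s·r/L of them, so s has to be sized against L/r. Sequencing errors destroy shared k-mers, which lowers the hit count further.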
Affiliation(s)
- Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA