1
Luebbert L, Sullivan DK, Carilli M, Eldjárn Hjörleifsson K, Viloria Winnett A, Chari T, Pachter L. Detection of viral sequences at single-cell resolution identifies novel viruses associated with host gene expression changes. Nat Biotechnol 2025:10.1038/s41587-025-02614-y. [PMID: 40263451 DOI: 10.1038/s41587-025-02614-y] [Received: 05/13/2024] [Accepted: 02/24/2025] [Indexed: 04/24/2025]
Abstract
The increasing use of high-throughput sequencing methods in research, agriculture and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on the highly conserved RdRP protein, enabling the detection of over 100,000 RNA virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We apply our method to peripheral blood mononuclear cell data from rhesus macaques with Ebola virus disease and describe previously unknown putative viruses. Moreover, we are able to accurately predict viral presence in individual cells based on macaque gene expression.
Affiliation(s)
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Delaney K Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Maria Carilli
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Kristján Eldjárn Hjörleifsson
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Alexander Viloria Winnett
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.
2
Luebbert L, Sullivan DK, Carilli M, Hjörleifsson KE, Winnett AV, Chari T, Pachter L. Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression. bioRxiv 2025:2023.12.11.571168. [PMID: 38168363 PMCID: PMC10760059 DOI: 10.1101/2023.12.11.571168] [Indexed: 01/05/2024]
Abstract
There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to make accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering over 100,000 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.
Affiliation(s)
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Delaney K. Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Maria Carilli
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Alexander Viloria Winnett
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
3
Leng F, Mei S, Zhou X, Liu X, Yuan Y, Xu W, Hao C, Guo R, Hao C, Li W, Zhang P. DVsc: An Automated Framework for Efficiently Detecting Viral Infection from Single-cell Transcriptomics Data. Genomics, Proteomics & Bioinformatics 2024; 22:qzad007. [PMID: 39215426 PMCID: PMC12016032 DOI: 10.1093/gpbjnl/qzad007] [Received: 11/28/2022] [Revised: 11/13/2023] [Accepted: 12/15/2023] [Indexed: 09/04/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a valuable tool for studying cellular heterogeneity in various fields, particularly in virological research. By studying the viral and cellular transcriptomes, the dynamics of viral infection can be investigated at a single-cell resolution. However, limited studies have been conducted to investigate whether RNA transcripts from clinical samples contain substantial amounts of viral RNAs, and a specific computational framework for efficiently detecting viral reads based on scRNA-seq data has not been developed. Hence, we introduce DVsc, an open-source framework for precise quantitative analysis of viral infection from single-cell transcriptomics data. When applied to approximately 200 diverse clinical samples that were infected by more than 10 different viruses, DVsc demonstrated high accuracy in systematically detecting viral infection across a wide array of cell types. This innovative bioinformatics pipeline could be crucial for addressing the potential effects of surreptitiously invading viruses on certain illnesses, as well as for designing novel medicines to target viruses in specific host cell subsets and evaluating the efficacy of treatment. DVsc supports the FASTQ format as an input and is compatible with multiple single-cell sequencing platforms. Moreover, it could also be applied to sequences from bulk RNA sequencing data. DVsc is available at http://62.234.32.33:5000/DVsc.
Affiliation(s)
- Fei Leng
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Song Mei
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Xiaolin Zhou
- Institute of Biomedical Engineering, University of Toronto, Toronto, M5S 3G9, Canada
- Xuanshi Liu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Yefeng Yuan
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Wenjian Xu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Chongyi Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Ruolan Guo
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Chanjuan Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Wei Li
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
4
Whitmore LS, Tisoncik-Go J, Gale M. scPathoQuant: a tool for efficient alignment and quantification of pathogen sequence reads from 10× single cell sequencing datasets. Bioinformatics 2024; 40:btae145. [PMID: 38478395 PMCID: PMC10990681 DOI: 10.1093/bioinformatics/btae145] [Received: 10/24/2023] [Revised: 02/26/2024] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
MOTIVATION: Currently there is a lack of efficient computational pipelines/tools for conducting simultaneous genome mapping of pathogen-derived and host reads from single cell RNA sequencing (scRNAseq) output from pathogen-infected cells. Contemporary options include processes involving multiple steps and/or running multiple computational tools, increasing user operations time.
RESULTS: To address the need for new tools to directly map and quantify pathogen and host sequence reads from within an infected cell from scRNAseq datasets in a single operation, we have built a python package, called scPathoQuant. scPathoQuant extracts sequences that were not aligned to the primary host genome, maps them to a pathogen genome of interest (here as demonstrated for viral pathogens), quantifies total reads mapping to the entire pathogen, quantifies reads mapping to individual pathogen genes, and finally integrates pathogen sequence counts into matrix files that are used by standard single cell pipelines for downstream analyses with only one command. We demonstrate that scPathoQuant provides a scRNAseq viral and host genome-wide sequence read abundance analysis that can differentiate and define multiple viruses in a single sample scRNAseq output.
AVAILABILITY AND IMPLEMENTATION: The SPQ package is available at https://github.com/galelab/scPathoQuant (DOI 10.5281/zenodo.10463670), with test codes and datasets available at https://github.com/galelab/Whitmore_scPathoQuant_testSets (DOI 10.5281/zenodo.10463677) to serve as a resource for the community.
Affiliation(s)
- Leanne S Whitmore
- Center for Innate Immunity and Immune Disease, University of Washington, Seattle, WA 98109, United States
- Department of Immunology, School of Medicine, University of Washington, Seattle, WA 98109, United States
- Jennifer Tisoncik-Go
- Center for Innate Immunity and Immune Disease, University of Washington, Seattle, WA 98109, United States
- Department of Immunology, School of Medicine, University of Washington, Seattle, WA 98109, United States
- Michael Gale
- Center for Innate Immunity and Immune Disease, University of Washington, Seattle, WA 98109, United States
- Department of Immunology, School of Medicine, University of Washington, Seattle, WA 98109, United States
- Washington National Primate Research Center, University of Washington, Seattle, WA 98195, United States
5
Chen J, Yin D, Wong HYH, Duan X, Yu KHO, Ho JWK. Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data. Gigascience 2024; 13:giad117. [PMID: 38195165 PMCID: PMC10776309 DOI: 10.1093/gigascience/giad117] [Received: 05/09/2023] [Revised: 08/17/2023] [Accepted: 12/16/2023] [Indexed: 01/11/2024]
Abstract
The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66% to 88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of the cost on cloud computing systems, Vulture also shows a cost reduction of 83% ($12 vs. $70). We applied Vulture to 2 coronavirus disease 2019, 3 hepatocellular carcinoma (HCC), and 2 gastric cancer human patient cohorts with public sequencing reads data from scRNA-seq experiments and discovered cell type-specific enrichment of severe acute respiratory syndrome coronavirus 2, hepatitis B virus (HBV), and Helicobacter pylori-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture.
Affiliation(s)
- Junyi Chen
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Danqing Yin
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Harris Y H Wong
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- Xin Duan
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- Ken H O Yu
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Joshua W K Ho
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China