1
Gao T, Soldatov R, Sarkar H, Kurkiewicz A, Biederstedt E, Loh PR, Kharchenko PV. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nat Biotechnol 2023; 41:417-426. [PMID: 36163550 PMCID: PMC10289836 DOI: 10.1038/s41587-022-01468-y] [Received: 01/23/2022] [Accepted: 08/11/2022] [Indexed: 11/09/2022]
Abstract
Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.
Affiliation(s)
- Teng Gao
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ruslan Soldatov
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Hirak Sarkar
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Adam Kurkiewicz
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Evan Biederstedt
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Po-Ru Loh
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Peter V Kharchenko
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
  - Altos Labs, San Diego, CA, USA
2
Prashant NM, Alomran N, Chen Y, Liu H, Bousounis P, Movassagh M, Edwards N, Horvath A. SCReadCounts: estimation of cell-level SNVs expression from scRNA-seq data. BMC Genomics 2021; 22:689. [PMID: 34551708 PMCID: PMC8459565 DOI: 10.1186/s12864-021-07974-8] [Received: 03/08/2021] [Accepted: 09/03/2021] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Recent studies have demonstrated the utility of scRNA-seq SNVs to distinguish tumor from normal cells, characterize intra-tumoral heterogeneity, and define mutation-associated expression signatures. In addition to cancer studies, SNVs from single cells have been useful in studies of transcriptional burst kinetics, allelic expression, chromosome X inactivation, ploidy estimations, and haplotype inference.
RESULTS: To aid these types of studies, we have developed a tool, SCReadCounts, for cell-level tabulation of the sequencing read counts bearing SNV reference and variant alleles from barcoded scRNA-seq alignments. Provided genomic loci and expected alleles, SCReadCounts generates cell-SNV matrices with the absolute variant- and reference-harboring read counts, as well as cell-SNV matrices of expressed Variant Allele Fraction (VAF_RNA) suitable for a variety of downstream applications. We demonstrate three different SCReadCounts applications on 59,884 cells from seven neuroblastoma samples: (1) estimation of cell-level expression of known somatic mutations and RNA-editing sites, (2) estimation of cell-level allele expression of biallelic SNVs, and (3) a discovery mode assessment of the reference and each of the three alternative nucleotides at genomic positions of interest that does not require prior SNV information. For the latter, we applied SCReadCounts on the coding regions of KRAS, where it identified known and novel somatic mutations in a low-to-moderate proportion of cells. The SCReadCounts read counts module is benchmarked against the analogous modules of GATK and Samtools. SCReadCounts is freely available (https://github.com/HorvathLab/NGS) as 64-bit self-contained binary distributions for Linux and MacOS, in addition to Python source.
CONCLUSIONS: SCReadCounts supplies a fast and efficient solution for estimation of cell-level SNV expression from scRNA-seq data. SCReadCounts enables distinguishing cells with monoallelic reference expression from those with no gene expression and is applicable to assess SNVs present in only a small proportion of the cells, such as somatic mutations in cancer.
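The cell-SNV matrix idea described in this abstract can be illustrated with a minimal sketch. It is not SCReadCounts' actual interface or output format: the function name `vaf_rna_matrix`, the input tuple layout, and the `min_reads` parameter are hypothetical, and the expressed variant allele fraction is assumed to be variant reads divided by variant plus reference reads. The sketch also shows why a cell with reference-only reads (fraction 0.0) remains distinguishable from a cell with no reads at the locus (no entry).

```python
from collections import defaultdict


def vaf_rna_matrix(read_counts, min_reads=1):
    """Build a nested dict {cell: {snv: fraction}} from per-cell read counts.

    read_counts: iterable of (cell_barcode, snv_id, ref_reads, var_reads).
    Assumption: fraction = var_reads / (ref_reads + var_reads).
    Cell/SNV pairs with fewer than min_reads total reads get no entry,
    so "no expression" stays distinct from "reference-only expression" (0.0).
    """
    matrix = defaultdict(dict)
    for cell, snv, ref_reads, var_reads in read_counts:
        total = ref_reads + var_reads
        if total < min_reads:
            continue  # locus not covered well enough in this cell
        matrix[cell][snv] = var_reads / total
    return {cell: dict(row) for cell, row in matrix.items()}


# Hypothetical counts for one SNV across three cells.
counts = [
    ("cellA", "snv1", 8, 2),  # both alleles observed
    ("cellB", "snv1", 5, 0),  # reference-only expression
    ("cellC", "snv1", 0, 0),  # no reads at the locus
]
matrix = vaf_rna_matrix(counts)
```

In this sketch `cellB` receives a fraction of 0.0 while `cellC` is absent from the result, which mirrors the distinction the conclusions draw between monoallelic reference expression and no gene expression.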
Affiliation(s)
- N M Prashant
  - McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
  - Departments of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nawaf Alomran
  - McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Yu Chen
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC, 20057, USA
- Hongyu Liu
  - McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Pavlos Bousounis
  - McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Mercedeh Movassagh
  - Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Department of Data Sciences, Dana Farber Cancer Institute, Boston, MA, USA
- Nathan Edwards
  - Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC, 20057, USA
- Anelia Horvath
  - McCormick Genomics and Proteomics Center, School of Medicine and Health Sciences, The George Washington University, Washington, DC, 20037, USA
3
Berger E, Yorukoglu D, Zhang L, Nyquist SK, Shalek AK, Kellis M, Numanagić I, Berger B. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets. Nat Commun 2020; 11:4662. [PMID: 32938926 PMCID: PMC7494856 DOI: 10.1038/s41467-020-18320-z] [Received: 04/15/2019] [Accepted: 08/07/2020] [Indexed: 01/04/2023]
Abstract
Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X's feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X's ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
Affiliation(s)
- Emily Berger
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Mathematics, UC Berkeley, Berkeley, CA, 94720, USA
- Deniz Yorukoglu
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Lillian Zhang
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Sarah K Nyquist
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Alex K Shalek
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Manolis Kellis
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Ibrahim Numanagić
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada
- Bonnie Berger
  - Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
4
M PN, Liu H, Bousounis P, Spurr L, Alomran N, Ibeawuchi H, Sein J, Reece-Stremtan D, Horvath A. Estimating the Allele-Specific Expression of SNVs From 10× Genomics Single-Cell RNA-Sequencing Data. Genes (Basel) 2020; 11:E240. [PMID: 32106453 DOI: 10.3390/genes11030240] [Received: 12/22/2019] [Revised: 02/10/2020] [Accepted: 02/19/2020] [Indexed: 12/15/2022]
Abstract
With the recent advances in single-cell RNA-sequencing (scRNA-seq) technologies, the estimation of allele expression from single cells is becoming increasingly reliable. Allele expression is both quantitative and dynamic and is an essential component of the genomic interactome. Here, we systematically estimate the allele expression from heterozygous single nucleotide variant (SNV) loci using scRNA-seq data generated on the 10× Genomics Chromium platform. We analyzed 26,640 human adipose-derived mesenchymal stem cells (from three healthy donors), sequenced to an average of 150K sequencing reads per cell (more than 4 billion scRNA-seq reads in total). High-quality SNV calls assessed in our study contained approximately 15% exonic and >50% intronic loci. To analyze the allele expression, we estimated the expressed variant allele fraction (VAF_RNA) from SNV-aware alignments and analyzed its variance and distribution (mono- and bi-allelic) at different minimum sequencing read thresholds. Our analysis shows that when assessing positions covered by a minimum of three unique sequencing reads, over 50% of the heterozygous SNVs show bi-allelic expression, while at a threshold of 10 reads, nearly 90% of the SNVs are bi-allelic. In addition, our analysis demonstrates the feasibility of scVAF_RNA estimation from current scRNA-seq datasets and shows that the 3′-based library generation protocol of 10× Genomics scRNA-seq data can be informative in SNV-based studies, including analyses of transcriptional kinetics.
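The dependence of the bi-allelic fraction on the minimum read threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name `fraction_biallelic` is hypothetical, and an observation is assumed to count as bi-allelic when at least one reference read and one variant read are both present, which may differ from the criterion used in the study.

```python
def fraction_biallelic(observations, min_reads):
    """Return the share of covered observations that look bi-allelic.

    observations: iterable of (ref_reads, var_reads) pairs, one per
    heterozygous SNV observed in a cell.
    Assumption: "bi-allelic" means both alleles have at least one read.
    Returns None when no observation reaches min_reads total reads.
    """
    covered = [(r, v) for r, v in observations if r + v >= min_reads]
    if not covered:
        return None
    biallelic = sum(1 for r, v in covered if r > 0 and v > 0)
    return biallelic / len(covered)


# Hypothetical read counts; compare a lenient and a strict threshold.
observations = [(3, 0), (2, 1), (0, 4), (6, 5), (12, 0), (7, 8)]
lenient = fraction_biallelic(observations, min_reads=3)
strict = fraction_biallelic(observations, min_reads=10)
```

Raising `min_reads` discards low-coverage observations, where seeing only one allele is often a sampling effect, so the remaining set tends to contain a larger share of bi-allelic calls.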
5
Abstract
Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions, an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
Affiliation(s)
- Kwangbom Choi
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
- Gary A Churchill
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
6
Abstract
The trillions of cells in the human body can be viewed as elementary but essential biological units that achieve different body states, but the low resolution of previous cell isolation and measurement approaches limits our understanding of the cell-specific molecular profiles. The recent establishment and rapid growth of single-cell sequencing technology has facilitated the identification of molecular profiles of heterogeneous cells, especially on the transcription level of single cells [single-cell RNA sequencing (scRNA-seq)]. As a novel method, the robustness of scRNA-seq under changing conditions will determine its practical potential in major research programs and clinical applications. In this review, we first briefly presented the scRNA-seq-related methods from the point of view of experiments and computation. Then, we compared several state-of-the-art scRNA-seq analysis frameworks mainly by analyzing their performance robustness on independent scRNA-seq datasets for the same complex disease. Finally, we elaborated on our hypothesis on consensus scRNA-seq analysis and summarized the potential indicative and predictive roles of individual cells in understanding disease heterogeneity by single-cell technologies.
Affiliation(s)
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
7
Scott ER, Wallsten RL. A Look to the Future. Pharmacogenomics 2019. [DOI: 10.1016/b978-0-12-812626-4.00010-3] [Indexed: 11/28/2022]
8
Benitez JA, Cheng S, Deng Q. Revealing allele-specific gene expression by single-cell transcriptomics. Int J Biochem Cell Biol 2017; 90:155-60. [PMID: 28578186 DOI: 10.1016/j.biocel.2017.05.029] [Received: 03/05/2017] [Revised: 05/23/2017] [Accepted: 05/27/2017] [Indexed: 11/21/2022]
Abstract
Single-cell sequencing has emerged as a revolutionary method that reveals biological processes with unprecedented resolution and scale, and has already greatly impacted biology and medicine. To investigate processes such as alternative splicing, novel exon detection and allele-specific expression (ASE), full-length based single-cell RNA-seq methods are required for broad sequence coverage and single nucleotide polymorphism (SNP) identification. In this review, we revisit recent achievements from studies that used single-cell RNA-seq to advance our understanding of ASE in the context of both autosomal and X-chromosome genes. We also recapitulate useful bioinformatic tools developed to identify haplotype phase.
9
Abstract
Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that cis control in gene expression overwhelmingly manifests as differences in burst frequency.
Affiliation(s)
- Yuchao Jiang
  - Genomics and Computational Biology Graduate Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Nancy R Zhang
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Mingyao Li
  - Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA