1
|
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events, like driver mutations. We generated linked-read whole genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, phasing 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations in cancer can be beneficial for some cancer cases.
Collapse
Affiliation(s)
- Steven M. Foltz
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Lijun Yao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Nadezhda V. Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Amila Weerasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Qingsong Gao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Guanlan Dong
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Moses Schindler
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Hua Sun
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Reyka G. Jayasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Robert S. Fulton
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Catrina C. Fronick
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Justin King
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Daniel R. Kohnen
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Mark A. Fiala
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - John F. DiPersio
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Ravi Vij
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
| |
Collapse
|
2
|
Ressler AK, Snellings DA, Girard R, Gallione CJ, Lightle R, Allen AS, Awad IA, Marchuk DA. Single-nucleus DNA sequencing reveals hidden somatic loss-of-heterozygosity in Cerebral Cavernous Malformations. Nat Commun 2023; 14:7009. [PMID: 37919320 PMCID: PMC10622526 DOI: 10.1038/s41467-023-42908-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2023] [Accepted: 10/24/2023] [Indexed: 11/04/2023] Open
Abstract
Cerebral Cavernous Malformations (CCMs) are vascular malformations of the central nervous system which can lead to moderate to severe neurological phenotypes in patients. A majority of CCM lesions are driven by a cancer-like three-hit mutational mechanism, including a somatic, activating mutation in the oncogene PIK3CA, as well as biallelic loss-of-function mutations in a CCM gene. However, standard sequencing approaches often fail to yield a full complement of pathogenic mutations in many CCMs. We suggest this reality reflects the limited sensitivity to identify low-frequency variants and the presence of mutations undetectable with bulk short-read sequencing. Here we report a single-nucleus DNA-sequencing approach that leverages the underlying biology of CCMs to identify lesions with somatic loss-of-heterozygosity, a class of such hidden mutations. We identify an alternative genetic mechanism for CCM pathogenesis and establish a method that can be repurposed to investigate the genetic underpinning of other disorders with multiple somatic mutations.
Collapse
Affiliation(s)
- Andrew K Ressler
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, 27710, USA.
| | - Daniel A Snellings
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Romuald Girard
- Neurovascular Surgery Program, Department of Neurological Surgery, The University of Chicago Medicine and Biological Sciences, Chicago, IL, USA
| | - Carol J Gallione
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Rhonda Lightle
- Neurovascular Surgery Program, Department of Neurological Surgery, The University of Chicago Medicine and Biological Sciences, Chicago, IL, USA
| | - Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, 27710, USA
| | - Issam A Awad
- Neurovascular Surgery Program, Department of Neurological Surgery, The University of Chicago Medicine and Biological Sciences, Chicago, IL, USA
| | - Douglas A Marchuk
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, 27710, USA.
| |
Collapse
|
3
|
Hård J, Mold JE, Eisfeldt J, Tellgren-Roth C, Häggqvist S, Bunikis I, Contreras-Lopez O, Chin CS, Nordlund J, Rubin CJ, Feuk L, Michaëlsson J, Ameur A. Long-read whole-genome analysis of human single cells. Nat Commun 2023; 14:5164. [PMID: 37620373 PMCID: PMC10449900 DOI: 10.1038/s41467-023-40898-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 08/07/2023] [Indexed: 08/26/2023] Open
Abstract
Long-read sequencing has dramatically increased our understanding of human genome variation. Here, we demonstrate that long-read technology can give new insights into the genomic architecture of individual cells. Clonally expanded CD8+ T-cells from a human donor were subjected to droplet-based multiple displacement amplification (dMDA) to generate long molecules with reduced bias. PacBio sequencing generated up to 40% genome coverage per single-cell, enabling detection of single nucleotide variants (SNVs), structural variants (SVs), and tandem repeats, also in regions inaccessible by short reads. 28 somatic SNVs were detected, including one case of mitochondrial heteroplasmy. 5473 high-confidence SVs/cell were discovered, a sixteen-fold increase compared to Illumina-based results from clonally related cells. Single-cell de novo assembly generated a genome size of up to 598 Mb and 1762 (12.8%) complete gene models. In summary, our work shows the promise of long-read sequencing toward characterization of the full spectrum of genetic variation in single cells.
Collapse
Affiliation(s)
- Joanna Hård
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden.
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
- ETH AI Center, ETH Zurich, Zurich, Switzerland.
| | - Jeff E Mold
- Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Christian Tellgren-Roth
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Susana Häggqvist
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Ignas Bunikis
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | | | | | - Jessica Nordlund
- Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Carl-Johan Rubin
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Jakob Michaëlsson
- Center for Infectious Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.
| |
Collapse
|
4
|
Sun S, Cheng F, Han D, Wei S, Zhong A, Massoudian S, Johnson AB. Pairwise comparative analysis of six haplotype assembly methods based on users' experience. BMC Genom Data 2023; 24:35. [PMID: 37386408 PMCID: PMC10311811 DOI: 10.1186/s12863-023-01134-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Accepted: 05/25/2023] [Indexed: 07/01/2023] Open
Abstract
BACKGROUND A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is a process of obtaining haplotypes using DNA sequencing data. Currently, there are many HA methods with their own strengths and weaknesses. This study focused on comparing six HA methods or algorithms: HapCUT2, MixSIH, PEATH, WhatsHap, SDhaP, and MAtCHap using two NA12878 datasets named hg19 and hg38. The 6 HA algorithms were run on chromosome 10 of these two datasets, each with 3 filtering levels based on sequencing depth (DP1, DP15, and DP30). Their outputs were then compared. RESULT Run time (CPU time) was compared to assess the efficiency of 6 HA methods. HapCUT2 was the fastest HA for 6 datasets, with run time consistently under 2 min. In addition, WhatsHap was relatively fast, and its run time was 21 min or less for all 6 datasets. The other 4 HA algorithms' run time varied across different datasets and coverage levels. To assess their accuracy, pairwise comparisons were conducted for each pair of the six packages by generating their disagreement rates for both haplotype blocks and Single Nucleotide Variants (SNVs). The authors also compared them using switch distance (error), i.e., the number of positions where two chromosomes of a certain phase must be switched to match with the known haplotype. HapCUT2, PEATH, MixSIH, and MAtCHap generated output files with similar numbers of blocks and SNVs, and they had relatively similar performance. WhatsHap generated a much larger number of SNVs in the hg19 DP1 output, which caused it to have high disagreement percentages with other methods. However, for the hg38 data, WhatsHap had similar performance as the other 4 algorithms, except SDhaP. The comparison analysis showed that SDhaP had a much larger disagreement rate when it was compared with the other algorithms in all 6 datasets. CONCLUSION The comparative analysis is important because each algorithm is different. The findings of this study provide a deeper understanding of the performance of currently available HA algorithms and useful input for other users.
Collapse
Affiliation(s)
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX USA
| | - Flora Cheng
- Carnegie Mellon University, Pittsburgh, PA USA
| | - Daphne Han
- Carnegie Mellon University, Pittsburgh, PA USA
| | - Sarah Wei
- Massachusetts Institute of Technology, Cambridge, MA USA
| | | | | | | |
Collapse
|
5
|
Chang L, Jiao H, Chen J, Wu G, Liu P, Li R, Guo J, Long W, Tang X, Lu B, Xu H, Wu H. Single-cell whole-genome sequencing, haplotype analysis in prenatal diagnosis of monogenic diseases. Life Sci Alliance 2023; 6:e202201761. [PMID: 36810160 PMCID: PMC9947115 DOI: 10.26508/lsa.202201761] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2022] [Revised: 02/10/2023] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Monogenic inherited diseases are common causes of congenital disabilities, leading to severe economic and mental burdens on affected families. In our previous study, we demonstrated the validity of cell-based noninvasive prenatal testing (cbNIPT) in prenatal diagnosis by single-cell targeted sequencing. The present research further explored the feasibility of single-cell whole-genome sequencing (WGS) and haplotype analysis of various monogenic diseases with cbNIPT. Four families were recruited: one with inherited deafness, one with hemophilia, one with large vestibular aqueduct syndrome (LVAS), and one with no disease. Circulating trophoblast cells (cTBs) were obtained from maternal blood and analyzed by single-cell 15X WGS. Haplotype analysis showed that CFC178 (deafness family), CFC616 (hemophilia family), and CFC111 (LVAS family) inherited haplotypes from paternal and/or maternal pathogenic loci. Amniotic fluid or fetal villi samples from the deafness and hemophilia families confirmed these results. WGS performed better than targeted sequencing in genome coverage, allele dropout (ADO), and false-positive (FP) ratios. Our findings suggest that cbNIPT by WGS and haplotype analysis have great potential for use in prenatally diagnosing various monogenic diseases.
Collapse
Affiliation(s)
- Liang Chang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Haining Jiao
- Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jiucheng Chen
- Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
| | - Guanlin Wu
- Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
| | - Ping Liu
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Rong Li
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Jianying Guo
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
- National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
- Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
| | - Wenqing Long
- Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Xiaojian Tang
- Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Bingjie Lu
- Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
| | - Haibin Xu
- Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
| | - Han Wu
- Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
| |
Collapse
|
6
|
Moravec JC, Lanfear R, Spector DL, Diermeier SD, Gavryushkin A. Testing for Phylogenetic Signal in Single-Cell RNA-Seq Data. J Comput Biol 2023; 30:518-537. [PMID: 36475926 PMCID: PMC10125402 DOI: 10.1089/cmb.2022.0357] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Phylogenetic methods are emerging as a useful tool to understand cancer evolutionary dynamics, including tumor structure, heterogeneity, and progression. Most currently used approaches utilize either bulk whole genome sequencing or single-cell DNA sequencing and are based on calling copy number alterations and single nucleotide variants (SNVs). Single-cell RNA sequencing (scRNA-seq) is commonly applied to explore differential gene expression of cancer cells throughout tumor progression. The method exacerbates the single-cell sequencing problem of low yield per cell with uneven expression levels. This accounts for low and uneven sequencing coverage and makes SNV detection and phylogenetic analysis challenging. In this article, we demonstrate for the first time that scRNA-seq data contain sufficient evolutionary signal and can also be utilized in phylogenetic analyses. We explore and compare results of such analyses based on both expression levels and SNVs called from scRNA-seq data. Both techniques are shown to be useful for reconstructing phylogenetic relationships between cells, reflecting the clonal composition of a tumor. Both standardized expression values and SNVs appear to be equally capable of reconstructing a similar pattern of phylogenetic relationship. This pattern is stable even when phylogenetic uncertainty is taken in account. Our results open up a new direction of somatic phylogenetics based on scRNA-seq data. Further research is required to refine and improve these approaches to capture the full picture of somatic evolutionary dynamics in cancer.
Collapse
Affiliation(s)
- Jiří C. Moravec
- Department of Computer Science, University of Otago, Dunedin, New Zealand
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Robert Lanfear
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
| | | | | | - Alex Gavryushkin
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| |
Collapse
|
7
|
Wakita S, Hara M, Kitabatake Y, Kawatani K, Kurahashi H, Hashizume R. Experimental method for haplotype phasing across the entire length of chromosome 21 in trisomy 21 cells using a chromosome elimination technique. J Hum Genet 2022; 67:565-572. [PMID: 35637312 PMCID: PMC9510051 DOI: 10.1038/s10038-022-01049-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Revised: 04/25/2022] [Accepted: 05/12/2022] [Indexed: 11/09/2022]
Abstract
Modern sequencing technologies produce a single consensus sequence without distinguishing between homologous chromosomes. Haplotype phasing solves this limitation by identifying alleles on the maternal and paternal chromosomes. This information is critical for understanding gene expression models in genetic disease research. Furthermore, the haplotype phasing of three homologous chromosomes in trisomy cells is more complicated than that in disomy cells. In this study, we attempted the accurate and complete haplotype phasing of chromosome 21 in trisomy 21 cells. To separate homologs, we established three corrected disomy cell lines (ΔPaternal chromosome, ΔMaternal chromosome 1, and ΔMaternal chromosome 2) from trisomy 21 induced pluripotent stem cells by eliminating one chromosome 21 utilizing the Cre-loxP system. These cells were then whole-genome sequenced by a next-generation sequencer. By simply comparing the base information of the whole-genome sequence data at the same position between each corrected disomy cell line, we determined the base on the eliminated chromosome and performed phasing. We phased 51,596 single nucleotide polymorphisms (SNPs) on chromosome 21, randomly selected seven SNPs spanning the entire length of the chromosome, and confirmed that there was no contradiction by direct sequencing.
Collapse
Affiliation(s)
- Sachiko Wakita
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan
| | - Mari Hara
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan
| | - Yasuji Kitabatake
- Department of Pediatrics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan
| | - Keiji Kawatani
- Department of Pediatrics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan.,Department of Neuroscience, Mayo Clinic, Scottsdale, AZ, USA
| | - Hiroki Kurahashi
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Toyoake, Japan
| | - Ryotaro Hashizume
- Department of Pathology and Matrix Biology, Mie University Graduate School of Medicine, Mie, Japan. .,Department of Genomic Medicine, Mie University Hospital, Mie, Japan.
| |
Collapse
|
8
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent and discuss methodological progress and perspectives in these areas.
Collapse
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
Collapse
|
9
|
Berger E, Yorukoglu D, Zhang L, Nyquist SK, Shalek AK, Kellis M, Numanagić I, Berger B. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets. Nat Commun 2020; 11:4662. [PMID: 32938926 PMCID: PMC7494856 DOI: 10.1038/s41467-020-18320-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Accepted: 08/07/2020] [Indexed: 01/04/2023] Open
Abstract
Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X's feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X's ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
Collapse
Affiliation(s)
- Emily Berger
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Mathematics, UC Berkeley, Berkeley, CA, 94720, USA
| | - Deniz Yorukoglu
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Lillian Zhang
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sarah K Nyquist
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Alex K Shalek
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Manolis Kellis
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Ibrahim Numanagić
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada.
| | - Bonnie Berger
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
| |
Collapse
|
10
|
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 679] [Impact Index Per Article: 135.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands-or even millions-of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Collapse
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
| | - Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
| | - Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
| | - Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
| | - Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | | | - Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
| | - Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
| | - Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
| | - Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
| | - Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
| | - Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
| | - Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
| | - John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
| | - Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
| | - Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
| | - Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
| |
Collapse
|
11
|
Yan Z, Zhu X, Wang Y, Nie Y, Guan S, Kuo Y, Chang D, Li R, Qiao J, Yan L. scHaplotyper: haplotype construction and visualization for genetic diagnosis using single cell DNA sequencing data. BMC Bioinformatics 2020; 21:41. [PMID: 32007105 PMCID: PMC6995221 DOI: 10.1186/s12859-020-3381-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Accepted: 01/22/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Haplotyping reveals chromosome blocks inherited from parents to in vitro fertilized (IVF) embryos in preimplantation genetic diagnosis (PGD), enabling the observation of the transmission of disease alleles between generations. However, the methods of haplotyping that are suitable for single cells are limited because a whole genome amplification (WGA) process is performed before sequencing or genotyping in PGD, and true haplotype profiles of embryos need to be constructed based on genotypes that can contain many WGA artifacts. RESULTS Here, we offer scHaplotyper as a genetic diagnosis tool that reconstructs and visualizes the haplotype profiles of single cells based on the Hidden Markov Model (HMM). scHaplotyper can trace the origin of each haplotype block in the embryo, enabling the detection of carrier status of disease alleles in each embryo. We applied this method in PGD in two families affected with genetic disorders, and the result was the healthy live births of two children in the two families, demonstrating the clinical application of this method. CONCLUSION Next generation sequencing (NGS) of preimplantation embryos enable genetic screening for families with genetic disorders, avoiding the birth of affected babies. With the validation and successful clinical application, we showed that scHaplotyper is a convenient and accurate method to screen out embryos. More patients with genetic disorder will benefit from the genetic diagnosis of embryos. The source code of scHaplotyper is available at GitHub repository: https://github.com/yzqheart/scHaplotyper.
Collapse
Affiliation(s)
- Zhiqiang Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China.,Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China.,Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Xiaohui Zhu
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Yuqian Wang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Yanli Nie
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Shuo Guan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Ying Kuo
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Di Chang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Rong Li
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
| | - Jie Qiao
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China.,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China.,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China.,Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China.,Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China.,Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
| | - Liying Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China. .,Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China. .,Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China.
| |
Collapse
|