1
Kalef-Ezra E, Turan ZG, Perez-Rodriguez D, Bomann I, Behera S, Morley C, Scholz SW, Jaunmuktane Z, Demeulemeester J, Sedlazeck FJ, Proukakis C. Single-cell somatic copy number variants in brain using different amplification methods and reference genomes. Commun Biol 2024; 7:1288. [PMID: 39384904 PMCID: PMC11464624 DOI: 10.1038/s42003-024-06940-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 09/23/2024] [Indexed: 10/11/2024] Open
Abstract
The presence of somatic mutations, including copy number variants (CNVs), in the brain is well recognized. Comprehensive study requires single-cell whole genome amplification, with several methods available, prior to sequencing. Here we compare PicoPLEX with two recent adaptations of multiple displacement amplification (MDA): primary template-directed amplification (PTA) and droplet MDA, across 93 human brain cortical nuclei. We demonstrate different properties for each, with PTA providing the broadest amplification, PicoPLEX the most even, and distinct chimeric profiles. Furthermore, we perform CNV calling on two brains with multiple system atrophy and one control brain using different reference genomes. We find that 20.6% of brain cells have at least one Mb-scale CNV, with some supported by bulk sequencing or single-cells from other brain regions. Our study highlights the importance of selecting whole genome amplification method and reference genome for CNV calling, while supporting the existence of somatic CNVs in healthy and diseased human brain.
Affiliation(s)
- Ester Kalef-Ezra
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Zeliha Gozde Turan
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Diego Perez-Rodriguez
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Ida Bomann
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Sairam Behera
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Caoimhe Morley
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Sonja W Scholz
  - Neurodegenerative Diseases Research Section, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
  - Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Zane Jaunmuktane
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
  - Queen Square Brain Bank for Neurological disorders, UCL Queen Square Institute of Neurology, London, UK
- Jonas Demeulemeester
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
  - Department of Oncology, KU Leuven, Leuven, Belgium
  - Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
  - VIB Center for Cancer Biology, Leuven, Belgium
- Fritz J Sedlazeck
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
- Christos Proukakis
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
2
Buckley DN, Lewinger JP, Gooden G, Spillman M, Neuman M, Guo XM, Tew BY, Miller H, Khetan VU, Shulman LP, Roman L, Salhia B. OvaPrint-A Cell-free DNA Methylation Liquid Biopsy for the Risk Assessment of High-grade Serous Ovarian Cancer. Clin Cancer Res 2023; 29:5196-5206. [PMID: 37812492 PMCID: PMC10722131 DOI: 10.1158/1078-0432.ccr-23-1197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 08/08/2023] [Accepted: 10/05/2023] [Indexed: 10/10/2023]
Abstract
PURPOSE: High-grade serous ovarian carcinoma (HGSOC) is the most lethal epithelial ovarian cancer (EOC) and is often diagnosed at late stage. In women with a known pelvic mass, surgery followed by pathologic assessment is the most reliable way to diagnose EOC and there are still no effective screening tools in asymptomatic women. In the current study, we developed a cell-free DNA (cfDNA) methylation liquid biopsy for the risk assessment of early-stage HGSOC.
EXPERIMENTAL DESIGN: We performed reduced representation bisulfite sequencing to identify differentially methylated regions (DMR) between HGSOC and normal ovarian and fallopian tube tissue. Next, we performed hybridization probe capture for 1,677 DMRs and constructed a classifier (OvaPrint) on an independent set of cfDNA samples to discriminate HGSOC from benign masses. We also analyzed a series of non-HGSOC EOC, including low-grade and borderline samples to assess the generalizability of OvaPrint. A total of 372 samples (tissue n = 59, plasma n = 313) were analyzed in this study.
RESULTS: OvaPrint achieved a positive predictive value of 95% and a negative predictive value of 88% for discriminating HGSOC from benign masses, surpassing other commercial tests. OvaPrint was less sensitive for non-HGSOC EOC, albeit it may have potential utility for identifying low-grade and borderline tumors with higher malignant potential.
CONCLUSIONS: OvaPrint is a highly sensitive and specific test that can be used for the risk assessment of HGSOC in symptomatic women. Prospective studies are warranted to validate OvaPrint for HGSOC and further develop it for non-HGSOC EOC histotypes in both symptomatic and asymptomatic women with adnexal masses.
Affiliation(s)
- David N. Buckley
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
- Juan Pablo Lewinger
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California
- Gerald Gooden
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
- Monique Spillman
  - Division of Gynecologic Oncology, Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas
- Monica Neuman
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Keck School of Medicine of University of Southern California, Los Angeles, California
- X. Mona Guo
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Keck School of Medicine of University of Southern California, Los Angeles, California
- Ben Yi Tew
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
- Heather Miller
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Keck School of Medicine of University of Southern California, Los Angeles, California
- Varun U. Khetan
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Keck School of Medicine of University of Southern California, Los Angeles, California
- Lee P. Shulman
  - Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Lynda Roman
  - Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Keck School of Medicine of University of Southern California, Los Angeles, California
  - USC Norris Comprehensive Cancer Center, Los Angeles, California
- Bodour Salhia
  - Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California
  - USC Norris Comprehensive Cancer Center, Los Angeles, California
3
Kalef-Ezra E, Turan ZG, Perez-Rodriguez D, Bomann I, Behera S, Morley C, Scholz SW, Jaunmuktane Z, Demeulemeester J, Sedlazeck FJ, Proukakis C. Single-cell somatic copy number variants in brain using different amplification methods and reference genomes. bioRxiv 2023:2023.08.07.552289. [PMID: 37609320 PMCID: PMC10441336 DOI: 10.1101/2023.08.07.552289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
The presence of somatic mutations, including copy number variants (CNVs), in the brain is well recognized. Comprehensive study requires single-cell whole genome amplification, with several methods available, prior to sequencing. We compared PicoPLEX with two recent adaptations of multiple displacement amplification (MDA): primary template-directed amplification (PTA) and droplet MDA, across 93 human brain cortical nuclei. We demonstrated different properties for each, with PTA providing the broadest amplification, PicoPLEX the most even, and distinct chimeric profiles. Furthermore, we performed CNV calling on two brains with multiple system atrophy and one control brain using different reference genomes. We found that 38% of brain cells have at least one Mb-scale CNV, with some supported by bulk sequencing or single-cells from other brain regions. Our study highlights the importance of selecting whole genome amplification method and reference genome for CNV calling, while supporting the existence of somatic CNVs in healthy and diseased human brain.
Affiliation(s)
- Ester Kalef-Ezra
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- Zeliha Gozde Turan
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- Diego Perez-Rodriguez
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Ida Bomann
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Sairam Behera
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
- Caoimhe Morley
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- Sonja W. Scholz
  - Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
  - Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
- Zane Jaunmuktane
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Queen Square Brain Bank for Neurological disorders, UCL Queen Square Institute of Neurology, London, UK
- Jonas Demeulemeester
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Department of Oncology, KU Leuven, Leuven, Belgium
  - Cancer Genomics Laboratory, The Francis Crick Institute, London, UK
  - VIB Center for Cancer Biology, Leuven, Belgium
- Fritz J Sedlazeck
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
- Christos Proukakis
  - Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
  - Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
4
Ferrari G, Esselens L, Hart ML, Janssens S, Kidner C, Mascarello M, Peñalba JV, Pezzini F, von Rintelen T, Sonet G, Vangestel C, Virgilio M, Hollingsworth PM. Developing the Protocol Infrastructure for DNA Sequencing Natural History Collections. Biodivers Data J 2023; 11:e102317. [PMID: 38327316 PMCID: PMC10848826 DOI: 10.3897/bdj.11.e102317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 08/04/2023] [Indexed: 02/09/2024] Open
Abstract
Intentionally preserved biological material in natural history collections represents a vast repository of biodiversity. Advances in laboratory and sequencing technologies have made these specimens increasingly accessible for genomic analyses, offering a window into the genetic past of species and often permitting access to information that can no longer be sampled in the wild. Due to their age, preparation and storage conditions, DNA retrieved from museum and herbarium specimens is often poor in yield, heavily fragmented and biochemically modified. This not only poses methodological challenges in recovering nucleotide sequences, but also makes such investigations susceptible to environmental and laboratory contamination. In this paper, we review the practical challenges associated with making the recovery of DNA sequence data from museum collections more routine. We first review key operational principles and issues to address, to guide the decision-making process and dialogue between researchers and curators about when and how to sample museum specimens for genomic analyses. We then outline the range of steps that can be taken to reduce the likelihood of contamination including laboratory set-ups, workflows and working practices. We finish by presenting a series of case studies, each focusing on protocol practicalities for the application of different mainstream methodologies to museum specimens including: (i) shotgun sequencing of insect mitogenomes, (ii) whole genome sequencing of insects, (iii) genome skimming to recover plant plastid genomes from herbarium specimens, (iv) target capture of multi-locus nuclear sequences from herbarium specimens, (v) RAD-sequencing of bird specimens and (vi) shotgun sequencing of ancient bovid bone samples.
Affiliation(s)
- Giada Ferrari
  - Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Lore Esselens
  - Royal Museum for Central Africa, Tervuren, Belgium
  - Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Michelle L Hart
  - Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Steven Janssens
  - Meise Botanic Garden, Meise, Belgium
  - Leuven Plant Institute, Department of Biology, Leuven, Belgium
- Catherine Kidner
  - Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Joshua V Peñalba
  - Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
- Flávia Pezzini
  - Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Thomas von Rintelen
  - Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
- Gontran Sonet
  - Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Carl Vangestel
  - Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Massimiliano Virgilio
  - Royal Museum for Central Africa, Department of African Zoology, Tervuren, Belgium
- Peter M Hollingsworth
  - Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
5
Massimino M, Martorana F, Stella S, Vitale SR, Tomarchio C, Manzella L, Vigneri P. Single-Cell Analysis in the Omics Era: Technologies and Applications in Cancer. Genes (Basel) 2023; 14:1330. [PMID: 37510235 PMCID: PMC10380065 DOI: 10.3390/genes14071330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023] Open
Abstract
Cancer molecular profiling obtained with conventional bulk sequencing describes average alterations obtained from the entire cellular population analyzed. In the era of precision medicine, this approach is unable to track tumor heterogeneity and cannot be exploited to unravel the biological processes behind clonal evolution. In the last few years, functional single-cell omics has improved our understanding of cancer heterogeneity. This approach requires isolation and identification of single cells starting from an entire population. A cell suspension obtained by tumor tissue dissociation or hematological material can be manipulated using different techniques to separate individual cells, employed for single-cell downstream analysis. Single-cell data can then be used to analyze cell-cell diversity, thus mapping evolving cancer biological processes. Despite its unquestionable advantages, single-cell analysis produces massive amounts of data with several potential biases, stemming from cell manipulation and pre-amplification steps. To overcome these limitations, several bioinformatic approaches have been developed and explored. In this work, we provide an overview of this entire process while discussing the most recent advances in the field of functional omics at single-cell resolution.
Affiliation(s)
- Michele Massimino
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Federica Martorana
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Stefania Stella
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Silvia Rita Vitale
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Cristina Tomarchio
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Livia Manzella
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Paolo Vigneri
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
  - Humanitas Istituto Clinico Catanese, University Oncology Department, 95045 Catania, Italy
6
Zhai Y, Bardel C, Vallée M, Iwaz J, Roy P. Performance comparisons between clustering models for reconstructing NGS results from technical replicates. Front Genet 2023; 14:1148147. [PMID: 37007945 PMCID: PMC10060969 DOI: 10.3389/fgene.2023.1148147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023] Open
Abstract
To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may be thus recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
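The four performance indicators named in this abstract can be illustrated with a small sketch. The abstract does not define how its consensus model combines replicates, so the majority vote below (`min_support=2` of three replicates) is an assumption, and the variant labels, the toy truth set, and the function names `consensus` and `metrics` are hypothetical placeholders rather than anything taken from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def consensus(callsets: Iterable[Set[str]], min_support: int = 2) -> Set[str]:
    """Keep variants reported by at least `min_support` replicates.

    Majority voting is an assumed stand-in for a consensus model;
    it is not the procedure described in the paper.
    """
    counts = Counter(v for calls in callsets for v in calls)
    return {v for v, n in counts.items() if n >= min_support}


def metrics(called: Set[str], truth: Set[str], universe: Set[str]) -> Dict[str, float]:
    """Sensitivity, precision, accuracy and F1 against a truth set.

    `universe` is the set of all candidate sites; it is needed so that
    true negatives (and therefore accuracy) are defined.
    """
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(universe - called - truth)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision,
            "accuracy": accuracy, "f1": f1}


# Toy data (hypothetical): eight candidate sites, four of them true variants.
universe = set("abcdefgh")
truth = set("abcd")
replicates = [set("abce"), set("abde"), set("acdf")]

single = metrics(replicates[0], truth, universe)
combined = metrics(consensus(replicates), truth, universe)
print("replicate 1:", single)
print("consensus  :", combined)
```

On this toy input the vote recovers a true variant that the first replicate missed, but it keeps a false positive shared by two replicates, which is one reason a simple consensus may improve precision only modestly.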
Affiliation(s)
- Yue Zhai
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - *Correspondence: Yue Zhai
- Claire Bardel
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
  - Service de Génétique, Hospices Civils de Lyon, Bron, France
- Maxime Vallée
  - Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France
- Jean Iwaz
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
- Pascal Roy
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
7
Begg TJA, Schmidt A, Kocher A, Larmuseau MHD, Runfeldt G, Maier PA, Wilson JD, Barquera R, Maj C, Szolek A, Sager M, Clayton S, Peltzer A, Hui R, Ronge J, Reiter E, Freund C, Burri M, Aron F, Tiliakou A, Osborn J, Behar DM, Boecker M, Brandt G, Cleynen I, Strassburg C, Prüfer K, Kühnert D, Meredith WR, Nöthen MM, Attenborough RD, Kivisild T, Krause J. Genomic analyses of hair from Ludwig van Beethoven. Curr Biol 2023; 33:1431-1447.e22. [PMID: 36958333 DOI: 10.1016/j.cub.2023.02.041] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 05/29/2022] [Revised: 10/11/2022] [Accepted: 02/13/2023] [Indexed: 03/25/2023]
Abstract
Ludwig van Beethoven (1770-1827) remains among the most influential and popular classical music composers. Health problems significantly impacted his career as a composer and pianist, including progressive hearing loss, recurring gastrointestinal complaints, and liver disease. In 1802, Beethoven requested that following his death, his disease be described and made public. Medical biographers have since proposed numerous hypotheses, including many substantially heritable conditions. Here we attempt a genomic analysis of Beethoven in order to elucidate potential underlying genetic and infectious causes of his illnesses. We incorporated improvements in ancient DNA methods into existing protocols for ancient hair samples, enabling the sequencing of high-coverage genomes from small quantities of historical hair. We analyzed eight independently sourced locks of hair attributed to Beethoven, five of which originated from a single European male. We deemed these matching samples to be almost certainly authentic and sequenced Beethoven's genome to 24-fold genomic coverage. Although we could not identify a genetic explanation for Beethoven's hearing disorder or gastrointestinal problems, we found that Beethoven had a genetic predisposition for liver disease. Metagenomic analyses revealed furthermore that Beethoven had a hepatitis B infection during at least the months prior to his death. Together with the genetic predisposition and his broadly accepted alcohol consumption, these present plausible explanations for Beethoven's severe liver disease, which culminated in his death. Unexpectedly, an analysis of Y chromosomes sequenced from five living members of the Van Beethoven patrilineage revealed the occurrence of an extra-pair paternity event in Ludwig van Beethoven's patrilineal ancestry.
Affiliation(s)
- Tristan James Alexander Begg
- Department of Archaeology, University of Cambridge, CB2 3ER Cambridge, UK; Institute for Archaeological Sciences, University of Tübingen, 72070 Tübingen, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany.
| | - Axel Schmidt
- Institute of Human Genetics, University Hospital of Bonn, Bonn 53127, Germany
| | - Arthur Kocher
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany; Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, 07745 Jena, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Maarten H D Larmuseau
- Department of Human Genetics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium; Laboratory of Human Genetic Genealogy, Department of Human Genetics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium; ARCHES - Antwerp Cultural Heritage Sciences, Faculty of Design Sciences, University of Antwerp, 2000 Antwerp, Belgium; Histories vzw, 9000 Gent, Belgium
| | | | | | - John D Wilson
- Austrian Academy of Sciences, 1030 Vienna, Austria; University of Vienna, 1010 Vienna, Austria
| | - Rodrigo Barquera
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany
| | - Carlo Maj
- Institute of Human Genetics, University Hospital of Bonn, Bonn 53127, Germany; Center for Human Genetics, University Hospital of Marburg, Marburg, Germany
| | - András Szolek
- Applied Bioinformatics, Department for Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany; Department of Immunology, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
| | | | - Stephen Clayton
- Institute for Archaeological Sciences, University of Tübingen, 72070 Tübingen, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Alexander Peltzer
- Quantitative Biology Center (QBiC) University of Tübingen, Tübingen, Germany
| | - Ruoyun Hui
- MacDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; Alan Turing Institute, 2QR, John Dodson House, London NW1 2DB, UK
| | | | - Ella Reiter
- Institute for Archaeological Sciences, University of Tübingen, 72070 Tübingen, Germany
| | - Cäcilia Freund
- Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Marta Burri
- Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Franziska Aron
- Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Anthi Tiliakou
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Joanna Osborn
- Department of Archaeology, University of Cambridge, CB2 3ER Cambridge, UK
| | - Doron M Behar
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | | | - Guido Brandt
- Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - Isabelle Cleynen
- Department of Human Genetics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium
| | - Christian Strassburg
- Department of Internal Medicine I, University Hospital Bonn, 53127 Bonn, Germany
| | - Kay Prüfer
- Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany
| | - Denise Kühnert
- Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, 07745 Jena, Germany; European Virus Bioinformatics Center (EVBC), Jena, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany
| | - William Rhea Meredith
- American Beethoven Society, San Jose State University, San Jose, CA 95192, USA; Ira F. Brilliant Center for Beethoven Studies, San Jose State University, San Jose, CA 95192, USA; School of Music and Dance, San Jose State University, San Jose, CA 95192, USA
| | - Markus M Nöthen
- Institute of Human Genetics, University Hospital of Bonn, Bonn 53127, Germany
| | - Robert David Attenborough
- MacDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; School of Archaeology & Anthropology, Australian National University, Canberra, ACT 0200, Australia
| | - Toomas Kivisild
- Department of Archaeology, University of Cambridge, CB2 3ER Cambridge, UK; Department of Human Genetics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium; Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia.
| | - Johannes Krause
- Institute for Archaeological Sciences, University of Tübingen, 72070 Tübingen, Germany; Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany; Max Planck Institute for the Science of Human History, Kahlaische Str. 10, 07745 Jena, Germany.
|
8
|
de Flamingh A, Ishida Y, Pečnerová P, Vilchis S, Siegismund HR, van Aarde RJ, Malhi RS, Roca AL. Combining methods for non-invasive fecal DNA enables whole genome and metagenomic analyses in wildlife biology. Front Genet 2023; 13:1021004. [PMID: 36712847 PMCID: PMC9876978 DOI: 10.3389/fgene.2022.1021004] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/16/2022] [Accepted: 12/05/2022] [Indexed: 01/13/2023] Open
Abstract
Non-invasive biological samples benefit studies that investigate rare, elusive, endangered, or dangerous species. Integrating genomic techniques that use non-invasive biological sampling with advances in computational approaches can benefit and inform wildlife conservation and management. Here, we used non-invasive fecal DNA samples to generate low- to medium-coverage genomes (e.g., >90% of the complete nuclear genome at six X-fold coverage) and metagenomic sequences, combining widely available and accessible DNA collection cards with commonly used DNA extraction and library building approaches. DNA preservation cards are easy to transport and can be stored non-refrigerated, avoiding cumbersome or costly sample methods. The genomic library construction and shotgun sequencing approach did not require enrichment or targeted DNA amplification. The utility and potential of the data generated was demonstrated through genome scale and metagenomic analyses of zoo and free-ranging African savanna elephants (Loxodonta africana). Fecal samples collected from free-ranging individuals contained an average of 12.41% (5.54-21.65%) endogenous elephant DNA. Clustering of these elephants with others from the same geographic region was demonstrated by a principal component analysis of genetic variation using nuclear genome-wide SNPs. Metagenomic analyses identified taxa that included Loxodonta, green plants, fungi, arthropods, bacteria, viruses and archaea, showcasing the utility of this approach for addressing complementary questions based on host-associated DNA, e.g., pathogen and parasite identification. The molecular and bioinformatic analyses presented here contribute towards the expansion and application of genomic techniques to conservation science and practice.
Affiliation(s)
- Alida de Flamingh
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Yasuko Ishida
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Patrícia Pečnerová
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Sahara Vilchis
- Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Hans R. Siegismund
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rudi J. van Aarde
- Department of Zoology and Entomology, Conservation Ecology Research Unit, University of Pretoria, Pretoria, South Africa
| | - Ripan S. Malhi
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Department of Anthropology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Alfred L. Roca
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
|
9
|
Hooker JC, Nissan N, Luckert D, Charette M, Zapata G, Lefebvre F, Mohr RM, Daba KA, Warkentin TD, Hadinezhad M, Barlow B, Hou A, Golshani A, Cober ER, Samanfar B. A Multi-Year, Multi-Cultivar Approach to Differential Expression Analysis of High- and Low-Protein Soybean (Glycine max). Int J Mol Sci 2022; 24:ijms24010222. [PMID: 36613666 PMCID: PMC9820483 DOI: 10.3390/ijms24010222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 12/12/2022] [Accepted: 12/15/2022] [Indexed: 12/25/2022] Open
Abstract
Soybean (Glycine max (L.) Merr.) is among the most valuable crops based on its nutritious seed protein and oil. Protein quality, evaluated as the ratio of glycinin (11S) to β-conglycinin (7S), can play a role in food and feed quality. To help uncover the underlying differences between high and low protein soybean varieties, we performed differential expression analysis on high and low total protein soybean varieties and high and low 11S soybean varieties grown in four locations across Eastern and Western Canada over three years (2018-2020). Simultaneously, ten individual differential expression datasets for high vs. low total protein soybeans and ten individual differential expression datasets for high vs. low 11S soybeans were assessed, for a total of 20 datasets. The top 15 most upregulated and the 15 most downregulated genes were extracted from each differential expression dataset and cross-examination was conducted to create shortlists of the most consistently differentially expressed genes. Shortlisted genes were assessed for gene ontology to gain a global appreciation of the commonly differentially expressed genes. Genes with roles in the lipid metabolic pathway and carbohydrate metabolic pathway were differentially expressed in high total protein and high 11S soybeans in comparison to their low total protein and low 11S counterparts. Expression differences were consistent between East and West locations with the exception of one, Glyma.03G054100. These data are important for uncovering the genes and biological pathways responsible for the difference in seed protein between high and low total protein or 11S cultivars.
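The shortlisting step described in this abstract (extract the most up- and downregulated genes from each dataset, then cross-examine the lists) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the `min_datasets` threshold, and the toy log2 fold-change values are all assumptions made for the example.

```python
from collections import Counter

def top_genes(log2fc, n=15):
    """Return (most upregulated, most downregulated) gene sets for one dataset.

    log2fc maps gene ID -> log2 fold change (high vs. low protein).
    If a dataset has fewer than 2*n genes, the two sets can overlap.
    """
    ranked = sorted(log2fc, key=log2fc.get, reverse=True)
    return set(ranked[:n]), set(ranked[-n:])

def shortlist(datasets, n=15, min_datasets=2):
    """Keep genes that reach the top-n list in at least min_datasets datasets."""
    up_counts, down_counts = Counter(), Counter()
    for log2fc in datasets:
        up, down = top_genes(log2fc, n)
        up_counts.update(up)
        down_counts.update(down)
    up_short = sorted(g for g, c in up_counts.items() if c >= min_datasets)
    down_short = sorted(g for g, c in down_counts.items() if c >= min_datasets)
    return up_short, down_short

# Toy data (hypothetical gene names and values), with n=1 to keep it small.
toy_datasets = [
    {"geneA": 2.0, "geneB": 0.1, "geneC": -1.5},
    {"geneA": 1.2, "geneB": -0.3, "geneC": -2.0, "geneD": 0.5},
    {"geneD": 3.0, "geneA": 0.2, "geneC": -0.9},
]
up_shortlist, down_shortlist = shortlist(toy_datasets, n=1, min_datasets=2)
print(up_shortlist, down_shortlist)
```

Counting how many datasets a gene appears in is only one way to express "most consistently differentially expressed"; the study may have used a different criterion.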
Affiliation(s)
- Julia C. Hooker
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
- Department of Biology, Ottawa Institute of Systems Biology, Carleton University, 1125 Colonel By Dr., Ottawa, ON K1S 5B6, Canada
| | - Nour Nissan
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
- Department of Biology, Ottawa Institute of Systems Biology, Carleton University, 1125 Colonel By Dr., Ottawa, ON K1S 5B6, Canada
| | - Doris Luckert
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
| | - Martin Charette
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
| | - Gerardo Zapata
- Canadian Centre for Computational Genomics, 740 Dr. Penfield Ave, Montréal, QC H3A 0G1, Canada
| | - François Lefebvre
- Canadian Centre for Computational Genomics, 740 Dr. Penfield Ave, Montréal, QC H3A 0G1, Canada
| | - Ramona M. Mohr
- Agriculture and Agri-Food Canada, 2701 Grand Valley Road, Brandon, MB R7A 5Y3, Canada
| | - Ketema A. Daba
- Crop Development Centre, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
| | - Thomas D. Warkentin
- Crop Development Centre, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
| | - Mehri Hadinezhad
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
| | - Brent Barlow
- Crop Development Centre, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
| | - Anfu Hou
- Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada
| | - Ashkan Golshani
- Department of Biology, Ottawa Institute of Systems Biology, Carleton University, 1125 Colonel By Dr., Ottawa, ON K1S 5B6, Canada
| | - Elroy R. Cober
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
| | - Bahram Samanfar
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K1A 0C6, Canada
- Department of Biology, Ottawa Institute of Systems Biology, Carleton University, 1125 Colonel By Dr., Ottawa, ON K1S 5B6, Canada
|
10
|
Mo Y, Jiao Y. Advances and applications of single-cell omics technologies in plant research. Plant J 2022; 110:1551-1563. [PMID: 35426954 DOI: 10.1111/tpj.15772] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 01/26/2022] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 06/14/2023]
Abstract
Single-cell sequencing approaches reveal the intracellular dynamics of individual cells and answer biological questions with high-dimensional catalogs of millions of cells, including genomics, transcriptomics, chromatin accessibility, epigenomics, and proteomics data across species. These emerging yet thriving technologies have been fully embraced by the field of plant biology, with a constantly expanding portfolio of applications. Here, we introduce the current technical advances used for single-cell omics, especially single-cell genome and transcriptome sequencing. Firstly, we overview methods for protoplast and nucleus isolation and genome and transcriptome amplification. Subsequently, we use well-executed benchmarking studies to highlight advances made through the application of single-cell omics techniques. Looking forward, we offer a glimpse of additional hurdles and future opportunities that will introduce broad adoption of single-cell sequencing with revolutionary perspectives in plant biology.
Affiliation(s)
- Yajin Mo
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing, 100871, China
- School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Yuling Jiao
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing, 100871, China
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, 100101, China
|
11
|
Niu YN, Roberts EG, Denisko D, Hoffman MM. Assessing and assuring interoperability of a genomics file format. Bioinformatics 2022; 38:3327-3336. [PMID: 35575355 PMCID: PMC9237710 DOI: 10.1093/bioinformatics/btac327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2022] [Revised: 03/30/2022] [Accepted: 05/11/2022] [Indexed: 12/01/2022] Open
Abstract
Motivation: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. Results: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. Availability and implementation: Acidbio is available at https://github.com/hoffmangroup/acidbio. Supplementary information: Supplementary data are available at Bioinformatics online.
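To make the idea of edge-case testing concrete, the sketch below checks the first three columns of a BED line against a few simple rules. It is not Acidbio and does not reproduce its test suite; the function name `check_bed3_line` is hypothetical, and the rules applied (tab-separated fields, non-negative integer coordinates, chromStart not greater than chromEnd) are assumptions chosen for illustration.

```python
def check_bed3_line(line):
    """Return a list of problems found in one BED line (empty list = passes).

    Only the first three columns (chrom, chromStart, chromEnd) are checked.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated fields"]
    chrom, start, end = fields[:3]
    problems = []
    if not chrom:
        problems.append("empty chrom")
    # str.isdigit() rejects signs and decimals, so "-1" and "1.5" both fail.
    start_ok = start.isascii() and start.isdigit()
    end_ok = end.isascii() and end.isdigit()
    if not start_ok:
        problems.append("chromStart is not a non-negative integer")
    if not end_ok:
        problems.append("chromEnd is not a non-negative integer")
    if start_ok and end_ok and int(start) > int(end):
        problems.append("chromStart > chromEnd")
    return problems

# A few hand-written edge cases, in the spirit of a format test suite.
edge_cases = ["chr1\t0\t100", "chr1\t100\t50", "chr1\t-1\t50", "chr1 0 100"]
for case in edge_cases:
    print(repr(case), check_bed3_line(case))
```

A tool under test would be expected to accept the first line and reject, or at least warn about, the others; comparing that behavior across many tools is the kind of interoperability check the abstract describes.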
Affiliation(s)
- Yi Nian Niu
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada
| | - Eric G Roberts
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada
| | - Danielle Denisko
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
| | - Michael M Hoffman
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada; Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada; Vector Institute, Toronto, ON, M5G 1M1, Canada
|
12
|
Suchan T, Chauvey L, Poullet M, Tonasso‐Calvière L, Schiavinato S, Clavel P, Clavel B, Lepetz S, Seguin‐Orlando A, Orlando L. Assessing the impact of USER‐treatment on hyRAD capture applied to ancient DNA. Mol Ecol Resour 2022; 22:2262-2274. [DOI: 10.1111/1755-0998.13619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 03/09/2022] [Accepted: 03/23/2022] [Indexed: 12/01/2022]
Affiliation(s)
- Tomasz Suchan
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
- W. Szafer Institute of Botany Polish Academy of Sciences Lubicz 46 31‐512 Kraków Poland
| | - Lorelei Chauvey
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Marine Poullet
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Laure Tonasso‐Calvière
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Stéphanie Schiavinato
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Pierre Clavel
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Benoit Clavel
- Archéozoologie, Archéobotanique: sociétés, pratiques et environnements (AASPE) Muséum National d’Histoire Naturelle CNRS CP 55 rue Buffon Paris France
| | - Sébastien Lepetz
- Archéozoologie, Archéobotanique: sociétés, pratiques et environnements (AASPE) Muséum National d’Histoire Naturelle CNRS CP 55 rue Buffon Paris France
| | - Andaine Seguin‐Orlando
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
| | - Ludovic Orlando
- Centre d’Anthropobiologie et de Génomique de Toulouse (CAGT) Université Paul Sabatier Faculté de Santé 37 allées Jules Guesde, Bâtiment A 31000 Toulouse France
|
13
|
Hanlon VC, Chan DD, Hamadeh Z, Wang Y, Mattsson CA, Spierings DC, Coope RJ, Lansdorp PM. Construction of Strand-seq libraries in open nanoliter arrays. Cell Rep Methods 2022; 2:100150. [PMID: 35474869 PMCID: PMC9017222 DOI: 10.1016/j.crmeth.2021.100150] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/22/2021] [Revised: 10/22/2021] [Accepted: 12/17/2021] [Indexed: 12/22/2022]
Abstract
Single-cell Strand-seq generates directional genomic information to study DNA repair, assemble genomes, and map structural variation onto chromosome-length haplotypes. We report a nanoliter-volume, one-pot (OP) Strand-seq library preparation protocol in which reagents are added cumulatively, DNA purification steps are avoided, and enzymes are inactivated with a thermolabile protease. OP-Strand-seq libraries capture 10%-25% of the genome from a single-cell with reduced costs and increased throughput.
Affiliation(s)
| | - Daniel D. Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Zeid Hamadeh
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | | | - Diana C.J. Spierings
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, the Netherlands
| | - Robin J.N. Coope
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Peter M. Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, the Netherlands
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
|
14
|
Single-cell delineation of lineage and genetic identity in the mouse brain. Nature 2022; 601:404-409. [PMID: 34912118 PMCID: PMC8770128 DOI: 10.1038/s41586-021-04237-0] [Citation(s) in RCA: 96] [Impact Index Per Article: 32.0] [Received: 01/15/2021] [Accepted: 11/12/2021] [Indexed: 01/25/2023]
Abstract
During neurogenesis, mitotic progenitor cells lining the ventricles of the embryonic mouse brain undergo their final rounds of cell division, giving rise to a wide spectrum of postmitotic neurons and glia [1,2]. The link between developmental lineage and cell-type diversity remains an open question. Here we used massively parallel tagging of progenitors to track clonal relationships and transcriptomic signatures during mouse forebrain development. We quantified clonal divergence and convergence across all major cell classes postnatally, and found diverse types of GABAergic neuron that share a common lineage. Divergence of GABAergic clones occurred during embryogenesis upon cell-cycle exit, suggesting that differentiation into subtypes is initiated as a lineage-dependent process at the progenitor cell level.
|
15
|
Quinodoz SA, Bhat P, Chovanec P, Jachowicz JW, Ollikainen N, Detmar E, Soehalim E, Guttman M. SPRITE: a genome-wide method for mapping higher-order 3D interactions in the nucleus using combinatorial split-and-pool barcoding. Nat Protoc 2022; 17:36-75. [PMID: 35013617 DOI: 10.1038/s41596-021-00633-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/09/2019] [Accepted: 09/13/2021] [Indexed: 12/23/2022]
Abstract
A fundamental question in gene regulation is how cell-type-specific gene expression is influenced by the packaging of DNA within the nucleus of each cell. We recently developed Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which enables mapping of higher-order interactions within the nucleus. SPRITE works by cross-linking interacting DNA, RNA and protein molecules and then mapping DNA-DNA spatial arrangements through an iterative split-and-pool barcoding method. All DNA molecules within a cross-linked complex are barcoded by repeatedly splitting complexes across a 96-well plate, ligating molecules with a unique tag sequence, and pooling all complexes into a single well before repeating the tagging. Because all molecules in a cross-linked complex are covalently attached, they will sort together throughout each round of split-and-pool and will obtain the same series of SPRITE tags, which we refer to as a barcode. The DNA fragments and their associated barcodes are sequenced, and all reads sharing identical barcodes are matched to reconstruct interactions. SPRITE accurately maps pairwise DNA interactions within the nucleus and measures higher-order spatial contacts occurring among up to thousands of simultaneously interacting molecules. Here, we provide a detailed protocol for the experimental steps of SPRITE, including a video ( https://youtu.be/6SdWkBxQGlg ). Furthermore, we provide an automated computational pipeline available on GitHub that allows experimenters to seamlessly generate SPRITE interaction matrices starting with raw fastq files. The protocol takes ~5 d from cell cross-linking to high-throughput sequencing for the experimental steps and 1 d for data processing.
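The matching step described in this abstract, in which all reads sharing an identical barcode are grouped to reconstruct an interaction, can be sketched as below. This is an illustrative toy and not the authors' GitHub pipeline: the read tuple layout, the tag names, the 1 Mb bin size, and both function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_clusters(reads):
    """Group aligned reads by barcode (the ordered series of split-pool tags).

    reads: iterable of (barcode, chrom, pos), where barcode is a sequence of tags.
    Returns a dict mapping barcode -> set of (chrom, pos) loci in that complex.
    """
    clusters = defaultdict(set)
    for barcode, chrom, pos in reads:
        clusters[tuple(barcode)].add((chrom, pos))
    return clusters

def pairwise_contacts(clusters, bin_size=1_000_000):
    """Count, for each pair of genomic bins, how many clusters contain both."""
    contacts = defaultdict(int)
    for loci in clusters.values():
        bins = sorted({(chrom, pos // bin_size) for chrom, pos in loci})
        for a, b in combinations(bins, 2):
            contacts[(a, b)] += 1
    return dict(contacts)

# Hypothetical reads: two cross-linked complexes, three tagging rounds each.
toy_reads = [
    (("A1", "B7", "C3"), "chr1", 1_200_000),
    (("A1", "B7", "C3"), "chr1", 5_400_000),
    (("A1", "B7", "C3"), "chr2", 300_000),
    (("A4", "B2", "C9"), "chr1", 1_900_000),
    (("A4", "B2", "C9"), "chr1", 5_100_000),
]
toy_clusters = build_clusters(toy_reads)
toy_contacts = pairwise_contacts(toy_clusters)
print(len(toy_clusters), toy_contacts)
```

Because each cluster keeps every locus that carried the same barcode, a cluster with three or more bins records a higher-order contact directly; the pairwise counts are just one projection of that information.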
Affiliation(s)
- Sofia A Quinodoz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
| | - Prashant Bhat
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Peter Chovanec
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Joanna W Jachowicz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Noah Ollikainen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Elizabeth Detmar
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Elizabeth Soehalim
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
|
16
|
Individual human cortical progenitors can produce excitatory and inhibitory neurons. Nature 2022; 601:397-403. [PMID: 34912114 PMCID: PMC8994470 DOI: 10.1038/s41586-021-04230-7] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 12/15/2020] [Accepted: 11/10/2021] [Indexed: 01/19/2023]
Abstract
The cerebral cortex is a cellularly complex structure comprising a rich diversity of neuronal and glial cell types. Cortical neurons can be broadly categorized into two classes: excitatory neurons that use the neurotransmitter glutamate, and inhibitory interneurons that use γ-aminobutyric acid (GABA). Previous developmental studies in rodents have led to a prevailing model in which excitatory neurons are born from progenitors located in the cortex, whereas cortical interneurons are born from a separate population of progenitors located outside the developing cortex in the ganglionic eminences [1-5]. However, the developmental potential of human cortical progenitors has not been thoroughly explored. Here we show that, in addition to excitatory neurons and glia, human cortical progenitors are also capable of producing GABAergic neurons with the transcriptional characteristics and morphologies of cortical interneurons. By developing a cellular barcoding tool called 'single-cell-RNA-sequencing-compatible tracer for identifying clonal relationships' (STICR), we were able to carry out clonal lineage tracing of 1,912 primary human cortical progenitors from six specimens, and to capture both the transcriptional identities and the clonal relationships of their progeny. A subpopulation of cortically born GABAergic neurons was transcriptionally similar to cortical interneurons born from the caudal ganglionic eminence, and these cells were frequently related to excitatory neurons and glia. Our results show that individual human cortical progenitors can generate both excitatory neurons and cortical interneurons, providing a new framework for understanding the origins of neuronal diversity in the human cortex.
|
17
|
Smith JP, Corces MR, Xu J, Reuter VP, Chang HY, Sheffield NC. PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments. NAR Genom Bioinform 2021; 3:lqab101. [PMID: 34859208 PMCID: PMC8632735 DOI: 10.1093/nargab/lqab101] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 02/08/2021] [Revised: 09/30/2021] [Accepted: 11/15/2021] [Indexed: 12/18/2022] Open
Abstract
As chromatin accessibility data from ATAC-seq experiments continues to expand, there is a continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.
Affiliation(s)
- Jason P Smith
- Center for Public Health Genomics, University of Virginia, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA
| | - M Ryan Corces
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
| | - Jin Xu
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
| | - Vincent P Reuter
- Genomics and Computational Biology Graduate Group, University of Pennsylvania, PA 19087, USA
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
| | - Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA
- Department of Public Health Sciences, University of Virginia, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, VA 22908, USA
|
18
|
O'Grady CJ, Dhandapani V, Colbourne JK, Frisch D. Refining the evolutionary time machine: An assessment of whole genome amplification using single historical Daphnia eggs. Mol Ecol Resour 2021; 22:946-961. [PMID: 34672105 DOI: 10.1111/1755-0998.13524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2021] [Revised: 09/03/2021] [Accepted: 09/07/2021] [Indexed: 12/14/2022]
Abstract
Whole genome sequencing is instrumental for the study of genome variation in natural populations, delivering important knowledge on genomic modifications and potential targets of natural selection at the population level. Large dormant eggbanks of aquatic invertebrates such as the keystone herbivore Daphnia, a microcrustacean widespread in freshwater ecosystems, provide detailed sedimentary archives to study genomic processes over centuries. To overcome the problem of limited DNA amounts in single Daphnia dormant eggs, we developed an optimized workflow for whole genome amplification (WGA), yielding sufficient amounts of DNA for downstream whole genome sequencing of individual historical eggs, including polyploid lineages. We compare two WGA kits, applied to recently produced Daphnia magna dormant eggs from laboratory cultures, and to historical dormant eggs of Daphnia pulicaria collected from Arctic lake sediment between 10 and 300 years old. Resulting genome coverage breadth in most samples was ~70%, including those from >100-year-old isolates. Sequence read distribution was highly correlated among samples amplified with the same kit, but less correlated between kits. Despite this, a high percentage of genomic positions with single nucleotide polymorphisms in one or more samples (maximum of 74% between kits, and 97% within kits) were recovered at a depth required for genotyping. As a by-product of sequencing we obtained 100% coverage of the mitochondrial genomes even from the oldest isolates (~300 years). The mitochondrial DNA provides an additional source for evolutionary studies of these populations. We provide an optimized workflow for WGA followed by whole genome sequencing including steps to minimize exogenous DNA.
Affiliation(s)
- Christopher James O'Grady
- School of Life Sciences, University of Warwick, Coventry, UK; Cell and Gene Therapy Catapult, London, UK; School of Biosciences, University of Birmingham, Birmingham, UK
| | | | | | - Dagmar Frisch
- School of Biosciences, University of Birmingham, Birmingham, UK; Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
|
19
|
Polavarapu VK, Xing P, Zhang H, Zhao M, Mathot L, Zhao L, Rosen G, Swartling FJ, Sjöblom T, Chen X. Profiling chromatin accessibility in formalin-fixed paraffin-embedded samples. Genome Res 2021; 32:150-161. [PMID: 34261731 PMCID: PMC8744681 DOI: 10.1101/gr.275269.121] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/14/2021] [Accepted: 07/08/2021] [Indexed: 11/25/2022]
Abstract
Archived formalin-fixed paraffin-embedded (FFPE) samples are the global standard format for preservation of the majority of biopsies in both basic research and translational cancer studies, and profiling chromatin accessibility in the archived FFPE tissues is fundamental to understanding gene regulation. Accurate mapping of chromatin accessibility from FFPE specimens is challenging because of the high degree of DNA damage. Here, we first showed that standard ATAC-seq can be applied to purified FFPE nuclei but yields lower library complexity and a smaller proportion of long DNA fragments. We then present FFPE-ATAC, the first highly sensitive method for decoding chromatin accessibility in FFPE tissues that combines Tn5-mediated transposition and T7 in vitro transcription. The FFPE-ATAC generates high-quality chromatin accessibility profiles with 500 nuclei from a single FFPE tissue section, enables the dissection of chromatin profiles from the regions of interest with the aid of hematoxylin and eosin (H&E) staining, and reveals disease-associated chromatin regulation from the human colorectal cancer FFPE tissue archived for more than 10 years. In summary, the approach allows decoding of the chromatin states that regulate gene expression in archival FFPE tissues, thereby permitting investigators to better understand epigenetic regulation in cancer and precision medicine.
|
20
|
Smith JP, Dutta AB, Sathyan KM, Guertin MJ, Sheffield NC. PEPPRO: quality control and processing of nascent RNA profiling data. Genome Biol 2021; 22:155. [PMID: 33992117 PMCID: PMC8126160 DOI: 10.1186/s13059-021-02349-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/28/2020] [Accepted: 04/12/2021] [Indexed: 12/18/2022] Open
Abstract
Nascent RNA profiling is growing in popularity; however, there is no standard analysis pipeline to uniformly process the data and assess quality. Here, we introduce PEPPRO, a comprehensive, scalable workflow for GRO-seq, PRO-seq, and ChRO-seq data. PEPPRO produces uniformly processed output files for downstream analysis and assesses adapter abundance, RNA integrity, library complexity, nascent RNA purity, and run-on efficiency. PEPPRO is restartable and fault-tolerant, records copious logs, and provides a web-based project report. PEPPRO can be run locally or using a cluster, providing a portable first step for genomic nascent RNA analysis.
Affiliation(s)
- Jason P Smith
- Center for Public Health Genomics, University of Virginia, Charlottesville, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
| | - Arun B Dutta
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
| | | | - Michael J Guertin
- Center for Public Health Genomics, University of Virginia, Charlottesville, USA.
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA.
| | - Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, USA.
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA.
- Department of Public Health Sciences, University of Virginia, Charlottesville, USA.
- Department of Biomedical Engineering, University of Virginia, Charlottesville, USA.
|
21
|
Ciobanu D, Clum A, Ahrendt S, Andreopoulos WB, Salamov A, Chan S, Quandt CA, Foster B, Meier-Kolthoff JP, Tang YT, Schwientek P, Benny GL, Smith ME, Bauer D, Deshpande S, Barry K, Copeland A, Singer SW, Woyke T, Grigoriev IV, James TY, Cheng JF. A single-cell genomics pipeline for environmental microbial eukaryotes. iScience 2021; 24:102290. [PMID: 33870123 PMCID: PMC8042348 DOI: 10.1016/j.isci.2021.102290] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/01/2020] [Revised: 02/12/2021] [Accepted: 03/04/2021] [Indexed: 12/05/2022] Open
Abstract
Single-cell sequencing of environmental microorganisms is an essential component of the microbial ecology toolkit. However, large-scale targeted single-cell sequencing for the whole-genome recovery of uncultivated eukaryotes is lagging. The key challenges are low abundance in environmental communities, large complex genomes, and cell walls that are difficult to break. We describe a pipeline composed of state-of-the-art single-cell genomics tools and protocols optimized for poorly studied and uncultivated eukaryotic microorganisms that are found at low abundance. This pipeline consists of seven distinct steps, beginning with sample collection and ending with genome annotation, each equipped with quality review steps to ensure high genome quality at low cost. We tested and evaluated each step on environmental samples and cultures of early-diverging lineages of fungi and Chromista/SAR. We show that genomes produced using this pipeline are almost as good as complete reference genomes for functional and comparative genomics for environmental microbial eukaryotes.
Affiliation(s)
- Doina Ciobanu
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Alicia Clum
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Steven Ahrendt
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - William B. Andreopoulos
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Asaf Salamov
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Sandy Chan
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
- Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - C. Alisha Quandt
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Brian Foster
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Jan P. Meier-Kolthoff
- Department of Bioinformatics and Databases, Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7B, 38124 Braunschweig, Germany
| | - Yung Tsu Tang
- Joint BioEnergy Institute, Emeryville, CA 94608, USA
| | - Patrick Schwientek
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Gerald L. Benny
- Department of Plant Pathology, University of Florida, Gainesville, FL 32611, USA
| | - Matthew E. Smith
- Department of Plant Pathology, University of Florida, Gainesville, FL 32611, USA
| | - Diane Bauer
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Shweta Deshpande
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Kerrie Barry
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Alex Copeland
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | | | - Tanja Woyke
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
| | - Igor V. Grigoriev
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Timothy Y. James
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jan-Fang Cheng
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory Berkeley, Berkeley, CA, USA
|
22
|
Confidence intervals for Markov chain transition probabilities based on next generation sequencing reads data. QUANTITATIVE BIOLOGY 2020; 8:143-154. [PMID: 34262790 DOI: 10.1007/s40484-020-0200-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
Abstract
Background: Markov chains (MC) have been widely used to model molecular sequences. The estimation of the MC transition matrix and of confidence intervals for the transition probabilities from long sequence data has been intensively studied in the past decades. In next generation sequencing (NGS), a large number of short reads are generated. These short reads can overlap and some regions of the genome may not be sequenced, resulting in a new type of data. Based on NGS data, the transition probabilities of MC can be estimated by moment estimators. However, the classical asymptotic distribution theory for MC transition probability estimators based on long sequences is no longer valid. Methods: In this study, we present the asymptotic distributions of several statistics related to MC based on NGS data. We show that, after scaling by the effective coverage d defined in a previous study by the authors, these statistics based on NGS data approximately follow the same distributions as the corresponding statistics for long sequences. Results: We apply the asymptotic properties of these statistics to find the theoretical confidence regions for MC transition probabilities based on NGS short reads data. We validate our theoretical confidence intervals using both simulated data and real data sets, and compare the results with those from the parametric bootstrap method. Conclusions: We find that the asymptotic distributions of these statistics and the theoretical confidence intervals of transition probabilities based on NGS data given in this study are highly accurate, providing a powerful tool for NGS data analysis.
|
23
|
Xu T, Gong Y, Su X, Zhu P, Dai J, Xu J, Ma B. Phenome-Genome Profiling of Single Bacterial Cell by Raman-Activated Gravity-Driven Encapsulation and Sequencing. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2020; 16:e2001172. [PMID: 32519499 DOI: 10.1002/smll.202001172] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/24/2020] [Revised: 05/01/2020] [Indexed: 06/11/2023]
Abstract
The small size and low DNA amount of bacterial cells have hindered establishing phenome-genome links in a precisely indexed, one-cell-per-reaction manner. Here, Raman-Activated Gravity-driven single-cell Encapsulation and Sequencing (RAGE-Seq) is presented, where individual cells are phenotypically screened via single-cell Raman spectra (SCRS) in an aquatic, vitality-preserving environment, then the cell with targeted SCRS is precisely packaged in a picoliter microdroplet and readily exported in a precisely indexed, "one-cell-one-tube" manner. Such integration of microdroplet encapsulation to Raman-activated sorting ensures high-coverage one-cell genome sequencing or cultivation that is directly linked to metabolic phenotype. For clinical Escherichia coli isolates, genome assemblies derived from precisely one cell via RAGE-Seq consistently reach >95% coverage. Moreover, directly from a urine sample of urogenital tract infection, metabolic-activity-based antimicrobial susceptibility phenotypes and genome sequence of 99.5% coverage are obtained simultaneously from precisely one cell. This single-cell global mutation map corroborates resistance phenotype and genotype, and unveils epidemiological features with high specificity and sensitivity. The ability to profile and correlate bacterial metabolic phenome and high-quality genome sequences at one-cell resolution suggests broad application of RAGE-Seq.
Affiliation(s)
- Teng Xu
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yanhai Gong
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
| | - Xiaolu Su
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
| | - Pengfei Zhu
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
| | - Jing Dai
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
| | - Jian Xu
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
| | - Bo Ma
- Single-Cell Center, CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China
|
24
|
Deng C, Daley T, Calabrese P, Ren J, Smith AD. Predicting the Number of Bases to Attain Sufficient Coverage in High-Throughput Sequencing Experiments. J Comput Biol 2019; 27:1130-1143. [PMID: 31725321 DOI: 10.1089/cmb.2019.0264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
For many types of high-throughput sequencing experiments, success in downstream analysis depends on attaining sufficient coverage for individual positions in the genome. For example, when identifying single-nucleotide variants de novo, the number of reads supporting a particular variant call determines our confidence in that variant call. If sequenced reads are distributed uniformly along the genome, the coverage of a nucleotide position is easily approximated by a Poisson distribution, with rate equal to average sequencing depth. Unfortunately, as has become well known, high-throughput sequencing data are never uniform. The numerous factors contributing to variation in coverage have resisted attempts at direct modeling and change along with minor adjustments in the underlying technology. We propose a new nonparametric method to predict the portion of a genome that will attain some specified minimum coverage, as a function of sequencing effort, using information from a shallow sequencing experiment from the same library. Simulations show our approach performs well under an array of distributional assumptions that deviate from uniformity. We applied this approach to estimate coverage at varying depths in single-cell whole-genome sequencing data from multiple protocols. These resulted in highly accurate predictions, demonstrating the effectiveness of our approach in analyzing complexity of sequencing libraries and optimizing design of sequencing experiments.
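The uniform-read baseline mentioned in this abstract can be illustrated with a short calculation. The sketch below is not the authors' nonparametric method; it only evaluates the idealized Poisson approximation, in which per-position coverage has rate equal to the average sequencing depth. The function name `fraction_at_least` and its parameters are hypothetical, chosen for illustration.

```python
import math


def fraction_at_least(mean_depth: float, min_coverage: int) -> float:
    """Expected fraction of positions with coverage >= min_coverage.

    Assumes coverage at each position is Poisson(mean_depth), which is
    the idealized uniform-read model, not a description of real data.
    """
    # Sum P(X = k) for k = 0 .. min_coverage - 1, then take the complement.
    below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_coverage)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Print the expected covered fraction for a few depths and thresholds.
    for depth in (1.0, 5.0, 30.0):
        for k in (1, 10):
            print(f"depth={depth:5.1f}  >= {k:2d}x: {fraction_at_least(depth, k):.4f}")
```

Because real libraries deviate from uniformity, these values describe the idealized model only; observed fractions at a given depth may differ substantially.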
Affiliation(s)
- Chao Deng
- Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
| | - Timothy Daley
- Departments of Statistics and Bioengineering, Stanford University, Stanford, California, USA
| | - Peter Calabrese
- Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
| | - Jie Ren
- Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
| | - Andrew D Smith
- Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
|
25
|
Deng C, Daley T, De Sena Brandine G, Smith AD. Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021339] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
Affiliation(s)
- Chao Deng
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
| | - Timothy Daley
- Department of Statistics and Department of Bioengineering, Stanford University, Stanford, California 94305, USA
| | - Guilherme De Sena Brandine
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
| | - Andrew D. Smith
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
|
26
|
Mawla AM, Huising MO. Navigating the Depths and Avoiding the Shallows of Pancreatic Islet Cell Transcriptomes. Diabetes 2019; 68:1380-1393. [PMID: 31221802 PMCID: PMC6609986 DOI: 10.2337/dbi18-0019] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 10/27/2018] [Accepted: 04/29/2019] [Indexed: 12/24/2022]
Abstract
Islet gene expression has been widely studied to better understand the transcriptional features that define a healthy β-cell. Transcriptomes of FACS-purified α-, β-, and δ-cells using bulk RNA-sequencing have facilitated our understanding of the complex network of cross talk between islet cells and its effects on β-cell function. However, these approaches were by design not intended to resolve heterogeneity between individual cells. Several recent studies used single-cell RNA sequencing (scRNA-Seq) to report considerable heterogeneity within mouse and human β-cells. In this Perspective, we assess how this newfound ability to assess gene expression at single-cell resolution has enhanced our understanding of β-cell heterogeneity. We conduct a comprehensive assessment of several single human β-cell transcriptome data sets and ask if the heterogeneity reported by these studies showed overlap and concurred with previously known examples of β-cell heterogeneity. We also illustrate the impact of the inevitable limitations of working at or below the limit of detection of gene expression at single cell resolution and their consequences for the quality of single-islet cell transcriptome data. Finally, we offer some guidance on when to opt for scRNA-Seq and when bulk sequencing approaches may be better suited.
Affiliation(s)
- Alex M Mawla
- Department of Neurobiology, Physiology and Behavior, College of Biological Sciences, University of California, Davis, Davis, CA
| | - Mark O Huising
- Department of Neurobiology, Physiology and Behavior, College of Biological Sciences, University of California, Davis, Davis, CA
- Department of Physiology and Membrane Biology, School of Medicine, University of California, Davis, Davis, CA
|
27
|
Abstract
Background: Metagenomic sequencing is a powerful technology for studying the mixture of microbes or the microbiomes on humans and in the environment. One basic task of analyzing metagenomic data is to identify the component genomes in the community. This task is challenging due to the complexity of microbiome composition, limited availability of known reference genomes, and usually insufficient sequencing coverage. Results: As an initial step toward understanding the complete composition of a metagenomic sample, we studied the problem of estimating the total length of all distinct component genomes in a metagenomic sample. We showed that this problem can be solved by estimating the total number of distinct k-mers in all the metagenomic sequencing data. We proposed a method for this estimation based on the sequencing coverage distribution of observed k-mers, and introduced a k-mer redundancy index (KRI) to fill in the gap between the count of distinct k-mers and the total genome length. We showed the effectiveness of the proposed method on a set of carefully designed simulation data corresponding to multiple situations of true metagenomic data. Results on real data indicate that the uncaptured genomic information can vary dramatically across metagenomic samples, with the potential to mislead downstream analyses. Conclusions: We proposed the question of how long the total genome length of all different species in a microbial community is and introduced a method to answer it. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5467-x) contains supplementary material, which is available to authorized users.
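As a rough illustration of the quantities this abstract refers to, the sketch below counts the distinct k-mers observed in a set of reads and tabulates their coverage distribution (how many k-mers were seen once, twice, and so on). It is not the authors' estimator: it does not extrapolate to unobserved k-mers and does not compute the KRI. The function names are hypothetical, and for simplicity reverse complements are treated as different k-mers.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads (no canonicalization)."""
    counts: Counter = Counter()
    for read in reads:
        # A read of length n contributes n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(counts: Counter) -> Counter:
    """Map coverage c to the number of distinct k-mers observed exactly c times."""
    return Counter(counts.values())


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGT"]  # illustrative toy input, not real data
    counts = kmer_counts(toy_reads, k=3)
    print("distinct k-mers observed:", len(counts))
    print("coverage histogram:", dict(coverage_histogram(counts)))
```

The coverage histogram is the kind of input from which the number of unseen k-mers would be estimated; that estimation step is deliberately left out here.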
Affiliation(s)
- Kui Hua
- MOE Key Laboratory of Bioinformatics Division and Center for Synthetic & System Biology, BNRIST, Beijing, 100084, China; Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics Division and Center for Synthetic & System Biology, BNRIST, Beijing, 100084, China; Department of Automation, Tsinghua University, Beijing, 100084, China; School of Life Sciences, Tsinghua University, Beijing, 100084, China
|
28
|
Abstract
The fundamental operative unit of a cancer is the genetically and epigenetically innovative single cell. Whether proliferating or quiescent, in the primary tumour mass or disseminated elsewhere, single cells govern the parameters that dictate all facets of the biology of cancer. Thus, single-cell analyses provide the ultimate level of resolution in our quest for a fundamental understanding of this disease. Historically, this quest has been hampered by technological shortcomings. In this Opinion article, we argue that the rapidly evolving field of single-cell sequencing has unshackled the cancer research community of these shortcomings. From furthering an elemental understanding of intra-tumoural genetic heterogeneity and cancer genome evolution to illuminating the governing principles of disease relapse and metastasis, we posit that single-cell sequencing promises to unravel the biology of all facets of this disease.
Affiliation(s)
- Timour Baslan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, New York 10044, USA, and Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - James Hicks
- University of Southern California Dana and David Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, California 90089, USA
|
29
|
Rosenkrantz JL, Carbone L. Investigating somatic aneuploidy in the brain: why we need a new model. Chromosoma 2017; 126:337-350. [PMID: 27638401 PMCID: PMC5908214 DOI: 10.1007/s00412-016-0615-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/18/2016] [Revised: 08/18/2016] [Accepted: 08/22/2016] [Indexed: 12/17/2022]
Abstract
The steady occurrence of DNA mutations is a key source for evolution, generating the genomic variation in the population upon which natural selection acts. Mutations driving evolution have to occur in the oocytes and sperm in order to be transmitted to the next generation. Through similar mechanisms, mutations also accumulate in somatic cells (e.g., skin cells, neurons, lymphocytes) during development and adult life. The concept that somatic cells can collect new mutations with time suggests that we are a mosaic of cells with different genomic compositions. Particular attention has been recently paid to somatic mutations in the brain, with a focus on the relationship between this phenomenon and the origin of human diseases. Given this progressive accumulation of mutations, it is likely that an increased load of somatic mutations is present later in life and that this could be associated with late-life diseases and aging. In this review, we focus on a particular type of mutation: the loss and/or gain of whole chromosomes (i.e., aneuploidy) caused by errors in chromosome segregation in neurons and glia. Currently, it is hard to grasp the functional impact of somatic mutation in the brain because we lack reliable estimates of the proportion of aneuploid cells in the normal brain across different ages. Here, we revisit the key studies that attempted to quantify the proportion of aneuploid cells in both normal and diseased brains and highlight the deep inconsistencies among the different studies done in the last 15 years. Finally, our review highlights several limitations of studies performed in human and rodent models and explores a possible translational role for non-human primates.
Affiliation(s)
- Jimi L Rosenkrantz
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
| | - Lucia Carbone
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA.
- Department of Medicine, Oregon Health and Science University, Portland, OR, USA.
- Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA.
- Division of Neuroscience, Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA.
|
30
|
Müller S, Diaz A. Single-Cell mRNA Sequencing in Cancer Research: Integrating the Genomic Fingerprint. Front Genet 2017; 8:73. [PMID: 28620412 PMCID: PMC5450061 DOI: 10.3389/fgene.2017.00073] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 03/20/2017] [Accepted: 05/18/2017] [Indexed: 12/12/2022] Open
Abstract
Critical cancer mutations are often regional and mosaic, confounding the efficacy of targeted therapeutics. Single cell mRNA sequencing (scRNA-seq) has enabled unprecedented studies of intra-tumor heterogeneity and its role in cancer progression, metastasis, and treatment resistance. When coupled with DNA sequencing, scRNA-seq allows one to infer the in vivo impact of genomic alterations on gene expression. This combination can be used to reliably distinguish neoplastic from non-neoplastic cells, to correlate paracrine-signaling pathways between neoplastic cells and stroma, and to map expression signatures to inferred clones and phylogenies. Here we review recent advances in scRNA-seq, with a special focus on cancer. We discuss the challenges and prospects of combining scRNA-seq with DNA sequencing to assess intra-tumor heterogeneity.
Affiliation(s)
- Sören Müller
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, United States
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, United States
| | - Aaron Diaz
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, United States
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, United States
|
31
|
Coskun AF, Eser U, Islam S. Cellular identity at the single-cell level. MOLECULAR BIOSYSTEMS 2016; 12:2965-79. [PMID: 27460751 DOI: 10.1039/c6mb00388e] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
A single cell creates surprising heterogeneity in a multicellular organism. While every organismal cell shares almost an identical genome, molecular interactions in cells alter the use of DNA sequences to modulate the gene of interest for specialization of cellular functions. Each cell gains a unique identity through molecular coding across the DNA, RNA, and protein conversions. On the other hand, loss of cellular identity leads to critical diseases such as cancer. Most cell identity dissection studies are based on bulk molecular assays that mask differences in individual cells. To probe cell-to-cell variability in a population, we discuss single cell approaches to decode the genetic, epigenetic, transcriptional, and translational mechanisms for cell identity formation. In combination with molecular instructions, the physical principles behind cell identity determination are examined. Deciphering and reprogramming cellular types impact biology and medicine.
Affiliation(s)
- Ahmet F Coskun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, California, USA.
|
32
|
Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nat Commun 2016; 7:11881. [PMID: 27302887 PMCID: PMC4912625 DOI: 10.1038/ncomms11881] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 03/12/2016] [Accepted: 05/09/2016] [Indexed: 01/10/2023] Open
Abstract
The diversity of an organism's B- and T-cell repertoires is both clinically important and a key measure of immunological complexity. However, diversity is hard to estimate by current methods, because of inherent uncertainty in the number of B- and T-cell clones that will be missing from a blood or tissue sample by chance (the missing-species problem), inevitable sampling bias, and experimental noise. To solve this problem, we developed Recon, a modified maximum-likelihood method that outputs the overall diversity of a repertoire from measurements on a sample. Recon outputs accurate, robust estimates by any of a vast set of complementary diversity measures, including species richness and entropy, at fractional repertoire coverage. It also outputs error bars and power tables, allowing robust comparisons of diversity between individuals and over time. We apply Recon to in silico and experimental immune-repertoire sequencing data sets as proof of principle for measuring diversity in large, complex systems. Diversity of an organism's B- and T-cell repertoires is clinically important, but difficult to estimate due to uncertainty in the number of clones in a sample, sampling bias and experimental noise. Here Kaplinsky and Arnaout present Recon, a method that reconstructs the distribution of the overall repertoire from sample measurements.
|
33
|
Movahedi NS, Embree M, Nagarajan H, Zengler K, Chitsaz H. Efficient Synergistic Single-Cell Genome Assembly. Front Bioeng Biotechnol 2016; 4:42. [PMID: 27243002 PMCID: PMC4876485 DOI: 10.3389/fbioe.2016.00042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/24/2016] [Accepted: 05/06/2016] [Indexed: 11/13/2022]
Abstract
As the vast majority of all microbes are unculturable, single-cell sequencing has become a significant method to gain insight into microbial physiology. Single-cell sequencing methods, currently powered by multiple displacement genome amplification (MDA), have passed important milestones such as finishing and closing the genome of a prokaryote. However, the quality and reliability of genome assemblies from single cells are still unsatisfactory due to uneven coverage depth and the absence of scattered chunks of the genome in the final collection of reads caused by MDA bias. In this work, our new algorithm Hybrid De novo Assembler (HyDA) demonstrates the power of coassembly of multiple single-cell genomic data sets through significant improvement of the assembly quality in terms of predicted functional elements and length statistics. Coassemblies contain significantly more base pairs and protein coding genes, cover more subsystems, and consist of longer contigs compared to individual assemblies by the same algorithm as well as state-of-the-art single-cell assemblers SPAdes and IDBA-UD. Hybrid De novo Assembler (HyDA) is also able to avoid chimeric assemblies by detecting and separating shared and exclusive pieces of sequence for input data sets. By replacing one deep single-cell sequencing experiment with a few single-cell sequencing experiments of lower depth, the coassembly method can hedge against the risk of failure and loss of the sample, without significantly increasing sequencing cost. Application of the single-cell coassembler HyDA to the study of three uncultured members of an alkane-degrading methanogenic community validated the usefulness of the coassembly concept. HyDA is open source and publicly available at http://chitsazlab.org/software.html, and the raw reads are available at http://chitsazlab.org/research.html.
Affiliation(s)
- Narjes S Movahedi
- Department of Computer Science, Wayne State University , Detroit, MI , USA
| | - Mallory Embree
- Department of Bioengineering, University of California San Diego , San Diego, CA , USA
| | - Harish Nagarajan
- Department of Bioengineering, University of California San Diego , San Diego, CA , USA
| | - Karsten Zengler
- Department of Bioengineering, University of California San Diego , San Diego, CA , USA
| | - Hamidreza Chitsaz
- Department of Computer Science, Colorado State University , Fort Collins, CO , USA
| |
|
34
|
Diaz A, Liu SJ, Sandoval C, Pollen A, Nowakowski TJ, Lim DA, Kriegstein A. SCell: integrated analysis of single-cell RNA-seq data. Bioinformatics 2016; 32:2219-20. [PMID: 27153637 DOI: 10.1093/bioinformatics/btw201] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 12/02/2015] [Accepted: 04/09/2016] [Indexed: 11/12/2022]
Abstract
Analysis of the composition of heterogeneous tissue has been greatly enabled by recent developments in single-cell transcriptomics. We present SCell, an integrated software tool for quality filtering, normalization, feature selection, iterative dimensionality reduction, clustering and the estimation of gene-expression gradients from large ensembles of single-cell RNA-seq datasets. SCell is open source, and implemented with an intuitive graphical interface. Scripts and protocols for the high-throughput pre-processing of large ensembles of single-cell RNA-seq datasets are provided as an additional resource. Availability and implementation: Binary executables for Windows, MacOS and Linux are available at http://sourceforge.net/projects/scell; source code and pre-processing scripts are available from https://github.com/diazlab/SCell. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: aaron.diaz@ucsf.edu.
Affiliation(s)
- Aaron Diaz
- Department of Neurological Surgery, UCSF Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Siyuan J Liu
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Carmen Sandoval
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Alex Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Tom J Nowakowski
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Daniel A Lim
- Department of Neurological Surgery, UCSF Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| | - Arnold Kriegstein
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research
| |
|
35
|
Gawad C, Koh W, Quake SR. Single-cell genome sequencing: current state of the science. Nat Rev Genet 2016; 17:175-88. [PMID: 26806412 DOI: 10.1038/nrg.2015.16] [Citation(s) in RCA: 899] [Impact Index Per Article: 99.9] [Indexed: 12/14/2022]
Abstract
The field of single-cell genomics is advancing rapidly and is generating many new insights into complex biological systems, ranging from the diversity of microbial ecosystems to the genomics of human cancer. In this Review, we provide an overview of the current state of the field of single-cell genome sequencing. First, we focus on the technical challenges of making measurements that start from a single molecule of DNA, and then explore how some of these recent methodological advancements have enabled the discovery of unexpected new biology. Areas highlighted include the application of single-cell genomics to interrogate microbial dark matter and to evaluate the pathogenic roles of genetic mosaicism in multicellular organisms, with a focus on cancer. We then attempt to predict advances we expect to see in the next few years.
Affiliation(s)
- Charles Gawad
- Departments of Oncology and Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
| | - Winston Koh
- Departments of Bioengineering and Applied Physics, Stanford University, Stanford, California 94304, USA.,Howard Hughes Medical Institute, Stanford University, California 94304, USA
| | - Stephen R Quake
- Departments of Bioengineering and Applied Physics, Stanford University, Stanford, California 94304, USA.,Howard Hughes Medical Institute, Stanford University, California 94304, USA
| |
|
36
|
Deng C, Daley T, Smith AD. Applications of species accumulation curves in large-scale biological data analysis. Quantitative Biology 2015; 3:135-144. [PMID: 27252899 DOI: 10.1007/s40484-015-0049-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
Abstract
The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
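The classical Good and Toulmin estimator mentioned in the abstract above can be illustrated with a short sketch. It evaluates the alternating power series for the expected number of new classes when sampling effort grows by a fraction t of the original sample, using n_j, the number of classes observed exactly j times. This is only the raw series, which is reliable for 0 <= t <= 1; the rational function approximations the authors describe for extrapolating further are not reproduced here. The function names and the toy data are hypothetical.

```python
from collections import Counter


def counts_histogram(observations):
    """Return {j: n_j}, where n_j is the number of classes seen exactly j times."""
    per_class = Counter(observations)      # class label -> times observed
    return Counter(per_class.values())     # times observed -> number of classes


def good_toulmin_new_classes(hist, t):
    """Expected number of previously unseen classes in t times more sampling.

    Implements sum_j (-1)^(j+1) * t^j * n_j, restricted to 0 <= t <= 1
    because the raw series becomes unstable beyond that range.
    """
    if not 0 <= t <= 1:
        raise ValueError("the raw Good-Toulmin series is only used here for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in hist.items())


# Toy sample: "a" seen twice, "b" once, "c" three times, "d" once.
sample = ["a", "a", "b", "c", "c", "c", "d"]
hist = counts_histogram(sample)            # n_1 = 2, n_2 = 1, n_3 = 1
expected_new = good_toulmin_new_classes(hist, 1.0)
```

For this toy sample, doubling the sampling effort (t = 1) gives 2 - 1 + 1 = 2 expected new classes.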
Affiliation(s)
- Chao Deng
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Timothy Daley
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Andrew D Smith
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| |
|
37
|
Yalcin D, Hakguder ZM, Otu HH. Bioinformatics approaches to single-cell analysis in developmental biology. Mol Hum Reprod 2015; 22:182-92. [PMID: 26358759 DOI: 10.1093/molehr/gav050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/12/2015] [Accepted: 09/04/2015] [Indexed: 12/17/2022]
Abstract
Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology.
Affiliation(s)
- Dicle Yalcin
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
| | - Zeynep M Hakguder
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
| | - Hasan H Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
| |
|
38
|
Garvin T, Aboukhalil R, Kendall J, Baslan T, Atwal GS, Hicks J, Wigler M, Schatz MC. Interactive analysis and assessment of single-cell copy-number variations. Nat Methods 2015; 12:1058-60. [PMID: 26344043 PMCID: PMC4775251 DOI: 10.1038/nmeth.3578] [Citation(s) in RCA: 173] [Impact Index Per Article: 17.3] [Received: 11/11/2014] [Accepted: 07/07/2015] [Indexed: 01/19/2023]
Abstract
We present Ginkgo (http://qb.cshl.edu/ginkgo), a user-friendly, open-source web platform for the analysis of single-cell copy-number variations (CNVs). Ginkgo automatically constructs copy-number profiles of cells from mapped reads and constructs phylogenetic trees of related cells. We validated Ginkgo by reproducing the results of five major studies. After comparing three commonly used single-cell amplification techniques, we concluded that degenerate oligonucleotide-primed PCR is the most consistent for CNV analysis.
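The general idea of building a copy-number profile from mapped reads, as described in the abstract above, can be sketched in a few lines. This is a deliberately simplified illustration and not Ginkgo's pipeline: it assumes fixed-width bins, 0-based read start positions on a single chromosome, a ploidy of 2, and scaling by the median bin count, and it omits steps such as GC correction and segmentation. The function name `binned_copy_number` and all parameters are hypothetical.

```python
from statistics import median


def binned_copy_number(read_starts, chrom_length, bin_size, ploidy=2):
    """Estimate an integer copy number per fixed-width bin from read starts.

    Each bin's read count is divided by the median bin count and scaled by
    the assumed ploidy, then rounded to the nearest integer.
    """
    n_bins = -(-chrom_length // bin_size)   # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:                 # 0-based start coordinate of each read
        counts[pos // bin_size] += 1
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; coverage too sparse for this sketch")
    return [round(ploidy * c / med) for c in counts]


# Toy chromosome of length 40 with four bins of width 10;
# the third bin carries twice as many reads as the others.
reads = [1, 2, 3, 4, 11, 12, 13, 14,
         20, 21, 22, 23, 24, 25, 26, 27,
         31, 32, 33, 34]
profile = binned_copy_number(reads, chrom_length=40, bin_size=10)
```

In this toy example the third bin, with double the read depth, is called at copy number 4 while the remaining bins are called at 2.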
Affiliation(s)
- Tyler Garvin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | | | - Jude Kendall
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Timour Baslan
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.,Department of Molecular and Cellular Biology, Stony Brook University, Stony Brook, New York, USA
| | - Gurinder S Atwal
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - James Hicks
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Michael Wigler
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Michael C Schatz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| |
|