1. Fu Y, Timp W, Sedlazeck FJ. Computational analysis of DNA methylation from long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00822-5. [PMID: 40155770 DOI: 10.1038/s41576-025-00822-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/30/2025] [Indexed: 04/01/2025]
Abstract
DNA methylation is a critical epigenetic mechanism in numerous biological processes, including gene regulation, development, ageing and the onset of various diseases such as cancer. Studies of methylation are increasingly using single-molecule long-read sequencing technologies to simultaneously measure epigenetic states such as DNA methylation with genomic variation. These long-read data sets have spurred the continuous development of advanced computational methods to gain insights into the roles of methylation in regulating chromatin structure and gene regulation. In this Review, we discuss the computational methods for calling methylation signals, contrasting methylation between samples, analysing cell-type diversity and gaining additional genomic insights, and then further discuss the challenges and future perspectives of tool development for DNA methylation research.
Affiliation(s)
- Yilei Fu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
2. Sigurpalsdottir BD, Stefansson OA, Holley G, Beyter D, Zink F, Hardarson MÞ, Sverrisson SÞ, Kristinsdottir N, Magnusdottir DN, Magnusson OÞ, Gudbjartsson DF, Halldorsson BV, Stefansson K. A comparison of methods for detecting DNA methylation from long-read sequencing of human genomes. Genome Biol 2024; 25:69. [PMID: 38468278 PMCID: PMC10929077 DOI: 10.1186/s13059-024-03207-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 02/28/2024] [Indexed: 03/13/2024] [Open Access]
Abstract
BACKGROUND Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. The most commonly used methods for long-read sequencing are nanopore sequencing developed by Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing developed by Pacific Biosciences (PacBio). In this study, we systematically compare the performance of CpG methylation detection from long-read sequencing. RESULTS We demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxidative bisulfite-sequenced (oxBS) samples, isolated from the same blood draws. We introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. We evaluate the per-site performance of CpG methylation detection across different genomic features and CpG methylation rates and demonstrate how the latest R10.4 flowcell chemistry and base-calling algorithms improve methylation detection from nanopore sequencing. Additionally, we show how the methylation detection of 50 SMRT-sequenced genomes compares to nanopore sequencing and oxBS. CONCLUSIONS This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods. We compare two commonly used computational methods for the detection of CpG methylation in a large number of nanopore genomes, including samples sequenced using the latest R10.4 nanopore flowcell chemistry and 50 SMRT-sequenced samples. We provide insights into the strengths and limitations of each sequencing method as well as recommendations for standardization and evaluation of tools designed for genome-scale modified base detection using long-read sequencing.
Affiliation(s)
- Brynja D Sigurpalsdottir
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland.
- School of Technology, Reykjavík University, Reykjavík, Iceland.
- Doruk Beyter
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Florian Zink
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Marteinn Þ Hardarson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Technology, Reykjavík University, Reykjavík, Iceland
- Daniel F Gudbjartsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavík, Iceland
- Bjarni V Halldorsson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland.
- School of Technology, Reykjavík University, Reykjavík, Iceland.
- Kari Stefansson
- deCODE Genetics/Amgen Inc., Sturlugata 8, Reykjavík, Iceland
- Faculty of Medicine, School of Health Science, University of Iceland, Reykjavík, Iceland
3. Ni P, Nie F, Zhong Z, Xu J, Huang N, Zhang J, Zhao H, Zou Y, Huang Y, Li J, Xiao CL, Luo F, Wang J. DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing. Nat Commun 2023; 14:4054. [PMID: 37422489 PMCID: PMC10329642 DOI: 10.1038/s41467-023-39784-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/20/2022] [Accepted: 06/22/2023] [Indexed: 07/10/2023] [Open Access]
Abstract
Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Affiliation(s)
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Zeyu Zhong
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jinrui Xu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Haochen Zhao
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- You Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Yuanfeng Huang
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
- Jinchen Li
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, 410000, China
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, 410000, China
- Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, #7 Jinsui Road, Tianhe District, Guangzhou, China.
- Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
4. O'Neill H, Lee H, Gupta I, Rodger EJ, Chatterjee A. Single-Cell DNA Methylation Analysis in Cancer. Cancers (Basel) 2022; 14:6171. [PMID: 36551655 PMCID: PMC9777108 DOI: 10.3390/cancers14246171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/26/2022] [Revised: 12/07/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022] [Open Access]
Abstract
Morphological, transcriptomic, and genomic defects are well-explored parameters of cancer biology. In more recent years, the impact of epigenetic influences, such as DNA methylation, is becoming more appreciated. Aberrant DNA methylation has been implicated in many types of cancers, influencing cell type, state, transcriptional regulation, and genomic stability to name a few. Traditionally, large populations of cells from the tissue of interest are coalesced for analysis, producing averaged methylome data. Considering the inherent heterogeneity of cancer, analysing populations of cells as a whole denies the ability to discover novel aberrant methylation patterns, identify subpopulations, and trace cell lineages. Due to recent advancements in technology, it is now possible to obtain methylome data from single cells. This has both research and clinical implications, ranging from the identification of biomarkers to improved diagnostic tools. As with all emerging technologies, distinct experimental, bioinformatic, and practical challenges present themselves. This review begins with exploring the potential impact of single-cell sequencing on understanding cancer biology and how it could eventually benefit a clinical setting. Following this, the techniques and experimental approaches which made this technology possible are explored. Finally, the present challenges currently associated with single-cell DNA methylation sequencing are described.
Affiliation(s)
- Hannah O'Neill
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
- Heather Lee
- School of Biomedical Sciences and Pharmacy, College of Health, Medicine and Wellbeing, The University of Newcastle, Callaghan, NSW 2308, Australia
- Hunter Medical Research Institute, New Lambton Heights, NSW 2305, Australia
- Ishaan Gupta
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Euan J Rodger
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
- Aniruddha Chatterjee
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9016, New Zealand
- School of Health Sciences and Technology, University of Petroleum and Energy Studies (UPES), Dehradun 248007, India
5. Urban JM, Foulk MS, Bliss JE, Coleman CM, Lu N, Mazloom R, Brown SJ, Spradling AC, Gerbi SA. High contiguity de novo genome assembly and DNA modification analyses for the fungus fly, Sciara coprophila, using single-molecule sequencing. BMC Genomics 2021; 22:643. [PMID: 34488624 PMCID: PMC8419958 DOI: 10.1186/s12864-021-07926-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/14/2021] [Accepted: 08/08/2021] [Indexed: 12/26/2022] [Open Access]
Abstract
BACKGROUND The lower Dipteran fungus fly, Sciara coprophila, has many unique biological features that challenge the rule of genome DNA constancy. For example, Sciara undergoes paternal chromosome elimination and maternal X chromosome nondisjunction during spermatogenesis, paternal X elimination during embryogenesis, intrachromosomal DNA amplification of DNA puff loci during larval development, and germline-limited chromosome elimination from all somatic cells. Paternal chromosome elimination in Sciara was the first observation of imprinting, though the mechanism remains a mystery. Here, we present the first draft genome sequence for Sciara coprophila to take a large step forward in addressing these features. RESULTS We assembled the Sciara genome using PacBio, Nanopore, and Illumina sequencing. To find an optimal assembly using these datasets, we generated 44 short-read and 50 long-read assemblies. We ranked assemblies using 27 metrics assessing contiguity, gene content, and dataset concordance. The highest-ranking assemblies were scaffolded using BioNano optical maps. RNA-seq datasets from multiple life stages and both sexes facilitated genome annotation. A set of 66 metrics was used to select the first draft assembly for Sciara. Nearly half of the Sciara genome sequence was anchored into chromosomes, and all scaffolds were classified as X-linked or autosomal by coverage. CONCLUSIONS We determined that X-linked genes in Sciara males undergo dosage compensation. An entire bacterial genome from the Rickettsia genus, a group known to be endosymbionts in insects, was co-assembled with the Sciara genome, opening the possibility that Rickettsia may function in sex determination in Sciara. Finally, the signal level of the PacBio and Nanopore data support the presence of cytosine and adenine modifications in the Sciara genome, consistent with a possible role in imprinting.
Affiliation(s)
- John M Urban
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA.
- Department of Embryology, Carnegie Institution for Science, Howard Hughes Medical Institute Research Laboratories, 3520 San Martin Drive, Baltimore, MD, 21218, USA.
- Michael S Foulk
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA
- Present Address: Department of Biology, Mercyhurst University, Erie, PA, 16546, USA
- Jacob E Bliss
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA
- C Michelle Coleman
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Nanyan Lu
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Reza Mazloom
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Susan J Brown
- KSU Bioinformatics Center, Kansas State University Division of Biology, Ackert Hall, Manhattan, Kansas, 66502, USA
- Allan C Spradling
- Department of Embryology, Carnegie Institution for Science, Howard Hughes Medical Institute Research Laboratories, 3520 San Martin Drive, Baltimore, MD, 21218, USA
- Susan A Gerbi
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University Division of Biology and Medicine, Sidney Frank Hall for Life Sciences, 185 Meeting Street, Providence, RI, 02912, USA.
6. Namjou B, Stanaway IB, Lingren T, Mentch FD, Benoit B, Dikilitas O, Niu X, Shang N, Shoemaker AH, Carey DJ, Mirshahi T, Singh R, Nestor JG, Hakonarson H, Denny JC, Crosslin DR, Jarvik GP, Kullo IJ, Williams MS, Harley JB. Evaluation of the MC4R gene across eMERGE network identifies many unreported obesity-associated variants. Int J Obes (Lond) 2021; 45:155-169. [PMID: 32952152 PMCID: PMC7752751 DOI: 10.1038/s41366-020-00675-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/10/2020] [Revised: 08/07/2020] [Accepted: 09/03/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND/OBJECTIVES Melanocortin-4 receptor (MC4R) plays an essential role in food intake and energy homeostasis. More than 170 MC4R variants have been described over the past two decades, with conflicting reports regarding the prevalence and phenotypic effects of these variants in diverse cohorts. To determine the frequency of MC4R variants in a large cohort of different ancestries, we evaluated the MC4R coding region for 20,537 eMERGE participants with sequencing data plus an additional 77,454 independent individuals with genome-wide genotyping data at this locus. SUBJECTS/METHODS The sequencing data were obtained from the eMERGE phase III study, in which multisample variant call format calls have been generated, curated, and annotated. In addition to penetrance estimation using body mass index (BMI) as a binary outcome, GWAS and PheWAS were performed using median BMI in linear regression analyses. All results were adjusted for principal components, age, sex, and sites of genotyping. RESULTS Targeted sequencing data of MC4R revealed 125 coding variants in 1839 eMERGE participants, including 30 unreported coding variants that were predicted to be functionally damaging. Highly penetrant unreported variants included L325I, E308K, D298N, S270F, F261L, T248A, D111V, and Y80F, in which seven participants had obesity class III, defined as BMI ≥ 40 kg/m². In GWAS analysis, in addition to the known risk haplotype upstream of MC4R (best variant rs6567160; P = 5.36 × 10⁻²⁵, Beta = 0.37), a novel rare haplotype was detected which was protective against obesity and encompassed the V103I variant with known gain-of-function properties (P = 6.23 × 10⁻⁸, Beta = -0.62). PheWAS analyses extended this protective effect of V103I to type 2 diabetes, diabetic nephropathy, and chronic renal failure independent of BMI. CONCLUSIONS MC4R screening in a large eMERGE cohort confirmed many previous findings, extended the MC4R pleiotropic effects, and discovered additional MC4R rare alleles that probably contribute to obesity.
Affiliation(s)
- Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, USA.
- College of Medicine, University of Cincinnati, Cincinnati, OH, USA.
- Ian B Stanaway
- Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, WA, USA
- Todd Lingren
- College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Frank D Mentch
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Barbara Benoit
- Research Information Science and Computing, Partners HealthCare, Somerville, MA, USA
- Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Xinnan Niu
- Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, TN, USA
- Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Ashley H Shoemaker
- Department of Pediatrics, Division of Endocrinology and Diabetes, Vanderbilt University Medical Center, Nashville, TN, USA
- David J Carey
- Department of Molecular and Functional Genomics, Geisinger, Danville, PA, USA
- Tooraj Mirshahi
- Department of Molecular and Functional Genomics, Geisinger, Danville, PA, USA
- Jordan G Nestor
- Department of Medicine, Division of Nephrology, Columbia University, New York, NY, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Joshua C Denny
- Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, TN, USA
- David R Crosslin
- Department of Biomedical Informatics Medical Education, School of Medicine, University of Washington, Seattle, WA, USA
- Gail P Jarvik
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Department Genome Sciences, University of Washington Medical Center, Seattle, WA, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Marc S Williams
- Genomic Medicine Institute (M.S.W.), Geisinger, Danville, PA, USA
- John B Harley
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, USA
- College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- U.S. Department of Veterans Affairs Medical Center, Cincinnati, OH, USA
7. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020; 21:597-614. [PMID: 32504078 PMCID: PMC7877196 DOI: 10.1038/s41576-020-0236-x] [Citation(s) in RCA: 577] [Impact Index Per Article: 115.4] [Accepted: 03/31/2020] [Indexed: 12/27/2022]
Abstract
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Affiliation(s)
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
8. Informatics for PacBio Long Reads. Advances in Experimental Medicine and Biology 2019; 1129:119-129. [PMID: 30968364 DOI: 10.1007/978-981-13-6037-4_8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/27/2022]
Abstract
In this article, we review the development of a wide variety of bioinformatics software implementing state-of-the-art algorithms since the introduction of SMRT sequencing technology into the field. We focus on the three major categories of development: read mapping (aligning to reference genomes), de novo assembly, and detection of structural variants. The long SMRT reads benefit all the applications, but they are achievable only through considering the nature of the long reads technology properly.
9. Ishiura H, Shibata S, Yoshimura J, Suzuki Y, Qu W, Doi K, Almansour MA, Kikuchi JK, Taira M, Mitsui J, Takahashi Y, Ichikawa Y, Mano T, Iwata A, Harigaya Y, Matsukawa MK, Matsukawa T, Tanaka M, Shirota Y, Ohtomo R, Kowa H, Date H, Mitsue A, Hatsuta H, Morimoto S, Murayama S, Shiio Y, Saito Y, Mitsutake A, Kawai M, Sasaki T, Sugiyama Y, Hamada M, Ohtomo G, Terao Y, Nakazato Y, Takeda A, Sakiyama Y, Umeda-Kameyama Y, Shinmi J, Ogata K, Kohno Y, Lim SY, Tan AH, Shimizu J, Goto J, Nishino I, Toda T, Morishita S, Tsuji S. Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease. Nat Genet 2019; 51:1222-1232. [DOI: 10.1038/s41588-019-0458-z] [Citation(s) in RCA: 178] [Impact Index Per Article: 29.7] [Received: 08/30/2018] [Accepted: 05/29/2019] [Indexed: 11/09/2022]
10.
Abstract
Understanding chromatin regulation holds enormous promise for controlling gene regulation, predicting cellular identity, and developing diagnostics and cellular therapies. However, the dynamic nature of chromatin, together with cell-to-cell heterogeneity in its structure, limits our ability to extract its governing principles. Single cell mapping of chromatin modifications, in conjunction with expression measurements, could help overcome these limitations. Here, we review recent advances in single cell-based measurements of chromatin modifications, including optimization to reduce DNA loss, improved DNA sequencing, barcoding, and antibody engineering. We also highlight several applications of these techniques that have provided insights into cell-type classification, mapping modification co-occurrence and heterogeneity, and monitoring chromatin dynamics.
Affiliation(s)
- Connor H Ludwig
- Department of Bioengineering, Stanford University, Shriram Center, 443 Via Ortega, Rm 042, Stanford, CA 94305, USA
- Lacramioara Bintu
- Department of Bioengineering, Stanford University, Shriram Center, 443 Via Ortega, Rm 042, Stanford, CA 94305, USA
11. Sharim H, Grunwald A, Gabrieli T, Michaeli Y, Margalit S, Torchinsky D, Arielly R, Nifker G, Juhasz M, Gularek F, Almalvez M, Dufault B, Chandra SS, Liu A, Bhattacharya S, Chen YW, Vilain E, Wagner KR, Pevsner J, Reifenberger J, Lam ET, Hastie AR, Cao H, Barseghyan H, Weinhold E, Ebenstein Y. Long-read single-molecule maps of the functional methylome. Genome Res 2019; 29:646-656. [PMID: 30846530 PMCID: PMC6442387 DOI: 10.1101/gr.240739.118] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 06/14/2018] [Accepted: 02/25/2019] [Indexed: 01/23/2023]
Abstract
We report on the development of a methylation analysis workflow for optical detection of fluorescent methylation profiles along chromosomal DNA molecules. In combination with Bionano Genomics genome mapping technology, these profiles provide a hybrid genetic/epigenetic genome-wide map composed of DNA molecules spanning hundreds of kilobase pairs. The method provides kilobase pair–scale genomic methylation patterns comparable to whole-genome bisulfite sequencing (WGBS) along genes and regulatory elements. These long single-molecule reads allow for methylation variation calling and analysis of large structural aberrations such as pathogenic macrosatellite arrays not accessible to single-cell second-generation sequencing. The method is applied here to study facioscapulohumeral muscular dystrophy (FSHD), simultaneously recording the haplotype, copy number, and methylation status of the disease-associated, highly repetitive locus on Chromosome 4q.
Affiliation(s)
- Hila Sharim
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Assaf Grunwald
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Tslil Gabrieli
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Yael Michaeli
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Sapir Margalit
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Dmitry Torchinsky
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Rani Arielly
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Gil Nifker
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
- Matyas Juhasz
- Institute of Organic Chemistry RWTH Aachen University, D-52056 Aachen, Germany
- Felix Gularek
- Institute of Organic Chemistry RWTH Aachen University, D-52056 Aachen, Germany
- Miguel Almalvez
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
- Brandon Dufault
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
- Sreetama Sen Chandra
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
- Alexander Liu
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
- Surajit Bhattacharya
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
- Yi-Wen Chen
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Eric Vilain
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Kathryn R Wagner
- Kennedy Krieger Institute and Departments of Neurology and Neuroscience, The Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
| | - Jonathan Pevsner
- Kennedy Krieger Institute and Departments of Neurology and Neuroscience, The Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
| | - Ernest T Lam
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Alex R Hastie
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Han Cao
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Hayk Barseghyan
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Elmar Weinhold
- Institute of Organic Chemistry, RWTH Aachen University, D-52056 Aachen, Germany
| | - Yuval Ebenstein
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| |
|
12
|
Single-Molecule Sequencing: Towards Clinical Applications. Trends Biotechnol 2019; 37:72-85. [DOI: 10.1016/j.tibtech.2018.07.013] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7] [Received: 05/04/2018] [Revised: 07/16/2018] [Accepted: 07/18/2018] [Indexed: 12/31/2022]
|
13
|
Suzuki Y, Wang Y, Au KF, Morishita S. A Statistical Method for Observing Personal Diploid Methylomes and Transcriptomes with Single-Molecule Real-Time Sequencing. Genes (Basel) 2018; 9:E460. [PMID: 30235838 PMCID: PMC6162384 DOI: 10.3390/genes9090460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/15/2018] [Revised: 09/12/2018] [Accepted: 09/12/2018] [Indexed: 11/16/2022] Open
Abstract
We address the problem of observing personal diploid methylomes, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single-molecule real-time (SMRT) sequencing is promising as it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ∼15%. To overcome this issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined the GNAS complex locus, known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation patterns almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes.
Affiliation(s)
- Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 277-8561, Japan.
| | - Yunhao Wang
- Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
| | - Kin Fai Au
- Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
- Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210, USA.
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 277-8561, Japan.
| |
|
14
|
Meyer KN, Lacey MR. Modeling Methylation Patterns with Long Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1379-1389. [PMID: 28682263 DOI: 10.1109/tcbb.2017.2721943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Variation in cytosine methylation at CpG dinucleotides is often observed in genomic regions, and analysis typically focuses on estimating the proportion of methylated sites observed in a given region and comparing these levels across samples to determine association with conditions of interest. While sites are tacitly treated as independent, when observed at the level of individual molecules methylation patterns exhibit strong evidence of local spatial dependence. We previously developed a neighboring sites model to account for correlation and clustering behavior observed in two tandem repeat regions in a collection of ovarian carcinomas. We now introduce extensions of the model that account for the effect of distance between sites as well as asymmetric correlation in de novo methylation and demethylation rates. We apply our models to published data from a whole genome bisulfite sequencing experiment using long reads, estimating model parameters for a selection of CpG-dense regions spanning between 21 and 67 sites. Our methods detect evidence of local spatial correlation as a function of site-to-site distance and demonstrate the added value of employing long read sequencing data in epigenetic research.
|
15
|
Smeets E, Lynch AG, Prekovic S, Van den Broeck T, Moris L, Helsen C, Joniau S, Claessens F, Massie CE. The role of TET-mediated DNA hydroxymethylation in prostate cancer. Mol Cell Endocrinol 2018; 462:41-55. [PMID: 28870782 DOI: 10.1016/j.mce.2017.08.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/22/2017] [Revised: 06/30/2017] [Accepted: 08/31/2017] [Indexed: 10/18/2022]
Abstract
Ten-eleven translocation (TET) proteins are recently characterized dioxygenases that regulate demethylation by oxidizing 5-methylcytosine to 5-hydroxymethylcytosine and further derivatives. The recent finding that 5hmC is also a stable and independent epigenetic modification indicates that these proteins play an important role in diverse physiological and pathological processes such as neural and tumor development. Both the genomic distribution of (hydroxy)methylation and the expression and activity of TET proteins are dysregulated in a wide range of cancers including prostate cancer. Up to now it is still unknown how changes in TET and 5(h)mC profiles are related to the pathogenesis of prostate cancer. In this review, we explore recent advances in the current understanding of how TET expression and function are regulated in development and cancer. Furthermore, we look at the impact on 5hmC in prostate cancer and the potential underlying mechanisms. Finally, we tried to summarize the latest techniques for detecting and quantifying global and locus-specific 5hmC levels of genomic DNA.
Affiliation(s)
- E Smeets
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium.
| | - A G Lynch
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - S Prekovic
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - T Van den Broeck
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Department of Urology, University Hospitals Leuven, Campus Gasthuisberg, Leuven, Belgium
| | - L Moris
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Department of Urology, University Hospitals Leuven, Campus Gasthuisberg, Leuven, Belgium
| | - C Helsen
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - S Joniau
- Department of Urology, University Hospitals Leuven, Campus Gasthuisberg, Leuven, Belgium
| | - F Claessens
- Molecular Endocrinology Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - C E Massie
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| |
|
16
|
Ichikawa K, Tomioka S, Suzuki Y, Nakamura R, Doi K, Yoshimura J, Kumagai M, Inoue Y, Uchida Y, Irie N, Takeda H, Morishita S. Centromere evolution and CpG methylation during vertebrate speciation. Nat Commun 2017. [PMID: 29184138 DOI: 10.1038/s41467-017-01982-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and undergo speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744Mbp with a resource of twenty-two centromeric regions of length 20-345kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320-350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation sensitive SMRT reads, we uncover centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
| | - Shingo Tomioka
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
| | - Yuta Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
| | - Ryohei Nakamura
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Koichiro Doi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
| | - Jun Yoshimura
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
| | - Masahiko Kumagai
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Yusuke Inoue
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Yui Uchida
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Naoki Irie
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Hiroyuki Takeda
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.
| | - Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan.
| |
|
17
|
Centromere evolution and CpG methylation during vertebrate speciation. Nat Commun 2017; 8:1833. [PMID: 29184138 PMCID: PMC5705604 DOI: 10.1038/s41467-017-01982-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 06/16/2017] [Accepted: 10/31/2017] [Indexed: 11/10/2022] Open
Abstract
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and undergo speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744Mbp with a resource of twenty-two centromeric regions of length 20–345kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320–350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation sensitive SMRT reads, we uncover centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here Ichikawa et al perform de novo long-read genome assembly of three inbred medaka strains, and report long-range structure of centromeres and their methylation as well as correlation of structural variants with differential gene expression.
|
18
|
Kapusta A, Suh A. Evolution of bird genomes-a transposon's-eye view. Ann N Y Acad Sci 2016; 1389:164-185. [DOI: 10.1111/nyas.13295] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 06/05/2015] [Revised: 10/06/2016] [Accepted: 10/11/2016] [Indexed: 02/06/2023]
Affiliation(s)
- Aurélie Kapusta
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah
| | - Alexander Suh
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
| |
|