1
Ono Y, Hamada M, Asai K. PBSIM3: a simulator for all types of PacBio and ONT long reads. NAR Genom Bioinform 2022; 4:lqac092. [PMID: 36465498 PMCID: PMC9713900 DOI: 10.1093/nargab/lqac092] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/10/2022] [Revised: 11/02/2022] [Accepted: 11/12/2022] [Indexed: 12/03/2022] Open
Abstract
Long-read sequencers, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencers, have improved their read length and accuracy, thereby opening up unprecedented research. Many tools and algorithms have been developed to analyze long reads, and rapid progress in PacBio and ONT has further accelerated their development. Together with the development of high-throughput sequencing technologies and their analysis tools, many read simulators have been developed and effectively utilized. PBSIM is one of the popular long-read simulators. In this study, we developed PBSIM3 with three new functions: error models for long reads, multi-pass sequencing for high-fidelity read simulation and transcriptome sequencing simulation. Therefore, PBSIM3 is now able to meet a wide range of long-read simulation requirements.
Affiliation(s)
- Yukiteru Ono
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 63-520, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Institute for Medical-Oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
  - Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo, 113-8602, Japan
- Kiyoshi Asai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-ku, 135-0064 Tokyo, Japan
2
Reduced NCOR2 expression accelerates androgen deprivation therapy failure in prostate cancer. Cell Rep 2021; 37:110109. [PMID: 34910907 PMCID: PMC8889623 DOI: 10.1016/j.celrep.2021.110109] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/08/2021] [Revised: 09/21/2021] [Accepted: 11/17/2021] [Indexed: 01/27/2023] Open
Abstract
This study addresses the roles of nuclear receptor corepressor 2 (NCOR2) in prostate cancer (PC) progression in response to androgen deprivation therapy (ADT). Reduced NCOR2 expression significantly associates with shorter disease-free survival in patients with PC receiving adjuvant ADT. Utilizing the CWR22 xenograft model, we demonstrate that stably reduced NCOR2 expression accelerates disease recurrence following ADT, associates with gene expression patterns that include neuroendocrine features, and induces DNA hypermethylation. Stably reduced NCOR2 expression in isogenic LNCaP (androgen-sensitive) and LNCaP-C4–2 (androgen-independent) cells revealed that NCOR2 reduction phenocopies the impact of androgen treatment and induces global DNA hypermethylation patterns. NCOR2 genomic binding is greatest in LNCaP-C4–2 cells and most clearly associates with forkhead box (FOX) transcription factor FOXA1 binding. NCOR2 binding significantly associates with transcriptional regulation most when in active enhancer regions. These studies reveal robust roles for NCOR2 in regulating the PC transcriptome and epigenome and underscore recent mutational studies linking NCOR2 loss of function to PC disease progression. Long et al. show that reduced levels of NCOR2 lead to accelerated prostate cancer recurrence during androgen withdrawal in a patient-derived xenograft model. NCOR2 reduction is characterized by incomplete response to androgen withdrawal, and recurrent tumors show increased neuroendocrine traits. These phenotypic changes are associated with hypermethylated enhancers.
3
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/19/2022] Open
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
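The partition-and-label output described in this abstract can be illustrated with a minimal sketch. It assumes a SAGA method has already assigned one label to each fixed-width genomic bin and only shows how runs of identical labels collapse into segments. The function name `labels_to_segments`, the 200 bp bin width, and the example labels are illustrative assumptions, not taken from the review.

```python
from itertools import groupby


def labels_to_segments(labels, bin_size=200):
    """Collapse per-bin labels into (start, end, label) segments.

    `labels` holds one label per consecutive genomic bin; `bin_size` is a
    hypothetical bin width in base pairs. Coordinates are half-open.
    """
    segments = []
    pos = 0  # index of the first bin in the current run
    for label, run in groupby(labels):
        n_bins = sum(1 for _ in run)
        segments.append((pos * bin_size, (pos + n_bins) * bin_size, label))
        pos += n_bins
    return segments


if __name__ == "__main__":
    # Hypothetical labels: "E" for enhancer-like, "P" for promoter-like bins.
    for segment in labels_to_segments(["E", "E", "P", "P", "P", "E"]):
        print(segment)
```

Each returned tuple is one segment, so positions sharing a label within a run end up in the same annotated interval.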
Affiliation(s)
- Rachel C. W. Chan
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
4
Ono Y, Asai K, Hamada M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 2021; 37:589-595. [PMID: 32976553 PMCID: PMC8097687 DOI: 10.1093/bioinformatics/btaa835] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 06/22/2020] [Revised: 08/20/2020] [Accepted: 09/11/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. Results: To capture the characteristics of errors in reads from long-read sequencers, we introduce a generative model for quality scores that uses a hidden Markov model together with a recent model selection method, the factorized information criterion. We evaluated the simulator from several angles, and the results indicate that it simulates reads that are consistent with real reads. Availability and implementation: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yukiteru Ono
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Kiyoshi Asai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
  - Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
5
Hardison RC, Zhang Y, Keller CA, Xiang G, Heuston EF, An L, Lichtenberg J, Giardine BM, Bodine D, Mahony S, Li Q, Yue F, Weiss MJ, Blobel GA, Taylor J, Hughes J, Higgs DR, Göttgens B. Systematic integration of GATA transcription factors and epigenomes via IDEAS paints the regulatory landscape of hematopoietic cells. IUBMB Life 2020; 72:27-38. [PMID: 31769130 PMCID: PMC6972633 DOI: 10.1002/iub.2195] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/02/2019] [Accepted: 10/17/2019] [Indexed: 01/15/2023]
Abstract
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.
Affiliation(s)
- Ross C. Hardison
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Yu Zhang
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Cheryl A. Keller
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Guanjue Xiang
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Elisabeth F. Heuston
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Lin An
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Jens Lichtenberg
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Belinda M. Giardine
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- David Bodine
  - Genetics and Molecular Biology Branch, Hematopoiesis Section, National Institutes of Health, NHGRI, Bethesda, MD
- Shaun Mahony
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Qunhua Li
  - Departments of Biochemistry and Molecular Biology and of Statistics, The Pennsylvania State University, University Park, PA
- Feng Yue
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA
- Mitchell J. Weiss
  - Hematology Department, St. Jude Children's Research Hospital, Memphis, TN
- James Taylor
  - Departments of Biology and of Computer Science, Johns Hopkins University, Baltimore, MD
- Jim Hughes
  - Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Douglas R. Higgs
  - Laboratory of Gene Regulation, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Berthold Göttgens
  - Department of Hematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
6
Girgis HZ, Velasco A, Reyes ZE. HebbPlot: an intelligent tool for learning and visualizing chromatin mark signatures. BMC Bioinformatics 2018; 19:310. [PMID: 30176808 PMCID: PMC6122555 DOI: 10.1186/s12859-018-2312-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/12/2017] [Accepted: 08/14/2018] [Indexed: 12/11/2022] Open
Abstract
Background: Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions. Results: To this end, we developed a software tool, HebbPlot, which utilizes a Hebbian neural network to learn a general chromatin signature from regions with a common function. Hebbian networks can learn the associations between tens of marks and thousands of regions. HebbPlot presents a signature as a digital image, which can be easily interpreted. Moreover, signatures produced by HebbPlot can be compared quantitatively. We validated HebbPlot in six case studies. The results of these case studies are either novel or confirm results already reported in the literature, indicating the accuracy of HebbPlot. Our results indicate that promoters have a directional chromatin signature; several marks tend to stretch downstream or upstream. H3K4me3 and H3K79me2 have clear directional distributions around active promoters. In addition, the signatures of high- and low-CpG promoters are different; H3K4me3, H3K9ac, and H3K27ac are the most different marks. When we studied the signatures of enhancers active in eight tissues, we observed that these signatures are similar, but not identical. Further, we identified some histone modifications (H3K36me3, H3K79me1, H3K79me2, and H4K8ac) that are associated with coding regions of active genes. Other marks (H4K12ac, H3K14ac, H3K27me3, and H2AK5ac) were found to be weakly associated with coding regions of inactive genes. Conclusions: This study resulted in a novel software tool, HebbPlot, for learning and visualizing the chromatin signature of a genetic element. Using HebbPlot, we produced a visual catalog of the signatures of multiple genetic elements in 57 cell types available through the Roadmap Epigenomics Project. Furthermore, we made progress toward a functional catalog consisting of 22 histone marks. In sum, HebbPlot is applicable to a wide array of studies, facilitating the deciphering of the histone code.
Affiliation(s)
- Hani Z. Girgis
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104-9700, USA
- Alfredo Velasco
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104-9700, USA
- Zachary E. Reyes
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104-9700, USA
7
Abstract
Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 d.
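The model described in this abstract can be sketched in a few lines. The example below is an illustrative toy, not ChromHMM's code: it assumes that each mark is binarized per genomic bin, that a state emits each mark as an independent Bernoulli variable, and that "the most probable state" is read as per-bin posterior decoding (forward-backward). All function names and parameter values are hypothetical.

```python
def emission_prob(obs, mark_probs):
    """P(obs | state) for a binary mark vector, marks treated as independent."""
    prob = 1.0
    for present, p in zip(obs, mark_probs):
        prob *= p if present else (1.0 - p)
    return prob


def posterior_states(observations, init, trans, emit):
    """Return the state with the highest posterior probability in each bin.

    init[s]     : initial probability of state s
    trans[i][j] : probability of moving from state i to state j
    emit[s][m]  : probability that mark m is present in state s
    """
    n_states, n_bins = len(init), len(observations)
    b = [[emission_prob(o, emit[s]) for s in range(n_states)]
         for o in observations]

    # Forward pass, rescaled at every bin to avoid numerical underflow.
    fwd = []
    cur = [init[s] * b[0][s] for s in range(n_states)]
    for t in range(n_bins):
        if t > 0:
            cur = [b[t][j] * sum(fwd[t - 1][i] * trans[i][j]
                                 for i in range(n_states))
                   for j in range(n_states)]
        total = sum(cur)
        fwd.append([v / total for v in cur])

    # Backward pass with the same kind of rescaling.
    bwd = [[1.0] * n_states for _ in range(n_bins)]
    for t in range(n_bins - 2, -1, -1):
        cur = [sum(trans[i][j] * b[t + 1][j] * bwd[t + 1][j]
                   for j in range(n_states))
               for i in range(n_states)]
        total = sum(cur)
        bwd[t] = [v / total for v in cur]

    # The posterior is proportional to forward * backward in each bin.
    return [max(range(n_states), key=lambda s: fwd[t][s] * bwd[t][s])
            for t in range(n_bins)]


if __name__ == "__main__":
    # Two hypothetical states over two marks: state 0 is mark-rich, state 1 mark-poor.
    emit = [[0.9, 0.8], [0.1, 0.1]]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    init = [0.5, 0.5]
    bins = [(1, 1), (1, 1), (0, 0), (0, 0)]
    print(posterior_states(bins, init, trans, emit))
```

In this toy input the first two bins carry both marks and the last two carry none, so the decoded states switch once between the mark-rich and mark-poor state. Parameter learning is omitted here; the parameters are simply given.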
8
Siu C, Wiseman S, Gakkhar S, Heravi-Moussavi A, Bilenky M, Carles A, Sierocinski T, Tam A, Zhao E, Kasaian K, Moore RA, Mungall AJ, Walker B, Thomson T, Marra MA, Hirst M, Jones SJM. Characterization of the human thyroid epigenome. J Endocrinol 2017; 235:153-165. [PMID: 28808080 DOI: 10.1530/joe-17-0145] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/02/2017] [Accepted: 08/14/2017] [Indexed: 12/15/2022]
Abstract
The thyroid gland, necessary for normal human growth and development, functions as an essential regulator of metabolism by the production and secretion of appropriate levels of thyroid hormone. However, assessment of abnormal thyroid function may be challenging, suggesting that a more fundamental understanding of normal function is needed. One way to characterize normal gland function is to study the epigenome and resulting transcriptome within its constituent cells. This study generates the first published reference epigenomes for human thyroid from four individuals using ChIP-seq and RNA-seq. We profiled six histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3, H3K27me3), identified chromatin states using a hidden Markov model, produced a novel quantitative metric for model selection and established epigenomic maps of 19 chromatin states. We found that epigenetic features characterizing promoters and transcription elongation tend to be more consistent than regions characterizing enhancers or Polycomb-repressed regions and that epigenetically active genes consistent across all epigenomes tend to have higher expression than those not marked as epigenetically active in all epigenomes. We also identified a set of 18 genes epigenetically active and consistently expressed in the thyroid that are likely highly relevant to thyroid function. Altogether, these epigenomes represent a powerful resource to develop a deeper understanding of the underlying molecular biology of thyroid function and provide contextual information of thyroid and human epigenomic data for comparison and integration into future studies.
Affiliation(s)
- Celia Siu
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
  - Department of Sciences, University of British Columbia, Vancouver, Canada
- Sam Wiseman
  - Department of Surgery, St. Paul's Hospital & University of British Columbia, Vancouver, Canada
- Sitanshu Gakkhar
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Misha Bilenky
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Annaick Carles
  - Department of Microbiology & Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Thomas Sierocinski
  - Department of Microbiology & Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Angela Tam
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Eric Zhao
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Katayoon Kasaian
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Richard A Moore
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Andrew J Mungall
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
- Blair Walker
  - Department of Pathology and Laboratory Medicine, St. Paul's Hospital & University of British Columbia, Vancouver, Canada
- Thomas Thomson
  - Department of Pathology and Laboratory Medicine, BC Cancer Agency & University of British Columbia, Vancouver, Canada
- Marco A Marra
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Martin Hirst
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
  - Department of Microbiology & Immunology, Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, Canada
  - Department of Molecular Biology & Biochemistry, Simon Fraser University, Burnaby, Canada
9
Liu Q, Bonneville R, Li T, Jin VX. Transcription factor-associated combinatorial epigenetic pattern reveals higher transcriptional activity of TCF7L2-regulated intragenic enhancers. BMC Genomics 2017; 18:375. [PMID: 28499350 PMCID: PMC5429574 DOI: 10.1186/s12864-017-3764-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/17/2017] [Accepted: 05/03/2017] [Indexed: 01/24/2023] Open
Abstract
Background: Recent studies have suggested that combinations of multiple epigenetic modifications are essential for controlling gene expression. Although numerous computational approaches have been developed to decipher the combinatorial epigenetic patterns or “epigenetic code”, none of them has explicitly addressed the relationship between a specific transcription factor (TF) and the patterns. Methods: Here, we developed a novel computational method, T-cep, for annotating chromatin states associated with a specific TF. T-cep is composed of three key consecutive modules: (i) data preprocessing, (ii) HMM training, and (iii) potential TF-states calling. Results: We evaluated T-cep on TCF7L2-omics data. Unexpectedly, our method uncovered a novel set of TCF7L2-regulated intragenic enhancers missed by other software tools, where the associated genes show the highest gene expression. We further used siRNA knockdown, co-transfection, RT-qPCR and luciferase reporter assays not only to validate the accuracy and efficiency of prediction by T-cep, but also to confirm the functionality of TCF7L2-regulated enhancers in both MCF7 and PANC1 cells. Conclusions: Our study reveals, for the first time at a genome-wide scale, the enhanced transcriptional activity of cell-type-specific TCF7L2 intragenic enhancers in regulating gene expression. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3764-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qi Liu
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
  - College of Life Science, Jilin University, Changchun, 130012, China
- Russell Bonneville
  - Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH, 43210, USA
- Tianbao Li
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
  - College of Life Science, Jilin University, Changchun, 130012, China
- Victor X Jin
  - Department of Molecular Medicine, University of Texas Health Science Center, 8403 Floyd Curl, San Antonio, TX, 78229, USA
10
Zhang Y, An L, Yue F, Hardison RC. Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Res 2016; 44:6721-31. [PMID: 27095202 PMCID: PMC5772166 DOI: 10.1093/nar/gkw278] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 12/09/2015] [Accepted: 04/06/2016] [Indexed: 12/16/2022] Open
Abstract
Advanced sequencing technologies have generated a plethora of data for many chromatin marks in multiple tissues and cell types, yet there is a lack of a generalized tool for making optimal use of those data. A major challenge is to quantitatively model the epigenetic dynamics across both the genome and many cell types for understanding their impacts on differential gene regulation and disease. We introduce IDEAS, an integrative and discriminative epigenome annotation system, for jointly characterizing epigenetic landscapes in many cell types and detecting differential regulatory regions. A key distinction between our method and existing state-of-the-art algorithms is that IDEAS integrates epigenomes of many cell types simultaneously in a way that preserves the position-dependent and cell type-specific information at fine scales, thereby greatly improving segmentation accuracy and producing comparable annotations across cell types.
Affiliation(s)
- Yu Zhang
  - Dept. of Statistics, Penn State University, 325 Thomas Building, University Park, PA 16803, USA
- Lin An
  - Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, Penn State University, 101 Huck Life Sciences Building, University Park, PA 16802, USA
- Feng Yue
  - Dept. of Biochemistry and Molecular Biology, Penn State School of Medicine, 500 University Drive, MC H171, Hershey, PA 17033, USA
- Ross C Hardison
  - Dept. of Biochemistry and Molecular Biology, Penn State University, 304 Wartik Laboratory, University Park, PA 16802, USA
11
Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep 2016; 6:21824. [PMID: 26903168 PMCID: PMC4763184 DOI: 10.1038/srep21824] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/16/2015] [Accepted: 02/01/2016] [Indexed: 12/31/2022] Open
Abstract
Rare-variant association testing usually requires some method of aggregation. The next important step is to pinpoint individual rare causal variants among a large number of variants within a genetic region. Recently, Ionita-Laza et al. proposed a backward elimination (BE) procedure that can identify individual causal variants among the many variants in a gene. The BE procedure removes a variant if excluding this variant leads to a smaller P-value for the BURDEN test (referred to as "BE-BURDEN") or the SKAT test (referred to as "BE-SKAT"). Here we use the adaptive combination of P-values (ADA) method to pinpoint causal variants. Unlike most gene-based association tests, the ADA statistic is built upon per-site P-values of individual variants. It is straightforward to select important variants given the optimal P-value truncation threshold found by ADA. We performed comprehensive simulations to compare ADA with BE-SKAT and BE-BURDEN. Ranking these three approaches according to positive predictive value (PPV), the percentage of truly causal variants among the total selected variants, we found ADA > BE-SKAT > BE-BURDEN across all simulation scenarios. We therefore recommend using ADA to pinpoint plausible rare causal variants in a gene.
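The selection step and the PPV criterion in this abstract reduce to simple arithmetic, sketched below. This is not the ADA implementation: the threshold is simply given rather than optimized, the strict `<` comparison is an assumption, and the variant names and P-values are made up for illustration.

```python
def select_variants(p_values, threshold):
    """Keep variants whose per-site P-value falls below the truncation threshold.

    `p_values` maps a variant identifier to its per-site P-value. The strict
    inequality is an assumption made for this sketch.
    """
    return {variant for variant, p in p_values.items() if p < threshold}


def positive_predictive_value(selected, causal):
    """Fraction of selected variants that are truly causal.

    Returns None when nothing was selected, because the ratio is undefined.
    """
    if not selected:
        return None
    return len(selected & causal) / len(selected)


if __name__ == "__main__":
    # Hypothetical per-site P-values and a hypothetical set of causal variants.
    p_values = {"v1": 0.01, "v2": 0.20, "v3": 0.04, "v4": 0.60}
    causal = {"v1", "v2"}
    chosen = select_variants(p_values, threshold=0.05)
    print(sorted(chosen), positive_predictive_value(chosen, causal))
```

In a simulation study such as the one described, the causal set is known by construction, which is what makes the PPV computable.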
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan