1
Ibrahim A, Jahanifar M, Wahab N, Toss MS, Makhlouf S, Atallah N, Lashen AG, Katayama A, Graham S, Bilal M, Bhalerao A, Ahmed Raza SE, Snead D, Minhas F, Rajpoot N, Rakha E. Artificial Intelligence-Based Mitosis Scoring in Breast Cancer: Clinical Application. Mod Pathol 2024; 37:100416. [PMID: 38154653 DOI: 10.1016/j.modpat.2023.100416] [Received: 06/16/2023] [Revised: 10/27/2023] [Accepted: 12/14/2023] [Indexed: 12/30/2023]
Abstract
In recent years, artificial intelligence (AI) has demonstrated exceptional performance in mitosis identification and quantification. However, the implementation of AI in clinical practice needs to be evaluated against existing methods. This study aimed to assess the optimal method of using AI-based mitotic figure scoring in breast cancer (BC). We used whole slide images from a large BC cohort with extended follow-up, comprising a discovery set (n = 1715) and a validation set (n = 859) (Nottingham cohort). The Cancer Genome Atlas breast invasive carcinoma (TCGA-BRCA) cohort (n = 757) served as an external test set. Using automated mitosis detection, we assessed the mitotic count with 3 different methods: the mitotic count per tumor area (MCT; the number of mitotic figures divided by the total tumor area), the mitotic index (MI; the average number of mitotic figures per 1000 malignant cells), and the mitotic activity index (MAI; the number of mitotic figures in a 3 mm² area within the mitotic hotspot). These automated metrics were evaluated and compared based on their correlation with the well-established visual scoring method of the Nottingham grading system, Ki67 score, clinicopathologic parameters, and patient outcomes. AI-based mitotic scores derived from all 3 methods (MCT, MI, and MAI) were significantly correlated with clinicopathologic characteristics and patient survival (P < .001). However, the mitotic counts and the derived cutoffs varied significantly between the 3 methods. Only MAI and MCT were positively correlated with the gold standard visual scoring method used in the Nottingham grading system (r = 0.8 and r = 0.7, respectively) and with Ki67 scores (r = 0.69 and r = 0.55, respectively), and MAI was the only independent predictor of survival (P < .05) in multivariate Cox regression analysis. For clinical applications, the optimum method of scoring mitosis using AI needs to be considered.
MAI can provide reliable and reproducible results and can accurately quantify mitotic figures in BC.
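The three scores defined above differ mainly in their denominators: the whole tumor area (MCT), the malignant cell count (MI), or a fixed 3 mm² hotspot window (MAI). The following is a minimal Python sketch. It assumes that detections are already available as centroid coordinates in millimetres and that the tumor area and malignant cell count come from upstream segmentation. The `Mitosis` type and all function names are hypothetical. The hotspot search, which uses circles of 3 mm² centred on each detection, is a simple approximation chosen for illustration and is not the authors' published procedure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitosis:
    """Hypothetical detector output: mitotic figure centroid in slide coordinates (mm)."""
    x_mm: float
    y_mm: float


def mitotic_count_per_tumor_area(mitoses: list[Mitosis], tumor_area_mm2: float) -> float:
    """MCT: number of mitotic figures divided by the total tumor area (per mm^2)."""
    if tumor_area_mm2 <= 0:
        raise ValueError("tumor area must be positive")
    return len(mitoses) / tumor_area_mm2


def mitotic_index(mitoses: list[Mitosis], malignant_cell_count: int) -> float:
    """MI: number of mitotic figures per 1000 malignant cells."""
    if malignant_cell_count <= 0:
        raise ValueError("malignant cell count must be positive")
    return 1000 * len(mitoses) / malignant_cell_count


def mitotic_activity_index(mitoses: list[Mitosis], hotspot_area_mm2: float = 3.0) -> int:
    """MAI: mitotic figures inside the densest window of the given area.

    Assumption: candidate hotspots are circles of area `hotspot_area_mm2`
    centred on each detection, and the maximum count over them is returned.
    """
    radius = math.sqrt(hotspot_area_mm2 / math.pi)
    best = 0
    for centre in mitoses:
        # Count detections (including the centre itself) inside this circle.
        count = sum(
            1
            for m in mitoses
            if math.hypot(m.x_mm - centre.x_mm, m.y_mm - centre.y_mm) <= radius
        )
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Hypothetical detections: three clustered figures and one isolated figure.
    detections = [Mitosis(0.0, 0.0), Mitosis(0.5, 0.0), Mitosis(0.0, 0.5), Mitosis(10.0, 10.0)]
    print(mitotic_count_per_tumor_area(detections, tumor_area_mm2=20.0))
    print(mitotic_index(detections, malignant_cell_count=2000))
    print(mitotic_activity_index(detections))
```

Because candidate circles are centred only on detections, this search can undercount compared with an exhaustive scan over all window positions. A production pipeline would also restrict windows to tumor regions.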
Affiliation(s)
- Asmaa Ibrahim
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom; Department of Pathology, Faculty of Medicine, Suez Canal University, Egypt
- Mostafa Jahanifar
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Noorul Wahab
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Michael S Toss
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom; Histopathology Department, Royal Hallamshire Hospital, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Shorouk Makhlouf
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Nehal Atallah
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Ayat G Lashen
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Ayaka Katayama
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Simon Graham
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Mohsin Bilal
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Abhir Bhalerao
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Shan E Ahmed Raza
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- David Snead
- Cellular Pathology, University Hospitals Coventry and Warwickshire NHS Trust, United Kingdom
- Fayyaz Minhas
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Nasir Rajpoot
- Tissue Image Analytics Centre, University of Warwick, United Kingdom
- Emad Rakha
- Academic Unit for Translational Medical Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom; Nottingham University Hospitals NHS Trust, Nottingham, United Kingdom; Pathology Department, Hamad Medical Corporation, Doha, Qatar

2
Monroy-Eklund A, Taylor C, Weidmann CA, Burch C, Laederach A. Structural analysis of MALAT1 long noncoding RNA in cells and in evolution. RNA (New York, N.Y.) 2023; 29:691-704. [PMID: 36792358 PMCID: PMC10159000 DOI: 10.1261/rna.079388.122] [Received: 07/29/2022] [Accepted: 02/02/2023] [Indexed: 05/06/2023]
Abstract
Although not canonically polyadenylated, the long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is stabilized by a highly conserved 76-nt triple helix structure at its 3' end. The entire MALAT1 transcript is over 8000 nt long in humans. The strongest structural conservation signal in MALAT1 (as measured by covariation of base pairs) is in the triple helix structure. However, covariation analysis of the primary sequence alone does not reveal the degree of structural conservation across the full-length transcript. Furthermore, RNA structure is often context dependent; RNA binding proteins that are differentially expressed in different cell types may alter structure. We investigate here the in-cell and cell-free structures of the full-length human and green monkey (Chlorocebus sabaeus) MALAT1 transcripts in multiple tissue-derived cell lines using SHAPE chemical probing. Our data reveal uniform levels of structural conservation across different cell lines, in cells and cell-free, and even between species, despite significant differences in primary sequence. The uniformity of the structural conservation across the entire transcript suggests that, although covariation signals appear only in the triple helix junction of the lncRNA, the rest of the transcript's structure is remarkably conserved, at least in primates and across multiple cell types and conditions.
Affiliation(s)
- Anais Monroy-Eklund
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Colin Taylor
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Chase A Weidmann
- Department of Biological Chemistry, University of Michigan Medical School, Center for RNA Biomedicine, Rogel Cancer Center, Ann Arbor, Michigan 48109, USA
- Christina Burch
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA

3
Halvorsen SC, Benita Y, Hopton M, Hoppe B, Gunnlaugsson HO, Korgaonkar P, Vanderburg CR, Nielsen GP, Trepanowski N, Cheah JH, Frosch MP, Schwab JH, Rosenberg AE, Hornicek FJ, Sassi S. Transcriptional Profiling Supports the Notochordal Origin of Chordoma and Its Dependence on a TGFB1-TBXT Network. The American Journal of Pathology 2023; 193:532-547. [PMID: 36804377 DOI: 10.1016/j.ajpath.2023.01.014] [Received: 08/20/2022] [Revised: 12/23/2022] [Accepted: 01/26/2023] [Indexed: 02/19/2023]
Abstract
Chordoma is a rare malignant tumor demonstrating notochordal differentiation. It is dependent on brachyury (TBXT), a hallmark notochordal gene and transcription factor, and shares histologic features and the same anatomic location as the notochord. In this study, we perform a molecular comparison of chordoma and notochord to identify dysregulated cellular pathways. The lack of a molecular reference from appropriate control tissue limits our understanding of chordoma and its relationship to notochord. Accordingly, we conducted an unbiased comparison of chordoma, human notochord, and an atlas of normal and cancerous tissue using gene expression profiling to clarify the chordoma/notochord relationship and potentially identify novel drug targets. We found striking consistency in gene expression profiles between chordoma and notochord, supporting the hypothesis that chordoma develops from notochordal remnants. We identified a 12-gene diagnostic chordoma signature and found that the TBXT/transforming growth factor (TGF)-β/SOX6/SOX9 pathway is hyperactivated in the tumor, suggesting that pathways associated with chondrogenesis are a central driver of chordoma development. Experimental validation in chordoma cells confirms these findings and emphasizes the dependence of chordoma proliferation and survival on TGF-β. Our computational and experimental evidence provides the first molecular connection between notochord and chordoma and identifies core members of a chordoma regulatory pathway involving TBXT. This pathway provides new therapeutic targets for this unique malignant neoplasm and highlights TGF-β as a prime druggable candidate.
Affiliation(s)
- Stefan C Halvorsen
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Yair Benita
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Megan Hopton
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Brooke Hoppe
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Hilmar Orn Gunnlaugsson
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Parimal Korgaonkar
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Charles R Vanderburg
- Harvard NeuroDiscovery Center, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts
- G Petur Nielsen
- Department of Pathology, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts
- Nicole Trepanowski
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts
- Jaime H Cheah
- High Throughput Sciences Facility, Koch Institute of MIT, Cambridge, Massachusetts
- Matthew P Frosch
- C.S. Kubik Laboratory for Neuropathology, Massachusetts General Hospital, Charlestown, Massachusetts
- Joseph H Schwab
- Department of Orthopedic Surgery, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts
- Andrew E Rosenberg
- Department of Pathology, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts
- Francis J Hornicek
- Department of Orthopedic Surgery, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts
- Slim Sassi
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, Massachusetts; Department of Orthopedic Surgery, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts

4
Liu Y, Zhu ZX, Zboinski EK, Qiu W, Lian J, Liu S, Van Dyke TE, Johansson HE, Tu Q, Luo E, Chen JJ. Long non-coding RNA APDC plays important regulatory roles in metabolism of bone and adipose tissues. RNA Biol 2023; 20:836-846. [PMID: 37953645 PMCID: PMC10653663 DOI: 10.1080/15476286.2023.2268489] [Accepted: 08/23/2023] [Indexed: 11/14/2023]
Abstract
The long noncoding RNA (lncR) ANRIL in the human genome is an established genetic risk factor for atherosclerosis, periodontitis, diabetes, and cancer. However, the regulatory role of lncR-ANRIL in bone and adipose tissue metabolism remains unclear. To elucidate the function of lncRNA ANRIL in a mouse model, we investigated its ortholog, AK148321 (referred to as lncR-APDC), located on chr4 of the mouse genome, which is hypothesized to have biological functions similar to those of ANRIL. We first showed that lncR-APDC in mouse bone marrow cells (BMSCs) and lncR-ANRIL in human osteoblasts (hFOBs) are both increased during early osteogenesis. We then examined osteogenic, adipogenic, and osteoclastogenic function in lncR-APDC deletion and overexpression cell models. In vivo, we compared the bone and adipose tissue phenotypes of APDC-KO and wild-type mice. Our findings demonstrated that lncR-APDC deficiency impaired osteogenesis while promoting adipogenesis and osteoclastogenesis. Conversely, lncR-APDC overexpression stimulated osteogenesis but impaired adipogenesis and osteoclastogenesis. Furthermore, KDM6B was downregulated with lncR-APDC deficiency and upregulated with its overexpression. Through binding-site analysis, we identified miR-99a as a potential target of lncR-APDC. The results suggest that lncR-APDC exerts its osteogenic function via miR-99a/KDM6B/Hox pathways. Additionally, lncR-APDC mediated the osteoclasto-osteogenic imbalance through MAPK/p38 and TLR4/MyD88 activation. These findings highlight the pivotal role of lncR-APDC as a key regulator of bone and fat tissue metabolism and suggest its therapeutic potential for addressing imbalances in osteogenesis, adipogenesis, and osteoclastogenesis.
Affiliation(s)
- Yao Liu
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Zoe Xiaofang Zhu
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Elissa K. Zboinski
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Wei Qiu
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Department of Stomatology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Junxiang Lian
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Stomatological Hospital, Southern Medical University, Guangzhou, Guangdong, China
- Shibo Liu
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Thomas E. Van Dyke
- Center for Clinical and Translational Research, The Forsyth Institute, Cambridge, MA, USA
- Department of Oral Medicine, Infection, and Immunity, Faculty of Medicine, Harvard University, Boston, MA, USA
- Hans E. Johansson
- Research and Development, LGC Biosearch Technologies, Petaluma, CA, USA
- Qisheng Tu
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- En Luo
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Jake Jinkun Chen
- Division of Oral Biology, Tufts University School of Dental Medicine, Boston, MA, USA
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, MA, USA

5
Burskaia V, Naumenko S, Schelkunov M, Bedulina D, Neretina T, Kondrashov A, Yampolsky L, Bazykin GA. Excessive Parallelism in Protein Evolution of Lake Baikal Amphipod Species Flock. Genome Biol Evol 2020; 12:1493-1503. [PMID: 32653919 DOI: 10.1093/gbe/evaa138] [Accepted: 07/03/2020] [Indexed: 11/12/2022]
Abstract
Repeated emergence of similar adaptations is often explained by parallel evolution of the underlying genes. However, evidence of parallel evolution at the amino acid level is limited. When the analyzed species are highly divergent, this can be due to epistatic interactions underlying the dynamic nature of amino acid preferences: the same amino acid substitution may have different phenotypic effects on different genetic backgrounds. Distantly related species also often inhabit radically different environments, which makes the emergence of parallel adaptations less likely. Here, we hypothesize that parallel molecular adaptations are more prevalent between closely related species. We analyze the rate of parallel evolution in genome-size sets of orthologous genes in three groups of species with widely ranging levels of divergence: 46 species of the relatively recent Lake Baikal amphipod radiation, a species flock of very closely related cichlids, and a set of significantly more divergent vertebrates. Strikingly, in amphipod genes, the rate of parallel substitutions at nonsynonymous sites exceeded that at synonymous sites, suggesting rampant selection driving parallel adaptation. At sites of parallel substitutions, intraspecies polymorphism is low, suggesting that parallelism has been driven by positive selection and is therefore adaptive. By contrast, in cichlids, the rate of nonsynonymous parallel evolution was similar to that at synonymous sites, whereas in vertebrates it was lower than that at synonymous sites, indicating that in these groups parallel substitutions are mainly fixed by drift.
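The comparison above depends on normalising counts of parallel substitutions by the number of available sites in each class, so that nonsynonymous and synonymous rates can be compared directly. The sketch below illustrates only that normalisation, using hypothetical counts. The function names are invented for illustration, and the authors' phylogeny-based estimator is not reproduced here.

```python
def per_site_rate(parallel_substitutions: int, sites: float) -> float:
    """Parallel substitutions per site of one class (hypothetical simplification)."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return parallel_substitutions / sites


def parallelism_ratio(par_nonsyn: int, nonsyn_sites: float,
                      par_syn: int, syn_sites: float) -> float:
    """Ratio of nonsynonymous to synonymous per-site parallel substitution rates.

    Following the abstract's interpretation: a ratio above 1 points to selection
    driving parallelism, while a ratio at or below 1 is consistent with drift.
    """
    syn_rate = per_site_rate(par_syn, syn_sites)
    if syn_rate == 0:
        raise ValueError("no synonymous parallel substitutions; ratio undefined")
    return per_site_rate(par_nonsyn, nonsyn_sites) / syn_rate


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    print(parallelism_ratio(30, 1000, 5, 250))
```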
Affiliation(s)
- Valentina Burskaia
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Sergey Naumenko
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Mikhail Schelkunov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- Daria Bedulina
- Institute of Biology, Irkutsk State University, Russia
- Baikal Research Centre, Irkutsk, Russia
- Tatyana Neretina
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- N.A. Pertsov White Sea Biological Station, Lomonosov Moscow State University, Primorskiy, Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Russia
- Alexey Kondrashov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Russia
- Department of Ecology and Evolutionary Biology, University of Michigan
- Lev Yampolsky
- Department of Biological Sciences, East Tennessee State University
- Georgii A Bazykin
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia

6
Choudhury R, Singh S, Arumugam S, Roguev A, Stewart AF. The Set1 complex is dimeric and acts with Jhd2 demethylation to convey symmetrical H3K4 trimethylation. Genes Dev 2019; 33:550-564. [PMID: 30842216 PMCID: PMC6499330 DOI: 10.1101/gad.322222.118] [Received: 11/02/2018] [Accepted: 02/15/2019] [Indexed: 12/19/2022]
Abstract
Epigenetic modifications can maintain or alter the inherent symmetry of the nucleosome. However, the mechanisms that deposit and/or propagate symmetry or asymmetry are not understood. Here we report that yeast Set1C/COMPASS (complex of proteins associated with Set1) is dimeric and, consequently, symmetrically trimethylates histone 3 Lys4 (H3K4me3) on promoter nucleosomes. Mutation of the dimer interface to make Set1C monomeric abolished H3K4me3 on most promoters. The most active promoters, particularly those involved in the oxidative phase of the yeast metabolic cycle, displayed H3K4me2, which is normally excluded from active promoters, and a subset of these also displayed H3K4me3. In wild-type yeast, deletion of the sole H3K4 demethylase, Jhd2, has no effect. However, in monomeric Set1C yeast, Jhd2 deletion increased H3K4me3 levels on the H3K4me2 promoters. Notably, the association of Set1C with the elongating polymerase was not perturbed by monomerization. These results imply that symmetrical H3K4 methylation is an embedded consequence of Set1C dimerism and that Jhd2 demethylates asymmetric H3K4me3. Consequently, rather than methylation and demethylation acting in opposition as logic would suggest, a dimeric methyltransferase and monomeric demethylase cooperate to eliminate asymmetry and focus symmetrical H3K4me3 onto selected nucleosomes. This presents a new paradigm for the establishment of epigenetic detail.
Affiliation(s)
- Rupam Choudhury
- Genomics, Biotechnology Center, Center for Molecular and Cellular Bioengineering, University of Technology Dresden, 01307 Dresden, Germany
- Sukhdeep Singh
- Genomics, Biotechnology Center, Center for Molecular and Cellular Bioengineering, University of Technology Dresden, 01307 Dresden, Germany
- Senthil Arumugam
- European Molecular Biology Laboratory Australia Node for Single Molecule Science, ARC Centre of Excellence in Advanced Molecular Imaging, School of Medical Sciences, University of New South Wales, Sydney 2052, Australia
- Assen Roguev
- Genomics, Biotechnology Center, Center for Molecular and Cellular Bioengineering, University of Technology Dresden, 01307 Dresden, Germany; Department of Cellular and Molecular Pharmacology, University of California at San Francisco, San Francisco, California 94518, USA
- A Francis Stewart
- Genomics, Biotechnology Center, Center for Molecular and Cellular Bioengineering, University of Technology Dresden, 01307 Dresden, Germany

7
Phenotypic Nonspecificity as the Result of Limited Specificity of Transcription Factor Function. Genetics Research International 2018; 2018:7089109. [PMID: 30510805 PMCID: PMC6230420 DOI: 10.1155/2018/7089109] [Received: 06/12/2018] [Accepted: 10/09/2018] [Indexed: 11/18/2022]
Abstract
Drosophila transcription factor (TF) function is phenotypically nonspecific. Phenotypic nonspecificity is defined as one phenotype being induced or rescued by multiple TFs. To explain this unexpected result, a hypothetical world of limited specificity is explored where all TFs have unique random distributions along the genome due to low information content of DNA sequence recognition and somewhat promiscuous cooperative interactions with other TFs. Transcription is an emergent property of these two conditions. From this model, explicit predictions are made. First, many more cases of TF nonspecificity are expected when examined. Second, the genetic analysis of regulatory sequences should uncover cis-element bypass and, third, genetic analysis of TF function should generally uncover differential pleiotropy. In addition, limited specificity provides evolutionary opportunity and explains the inefficiency of expression analysis in identifying genes required for biological processes.

8
Butkiewicz M, Blue EE, Leung YY, Jian X, Marcora E, Renton AE, Kuzma A, Wang LS, Koboldt DC, Haines JL, Bush WS. Functional annotation of genomic variants in studies of late-onset Alzheimer's disease. Bioinformatics 2018; 34:2724-2731. [PMID: 29590295 PMCID: PMC6084586 DOI: 10.1093/bioinformatics/bty177] [Received: 08/03/2017] [Revised: 03/17/2018] [Accepted: 03/23/2018] [Indexed: 01/01/2023]
Abstract
Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied.
Results: In this work, we outline an annotation process motivated by the Alzheimer's Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information, and assess the potential impact of changing genomic builds on the annotation process. While these factors affect only a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%).
Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mariusz Butkiewicz
- Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Elizabeth E Blue
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Xueqiu Jian
- Division of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center, Houston, TX, USA
- Edoardo Marcora
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alan E Renton
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Amanda Kuzma
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan L Haines
- Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
- Department of Population and Quantitative Health Sciences, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA

9
Khor SS, Morino R, Nakazono K, Kamitsuji S, Akita M, Kawajiri M, Yamasaki T, Kami A, Hoshi Y, Tada A, Ishikawa K, Hine M, Kobayashi M, Kurume N, Kamatani N, Tokunaga K, Johnson TA. Genome-wide association study of self-reported food reactions in Japanese identifies shrimp and peach specific loci in the HLA-DR/DQ gene region. Sci Rep 2018; 8:1069. [PMID: 29348432 PMCID: PMC5773682 DOI: 10.1038/s41598-017-18241-w] [Received: 09/07/2017] [Accepted: 12/07/2017] [Indexed: 12/20/2022]
Abstract
Food allergy is an increasingly important health problem worldwide. Several genome-wide association studies (GWAS) of European-ancestry samples have identified food allergy-specific loci in the HLA class II region. We conducted GWAS of self-reported reactivity to common foods using data from 11,011 Japanese women and identified shrimp and peach allergy-specific loci in the HLA-DR/DQ gene region, tagged by rs74995702 (P = 6.30 × 10⁻¹⁷, OR = 1.91) and rs28359884 (P = 2.3 × 10⁻¹², OR = 1.80), respectively. After HLA imputation using a Japanese population-specific reference, the most strongly associated haplotype was HLA-DRB1*04:05-HLA-DQB1*04:01 for shrimp allergy (P = 3.92 × 10⁻¹⁹, OR = 1.99) and HLA-DRB1*09:01-HLA-DQB1*03:03 for peach allergy (P = 1.15 × 10⁻⁷, OR = 1.68). Additionally, the variants associated with each allergy were eQTLs for several HLA genes, with HLA-DQA2 the single eQTL gene shared between the two traits. Our study suggests that allergy to certain foods may be related to genetic differences that tag both HLA alleles with particular epitope-binding specificities and variants modulating the expression of particular HLA genes. Investigating this further could increase our understanding of food allergy aetiology and potentially lead to better therapeutic strategies for allergen immunotherapies.
Affiliation(s)
- Seik-Soon Khor
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
- Ryoko Morino
- EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Tatsuya Yamasaki
- Life Science Group, Healthcare Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Azusa Kami
- EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Yuria Hoshi
- Life Science Group, Healthcare Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Asami Tada
- EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Maaya Hine
- LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Miki Kobayashi
- LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Nami Kurume
- LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Katsushi Tokunaga
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan

10
Kober KM, Pogson GH. Genome-wide signals of positive selection in strongylocentrotid sea urchins. BMC Genomics 2017; 18:555. [PMID: 28732465 PMCID: PMC5521101 DOI: 10.1186/s12864-017-3944-7] [Received: 04/04/2017] [Accepted: 07/13/2017] [Indexed: 12/21/2022]
Abstract
Background Comparative genomics studies investigating the signals of positive selection among groups of closely related species are still rare and limited in taxonomic breadth. Such studies show great promise in advancing our knowledge about the proportion and the identity of genes experiencing diversifying selection. However, methodological challenges have led to high levels of false positives in past studies. Here, we use the well-annotated genome of the purple sea urchin, Strongylocentrotus purpuratus, as a reference to investigate the signals of positive selection at 6520 single-copy orthologs from nine sea urchin species belonging to the family Strongylocentrotidae paying careful attention to minimizing false positives. Results We identified 1008 (15.5%) candidate positive selection genes (PSGs). Tests for positive selection along the nine terminal branches of the phylogeny identified 824 genes that showed lineage-specific adaptive diversification (1.67% of branch-sites tests performed). Positively selected codons were not enriched at exon borders or near regions containing missing data, suggesting a limited contribution of false positives caused by alignment or annotation errors. Alignments were validated at 10 loci with re-sequencing using Sanger methods. No differences were observed in the rates of synonymous substitution (dS), GC content, and codon bias between the candidate PSGs and those not showing positive selection. However, the candidate PSGs had 68% higher rates of nonsynonymous substitution (dN) and 33% lower levels of heterozygosity, consistent with selective sweeps and opposite to that expected by a relaxation of selective constraint. Although positive selection was identified at reproductive proteins and innate immunity genes, the strongest signals of adaptive diversification were observed at extracellular matrix proteins, cell adhesion molecules, membrane receptors, and ion channels. 
Many candidate PSGs have been widely implicated as targets of pathogen binding, inactivation, mimicry, or exploitation in other groups (notably mammals). Conclusions Our study confirmed the widespread action of positive selection across sea urchin genomes and allowed us to reject the possibility that annotation and alignment errors (including paralogs) were responsible for creating false signals of adaptive molecular divergence. The candidate PSGs identified in our study represent promising targets for future research into the selective agents responsible for their adaptive diversification and their contribution to speciation. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3944-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kord M Kober
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, USA; Institute for Computational Health Sciences, University of California, San Francisco, USA; Present address: Department of Physiological Nursing, University of California, San Francisco, USA.
- Grant H Pogson
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, USA
11
Tyner C, Barber GP, Casper J, Clawson H, Diekhans M, Eisenhart C, Fischer CM, Gibson D, Gonzalez JN, Guruvadoo L, Haeussler M, Heitner S, Hinrichs AS, Karolchik D, Lee BT, Lee CM, Nejad P, Raney BJ, Rosenbloom KR, Speir ML, Villarreal C, Vivian J, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 2017; 45:D626-D634. [PMID: 27899642 PMCID: PMC5210591 DOI: 10.1093/nar/gkw1134] [Citation(s) in RCA: 197] [Impact Index Per Article: 28.1] [Received: 09/14/2016] [Revised: 10/17/2016] [Accepted: 10/31/2016] [Indexed: 12/14/2022] Open
Abstract
Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.
Affiliation(s)
- Cath Tyner
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clayton M Fischer
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Gibson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Steve Heitner
- Emory University School of Medicine, Atlanta, GA 30322, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Chris Villarreal
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- John Vivian
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
12
Abstract
Methylation of cytosine at the C5 position (m5C) is a common but poorly understood RNA modification that can be detected by sequencing bisulfite-treated transcripts (RNA-BSseq). In this chapter, we discuss computational RNA-BSseq data analysis methods for the transcriptome-wide identification and quantification of m5C.
Affiliation(s)
- Dietmar Rieder
- Division of Bioinformatics, Biocenter, Medical University of Innsbruck, Innrain 80/IV, Innsbruck, 6020, Austria.
- Francesca Finotello
- Division of Bioinformatics, Biocenter, Medical University of Innsbruck, Innrain 80/IV, Innsbruck, 6020, Austria
13
Inferring Heterozygosity from Ancient and Low Coverage Genomes. Genetics 2016; 205:317-332. [PMID: 27821432 PMCID: PMC5223511 DOI: 10.1534/genetics.116.189985] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 04/01/2016] [Accepted: 10/19/2016] [Indexed: 12/30/2022] Open
Abstract
While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from sequences with average coverage <1× of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence, except for the initial alignment of the sequencing data, and takes into account both variable sequencing errors and potential postmortem damage. It is thus also applicable to nonmodel organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to accurately infer recalibration parameters in the presence of postmortem damage. This method does not require knowledge about the underlying genome sequence, but instead works with haploid data (e.g., from the X-chromosome from mammalian males) and integrates over the unknown genotypes. Using extensive simulations we show that a few megabasepairs of haploid data are sufficient for accurate recalibration, even at average coverages as low as 1×. At similar coverages, our method also produces very accurate estimates of heterozygosity down to 10−4 within windows of about 1 Mbp. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples, and we found that 3000–5000-year-old samples showed diversity patterns comparable to those of modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes. 
Interestingly, these distributions were also very different between the two samples, supporting earlier conclusions of a highly diverse and structured population in Europe prior to the arrival of farming.
14
Lucotte EA, Laurent R, Heyer E, Ségurel L, Toupance B. Detection of Allelic Frequency Differences between the Sexes in Humans: A Signature of Sexually Antagonistic Selection. Genome Biol Evol 2016; 8:1489-500. [PMID: 27189992 PMCID: PMC4898804 DOI: 10.1093/gbe/evw090] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022] Open
Abstract
Sexually antagonistic (SA) selection, a form of selection that can occur when the two sexes have different fitness optima for a trait, is a major force shaping the evolution of organisms. A seminal model developed by Rice (Rice WR. 1984. Sex chromosomes and the evolution of sexual dimorphism. Evolution 38:735-742.) predicts that the X chromosome should be a hotspot for the accumulation of loci under SA selection as compared with the autosomes. Here, we propose a methodological framework designed to detect a specific signature of SA selection on viability: differences in allelic frequencies between the sexes. Applying this method to genome-wide single nucleotide polymorphism (SNP) data in human populations where no sex-specific population stratification could be detected, we show that there are overall significantly more SNPs exhibiting differences in allelic frequencies between the sexes on the X chromosome than on the autosomes, supporting the predictions of Rice's model. This pattern is consistent across populations and is robust to correction for potential biases such as differences in linkage disequilibrium, sample size, and genotyping errors between chromosomes. Although SA selection is not the only factor resulting in allelic frequency differences between the sexes, we further show that at least part of the X-linked signal identified is caused by such sex-specific processes.
Affiliation(s)
- Elise A Lucotte
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Romain Laurent
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Evelyne Heyer
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Laure Ségurel
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- Bruno Toupance
- Eco-Anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
15
Nettling M, Treutler H, Cerquides J, Grosse I. Detecting and correcting the binding-affinity bias in ChIP-seq data using inter-species information. BMC Genomics 2016; 17:347. [PMID: 27165633 PMCID: PMC4862171 DOI: 10.1186/s12864-016-2682-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/15/2015] [Accepted: 04/28/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. ChIP-seq has become the major technology to uncover genomic regions containing those binding sites, but motifs predicted by traditional computational approaches using these data are distorted by a ubiquitous binding-affinity bias. Here, we present an approach for detecting and correcting this bias using inter-species information. RESULTS We find that the binding-affinity bias caused by the ChIP-seq experiment in the reference species is stronger than the indirect binding-affinity bias in orthologous regions from phylogenetically related species. We use this difference to develop a phylogenetic footprinting model that is capable of detecting and correcting the binding-affinity bias. We find that this model improves motif prediction and that the corrected motifs are typically softer than those predicted by traditional approaches. CONCLUSIONS These findings indicate that motifs published in databases and in the literature are artificially sharpened compared to the native motifs. These findings also indicate that our current understanding of transcriptional gene regulation might be blurred, but that it is possible to advance this understanding by taking into account inter-species information available today and even more in the future.
Affiliation(s)
- Martin Nettling
- Institute of Computer Science, Martin Luther University, Halle (Saale), Germany.
- Ivo Grosse
- Institute of Computer Science, Martin Luther University, Halle (Saale), Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
16
Ren J, Song K, Deng M, Reinert G, Cannon CH, Sun F. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics. Bioinformatics 2016; 32:993-1000. [PMID: 26130573 PMCID: PMC6169497 DOI: 10.1093/bioinformatics/btv395] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/11/2015] [Revised: 03/11/2015] [Accepted: 06/25/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. Because NGS reads are generally short, it is challenging to assemble them and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count-based alignment-free sequence comparison is a promising approach, but it depends critically on the underlying expected word counts. A plausible model for the distribution of word counts is obtained by modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of an MC and its transition probability matrix. As NGS data do not provide a single long sequence, inference methods for the Markovian properties of single long sequences cannot be directly applied to NGS short read data. RESULTS Here we derive a normal approximation for such word counts. We also show that the traditional chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results and that clustering based on an MC of the estimated order gives a plausible clustering of the species. AVAILABILITY AND IMPLEMENTATION Our implementation of the statistics developed here is available as R package 'NGS.MC' at http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html CONTACT fsun@usc.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jie Ren
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
- Kai Song
- School of Mathematical Sciences, Peking University, Beijing, China
- Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China
- Gesine Reinert
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
- Charles H Cannon
- Department of Biological Sciences, Texas Tech University, TX 79409-3131, USA; Xishuangbanna Tropical Botanic Garden, Chinese Academy of Sciences, Yunnan, China
- Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA; Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
17
Defective histone supply causes changes in RNA polymerase II elongation rate and cotranscriptional pre-mRNA splicing. Proc Natl Acad Sci U S A 2015; 112:14840-5. [PMID: 26578803 DOI: 10.1073/pnas.1506760112] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Indexed: 12/21/2022] Open
Abstract
RNA polymerase II (RNAPII) transcription elongation is a highly regulated process that greatly influences mRNA levels as well as pre-mRNA splicing. Despite many studies in vitro, how chromatin modulates RNAPII elongation in vivo is still unclear. Here, we show that a decrease in the level of available canonical histones leads to more accessible chromatin with decreased levels of canonical histones and variants H2A.X and H2A.Z and increased levels of H3.3. With this altered chromatin structure, the RNAPII elongation rate increases, and the kinetics of pre-mRNA splicing is delayed with respect to RNAPII elongation. Consistent with the kinetic model of cotranscriptional splicing, the rapid RNAPII elongation induced by histone depletion promotes the skipping of variable exons in the CD44 gene. Indeed, a slowly elongating mutant of RNAPII was able to rescue this defect, indicating that the defective splicing induced by histone depletion is a direct consequence of the increased elongation rate. In addition, genome-wide analysis evidenced that histone reduction promotes widespread alterations in pre-mRNA processing, including intron retention and changes in alternative splicing. Our data demonstrate that pre-mRNA splicing may be regulated by chromatin structure through the modulation of the RNAPII elongation rate.
18
Rustagi Y, Jaiswal HK, Rawal K, Kundu GC, Rani V. Comparative Characterization of Cardiac Development Specific microRNAs: Fetal Regulators for Future. PLoS One 2015; 10:e0139359. [PMID: 26465880 PMCID: PMC4605649 DOI: 10.1371/journal.pone.0139359] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/24/2015] [Accepted: 09/10/2015] [Indexed: 11/18/2022] Open
Abstract
MicroRNAs (miRNAs) are small, conserved RNAs known to regulate several biological processes by influencing gene expression in eukaryotes. The role of miRNAs as an additional regulatory layer during heart development and disease has only recently been explored. However, no study to date has profiled miRNAs across heart development, and very few miRNAs have been reported in the cardiac context. In addition, integrating large-scale experimental data with computational and comparative approaches remains an unsolved challenge. The present study was designed to identify the microRNAs implicated in heart development using next-generation sequencing, bioinformatics and experimental approaches. We sequenced six small RNA libraries prepared from different developmental stages of the heart, using chicken as a model system, to produce millions of short sequence reads. We detected 353 known and 703 novel miRNAs involved in heart development. Of the 1056 microRNAs identified in total, 32.7% of the known microRNAs displayed differential expression, whereas seven well-studied microRNAs, namely let-7, miR-140, miR-181, miR-30, miR-205, miR-103 and miR-22, were found to be conserved throughout heart development. The 3'UTR sequences of genes were screened from the Gallus gallus genome for potential microRNA targets. The target mRNAs appeared to be enriched in genes related to cell cycle, apoptosis, signaling pathways, extracellular remodeling, metabolism, chromatin remodeling and transcriptional regulation. Our study presents the first comprehensive overview of microRNA profiling during heart development, together with a prediction of possible cardiac-specific targets, and has considerable potential for the future development of microRNA-based therapeutics against cardiac pathologies in which fetal gene re-expression is observed in the adult heart.
Affiliation(s)
- Yashika Rustagi
- Department of Biotechnology, Jaypee Institute of Information Technology, A–10, Sector–62, Noida, 201307, Uttar Pradesh, India
- Hitesh K. Jaiswal
- Department of Biotechnology, Jaypee Institute of Information Technology, A–10, Sector–62, Noida, 201307, Uttar Pradesh, India
- Kamal Rawal
- Department of Biotechnology, Jaypee Institute of Information Technology, A–10, Sector–62, Noida, 201307, Uttar Pradesh, India
- Gopal C. Kundu
- Laboratory of Tumor Biology, Angiogenesis and Nanomedicine Research, National Centre for Cell Science (NCCS), Pune 411007, India
- Vibha Rani
- Department of Biotechnology, Jaypee Institute of Information Technology, A–10, Sector–62, Noida, 201307, Uttar Pradesh, India
19
Zheng WW, Dong XM, Yin RH, Xu FF, Ning HM, Zhang MJ, Xu CW, Yang Y, Ding YL, Wang ZD, Zhao WB, Tang LJ, Chen H, Wang XH, Zhan YQ, Yu M, Ge CH, Li CY, Yang XM. EDAG positively regulates erythroid differentiation and modifies GATA1 acetylation through recruiting p300. Stem Cells 2015; 32:2278-89. [PMID: 24740910 DOI: 10.1002/stem.1723] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 11/05/2013] [Revised: 03/03/2014] [Accepted: 03/24/2014] [Indexed: 11/11/2022]
Abstract
Erythroid differentiation-associated gene (EDAG) has been considered to be a transcriptional regulator that controls hematopoietic cell differentiation, proliferation, and apoptosis. The role of EDAG in erythroid differentiation of primary erythroid progenitor cells and in vivo remains unknown. In this study, we found that EDAG is highly expressed in CMPs and MEPs and upregulated during the erythroid differentiation of CD34(+) cells following erythropoietin (EPO) treatment. Overexpression of EDAG induced erythroid differentiation of CD34(+) cells in vitro and in vivo using immunodeficient mice. Conversely, EDAG knockdown reduced erythroid differentiation in EPO-treated CD34(+) cells. Detailed mechanistic analysis suggested that EDAG forms a complex with GATA1 and p300 and increases GATA1 acetylation and transcriptional activity by facilitating the interaction between GATA1 and p300. EDAG deletion mutants lacking the binding domain for GATA1 or p300 failed to enhance erythroid differentiation, suggesting that EDAG regulates erythroid differentiation partly through forming an EDAG/GATA1/p300 complex. In the presence of C646, a specific inhibitor of p300 acetyltransferase activity, EDAG was unable to accelerate erythroid differentiation, indicating an involvement of p300 acetyltransferase activity in EDAG-induced erythroid differentiation. ChIP-PCR experiments confirmed that GATA1 and EDAG co-occupy GATA1-targeted genes in primary erythroid cells and in vivo. ChIP-seq was further performed to examine the global occupancy of EDAG during erythroid differentiation, and a total of 7,133 enrichment peaks corresponding to 3,847 genes were identified. Merging the EDAG ChIP-seq and GATA1 ChIP-seq datasets revealed 782 overlapping genes. Microarray analysis suggested that EDAG knockdown selectively inhibits GATA1-activated target genes. These data provide novel insights into the role of EDAG in the regulation of erythroid differentiation.
Affiliation(s)
- Wei-Wei Zheng
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing, People's Republic of China
20
Absence of canonical marks of active chromatin in developmentally regulated genes. Nat Genet 2015; 47:1158-1167. [PMID: 26280901 PMCID: PMC4625605 DOI: 10.1038/ng.3381] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 06/02/2015] [Accepted: 07/22/2015] [Indexed: 12/13/2022]
Abstract
The interplay of active and repressive histone modifications is assumed to have a key role in the regulation of gene expression. In contrast to this generally accepted view, we show that the transcription of genes temporally regulated during fly and worm development occurs in the absence of canonically active histone modifications. Conversely, strong chromatin marking is related to transcriptional and post-transcriptional stability, an association that we also observe in mammals. Our results support a model in which chromatin marking is associated with the stable production of RNA, whereas unmarked chromatin would permit rapid gene activation and deactivation during development. In the latter case, regulation by transcription factors would have a comparatively more important regulatory role than chromatin marks.
21
Chelaru F, Corrada Bravo H. Epiviz: a view inside the design of an integrated visual analysis software for genomics. BMC Bioinformatics 2015; 16 Suppl 11:S4. [PMID: 26328750 PMCID: PMC4559604 DOI: 10.1186/1471-2105-16-s11-s4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023] Open
Abstract
Background Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results In this paper we expand on the design decisions behind Epiviz and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps toward collaborative data analysis by incorporating user plugins from source control providers and by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present the security implications of the current design, as well as a series of limitations and future research steps. Conclusions Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a record of our own approaches and lessons learned and as a starting point for future efforts in the same direction by the genomics community.
22
Magbanua MJM, Wolf DM, Yau C, Davis SE, Crothers J, Au A, Haqq CM, Livasy C, Rugo HS, Esserman L, Park JW, van 't Veer LJ. Serial expression analysis of breast tumors during neoadjuvant chemotherapy reveals changes in cell cycle and immune pathways associated with recurrence and response. Breast Cancer Res 2015; 17:73. [PMID: 26021444 PMCID: PMC4479083 DOI: 10.1186/s13058-015-0582-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 01/27/2015] [Accepted: 05/13/2015] [Indexed: 12/20/2022] Open
Abstract
INTRODUCTION The molecular biology involving neoadjuvant chemotherapy (NAC) response is poorly understood. To elucidate the impact of NAC on the breast cancer transcriptome and its association with clinical outcome, we analyzed gene expression data derived from serial tumor samples of patients with breast cancer who received NAC in the I-SPY 1 TRIAL. METHODS Expression data were collected before treatment (T1), 24-96 hours after initiation of chemotherapy (T2) and at surgery (TS). Expression levels between T1 and T2 (T1 vs. T2; n = 36) and between T1 and TS (T1 vs. TS; n = 39) were compared. Subtype was assigned using the PAM50 gene signature. Differences in early gene expression changes (T2 - T1) between responders and nonresponders, as defined by residual cancer burden, were evaluated. Cox proportional hazards modeling was used to identify genes in residual tumors associated with recurrence-free survival (RFS). Pathway analysis was performed with Ingenuity software. RESULTS When we compared expression profiles at T1 vs. T2 and at T1 vs. TS, we detected significantly altered expression of 150 and 59 transcripts, respectively. We observed notable downregulation of proliferation and immune-related genes at T2. Lower concordance in subtype assignment was observed between T1 and TS (62%) than between T1 and T2 (75%). Analysis of early gene expression changes (T2 - T1) revealed that decreased expression of cell cycle inhibitors was associated with poor response. Increased interferon signaling (TS - T1) and high expression of cell proliferation genes in residual tumors (TS) were associated with reduced RFS. CONCLUSIONS Serial gene expression analysis revealed candidate immune and proliferation pathways associated with response and recurrence. Larger studies incorporating the approach described here are warranted to identify predictive and prognostic biomarkers in the NAC setting for specific targeted therapies.
CLINICAL TRIAL REGISTRATION ClinicalTrials.gov identifier: NCT00033397. Registered 9 Apr 2002.
Affiliation(s)
- Mark Jesus M Magbanua
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Division of Hematology/Oncology, University of California San Francisco, Box 1387, 2340 Sutter Street, San Francisco, CA, 94115, USA.
- Denise M Wolf
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA.
- Christina Yau
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Surgery, University of California San Francisco, San Francisco, CA, USA; Buck Institute for Research on Aging, Novato, CA, USA.
- Sarah E Davis
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Surgery, University of California San Francisco, San Francisco, CA, USA.
- Julia Crothers
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Division of Hematology/Oncology, University of California San Francisco, Box 1387, 2340 Sutter Street, San Francisco, CA, 94115, USA.
- Alfred Au
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Surgery, University of California San Francisco, San Francisco, CA, USA.
- Christopher M Haqq
- Department of Urology, University of California San Francisco, San Francisco, CA, USA.
- Chad Livasy
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA.
- Hope S Rugo
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Division of Hematology/Oncology, University of California San Francisco, Box 1387, 2340 Sutter Street, San Francisco, CA, 94115, USA.
- Laura Esserman
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Surgery, University of California San Francisco, San Francisco, CA, USA.
- John W Park
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Division of Hematology/Oncology, University of California San Francisco, Box 1387, 2340 Sutter Street, San Francisco, CA, 94115, USA.
- Laura J van 't Veer
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA.
23
Wang N, Lu SF, Chen H, Wang JF, Fu SP, Hu CJ, Yang Y, Liang FR, Zhu BM. A protocol of histone modification-based mechanistic study of acupuncture in patients with stable angina pectoris. BMC Complement Altern Med 2015; 15:139. [PMID: 25925670 PMCID: PMC4465012 DOI: 10.1186/s12906-015-0653-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/03/2015] [Accepted: 04/15/2015] [Indexed: 12/20/2022]
Abstract
Background Angina pectoris (angina) is a medical condition related to myocardial ischemia. Although acupuncture has been widely accepted as a clinical approach for angina, there is insufficient evidence of its effectiveness against this syndrome, and its mechanisms have not yet been well elucidated. We developed this protocol to confirm the clinical efficacy of electro-acupuncture on stable angina pectoris by needling at acupoint Neiguan (PC6). Furthermore, we employ high-throughput sequencing technology to investigate gene expression profiles and to determine the involvement of histone modifications in the regulation of genes after electro-acupuncture treatment. Methods/Design A randomized, controlled, double-blinded (assessors and patients) trial will be carried out. Sixty participants will be randomly assigned to two acupuncture treatment groups and one control group in a 1:1:1 ratio. Participants in the acupuncture groups will receive 12 sessions of electro-acupuncture treatment across 4 weeks, followed by a 12-week randomization period. The acupuncture groups are needled either at Neiguan (PC6) on the Pericardium Meridian of Hand-jueyin or at a non-acupoint. The primary clinical outcome is the frequency of angina attacks in these groups during the four weeks after randomization. RNA is extracted from peripheral neutrophils collected from all participants on day 0, day 30, and week 16, and processed for RNA-Seq. We then investigate profiles of histone modifications by ChIP-Seq, for methylation of H3 lysine 4 (H3K4me) and acetylation of H3 lysine 27 (H3K27ac), in the presence or absence of acupuncture treatment. Discussion This study will determine the efficacy and mechanisms of electro-acupuncture on stable angina pectoris.
We focus on the effectiveness of acupuncture in alleviating symptoms of myocardial ischemia, and on gene regulation and the chromatin remodeling marks, including H3K4me1, H3K4me2, and H3K27ac, which could be key factors in regulating the gene expression changes caused by electro-acupuncture treatment at Neiguan. This is the first genome-wide study of electro-acupuncture treatment in angina patients, and it will provide valuable information for future studies of acupuncture and its underlying mechanisms. Fourteen patients have been recruited since recruitment opened in November 2012. This study is scheduled to end in November 2014. Trial registration ChiCTR-TRC-12002668
24
Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat Biotechnol 2015; 33:364-76. [PMID: 25690853 PMCID: PMC4512306 DOI: 10.1038/nbt.3157] [Citation(s) in RCA: 226] [Impact Index Per Article: 25.1] [Received: 05/13/2014] [Accepted: 02/02/2015] [Indexed: 12/31/2022]
Abstract
With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
25
Global transcriptome analysis and enhancer landscape of human primary T follicular helper and T effector lymphocytes. Blood 2014; 124:3719-29. [PMID: 25331115 DOI: 10.1182/blood-2014-06-582700] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 12/21/2022]
Abstract
T follicular helper (Tfh) cells are a subset of CD4(+) T helper cells that migrate into germinal centers and promote B-cell maturation into memory B and plasma cells. Tfh cells are necessary for promotion of protective humoral immunity following pathogen challenge, but when aberrantly regulated, drive pathogenic antibody formation in autoimmunity and undergo neoplastic transformation in angioimmunoblastic T-cell lymphoma and other primary cutaneous T-cell lymphomas. Limited information is available on the expression and regulation of genes in human Tfh cells. Using a fluorescence-activated cell sorting-based strategy, we obtained primary Tfh and non-Tfh T effector cells from tonsils and prepared genome-wide maps of active, intermediate, and poised enhancers determined by chromatin immunoprecipitation-sequencing, with parallel transcriptome analyses determined by RNA sequencing. Tfh cell enhancers were enriched near genes highly expressed in lymphoid cells or involved in lymphoid cell function, with many mapping to sites previously associated with autoimmune disease in genome-wide association studies. A group of active enhancers unique to Tfh cells associated with differentially expressed genes was identified. Fragments from these regions directed expression in reporter gene assays. These data provide a significant resource for studies of T lymphocyte development and differentiation and normal and perturbed Tfh cell function.
26
Yin J, Morrissey ME, Shine L, Kennedy C, Higgins DG, Kennedy BN. Genes and signaling networks regulated during zebrafish optic vesicle morphogenesis. BMC Genomics 2014; 15:825. [PMID: 25266257 PMCID: PMC4190348 DOI: 10.1186/1471-2164-15-825] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 02/09/2014] [Accepted: 09/24/2014] [Indexed: 11/23/2022]
Abstract
BACKGROUND The genetic cascades underpinning vertebrate early eye morphogenesis are poorly understood. One gene family essential for eye morphogenesis encodes the retinal homeobox (Rx) transcription factors. Mutations in the human retinal homeobox gene (RAX) can lead to gross morphological phenotypes ranging from microphthalmia to anophthalmia. Zebrafish rx3 null mutants produce a similar striking eyeless phenotype with an associated expanded forebrain. Thus, we used zebrafish rx3-/- mutants as a model to uncover an Rx3-regulated gene network during early eye morphogenesis. RESULTS Rx3-regulated genes were identified using whole transcriptomic sequencing (RNA-seq) of rx3-/- mutants and morphologically wild-type siblings during optic vesicle morphogenesis. A gene co-expression network was then constructed for the Rx3-regulated genes, identifying gene cross-talk during early eye development. Genes highly connected in the network are hub genes, which tend to exhibit larger expression changes between rx3-/- mutants and siblings with a normal phenotype. Hub genes down-regulated in rx3-/- mutants encompass homeodomain transcription factors and mediators of retinoid signaling, both associated with eye development and known human eye disorders. In contrast, genes up-regulated in rx3-/- mutants are centered on Wnt signaling pathways, associated with brain development and disorders. The temporal expression pattern of Rx3-regulated genes was further profiled during early development from the maternal stage until visual function is fully mature. Rx3-regulated genes exhibited synchronized expression patterns, and a transition of gene expression during the early segmentation stage when Rx3 was highly expressed. Furthermore, most of these deregulated genes have promoters enriched with multiple RAX-binding motif sequences. CONCLUSIONS Here, we assembled a comprehensive model of Rx3-regulated genes during early eye morphogenesis.
Rx3 promotes optic vesicle morphogenesis and represses brain development through a highly correlated and modulated network, exhibiting repression of genes mediating Wnt signaling and concomitant enhanced expression of homeodomain transcription factors and retinoid-signaling genes.
Affiliation(s)
- Jun Yin
- UCD Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4 Ireland
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520 USA
- Maria E Morrissey
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4 Ireland
- Lisa Shine
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4 Ireland
- Ciarán Kennedy
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4 Ireland
- Desmond G Higgins
- UCD Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4 Ireland
- Breandán N Kennedy
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4 Ireland
27
Epiviz: interactive visual analytics for functional genomics data. Nat Methods 2014; 11:938-40. [PMID: 25086505 PMCID: PMC4149593 DOI: 10.1038/nmeth.3038] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 01/08/2014] [Accepted: 06/12/2014] [Indexed: 12/19/2022]
Abstract
Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.
28
Ning S, Zhao Z, Ye J, Wang P, Zhi H, Li R, Wang T, Wang J, Wang L, Li X. SNP@lincTFBS: an integrated database of polymorphisms in human LincRNA transcription factor binding sites. PLoS One 2014; 9:e103851. [PMID: 25075616 PMCID: PMC4116217 DOI: 10.1371/journal.pone.0103851] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/11/2014] [Accepted: 07/02/2014] [Indexed: 12/15/2022]
Abstract
Large intergenic non-coding RNAs (lincRNAs) are a new class of functional transcripts, and aberrant expression of lincRNAs has been associated with several human diseases. Genetic variants in lincRNA transcription factor binding sites (TFBSs) can change lincRNA expression, thereby affecting susceptibility to human diseases. To identify and annotate these functional candidates, we developed SNP@lincTFBS, a database devoted to the exploration and annotation of single nucleotide polymorphisms (SNPs) in potential TFBSs of human lincRNAs. We identified 6,665 SNPs in 6,614 conserved TFBSs of 2,423 human lincRNAs. In addition, using ChIP-seq data, we identified 139,576 SNPs in 304,517 transcription factor peaks of 4,813 lincRNAs. We also performed comprehensive annotation of these SNPs using 1000 Genomes Project datasets across 11 populations. Moreover, a distinctive feature of SNP@lincTFBS is its collection of disease-associated SNPs in lincRNA TFBSs and of SNPs in the TFBSs of disease-associated lincRNAs. The web interface enables both flexible data searches and downloads. Quick searches can be made by lincRNA name, SNP identifier, or transcription factor name. SNP@lincTFBS advances the identification of disease-associated lincRNA variants and makes it easier to interpret the discrepant expression of lincRNAs. The SNP@lincTFBS database is available at http://bioinfo.hrbmu.edu.cn/SNP_lincTFBS.
Affiliation(s)
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zuxianglan Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingrun Ye
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Peng Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hui Zhi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ronghong Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Tingting Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jianjian Wang
- The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Lihua Wang
- The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
29
Ananda G, Hile SE, Breski A, Wang Y, Kelkar Y, Makova KD, Eckert KA. Microsatellite interruptions stabilize primate genomes and exist as population-specific single nucleotide polymorphisms within individual human genomes. PLoS Genet 2014; 10:e1004498. [PMID: 25033203 PMCID: PMC4102424 DOI: 10.1371/journal.pgen.1004498] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/22/2013] [Accepted: 05/28/2014] [Indexed: 01/01/2023]
Abstract
Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. 
The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression.
AUTHOR SUMMARY Microsatellites are short tandem repeat DNA sequences located throughout the human genome that display a high degree of inter-individual variation. This characteristic makes microsatellites an attractive tool for population genetics and forensics research. Some microsatellites affect gene expression, and mutations within such microsatellites can cause disease. Interruption mutations disrupt the perfect repeated array and are frequently associated with altered disease risk, but they have not been thoroughly studied in human genomes. We identified interrupted mono-, di-, tri- and tetranucleotide MSs (iMS) within individual genomes from African, European, Asian and American population groups. We show that many iMSs, including some within disease-associated genes, are unique to a single population group. By measuring the conservation of microsatellites between human and chimpanzee genomes, we demonstrate that interruptions decrease the probability of microsatellite mutations throughout the genome. We demonstrate that iMSs arise in the human genome by single base changes within the DNA, and provide biochemical data suggesting that these stabilizing changes may be created by error-prone DNA polymerases. Our genome-wide study supports the model in which iMSs act to stabilize individual genomes, and suggests that population-specific differences in microsatellite architecture may be an avenue by which genetic ancestry impacts individual disease risk.
Affiliation(s)
- Guruprasad Ananda
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Suzanne E. Hile
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Amanda Breski
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Yanli Wang
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Yogeshwar Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Kateryna D. Makova
- Department of Biology, Penn State University, University Park, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
- Kristin A. Eckert
- Department of Pathology, Gittlen Cancer Research Foundation, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, United States of America
- Center for Medical Genomics, Penn State University, University Park, Pennsylvania, United States of America
30
Zhang H, Li J, Hou S, Wang G, Jiang M, Sun C, Hu X, Zhuang F, Dai Z, Dai J, Xi JJ. Engineered TAL Effector modulators for the large-scale gain-of-function screening. Nucleic Acids Res 2014; 42:e114. [PMID: 24939900 PMCID: PMC4132705 DOI: 10.1093/nar/gku535] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Recent effective use of TAL Effectors (TALEs) has provided an important approach to the design and synthesis of sequence-specific DNA-binding proteins. However, it is still a challenging task to design and manufacture effective TALE modulators because of the limited knowledge of TALE–DNA interactions. Here we synthesized more than 200 TALE modulators and identified two determining factors of transcription activity in vivo: chromatin accessibility and the distance from the transcription start site. The implementation of these modulators in a gain-of-function screen was successfully demonstrated for four cell lines in migration/invasion assays and thus has broad relevance in this field. Furthermore, a novel TALE–TALE modulator was developed to transcriptionally inhibit target genes. Together, these findings underscore the huge potential of these TALE modulators in the study of gene function, reprogramming of cellular behaviors, and even clinical investigation.
Affiliation(s)
- Hanshuo Zhang
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China
- Juan Li
- Beijing ViewSolid Biotechnology, Beijing 100034, China
- Sha Hou
- Center for Epigenetics and Chromatin, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Gancheng Wang
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China
- Mingjun Jiang
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China
- Changhong Sun
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China
- Xiongbing Hu
- Beijing ViewSolid Biotechnology, Beijing 100034, China
- Zhifei Dai
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China
- Junbiao Dai
- Center for Epigenetics and Chromatin, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Jianzhong Jeff Xi
- Biomedical Engineering Department, College of Engineering, Peking University Yan Nan Yuan 60, Beijing 100871, China; State Key Laboratory of Biomembrane and Membrane Biotechnology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
31
Miranda KC, Bond DT, Levin JZ, Adiconis X, Sivachenko A, Russ C, Brown D, Nusbaum C, Russo LM. Massively parallel sequencing of human urinary exosome/microvesicle RNA reveals a predominance of non-coding RNA. PLoS One 2014; 9:e96094. [PMID: 24816817 PMCID: PMC4015934 DOI: 10.1371/journal.pone.0096094] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 12/16/2013] [Accepted: 04/03/2014] [Indexed: 11/22/2022]
Abstract
Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as a potential source of biomarkers for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment, and the microvesicle inner nucleic acid cargo was analyzed with and without DNase digestion to examine both DNA and RNA sequences contained in microvesicles. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences including LINE, SINE, satellite repeats, and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein coding genes and splice sites encompassing approximately 13,500 of the known 21,892 protein coding genes of the human genome. Analysis of protein coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct as well as genes indicative of the bladder and prostate could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles are potentially functional non-coding RNAs, which play an emerging role in cell regulation.
Affiliation(s)
- Kevin C. Miranda
- Program in Membrane Biology, Division of Nephrology & Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Daniel T. Bond
- Program in Membrane Biology, Division of Nephrology & Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Joshua Z. Levin
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, United States of America
- Xian Adiconis
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, United States of America
- Andrey Sivachenko
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, United States of America
- Carsten Russ
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, United States of America
- Dennis Brown
- Program in Membrane Biology, Division of Nephrology & Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Chad Nusbaum
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, United States of America
- Leileata M. Russo
- Program in Membrane Biology, Division of Nephrology & Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
32
33
Kim YH, Kim TH, Kang SW, Kim HJ, Park SJ, Jeong KH, Kim SK, Lee SH, Ihm CG, Lee TW, Moon JY, Yoon YC, Chung JH. Association between a TGFBR2 gene polymorphism (rs2228048, Asn389Asn) and acute rejection in Korean kidney transplantation recipients. Immunol Invest 2014; 42:285-95. [PMID: 23883197 DOI: 10.3109/08820139.2013.777073] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
Transforming growth factor-β (TGF-β) signal transduction is initiated by TGF-β activation, which results in activation of TGF-β receptor II (TGFBR2). Any quantitative or qualitative changes in TGFBR2 are expected to affect the TGF-β signaling pathway, which occupies a central position in the regulation of cell growth, differentiation, apoptosis, immune reactions, angiogenesis, and extracellular matrix formation. Recent studies have shown that TGF-β1 gene polymorphisms may confer susceptibility to early acute and chronic allograft rejection in kidney transplantation recipients. In this study, we assessed whether polymorphisms of the TGFBR2 gene were associated with susceptibility to kidney transplant rejection. A total of 347 renal allograft recipients transplanted at three centers in Korea were analyzed. Three SNPs (rs764522, rs3087465, rs2228048) of the TGFBR2 gene were genotyped from genomic DNA by direct sequencing. Multiple logistic regression models (codominant, dominant, recessive, and log-additive) were used to estimate odds ratios (ORs), 95% confidence intervals (CIs), and p-values. A total of 63 patients (18%) developed acute rejection (AR). There were no significant differences in age, sex, number of HLA mismatches, cause of renal failure, or immunosuppressant regimen between the AR and non-AR groups. The synonymous SNP rs2228048 was significantly associated with AR (p = 0.020 in the recessive model and p = 0.036 in the log-additive model). The allele frequencies of rs2228048 differed between the AR and non-AR groups (p = 0.026). These results suggest that the synonymous TGFBR2 SNP rs2228048 may be associated with the development of AR in Korean kidney transplantation recipients. Authors Yeong-Hoon Kim and Tae Hee Kim contributed equally to this work and are considered co-first authors.
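The four genetic models named in this abstract differ mainly in how each genotype is encoded before it enters the logistic regression. The following is a minimal sketch of the conventional encodings, assuming a biallelic SNP written as "AA", "Aa", or "aa", with "a" taken as the tested (minor) allele. The function name `encode_genotype` and the allele convention are illustrative assumptions; the sketch does not reproduce the study's analysis or software.

```python
def encode_genotype(genotype, model):
    """Encode a biallelic genotype ('AA', 'Aa', 'aa') as a regression covariate.

    Hypothetical helper for illustration: 'a' is assumed to be the tested
    (minor) allele and 'AA' the reference genotype.
    """
    minor_count = genotype.count("a")  # number of minor-allele copies: 0, 1, or 2
    if model == "log-additive":
        # One covariate; each extra minor allele shifts the log-odds by the same amount.
        return minor_count
    if model == "dominant":
        # Carriers (Aa or aa) versus non-carriers (AA).
        return int(minor_count >= 1)
    if model == "recessive":
        # Minor-allele homozygotes (aa) versus everyone else (AA or Aa).
        return int(minor_count == 2)
    if model == "codominant":
        # Two indicator variables (heterozygote, minor homozygote); AA is the reference.
        return (int(minor_count == 1), int(minor_count == 2))
    raise ValueError(f"unknown genetic model: {model}")


if __name__ == "__main__":
    for model in ("codominant", "dominant", "recessive", "log-additive"):
        print(model, [encode_genotype(g, model) for g in ("AA", "Aa", "aa")])
```

In this sketch, the log-additive, dominant, and recessive codings each use one degree of freedom, and the codominant coding uses two. This is why a single SNP can reach significance under one model (such as recessive) but not under another.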
Affiliation(s)
- Yeong-Hoon Kim
- Department of Nephrology, College of Medicine, Inje University, Busan, Korea
34
Zeng J, Nagrajan HK, Yi SV. Fundamental diversity of human CpG islands at multiple biological levels. Epigenetics 2014; 9:483-91. [PMID: 24419148 PMCID: PMC4121359 DOI: 10.4161/epi.27654] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]
Abstract
CpG islands (CGIs) are commonly used as genomic markers to study the patterns and regulatory consequences of DNA methylation. Interestingly, recent studies reveal substantial diversity among CGIs: long and short CGIs, for example, exhibit contrasting patterns of gene expression complexity and nucleosome occupancy. The evolutionary origins of CGIs are also highly heterogeneous. To systematically evaluate potential diversity among CGIs, and ultimately to illuminate the link between the diversity of CGIs and their epigenetic variation, we analyzed nucleotide-resolution DNA methylation maps (methylomes) of multiple cellular origins. We discover novel ‘clusters’ of CGIs according to their patterns of DNA methylation: the stably hypomethylated CGI cluster (cluster I), the sperm-hypomethylated CGI cluster (cluster II), and the variably methylated CGI cluster (cluster III). These epigenomic CGI clusters are strikingly distinct in multiple biological features, including genomic, evolutionary, and functional characteristics. At the genomic level, CGIs in the stably hypomethylated cluster tend to be longer and harbor many more CpG dinucleotides than CGIs in the other clusters. They are also frequently associated with promoters, while CGIs in clusters II and III mostly reside in intragenic or intergenic regions and exhibit highly tissue-specific DNA methylation. Functional ontology terms and transcriptional profiles co-vary with CGI clusters, indicating that the regulatory functions of CGIs are tightly linked to their heterogeneity. Finally, CGIs associated with distinctive biological processes, such as diseases, aging, and imprinting, occur disproportionately across CGI clusters. These new findings provide an effective means to combine existing knowledge on CGIs into a genomic context while bringing new insights that elucidate the significance of DNA methylation across different biological conditions and demography.
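The genomic comparison between clusters rests on simple per-island features such as length and the number of CpG dinucleotides. A minimal sketch of computing those two features (plus CpGs per base) for one island, assuming the island is supplied as a plain single-strand sequence string; the example sequences are made up, and the study's own pipeline is not described in the abstract.

```python
# Hypothetical sketch: two simple per-island features, length and CpG count.
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, int, float]:
    """Return (length, CpG dinucleotide count, CpGs per base) for one strand."""
    s = seq.upper()
    # "CG" cannot overlap itself, so non-overlapping str.count is exact here
    n_cpg = s.count("CG")
    density = n_cpg / len(s) if s else 0.0
    return len(s), n_cpg, density


if __name__ == "__main__":
    # two made-up island sequences, not real CGIs
    for island in ("CGCGGACGTTCG", "ACGTAGCTA"):
        print(cpg_stats(island))
```

Counting on one strand is sufficient because a CpG on one strand is always paired with a CpG on the complementary strand.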
Affiliation(s)
- Jia Zeng, Hema K Nagrajan, Soojin V Yi
- School of Biology; Georgia Institute of Technology; Atlanta, GA USA
35
Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 2014; 42:D764-70. [PMID: 24270787 PMCID: PMC3964947 DOI: 10.1093/nar/gkt1168] [Citation(s) in RCA: 550] [Impact Index Per Article: 55.0] [Received: 09/18/2013] [Revised: 10/30/2013] [Accepted: 10/30/2013] [Indexed: 12/17/2022]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
Affiliation(s)
- Donna Karolchik, Galt P. Barber, Jonathan Casper, Hiram Clawson, Melissa S. Cline, Mark Diekhans, Timothy R. Dreszer, Pauline A. Fujita, Luvina Guruvadoo, Maximilian Haeussler, Rachel A. Harte, Steve Heitner, Angie S. Hinrichs, Katrina Learned, Brian T. Lee, Chin H. Li, Brian J. Raney, Brooke Rhead, Kate R. Rosenbloom, Cricket A. Sloan, Matthew L. Speir, Ann S. Zweig, David Haussler, Robert M. Kuhn, W. James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
36
Lee JE, Wang C, Xu S, Cho YW, Wang L, Feng X, Baldridge A, Sartorelli V, Zhuang L, Peng W, Ge K. H3K4 mono- and di-methyltransferase MLL4 is required for enhancer activation during cell differentiation. eLife 2013; 2:e01503. [PMID: 24368734 PMCID: PMC3869375 DOI: 10.7554/elife.01503] [Citation(s) in RCA: 331] [Impact Index Per Article: 30.1] [Indexed: 12/16/2022]
Abstract
Enhancers play a central role in cell-type-specific gene expression and are marked by H3K4me1/2. Active enhancers are further marked by H3K27ac. However, the methyltransferases responsible for H3K4me1/2 on enhancers remain elusive. Furthermore, how these enzymes function on enhancers to regulate cell-type-specific gene expression is unclear. In this study, we identify MLL4 (KMT2D) as a major mammalian H3K4 mono- and di-methyltransferase with partial functional redundancy with MLL3 (KMT2C). Using adipogenesis and myogenesis as model systems, we show that MLL4 exhibits cell-type- and differentiation-stage-specific genomic binding and is predominantly localized on enhancers. MLL4 co-localizes with lineage-determining transcription factors (TFs) on active enhancers during differentiation. Deletion of Mll4 markedly decreases H3K4me1/2, H3K27ac, Mediator and Polymerase II levels on enhancers and leads to severe defects in cell-type-specific gene expression and cell differentiation. Together, these findings identify MLL4 as a major mammalian H3K4 mono- and di-methyltransferase essential for enhancer activation during cell differentiation.
DOI: http://dx.doi.org/10.7554/eLife.01503.001
Almost every cell in a human body carries the same genes, but not every cell expresses all of these genes as proteins. As different types of cells, such as brain, liver, fat or muscle cells, develop, they express different genes, or they express the same genes at different times and in different amounts. Enhancers are short stretches of DNA that boost the amount of protein produced when a gene is expressed, and they are particularly important for genes that are expressed differently between cell types. Enhancers bolster expression of a gene by interacting with nearby DNA. Even genes separated from enhancers by long stretches of DNA can benefit, because the way DNA is tightly packed inside the nucleus means that two distant sequences can end up close together. Proteins called transcription factors bind to enhancers and recruit the cell’s protein ‘machinery’ required to express nearby genes. Enhancers can be identified by specific chemical marks associated with their DNA, but little is known about the enzymes that leave these marks in mammals. Moreover, it is not clear which genes are influenced by these marks. Now, by examining fat cells and muscle cells as they mature, Lee et al. have found that an enzyme called MLL4 is responsible for adding chemical marks to enhancers in both humans and mice. Further, MLL4 is required both to allow cells to specialize into different cell types and to boost the expression of genes that are specific to each type of mature cell. Since faulty MLL4 has been implicated in several cancers and developmental defects, the findings of Lee et al. may lead to a better understanding of these diseases.
DOI: http://dx.doi.org/10.7554/eLife.01503.002
Affiliation(s)
- Ji-Eun Lee
- Adipocyte Biology and Gene Regulation Section, Laboratory of Endocrinology and Receptor Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, United States
37
Mucksová J, Kalina J, Bakst M, Yan H, Brillard JP, Benešová B, Fafílek B, Hejnar J, Trefil P. Expression of the chicken GDNF family receptor α-1 as a marker of spermatogonial stem cells. Anim Reprod Sci 2013; 142:75-83. [DOI: 10.1016/j.anireprosci.2013.08.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/20/2012] [Revised: 07/26/2013] [Accepted: 08/08/2013] [Indexed: 01/15/2023]
38
Sompallae R, Hofmann O, Maher CA, Gedye C, Behren A, Vitezic M, Daub CO, Devalle S, Caballero OL, Carninci P, Hayashizaki Y, Lawlor ER, Cebon J, Hide W. A comprehensive promoter landscape identifies a novel promoter for CD133 in restricted tissues, cancers, and stem cells. Front Genet 2013; 4:209. [PMID: 24194746 PMCID: PMC3810939 DOI: 10.3389/fgene.2013.00209] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/08/2013] [Accepted: 09/30/2013] [Indexed: 12/02/2022]
Abstract
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high-resolution Cap Analysis of Gene Expression (CAGE) sequencing, we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme, in neural stem cells, and within CD133+ enriched Ewing sarcomas. Relative to previously characterized PROM1 promoters, the P6 promoter is enriched for an HMGI/Y (HMGA1) family transcription factor binding site motif, and it exhibits different epigenetic modifications from the canonical promoter region of PROM1.
39
Wittmann BC, Tan GC, Lisman JE, Dolan RJ, Düzel E. Reprint of: DAT genotype modulates striatal processing and long-term memory for items associated with reward and punishment. Neuropsychologia 2013; 51:2469-77. [PMID: 24139823 DOI: 10.1016/j.neuropsychologia.2013.09.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/31/2022]
Abstract
Previous studies have shown that appetitive motivation enhances episodic memory formation via a network including the substantia nigra/ventral tegmental area (SN/VTA), striatum and hippocampus. This functional magnetic resonance imaging (fMRI) study contrasted the impact of aversive and appetitive motivation on episodic long-term memory. Cue pictures predicted monetary reward or punishment in alternating experimental blocks. One day later, episodic memory for the cue pictures was tested. We also investigated how the neural processing of appetitive and aversive motivation and episodic memory were modulated by dopaminergic mechanisms. To that end, participants were selected on the basis of their genotype for a variable number of tandem repeat polymorphism of the dopamine transporter (DAT) gene. The resulting groups were carefully matched for the 5-HTTLPR polymorphism of the serotonin transporter gene. Recognition memory for cues from both motivational categories was enhanced in participants homozygous for the 10-repeat allele of the DAT gene (whose functional effects are not yet known), but not in heterozygous participants. Compared with heterozygous participants, 10-repeat homozygous participants also showed increased striatal activity for anticipation of motivational outcomes relative to neutral outcomes. In a subsequent memory analysis, encoding activity in the striatum and hippocampus was higher for later recognized items in 10-repeat homozygotes than in 9/10-repeat heterozygotes. These findings suggest that processing of appetitive and aversive motivation in the human striatum involves the dopaminergic system and that dopamine plays a role in memory for both types of motivational information. In accordance with animal studies, these data support the idea that encoding of motivational events depends on dopaminergic processes in the hippocampus.
Affiliation(s)
- Bianca C Wittmann
- Wellcome Trust Centre for Neuroimaging, University College London, London, WC1N 3BG, UK; Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA; Department of Psychology, University of Giessen, 35394 Giessen, Germany.
40
The human constitutive androstane receptor promotes the differentiation and maturation of hepatic-like cells. Dev Biol 2013; 384:155-65. [PMID: 24144921 DOI: 10.1016/j.ydbio.2013.10.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 05/14/2013] [Revised: 09/11/2013] [Accepted: 10/12/2013] [Indexed: 11/22/2022]
Abstract
Expression of the constitutive androstane receptor (CAR, NR1I3) is enriched in the mature mammalian liver, and CAR is increasingly recognized for its prominent role in regulating a myriad of processes including biotransformation, chemical transport, energy metabolism and lipid homeostasis. Previously, we demonstrated that CAR levels were markedly enhanced during the differentiation of hepatic-like cells derived from human embryonic stem cells (hESCs), prompting the hypothesis that CAR plays a key functional role in directing human hepatogenesis. Here we demonstrate that over-expression of CAR in hESCs, transduced by a lentiviral vector, accelerates the maturation of hepatic-like cells: CAR over-expressing cells exhibited a 2.5-fold increase in albumin secretion by day 20 of differentiation in culture, and significantly enhanced mRNA expression of several liver-selective markers, including hepatic transcription factors, plasma proteins, biotransformation enzymes, and metabolic enzymes. CAR over-expressing cells also exhibited enhanced CITCO-inducible CYP3A7 enzymatic activity. Knockdown of CAR via siRNA attenuated the differentiation-dependent expression programs. In contrast, expression levels of the pregnane X receptor (PXR), the nuclear receptor most similar to CAR in primary sequence, were negligible in human fetal liver tissues and in the differentiating hESCs, and stable over-expression of PXR in hepatic-induced hESCs failed to enhance expression of hepatic phenotype markers. Together, these results define a novel role for human CAR in hepatic lineage commitment.
41
Barturen G, Rueda A, Oliver JL, Hackenberg M. MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data. F1000Res 2013; 2:217. [PMID: 24627790 PMCID: PMC3938178 DOI: 10.12688/f1000research.2-217.v2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Accepted: 02/19/2014] [Indexed: 01/10/2023]
Abstract
Whole genome methylation profiling at single-cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques combined with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and the methylation levels are then profiled from the alignments. Great effort has gone into aligning the reads quickly and correctly, and many different algorithms and programs have been created to do this. However, the second step is just as crucial and non-trivial, yet much less attention has been paid to the final inference of methylation states. Important error sources exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high-quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (SNVs, single nucleotide variants) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown through extensive benchmarking based on artificial bisulfite-treated reads and a comparison with a recently published method called Bis-SNP. MethylExtract is able to detect SNVs within high-throughput sequencing experiments of bisulfite-treated DNA at the same time as it generates high-quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. A feature unique to MethylExtract, in comparison with existing software, is the possibility to assess bisulfite failure statistically. The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and are also permanently accessible from 10.5281/zenodo.7144.
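The second step the abstract emphasizes, inferring a methylation state from aligned bisulfite reads, can be sketched as follows. This is a minimal illustration, not MethylExtract's implementation: it assumes per-position counts of reads showing C (protected, methylated) and T (converted, unmethylated) at a reference cytosine, a hypothetical minimum depth `min_depth`, and a hypothetical `error_rate` standing in for bisulfite failure plus sequencing error. The one-sided binomial test is a generic technique chosen for illustration.

```python
# Hypothetical sketch: per-cytosine methylation level from bisulfite read counts,
# plus a one-sided binomial test against an assumed error/bisulfite-failure rate.
# Not MethylExtract's method; thresholds below are illustrative assumptions.
from math import comb
from typing import Optional


def methylation_level(c_reads: int, t_reads: int, min_depth: int = 5) -> Optional[float]:
    """Fraction of reads still reading C at a reference cytosine.

    Bisulfite converts unmethylated C to U (read as T), so C / (C + T)
    estimates the methylation level. Returns None below min_depth.
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_methylated(c_reads: int, t_reads: int,
                  error_rate: float = 0.01, alpha: float = 0.05) -> bool:
    """True if the C count is unlikely to arise from errors or failed conversion alone."""
    n = c_reads + t_reads
    return binomial_upper_tail(c_reads, n, error_rate) < alpha


if __name__ == "__main__":
    print(methylation_level(6, 4), is_methylated(6, 4))
    print(methylation_level(0, 12), is_methylated(0, 12))
```

A real caller must also distinguish genuine C/T SNVs from bisulfite conversion, which is the problem the abstract says MethylExtract addresses; this sketch ignores that confounder entirely.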
Affiliation(s)
- Guillermo Barturen, Antonio Rueda, José L Oliver, Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Granada, 18071, Spain ; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, Granada, 18016, Spain
42
Barturen G, Rueda A, Oliver JL, Hackenberg M. MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data. F1000Res 2013; 2:217. [PMID: 24627790 DOI: 10.12688/f1000research.2-217.v1] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Accepted: 10/09/2013] [Indexed: 01/30/2023]
Abstract
Whole genome methylation profiling at a single cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and then the profiling of the methylation levels is done from the alignments. A huge effort has been made to quickly and correctly align the reads and many different algorithms and programs to do this have been created. However, the second step is just as crucial and non-trivial, but much less attention has been paid to the final inference of the methylation states. Important error sources do exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high quality, whole genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (SNVs - Single Nucleotide Variants) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs within High-Throughput Sequencing experiments of bisulfite treated DNA at the same time as it generates high quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way. 
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.
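As a rough illustration of two quantities this abstract mentions, a per-cytosine methylation level and a statistical check for bisulfite failure, the Python sketch below computes the methylated fraction at a site and a one-sided binomial tail probability for unconverted cytosines. It is not MethylExtract's published procedure: the minimum depth of 5, the 1% expected non-conversion rate, and all function names are hypothetical choices, and treating unconverted non-CpG cytosines as a failure signal is an assumption for illustration.

```python
from math import comb


def methylation_level(methylated: int, unmethylated: int, min_depth: int = 5):
    """Fraction of reads supporting a methylated C, or None below min_depth (hypothetical cutoff)."""
    depth = methylated + unmethylated
    if depth < min_depth:
        return None
    return methylated / depth


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisulfite_failure_pvalue(unconverted: int, total: int, error_rate: float = 0.01) -> float:
    """One-sided test: are there more unconverted cytosines than error_rate alone would explain?

    Assumes the cytosines tested are expected to be unmethylated (e.g. non-CpG context),
    so unconverted calls arise only from errors or failed conversion.
    """
    return binomial_upper_tail(unconverted, total, error_rate)


print(methylation_level(8, 2))            # methylated fraction at a covered site
print(bisulfite_failure_pvalue(10, 10))   # tiny p-value: conversion likely failed
```

A small p-value flags reads or regions where the observed non-conversion is hard to reconcile with the assumed error rate; the threshold for acting on it is left to the user.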
Affiliation(s)
- Guillermo Barturen
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Granada, 18071, Spain ; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, Granada, 18016, Spain
- Antonio Rueda
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Granada, 18071, Spain ; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, Granada, 18016, Spain
- José L Oliver
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Granada, 18071, Spain ; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, Granada, 18016, Spain
- Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Granada, 18071, Spain ; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, Granada, 18016, Spain
43
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understanding the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution in interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have a great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
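The McDonald-Kreitman (MK) test mentioned here contrasts divergence (D) and polymorphism (P) counts for a putatively selected class (subscript n) and a neutral reference class (subscript s). A minimal sketch of the classic 2x2 form is shown below, with a two-sided Fisher's exact test, the neutrality index NI = (Pn/Ps)/(Dn/Ds), and alpha = 1 - NI. How the authors defined the classes for indels is not stated in the abstract, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, with margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one (small tolerance for float ties).
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict:
    """Classic MK test on divergence (dn, ds) and polymorphism (pn, ps) counts."""
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive to compute NI and alpha")
    ni = (pn / ps) / (dn / ds)
    return {"NI": ni, "alpha": 1.0 - ni, "p_value": fisher_exact_two_sided(dn, ds, pn, ps)}


print(mk_test(dn=20, ds=5, pn=5, ps=20))
```

Under neutrality the ratio of selected to neutral changes should be the same for divergence and polymorphism (NI near 1); the abstract's point is that mutation-rate heterogeneity and homoplasy can distort these counts and so mimic a selection signal.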
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
44
Prolactin Receptor Gene Diversity in Azara’s Owl Monkeys (Aotus azarai) and Humans (Homo sapiens) Suggests a Non-Neutral Evolutionary History among Primates. Int J Primatol 2013. [DOI: 10.1007/s10764-013-9721-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
45
CpGislandEVO: a database and genome browser for comparative evolutionary genomics of CpG islands. Biomed Res Int 2013; 2013:709042. [PMID: 24205506 PMCID: PMC3800598 DOI: 10.1155/2013/709042] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/20/2013] [Revised: 07/12/2013] [Accepted: 08/19/2013] [Indexed: 12/14/2022]
Abstract
Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in the appearance of several disorders such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee strongly associate with genes involved in neurological/psychological disorders and cancers. Therefore, the evolutionary comparative analyses of CGIs can provide insights into the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compile a database of statistically significant CGIs for the best assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focus on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring the comparison between CGIs located on truly homologous genome segments. This allows comparing the main compositional features between homologous CGIs. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by direct count of nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.
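The abstract refers to comparing the main compositional features of homologous CGIs but does not say how its CGIs were detected. For orientation only, the sketch below computes the two features most often used to characterize CGIs, GC content and the observed/expected CpG ratio, and applies the classic Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC above 50%, obs/exp above 0.6). These thresholds are a swapped-in illustration, not the criteria used to build the CpGislandEVO database, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc = (c + g) / n if n else 0.0
    # obs/exp = CpG count / (C count * G count / length)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc, obs_exp


def meets_gardiner_garden_criteria(seq: str, min_len: int = 200,
                                   min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Classic length / GC / CpG obs-exp thresholds for calling a CGI-like segment."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CGCG"))
print(meets_gardiner_garden_criteria("CG" * 100))
```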
46
Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J. RNAsnp: efficient detection of local RNA secondary structure changes induced by SNPs. Hum Mutat 2013; 34:546-56. [PMID: 23315997 PMCID: PMC3708107 DOI: 10.1002/humu.22273] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 08/17/2012] [Accepted: 12/18/2012] [Indexed: 02/05/2023]
Abstract
Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs. The global measures employed so far suffer from limited accuracy of folding programs on large RNAs and are computationally too demanding for genome-wide applications. Here, we present a strategy that focuses on the local regions of maximal structural change between mutant and wild-type. These local regions are approximated in a “screening mode” that is intended for genome-wide applications. Furthermore, localized regions are identified as those with maximal discrepancy. The mutation effects are quantified in terms of empirical P values. To this end, the RNAsnp software uses extensive precomputed tables of the distribution of SNP effects as a function of length and GC content. RNAsnp thus achieves both a noise reduction and speed-up of several orders of magnitude over shuffling-based approaches. On a data set comprising 501 SNPs associated with human-inherited diseases, we predict 54 to have significant local structural effect in the untranslated region of mRNAs. RNAsnp is available at http://rth.dk/resources/rnasnp.
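RNAsnp reports empirical P values from precomputed distributions of SNP effects stratified by sequence length and GC content. A minimal sketch of that idea is shown below, assuming a hypothetical lookup table keyed by (length bin, GC bin) that holds null scores, and a structural-change score where larger means a bigger effect. The bin widths, the +1 correction, and all names are illustrative assumptions, not RNAsnp's code.

```python
from bisect import bisect_left


def empirical_pvalue(observed: float, null_scores: list[float]) -> float:
    """(1 + #null scores >= observed) / (1 + #null scores), a standard empirical P value."""
    s = sorted(null_scores)
    n_ge = len(s) - bisect_left(s, observed)
    return (n_ge + 1) / (len(s) + 1)


def table_key(length: int, gc: float, length_step: int = 100, n_gc_bins: int = 10) -> tuple[int, int]:
    """Hypothetical binning of sequence length and GC fraction into a table key."""
    gc_bin = min(int(gc * n_gc_bins), n_gc_bins - 1)
    return (length // length_step, gc_bin)


def snp_pvalue(observed: float, length: int, gc: float,
               null_table: dict[tuple[int, int], list[float]]) -> float:
    """Look up the precomputed null distribution for this length/GC stratum and score the SNP."""
    return empirical_pvalue(observed, null_table[table_key(length, gc)])


# Toy null table; a real one would be precomputed from many simulated SNPs per stratum.
toy_table = {(1, 5): [0.1, 0.2, 0.3]}
print(snp_pvalue(0.25, 150, 0.55, toy_table))
```

Precomputing the null distributions once per stratum is what replaces per-SNP shuffling, which is where the speed-up described in the abstract comes from.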
47
Segmenting the human genome based on states of neutral genetic divergence. Proc Natl Acad Sci U S A 2013; 110:14699-704. [PMID: 23959903 DOI: 10.1073/pnas.1221792110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/21/2023]
Abstract
Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes by associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
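The segmentation described here rests on hidden Markov model decoding. The Python sketch below shows Viterbi decoding, computed in log space, for a deliberately simplified case: a two-state HMM over a single discretized divergence label per genomic window. The state names, probabilities, and the 'l'/'h' window encoding are hypothetical; the study modeled several divergence types jointly, and its emission model is not described in the abstract.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM (log-space Viterbi).

    All probabilities used must be strictly positive, since log(0) is undefined.
    """
    log = math.log
    scores = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for landing in s at window t.
            prev, best = max(
                ((r, scores[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            scores[t][s] = best + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(scores[-1], key=scores[-1].get)]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ("low", "high")
start = {"low": 0.5, "high": 0.5}
trans = {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.1, "high": 0.9}}
emit = {"low": {"l": 0.8, "h": 0.2}, "high": {"l": 0.2, "h": 0.8}}
print(viterbi("llllhhhh", states, start, trans, emit))
```

The "sticky" transition probabilities (0.9 to stay) are what turn noisy per-window labels into contiguous segments; a single discordant window is usually absorbed rather than starting a new state.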
48
Wittmann BC, Tan GC, Lisman JE, Dolan RJ, Düzel E. DAT genotype modulates striatal processing and long-term memory for items associated with reward and punishment. Neuropsychologia 2013; 51:2184-93. [PMID: 23911780 PMCID: PMC3809516 DOI: 10.1016/j.neuropsychologia.2013.07.018] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 01/11/2013] [Revised: 07/22/2013] [Accepted: 07/24/2013] [Indexed: 11/30/2022]
Abstract
Previous studies have shown that appetitive motivation enhances episodic memory formation via a network including the substantia nigra/ventral tegmental area (SN/VTA), striatum and hippocampus. This functional magnetic resonance imaging (fMRI) study now contrasted the impact of aversive and appetitive motivation on episodic long-term memory. Cue pictures predicted monetary reward or punishment in alternating experimental blocks. One day later, episodic memory for the cue pictures was tested. We also investigated how the neural processing of appetitive and aversive motivation and episodic memory were modulated by dopaminergic mechanisms. To that end, participants were selected on the basis of their genotype for a variable number of tandem repeat polymorphism of the dopamine transporter (DAT) gene. The resulting groups were carefully matched for the 5-HTTLPR polymorphism of the serotonin transporter gene. Recognition memory for cues from both motivational categories was enhanced in participants homozygous for the 10-repeat allele of the DAT, the functional effects of which are not known yet, but not in heterozygous subjects. In comparison with heterozygous participants, 10-repeat homozygous participants also showed increased striatal activity for anticipation of motivational outcomes compared to neutral outcomes. In a subsequent memory analysis, encoding activity in striatum and hippocampus was found to be higher for later recognized items in 10-repeat homozygotes compared to 9/10-repeat heterozygotes. These findings suggest that processing of appetitive and aversive motivation in the human striatum involves the dopaminergic system and that dopamine plays a role in memory for both types of motivational information. In accordance with animal studies, these data support the idea that encoding of motivational events depends on dopaminergic processes in the hippocampus.
Affiliation(s)
- Bianca C Wittmann
- Wellcome Trust Centre for Neuroimaging, University College London, London, WC1N 3BG, UK; Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA; Department of Psychology, University of Giessen, 35394 Giessen, Germany.
49
Genome-wide patterns of codon bias are shaped by natural selection in the purple sea urchin, Strongylocentrotus purpuratus. G3 (Bethesda) 2013; 3:1069-83. [PMID: 23637123 PMCID: PMC3704236 DOI: 10.1534/g3.113.005769] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/25/2023]
Abstract
Codon usage bias has been documented in a wide diversity of species, but the relative contributions of mutational bias and various forms of natural selection remain unclear. Here, we describe for the first time genome-wide patterns of codon bias at 4623 genes in the purple sea urchin, Strongylocentrotus purpuratus. Preferred codons were identified at 18 amino acids that exclusively used G or C at third positions, which contrasted with the strong AT bias of the genome (overall GC content is 36.9%). The GC content of third positions and coding regions exhibited significant correlations with the magnitude of codon bias. In contrast, the GC content of introns and flanking regions was indistinguishable from the genome-wide background, which suggested a limited contribution of mutational bias to synonymous codon usage. Five distinct clusters of genes were identified that had significantly different synonymous codon usage patterns. A significant correlation was observed between codon bias and mRNA expression supporting translational selection, but this relationship was driven by only one highly biased cluster that represented only 8.6% of all genes. In all five clusters preferred codons were evolutionarily conserved to a similar degree despite differences in their synonymous codon usage distributions and magnitude of codon bias. The third positions of preferred codons in two codon usage groups also paired significantly more often in stems than in loops of mRNA secondary structure predictions, which suggested that codon bias might also affect mRNA stability. Our results suggest that mutational bias has played a minor role in determining codon bias in S. purpuratus and that preferred codon usage may be heterogeneous across different genes and subject to different forms of natural selection.
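The abstract contrasts GC content at third codon positions with the AT-rich genomic background (36.9% GC). As a small illustration, the sketch below computes GC3 (the GC fraction at third codon positions) and overall GC content for an in-frame coding sequence. It is a simplification: unlike the synonymous-site variant GC3s, it does not exclude Met, Trp, or stop codons, the function names are hypothetical, and it is not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return sum(b in "GC" for b in s) / len(s)


def gc3(cds: str) -> float:
    """GC fraction at third positions of complete codons in an in-frame CDS."""
    s = cds.upper()
    usable = len(s) - len(s) % 3  # ignore a trailing partial codon
    thirds = [s[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(b in "GC" for b in thirds) / len(thirds)


cds = "ATGGCCTAA"  # codons ATG, GCC, TAA -> third positions G, C, A
print(gc3(cds), gc_content(cds))
```

Comparing per-gene GC3 with GC content of introns or flanking regions is one simple way to ask, as the abstract does, whether third-position composition simply tracks local mutational bias.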
50
Long-range transcriptome sequencing reveals cancer cell growth regulatory chimeric mRNA. Neoplasia 2013; 14:1087-96. [PMID: 23226102 DOI: 10.1593/neo.121342] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/16/2012] [Revised: 08/16/2012] [Accepted: 09/30/2012] [Indexed: 12/15/2022]
Abstract
mRNA chimeras from chromosomal translocations often play a role as transforming oncogenes. However, cancer transcriptomes also contain mRNA chimeras that may play a role in tumor development, which arise as transcriptional or post-transcriptional events. To identify such chimeras, we developed a deterministic screening strategy for long-range sequence analysis. High-throughput, long-read sequencing was then performed on cDNA libraries from major tumor histotypes and corresponding normal tissues. These analyses led to the identification of 378 chimeras, with an unexpectedly high frequency of expression (≈2 × 10⁻⁵ of all mRNA). Functional assays in breast and ovarian cancer cell lines showed that a large fraction of mRNA chimeras regulates cell replication. Strikingly, chimeras were shown to include both positive and negative regulators of cell growth, which functioned as such in a cell-type-specific manner. Replication-controlling chimeras were found to be expressed by most cancers from breast, ovary, colon, uterus, kidney, lung, and stomach, suggesting a widespread role in tumor development.