1
Abstract
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.
2
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 2018; 46:D221-D228. [PMID: 29126148 PMCID: PMC5753299 DOI: 10.1093/nar/gkx1031] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 09/20/2017] [Accepted: 10/20/2017] [Indexed: 01/29/2023] Open
Abstract
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
3
Abstract
Keratins represent a large protein family with essential structural and functional roles in epithelial cells of skin, hair follicles, and other organs. During evolution the genes encoding keratins have undergone multiple rounds of duplication and humans have two clusters with a total of 55 functional keratin genes in their genomes. Due to the high similarity between different keratin paralogs and species-specific differences in gene content, the currently available keratin gene annotation in species with draft genome assemblies such as dog and horse is still imperfect. We compared the National Center for Biotechnology Information (NCBI) (dog annotation release 103, horse annotation release 101) and Ensembl (release 87) gene predictions for the canine and equine keratin gene clusters to RNA-seq data that were generated from adult skin of five dogs and two horses and from adult hair follicle tissue of one dog. Taking into consideration the knowledge on the conserved exon/intron structure of keratin genes, we annotated 61 putatively functional keratin genes in both the dog and horse, respectively. Subsequently, curators in the RefSeq group at NCBI reviewed their annotation of keratin genes in the dog and horse genomes (Annotation Release 104 and Annotation Release 102, respectively) and updated annotation and gene nomenclature of several keratin genes. The updates are now available in the NCBI Gene database (https://www.ncbi.nlm.nih.gov/gene).
4
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 2016; 44:D733-45. [PMID: 26553804 PMCID: PMC4702849 DOI: 10.1093/nar/gkv1189] [Citation(s) in RCA: 3322] [Impact Index Per Article: 369.1] [Received: 09/01/2015] [Accepted: 10/24/2015] [Indexed: 12/12/2022] Open
Abstract
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
5
Abstract
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.
6
Abstract
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
7
Polycomb CBX7 promotes initiation of heritable repression of genes frequently silenced with cancer-specific DNA hypermethylation. Cancer Res 2009; 69:6322-30. [PMID: 19602592 PMCID: PMC2779702 DOI: 10.1158/0008-5472.can-09-0065] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/16/2022]
Abstract
Epigenetic silencing of genes in association with aberrant promoter DNA hypermethylation has emerged as a significant mechanism in the development of human cancers. Such genes are also often targets of the polycomb group repressive complexes in embryonic cells. The polycomb repressive complex 2 (PRC2) has been best studied in this regard. We now examine a link between PRC1 and cancer-specific gene silencing. Here, we show a novel and direct association between a constituent of the PRC1 complex, CBX7, with gene repression and promoter DNA hypermethylation of genes frequently silenced in cancer. CBX7 is able to complex with DNA methyltransferase (DNMT) enzymes, leading us to explore a role for CBX7 in maintenance and initiation of gene silencing. Knockdown of CBX7 was unable to relieve suppression of deeply silenced genes in cancer cells; however, in embryonal carcinoma (EC) cells, CBX7 can initiate stable repression of genes that are frequently silenced in adult cancers. Furthermore, we are able to observe assembly of DNMTs at CBX7 target gene promoters. Sustained expression of CBX7 in EC cells confers a growth advantage and resistance to retinoic acid-induced differentiation. In this setting, especially, there is increased promoter DNA hypermethylation for many genes by analysis of specific genes, as well as through epigenomic studies. Our results allow us to propose a potential mechanism through assembly of novel repressive complexes, by which the polycomb component of PRC1 can promote the initiation of epigenetic changes involving abnormal DNA hypermethylation of genes frequently silenced in adult cancers.
8
PcG proteins, DNA methylation, and gene repression by chromatin looping. PLoS Biol 2008; 6:2911-27. [PMID: 19053175 PMCID: PMC2592355 DOI: 10.1371/journal.pbio.0060306] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.2] [Received: 06/19/2008] [Accepted: 10/28/2008] [Indexed: 11/19/2022] Open
Abstract
Many DNA hypermethylated and epigenetically silenced genes in adult cancers are Polycomb group (PcG) marked in embryonic stem (ES) cells. We show that a large region upstream (∼30 kb) of and extending ∼60 kb around one such gene, GATA-4, is organized—in Tera-2 undifferentiated embryonic carcinoma (EC) cells—in a topologically complex multi-loop conformation that is formed by multiple internal long-range contact regions near areas enriched for EZH2, other PcG proteins, and the signature PcG histone mark, H3K27me3. Small interfering RNA (siRNA)–mediated depletion of EZH2 in undifferentiated Tera-2 cells leads to a significant reduction in the frequency of long-range associations at the GATA-4 locus, seemingly dependent on affecting the H3K27me3 enrichments around those chromatin regions, accompanied by a modest increase in GATA-4 transcription. The chromatin loops completely dissolve, accompanied by loss of PcG proteins and H3K27me3 marks, when Tera-2 cells receive differentiation signals which induce a ∼60-fold increase in GATA-4 expression. In colon cancer cells, however, the frequency of the long-range interactions are increased in a setting where GATA-4 has no basal transcription and the loops encompass multiple, abnormally DNA hypermethylated CpG islands, and the methyl-cytosine binding protein MBD2 is localized to these CpG islands, including ones near the gene promoter. Removing DNA methylation through genetic disruption of DNA methyltransferases (DKO cells) leads to loss of MBD2 occupancy and to a decrease in the frequency of long-range contacts, such that these now more resemble those in undifferentiated Tera-2 cells. Our findings reveal unexpected similarities in higher order chromatin conformation between stem/precursor cells and adult cancers. We also provide novel insight that PcG-occupied and H3K27me3-enriched regions can form chromatin loops and physically interact in cis around a single gene in mammalian cells. 
The loops associate with a poised, low transcription state in EC cells and, with the addition of DNA methylation, completely repressed transcription in adult cancer cells.

Polycomb group (PcG) proteins and DNA methylation are fundamental epigenetic regulators of gene expression. The mechanisms underlying such regulation, the crosstalk between these mechanisms, and the role of higher order chromatin folding in mediating transcriptional control of involved genes remain unclear. Abnormal DNA methylation at gene promoters in cancer has been linked to PcG promoter occupancy and PcG-mediated maintenance of genes in a poised, low expression state in embryonic cells. We now strengthen these links and show that PcG occupancy around an entire gene, GATA-4, represses transcription by maintaining a series of long-range chromatin interactions. In embryonic cells, where DNA methylation is largely absent, GATA-4 is in a low, poised transcription state, and the loops can be virtually eliminated by retinoid-induced cellular differentiation, with attendant robust transcriptional up-regulation. When GATA-4 is DNA hypermethylated in colon cancer cells, the intensity of the long-range interactions is increased and associates with complete lack of transcription. Removal of DNA methylation in the cancer cells only slightly loosens the loops and restores expression to a low, poised state. Together, these findings suggest that both repressive pathways operate in part by the formation of chromatin higher order structures and provide important translational ramifications for targeting re-expression of epigenetically silenced genes for cancer therapy.

Chromatin regions enriched for Polycomb group proteins physically interact in a series of loops around a single gene in mammalian cells. This higher order structure maintains a poised, low transcription state in embryonic cancer cells and, with addition of DNA methylation, a completely repressed transcription in adult cancer cells.
9
Defining a chromatin pattern that characterizes DNA-hypermethylated genes in colon cancer cells. Cancer Res 2008; 68:5753-9. [PMID: 18632628 DOI: 10.1158/0008-5472.can-08-0700] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Indexed: 01/14/2023]
Abstract
Epigenetic gene regulation is a key determinant of heritable gene expression patterns and is critical for normal cellular function. Dysregulation of epigenetic transcriptional control is a fundamental feature of cancer, particularly manifesting as increased promoter DNA methylation with associated aberrant gene silencing, which plays a significant role in tumor progression. We now globally map key chromatin parameters for genes with promoter CpG island DNA hypermethylation in colon cancer cells by combining microarray gene expression analyses with chromatin immunoprecipitation-on-chip technology. We first show that the silent state of such genes universally correlates with a broad distribution of a low but distinct level of the PcG-mediated histone modification, methylation of lysine 27 of histone 3 (H3K27me), and a very low level of the active mark H3K4me2. This chromatin pattern, and particularly H3K4me2 levels, crisply separates DNA-hypermethylated genes from those where histone deacetylation is responsible for transcriptional silencing. Moreover, the chromatin pattern can markedly enhance identification of truly silent and DNA-hypermethylated genes. We additionally find that when DNA-hypermethylated genes are demethylated and reexpressed, they adopt a bivalent chromatin pattern, which is associated with the poised gene expression state of a large group of embryonic stem cell genes and is characterized by an increase in levels of both the H3K27me3 and H3K4me2 marks. Our data have great relevance for the increasing interest in reexpression of DNA-hypermethylated genes for the treatment of cancer.
10
Abstract
We describe construction of a novel modification, "6C," of chromatin looping assays that allows specific proteins that may mediate long-range chromatin interactions to be defined. This approach combines the standard looping approaches previously defined with an immunoprecipitation step to investigate involvement of the specific protein. The efficacy of this approach is demonstrated by using a Polycomb group (PcG) protein, Enhancer of Zeste (EZH2), as an example of how our assay might be used. EZH2, as a protein of the PcG complex, PRC2, has an important role in the propagation of epigenetic memory through deposition of the repressive mark, histone H3, lysine 27, tri-methylation (H3K27me3). Using our new 6C assay, we show how EZH2 is a direct mediator of long-range intra- and interchromosomal interactions that can regulate transcriptional down-regulation of multiple genes by facilitating physical proximities between distant chromatin regions, thus targeting sites within to PcG machinery.
11
Abstract
Recent work suggests a link between the polycomb group protein EZH2 and mediation of gene silencing in association with maintenance of DNA methylation. However, we show that whereas basally expressed target cancer genes with minimal DNA methylation have increased transcription during EZH2 knockdown, densely DNA hypermethylated and silenced genes retain their methylation and remain transcriptionally silent. These results suggest that EZH2 can modulate transcription of basally expressed genes but not silent genes that are densely DNA methylated.
12
Abstract
It is increasingly apparent that cancer development not only depends on genetic alterations but on an abnormal cellular memory, or epigenetic changes, which convey heritable gene expression patterns critical for neoplastic initiation and progression. These aberrant epigenetic mechanisms are manifest in both global changes in chromatin packaging and in localized gene promoter changes that influence the transcription of genes important to the cancer process. An exciting emerging theme is that an understanding of stem cell chromatin control of gene expression, including relationships between histone modifications and DNA methylation, may hold a key to understanding the origins of cancer epigenetic changes. This possibility, coupled with the reversible nature of epigenetics, has enormous significance for the prevention and control of cancer.
13
A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 2007; 39:237-42. [PMID: 17211412 PMCID: PMC2744394 DOI: 10.1038/ng1972] [Citation(s) in RCA: 813] [Impact Index Per Article: 47.8] [Received: 12/01/2006] [Accepted: 01/04/2007] [Indexed: 02/08/2023]
Abstract
Adult cancers may derive from stem or early progenitor cells. Epigenetic modulation of gene expression is essential for normal function of these early cells but is highly abnormal in cancers, which often show aberrant promoter CpG island hypermethylation and transcriptional silencing of tumor suppressor genes and pro-differentiation factors. We find that for such genes, both normal and malignant embryonic cells generally lack the hypermethylation of DNA found in adult cancers. In embryonic stem cells, these genes are held in a 'transcription-ready' state mediated by a 'bivalent' promoter chromatin pattern consisting of the repressive mark, histone H3 methylated at Lys27 (H3K27) by Polycomb group proteins, plus the active mark, methylated H3K4. However, embryonic carcinoma cells add two key repressive marks, dimethylated H3K9 and trimethylated H3K9, both associated with DNA hypermethylation in adult cancers. We hypothesize that cell chromatin patterns and transient silencing of these important regulatory genes in stem or progenitor cells may leave these genes vulnerable to aberrant DNA hypermethylation and heritable gene silencing during tumor initiation and progression.
14
Silenced tumor suppressor genes reactivated by DNA demethylation do not return to a fully euchromatic chromatin state. Cancer Res 2006; 66:3541-9. [PMID: 16585178 DOI: 10.1158/0008-5472.can-05-2481] [Citation(s) in RCA: 239] [Impact Index Per Article: 13.3] [Indexed: 12/29/2022]
Abstract
Histone H3 lysine 9 (H3K9) and lysine 27 (H3K27) trimethylation are properties of stably silenced heterochromatin whereas H3K9 dimethylation (H3K9me2) is important for euchromatic gene repression. In colorectal cancer cells, all of these marks, as well as the key enzymes which establish them, surround the hMLH1 promoter when it is DNA hypermethylated and aberrantly silenced, but are absent when the gene is unmethylated and fully expressed in a euchromatic state. When the aberrantly silenced gene is DNA demethylated and reexpressed following 5-aza-2'-deoxycytidine treatment, H3K9me1 and H3K9me2 are the only silencing marks that are lost. A series of other silenced and DNA hypermethylated gene promoters behave identically even when the genes are chronically DNA demethylated and reexpressed after genetic knockout of DNA methyltransferases. Our data indicate that when transcription of DNA hypermethylated genes is activated in cancer cells, their promoters remain in an environment with certain heterochromatic characteristics. This finding has important implications for the translational goal of reactivating aberrantly silenced cancer genes as a therapeutic maneuver.
15
Inhibition of SIRT1 reactivates silenced cancer genes without loss of promoter DNA hypermethylation. PLoS Genet 2006; 2:e40. [PMID: 16596166 PMCID: PMC1420676 DOI: 10.1371/journal.pgen.0020040] [Citation(s) in RCA: 296] [Impact Index Per Article: 16.4] [Received: 11/08/2005] [Accepted: 02/06/2006] [Indexed: 12/15/2022] Open
Abstract
The class III histone deacetylase (HDAC), SIRT1, has cancer relevance because it regulates lifespan in multiple organisms, down-regulates p53 function through deacetylation, and is linked to polycomb gene silencing in Drosophila. However, it has not been reported to mediate heterochromatin formation or heritable silencing for endogenous mammalian genes. Herein, we show that SIRT1 localizes to promoters of several aberrantly silenced tumor suppressor genes (TSGs) in which 5′ CpG islands are densely hypermethylated, but not to these same promoters in cell lines in which the promoters are not hypermethylated and the genes are expressed. Heretofore, only type I and II HDACs, through deacetylation of lysines 9 and 14 of histone H3 (H3-K9 and H3-K14, respectively), had been tied to the above TSG silencing. However, inhibition of these enzymes alone fails to re-activate the genes unless DNA methylation is first inhibited. In contrast, inhibition of SIRT1 by pharmacologic, dominant negative, and siRNA (small interfering RNA)–mediated inhibition in breast and colon cancer cells causes increased H4-K16 and H3-K9 acetylation at endogenous promoters and gene re-expression despite full retention of promoter DNA hypermethylation. Furthermore, SIRT1 inhibition affects key phenotypic aspects of cancer cells. We thus have identified a new component of epigenetic TSG silencing that may potentially link some epigenetic changes associated with aging with those found in cancer, and provide new directions for therapeutically targeting these important genes for re-expression.

The propensity for cancer to arise and progress is influenced not only by gene mutations (genetic abnormalities), but also by defects in gene expression programs that are inherited from one dividing cell to another. This change in the inheritance of gene expression patterns not associated with changes in the primary DNA sequence is referred to as an epigenetic abnormality. In virtually every form of cancer, tumor suppressor genes (TSGs) and candidate TSGs are epigenetically altered such that the ability of these genes to become activated and lead to production of the corresponding proteins is lost. This so-called gene “silencing” is often linked with abnormal accumulation of methyl groups to DNA (DNA methylation) in a region of the gene that controls its expression. The SIRT1 protein is an enzyme that can remove acetyl groups attached to specific amino acids in a number of different protein targets and thereby regulate gene silencing in yeast. However, in mammalian cells this has not been demonstrated. Here, the authors show SIRT1 is involved in epigenetic silencing of DNA-hypermethylated TSGs in cancer cells. Inhibition of SIRT1 by multiple approaches leads to TSG re-expression and a block in tumor-causing networks of cell signaling that are activated by loss of the TSGs in a wide range of cancers. This finding has important ramifications for the biology of cancer in terms of what maintains abnormal gene silencing. Furthermore, the authors propose that their observations may have potential clinical relevance in suggesting new means for restoring expression of abnormally silenced genes in cancer.