651
Hutchins JRA. What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins. Mol Biol Cell 2014; 25:1187-201. [PMID: 24723265 PMCID: PMC3982986 DOI: 10.1091/mbc.e13-10-0602] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry-based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set-wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery.
Affiliation(s)
- James R A Hutchins
- Institute of Human Genetics, Centre National de la Recherche Scientifique (CNRS), 34396 Montpellier, France
652
Menezes-Souza D, Mendes TADO, Gomes MDS, Bartholomeu DC, Fujiwara RT. Improving serodiagnosis of human and canine leishmaniasis with recombinant Leishmania braziliensis cathepsin l-like protein and a synthetic peptide containing its linear B-cell epitope. PLoS Negl Trop Dis 2015; 9:e3426. [PMID: 25569432 PMCID: PMC4287388 DOI: 10.1371/journal.pntd.0003426] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 07/14/2014] [Accepted: 11/17/2014] [Indexed: 12/17/2022]
Abstract
Background: The early and correct diagnosis of human leishmaniasis is essential for disease treatment. Another important step in the control of visceral leishmaniasis is the identification of infected dogs, which are the main domestic reservoir of L. infantum. Recombinant proteins and synthetic peptides based on Leishmania genes have emerged as valuable targets for serodiagnosis due to their increased sensitivity, specificity and potential for standardization. Cathepsin L-like proteins are surface antigens that are secreted by amastigotes and have little similarity to host proteins, which makes them good targets for serodiagnosis of leishmaniasis. Methodology/Principal Findings: We mapped a linear B-cell epitope within the Cathepsin L-like protein from L. braziliensis. A synthetic peptide containing the epitope and the recombinant protein were evaluated for serodiagnosis of human tegumentary and visceral leishmaniasis, as well as canine visceral leishmaniasis. Conclusions/Significance: The recombinant protein performed best for human tegumentary and canine visceral leishmaniasis, with 96.30% and 89.33% accuracy, respectively. The synthetic peptide was the best to discriminate human visceral leishmaniasis, with 97.14% specificity, 94.55% sensitivity and 96.00% accuracy. Comparison with T. cruzi-infected humans and dogs suggests that the identified epitope is specific to Leishmania parasites, which minimizes the likelihood of cross-reactions. Leishmaniasis is one of the major diseases of importance in public health and its precise diagnosis may represent one of the most relevant challenges for the control and possible eradication of the disease. In this context, recombinant proteins and synthetic peptides based on Leishmania genes have emerged as valuable targets for serodiagnosis due to their increased sensitivity, specificity and potential for standardization.
Cathepsin L-like (CatL) genes are more abundant in stationary promastigotes and amastigotes, and have less than 40% identity with human proteins and more than 60% identity with other Leishmania species. We mapped a linear B-cell epitope in the CatL protein sequence and compared its performance with the recombinant protein and current serology methodologies for the diagnosis of human tegumentary and visceral leishmaniasis as well as of canine visceral leishmaniasis (CVL). Both the recombinant protein and synthetic peptide showed higher specificity and sensitivity than crude preparations commonly used for other antigens, and thus, they are valuable targets to compose an antigen panel that could significantly improve leishmaniasis diagnosis.
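The sensitivity, specificity and accuracy values quoted in this abstract follow the usual confusion-matrix definitions. The sketch below is only an illustration of those definitions: the function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and accuracy as fractions in [0, 1].

    tp/fn: infected samples called positive/negative by the assay.
    tn/fp: uninfected samples called negative/positive by the assay.
    """
    sensitivity = tp / (tp + fn)                 # fraction of infected samples detected
    specificity = tn / (tn + fp)                 # fraction of uninfected samples cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of all calls that are correct
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }
```

For example, with made-up counts, `diagnostic_metrics(tp=45, fp=10, tn=90, fn=5)` gives 0.9 for all three measures.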
Affiliation(s)
- Daniel Menezes-Souza
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Matheus de Souza Gomes
- Instituto de Genética e Bioquímica, Universidade Federal de Uberlândia, Patos de Minas, Brazil
- Ricardo Toshio Fujiwara
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
653
Park D, Jung JW, Choi BS, Jayakodi M, Lee J, Lim J, Yu Y, Choi YS, Lee ML, Park Y, Choi IY, Yang TJ, Edwards OR, Nah G, Kwon HW. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing. BMC Genomics 2015; 16:1. [PMID: 25553907 PMCID: PMC4326529 DOI: 10.1186/1471-2164-16-1] [Citation(s) in RCA: 445] [Impact Index Per Article: 49.4] [Received: 08/01/2014] [Accepted: 12/02/2014] [Indexed: 12/03/2022]
Abstract
Background: The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Results: Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and generated 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. Conclusions: This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and help to anticipate its future evolutionary trajectory.
Affiliation(s)
- Gyoungju Nah
- Biomodulation Major, Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Republic of Korea.
655
Muthamilarasan M, Prasad M. Advances in Setaria genomics for genetic improvement of cereals and bioenergy grasses. Theor Appl Genet 2015; 128:1-14. [PMID: 25239219 DOI: 10.1007/s00122-014-2399-3] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Received: 07/18/2014] [Accepted: 09/11/2014] [Indexed: 05/18/2023]
Abstract
Recent advances in Setaria genomics appear promising for genetic improvement of cereals and biofuel crops towards providing multiple securities to the steadily increasing global population. The prominent attributes of foxtail millet (Setaria italica, cultivated) and green foxtail (S. viridis, wild), including small genome size, short life-cycle, in-breeding nature, close genetic relatedness to several cereals, millets and bioenergy grasses, and potential abiotic stress tolerance, have accentuated these two Setaria species as novel model systems for studying C4 photosynthesis, stress biology and biofuel traits. Considering this, studies have been performed on structural and functional genomics of these plants to develop genetic and genomic resources, and to delineate the physiology and molecular biology of stress tolerance, for the improvement of millets, cereals and bioenergy grasses. The release of the foxtail millet genome sequence has provided a new dimension to Setaria genomics, resulting in large-scale development of genetic and genomic tools, construction of informative databases, and genome-wide association and functional genomic studies. In this context, this review discusses the advancements made in Setaria genomics, which have generated considerable knowledge that could be used for the improvement of millets, cereals and biofuel crops. Further, this review also shows the nutritional potential of foxtail millet in providing health benefits to the global population and provides preliminary information on introgressing the nutritional properties in graminaceous species through molecular breeding and transgene-based approaches.
Affiliation(s)
- Mehanathan Muthamilarasan
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, JNU Campus, New Delhi, 110 067, India
656
Lin YH, Bundschuh R. RNA structure generates natural cooperativity between single-stranded RNA binding proteins targeting 5' and 3'UTRs. Nucleic Acids Res 2014; 43:1160-9. [PMID: 25550422 PMCID: PMC4333377 DOI: 10.1093/nar/gku1320] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022]
Abstract
In post-transcriptional regulation, an mRNA molecule is bound by many proteins and/or miRNAs to modulate its function. To enable combinatorial gene regulation, these binding partners of an RNA must communicate with each other, exhibiting cooperativity. Even in the absence of direct physical interactions between the binding partners, such cooperativity can be mediated through RNA secondary structures, since they affect the accessibility of the binding sites. Here we propose a quantitative measure of this structure-mediated cooperativity that can be numerically calculated for an arbitrary RNA sequence. Focusing on an RNA with two binding sites, we derive a characteristic difference of free energy differences, i.e. ΔΔG, as a measure of the effect of the occupancy of one binding site on the binding strength of another. We apply this measure to a large number of human and Caenorhabditis elegans mRNAs, and find that structure-mediated cooperativity is a generic feature. Interestingly, this cooperativity not only affects binding sites in close proximity along the sequence but also configurations in which one binding site is located in the 5′UTR and the other is located in the 3′UTR of the mRNA. Furthermore, we find that this end-to-end cooperativity is determined by the UTR sequences while the sequences of the coding regions are irrelevant.
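One way to read the "difference of free energy differences" described in this abstract is as a double difference built from four partition functions of the RNA: with neither site occupied, with only the first site occupied, with only the second site occupied, and with both occupied. The sketch below assumes those four values have already been computed elsewhere (for instance by a folding package that can force the bound sites to be unpaired). The function name, argument names, default temperature and sign convention are illustrative assumptions, not the authors' definition.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def delta_delta_g(z_free, z_site1, z_site2, z_both, temperature_k=310.15):
    """Structure-mediated coupling between two binding sites, in kcal/mol.

    z_free:  partition function with neither site occupied
    z_site1: partition function with only site 1 occupied (held unpaired)
    z_site2: partition function with only site 2 occupied
    z_both:  partition function with both sites occupied
    """
    rt = GAS_CONSTANT_KCAL * temperature_k
    # Structural cost of opening site 2 when site 1 is empty.
    dg_site2_alone = -rt * math.log(z_site2 / z_free)
    # Structural cost of opening site 2 when site 1 is already occupied.
    dg_site2_given_site1 = -rt * math.log(z_both / z_site1)
    # Negative: occupying site 1 makes site 2 easier to bind (positive cooperativity
    # under this sign convention). Zero: the two sites are structurally independent.
    return dg_site2_given_site1 - dg_site2_alone
```

With toy values, `delta_delta_g(100, 50, 20, 10)` is zero because both ratios equal 0.2, whereas `delta_delta_g(100, 50, 20, 15)` is negative.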
Affiliation(s)
- Yi-Hsuan Lin
- Department of Physics, The Ohio State University, 191W Woodruff Avenue, Columbus, OH 43210-1107, USA
- Ralf Bundschuh
- Department of Physics, The Ohio State University, 191W Woodruff Avenue, Columbus, OH 43210-1107, USA
- Department of Chemistry & Biochemistry, The Ohio State University, 100W 18th Avenue, Columbus, OH 43210-1340, USA
- Division of Hematology, Department of Internal Medicine, The Ohio State University, 320W 10th Avenue, Columbus, OH 43210, USA
- Center for RNA Biology, The Ohio State University, 484W 12th Avenue, Columbus, OH 43210-1292, USA
657
Arai D, Hayakawa K, Ohgane J, Hirosawa M, Nakao Y, Tanaka S, Shiota K. An epigenetic regulatory element of the Nodal gene in the mouse and human genomes. Mech Dev 2014; 136:143-54. [PMID: 25528267 DOI: 10.1016/j.mod.2014.12.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/26/2014] [Revised: 12/12/2014] [Accepted: 12/15/2014] [Indexed: 01/28/2023]
Abstract
Nodal signaling plays critical roles during embryonic development. The Nodal gene is not expressed in adult tissues but is frequently activated in cancer cells, contributing to progression toward malignancy. Although several regulatory elements of the Nodal gene have been identified, the epigenetic mechanisms by which Nodal expression is regulated over the long term remain unclear. We found a region exhibiting dynamic changes in DNA methylation at approximately -3.0 kb to -0.4 kb upstream from the transcriptional start site (TSS) that we termed the epigenetic regulatory element (ERE). The ERE was unmethylated in mouse embryonic stem cells (mESCs) but became increasingly methylated in differentiated cells and tissues, concomitant with the downregulation of Nodal mRNA expression. In vitro reporter assays identified an Oct3/4 binding motif within the ERE, indicating that the ERE is responsible for the activation of Nodal in mESCs. Furthermore, the ERE was a target of differentiation-associated Polycomb silencing, and the chromatin condensed when mESCs differentiated to embryoid bodies (EBs). Pharmacological inhibition of PRC2 led to the reactivation of Nodal expression in EBs and mouse embryonic fibroblasts (MEFs). The ERE was also targeted by PRC2 in normal human cells. In NODAL-expressing human cancer cells, accumulation of EZH2 and trimethylation of H3K27 at the ERE were diminished. In conclusion, Nodal is epigenetically controlled through the ERE in the mouse embryo and human cells.
Affiliation(s)
- Daisuke Arai
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan; Laboratory of Chemical Biology, Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Koji Hayakawa
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Jun Ohgane
- Laboratory of Genomic Function Engineering, Department of Life Sciences, School of Agriculture, Meiji University, 1-1-1 Higashi-mita, Tama-ku, Kawasaki 214-8571, Japan
- Mitsuko Hirosawa
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Yoichi Nakao
- Laboratory of Chemical Biology, Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Satoshi Tanaka
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kunio Shiota
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan.
658
Wang M, Zhang P, Shu Y, Yuan F, Zhang Y, Zhou Y, Jiang M, Zhu Y, Hu L, Kong X, Zhang Z. Alternative splicing at GYNNGY 5' splice sites: more noise, less regulation. Nucleic Acids Res 2014; 42:13969-80. [PMID: 25428370 PMCID: PMC4267661 DOI: 10.1093/nar/gku1253] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 09/04/2014] [Revised: 10/29/2014] [Accepted: 11/12/2014] [Indexed: 12/28/2022]
Abstract
Numerous eukaryotic genes are alternatively spliced. Recently, deep transcriptome sequencing has sharply increased the known proportion of alternatively spliced genes; over 95% of human multi-exon genes are alternatively spliced. One fundamental question is: are all these alternative splicing (AS) events functional? To look into this issue, we studied the most common form of alternative 5' splice sites, GYNNGYs (Y = C/T), where both GYs can function as splice sites. Global analyses suggest that splicing noise (due to stochasticity of the splicing process) can cause AS at GYNNGYs, evidenced by higher AS frequency in non-coding than in coding regions, in non-conserved than in conserved genes and in lowly expressed than in highly expressed genes. However, ∼20% of AS GYNNGYs in humans and ∼3% in mice exhibit tissue-dependent regulation. Consistent with being functional, regulated GYNNGYs are more conserved than unregulated ones. Regulated GYNNGYs also have distinctive sequence features which may confer regulation. In particular, each regulated GYNNGY comprises two splice sites that resemble each other more closely than those of unregulated GYNNGYs, and has a more conserved downstream flanking intron. Intriguingly, most regulated GYNNGYs may tune gene expression through coupling with nonsense-mediated mRNA decay, rather than by encoding different proteins. In summary, AS at GYNNGY 5' splice sites is primarily splicing noise, and secondarily a way of regulation.
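The GYNNGY pattern named in this abstract (Y = C/T, N = any base) can be located in a sequence with a regular expression. The sketch below is a minimal illustration, not the authors' pipeline: the function name `find_gynngy` is hypothetical, and a lookahead is used so that overlapping occurrences are all reported. It only finds the sequence pattern and does not check whether either GY is actually used as a splice site.

```python
import re

# G, then C or T, then any two bases, then G, then C or T.
# Wrapping the pattern in a lookahead lets finditer report overlapping hits.
GYNNGY = re.compile(r"(?=(G[CT][ACGT]{2}G[CT]))")


def find_gynngy(seq):
    """Return (0-based start, matched hexamer) for every GYNNGY in seq."""
    seq = seq.upper().replace("U", "T")  # accept RNA or lower-case input
    return [(m.start(), m.group(1)) for m in GYNNGY.finditer(seq)]
```

For example, `find_gynngy("AAGTAAGTCC")` reports a single hit, `GTAAGT` at offset 2, in which the two candidate GT dinucleotides are four bases apart.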
Affiliation(s)
- Meng Wang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Peiwei Zhang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yang Shu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Fei Yuan
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yuchao Zhang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- You Zhou
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Min Jiang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yufei Zhu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Landian Hu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Xiangyin Kong
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Zhenguo Zhang
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
659
Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 2014; 43:D512-20. [PMID: 25514926 PMCID: PMC4383998 DOI: 10.1093/nar/gku1267] [Citation(s) in RCA: 2058] [Impact Index Per Article: 205.8] [Indexed: 12/17/2022]
Abstract
PhosphoSitePlus® (PSP, http://www.phosphosite.org/), a knowledgebase dedicated to mammalian post-translational modifications (PTMs), contains over 330 000 non-redundant PTMs, including phospho, acetyl, ubiquityl and methyl groups. Over 95% of the sites are from mass spectrometry (MS) experiments. In order to improve data reliability, early MS data have been reanalyzed, applying a common standard of analysis across over 1 000 000 spectra. Site assignments with P > 0.05 were filtered out. Two new downloads are available from PSP. The ‘Regulatory sites’ dataset includes curated information about modification sites that regulate downstream cellular processes, molecular functions and protein-protein interactions. The ‘PTMVar’ dataset, an intersect of missense mutations and PTMs from PSP, identifies over 25 000 PTMVars (PTMs Impacted by Variants) that can rewire signaling pathways. The PTMVar data include missense mutations from UniProtKB, TCGA and other sources that cause over 2000 diseases or syndromes (MIM) and polymorphisms, or are associated with hundreds of cancers. PTMVars include 18 548 phosphorylation sites, 3412 ubiquitylation sites, 2316 acetylation sites, 685 methylation sites and 245 succinylation sites.
Affiliation(s)
- Bin Zhang
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
- Beth Murray
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
- Vaughan Latham
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
660
Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 2014; 42:13534-44. [PMID: 25416802 PMCID: PMC4267638 DOI: 10.1093/nar/gku1206] [Citation(s) in RCA: 337] [Impact Index Per Article: 33.7] [Received: 08/27/2014] [Revised: 10/12/2014] [Accepted: 11/04/2014] [Indexed: 01/17/2023]
Abstract
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
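The abstract distinguishes tools that only score a candidate splice site from approaches that measure the change in splicing signal caused by a variant. A position weight matrix makes that distinction concrete: score the reference sequence, score the variant sequence, and take the difference. The sketch below is a toy illustration under stated assumptions. The 4-position frequency matrix, the uniform background and the function names are all hypothetical, and none of them come from the study or from any real splice-site model.

```python
import math

# Hypothetical per-position base frequencies for a 4-nt toy site (not real data).
TOY_PFM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def pwm_score(seq, pfm=TOY_PFM):
    """Sum of log2 odds (site frequency over background) across positions."""
    if len(seq) != len(pfm):
        raise ValueError("sequence length must match the matrix width")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pfm, seq))


def variant_delta(ref_seq, pos, alt_base, pfm=TOY_PFM):
    """Score change caused by substituting alt_base at 0-based position pos."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return pwm_score(alt_seq, pfm) - pwm_score(ref_seq, pfm)
```

With this toy matrix, `variant_delta("GTAA", 1, "C")` is negative, meaning the substitution weakens the match to the site model. A score change like this is the kind of per-variant signal that individual tools provide and that ensemble methods could then combine.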
Affiliation(s)
- Xueqiu Jian
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Eric Boerwinkle
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoming Liu
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
661
Bahrami-Samani E, Penalva LOF, Smith AD, Uren PJ. Leveraging cross-link modification events in CLIP-seq for motif discovery. Nucleic Acids Res 2014; 43:95-103. [PMID: 25505146 PMCID: PMC4288180 DOI: 10.1093/nar/gku1288] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
High-throughput protein-RNA interaction data generated by CLIP-seq has provided an unprecedented depth of access to the activities of RNA-binding proteins (RBPs), the key players in co- and post-transcriptional regulation of gene expression. Motif discovery forms part of the necessary follow-up data analysis for CLIP-seq, both to refine the exact locations of RBP binding sites, and to characterize them. The specific properties of RBP binding sites, and the CLIP-seq methods, provide additional information not usually present in the classic motif discovery problem: the binding site structure, and cross-linking induced events in reads. We show that CLIP-seq data contains clear secondary structure signals, as well as technology- and RBP-specific cross-link signals. We introduce Zagros, a motif discovery algorithm specifically designed to leverage this information and explore its impact on the quality of recovered motifs. Our results indicate that using both secondary structure and cross-link modifications can greatly improve motif discovery on CLIP-seq data. Further, the motifs we recover provide insight into the balance between sequence- and structure-specificity struck by RBP binding.
Affiliation(s)
- Emad Bahrami-Samani
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Luiz O F Penalva
- Children's Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX 78229, USA
- Andrew D Smith
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Philip J Uren
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
662
|
Dritsou V, Deligianni E, Dialynas E, Allen J, Poulakakis N, Louis C, Lawson D, Topalis P. Non-coding RNA gene families in the genomes of anopheline mosquitoes. BMC Genomics 2014; 15:1038. [PMID: 25432596 PMCID: PMC4300560 DOI: 10.1186/1471-2164-15-1038] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/18/2014] [Accepted: 11/19/2014] [Indexed: 12/12/2022] Open
Abstract
Background: Only a small fraction of the mosquito species of the genus Anopheles are able to transmit malaria, one of the biggest killer diseases of poverty, which is mostly prevalent in the tropics. This diversity has genetic, yet unknown, causes. In a further attempt to contribute to the elucidation of these variances, the international “Anopheles Genomes Cluster Consortium” project (a.k.a. “16 Anopheles genomes project”) was established, aiming at a comprehensive genomic analysis of several anopheline species, most of which are malaria vectors. In the frame of the international consortium carrying out this project our team studied the genes encoding families of non-coding RNAs (ncRNAs), concentrating on four classes: microRNA (miRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), and in particular small nucleolar RNA (snoRNA) and, finally, transfer RNA (tRNA).
Results: Our analysis was carried out using, exclusively, computational approaches, and evaluating both the primary NGS reads as well as the respective genome assemblies produced by the consortium and stored in VectorBase; moreover, the results of RNAseq surveys in cases in which these were available and meaningful were also accessed in order to obtain supplementary data, as were “pre-genomic era” sequence data stored in nucleic acid databases. The investigation included the identification and analysis, in most species studied, of ncRNA genes belonging to several families, as well as the analysis of the evolutionary relations of some of those genes in cross-comparisons to other members of the genus Anopheles.
Conclusions: Our study led to the identification of members of these gene families in the majority of twenty different anopheline taxa. A set of tools for the study of the evolution and molecular biology of important disease vectors has, thus, been obtained.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-1038) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pantelis Topalis
- Institute of Molecular Biology and Biotechnology, FORTH, Heraklion, Greece.
|
663
|
Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AFA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 2014; 43:D670-81. [PMID: 25428374 PMCID: PMC4383971 DOI: 10.1093/nar/gku1177] [Citation(s) in RCA: 690] [Impact Index Per Article: 69.0] [Indexed: 12/15/2022] Open
Abstract
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Joel Armstrong
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Galt P Barber
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Timothy R Dreszer
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Pauline A Fujita
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Rachel A Harte
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Steve Heitner
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Glenn Hickey
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
- Donna Karolchik
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Katrina Learned
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian T Lee
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Chin H Li
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Karen H Miga
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Ngan Nguyen
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Benedict Paten
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian J Raney
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- David Haussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA; Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- W James Kent
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
|
664
|
Oates ME, Stahlhacke J, Vavoulis DV, Smithers B, Rackham OJL, Sardar AJ, Zaucha J, Thurlby N, Fang H, Gough J. The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res 2014; 43:D227-33. [PMID: 25414345 PMCID: PMC4383889 DOI: 10.1093/nar/gku1041] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 01/01/2023] Open
Abstract
We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and—in the case of whole genomes—with enrichment analysis against a taxonomically defined background.
Affiliation(s)
- Matt E Oates
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Ben Smithers
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Owen J L Rackham
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London, UK
- Adam J Sardar
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; e-Therapeutics plc, 17 Blenheim Office Park, Long Hanborough, Oxfordshire, OX29 8LN, UK
- Jan Zaucha
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Natalie Thurlby
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Hai Fang
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Julian Gough
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
|
665
|
Krug K, Popic S, Carpy A, Taumer C, Macek B. Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants. Proteomics 2014; 14:2699-708. [DOI: 10.1002/pmic.201400219] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/16/2014] [Revised: 08/02/2014] [Accepted: 09/19/2014] [Indexed: 01/08/2023]
Affiliation(s)
- Karsten Krug
- Proteome Center Tuebingen; University of Tuebingen; Germany
- Sasa Popic
- Proteome Center Tuebingen; University of Tuebingen; Germany
- Boris Macek
- Proteome Center Tuebingen; University of Tuebingen; Germany
|
666
|
Montague E, Janko I, Stanberry L, Lee E, Choiniere J, Anderson N, Stewart E, Broomall W, Higdon R, Kolker N, Kolker E. Beyond protein expression, MOPED goes multi-omics. Nucleic Acids Res 2014; 43:D1145-51. [PMID: 25404128 PMCID: PMC4383969 DOI: 10.1093/nar/gku1175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022] Open
Abstract
MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75 000 genes and 50 000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.
Affiliation(s)
- Elizabeth Montague
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Imre Janko
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Larissa Stanberry
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elaine Lee
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elizabeth Stewart
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- William Broomall
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Natali Kolker
- High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101; Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, WA, USA 98109; Department of Chemistry and Chemical Biology, College of Science, Northeastern University, Boston, MA 02115
|
667
|
Huang PJ, Lee CC, Tan BCM, Yeh YM, Julie Chu L, Chen TW, Chang KP, Lee CY, Gan RC, Liu H, Tang P. CMPD: cancer mutant proteome database. Nucleic Acids Res 2014; 43:D849-55. [PMID: 25398898 PMCID: PMC4383976 DOI: 10.1093/nar/gku1182] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023] Open
Abstract
Whole-exome sequencing, which centres on the protein coding regions of disease/cancer associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomics datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.
Affiliation(s)
- Po-Jung Huang
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Chi-Ching Lee
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Yuan-Ming Yeh
- Bioinformatics Division, Tri-I Biotech, Inc., Taipei 221, Taiwan
- Lichieh Julie Chu
- Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Ting-Wen Chen
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Kai-Ping Chang
- Department of Otolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Lin-Kou, Taoyuan 333, Taiwan
- Cheng-Yang Lee
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Ruei-Chi Gan
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Hsuan Liu
- Department of Molecular and Cellular Biology, Chang Gung University, Taoyuan 333, Taiwan
- Petrus Tang
- Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
|
668
|
dos Santos G, Schroeder AJ, Goodman JL, Strelets VB, Crosby MA, Thurmond J, Emmert DB, Gelbart WM. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations. Nucleic Acids Res 2014; 43:D690-7. [PMID: 25398896 PMCID: PMC4383921 DOI: 10.1093/nar/gku1099] [Citation(s) in RCA: 291] [Impact Index Per Article: 29.1] [Indexed: 01/09/2023] Open
Abstract
Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates.
Affiliation(s)
- Gilberto dos Santos
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Andrew J Schroeder
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Joshua L Goodman
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Victor B Strelets
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Madeline A Crosby
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Jim Thurmond
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- David B Emmert
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- William M Gelbart
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
669
|
Kroll JE, de Souza SJ, de Souza GA. Identification of rare alternative splicing events in MS/MS data reveals a significant fraction of alternative translation initiation sites. PeerJ 2014; 2:e673. [PMID: 25405079 PMCID: PMC4232841 DOI: 10.7717/peerj.673] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/01/2014] [Accepted: 10/30/2014] [Indexed: 01/08/2023] Open
Abstract
Integration of transcriptome data is a crucial step for the identification of rare protein variants in mass-spectrometry (MS) data with important consequences for all branches of biotechnology research. Here, we used Splooce, a database of splicing variants recently developed by us, to search MS data derived from a variety of human tumor cell lines. More than 800 new protein variants were identified whose corresponding MS spectra were specific to protein entries from Splooce. Although the types of splicing variants (exon skipping, alternative splice sites and intron retention) were found at the same frequency as in the transcriptome, we observed a large variety of modifications at the protein level induced by alternative splicing events. Surprisingly, we found that 40% of all protein modifications induced by alternative splicing led to the use of alternative translation initiation sites. Other modifications include frameshifts in the open reading frame and inclusion or deletion of peptide sequences. To make the dataset generated here available to the community in a more effective form, the Splooce portal (http://www.bioinformatics-brazil.org/splooce) was modified to report the alternative splicing events supported by MS data.
Affiliation(s)
- José E Kroll
- Institute of Bioinformatics and Biotechnology, Natal, Brazil; Brain Institute, UFRN, Natal, Brazil
- Gustavo A de Souza
- Department of Immunology and Centre for Immune Regulation, Oslo University Hospital HF Rikshospitalet, University of Oslo, Oslo, Norway
|
670
|
Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, Kinoshita K. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res 2014; 43:D82-6. [PMID: 25392420 PMCID: PMC4383961 DOI: 10.1093/nar/gku1163] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 01/14/2023] Open
Abstract
The COXPRESdb (http://coxpresdb.jp) provides gene coexpression relationships for animal species. Here, we report the updates of the database, mainly focusing on the following two points. For the first point, we added RNAseq-based gene coexpression data for three species (human, mouse and fly), and largely increased the number of microarray experiments to nine species. The increase of the number of expression data with multiple platforms could enhance the reliability of coexpression data. For the second point, we refined the data assessment procedures, for each coexpressed gene list and for the total performance of a platform. The assessment of coexpressed gene list now uses more reasonable P-values derived from platform-specific null distribution. These developments greatly reduced pseudo-predictions for directly associated genes, thus expanding the reliability of coexpression data to design new experiments and to discuss experimental results.
Affiliation(s)
- Yasunobu Okamura
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Yuichi Aoki
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takeshi Obayashi
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Shu Tadaka
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Satoshi Ito
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takafumi Narise
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan; Institute of Development, Aging, and Cancer, Tohoku University, Sendai 980-8575, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai 980-8573, Japan
|
671
|
Li P, Liu Y, Wang H, He Y, Wang X, He Y, Lv F, Chen H, Pang X, Liu M, Shi T, Yi Z. PubAngioGen: a database and knowledge for angiogenesis and related diseases. Nucleic Acids Res 2014; 43:D963-7. [PMID: 25392416 PMCID: PMC4383947 DOI: 10.1093/nar/gku1139] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022] Open
Abstract
Angiogenesis is the process of generating new blood vessels from existing ones, and it is involved in many diseases including cancers, cardiovascular diseases and diabetes mellitus. Recently, great efforts have been made to explore the mechanisms of angiogenesis in various diseases, and many angiogenic factors have been discovered as therapeutic targets in anti- or pro-angiogenic drug development. However, the resulting information is sparsely distributed and no systematic summary has been made. In order to integrate these related results and facilitate research for the community, we conducted manual text-mining of the published literature and built a database named PubAngioGen (http://www.megabionet.org/aspd/). Our online application displays a comprehensive network for exploring the connection between angiogenesis and diseases at multiple levels, including protein–protein interaction, drug-target, disease-gene and signaling pathways among various cells and animal models recorded through text-mining. To enlarge the scope of the PubAngioGen application, our database also links to other common resources including the STRING, DrugBank and OMIM databases, which will facilitate understanding of the underlying molecular mechanisms of angiogenesis and drug development in clinical therapy.
Affiliation(s)
- Peng Li
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yongrui Liu
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Huan Wang
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yuan He
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Xue Wang
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yundong He
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Fang Lv
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
| | - Huaqing Chen
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
| | - Xiufeng Pang
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
| | - Mingyao Liu
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China Center for Cancer and Stem Cell Biology, Institute of Biosciences and Technology, Texas A&M University Health Science Center, Houston, TX 77030, USA
| | - Tieliu Shi
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
| | - Zhengfang Yi
- The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
| |
Collapse
|
672
|
Peng X, Thierry-Mieg J, Thierry-Mieg D, Nishida A, Pipes L, Bozinoski M, Thomas MJ, Kelly S, Weiss JM, Raveendran M, Muzny D, Gibbs RA, Rogers J, Schroth GP, Katze MG, Mason CE. Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR). Nucleic Acids Res 2014; 43:D737-42. [PMID: 25392405 PMCID: PMC4383927 DOI: 10.1093/nar/gku1110] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 01/06/2023]
Abstract
The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.
Affiliation(s)
- Xinxia Peng
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Andrew Nishida
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Lenore Pipes
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA
- Marjan Bozinoski
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA
- Matthew J Thomas
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Sara Kelly
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Jeffrey M Weiss
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Michael G Katze
- Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA; Feil Family Brain and Mind Research Institute (BMRI), Weill Cornell Medical College, New York, NY 10065, USA
|
673
|
Bioinformatic analysis reveals genome size reduction and the emergence of tyrosine phosphorylation site in the movement protein of New World bipartite begomoviruses. PLoS One 2014; 9:e111957. [PMID: 25383632 PMCID: PMC4226511 DOI: 10.1371/journal.pone.0111957] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/18/2014] [Accepted: 10/09/2014] [Indexed: 11/19/2022]
Abstract
Begomovirus (genus Begomovirus, family Geminiviridae) infection is devastating to a wide variety of agricultural crops including tomato, squash, and cassava. Thus, understanding the replication and adaptation of begomoviruses has important translational value in alleviating substantial economic loss, particularly in developing countries. The bipartite genome of begomoviruses prevalent in the New World and their counterparts in the Old World share a high degree of genome homology except for a partially overlapping reading frame encoding the pre-coat protein (PCP, or AV2). PCP contributes to the essential functions of intercellular movement and suppression of host RNA silencing, but it is only present in the Old World viruses. In this study, we analyzed a set of non-redundant bipartite begomovirus genomes originating from the Old World (N = 28) and the New World (N = 65). Our bioinformatic analysis suggests that ∼120 nucleotides were deleted from PCP's proximal promoter region, which may have contributed to its loss in the New World viruses. Consequently, genomes of the New World viruses are smaller than their Old World counterparts, possibly compensating for the loss of the intercellular movement functions of PCP. Additionally, we detected substantial purifying selection on a portion of the New World DNA-B movement protein (MP, or BC1). Further analysis of the New World MP gene revealed the emergence of a putative tyrosine phosphorylation site, which likely explains the increased purifying selection in that region. These findings provide important information about the strategies adopted by bipartite begomoviruses in adapting to new environments and suggest future in planta experiments.
|
674
|
Dreos R, Ambrosini G, Périer RC, Bucher P. The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools. Nucleic Acids Res 2014; 43:D92-6. [PMID: 25378343 PMCID: PMC4383928 DOI: 10.1093/nar/gku1111] [Citation(s) in RCA: 199] [Impact Index Per Article: 19.9] [Indexed: 11/17/2022]
Abstract
We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced part of the Eukaryotic Promoter Database (EPD) which has been described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.
Affiliation(s)
- René Dreos
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Giovanna Ambrosini
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
- Rouayda Cavin Périer
- Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
- Philipp Bucher
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
|
675
|
Volders PJ, Verheggen K, Menschaert G, Vandepoele K, Martens L, Vandesompele J, Mestdagh P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res 2014; 43:D174-80. [PMID: 25378313 PMCID: PMC4383901 DOI: 10.1093/nar/gku1060] [Citation(s) in RCA: 212] [Impact Index Per Article: 21.2] [Indexed: 01/17/2023]
Abstract
The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs), and novel lncRNA genes are being identified at a rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew over five times in size, gaining over 90,000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the-art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.
Affiliation(s)
- Kenneth Verheggen
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium; Department of Biochemistry, Ghent University, Ghent 9000, Belgium
- Gerben Menschaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent 9000, Belgium
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent 9000, Belgium; Department of Plant Systems Biology, VIB, Ghent 9000, Belgium
- Lennart Martens
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium; Department of Biochemistry, Ghent University, Ghent 9000, Belgium
- Jo Vandesompele
- Center for Medical Genetics, Ghent University, Ghent 9000, Belgium
- Pieter Mestdagh
- Center for Medical Genetics, Ghent University, Ghent 9000, Belgium
|
676
|
Cato L, Neeb A, Brown M, Cato ACB. Control of steroid receptor dynamics and function by genomic actions of the cochaperones p23 and Bag-1L. NUCLEAR RECEPTOR SIGNALING 2014; 12:e005. [PMID: 25422595 PMCID: PMC4242288 DOI: 10.1621/nrs.12005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/02/2014] [Accepted: 09/20/2014] [Indexed: 01/23/2023]
Abstract
Molecular chaperones encompass a group of unrelated proteins that facilitate the correct assembly and disassembly of other macromolecular structures, which they themselves do not remain a part of. They associate with a large and diverse set of coregulators termed cochaperones that regulate their function and specificity. Amongst others, chaperones and cochaperones regulate the activity of several signaling molecules including steroid receptors, which upon ligand binding interact with discrete nucleotide sequences within the nucleus to control the expression of diverse physiological and developmental genes. Molecular chaperones and cochaperones are typically known to provide the correct conformation for ligand binding by the steroid receptors. While this contribution is widely accepted, recent studies have reported that they further modulate steroid receptor action outside ligand binding. They are thought to contribute to receptor turnover, transport of the receptor to different subcellular localizations, recycling of the receptor on chromatin and even stabilization of the DNA-binding properties of the receptor. In addition to these combined effects with molecular chaperones, cochaperones are reported to have additional functions that are independent of molecular chaperones. Some of these functions also affect steroid receptor action. Two well-studied examples are the cochaperones p23 and Bag-1L, which have been identified as modulators of steroid receptor activity in nuclei. Understanding the details of their regulatory action will provide new therapeutic opportunities for controlling steroid receptor action independent of the widespread effects of molecular chaperones.
Affiliation(s)
- Laura Cato
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
- Antje Neeb
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
- Myles Brown
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
- Andrew C B Cato
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
|
677
|
Nagai Y, Takahashi Y, Imanishi T. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms. Nucleic Acids Res 2014; 43:D868-72. [PMID: 25361969 PMCID: PMC4383886 DOI: 10.1093/nar/gku1037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
Abstract
Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information on sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated the reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genomes Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases.
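The reproducibility criterion described in this abstract (counting, per population, the studies that report a significant association) can be illustrated with a minimal sketch. This is not VaDE's actual pipeline: the record fields, the SNP identifier, the study labels and the significance cutoff below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical cutoff: the abstract does not state the threshold used by VaDE.
P_THRESHOLD = 5e-8


def count_replications(records, p_threshold=P_THRESHOLD):
    """Count, per (snp, trait, population), the distinct studies with p < threshold."""
    studies = defaultdict(set)
    for rec in records:
        if rec["p_value"] < p_threshold:
            key = (rec["snp"], rec["trait"], rec["population"])
            studies[key].add(rec["study_id"])
    return {key: len(ids) for key, ids in studies.items()}


# Made-up records; none of these values come from VaDE.
records = [
    {"snp": "rs0000001", "trait": "trait A", "population": "East Asian",
     "study_id": "S1", "p_value": 1e-9},
    {"snp": "rs0000001", "trait": "trait A", "population": "East Asian",
     "study_id": "S2", "p_value": 3e-8},
    {"snp": "rs0000001", "trait": "trait A", "population": "European",
     "study_id": "S3", "p_value": 2e-4},
]
counts = count_replications(records)
```

In this toy input the association is supported by two significant studies in one population and by none in the other, which is the kind of population-specific contrast the abstract describes.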
Affiliation(s)
- Yoko Nagai
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
- Yasuko Takahashi
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
- Tadashi Imanishi
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan; Data Management and Integration Team, Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
|
678
|
Yang X, Li M, Liu Q, Zhang Y, Qian J, Wan X, Wang A, Zhang H, Zhu C, Lu X, Mao Y, Sang X, Zhao H, Zhao Y, Zhang X. Dr.VIS v2.0: an updated database of human disease-related viral integration sites in the era of high-throughput deep sequencing. Nucleic Acids Res 2014; 43:D887-92. [PMID: 25355513 PMCID: PMC4383912 DOI: 10.1093/nar/gku1074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Dr.VIS is a database of human disease-related viral integration sites (VIS). The number of VIS has grown rapidly since Dr.VIS was first released in 2011, and there is growing recognition of the important role that viral integration plays in the development of malignancies. The updated database version, Dr.VIS v2.0 (http://www.bioinfo.org/drvis or bminfor.tongji.edu.cn/drvis_v2), represents 25 diseases, covers 3340 integration sites of eight oncogenic viruses in human chromosomes and provides more accurate information about VIS from high-throughput deep sequencing results obtained mainly after 2012. Data of VISes for three newly identified oncogenic viruses for 14 related diseases have been added to this 2015 update, which has a 5-fold increase of VISes compared to Dr.VIS v1.0. Dr.VIS v2.0 has 2244 precise integration sites, 867 integration regions and 551 junction sequences. A total of 2295 integration sites are located near 1730 involved genes. Of the VISes, 1153 are detected in the exons or introns of genes, with 294 located up to 5 kb and a further 112 located up to 10 kb away. As viral integration may alter chromosome stability and gene expression levels, characterizing VISes will contribute toward the discovery of novel oncogenes, tumor suppressor genes and tumor-associated pathways.
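The abstract groups integration sites by their position relative to genes: inside a gene, up to 5 kb away, or up to 10 kb away. A minimal sketch of that binning follows. It assumes a single gene given as a closed coordinate interval on the same chromosome; the function name and category labels are hypothetical and not taken from Dr.VIS.

```python
def classify_site(position, gene_start, gene_end):
    """Bin an integration site by its distance (in bp) to one gene interval."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if gene_start <= position <= gene_end:
        return "intragenic"  # falls in an exon or intron of the gene
    # Distance to the nearer gene boundary.
    if position < gene_start:
        distance = gene_start - position
    else:
        distance = position - gene_end
    if distance <= 5_000:
        return "within 5 kb"
    if distance <= 10_000:
        return "within 10 kb"
    return "distal"
```

A real annotation would compare each site against every gene on the chromosome and keep the nearest one; the single-interval version above only shows the distance thresholds.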
Affiliation(s)
- Xiaobo Yang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Ming Li
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Qi Liu
- School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yabing Zhang
- Otolaryngology Head and Neck Surgery Department, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Junyan Qian
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Xueshuai Wan
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Anqiang Wang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Haohai Zhang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Chengpei Zhu
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Xin Lu
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Yilei Mao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Xinting Sang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Haitao Zhao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Yi Zhao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
- Xiaoyan Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, China
|
679
|
Brown GR, Hem V, Katz KS, Ovetsky M, Wallin C, Ermolaeva O, Tolstoy I, Tatusova T, Pruitt KD, Maglott DR, Murphy TD. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res 2014; 43:D36-42. [PMID: 25355515 DOI: 10.1093/nar/gku1055] [Citation(s) in RCA: 419] [Impact Index Per Article: 41.9] [Indexed: 01/20/2023]
Abstract
The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
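The abstract notes that Gene records carry integer identifiers and can be retrieved through NCBI's E-utilities. As a minimal sketch, the helper below only builds an ESummary request URL for one Gene ID and performs no network call. The function name is hypothetical; the endpoint and the `db`, `id` and `retmode` parameters follow the public E-utilities interface, and Gene ID 7157 (human TP53) is used purely as an example.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_summary_url(gene_id, retmode="json"):
    """Build an ESummary URL for a single NCBI Gene identifier (hypothetical helper)."""
    gene_id = int(gene_id)
    if gene_id <= 0:
        raise ValueError("Gene identifiers are positive integers")
    query = urlencode({"db": "gene", "id": gene_id, "retmode": retmode})
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


# Example: summary request for Gene ID 7157 (human TP53).
url = gene_summary_url(7157)
```

The resulting URL could be fetched with any HTTP client; for data set-wide retrieval, NCBI's usage guidelines on request rates and API keys would apply.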
Affiliation(s)
- Garth R Brown
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Vichet Hem
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Michael Ovetsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Craig Wallin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Olga Ermolaeva
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Tatiana Tatusova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Donna R Maglott
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
|
680
|
Petrov AI, Kay SJE, Gibson R, Kulesha E, Staines D, Bruford EA, Wright MW, Burge S, Finn RD, Kersey PJ, Cochrane G, Bateman A, Griffiths-Jones S, Harrow J, Chan PP, Lowe TM, Zwieb CW, Wower J, Williams KP, Hudson CM, Gutell R, Clark MB, Dinger M, Quek XC, Bujnicki JM, Chua NH, Liu J, Wang H, Skogerbø G, Zhao Y, Chen R, Zhu W, Cole JR, Chai B, Huang HD, Huang HY, Cherry JM, Hatzigeorgiou A, Pruitt KD. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res 2014; 43:D123-9. [PMID: 25352543 PMCID: PMC4384043 DOI: 10.1093/nar/gku991] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Indexed: 01/23/2023]
Abstract
The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.
|
681
|
Omer WH, Narita A, Hosomichi K, Mitsunaga S, Hayashi Y, Yamashita A, Krasniqi A, Iwasaki Y, Kimura M, Inoue I. Genome-wide linkage and exome analyses identify variants of HMCN1 for splenic epidermoid cyst. BMC MEDICAL GENETICS 2014; 15:115. [PMID: 25338956 PMCID: PMC4258954 DOI: 10.1186/s12881-014-0115-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/09/2014] [Accepted: 10/03/2014] [Indexed: 12/30/2022]
Abstract
BACKGROUND Splenic epidermoid cyst is a benign tumor-like lesion affecting the spleen and sometimes occurs in familial form. Determining the cause of such rare diseases remains challenging; recently, however, with the emergence of exome re-sequencing, the genetics of many diseases have been unveiled. In the present study, we used a combined approach of genome-wide parametric linkage and exome analyses in a moderate-sized Japanese family with frequent occurrence of splenic epidermoid cyst to identify the genetic cause of the disease. METHODS Twelve individuals from the family were subjected to SNP typing, and exome re-sequencing was done for 8 family members and 4 unrelated patients from Kosovo. Linkage was estimated using multi-point parametric linkage analysis assuming a dominant mode of inheritance. All of the candidate variants from exome analysis were confirmed by direct sequencing. RESULTS The parametric linkage analysis suggested two loci on 1q and 14q with a maximal LOD score of 2.5. Exome-generated variants were prioritized based on impact on the protein coding sequence, novelty or rareness in public databases, and position within the linkage loci. This approach identified three variants: variants of HMCN1 and CNTN2 on 1q and a variant of DDHD1 on 14q. The variant of HMCN1 (p.R5205H) showed the best co-segregation in the family after validation with Sanger sequencing. Additionally, rare missense variants (p.A4704V, p.T5004I, and p.H5244Q) were detected in three unrelated Kosovo patients. The identified variants of HMCN1 are on conserved domains, particularly the two variants on the calcium-binding epidermal growth factor domain. CONCLUSIONS The present study, by combining linkage and exome analyses, identified HMCN1 as a genetic cause of splenic epidermoid cyst. Understanding the biology of the disease is a key step toward developing innovative approaches of intervention.
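For readers unfamiliar with the LOD score reported in the RESULTS, the sketch below computes the textbook two-point LOD for phase-known meioses: the base-10 logarithm of the likelihood at recombination fraction theta divided by the likelihood under free recombination (theta = 0.5). This is a deliberately simplified illustration, not the multi-point parametric analysis used in the study, and the function name and inputs are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses (simplified illustration)."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    # Likelihood of the observed meioses at recombination fraction theta.
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    # Likelihood under no linkage (theta = 0.5).
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(likelihood_linked / likelihood_unlinked)
```

Under this simplified model, each fully informative non-recombinant meiosis contributes log10(2), about 0.3, at theta = 0, which is why small families rarely reach the conventional threshold of 3 on their own.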
Affiliation(s)
- Ituro Inoue
- Division of Human Genetics, National Institute of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Yata 1111, Mishima 411-8540, Shizuoka, Japan.
|
682
|
Abstract
Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
Affiliation(s)
- Graham RS Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA UK
|
683
|
Du X, Gertz EM, Wojtowicz D, Zhabinskaya D, Levens D, Benham CJ, Schäffer AA, Przytycka TM. Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation. Nucleic Acids Res 2014; 42:12367-79. [PMID: 25336616 PMCID: PMC4227770 DOI: 10.1093/nar/gku921] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 01/03/2023] Open
Abstract
While individual non-B DNA structures have been shown to impact gene expression, their broad regulatory role remains elusive. We utilized genomic variants and expression quantitative trait loci (eQTL) data to analyze genome-wide variation propensities of potential non-B DNA regions and their relation to gene expression. Independent of genomic location, these regions were enriched in nucleotide variants. Our results are consistent with previously observed mutagenic properties of these regions and counter a previous study concluding that G-quadruplex regions have a reduced frequency of variants. While such mutagenicity might undermine functionality of these elements, we identified in potential non-B DNA regions a signature of negative selection. Yet, we found a depletion of eQTL-associated variants in potential non-B DNA regions, opposite to what might be expected from their proposed regulatory role. However, we also observed that genes downstream of potential non-B DNA regions showed higher expression variation between individuals. This coupling between mutagenicity and tolerance for expression variability of downstream genes may be a result of evolutionary adaptation, which allows reconciling mutagenicity of non-B DNA structures with their location in functionally important regions and their potential regulatory role.
Affiliation(s)
- Xiangjun Du
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- E Michael Gertz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Damian Wojtowicz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Dina Zhabinskaya
- Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- David Levens
- UC Davis Genome Center, University of California Davis, Davis, CA 95616, USA
- Craig J Benham
- Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Alejandro A Schäffer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
684
|
Garson K, Vanderhyden BC. Epithelial ovarian cancer stem cells: underlying complexity of a simple paradigm. Reproduction 2014; 149:R59-70. [PMID: 25301968 DOI: 10.1530/rep-14-0234] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 12/13/2022]
Abstract
The lack of significant progress in the treatment of epithelial ovarian cancer (EOC) underscores the need to gain a better understanding of the processes that lead to chemoresistance and recurrence. The cancer stem cell (CSC) hypothesis offers an attractive explanation of how a subpopulation of cells within a patient's tumour might remain refractory to treatment and subsequently form the basis of recurrent chemoresistant disease. This review examines the literature defining somatic stem cells of the ovary and fallopian tube, two tissues that give rise to EOC. In addition, considerable research has been reviewed, that has identified subpopulations of EOC cells, based on marker expression (CD133, CD44, CD117, CD24, epithelial cell adhesion molecule, LY6A, ALDH1 and side population (SP)), which are enriched for tumour initiating cells (TICs). While many studies identified either CD133 or CD44 as markers useful for enriching for TICs, there is little consensus. This suggests that EOC cells may have a phenotypic plasticity that may preclude the identification of universal markers defining a CSC. The assay that forms the basis of quantifying TICs is the xenograft assay. Considerable controversy surrounds the xenograft assay and it is essential that some of the potential limitations be examined in this review. Highlighting such limitations or weaknesses is required to properly evaluate data and broaden our interpretation of potential mechanisms that might be contributing to the pathogenesis of ovarian cancer.
Affiliation(s)
- Kenneth Garson
- Ottawa Hospital Research Institute, Centre for Cancer Therapeutics, Ottawa, Ontario, Canada K1H 8L6; Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada K1H 8M5
- Barbara C Vanderhyden
- Ottawa Hospital Research Institute, Centre for Cancer Therapeutics, Ottawa, Ontario, Canada K1H 8L6; Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada K1H 8M5
|
685
|
Nguyen H, Maier J, Huang H, Perrone V, Simmerling C. Folding simulations for proteins with diverse topologies are accessible in days with a physics-based force field and implicit solvent. J Am Chem Soc 2014; 136:13959-62. [PMID: 25255057 PMCID: PMC4195377 DOI: 10.1021/ja5032776] [Citation(s) in RCA: 168] [Impact Index Per Article: 16.8] [Indexed: 11/28/2022]
Abstract
The millisecond time scale needed for molecular dynamics simulations to approach the quantitative study of protein folding is not yet routine. One approach to extend the simulation time scale is to perform long simulations on specialized and expensive supercomputers such as Anton. Ideally, however, folding simulations would be more economical while retaining reasonable accuracy, and provide feedback on structure, stability and function rapidly enough if partnered directly with experiment. Approaches to this problem typically involve varied compromises between accuracy, precision, and cost; the goal here is to address whether simple implicit solvent models have become sufficiently accurate for their weaknesses to be offset by their ability to rapidly provide much more precise conformational data as compared to explicit solvent. We demonstrate that our recently developed physics-based model performs well on this challenge, enabling accurate all-atom simulated folding for 16 of 17 proteins with a variety of sizes, secondary structure, and topologies. The simulations were carried out using the Amber software on inexpensive GPUs, providing ∼1 μs/day per GPU, and >2.5 ms data presented here. We also show that native conformations are preferred over misfolded structures for 14 of the 17 proteins. For the other 3, misfolded structures are thermodynamically preferred, suggesting opportunities for further improvement.
Affiliation(s)
- Hai Nguyen
- Department of Chemistry, Laufer Center for Physical and Quantitative Biology and Graduate Program in Biochemistry and Structural Biology, Stony Brook University, Stony Brook, New York 11794-5252, United States
|
686
|
Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM. Long non-coding RNAs as a source of new peptides. eLife 2014; 3:e03523. [PMID: 25233276 PMCID: PMC4359382 DOI: 10.7554/elife.03523] [Citation(s) in RCA: 352] [Impact Index Per Article: 35.2] [Received: 05/30/2014] [Accepted: 08/11/2014] [Indexed: 12/11/2022] Open
Abstract
Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution. DOI: http://dx.doi.org/10.7554/eLife.03523.001
Despite the terms being largely interchangeable in modern language, ‘DNA’ and ‘gene’ do not mean the same thing. A gene is made of DNA and contains the instructions to make a protein, and it is the protein that performs the function of the gene. However, cells in the body also contain DNA that does not form genes. Far from being ‘junk’ DNA with no biological purpose, this DNA has a variety of roles, including affecting how other genes are used. To produce a protein, the DNA sequence of a gene is transcribed into an intermediate molecule called RNA, which is then translated to produce a protein. So-called long non-coding RNA (lncRNA) molecules are also transcribed from DNA, but whether these are translated to make proteins has been a subject of much debate. Indeed, the function of the vast majority of lncRNA molecules is unknown. Ruiz-Orera et al. analyzed RNA sequences collected from earlier experiments on six different species (humans, mice, fish, flies, yeast, and a plant) and found nearly 2500 as yet unstudied lncRNAs in addition to those previously identified. Many of the lncRNAs that Ruiz-Orera et al. investigated could be found lodged inside the cellular machinery used to translate RNA into proteins. Furthermore, these lncRNA molecules are oriented in the machinery as if they are primed and ready for translation, suggesting that many lncRNAs do produce proteins. However, it is unclear how many of these proteins have a useful function. Very few lncRNAs were found in more than one species, suggesting that they have evolved recently. The properties of lncRNA molecules also show many similarities with the properties of ‘young’ (recently evolved) genes that are known to produce proteins. The combined findings of Ruiz-Orera et al. therefore suggest that lncRNAs are important for developing new proteins. The emergence of proteins with new functions has been an important driving force in evolution, and this work provides important clues into the first steps of this process. DOI: http://dx.doi.org/10.7554/eLife.03523.002
Affiliation(s)
- Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- Xavier Messeguer
- Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
- Juan Antonio Subirana
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- M Mar Alba
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
|
687
|
Patnaik SK, Helmberg W, Blumenfeld OO. BGMUT Database of Allelic Variants of Genes Encoding Human Blood Group Antigens. Transfus Med Hemother 2014; 41:346-51. [PMID: 25538536 DOI: 10.1159/000366108] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 04/01/2014] [Accepted: 05/19/2014] [Indexed: 12/30/2022]
Abstract
The Blood group antigen Gene MUTation (BGMUT) database documents variations in genes of human blood group systems. In March 2014, the database, accessible at www.ncbi.nlm.nih.gov/gv/mhc/xslcgi.cgi?cmd=bgmut, listed 1,545 alleles of 44 genes of 34 blood group systems. Besides allelic information, the BGMUT resource also presents comprehensive and current information on blood group systems. This review describes the database and notes its utility for the transfusion medicine and human genetics communities.
Affiliation(s)
- Santosh Kumar Patnaik
- Department of Thoracic Surgery, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY, USA
- Wolfgang Helmberg
- Department of Blood Group Serology and Transfusion Medicine, Medical University of Graz, Graz, Austria
- Olga O Blumenfeld
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA
|
688
|
Johnson S, Trost B, Long JR, Pittet V, Kusalik A. A better sequence-read simulator program for metagenomics. BMC Bioinformatics 2014; 15 Suppl 9:S14. [PMID: 25253095 PMCID: PMC4168713 DOI: 10.1186/1471-2105-15-s9-s14] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
Background: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. Results: We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. Conclusions: BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
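The idea of a non-parametric read-length distribution mentioned in this abstract can be illustrated with a minimal Python sketch. It is not BEAR's method (the abstract describes a machine-learning approach whose details are not given here); it only resamples lengths from an observed set, ignores quality values and sequencing errors, and all function names are hypothetical.

```python
import random
from collections import Counter


def empirical_length_sampler(observed_lengths, seed=None):
    """Build a sampler that draws read lengths in proportion to how often
    each length was observed (no uniform or normal model is assumed)."""
    counts = Counter(observed_lengths)
    lengths = list(counts)
    weights = [counts[length] for length in lengths]
    rng = random.Random(seed)

    def sample(n):
        return rng.choices(lengths, weights=weights, k=n)

    return sample


def simulate_reads(genome, n_reads, sample_lengths, seed=None):
    """Cut error-free reads from random positions of a genome string,
    using read lengths supplied by the empirical sampler."""
    rng = random.Random(seed)
    reads = []
    for length in sample_lengths(n_reads):
        length = min(length, len(genome))  # a read cannot exceed the genome
        start = rng.randrange(len(genome) - length + 1)
        reads.append(genome[start:start + length])
    return reads
```

A call such as `simulate_reads(genome, 1000, empirical_length_sampler(lengths_from_real_run))` would then produce reads whose length histogram approximates the one observed in the hypothetical `lengths_from_real_run` list.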
|
689
|
Fine mapping of eight psoriasis susceptibility loci. Eur J Hum Genet 2014; 23:844-53. [PMID: 25182136 DOI: 10.1038/ejhg.2014.172] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 10/03/2013] [Revised: 06/03/2014] [Accepted: 06/06/2014] [Indexed: 01/04/2023] Open
Abstract
Previous studies have identified 41 independent genome-wide significant psoriasis susceptibility loci. After our first psoriasis genome-wide association study, we designed a custom genotyping array to fine-map eight genome-wide significant susceptibility loci known at that time (IL23R, IL13, IL12B, TNIP1, MHC, TNFAIP3, IL23A and RNF114), enabling genotyping of 2269 single-nucleotide polymorphisms (SNPs) in the eight loci for 2699 psoriasis cases and 2107 unaffected controls of European ancestry. We imputed these data using the latest 1000 Genomes reference haplotypes, which included both indels and SNPs, to increase the marker density of the eight loci to 49 239 genetic variants. Using stepwise conditional association analysis, we identified nine independent signals distributed across six of the eight loci. In the major histocompatibility complex (MHC) region, we detected three independent signals at rs114255771 (P = 2.94 × 10⁻⁷⁴), rs6924962 (P = 3.21 × 10⁻¹⁹) and rs892666 (P = 1.11 × 10⁻¹⁰). Near IL12B we detected two independent signals at rs62377586 (P = 7.42 × 10⁻¹⁶) and rs918518 (P = 3.22 × 10⁻¹¹). Only one signal was observed in each of the TNIP1 (rs17728338; P = 4.15 × 10⁻¹³), IL13 (rs1295685; P = 1.65 × 10⁻⁷), IL23A (rs61937678; P = 1.82 × 10⁻⁷) and TNFAIP3 (rs642627; P = 5.90 × 10⁻⁷) regions. We also imputed variants for eight HLA genes and found that SNP rs114255771 yielded a more significant association than any HLA allele or amino-acid residue. Further analysis revealed that the HLA-C*06-B*57 haplotype tagged by this SNP had a significantly higher odds ratio than other HLA-C*06-bearing haplotypes. The results demonstrate allelic heterogeneity at IL12B and identify a high-risk MHC class I haplotype, consistent with the existence of multiple psoriasis effectors in the MHC.
|
690
|
Gollin SM. Cytogenetic alterations and their molecular genetic correlates in head and neck squamous cell carcinoma: a next generation window to the biology of disease. Genes Chromosomes Cancer 2014; 53:972-90. [PMID: 25183546 DOI: 10.1002/gcc.22214] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/10/2014] [Accepted: 08/15/2014] [Indexed: 01/14/2023] Open
Abstract
Cytogenetic alterations underlie the development of head and neck squamous cell carcinoma (HNSCC), whether tobacco and alcohol use, betel nut chewing, snuff or human papillomavirus (HPV) causes the disease. Many of the molecular genetic aberrations in HNSCC result from these cytogenetic alterations. This review presents a brief introduction to the epidemiology of HNSCC, and discusses the role of HPV in the disease, cytogenetic alterations and their frequencies in HNSCC, their molecular genetic and The Cancer Genome Atlas (TCGA) correlates, prognostic implications, and possible therapeutic considerations. The most frequent cytogenetic alterations in HNSCC are gains of 5p14-15, 8q11-12, and 20q12-13, gains or amplifications of 3q26, 7p11, 8q24, and 11q13, and losses of 3p, 4q35, 5q12, 8p23, 9p21-24, 11q14-23, 13q12-14, 18q23, and 21q22. To understand their effects on tumor cell biology and response to therapy, the cytogenetic findings in HNSCC are increasingly being examined in the context of the biochemical pathways they disrupt. The goal is to minimize morbidity and mortality from HNSCC using cytogenetic abnormalities to identify valuable diagnostic biomarkers for HNSCC, prognostic biomarkers of tumor behavior, recurrence risk, and outcome, and predictive biomarkers of therapeutic response to identify the most efficacious treatment for each individual patient's tumor, all based on a detailed understanding of the next generation biology of HNSCC.
Affiliation(s)
- Susanne M Gollin
- Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA; Departments of Otolaryngology and Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA; University of Pittsburgh Cancer Institute, Pittsburgh, PA
|
691
|
Demeure K, Duriez E, Domon B, Niclou SP. PeptideManager: a peptide selection tool for targeted proteomic studies involving mixed samples from different species. Front Genet 2014; 5:305. [PMID: 25228907 PMCID: PMC4151198 DOI: 10.3389/fgene.2014.00305] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/18/2014] [Accepted: 08/16/2014] [Indexed: 02/02/2023] Open
Abstract
The search for clinically useful protein biomarkers using advanced mass spectrometry approaches represents a major focus in cancer research. However, the direct analysis of human samples may be challenging due to limited availability, the absence of appropriate control samples, or the large background variability observed in patient material. As an alternative approach, human tumors orthotopically implanted into a different species (xenografts) are clinically relevant models that have proven their utility in pre-clinical research. Patient derived xenografts for glioblastoma have been extensively characterized in our laboratory and have been shown to retain the characteristics of the parental tumor at the phenotypic and genetic level. Such models were also found to adequately mimic the behavior and treatment response of human tumors. The reproducibility of such xenograft models, the possibility to identify their host background and perform tumor-host interaction studies, are major advantages over the direct analysis of human samples. At the proteome level, the analysis of xenograft samples is challenged by the presence of proteins from two different species which, depending on tumor size, type or location, often appear at variable ratios. Any proteomics approach aimed at quantifying proteins within such samples must consider the identification of species specific peptides in order to avoid biases introduced by the host proteome. Here, we present an in-house methodology and tool developed to select peptides used as surrogates for protein candidates from a defined proteome (e.g., human) in a host proteome background (e.g., mouse, rat) suited for a mass spectrometry analysis. The tools presented here are applicable to any species specific proteome, provided a protein database is available. By linking the information from both proteomes, PeptideManager significantly facilitates and expedites the selection of peptides used as surrogates to analyze proteins of interest.
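The selection of species-specific peptides described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PeptideManager itself: the digestion rule (trypsin, cleaving after K or R unless followed by P), the minimum peptide length, and all function names are assumptions introduced here, and the abstract does not specify them.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """In-silico digestion under an assumed trypsin rule:
    cleave after K or R, but not when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # Short fragments (and the empty trailing piece) are discarded.
    return {p for p in pieces if len(p) >= min_len}


def species_specific_peptides(target, host, min_len=6):
    """For each target-proteome protein, keep only the peptides that do not
    occur anywhere in the digested host proteome.

    `target` and `host` map protein names to amino-acid sequences."""
    host_peptides = set()
    for sequence in host.values():
        host_peptides |= tryptic_peptides(sequence, min_len)
    return {
        name: sorted(tryptic_peptides(sequence, min_len) - host_peptides)
        for name, sequence in target.items()
    }
```

Peptides returned for a protein are candidates for quantifying it in a xenograft sample without contribution from the host; a protein with an empty list has no usable surrogate under these assumptions.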
Affiliation(s)
- Kevin Demeure
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
- Elodie Duriez
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
- Bruno Domon
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
- Simone P Niclou
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
|
692
|
Van Peer G, Lefever S, Anckaert J, Beckers A, Rihani A, Van Goethem A, Volders PJ, Zeka F, Ongenaert M, Mestdagh P, Vandesompele J. miRBase Tracker: keeping track of microRNA annotation changes. Database (Oxford) 2014; 2014:bau080. [PMID: 25157074 PMCID: PMC4142392 DOI: 10.1093/database/bau080] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/14/2022]
Abstract
Since 2002, information on individual microRNAs (miRNAs), such as reference names and sequences, has been stored in miRBase, the reference database for miRNA annotation. As a result of progressive insights into the miRNome and its complexity, miRBase underwent addition and deletion of miRNA records, changes in annotated miRNA sequences and adoption of more complex naming schemes over time. Unfortunately, miRBase does not allow straightforward assessment of these ongoing miRNA annotation changes, which has resulted in substantial ambiguity regarding miRNA identity and sequence in public literature, in target prediction databases and in content on various commercially available analytical platforms. As a result, correct interpretation, comparison and integration of miRNA study results are compromised, which we demonstrate here by assessing the impact of ignoring sequence annotation changes. To address this problem, we developed miRBase Tracker (www.mirbasetracker.org), an easy-to-use online database that keeps track of all historical and current miRNA annotation present in the miRBase database. Three basic functionalities allow researchers to keep their miRNA annotation up-to-date, reannotate analytical miRNA platforms and link published results with outdated annotation to the latest miRBase release. We expect miRBase Tracker to increase the transparency and annotation accuracy in the field of miRNA research. Database URL:www.mirbasetracker.org
Affiliation(s)
- Gert Van Peer
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Steve Lefever
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Jasper Anckaert
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Anneleen Beckers
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Ali Rihani
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Alan Van Goethem
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Fjoralba Zeka
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Maté Ongenaert
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Pieter Mestdagh
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
- Jo Vandesompele
- Center for Medical Genetics Ghent, Ghent University, Ghent, Belgium
|
693
|
Li W, Freudenberg J. Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. Comput Biol Chem 2014; 53 Pt A:108-17. [PMID: 25241312 DOI: 10.1016/j.compbiolchem.2014.08.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Accepted: 07/11/2014] [Indexed: 12/31/2022]
Abstract
Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: hypothetical reads with length 1 kb which cannot be uniquely mapped with zero-mismatch alignment for the described regions, considering both the forward and reverse strand. The respective collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (overall 1.23%). Not surprisingly, our unmappable regions overlap greatly with segmental duplication, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplication or transposable elements and 98.3% overlap structural variant annotations. Notably, some of these regions overlap units with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with the ultraconserved elements, very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project.
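The definition of unmappability used in this abstract (a read of fixed length that cannot be placed uniquely with zero mismatches on either strand) can be made concrete with a toy Python sketch. It is an illustration only, assuming a small in-memory sequence over A, C, G and T; the authors' actual pipeline for a whole genome at 1 kb read length is not described here, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complement strand of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent key: the smaller of a k-mer and its reverse
    complement, so that matches on either strand are counted together."""
    return min(kmer, reverse_complement(kmer))


def unmappable_starts(genome, read_len):
    """Start positions of hypothetical reads of length `read_len` whose exact
    sequence occurs at more than one position, on either strand."""
    kmers = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(canonical(k) for k in kmers)
    return [i for i, k in enumerate(kmers) if counts[canonical(k)] > 1]
```

For a real genome the same counting would need a disk-backed or compressed index rather than an in-memory `Counter`, which is a practical limitation of this sketch rather than part of the definition.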
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
- Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
|
695
|
White NM, Cabanski CR, Silva-Fisher JM, Dang HX, Govindan R, Maher CA. Transcriptome sequencing reveals altered long intergenic non-coding RNAs in lung cancer. Genome Biol 2014; 15:429. [PMID: 25116943 PMCID: PMC4156652 DOI: 10.1186/s13059-014-0429-8] [Citation(s) in RCA: 165] [Impact Index Per Article: 16.5] [Received: 04/15/2014] [Accepted: 07/31/2014] [Indexed: 02/07/2023] Open
Abstract
Background: Long intergenic non-coding RNAs (lncRNAs) represent an emerging and under-studied class of transcripts that play a significant role in human cancers. Due to the tissue- and cancer-specific expression patterns observed for many lncRNAs, it is believed that they could serve as ideal diagnostic biomarkers. However, until each tumor type is examined more closely, many of these lncRNAs will remain elusive. Results: Here we characterize the lncRNA landscape in lung cancer using publicly available transcriptome sequencing data from a cohort of 567 adenocarcinoma and squamous cell carcinoma tumors. Through this compendium we identify over 3,000 unannotated intergenic transcripts representing novel lncRNAs. Through comparison of both adenocarcinoma and squamous cell carcinomas with matched controls we discover 111 differentially expressed lncRNAs, which we term lung cancer-associated lncRNAs (LCALs). A pan-cancer analysis of 324 additional tumor and adjacent normal pairs enables us to identify a subset of lncRNAs that display enriched expression specific to lung cancer as well as a subset that appears to be broadly deregulated across human cancers. Integration of exome sequencing data reveals that expression levels of many LCALs have significant associations with the mutational status of key oncogenes in lung cancer. Functional validation, using both knockdown and overexpression, shows that the most differentially expressed lncRNA, LCAL1, plays a role in cellular proliferation. Conclusions: Our systematic characterization of publicly available transcriptome data provides the foundation for future efforts to understand the role of LCALs, develop novel biomarkers, and improve knowledge of lung tumor biology. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0429-8) contains supplementary material, which is available to authorized users.
|
696
|
Distinct isoform of FABP7 revealed by screening for retroelement-activated genes in diffuse large B-cell lymphoma. Proc Natl Acad Sci U S A 2014; 111:E3534-43. [PMID: 25114248 DOI: 10.1073/pnas.1405507111] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 12/21/2022] Open
Abstract
Remnants of ancient transposable elements (TEs) are abundant in mammalian genomes. These sequences harbor multiple regulatory motifs and hence are capable of influencing expression of host genes. In response to environmental changes, TEs are known to be released from epigenetic repression and to become transcriptionally active. Such activation could also lead to lineage-inappropriate activation of oncogenes, as one study described in Hodgkin lymphoma. However, little further evidence for this mechanism in other cancers has been reported. Here, we reanalyzed whole transcriptome data from a large cohort of patients with diffuse large B-cell lymphoma (DLBCL) compared with normal B-cell centroblasts to detect genes ectopically expressed through activation of TE promoters. We have identified 98 such TE-gene chimeric transcripts that were exclusively expressed in primary DLBCL cases and confirmed several in DLBCL-derived cell lines. We further characterized a TE-gene chimeric transcript involving a fatty acid-binding protein gene (LTR2-FABP7), normally expressed in brain, that was ectopically expressed in a subset of DLBCL patients through the use of an endogenous retroviral LTR promoter of the LTR2 family. The LTR2-FABP7 chimeric transcript encodes a novel chimeric isoform of the protein with characteristics distinct from native FABP7. In vitro studies reveal a dependency for DLBCL cell line proliferation and growth on LTR2-FABP7 chimeric protein expression. Taken together, these data demonstrate the significance of TEs as regulators of aberrant gene expression in cancer and suggest that LTR2-FABP7 may contribute to the pathogenesis of DLBCL in a subgroup of patients.
|
697
|
P2Y(12) receptor on the verge of a neuroinflammatory breakdown. Mediators Inflamm 2014; 2014:975849. [PMID: 25180027 PMCID: PMC4142314 DOI: 10.1155/2014/975849] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 04/11/2014] [Revised: 06/25/2014] [Accepted: 06/26/2014] [Indexed: 12/22/2022] Open
Abstract
In the CNS, neuroinflammation occurring during pathologies such as amyotrophic lateral sclerosis (ALS) and multiple sclerosis (MS) is the consequence of an intricate interplay orchestrated by various cell phenotypes. Among the molecular cues having a role in this process, extracellular nucleotides are responsible for intercellular communication and propagation of inflammatory stimuli. This occurs by binding to several receptor subtypes, termed P2X/P2Y, which are widespread in different tissues and simultaneously localized on multiple cells. For instance, the metabotropic P2Y12 subtype is found in the CNS on microglia, where it affects activation and chemotaxis; on oligodendrocytes, where it has a hypothesized role in myelination; and on astrocytes. By comparative analysis, we have established here that the P2Y12 receptor, immunolabelled by antibodies against the C-terminus or the second intracellular loop, is distributed and modulated under neuroinflammatory conditions on ramified microglia or myelinated fibers, respectively, in primary organotypic cerebellar cultures, tissue slices from rat striatum and cerebellum, spinal cord sections from symptomatic/end-stage SOD1-G93A ALS mice, and autoptic cortical tissue from progressive MS donors. We suggest that modulation of P2Y12 expression might play a dual role as an analytic marker of branched/surveillant microglia and of demyelinating lesions, thus potentially acquiring predictive value under neuroinflammatory conditions such as those found in ALS and MS.
|
698
|
Salamon A, Adam S, Rychly J, Peters K. Long-term tumor necrosis factor treatment induces NFκB activation and proliferation, but not osteoblastic differentiation of adipose tissue-derived mesenchymal stem cells in vitro. Int J Biochem Cell Biol 2014; 54:149-62. [PMID: 25066315 DOI: 10.1016/j.biocel.2014.07.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/21/2013] [Revised: 07/16/2014] [Accepted: 07/17/2014] [Indexed: 01/08/2023]
Abstract
The pro-inflammatory cytokine tumor necrosis factor (TNF) is well known to induce differentiation of bone matrix-resorbing osteoclasts from hematopoietic stem cells. However, the impact of TNF on differentiation of bone matrix-forming osteoblasts from mesenchymal stem cells (MSC) has so far been studied only fragmentarily. Therefore, we investigated what impact long-term TNF treatment has on osteoblastic differentiation of MSC isolated from adipose tissue (ASC) in vitro. In summary, we found that continuous TNF exposure induces the nuclear factor kappa B (NFκB) pathway in ASC as well as secretion of the pro-inflammatory chemokine interleukin 8, but not the mitogen-activated protein kinase or apoptosis pathways. Moreover, TNF neither induced nor inhibited osteoblastic differentiation of ASC, but strongly increased their proliferation rate. In that manner, pro-inflammatory conditions in vivo may generate significantly increased numbers of progenitor cells, and ASC especially, in conjunction with external stimuli, may contribute to the events of ectopic ossification observed in chronic inflammatory diseases. Substantiating how these in vitro findings translate to the disease context calls for further in vivo studies.
Affiliation(s)
- Achim Salamon
- Department of Cell Biology, Rostock University Medical Center, Schillingallee 69, D-18057 Rostock, Germany
- Stefanie Adam
- Department of Cell Biology, Rostock University Medical Center, Schillingallee 69, D-18057 Rostock, Germany
- Joachim Rychly
- Department of Cell Biology, Rostock University Medical Center, Schillingallee 69, D-18057 Rostock, Germany
- Kirsten Peters
- Department of Cell Biology, Rostock University Medical Center, Schillingallee 69, D-18057 Rostock, Germany
|
699
|
Abstract
The availability of the human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here, we present a draft map of the human proteome using high-resolution Fourier transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary hematopoietic cells, resulted in identification of proteins encoded by 17,294 genes, accounting for ~84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, including translated pseudogenes, non-coding RNAs and upstream ORFs. This large human proteome catalog (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.
|
700
|
Hodgkinson KM, Vanderhyden BC. Consideration of GREB1 as a potential therapeutic target for hormone-responsive or endocrine-resistant cancers. Expert Opin Ther Targets 2014; 18:1065-76. [PMID: 24998469 DOI: 10.1517/14728222.2014.936382] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/15/2023]
Abstract
INTRODUCTION: Steroid hormones increase the incidence and promote the progression of many types of cancer. Exogenous estrogens increase the risk of developing breast, ovarian and endometrial cancer, and many breast cancers initially respond to estrogen deprivation. Although steroid hormone signaling has been extensively studied, the mechanisms of hormone-stimulated cancer growth have not yet been fully elucidated, limiting opportunities for novel approaches to therapeutic intervention.
AREAS COVERED: This review examines growing evidence for the important role played by the steroid hormone-induced gene called GREB1, or growth regulation by estrogen in breast cancer 1. GREB1 is a critical mediator of both the estrogen-stimulated proliferation of breast cancer cells and the androgen-stimulated proliferation of prostate cancer cells.
EXPERT OPINION: Although its exact function in the cascade of hormone action remains unclear, the ability of GREB1 to modulate tumor progression in models of breast, ovarian and prostate cancer renders this gene an excellent candidate for further consideration as a potential therapeutic target. Research examining the mechanism of GREB1 action will help to elucidate its role in proliferation and its potential contribution to endocrine resistance and will determine whether GREB1 interference may have therapeutic efficacy.
Affiliation(s)
- Kendra M Hodgkinson
- Ottawa Hospital Research Institute, Centre for Cancer Therapeutics, 501 Smyth Road, 3rd Floor, Box 926, Ottawa, Ontario K1H 8L6, Canada
|