1
|
Norreen-Thorsen M, Struck EC, Öling S, Zwahlen M, Von Feilitzen K, Odeberg J, Lindskog C, Pontén F, Uhlén M, Dusart PJ, Butler LM. A human adipose tissue cell-type transcriptome atlas. Cell Rep 2022; 40:111046. [PMID: 35830816 DOI: 10.1016/j.celrep.2022.111046] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/30/2021] [Revised: 04/29/2022] [Accepted: 06/13/2022] [Indexed: 12/19/2022]
Abstract
The importance of defining cell-type-specific genes is well acknowledged. Technological advances facilitate high-resolution sequencing of single cells, but practical challenges remain. Adipose tissue is composed primarily of adipocytes, large buoyant cells requiring extensive, artefact-generating processing for separation and analysis. Thus, adipocyte data are frequently absent from single-cell RNA sequencing (scRNA-seq) datasets, despite being the primary functional cell type. Here, we decipher cell-type-enriched transcriptomes from unfractionated human adipose tissue RNA-seq data. We profile all major constituent cell types, using 527 visceral adipose tissue (VAT) or 646 subcutaneous adipose tissue (SAT) samples, identifying over 2,300 cell-type-enriched transcripts. Sex-subset analysis uncovers a panel of male-only cell-type-enriched genes. By resolving expression profiles of genes differentially expressed between SAT and VAT, we identify mesothelial cells as the primary driver of this variation. This study provides an accessible method to profile cell-type-enriched transcriptomes using bulk RNA-seq, generating a roadmap for adipose tissue biology.
|
2
|
Stefen C, Wagner F, Asztalos M, Giere P, Grobe P, Hiller M, Hofmann R, Jähde M, Lächele U, Lehmann T, Ortmann S, Peters B, Ruf I, Schiffmann C, Thier N, Unterhitzenberger G, Vogt L, Rudolf M, Wehner P, Stuckas H. Phenotyping in the era of genomics: MaTrics—a digital character matrix to document mammalian phenotypic traits. Mamm Biol. [DOI: 10.1007/s42991-021-00192-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics) in a digital form is presented. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics. MaTrics was developed within a project aimed to find genetic causes of phenotypic traits of mammals using Forward Genomics. This approach requires genomes and comprehensive and recorded information on homologous phenotypes that are coded as discrete categories in a matrix. MaTrics is an evolving online resource providing information on phenotypic traits in numeric code; traits are coded either as absent/present or with several states as multistate. The state record for each species is linked to at least one reference (e.g., literature, photographs, histological sections, CT scans, or museum specimens) and so MaTrics contributes to digitalization of museum collections. Currently, MaTrics covers 147 mammalian species and includes 231 characters related to structure, morphology, physiology, ecology, and ethology and available in a machine actionable NEXUS-format. Filling MaTrics revealed substantial knowledge gaps, highlighting the need for phenotyping efforts. Studies based on selected data from MaTrics and using Forward Genomics identified associations between genes and certain phenotypes ranging from lifestyles (e.g., aquatic) to dietary specializations (e.g., herbivory, carnivory). These findings motivate the expansion of phenotyping in MaTrics by filling research gaps and by adding taxa and traits. Only databases like MaTrics will provide machine actionable information on phenotypic traits, an important limitation to genomics. MaTrics is available within the data repository Morph·D·Base (www.morphdbase.de).
|
3
|
Zhan C, Zhang Y, Liu X, Wu R, Zhang K, Shi W, Shen L, Shen K, Fan X, Ye F, Shen B. MIKB: A manually curated and comprehensive knowledge base for myocardial infarction. Comput Struct Biotechnol J 2021; 19:6098-6107. [PMID: 34900127 PMCID: PMC8626632 DOI: 10.1016/j.csbj.2021.11.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Revised: 11/11/2021] [Accepted: 11/11/2021] [Indexed: 02/08/2023]
Abstract
Myocardial infarction knowledge base (MIKB; http://www.sysbio.org.cn/mikb/; latest update: December 31, 2020) is an open-access and manually curated database dedicated to integrating knowledge about MI to improve the efficiency of translational MI research. MIKB is an updated and expanded version of our previous MI Risk Knowledge Base (MIRKB), which integrated MI-related risk factors and risk models for providing help in risk assessment or diagnostic prediction of MI. The updated MIRKB includes 9701 records with 2054 single factors, 209 combined factors, 243 risk models, 37 MI subtypes and 3406 interactions between single factors and MIs collected from 4817 research articles. The expanded functional module, i.e. MIGD, is a database including not only MI associated genetic variants, but also the other multi-omics factors and the annotations for their functional alterations. The goal of MIGD is to provide a multi-omics level understanding of the molecular pathogenesis of MI. MIGD includes 1782 omics factors, 28 MI subtypes and 2347 omics factor-MI interactions as well as 1253 genes and 6 chromosomal alterations collected from 2647 research articles. The functions of MI associated genes and their interaction with drugs were analyzed. MIKB will be continuously updated and optimized to provide precision and comprehensive knowledge for the study of heterogeneous and personalized MI.
Affiliation(s)
- Chaoying Zhan
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Yingbo Zhang
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China
- Xingyun Liu
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Rongrong Wu
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Ke Zhang
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Wenjing Shi
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Li Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Ke Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Xuemeng Fan
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Fei Ye
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
- Bairong Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Sichuan 610212, China
|
4
|
Duan Y, Zhang W, Cheng Y, Shi M, Xia XQ. A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs. RNA 2021; 27:80-98. [PMID: 33055239 PMCID: PMC7749630 DOI: 10.1261/rna.074724.120] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/07/2020] [Accepted: 10/07/2020] [Indexed: 06/11/2023]
Abstract
High-throughput RNA sequencing unveiled the complexity of transcriptome and significantly increased the records of long noncoding RNAs (lncRNAs), which were reported to participate in a variety of biological processes. Identification of lncRNAs is a key step in lncRNA analysis, and a bunch of bioinformatics tools have been developed for this purpose in recent years. While these tools allow us to identify lncRNA more efficiently and accurately, they may produce inconsistent results, making selection a confusing issue. We compared the performance of 41 analysis models based on 14 software packages and different data sets, including high-quality data and low-quality data from 33 species. In addition, computational efficiency, robustness, and joint prediction of the models were explored. As a practical guidance, key points for lncRNA identification under different situations were summarized. In this investigation, no one of these models could be superior to others under all test conditions. The performance of a model relied to a great extent on the source of transcripts and the quality of assemblies. As general references, FEELnc_all_cl, CPC, and CPAT_mouse work well in most species while COME, CNCI, and lncScore are good choices for model organisms. Since these tools are sensitive to different factors such as the species involved and the quality of assembly, researchers must carefully select the appropriate tool based on the actual data. Alternatively, our test suggests that joint prediction could behave better than any single model if proper models were chosen. All scripts/data used in this research can be accessed at http://bioinfo.ihb.ac.cn/elit.
Affiliation(s)
- You Duan
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wanting Zhang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Yingyin Cheng
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Mijuan Shi
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Xiao-Qin Xia
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
|
5
|
Moyer AJ, Gardiner K, Reeves RH. All Creatures Great and Small: New Approaches for Understanding Down Syndrome Genetics. Trends Genet 2020; 37:444-459. [PMID: 33097276 DOI: 10.1016/j.tig.2020.09.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/07/2020] [Revised: 09/17/2020] [Accepted: 09/22/2020] [Indexed: 12/26/2022]
Abstract
Human chromosome 21 (Hsa21) contains more than 500 genes, making trisomy 21 one of the most complex genetic perturbations compatible with life. The ultimate goal of Down syndrome (DS) research is to design therapies that improve quality of life for individuals with DS by understanding which subsets of Hsa21 genes contribute to DS-associated phenotypes throughout the lifetime. However, the complexity of DS pathogenesis has made developing appropriate animal models an ongoing challenge. Here, we examine lessons learned from a variety of model systems, including yeast, nematode, fruit fly, and zebrafish, and discuss emerging methods for creating murine models that better reflect the genetic basis of trisomy 21.
Affiliation(s)
- Anna J Moyer
- Department of Genetic Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Physiology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Katheleen Gardiner
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA (retired)
- Roger H Reeves
- Department of Genetic Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Physiology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
|
6
|
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022]
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
|
7
|
Pujar S, O'Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, Girón CG, Diekhans M, Barnes I, Bennett R, Berry AE, Cox E, Davidson C, Goldfarb T, Gonzalez JM, Hunt T, Jackson J, Joardar V, Kay MP, Kodali VK, Martin FJ, McAndrews M, McGarvey KM, Murphy M, Rajput B, Rangwala SH, Riddick LD, Seal RL, Suner MM, Webb D, Zhu S, Aken BL, Bruford EA, Bult CJ, Frankish A, Murphy T, Pruitt KD. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 2019; 46:D221-D228. [PMID: 29126148 PMCID: PMC5753299 DOI: 10.1093/nar/gkx1031] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 09/20/2017] [Accepted: 10/20/2017] [Indexed: 01/29/2023]
Abstract
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
Affiliation(s)
- Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Nuala A O'Leary
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Craig Wallin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Carlos G Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
- University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95064, USA
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew E Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eric Cox
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tamara Goldfarb
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Jose M Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John Jackson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Vinita Joardar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mike P Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vamsi K Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Monica McAndrews
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Kelly M McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Michael Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bhanu Rajput
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sanjida H Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Lillian D Riddick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Ruth L Seal
- HUGO Gene Nomenclature Committee, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Webb
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sophia Zhu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Elspeth A Bruford
- HUGO Gene Nomenclature Committee, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carol J Bult
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Terence Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
8
|
Mittempergher L, Delahaye LJMJ, Witteveen AT, Spangler JB, Hassenmahomed F, Mee S, Mahmoudi S, Chen J, Bao S, Snel MHJ, Leidelmeijer S, Besseling N, Bergstrom Lucas A, Pabón-Peña C, Linn SC, Dreezen C, Wehkamp D, Chan BY, Bernards R, van 't Veer LJ, Glas AM. MammaPrint and BluePrint Molecular Diagnostics Using Targeted RNA Next-Generation Sequencing Technology. J Mol Diagn 2019; 21:808-823. [PMID: 31173928 DOI: 10.1016/j.jmoldx.2019.04.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/13/2018] [Revised: 03/21/2019] [Accepted: 04/16/2019] [Indexed: 01/31/2023]
Abstract
Next-generation DNA sequencing is rapidly becoming an indispensable tool for genome-directed cancer diagnostics, but next-generation RNA sequencing (RNA-seq) is currently not standardly used in clinical diagnostics for expression assessment. However, multigene RNA diagnostic assays are used increasingly in the routine diagnosis of early-stage breast cancer. Two of the most widely used tests are currently available only as a central laboratory service, which limits their clinical use. We evaluated the use of RNA-seq as a decentralized method to perform such tests. The MammaPrint and BluePrint RNA-seq tests were found to be equivalent to the clinically validated microarray tests. The RNA-seq tests were highly reproducible when performed in different locations and were stable over time. The MammaPrint RNA-seq test was clinically validated. Our data demonstrate that RNA-seq can be used as a decentralized platform, yielding results substantially equivalent to results derived from the predicate diagnostic device.
Affiliation(s)
- Anke T Witteveen
- Research and Development, Agendia NV, Amsterdam, the Netherlands
- Sammy Mee
- Product Support, Agendia Inc., Irvine, California
- Jiang Chen
- Product Support, Agendia Inc., Irvine, California
- Simon Bao
- Product Support, Agendia Inc., Irvine, California
- Naomi Besseling
- Research and Development, Agendia NV, Amsterdam, the Netherlands
- Carlos Pabón-Peña
- Diagnostics and Genomics Group, Agilent Technologies, Santa Clara, California
- Sabine C Linn
- Division of Molecular Pathology and Medical Oncology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Christa Dreezen
- Research and Development, Agendia NV, Amsterdam, the Netherlands
- Diederik Wehkamp
- Research and Development, Agendia NV, Amsterdam, the Netherlands
- Bob Y Chan
- Product Support, Agendia Inc., Irvine, California
- René Bernards
- Research and Development, Agendia NV, Amsterdam, the Netherlands; Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Laura J van 't Veer
- Research and Development, Agendia NV, Amsterdam, the Netherlands; Department of Laboratory Medicine, University of California, San Francisco, San Francisco, California
- Annuska M Glas
- Research and Development, Agendia NV, Amsterdam, the Netherlands
|
9
|
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, Ferrari R. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform 2019; 19:286-302. [PMID: 27881428 PMCID: PMC6018996 DOI: 10.1093/bib/bbw114] [Citation(s) in RCA: 352] [Impact Index Per Article: 70.4] [Received: 07/28/2016] [Indexed: 02/07/2023]
Abstract
Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteinomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, are moving towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and eventually the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work intends to target students and researchers seeking knowledge outside of their field of expertise and fosters a leap from the reductionist to the global-integrative analytical approach in research.
Affiliation(s)
- Claudia Manzoni
- School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom; Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Demis A Kia
- Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Jana Vandrovcova
- Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- John Hardy
- Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Nicholas W Wood
- Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Patrick A Lewis
- School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom; Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Raffaele Ferrari
- Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
|
10
|
Laulederkind SJF, Hayman GT, Wang SJ, Hoffman MJ, Smith JR, Bolton ER, De Pons J, Tutaj MA, Tutaj M, Thota J, Dwinell MR, Shimoyama M. Rat Genome Databases, Repositories, and Tools. Methods Mol Biol 2019; 2018:71-96. [PMID: 31228152 DOI: 10.1007/978-1-4939-9581-3_3] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
Resources for rat researchers are extensive, including strain repositories and databases all around the world. The Rat Genome Database (RGD) serves as the primary rat data repository, providing both manual and computationally collected data from other databases.
|
11
|
Lowe JWE. Sequencing through thick and thin: Historiographical and philosophical implications. Stud Hist Philos Biol Biomed Sci 2018; 72:10-27. [PMID: 30337139 DOI: 10.1016/j.shpsc.2018.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/09/2017] [Revised: 07/11/2018] [Accepted: 10/01/2018] [Indexed: 06/08/2023]
Abstract
DNA sequencing has been characterised by scholars and life scientists as an example of 'big', 'fast' and 'automated' science in biology. This paper argues, however, that these characterisations are a product of a particular interpretation of what sequencing is, what I call 'thin sequencing'. The 'thin sequencing' perspective focuses on the determination of the order of bases in a particular stretch of DNA. Based upon my research on the pig genome mapping and sequencing projects, I provide an alternative 'thick sequencing' perspective, which also includes a number of practices that enable the sequence to travel across and be used in wider communities. If we take sequencing in the thin manner to be an event demarcated by the determination of sequences in automated sequencing machines and computers, this has consequences for the historical analysis of sequencing projects, as it focuses attention on those parts of the work of sequencing that are more centralised, fast (and accelerating) and automated. I argue instead that sequencing can be interpreted as a more open-ended process including activities such as the generation of a minimum tile path or annotation, and detail the historiographical and philosophical consequences of this move.
Affiliation(s)
- James W E Lowe
- Science, Technology and Innovation Studies, University of Edinburgh, Old Surgeons' Hall, High School Yards, Edinburgh, EH1 1LZ, UK.
|
12
|
Abstract
It is accepted that confusion regarding the description of genetic variants occurs when researchers do not use standard nomenclature. The Human Genome Organization Gene Nomenclature Committee contacted a panel of consultants, all working on the KAL1 gene, to propose an update of the nomenclature of the gene, as there was a convention in the literature of using the ‘KAL1’ symbol when referring to the gene, but using the name ‘anosmin-1’ when referring to the protein. The new name, ANOS1, reflects the protein name and is more transferable across species.
|
13
|
Naderi A. SRARP and HSPB7 are epigenetically regulated gene pairs that function as tumor suppressors and predict clinical outcome in malignancies. Mol Oncol 2018; 12:724-755. [PMID: 29577611 PMCID: PMC5928383 DOI: 10.1002/1878-0261.12195] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/25/2018] [Revised: 02/27/2018] [Accepted: 03/10/2018] [Indexed: 12/16/2022] Open
Abstract
Deletions of chromosome 1p36 are common in cancers; however, despite extensive studies, there has been limited success for discovering candidate tumor suppressors in this region. SRARP has recently been identified as a novel corepressor of the androgen receptor (AR) and is located on chromosome 1p36. Here, bioinformatics analysis of large tumor datasets was performed to study SRARP and its gene pair, HSPB7. In addition, using cancer cell lines, mechanisms of SRARP and HSPB7 regulation and their molecular functions were investigated. This study demonstrated that SRARP and HSPB7 are a gene pair located 5.2 kb apart on 1p36.13 and are inactivated by deletions and epigenetic silencing in malignancies. Importantly, SRARP and HSPB7 have tumor suppressor functions in clonogenicity and cell viability associated with the downregulation of Akt and ERK. SRARP expression is inversely correlated with genes that promote cell proliferation and signal transduction, which supports its functions as a tumor suppressor. In addition, AR exerts dual regulatory effects on SRARP, and although an increased AR activity suppresses SRARP transcription, a minimum level of AR activity is required to maintain baseline SRARP expression in AR+ cancer cells. Furthermore, as observed with SRARP, HSPB7 interacts with the 14-3-3 protein, presenting a shared molecular feature between SRARP and HSPB7. Of note, genome- and epigenome-wide associations of SRARP and HSPB7 with survival strongly support their tumor suppressor functions. In particular, DNA hypermethylation, lower expression, somatic mutations, and lower copy numbers of SRARP are associated with worse cancer outcome. Moreover, DNA hypermethylation and lower expression of SRARP in normal adjacent tissues predict poor survival, suggesting that SRARP inactivation is an early event in carcinogenesis. In summary, SRARP and HSPB7 are tumor suppressors that are commonly inactivated in malignancies. 
SRARP inactivation is an early event in carcinogenesis that is strongly associated with worse survival, presenting potential translational applications.
Affiliation(s)
- Ali Naderi
- Cancer Biology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
|
14
|
Eppig JT. Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse. ILAR J 2017; 58:17-41. [PMID: 28838066 PMCID: PMC5886341 DOI: 10.1093/ilar/ilx013] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 07/25/2016] [Revised: 03/14/2017] [Accepted: 03/28/2017] [Indexed: 12/13/2022] Open
Abstract
The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided.
Affiliation(s)
- Janan T. Eppig
- Janan T. Eppig, PhD, is Professor Emeritus at The Jackson Laboratory in Bar Harbor, Maine
|
15
|
Dozmorov MG, Coit P, Maksimowicz-McKinnon K, Sawalha AH. Age-associated DNA methylation changes in naive CD4+ T cells suggest an evolving autoimmune epigenotype in aging T cells. Epigenomics 2017; 9:429-445. [PMID: 28322571 DOI: 10.2217/epi-2016-0143] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 12/13/2022] Open
Abstract
AIM: We sought to define age-associated DNA methylation changes in naive CD4+ T cells. MATERIALS & METHODS: Naive CD4+ T cells were collected from 74 healthy individuals (age 19-66 years), and age-related DNA methylation changes were characterized. RESULTS: We identified 11,431 age-associated CpG sites, 57% of which were hypermethylated with age. Hypermethylated sites were enriched in CpG islands and repressive transcription factor binding sites, while hypomethylated sites showed T cell specific enrichment in active enhancers marked by H3K27ac and H3K4me1. Our data emphasize cancer-related DNA methylation changes with age, and also reveal age-associated hypomethylation in immune-related pathways, such as T cell receptor signaling, FCγR-mediated phagocytosis, apoptosis and the mammalian target of rapamycin signaling pathway. The MAPK signaling pathway was hypermethylated with age, consistent with a defective MAPK signaling in aging T cells. CONCLUSION: Age-associated DNA methylation changes may alter regulatory mechanisms and signaling pathways that predispose to autoimmunity.
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Patrick Coit
- Division of Rheumatology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
| | | | - Amr H Sawalha
- Division of Rheumatology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Center for Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
16
|
Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Juettemann T, Keenan S, Laird MR, Lavidas I, Maurel T, McLaren W, Moore B, Murphy DN, Nag R, Newman V, Nuhn M, Ong CK, Parker A, Patricio M, Riat HS, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Wilder SP, Zadissa A, Kostadima M, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Cunningham F, Yates A, Zerbino DR, Flicek P. Ensembl 2017. Nucleic Acids Res 2016; 45:D635-D642. [PMID: 27899575 PMCID: PMC5210575 DOI: 10.1093/nar/gkw1104] [Citation(s) in RCA: 409] [Impact Index Per Article: 51.1] [Received: 10/08/2016] [Revised: 10/25/2016] [Accepted: 10/28/2016] [Indexed: 12/12/2022] Open
Abstract
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Affiliation(s)
- Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Premanand Achuthan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Wasiu Akanni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Friederike Bernsdorff
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Denise Carvalho-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter Clapham
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Laurent Gil
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sophie H Janacek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas Juettemann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen Keenan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew R Laird
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ilias Lavidas
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas Maurel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - William McLaren
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel N Murphy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rishi Nag
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Victoria Newman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Michael Nuhn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Chuang Kee Ong
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Harpreet Singh Riat
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Helen Sparrow
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anja Thormann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alessandro Vullo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Brandon Walts
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Steven P Wilder
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Amonida Zadissa
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Myrto Kostadima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel M Staines
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
17
|
Abstract
Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
Affiliation(s)
- Petra Gutenbrunner
- School of Informatics, Communications, and Media, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
|
18
|
Yates B, Braschi B, Gray KA, Seal RL, Tweedie S, Bruford EA. Genenames.org: the HGNC and VGNC resources in 2017. Nucleic Acids Res 2016; 45:D619-D625. [PMID: 27799471 PMCID: PMC5210531 DOI: 10.1093/nar/gkw1033] [Citation(s) in RCA: 235] [Impact Index Per Article: 29.4] [Received: 10/14/2016] [Revised: 10/18/2016] [Accepted: 10/20/2016] [Indexed: 12/02/2022] Open
Abstract
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org. In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC) which is responsible for assigning names to vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years.
Affiliation(s)
- Bethan Yates
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Bryony Braschi
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Kristian A Gray
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ruth L Seal
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Susan Tweedie
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Elspeth A Bruford
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
19
|
Deutsch EW, Sun Z, Campbell DS, Binz PA, Farrah T, Shteynberg D, Mendoza L, Omenn GS, Moritz RL. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics. J Proteome Res 2016; 15:4091-4100. [PMID: 27577934 DOI: 10.1021/acs.jproteome.6b00445] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
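The two-step strategy described in this abstract (identify peptides against a simpler database, then check their uniqueness against a more complex one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, protein identifiers, and sequences are hypothetical, and plain substring matching stands in for a real search engine and enzymatic digestion.

```python
def find_proteins(peptide, database):
    """Return the IDs of all database entries whose sequence contains the peptide."""
    return sorted(pid for pid, seq in database.items() if peptide in seq)


def check_uniqueness(peptides, simple_db, complex_db):
    """For each peptide found with the simple database, report whether it still
    maps to exactly one entry once the more complex database is consulted."""
    report = {}
    for pep in peptides:
        complex_hits = find_proteins(pep, complex_db)
        report[pep] = {
            "simple_hits": find_proteins(pep, simple_db),
            "complex_hits": complex_hits,
            "unique_in_complex": len(complex_hits) == 1,
        }
    return report


# Toy data: hypothetical identifiers and sequences, for illustration only.
tier1 = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSHHWGYGKHNGPEHWHK"}
# The larger tier adds a sequence variant of P1 (assumed, not from the paper).
tier4 = dict(tier1, P1_variant="MKTAYIAKQRQLSFVK")

result = check_uniqueness(["AYIAKQR", "HNGPEHWHK"], tier1, tier4)
for peptide, info in result.items():
    print(peptide, info["complex_hits"], info["unique_in_complex"])
```

In this toy case the first peptide is shared with the added variant entry and so is no longer unique in the larger tier, while the second remains specific to a single entry.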
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - David S Campbell
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Pierre-Alain Binz
- CHUV Centre Universitaire Hospitalier Vaudois, 1011 Lausanne, Switzerland
| | - Terry Farrah
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - David Shteynberg
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Gilbert S Omenn
- Institute for Systems Biology, Seattle, Washington 98109, United States; Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
20
|
Gupta M, Dhanasekaran AR, Gardiner KJ. Mouse models of Down syndrome: gene content and consequences. Mamm Genome 2016; 27:538-55. [PMID: 27538963 DOI: 10.1007/s00335-016-9661-8] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 05/03/2016] [Accepted: 07/27/2016] [Indexed: 12/25/2022]
Abstract
Down syndrome (DS), trisomy of human chromosome 21 (Hsa21), is challenging to model in mice. Not only is it a contiguous gene syndrome spanning 35 Mb of the long arm of Hsa21, but orthologs of Hsa21 genes map to segments of three mouse chromosomes, Mmu16, Mmu17, and Mmu10. The Ts65Dn was the first viable segmental trisomy mouse model for DS; it is a partial trisomy currently popular in preclinical evaluations of drugs for cognition in DS. Limitations of the Ts65Dn are as follows: (i) it is trisomic for 125 human protein-coding orthologs, but only 90 of these are Hsa21 orthologs and (ii) it lacks trisomy for ~75 Hsa21 orthologs. In recent years, several additional mouse models of DS have been generated, each trisomic for a different subset of Hsa21 genes or their orthologs. To best exploit these models and interpret the results obtained with them, prior to proposing clinical trials, an understanding of their trisomic gene content, relative to full trisomy 21, is necessary. Here we first review the functional information on Hsa21 protein-coding genes and the more recent annotation of a large number of functional RNA genes. We then discuss the conservation and genomic distribution of Hsa21 orthologs in the mouse genome and the distribution of mouse-specific genes. Lastly, we consider the strengths and weaknesses of mouse models of DS based on the number and nature of the Hsa21 orthologs that are, and are not, trisomic in each, and discuss their validity for use in preclinical evaluations of drug responses.
|
21
|
Abstract
High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
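The abstract mentions informatively weighting p values while controlling the false discovery rate but does not name a specific procedure. One common choice, assumed here purely for illustration, is a weighted Benjamini-Hochberg rule: each p value is divided by a prior weight (rescaled so the weights average 1) before the usual step-up comparison. The function name and the example numbers below are hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg sketch: divide each p value by its weight
    (weights rescaled to average 1), then apply the standard BH step-up rule.
    Returns a list of booleans, True where the hypothesis is rejected."""
    m = len(pvalues)
    if m == 0 or m != len(weights):
        raise ValueError("pvalues and weights must be non-empty and of equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    mean_w = sum(weights) / m
    scaled = [p / (w / mean_w) for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: scaled[i])
    # Find the largest rank k whose weighted p value is <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if scaled[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p values for four genes; weights might come from expression data.
pvals = [0.001, 0.02, 0.04, 0.3]
unweighted = weighted_bh(pvals, [1, 1, 1, 1])
weighted = weighted_bh(pvals, [1, 1, 1.5, 0.5])
print(unweighted, weighted)
```

With equal weights the third test narrowly misses its threshold; up-weighting it (at the expense of the fourth) lets it pass, which is the kind of power gain an informative prior is meant to provide.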
Affiliation(s)
- Stefanie Friedrichs
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
- Dörthe Malzahn
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
- Elizabeth W Pugh
- Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Marcio Almeida
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX, USA.
- Xiao Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Department of Biochemistry and Medical Genetics, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
- Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada.
- Julia N Bailey
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA.
- Epilepsy Genetics/Genomics Laboratory, West Los Angeles Veterans Administration, Los Angeles, CA, USA.
|
22
|
Mouilleron H, Delcourt V, Roucou X. Death of a dogma: eukaryotic mRNAs can code for more than one protein. Nucleic Acids Res 2016; 44:14-23. [PMID: 26578573 PMCID: PMC4705651 DOI: 10.1093/nar/gkv1218] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 07/08/2015] [Revised: 10/26/2015] [Accepted: 10/28/2015] [Indexed: 12/13/2022] Open
Abstract
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5' UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3' UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma.
Affiliation(s)
- Hélène Mouilleron
- Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
- Vivian Delcourt
- Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada Inserm U-1192, Laboratoire de Protéomique, Réponse Inflammatoire, Spectrométrie de Masse (PRISM), Université de Lille 1, Cité Scientifique, 59655 Villeneuve D'Ascq, France
- Xavier Roucou
- Department of biochemistry, Université de Sherbrooke, Sherbrooke, Quebec J1E 4K8, Canada PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec, Canada
|
23
|
Cook CE, Bergman MT, Finn RD, Cochrane G, Birney E, Apweiler R. The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res 2015; 44:D20-6. [PMID: 26673705 PMCID: PMC4702932 DOI: 10.1093/nar/gkv1352] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 10/15/2015] [Accepted: 11/18/2015] [Indexed: 11/17/2022] Open
Abstract
New technologies are revolutionising biological research and its applications by making it easier and cheaper to generate ever-greater volumes and types of data. In response, the services and infrastructure of the European Bioinformatics Institute (EMBL-EBI, www.ebi.ac.uk) are continually expanding: total disk capacity increases significantly every year to keep pace with demand (75 petabytes as of December 2015), and interoperability between resources remains a strategic priority. Since 2014 we have launched two new resources: the European Variation Archive for genetic variation data and EMPIAR for two-dimensional electron microscopy data, as well as a Resource Description Framework platform. We also launched the Embassy Cloud service, which allows users to run large analyses in a virtual environment next to EMBL-EBI's vast public data resources.
Affiliation(s)
- Charles E Cook
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mary Todd Bergman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
24
|
Abstract
A report on the Wellcome Trust retreat on devising a consensus framework for the validation of novel human protein coding loci, held in Hinxton, U.K., May 11-13, 2015.
Affiliation(s)
- Elspeth A Bruford
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- Lydie Lane
- SIB Swiss Institute of Bioinformatics and University of Geneva, Faculty of Medicine, CMU, Michel Servet 1, 1211 Geneva 4, Switzerland
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
|
25
|
Wilming LG, Boychenko V, Harrow JL. Comprehensive comparative homeobox gene annotation in human and mouse. Database (Oxford) 2015; 2015:bav091. [PMID: 26412852 PMCID: PMC4584094 DOI: 10.1093/database/bav091] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/12/2015] [Accepted: 08/31/2015] [Indexed: 11/14/2022]
Abstract
Homeobox genes are a group of genes coding for transcription factors with a DNA-binding helix-turn-helix structure called a homeodomain and which play a crucial role in pattern formation during embryogenesis. Many homeobox genes are located in clusters and some of these, most notably the HOX genes, are known to have antisense or opposite strand long non-coding RNA (lncRNA) genes that play a regulatory role. Because automated annotation of both gene clusters and non-coding genes is fraught with difficulty (over-prediction, under-prediction, inaccurate transcript structures), we set out to manually annotate all homeobox genes in the mouse and human genomes. This includes all supported splice variants, pseudogenes and both antisense and flanking lncRNAs. One of the areas where manual annotation has a significant advantage is the annotation of duplicated gene clusters. After comprehensive annotation of all homeobox genes and their antisense genes in human and in mouse, we found some discrepancies with the current gene set in RefSeq regarding exact gene structures and coding versus pseudogene locus biotype. We also identified previously un-annotated pseudogenes in the DUX, Rhox and Obox gene clusters, which helped us re-evaluate and update the gene nomenclature in these regions. We found that human homeobox genes are enriched in antisense lncRNA loci, some of which are known to play a role in gene or gene cluster regulation, compared to their mouse orthologues. Of the annotated set of 241 human protein-coding homeobox genes, 98 have an antisense locus (41%), while of the 277 orthologous mouse genes, only 62 protein-coding genes have an antisense locus (22%), based on publicly available transcriptional evidence.
Affiliation(s)
- Laurens G Wilming
- HAVANA Group, Informatics Department, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Veronika Boychenko
- HAVANA Group, Informatics Department, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Jennifer L Harrow
- HAVANA Group, Informatics Department, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
|
26
|
Abstract
Annotation of the reference genome of the C57BL/6J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL/6J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.
|
27
|
Matthews BB, Dos Santos G, Crosby MA, Emmert DB, St Pierre SE, Gramates LS, Zhou P, Schroeder AJ, Falls K, Strelets V, Russo SM, Gelbart WM; FlyBase Consortium. Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data. G3 (Bethesda) 2015; 5:1721-36. [PMID: 26109357 DOI: 10.1534/g3.115.018929] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023]
Abstract
We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
|
28
|
Frankish A, Uszczynska B, Ritchie GRS, Gonzalez JM, Pervouchine D, Petryszak R, Mudge JM, Fonseca N, Brazma A, Guigo R, Harrow J. Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction. BMC Genomics 2015; 16 Suppl 8:S2. [PMID: 26110515 PMCID: PMC4502323 DOI: 10.1186/1471-2164-16-s8-s2] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 01/02/2023] Open
Abstract
Background: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.
Results: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome.
Conclusions: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.
|
29
|
Motta VN, Markle JGM, Gulban O, Mortin-Toth S, Liao KC, Mogridge J, Steward CA, Danska JS. Identification of the inflammasome Nlrp1b as the candidate gene conferring diabetes risk at the Idd4.1 locus in the nonobese diabetic mouse. J Immunol 2015; 194:5663-73. [PMID: 25964492 DOI: 10.4049/jimmunol.1400913] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/09/2014] [Accepted: 04/13/2015] [Indexed: 11/19/2022]
Abstract
Type 1 diabetes in the NOD mouse model has been linked to >30 insulin-dependent diabetes (Idd) susceptibility loci. Idd4 on chromosome 11 consists of two subloci, Idd4.1 and Idd4.2. Using congenic analysis of alleles in NOD and NOD-resistant (NOR) mice, we previously defined Idd4.1 as an interval containing >50 genes that controlled expression of genes in the type 1 IFN pathway. In this study, we report refined mapping of Idd4.1 to a 1.1-Mb chromosomal region and provide genomic sequence analysis and mechanistic evidence supporting its role in innate immune regulation of islet-directed autoimmunity. Genetic variation at Idd4.1 was mediated by radiation-sensitive hematopoietic cells, and type 1 diabetes protection conferred by the NOR allele was abrogated in mice treated with exogenous type 1 IFN-β. Next-generation sequence analysis of the full Idd4.1 genomic interval in NOD and NOR strains supported Nlrp1b as a strong candidate gene for Idd4.1. Nlrp1b belongs to the Nod-like receptor (NLR) gene family and contributes to inflammasome assembly, caspase-1 recruitment, and release of IL-1β. The Nlrp1b of NOR was expressed as an alternatively spliced isoform that skips exon 9, resulting in a premature stop codon predicted to encode a truncated protein. Functional analysis of the truncated NOR Nlrp1b protein demonstrated that it was unable to recruit caspase-1 and process IL-1β. Our data suggest that Idd4.1-dependent protection from islet autoimmunity is mediated by differences in type 1 IFN- and IL-1β-dependent immune responses resulting from genetic variation in Nlrp1b.
Affiliation(s)
- Vinicius N Motta
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Janet G M Markle
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada; Department of Immunology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Omid Gulban
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Steven Mortin-Toth
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Kuo-Chien Liao
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Jeremy Mogridge
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Charles A Steward
- Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom
- Jayne S Danska
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada; Department of Immunology, University of Toronto, Toronto, Ontario M5S 1A8, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada
|
30
|
Beynon RJ, Armstrong SD, Gómez-Baena G, Lee V, Simpson D, Unsworth J, Hurst JL. The complexity of protein semiochemistry in mammals. Biochem Soc Trans 2014; 42:837-45. [PMID: 25109966 DOI: 10.1042/BST20140133] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/28/2022]
Abstract
The high degree of protein sequence similarity in the MUPs (major urinary proteins) poses considerable challenges for their individual differentiation, analysis and quantification. In the present review, we discuss MS approaches for MUP quantification, at either the protein or the peptide level. In particular, we describe an approach to multiplexed quantification based on the design and synthesis of novel proteins (QconCATs) that are concatamers of quantification standards, providing a simple route to the generation of a set of stable-isotope-labelled peptide standards. The MUPs pose a particular challenge to QconCAT design, because of their sequence similarity and the limited number of peptides that can be used to construct the standards. Such difficulties can be overcome by careful attention to the analytical workflow.
|
31
|
Smedley D, Haider S, Durinck S, Pandini L, Provero P, Allen J, Arnaiz O, Awedh MH, Baldock R, Barbiera G, Bardou P, Beck T, Blake A, Bonierbale M, Brookes AJ, Bucci G, Buetti I, Burge S, Cabau C, Carlson JW, Chelala C, Chrysostomou C, Cittaro D, Collin O, Cordova R, Cutts RJ, Dassi E, Di Genova A, Djari A, Esposito A, Estrella H, Eyras E, Fernandez-Banet J, Forbes S, Free RC, Fujisawa T, Gadaleta E, Garcia-Manteiga JM, Goodstein D, Gray K, Guerra-Assunção JA, Haggarty B, Han DJ, Han BW, Harris T, Harshbarger J, Hastings RK, Hayes RD, Hoede C, Hu S, Hu ZL, Hutchins L, Kan Z, Kawaji H, Keliet A, Kerhornou A, Kim S, Kinsella R, Klopp C, Kong L, Lawson D, Lazarevic D, Lee JH, Letellier T, Li CY, Lio P, Liu CJ, Luo J, Maass A, Mariette J, Maurel T, Merella S, Mohamed AM, Moreews F, Nabihoudine I, Ndegwa N, Noirot C, Perez-Llamas C, Primig M, Quattrone A, Quesneville H, Rambaldi D, Reecy J, Riba M, Rosanoff S, Saddiq AA, Salas E, Sallou O, Shepherd R, Simon R, Sperling L, Spooner W, Staines DM, Steinbach D, Stone K, Stupka E, Teague JW, Dayem Ullah AZ, Wang J, Ware D, Wong-Erasmus M, Youens-Clark K, Zadissa A, Zhang SJ, Kasprzyk A. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res 2015; 43:W589-98. [PMID: 25897122 PMCID: PMC4489294 DOI: 10.1093/nar/gkv350] [Citation(s) in RCA: 491] [Impact Index Per Article: 54.6] [Received: 02/09/2015] [Accepted: 04/02/2015] [Indexed: 01/17/2023] Open
Abstract
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart Community Portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.
Affiliation(s)
- Damian Smedley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Syed Haider
- The Weatherall Institute Of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, UK
- Steffen Durinck
- Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA
- Luca Pandini
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Paolo Provero
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy Dept of Molecular Biotechnology and Health Sciences University of Turin, Italy
- James Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Olivier Arnaiz
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, 1 avenue de la terrasse, 91198 Gif sur Yvette, France
- Mohammad Hamza Awedh
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
- Richard Baldock
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, EH4 2XU, UK
- Giulia Barbiera
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Andrew Blake
- MRC Harwell, Harwell Science and Innovation Campus, Oxfordshire, OX11 0RD, UK
- Anthony J Brookes
- Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Gabriele Bucci
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Iwan Buetti
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Sarah Burge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claude Chelala
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Davide Cittaro
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Raul Cordova
- International Potato Center (CIP), Lima, 1558, Peru
- Rosalind J Cutts
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Erik Dassi
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Trento, Italy
- Alex Di Genova
- Center for Mathematical Modeling and Center for Genome Regulation, University of Chile, Beauchef 851, 7th floor, Chile
- Anis Djari
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
- Eduardo Eyras
- Catalan Institute for Research and Advanced Studies (ICREA), Passeig Lluis Companys 23, E-08010 Barcelona, Spain Universitat Pompeu Fabra, Dr Aiguader 88 E-08003 Barcelona, Spain
- Simon Forbes
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Robert C Free
- Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Emanuela Gadaleta
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Jose M Garcia-Manteiga
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- David Goodstein
- Department of Energy, Joint Genome Institute, Walnut Creek, USA
- Kristian Gray
- HUGO Gene Nomenclature Committee (HGNC), European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- José Afonso Guerra-Assunção
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Bernard Haggarty
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, EH4 2XU, UK
- Dong-Jin Han
- Medicinal Bioconvergence Research Center, College of Pharmacy, Seoul National University, Seoul 151-742, Republic of Korea Department of Molecular Medicine and Biopharmaceutical Sciences, Seoul National University, Seoul 151-742, Republic of Korea
- Byung Woo Han
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Seoul National University, Seoul 151-742, Republic of Korea Information Center for Bio-pharmacological Network, Seoul National University, Suwon 443-270, Republic of Korea
- Todd Harris
- Ontario Institute for Cancer Research, Toronto, M5G 0A3, Canada
- Jayson Harshbarger
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies (DGT), Kanagawa, 230-0045, Japan
- Robert K Hastings
- Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Richard D Hayes
- Department of Energy, Joint Genome Institute, Walnut Creek, USA
- Claire Hoede
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
- Shen Hu
- School of Dentistry and Dental Research Institute, University of California Los Angeles (UCLA), Los Angeles, CA 90095-1668, USA
- Lucie Hutchins
- Mouse Genomic Informatics Group, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Zhengyan Kan
- Oncology Computational Biology, Pfizer, La Jolla, USA
- Hideya Kawaji
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies (DGT), Kanagawa, 230-0045, Japan RIKEN Preventive Medicine and Diagnosis Innovation Program, Saitama 351-0198, Japan
- Aminah Keliet
- INRA URGI Centre de Versailles, bâtiment 18 Route de Saint Cyr 78026 Versailles, France
- Arnaud Kerhornou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sunghoon Kim
- Medicinal Bioconvergence Research Center, College of Pharmacy, Seoul National University, Seoul 151-742, Republic of Korea Department of Molecular Medicine and Biopharmaceutical Sciences, Seoul National University, Seoul 151-742, Republic of Korea
- Rhoda Kinsella
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Christophe Klopp
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
- Lei Kong
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, 100871, P.R. China
- Daniel Lawson
- VectorBase, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Dejan Lazarevic
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
- Ji-Hyun Lee
- Medicinal Bioconvergence Research Center, College of Pharmacy, Seoul National University, Seoul 151-742, Republic of Korea Research Institute of Pharmaceutical Sciences, College of Pharmacy, Seoul National University, Seoul 151-742, Republic of Korea Information Center for Bio-pharmacological Network, Seoul National University, Suwon 443-270, Republic of Korea
- Thomas Letellier
- INRA URGI Centre de Versailles, bâtiment 18 Route de Saint Cyr 78026 Versailles, France
- Chuan-Yun Li
- Institute of Molecular Medicine, Peking University, Beijing, China
- Pietro Lio
- Computer Laboratory, University of Cambridge, Cambridge, CB3 0FD, UK
- Chu-Jun Liu
- Institute of Molecular Medicine, Peking University, Beijing, China
- Jie Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alejandro Maass
- Center for Mathematical Modeling and Center for Genome Regulation, University of Chile, Beauchef 851, 7th floor, Chile Department of Mathematical Engineering, University of Chile, Av. Beauchef 851, 5th floor, Santiago, Chile
- Jerome Mariette
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
- Thomas Maurel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stefania Merella
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - Azza Mostafa Mohamed
- Departament of Biochemistry, Faculty of Science for Girls, King Abdulaziz University, Jeddah, Saudi Arabia
| | | | - Ibounyamine Nabihoudine
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
| | - Nelson Ndegwa
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 17177 Stockholm, Sweden
| | - Céline Noirot
- Plate-forme bio-informatique Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRA, Castanet-Tolosan, France
| | | | - Michael Primig
- Inserm U1085 IRSET, University of Rennes 1, 35042 Rennes, France
| | - Alessandro Quattrone
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Hadi Quesneville
- INRA URGI Centre de Versailles, bâtiment 18 Route de Saint Cyr 78026 Versailles, France
| | - Davide Rambaldi
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | | | - Michela Riba
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - Steven Rosanoff
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Amna Ali Saddiq
- Department of Biological Sciences, Faculty of Science for Girls, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Elisa Salas
- International Potato Center (CIP), Lima, 1558, Peru
| | | | - Rebecca Shepherd
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
| | | | - Linda Sperling
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, 1 avenue de la terrasse, 91198 Gif sur Yvette, France
| | - William Spooner
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Eagle Genomics Ltd., Babraham Research Campus, Cambridge, CB22 3AT, UK
| | - Daniel M Staines
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Delphine Steinbach
- INRA URGI Centre de Versailles, bâtiment 18 Route de Saint Cyr 78026 Versailles, France
| | - Kevin Stone
- Mouse Genomic Informatics Group, The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Elia Stupka
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy
| | - Jon W Teague
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
| | - Abu Z Dayem Ullah
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Jun Wang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, 100871, P.R. China
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Marie Wong-Erasmus
- Human Longevity, Inc. 10835 Road to the Cure 140 San Diego, CA 92121, USA
| | - Ken Youens-Clark
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Amonida Zadissa
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Shi-Jian Zhang
- Institute of Molecular Medicine, Peking University, Beijing, China
| | - Arek Kasprzyk
- Center for Translational Genomics and Bioinformatics San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
32
|
Lefranc MP, Giudicelli V, Duroux P, Jabado-Michaloud J, Folch G, Aouinti S, Carillon E, Duvergey H, Houles A, Paysan-Lafosse T, Hadi-Saljoqi S, Sasorith S, Lefranc G, Kossida S. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res 2014; 43:D413-22. [PMID: 25378316 PMCID: PMC4383898 DOI: 10.1093/nar/gku1056]
Abstract
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), is the global reference in immunogenetics and immunoinformatics. Created in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS), IMGT® marked the advent of immunoinformatics, which emerged at the interface between immunogenetics and bioinformatics. IMGT® is specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility (MH) and proteins of the IgSF and MhSF superfamilies. IMGT® is built on the IMGT-ONTOLOGY axioms and concepts, which bridged the gap between genes, sequences and 3D structures. The concepts include the IMGT® standardized keywords (identification), IMGT® standardized labels (description), IMGT® standardized nomenclature (classification), IMGT unique numbering and IMGT Colliers de Perles (numerotation). IMGT® comprises 7 databases, 17 online tools and 15 000 pages of web resources, and provides a high-quality and integrated system for analysis of the genomic and expressed IG and TR repertoire of the adaptive immune responses, including NGS high-throughput data. Tools and databases are used in basic, veterinary and medical research, in clinical applications (mutation analysis in leukemia and lymphoma) and in antibody engineering and humanization. The IMGT/mAb-DB interface was developed for therapeutic antibodies and fusion proteins for immunological applications (FPIA). IMGT® is freely available at http://www.imgt.org.
Affiliation(s)
- Marie-Paule Lefranc
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Véronique Giudicelli
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Patrice Duroux
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Joumana Jabado-Michaloud
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Géraldine Folch
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Safa Aouinti
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Emilie Carillon
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Hugo Duvergey
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Amélie Houles
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Typhaine Paysan-Lafosse
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Saida Hadi-Saljoqi
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Souphatta Sasorith
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Gérard Lefranc
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| | - Sofia Kossida
- IMGT, the international ImMunoGeneTics information system, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire LIGM, UPR CNRS 1142, Institut de Génétique Humaine IGH, 141 rue de la Cardonille, Montpellier, 34396 cedex 5, France
| |
|
33
|
Abstract
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. To date the HGNC has assigned over 39 000 gene names, representing an increase of over 5000 entries in the past two years. As well as increasing the size of our database, we have continued redesigning our website http://www.genenames.org and have modified, updated and improved many aspects of the site, including a faster and more powerful search, a vastly improved HCOP tool and a REST service to increase the number of ways users can retrieve our data. This article provides an overview of our current online data and resources, and highlights the changes we have made in recent years.
Affiliation(s)
- Kristian A Gray
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Bethan Yates
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ruth L Seal
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Mathew W Wright
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Elspeth A Bruford
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
34
|
Baral R, Ngounou Wetie AG, Darie CC, Wallace KN. Mass spectrometry for proteomics-based investigation using the zebrafish vertebrate model system. Adv Exp Med Biol 2014; 806:331-40. [PMID: 24952190 DOI: 10.1007/978-3-319-06068-2_15]
Abstract
The zebrafish (Danio rerio) is frequently used to investigate the genetics of human diseases as well as the resulting pathologies. Ease of both forward and reverse genetic manipulation, along with conservation of vertebrate organ systems and disease-causing genes, has made this system a popular model. Many techniques have been developed to manipulate the genome of zebrafish, producing mutants in a vast array of genes. While genetic manipulation of zebrafish has progressed, proteomics has been under-utilized. This review highlights studies that have already been performed using proteomic techniques, as well as our initial proteomic work comparing changes to the proteome between the ascl1a-/- and WT intestine.
Affiliation(s)
- Reshica Baral
- Department of Biology, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, 13699-5810, USA
| | | | | | | |
|
35
|
Li W, Freudenberg J. Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. Comput Biol Chem 2014; 53 Pt A:108-17. [PMID: 25241312 DOI: 10.1016/j.compbiolchem.2014.08.015] [Accepted: 07/11/2014]
Abstract
Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: hypothetical reads with length 1 kb which cannot be uniquely mapped with zero-mismatch alignment for the described regions, considering both the forward and reverse strand. The respective collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (overall 1.23%). Not surprisingly, our unmappable regions overlap greatly with segmental duplication, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplication or transposable elements and 98.3% overlap structural variant annotations. Notably, some of these regions overlap units with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with the ultraconserved elements, very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project.
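The definition of unmappability used in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `reverse_complement` and `unmappable_starts` are hypothetical, the toy sequences stand in for a reference genome, and a short window length stands in for the 1 kb read length. The sketch assumes an uppercase A/C/G/T sequence and treats a window as unmappable when an identical copy (zero mismatches) occurs at another position on either strand.

```python
from collections import defaultdict

# Translation table for complementing bases (assumes uppercase A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unmappable_starts(genome, read_len):
    """Return sorted start positions of windows that are not unique.

    A window is reported when an exact copy of it, or of its reverse
    complement, starts at some other position in `genome`.
    """
    positions = defaultdict(list)
    for i in range(len(genome) - read_len + 1):
        window = genome[i:i + read_len]
        # Use one canonical key per strand pair so that a forward copy
        # and a reverse-strand copy land in the same bucket.
        key = min(window, reverse_complement(window))
        positions[key].append(i)
    return sorted(
        i for starts in positions.values() if len(starts) > 1 for i in starts
    )


# Toy examples (hypothetical sequences, not real genomic data).
print(unmappable_starts("GATTACAGGGATTACA", 7))  # direct repeat of GATTACA
print(unmappable_starts("AAACTGTTT", 4))         # AAAC vs. its reverse complement GTTT
```

A window that is its own reverse complement and occurs at only one position is counted as unique here, since both strands point to the same locus. For a real genome, an index-based aligner would replace the in-memory dictionary, which is used here only to keep the sketch self-contained.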
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
| | - Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
| |
|