1
Raciti D, Van Auken KM, Arnaboldi V, Tabone CJ, Muller HM, Sternberg PW. Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity. bioRxiv: the preprint server for biology 2025:2025.01.06.631539. [PMID: 39829858 PMCID: PMC11741306 DOI: 10.1101/2025.01.06.631539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Biological knowledgebases are essential resources for biomedical researchers, providing ready access to gene function and genomic data. Professional, manual curation of knowledgebases, however, is labor-intensive and thus high-performing machine learning methods that improve biocuration efficiency are needed. Here we report on sentence-level classification to identify biocuration-relevant sentences in the full text of published references for two gene function data types: gene expression and protein kinase activity. We performed a detailed characterization of sentences from references in the WormBase bibliography and used this characterization to define three tasks for classifying sentences as either 1) fully curatable, 2) fully and partially curatable, or 3) all language-related. We evaluated various machine learning (ML) models applied to these tasks and found that GPT and BioBERT achieve the highest average performance, resulting in F1 performance scores ranging from 0.89 to 0.99 depending upon the task. Our findings demonstrate the feasibility of extracting biocuration-relevant sentences from full text. Integrating these models into professional biocuration workflows, such as those used by the Alliance of Genome Resources and the ACKnowledge community curation platform, might well facilitate efficient and accurate annotation of the biomedical literature.
Affiliation(s)
- Daniela Raciti
- Division of Biology and Biological Engineering, 1200 E. California Boulevard, California Institute of Technology, Pasadena, CA 91125, USA
- Kimberly M. Van Auken
- Division of Biology and Biological Engineering, 1200 E. California Boulevard, California Institute of Technology, Pasadena, CA 91125, USA
- Valerio Arnaboldi
- Division of Biology and Biological Engineering, 1200 E. California Boulevard, California Institute of Technology, Pasadena, CA 91125, USA
- Hans-Michael Muller
- Division of Biology and Biological Engineering, 1200 E. California Boulevard, California Institute of Technology, Pasadena, CA 91125, USA
- Paul W. Sternberg
- Division of Biology and Biological Engineering, 1200 E. California Boulevard, California Institute of Technology, Pasadena, CA 91125, USA
2
The Alliance of Genome Resources Consortium, Aleksander SA, Anagnostopoulos AV, Antonazzo G, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Cherry JM, Cho J, Crosby MA, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, dos Santos G, Dyer S, Ebert D, Engel SR, Fashena D, Fisher M, Foley S, Gibson AC, Gollapally VR, Gramates LS, Grove CA, Hale P, Harris T, Hayman GT, Hu Y, James-Zorn C, Karimi K, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, Markarian N, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nash RS, Nuin P, Paddock H, Pells T, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schindelman G, Shaw DR, Sherlock G, Shrivatsav A, Singer A, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Tomczuk M, Trovisco V, Tutaj MA, Urbano JM, Van Auken K, Van Slyke CE, Vize PD, Wang Q, Weng S, Westerfield M, Wilming LG, Wong ED, Wright A, Yook K, Zhou P, Zorn A, Zytkovicz M. Updates to the Alliance of Genome Resources central infrastructure. Genetics 2024; 227:iyae049. [PMID: 38552170 PMCID: PMC11075569 DOI: 10.1093/genetics/iyae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 04/09/2024] Open
Abstract
The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, Caenorhabditis elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and application programming interfaces (APIs). Here, we focus on developments over the last 2 years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress toward a central persistent database to support curation, the data modeling that underpins harmonization, and progress toward a state-of-the-art literature curation system with integrated artificial intelligence and machine learning (AI/ML).
Affiliation(s)
- Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Valerio Arnaboldi
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Andrés Becerra
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Susan M Bello
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Olin Blodgett
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Carol J Bult
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Scott Cain
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Brian R Calvi
- Department of Biology, Indiana University, Bloomington, IN 47408, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA
- Juancarlos Chan
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Wen J Chen
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305
- Jaehyoung Cho
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Madeline A Crosby
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Jeffrey L De Pons
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Stavros Diamantakis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mary E Dolan
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Gilberto dos Santos
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dustin Ebert
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305
- David Fashena
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Malcolm Fisher
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, Cincinnati, OH 45229, USA
- Saoirse Foley
- Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15203
- Adam C Gibson
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Varun R Gollapally
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- L Sian Gramates
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Christian A Grove
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Paul Hale
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Todd Harris
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- G Thomas Hayman
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Yanhui Hu
- Department of Genetics, Howard Hughes Medical Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Christina James-Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, Cincinnati, OH 45229, USA
- Kamran Karimi
- Department of Biological Sciences, University of Calgary, 507 Campus Dr NW, Calgary, AB T2N 4V8, Canada
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305
- Ranjana Kishore
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Anne E Kwitek
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Stanley J F Laulederkind
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Raymond Lee
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Ian Longden
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Manuel Luypaert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nicholas Markarian
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Steven J Marygold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Beverley Matthews
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Monica S McAndrews
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Gillian Millburn
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Stuart Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305
- Howie Motenko
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Sierra Moxon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA
- Hans-Michael Muller
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA
- Anushya Muruganujan
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Tremayne Mushayahama
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305
- Paulo Nuin
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Holly Paddock
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Troy Pells
- Department of Biological Sciences, University of Calgary, 507 Campus Dr NW, Calgary, AB T2N 4V8, Canada
- Norbert Perrimon
- Department of Genetics, Howard Hughes Medical Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Christian Pich
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Mark Quinton-Tulloch
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniela Raciti
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Susan Russo Gelbart
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Leyla Ruzicka
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Gary Schindelman
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- David R Shaw
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Gavin Sherlock
- Department of Genetics, Stanford University, Stanford, CA 94305
- Ajay Shrivatsav
- Department of Genetics, Stanford University, Stanford, CA 94305
- Amy Singer
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Constance M Smith
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Cynthia L Smith
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Jennifer R Smith
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Lincoln Stein
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Paul W Sternberg
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Christopher J Tabone
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Ketaki Thorat
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jyothi Thota
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Monika Tomczuk
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Vitor Trovisco
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Marek A Tutaj
- Medical College of Wisconsin—Rat Genome Database, Departments of Physiology and Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jose-Maria Urbano
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Kimberly Van Auken
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Ceri E Van Slyke
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403
- Peter D Vize
- Department of Biological Sciences, University of Calgary, 507 Campus Dr NW, Calgary, AB T2N 4V8, Canada
- Qinghua Wang
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305
- Laurens G Wilming
- The Jackson Laboratory for Mammalian Genomics, Bar Harbor, ME 04609, USA
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305
- Adam Wright
- Informatics and Bio-computing Platform, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Karen Yook
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Pinglei Zhou
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Aaron Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, Cincinnati, OH 45229, USA
- Mark Zytkovicz
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
3
Bult CJ, Sternberg PW. The alliance of genome resources: transforming comparative genomics. Mamm Genome 2023; 34:531-544. [PMID: 37666946 PMCID: PMC10628019 DOI: 10.1007/s00335-023-10015-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 05/16/2023] [Accepted: 08/11/2023] [Indexed: 09/06/2023]
Abstract
Comparing genomic and biological characteristics across multiple species is essential to using model systems to investigate the molecular and cellular mechanisms underlying human biology and disease and to translate mechanistic insights from studies in model organisms for clinical applications. Building a scalable knowledge commons platform that supports cross-species comparison of rich, expertly curated knowledge regarding gene function, phenotype, and disease associations available for model organisms and humans is the primary mission of the Alliance of Genome Resources (the Alliance). The Alliance is a consortium of seven model organism knowledgebases (mouse, rat, yeast, nematode, zebrafish, frog, fruit fly) and the Gene Ontology resource. The Alliance uses a common set of gene ortholog assertions as the basis for comparing biological annotations across the organisms represented in the Alliance. The major types of knowledge associated with genes that are represented in the Alliance database currently include gene function, phenotypic alleles and variants, human disease associations, pathways, gene expression, and both protein-protein and genetic interactions. The Alliance has enhanced the ability of researchers to easily compare biological annotations for common data types across model organisms and human through the implementation of shared programmatic access mechanisms, data-specific web pages with a unified "look and feel", and interactive user interfaces specifically designed to support comparative biology. The modular infrastructure developed by the Alliance allows the resource to serve as an extensible "knowledge commons" capable of expanding to accommodate additional model organisms.
4
Aleksander SA, Anagnostopoulos AV, Antonazzo G, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Michael Cherry J, Cho J, Crosby MA, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, Santos GD, Dyer S, Ebert D, Engel SR, Fashena D, Fisher M, Foley S, Gibson AC, Gollapally VR, Sian Gramates L, Grove CA, Hale P, Harris T, Thomas Hayman G, Hu Y, James-Zorn C, Karimi K, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, Markarian N, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nash RS, Nuin P, Paddock H, Pells T, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schindelman G, Shaw DR, Sherlock G, Shrivatsav A, Singer A, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Tomczuk M, Trovisco V, Tutaj MA, Urbano JM, Auken KV, Van Slyke CE, Vize PD, Wang Q, Weng S, Westerfield M, Wilming LG, Wong ED, Wright A, Yook K, Zhou P, Zorn A, Zytkovicz M. Updates to the Alliance of Genome Resources Central Infrastructure Alliance of Genome Resources Consortium. bioRxiv: the preprint server for biology 2023:2023.11.20.567935. [PMID: 38045425 PMCID: PMC10690154 DOI: 10.1101/2023.11.20.567935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, C. elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and APIs. Here we focus on developments over the last two years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse. We describe our progress towards a central persistent database to support curation, the data modeling that underpins harmonization, and progress towards a state-of-the-art literature curation system with integrated Artificial Intelligence and Machine Learning (AI/ML).
5
Vedi M, Smith JR, Thomas Hayman G, Tutaj M, Brodie KC, De Pons JL, Demos WM, Gibson AC, Kaldunski ML, Lamers L, Laulederkind SJF, Thota J, Thorat K, Tutaj MA, Wang SJ, Zacher S, Dwinell MR, Kwitek AE. 2022 updates to the Rat Genome Database: a Findable, Accessible, Interoperable, and Reusable (FAIR) resource. Genetics 2023; 224:iyad042. [PMID: 36930729 PMCID: PMC10474928 DOI: 10.1093/genetics/iyad042] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/15/2023] [Revised: 03/07/2023] [Accepted: 03/08/2023] [Indexed: 03/19/2023] Open
Abstract
The Rat Genome Database (RGD, https://rgd.mcw.edu) has evolved from simply a resource for rat genetic markers, maps, and genes, by adding multiple genomic data types and extensive disease and phenotype annotations and developing tools to effectively mine, analyze, and visualize the available data, to empower investigators in their hypothesis-driven research. Leveraging its robust and flexible infrastructure, RGD has added data for human and eight other model organisms (mouse, 13-lined ground squirrel, chinchilla, naked mole-rat, dog, pig, African green monkey/vervet, and bonobo) besides rat to enhance its translational aspect. This article presents an overview of the database with the most recent additions to RGD's genome, variant, and quantitative phenotype data. We also briefly introduce Virtual Comparative Map (VCMap), an updated tool that explores synteny between species as an improvement to RGD's suite of tools, followed by a discussion regarding the refinements to the existing PhenoMiner tool that assists researchers in finding and comparing quantitative data across rat strains. Collectively, RGD focuses on providing a continuously improving, consistent, and high-quality data resource for researchers while advancing data reproducibility and fulfilling Findable, Accessible, Interoperable, and Reusable (FAIR) data principles.
Affiliation(s)
- Mahima Vedi
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jennifer R Smith
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- G Thomas Hayman
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Monika Tutaj
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Kent C Brodie
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jeffrey L De Pons
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Wendy M Demos
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Adam C Gibson
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Mary L Kaldunski
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Logan Lamers
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Stanley J F Laulederkind
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jyothi Thota
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Ketaki Thorat
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Marek A Tutaj
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Shur-Jen Wang
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Stacy Zacher
- Finance and Administration, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Melinda R Dwinell
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Anne E Kwitek
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
6
Wang SJ, Brodie KC, De Pons JL, Demos WM, Gibson AC, Hayman GT, Hill ML, Kaldunski ML, Lamers L, Laulederkind SJF, Nalabolu HS, Thota J, Thorat K, Tutaj MA, Tutaj M, Vedi M, Zacher S, Smith JR, Dwinell MR, Kwitek AE. Ontological Analysis of Coronavirus Associated Human Genes at the COVID-19 Disease Portal. Genes (Basel) 2022; 13:genes13122304. [PMID: 36553571 PMCID: PMC9777590 DOI: 10.3390/genes13122304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/02/2022] [Accepted: 12/04/2022] [Indexed: 12/12/2022] Open
Abstract
The COVID-19 pandemic spurred a parallel upsurge in the scientific literature about SARS-CoV-2 infection and its health burden. The Rat Genome Database (RGD) created a COVID-19 Disease Portal to leverage information from the scientific literature. In the COVID-19 Portal, gene-disease associations are established by manual curation of PubMed literature. The portal contains data for nine ontologies related to COVID-19, an embedded enrichment analysis tool, as well as links to a toolkit. Using this information and these tools, we performed analyses on the curated COVID-19 disease genes. As expected, Disease Ontology enrichment analysis showed that the COVID-19 gene set is highly enriched with coronavirus infectious disease and related diseases. However, other less related diseases were also highly enriched, such as liver and rheumatic diseases. Using the comparison heatmap tool, we found nearly 60 percent of the COVID-19 genes were associated with nervous system disease and 40 percent were associated with gastrointestinal disease. Our analysis confirms the role of the immune system in COVID-19 pathogenesis as shown by substantial enrichment of immune system related Gene Ontology terms. The information in RGD's COVID-19 disease portal can generate new hypotheses to potentiate novel therapies and prevention of acute and long-term complications of COVID-19.
Affiliation(s)
- Shur-Jen Wang
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Kent C. Brodie
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jeffrey L. De Pons
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Wendy M. Demos
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Adam C. Gibson
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - G. Thomas Hayman
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Morgan L. Hill
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Mary L. Kaldunski
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Logan Lamers
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Stanley J. F. Laulederkind
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Harika S. Nalabolu
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jyothi Thota
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Ketaki Thorat
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Marek A. Tutaj
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Monika Tutaj
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Mahima Vedi
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Stacy Zacher
- Finance and Administration, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jennifer R. Smith
- The Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Melinda R. Dwinell
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Anne E. Kwitek
- The Rat Genome Database, Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
7
|
Alliance of Genome Resources Consortium, Agapite J, Albou LP, Aleksander SA, Alexander M, Anagnostopoulos AV, Antonazzo G, Argasinska J, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blake JA, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Michael Cherry J, Cho J, Christie KR, Crosby MA, Davis P, da Veiga Beltrame E, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, dos Santos G, Douglass E, Dunn B, Eagle A, Ebert D, Engel SR, Fashena D, Foley S, Frazer K, Gao S, Gibson AC, Gondwe F, Goodman J, Sian Gramates L, Grove CA, Hale P, Harris T, Thomas Hayman G, Hill DP, Howe DG, Howe KL, Hu Y, Jha S, Kadin JA, Kaufman TC, Kalita P, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, MacPherson KA, Martin R, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nalabolu HS, Nash RS, Ng P, Nuin P, Paddock H, Paulini M, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schaper K, Schindelman G, Shimoyama M, Simison M, Shaw DR, Shrivatsav A, Singer A, Skrzypek M, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Toro S, Tomczuk M, Trovisco V, Tutaj MA, Tutaj M, Urbano JM, Van Auken K, Van Slyke CE, Wang Q, Wang SJ, Weng S, Westerfield M, Williams G, Wilming LG, Wong ED, Wright A, Yook K, Zarowiecki M, Zhou P, Zytkovicz M. Harmonizing model organism data in the Alliance of Genome Resources. Genetics 2022; 220:iyac022. [PMID: 35380658 PMCID: PMC8982023 DOI: 10.1093/genetics/iyac022] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 12/12/2021] [Accepted: 01/26/2022] [Indexed: 02/06/2023] Open
Abstract
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
|
8
|
Deep neural network prediction of genome-wide transcriptome signatures - beyond the Black-box. NPJ Syst Biol Appl 2022; 8:9. [PMID: 35197482 PMCID: PMC8866467 DOI: 10.1038/s41540-022-00218-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/24/2021] [Accepted: 01/24/2022] [Indexed: 11/28/2022] Open
Abstract
Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained NN, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^-216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
|
9
|
Vedi M, Nalabolu HS, Lin CW, Hoffman MJ, Smith JR, Brodie K, De Pons JL, Demos WM, Gibson AC, Hayman GT, Hill ML, Kaldunski ML, Lamers L, Laulederkind SJF, Thorat K, Thota J, Tutaj M, Tutaj MA, Wang SJ, Zacher S, Dwinell MR, Kwitek AE. MOET: a web-based gene set enrichment tool at the Rat Genome Database for multiontology and multispecies analyses. Genetics 2022; 220:6516514. [PMID: 35380657 PMCID: PMC8982048 DOI: 10.1093/genetics/iyac005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/01/2021] [Accepted: 01/03/2022] [Indexed: 12/12/2022] Open
Abstract
Biological interpretation of a large amount of gene or protein data is complex. Ontology analysis tools are imperative in finding functional similarities through overrepresentation or enrichment of terms associated with the input gene or protein lists. However, most tools are limited to ontology-specific and species-limited analyses. Furthermore, some enrichment tools are not updated frequently with recent information from databases, thus giving users inaccurate, outdated or uninformative data. Here, we present MOET, the Multi-Ontology Enrichment Tool (v.1 released in April 2019 and v.2 released in May 2021), an ontology analysis tool leveraging data that the Rat Genome Database (RGD) integrated from in-house expert curation and external databases including the National Center for Biotechnology Information (NCBI), Mouse Genome Informatics (MGI), The Kyoto Encyclopedia of Genes and Genomes (KEGG), The Gene Ontology Resource, UniProt-GOA, and others. Given a gene or protein list, MOET analysis identifies significantly overrepresented ontology terms using a hypergeometric test and provides nominal and Bonferroni corrected P-values and odds ratios for the overrepresented terms. The results are shown as a downloadable list of terms with and without Bonferroni correction, and a graph of the P-values and number of annotated genes for each term in the list. MOET can be accessed freely from https://rgd.mcw.edu/rgdweb/enrichment/start.html.
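Illustrative note (not part of the cited abstract): the kind of statistic the abstract describes, a hypergeometric overrepresentation test with a Bonferroni correction and an odds ratio, can be sketched as below. This is a generic textbook formulation, not MOET's actual implementation; the function name `hypergeom_enrichment` and all parameter names are hypothetical, and how MOET defines its background set or counts the number of tested terms is not stated in the abstract.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N, num_terms_tested):
    """Generic overrepresentation test for one ontology term (hypothetical helper).

    N: genes in the background set
    K: background genes annotated to the term
    n: genes in the input list
    k: input-list genes annotated to the term
    num_terms_tested: number of terms tested (assumed Bonferroni multiplier)
    """
    # Upper-tail hypergeometric probability P(X >= k):
    # sum_{i=k}^{min(n, K)} C(K, i) * C(N - K, n - i) / C(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    p_nominal = tail / comb(N, n)

    # Bonferroni correction: multiply by the number of tests, cap at 1.
    p_bonferroni = min(1.0, p_nominal * num_terms_tested)

    # Odds ratio from the 2x2 table (in list / not in list) x (annotated / not).
    a, b = k, n - k
    c, d = K - k, (N - K) - (n - k)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return p_nominal, p_bonferroni, odds_ratio
```

For example, with a background of 10 genes of which 4 carry the term, an input list of 3 genes of which 2 carry the term, and 2 terms tested, `hypergeom_enrichment(2, 3, 4, 10, 2)` gives a nominal P-value of 40/120 and an odds ratio of 5.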
Affiliation(s)
- Mahima Vedi
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Harika S Nalabolu
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Chien-Wei Lin
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Matthew J Hoffman
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jennifer R Smith
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Kent Brodie
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jeffrey L De Pons
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Wendy M Demos
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Adam C Gibson
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - G Thomas Hayman
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Morgan L Hill
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Mary L Kaldunski
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Logan Lamers
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | | | - Ketaki Thorat
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jyothi Thota
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Monika Tutaj
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Marek A Tutaj
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Shur-Jen Wang
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Stacy Zacher
- Information Services, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Melinda R Dwinell
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Anne E Kwitek
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
10
|
Kaldunski ML, Smith JR, Hayman GT, Brodie K, De Pons JL, Demos WM, Gibson AC, Hill ML, Hoffman MJ, Lamers L, Laulederkind SJF, Nalabolu HS, Thorat K, Thota J, Tutaj M, Tutaj MA, Vedi M, Wang SJ, Zacher S, Dwinell MR, Kwitek AE. The Rat Genome Database (RGD) facilitates genomic and phenotypic data integration across multiple species for biomedical research. Mamm Genome 2021; 33:66-80. [PMID: 34741192 PMCID: PMC8570235 DOI: 10.1007/s00335-021-09932-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/01/2021] [Accepted: 10/21/2021] [Indexed: 01/21/2023]
Abstract
Model organism research is essential for discovering the mechanisms of human diseases by defining biologically meaningful gene to disease relationships. The Rat Genome Database (RGD, https://rgd.mcw.edu) is a cross-species knowledgebase and the premier online resource for rat genetic and physiologic data. This rich resource is enhanced by the inclusion and integration of comparative data for human and mouse, as well as other human disease models including chinchilla, dog, bonobo, pig, 13-lined ground squirrel, green monkey, and naked mole-rat. Functional information has been added to records via the assignment of annotations based on sequence similarity to human, rat, and mouse genes. RGD has also imported well-supported cross-species data from external resources. To enable use of these data, RGD has developed a robust infrastructure of standardized ontologies, data formats, and disease- and species-centric portals, complemented with a suite of innovative tools for discovery and analysis. Using examples of single-gene and polygenic human diseases, we illustrate how data from multiple species can help to identify or confirm a gene as involved in a disease and to identify model organisms that can be studied to understand the pathophysiology of a gene or pathway. The ultimate aim of this report is to demonstrate the utility of RGD not only as the core resource for the rat research community but also as a source of bioinformatic tools to support a wider audience, empowering the search for appropriate models for human afflictions.
Affiliation(s)
- M L Kaldunski
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - J R Smith
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - G T Hayman
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - K Brodie
- Clinical and Translational Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA
| | - J L De Pons
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - W M Demos
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - A C Gibson
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M L Hill
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M J Hoffman
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - L Lamers
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - S J F Laulederkind
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - H S Nalabolu
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - K Thorat
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - J Thota
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M Tutaj
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M A Tutaj
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M Vedi
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - S J Wang
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - S Zacher
- Information Services, Medical College of Wisconsin, Milwaukee, WI, USA
| | - M R Dwinell
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - A E Kwitek
- Department of Biomedical Engineering, The Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA.
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA.
|
11
|
Venkatraman DL, Pulimamidi D, Shukla HG, Hegde SR. Tumor relevant protein functional interactions identified using bipartite graph analyses. Sci Rep 2021; 11:21530. [PMID: 34728699 PMCID: PMC8563864 DOI: 10.1038/s41598-021-00879-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 09/30/2021] [Indexed: 12/02/2022] Open
Abstract
A surge of -omics data for diseases such as cancer allows for deriving insights into the affiliated protein interactions. We used bipartite network principles to build protein functional associations of the differentially regulated genes in 18 cancer types. This approach allowed us to combine expression data with functional associations in many cancers simultaneously. Further, graph centrality measures suggested the importance of upregulated genes such as BIRC5, UBE2C, BUB1B, KIF20A and PTH1R in cancer. Pathway analysis of the high centrality network nodes suggested the importance of the upregulation of cell cycle and replication associated proteins in cancer. Some of the downregulated high centrality proteins include actins, myosins and ATPase subunits. Among the transcription factors, mini-chromosome maintenance proteins (MCMs) and E2F family proteins appeared prominently in regulating many differentially regulated genes. The projected unipartite networks of the up and downregulated genes comprised 37,411 and 41,756 interactions, respectively. The conclusions obtained by collating these interactions revealed pan-cancer as well as subtype specific protein complexes and clusters. Therefore, we demonstrate that incorporating expression data from multiple cancers into bipartite graphs validates existing cancer associated mechanisms and points to novel interactions and pathways.
Affiliation(s)
| | - Deepshika Pulimamidi
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
| | - Harsh G Shukla
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
| | - Shubhada R Hegde
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India.
|
12
|
BioLitMine: Advanced Mining of Biomedical and Biological Literature About Human Genes and Genes from Major Model Organisms. G3-GENES GENOMES GENETICS 2020; 10:4531-4539. [PMID: 33028629 PMCID: PMC7718760 DOI: 10.1534/g3.120.401775] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Abstract
The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH term, such as a list of genes relevant to “stem cells” or “breast neoplasms.” Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
|
13
|
Smith JR, Hayman GT, Wang SJ, Laulederkind SJF, Hoffman MJ, Kaldunski ML, Tutaj M, Thota J, Nalabolu HS, Ellanki SLR, Tutaj MA, De Pons JL, Kwitek AE, Dwinell MR, Shimoyama ME. The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res 2020; 48:D731-D742. [PMID: 31713623 PMCID: PMC7145519 DOI: 10.1093/nar/gkz1041] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 09/15/2019] [Revised: 10/21/2019] [Accepted: 10/24/2019] [Indexed: 12/13/2022] Open
Abstract
Formed in late 1999, the Rat Genome Database (RGD, https://rgd.mcw.edu) will be 20 in 2020, the Year of the Rat. Because the laboratory rat, Rattus norvegicus, has been used as a model for complex human diseases such as cardiovascular disease, diabetes, cancer, neurological disorders and arthritis, among others, for >150 years, RGD has always been disease-focused and committed to providing data and tools for researchers doing comparative genomics and translational studies. At its inception, before the sequencing of the rat genome, RGD started with only a few data types localized on genetic and radiation hybrid (RH) maps and offered only a few tools for querying and consolidating that data. Since that time, RGD has expanded to include a wealth of structured and standardized genetic, genomic, phenotypic, and disease-related data for eight species, and a suite of innovative tools for querying, analyzing and visualizing this data. This article provides an overview of recent substantial additions and improvements to RGD's data and tools that can assist researchers in finding and utilizing the data they need, whether their goal is to develop new precision models of disease or to more fully explore emerging details within a system or across multiple systems.
Affiliation(s)
- Jennifer R Smith
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- To whom correspondence should be addressed. Tel: +1 414 955 8871; Fax: +1 414 955 6595.
| | - G Thomas Hayman
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Shur-Jen Wang
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Stanley J F Laulederkind
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Matthew J Hoffman
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Genomic Sciences and Precision Medicine Center and Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Mary L Kaldunski
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Monika Tutaj
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jyothi Thota
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Harika S Nalabolu
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Santoshi L R Ellanki
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Marek A Tutaj
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jeffrey L De Pons
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Anne E Kwitek
- Genomic Sciences and Precision Medicine Center and Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Melinda R Dwinell
- Genomic Sciences and Precision Medicine Center and Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Mary E Shimoyama
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
14
|
Morenikeji OB, Akinyemi MO, Wheto M, Ogunshola OJ, Badejo AA, Chineke CA. Transcriptome profiling of four candidate milk genes in milk and tissue samples of temperate and tropical cattle. J Genet 2019. [DOI: 10.1007/s12041-019-1060-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
15
|
Das PP, Krishnan G, Doley J, Bhattacharya D, Deb SM, Chakravarty P, Das PJ. Establishing gene Amelogenin as sex-specific marker in yak by genomic approach. J Genet 2019; 98:7. [PMID: 30945688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The yak is an economically important bovine species considered the lifeline of the Himalaya. Yet this large bovine has long been neglected in both scientific intervention for its conservation and research documentation. Amelogenin is an essential tooth enamel protein whose gene is present in eutherian mammals in two copies, one on the X chromosome and one on the Y chromosome. In bovines, the deletion of a fragment of the nucleotide sequence in exon 6 of the Y chromosome copy makes Amelogenin an excellent sex-specific marker. Thus, an attempt was made to use the gene as an advanced molecular marker for sexing yak to improve breeding strategies and reproduction. The present study confirmed that polymerase chain reaction amplification of the Amelogenin gene with a unique primer is useful for sex identification in yak. The test was further refined with qPCR validation by quantifying the DNA copy number of the Amelogenin gene in males and females. We observed a high level of sequence polymorphism in AMELX and AMELY in yak, which we consider a novel finding. These tests can be extended to several other specialized fields, including forensics, meat production and processing, and quality control.
Affiliation(s)
- P P Das
- Indian Council of Agricultural Research-National Research Centre on Yak, Dirang 790 101, India.
|
16
|
Abstract
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
Affiliation(s)
- Patrick Ruch
- SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland.
- BiTeM Group, HES-SO\HEG Genève, 7 route de Drize, CH-1227, Carouge, Switzerland.
|
17
|
Gutiérrez-Sacristán A, Bravo À, Portero-Tresserra M, Valverde O, Armario A, Blanco-Gandía M, Farré A, Fernández-Ibarrondo L, Fonseca F, Giraldo J, Leis A, Mané A, Mayer M, Montagud-Romero S, Nadal R, Ortiz J, Pavon FJ, Perez EJ, Rodríguez-Arias M, Serrano A, Torrens M, Warnault V, Sanz F, Furlong LI. Text mining and expert curation to develop a database on psychiatric diseases and their genes. Database (Oxford) 2017; 2017:3891487. [PMID: 29220439 PMCID: PMC5502359 DOI: 10.1093/database/bax043] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/28/2016] [Revised: 04/27/2017] [Accepted: 05/01/2017] [Indexed: 01/15/2023]
Abstract
Database URL: http://www.psygenet.org. PsyGeNET corpus: http://www.psygenet.org/ds/PsyGeNET/results/psygenetCorpus.tar.
Affiliation(s)
- Alba Gutiérrez-Sacristán
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
- Àlex Bravo
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
- Marta Portero-Tresserra
- Neurobiology of Behaviour Research Group (GReNeC), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Olga Valverde
- Neurobiology of Behaviour Research Group (GReNeC), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Antonio Armario
- Institut de Neurociències and Animal Physiology Unit, Universitat Autònoma de Barcelona (UAB), Barcelona, Spain
- Network Biomedical Research Center on Mental Health (CIBERSAM)
- M.C. Blanco-Gandía
- Department of Psychobiology, Facultad de Psicología, Universitat de València, València, Spain
- Adriana Farré
- Institute of Neuropsychiatry and Addiction, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Parc de Salut Mar, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Lierni Fernández-Ibarrondo
- Programa de Cáncer (IMIM), Investigación Traslacional en Neoplasias Colorrectales, C/Dr. Aiguader 88, Barcelona, Spain
- Francina Fonseca
- Institute of Neuropsychiatry and Addiction, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Parc de Salut Mar, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Jesús Giraldo
- Network Biomedical Research Center on Mental Health (CIBERSAM)
- Institut de Neurociències and Unitat de Bioestadística, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Angela Leis
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
- Anna Mané
- Network Biomedical Research Center on Mental Health (CIBERSAM)
- Institute of Neuropsychiatry and Addiction, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Parc de Salut Mar, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- M.A. Mayer
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
- Sandra Montagud-Romero
- Department of Psychobiology, Facultad de Psicología, Universitat de València, València, Spain
- Roser Nadal
- Network Biomedical Research Center on Mental Health (CIBERSAM)
- Institut de Neurociències and Psychobiology Area, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Jordi Ortiz
- Network Biomedical Research Center on Mental Health (CIBERSAM)
- Neuroscience Institute and Department of Biochemistry and Molecular Biology, School of Medicine, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Francisco Javier Pavon
- Unidad de Gestión Clínica de Salud Mental, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Regional Universitario de Málaga, Málaga, Spain
- Ezequiel Jesús Perez
- Institute of Neuropsychiatry and Addiction, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Parc de Salut Mar, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Marta Rodríguez-Arias
- Department of Psychobiology, Facultad de Psicología, Universitat de València, València, Spain
- Antonia Serrano
- Unidad de Gestión Clínica de Salud Mental, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Regional Universitario de Málaga, Málaga, Spain
- Marta Torrens
- Institute of Neuropsychiatry and Addiction, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Parc de Salut Mar, Universitat Autònoma de Barcelona (UAB), Bellaterra, Spain
- Vincent Warnault
- Neurobiology of Behaviour Research Group (GReNeC), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Ferran Sanz
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
- Laura I. Furlong
- Research Group on Integrative Biomedical Informatics (GRIB), Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), DCEXS, Universitat Pompeu Fabra (UPF), C/Dr. Aiguader 88, Barcelona 08003, Spain
18
Wang SJ, Laulederkind SJF, Hayman GT, Petri V, Smith JR, Tutaj M, Nigam R, Dwinell MR, Shimoyama M. Comprehensive coverage of cardiovascular disease data in the disease portals at the Rat Genome Database. Physiol Genomics 2016; 48:589-600. [PMID: 27287925 DOI: 10.1152/physiolgenomics.00046.2016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2016] [Accepted: 06/08/2016] [Indexed: 01/18/2023]
Abstract
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality.
Affiliation(s)
- Shur-Jen Wang
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- G Thomas Hayman
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- Victoria Petri
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- Jennifer R Smith
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- Marek Tutaj
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- Rajni Nigam
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
- Melinda R Dwinell
- Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin
- Mary Shimoyama
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin
19
Lim KMK, Li C, Chng KR, Nagarajan N. @MInter: automated text-mining of microbial interactions. Bioinformatics 2016; 32:2981-7. [PMID: 27312413 DOI: 10.1093/bioinformatics/btw357] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/13/2016] [Accepted: 05/31/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information regarding them is dispersed in the scientific literature. As manual collation is an infeasible option, automated data processing tools are needed to make this information easily accessible. RESULTS: We present @MInter, an automated information extraction system based on Support Vector Machines to analyze paper abstracts and infer microbial interactions. @MInter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @MInter was able to detect abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction level recall = 95%, precision = 25%), @MInter was shown to reduce annotator workload 13-fold compared to alternate approaches. Applying @MInter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities. AVAILABILITY AND IMPLEMENTATION: @MInter is freely available at https://github.com/CSB5/atminter. CONTACT: nagarajann@gis.a-star.edu.sg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kun Ming Kenneth Lim
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Computational Biology Program, Faculty of Science
- Chenhao Li
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Department of Computer Science, National University of Singapore, Singapore, Singapore
- Kern Rei Chng
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Niranjan Nagarajan
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Department of Computer Science, National University of Singapore, Singapore, Singapore
20
Hayman GT, Laulederkind SJF, Smith JR, Wang SJ, Petri V, Nigam R, Tutaj M, De Pons J, Dwinell MR, Shimoyama M. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database (Oxford) 2016; 2016:baw034. [PMID: 27009807 PMCID: PMC4805243 DOI: 10.1093/database/baw034] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/24/2015] [Accepted: 02/29/2016] [Indexed: 12/23/2022]
Abstract
The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu.
Affiliation(s)
- G Thomas Hayman
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Stanley J F Laulederkind
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Jennifer R Smith
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Shur-Jen Wang
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Victoria Petri
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Rajni Nigam
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Marek Tutaj
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Jeff De Pons
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Physiology, Medical College of Wisconsin
- Mary Shimoyama
- Medical College of Wisconsin, Human and Molecular Genetics Center; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA