1
Laughlin M, McIndoe R, Adams SH, Araiza R, Ayala JE, Kennedy L, Lanoue L, Lantier L, Macy J, Malabanan E, McGuinness OP, Perry R, Port D, Qi N, Elias CF, Shulman GI, Wasserman DH, Lloyd KCK. The mouse metabolic phenotyping center (MMPC) live consortium: an NIH resource for in vivo characterization of mouse models of diabetes and obesity. Mamm Genome 2024; 35:485-496. [PMID: 39191872 PMCID: PMC11522164 DOI: 10.1007/s00335-024-10067-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
The Mouse Metabolic Phenotyping Center (MMPC)Live Program was established in 2023 by the National Institute for Diabetes, Digestive and Kidney Diseases (NIDDK) at the National Institutes of Health (NIH) to advance biomedical research by providing the scientific community with standardized, high quality phenotyping services for mouse models of diabetes and obesity. Emerging as the next iteration of the MMPC Program which served the biomedical research community for 20 years (2001-2021), MMPCLive is designed as an outwardly-facing consortium of service cores that collaborate to provide reduced-cost consultation and metabolic, physiologic, and behavioral phenotyping tests on live mice for U.S. biomedical researchers. Four MMPCLive Centers located at universities around the country perform complex and often unique procedures in vivo on a fee for service basis, typically on mice shipped from the client or directly from a repository or vendor. Current areas of expertise include energy balance and body composition, insulin action and secretion, whole body carbohydrate and lipid metabolism, cardiovascular and renal function, food intake and behavior, microbiome and xenometabolism, and metabolic pathway kinetics. Additionally, an opportunity arose to reduce barriers to access and expand the diversity of the biomedical research workforce by establishing the VIBRANT Program. Directed at researchers historically underrepresented in the biomedical sciences, VIBRANT-eligible investigators have access to testing services, travel and career development awards, expert advice and experimental design consultation, and short internships to learn test technologies. Data derived from experiments run by the Centers belongs to the researchers submitting mice for testing which can be made publicly available and accessible from the MMPCLive database following publication. 
In addition to services, MMPCLive staff provide expertise and advice to researchers, develop and refine test protocols, engage in outreach activities, publish scientific and technical papers, and conduct educational workshops and training sessions to aid researchers in unraveling the heterogeneity of diabetes and obesity.
Affiliation(s)
- Maren Laughlin
  - National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, USA
- Richard McIndoe
  - Center for Biotechnology and Genomic Medicine, Augusta University, Augusta, USA
- Sean H Adams
  - Department of Surgery, School of Medicine, University of California Davis, Davis, USA
  - Center for Alimentary and Metabolic Science, School of Medicine, University of California Davis, Davis, USA
- Renee Araiza
  - Mouse Biology Program, University of California Davis, Davis, USA
- Lucy Kennedy
  - Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, USA
- Louise Lanoue
  - Mouse Biology Program, University of California Davis, Davis, USA
- James Macy
  - Department of Comparative Medicine, Yale School of Medicine, New Haven, USA
- Rachel Perry
  - Department of Internal Medicine, Yale School of Medicine, New Haven, USA
  - Department of Cellular & Molecular Physiology, Yale School of Medicine, New Haven, USA
- Daniel Port
  - Mouse Biology Program, University of California Davis, Davis, USA
- Nathan Qi
  - Department of Molecular & Integrated Physiology, University of Michigan, Ann Arbor, USA
  - Caswell Diabetes Institute, University of Michigan Medical School, Ann Arbor, USA
- Carol F Elias
  - Department of Molecular & Integrated Physiology, University of Michigan, Ann Arbor, USA
  - Caswell Diabetes Institute, University of Michigan Medical School, Ann Arbor, USA
- Gerald I Shulman
  - Department of Internal Medicine, Yale School of Medicine, New Haven, USA
  - Department of Cellular & Molecular Physiology, Yale School of Medicine, New Haven, USA
- K C Kent Lloyd
  - Department of Surgery, School of Medicine, University of California Davis, Davis, USA
  - Mouse Biology Program, University of California Davis, Davis, USA
2
Marino GB, Ahmed N, Xie Z, Jagodnik KM, Han J, Clarke DJB, Lachmann A, Keller MP, Attie AD, Ma’ayan A. D2H2: diabetes data and hypothesis hub. BIOINFORMATICS ADVANCES 2023; 3:vbad178. [PMID: 38107655 PMCID: PMC10723036 DOI: 10.1093/bioadv/vbad178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/25/2023] [Accepted: 12/02/2023] [Indexed: 12/19/2023]
Abstract
Motivation: There is a rapid growth in the production of omics datasets collected by the diabetes research community. However, such published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2).
Results: D2H2 contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The collected and processed datasets are curated from the Gene Expression Omnibus (GEO). Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot. Utilizing GPT, we prompt users to enter free text about their data analysis needs. Parsing the user prompt, together with specifying information about all D2H2 available tools and workflows, we answer user queries by invoking the most relevant tools via the tools' API. D2H2 also has a hypotheses generation module where gene sets are randomly selected from the bulk RNA-seq precomputed signatures. We then find highly overlapping gene sets extracted from publications listed in PubMed Central with abstract dissimilarity. With the help of GPT, we speculate about a possible explanation of the high overlap between the gene sets. Overall, D2H2 is a platform that provides a suite of bioinformatics tools and curated transcriptomics datasets for hypothesis generation.
Availability and implementation: D2H2 is available at: https://d2h2.maayanlab.cloud/ and the source code is available from GitHub at https://github.com/MaayanLab/D2H2-site under the CC BY-NC 4.0 license.
Affiliation(s)
- Giacomo B Marino
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Nasheath Ahmed
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Zhuorui Xie
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Kathleen M Jagodnik
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Jason Han
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Alexander Lachmann
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
- Mark P Keller
  - Department of Biochemistry, University of Wisconsin, Madison, WI 53706, United States
- Alan D Attie
  - Department of Biochemistry, University of Wisconsin, Madison, WI 53706, United States
- Avi Ma’ayan
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
3
Surles-Zeigler MC, Sincomb T, Gillespie TH, de Bono B, Bresnahan J, Mawe GM, Grethe JS, Tappan S, Heal M, Martone ME. Extending and using anatomical vocabularies in the stimulating peripheral activity to relieve conditions project. Front Neuroinform 2022; 16:819198. [PMID: 36090663 PMCID: PMC9449460 DOI: 10.3389/fninf.2022.819198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2021] [Accepted: 07/18/2022] [Indexed: 11/25/2022]
Abstract
The stimulating peripheral activity to relieve conditions (SPARC) program is a US National Institutes of Health-funded effort to improve our understanding of the neural circuitry of the autonomic nervous system (ANS) in support of bioelectronic medicine. As part of this effort, the SPARC project is generating multi-species, multimodal data, models, simulations, and anatomical maps supported by a comprehensive knowledge base of autonomic circuitry. To facilitate the organization of and integration across multi-faceted SPARC data and models, SPARC is implementing the findable, accessible, interoperable, and reusable (FAIR) data principles to ensure that all SPARC products are findable, accessible, interoperable, and reusable. We are therefore annotating and describing all products with a common FAIR vocabulary. The SPARC Vocabulary is built from a set of community ontologies covering major domains relevant to SPARC, including anatomy, physiology, experimental techniques, and molecules. The SPARC Vocabulary is incorporated into tools researchers use to segment and annotate their data, facilitating the application of these ontologies for annotation of research data. However, since investigators perform deep annotations on experimental data, not all terms and relationships are available in community ontologies. We therefore implemented a term management and vocabulary extension pipeline where SPARC researchers may extend the SPARC Vocabulary using InterLex, an online vocabulary management system. To ensure the quality of contributed terms, we have set up a curated term request and review pipeline specifically for anatomical terms involving expert review. Accepted terms are added to the SPARC Vocabulary and, when appropriate, contributed back to community ontologies to enhance ANS coverage. Here, we provide an overview of the SPARC Vocabulary, the infrastructure and process for implementing the term management and review pipeline. 
In an analysis of >300 anatomical contributed terms, the majority represented composite terms that necessitated combining terms within and across existing ontologies. Although these terms are not good candidates for community ontologies, they can be linked to structures contained within these ontologies. We conclude that the term request pipeline serves as a useful adjunct to community ontologies for annotating experimental data and increases the FAIRness of SPARC data.
Affiliation(s)
- Troy Sincomb
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Thomas H. Gillespie
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Bernard de Bono
  - Whitby et al., Inc., Indianapolis, IN, United States
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Jacqueline Bresnahan
  - Brain and Spinal Injury Center, University of California, San Francisco, San Francisco, CA, United States
- Gary M. Mawe
  - Department of Neurological Sciences, University of Vermont, Burlington, VT, United States
- Jeffrey S. Grethe
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Maci Heal
  - MBF Bioscience, Williston, VT, United States
- Maryann E. Martone
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
4
Ochsner SA, Pillich RT, Rawool D, Grethe JS, McKenna NJ. Transcriptional regulatory networks of circulating immune cells in type 1 diabetes: A community knowledgebase. iScience 2022; 25:104581. [PMID: 35832893 PMCID: PMC9272393 DOI: 10.1016/j.isci.2022.104581] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/07/2022] [Revised: 06/01/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022]
Abstract
Investigator-generated transcriptomic datasets interrogating circulating immune cell (CIC) gene expression in clinical type 1 diabetes (T1D) have underappreciated re-use value. Here, we repurposed these datasets to create an open science environment for the generation of hypotheses around CIC signaling pathways whose gain or loss of function contributes to T1D pathogenesis. We firstly computed sets of genes that were preferentially induced or repressed in T1D CICs and validated these against community benchmarks. We then inferred and validated signaling node networks regulating expression of these gene sets, as well as differentially expressed genes in the original underlying T1D case:control datasets. In a set of three use cases, we demonstrated how informed integration of these networks with complementary digital resources supports substantive, actionable hypotheses around signaling pathway dysfunction in T1D CICs. Finally, we developed a federated, cloud-based web resource that exposes the entire data matrix for unrestricted access and re-use by the research community.
- Re-use of transcriptomic type 1 diabetes (T1D) circulating immune cells (CICs) datasets
- We generated transcriptional regulatory networks for T1D CICs
- Use cases generate substantive hypotheses around signaling pathway dysfunction in T1D CICs
- Networks are freely accessible on the web for re-use by the research community
Affiliation(s)
- Scott A. Ochsner
  - Department of Molecular, Baylor College of Medicine, Houston, TX 77030, USA
  - Cellular Biology and Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Rudolf T. Pillich
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Deepali Rawool
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
- Jeffrey S. Grethe
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
- Neil J. McKenna
  - Department of Molecular, Baylor College of Medicine, Houston, TX 77030, USA
  - Cellular Biology and Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Corresponding author
5
Torres-Espín A, Almeida CA, Chou A, Huie JR, Chiu M, Vavrek R, Sacramento J, Orr MB, Gensel JC, Grethe JS, Martone ME, Fouad K, Ferguson AR. Promoting FAIR Data Through Community-driven Agile Design: the Open Data Commons for Spinal Cord Injury (odc-sci.org). Neuroinformatics 2022; 20:203-219. [PMID: 34347243 PMCID: PMC9537193 DOI: 10.1007/s12021-021-09533-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 06/16/2021] [Indexed: 01/07/2023]
Abstract
The past decade has seen accelerating movement from data protectionism in publishing toward open data sharing to improve reproducibility and translation of biomedical research. Developing data sharing infrastructures to meet these new demands remains a challenge. One model for data sharing involves simply attaching data, irrespective of its type, to publisher websites or general use repositories. However, some argue this creates a 'data dump' that does not promote the goals of making data Findable, Accessible, Interoperable and Reusable (FAIR). Specialized data sharing communities offer an alternative model where data are curated by domain experts to make it both open and FAIR. We report on our experiences developing one such data-sharing ecosystem focusing on 'long-tail' preclinical data, the Open Data Commons for Spinal Cord Injury (odc-sci.org). ODC-SCI was developed with community-based agile design requirements directly pulled from a series of workshops with multiple stakeholders (researchers, consumers, non-profit funders, governmental agencies, journals, and industry members). ODC-SCI focuses on heterogeneous tabular data collected by preclinical researchers including bio-behaviour, histopathology findings and molecular endpoints. This has led to an example of a specialized neurocommons that is well-embraced by the community it aims to serve. In the present paper, we provide a review of the community-based design template and describe the adoption by the community including a high-level review of current data assets, publicly released datasets, and web analytics. Although odc-sci.org is in its late beta stage of development, it represents a successful example of a specialized data commons that may serve as a model for other fields.
Affiliation(s)
- Abel Torres-Espín
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Carlos A Almeida
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Austin Chou
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- J Russell Huie
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Michael Chiu
  - Department of Neuroscience, University of California, San Diego, San Diego, CA, USA
- Romana Vavrek
  - Faculty of Rehabilitation Medicine and the Neuroscience and Mental Health Institute, University of Alberta, Edmonton, AB, Canada
- Jeff Sacramento
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Michael B Orr
  - Spinal Cord and Brain Injury Research Center, Department of Physiology, University of Kentucky College of Medicine, Lexington, KY, USA
- John C Gensel
  - Spinal Cord and Brain Injury Research Center, Department of Physiology, University of Kentucky College of Medicine, Lexington, KY, USA
- Jeffery S Grethe
  - Department of Neuroscience, University of California, San Diego, San Diego, CA, USA
- Maryann E Martone
  - Department of Neuroscience, University of California, San Diego, San Diego, CA, USA
- Karim Fouad
  - Faculty of Rehabilitation Medicine and the Neuroscience and Mental Health Institute, University of Alberta, Edmonton, AB, Canada
- Adam R Ferguson
  - Weill Institute for Neurosciences, Brain and Spinal Injury Center, Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
  - San Francisco Veterans Affairs Health Care System, San Francisco, CA, USA
6
Murphy F, Bar-Sinai M, Martone ME. A tool for assessing alignment of biomedical data repositories with open, FAIR, citation and trustworthy principles. PLoS One 2021; 16:e0253538. [PMID: 34242248 PMCID: PMC8270168 DOI: 10.1371/journal.pone.0253538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/16/2020] [Accepted: 06/08/2021] [Indexed: 11/19/2022]
Abstract
Increasing attention is being paid to the operation of biomedical data repositories in light of efforts to improve how scientific data is handled and made available for the long term. Multiple groups have produced recommendations for functions that biomedical repositories should support, with many using requirements of the FAIR data principles as guidelines. However, FAIR is but one set of principles that has arisen out of the open science community. They are joined by principles governing open science, data citation and trustworthiness, all of which are important aspects for biomedical data repositories to support. Together, these define a framework for data repositories that we call OFCT: Open, FAIR, Citable and Trustworthy. Here we developed an instrument using the open source PolicyModels toolkit that attempts to operationalize key aspects of OFCT principles and piloted the instrument by evaluating eight biomedical community repositories listed by the NIDDK Information Network (dkNET.org). Repositories included both specialist repositories that focused on a particular data type or domain, in this case diabetes and metabolomics, and generalist repositories that accept all data types and domains. The goal of this work was both to obtain a sense of how much the design of current biomedical data repositories align with these principles and to augment the dkNET listing with additional information that may be important to investigators trying to choose a repository, e.g., does the repository fully support data citation? The evaluation was performed from March to November 2020 through inspection of documentation and interaction with the sites by the authors. Overall, although there was little explicit acknowledgement of any of the OFCT principles in our sample, the majority of repositories provided at least some support for their tenets.
Affiliation(s)
- Fiona Murphy
  - MoreBrains Cooperative Ltd, Chichester, United Kingdom
- Michael Bar-Sinai
  - Department of Computer Science, Ben-Gurion University of the Negev and The Institute of Quantitative Social Science at Harvard University, Beersheba, Israel
- Maryann E. Martone
  - Department of Neurosciences, SciCrunch, Inc., University of California, San Diego, California, United States of America
7
Ochsner SA, Pillich RT, McKenna NJ. Consensus transcriptional regulatory networks of coronavirus-infected human cells. Sci Data 2020; 7:314. [PMID: 32963239 PMCID: PMC7509801 DOI: 10.1038/s41597-020-00628-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/27/2020] [Accepted: 08/05/2020] [Indexed: 02/08/2023]
Abstract
Establishing consensus around the transcriptional interface between coronavirus (CoV) infection and human cellular signaling pathways can catalyze the development of novel anti-CoV therapeutics. Here, we used publicly archived transcriptomic datasets to compute consensus regulatory signatures, or consensomes, that rank human genes based on their rates of differential expression in MERS-CoV (MERS), SARS-CoV-1 (SARS1) and SARS-CoV-2 (SARS2)-infected cells. Validating the CoV consensomes, we show that high confidence transcriptional targets (HCTs) of MERS, SARS1 and SARS2 infection intersect with HCTs of signaling pathway nodes with known roles in CoV infection. Among a series of novel use cases, we gather evidence for hypotheses that SARS2 infection efficiently represses E2F family HCTs encoding key drivers of DNA replication and the cell cycle; that progesterone receptor signaling antagonizes SARS2-induced inflammatory signaling in the airway epithelium; and that SARS2 HCTs are enriched for genes involved in epithelial to mesenchymal transition. The CoV infection consensomes and HCT intersection analyses are freely accessible through the Signaling Pathways Project knowledgebase, and as Cytoscape-style networks in the Network Data Exchange repository.
Affiliation(s)
- Scott A Ochsner
  - The Signaling Pathways Project and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, 77030, USA
- Rudolf T Pillich
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Neil J McKenna
  - The Signaling Pathways Project and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, 77030, USA
8
Ochsner SA, Pillich RT, McKenna NJ. Consensus transcriptional regulatory networks of coronavirus-infected human cells. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.04.24.059527. [PMID: 32511379 PMCID: PMC7263508 DOI: 10.1101/2020.04.24.059527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Establishing consensus around the transcriptional interface between coronavirus (CoV) infection and human cellular signaling pathways can catalyze the development of novel anti-CoV therapeutics. Here, we used publicly archived transcriptomic datasets to compute consensus regulatory signatures, or consensomes, that rank human genes based on their rates of differential expression in MERS-CoV (MERS), SARS-CoV-1 (SARS1) and SARS-CoV-2 (SARS2)-infected cells. Validating the CoV consensomes, we show that high confidence transcriptional targets (HCTs) of CoV infection intersect with HCTs of signaling pathway nodes with known roles in CoV infection. Among a series of novel use cases, we gather evidence for hypotheses that SARS2 infection efficiently represses E2F family target genes encoding key drivers of DNA replication and the cell cycle; that progesterone receptor signaling antagonizes SARS2-induced inflammatory signaling in the airway epithelium; and that SARS2 HCTs are enriched for genes involved in epithelial to mesenchymal transition. The CoV infection consensomes and HCT intersection analyses are freely accessible through the Signaling Pathways Project knowledgebase, and as Cytoscape-style networks in the Network Data Exchange repository.
Affiliation(s)
- Scott A Ochsner
  - The Signaling Pathways Project and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030
- Rudolf T Pillich
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093
- Neil J McKenna
  - The Signaling Pathways Project and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030
9
Reckoning the Dearth of Bioinformatics in the Arena of Diabetic Nephropathy (DN)—Need to Improvise. Processes (Basel) 2020. [DOI: 10.3390/pr8070808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/21/2023]
Abstract
Diabetic nephropathy (DN) is a recent rising concern amongst diabetics and diabetologist. Characterized by abnormal renal function and ending in total loss of kidney function, this is becoming a lurking danger for the ever increasing population of diabetics. This review touches upon the intensity of this complication and briefly reviews the role of bioinformatics in the area of diabetes. The advances made in the area of DN using proteomic approaches are presented. Compared to the enumerable inputs observed through the use of bioinformatics resources in the area of proteomics and even diabetes, the existing scenario of skeletal application of bioinformatics advances to DN is highlighted and the reasons behind this discussed. As this review highlights, almost none of the well-established tools that have brought breakthroughs in proteomic research have been applied into DN. Laborious, voluminous, cost expensive and time-consuming methodologies and advances in diagnostics and biomarker discovery promised through beckoning bioinformatics mechanistic approaches to improvise DN research and achieve breakthroughs. This review is expected to sensitize the researchers to fill in this gap, exploiting the available inputs from bioinformatics resources.
10
Hsu CN, Bandrowski AE, Gillespie TH, Udell J, Lin KW, Ozyurt IB, Grethe JS, Martone ME. Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts. Comput Sci Eng 2020. [DOI: 10.1109/mcse.2019.2952838] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
11
Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022]
Abstract
Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.
Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.
Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publically available as an open source package for the biomedical community.
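Note on the P@10 figure: precision at 10 is conventionally reported as the fraction (not the raw count) of the top 10 results that are relevant, which is why the value above lies between 0 and 1. The following is a minimal sketch of that calculation for a single query. The function name, dataset identifiers, and relevance judgments are hypothetical, and this is not the DataMed evaluation code.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k (not by len(top_k)), so short result lists are penalized.
    return hits / k


# Hypothetical ranking returned for one query, best match first.
ranked = ["ds1", "ds2", "ds3", "ds4", "ds5", "ds6", "ds7", "ds8", "ds9", "ds10"]
# Hypothetical manual relevance judgments for the same query.
relevant = {"ds1", "ds3", "ds4", "ds7", "ds9", "ds10"}

score = precision_at_k(ranked, relevant, k=10)
```

A benchmark-level figure such as the one quoted in the abstract would typically be the mean of this per-query value over all benchmark queries.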
Affiliation(s)
- Xiaoling Chen: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Anupama E Gururaj: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ruiling Liu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ergin Soysal: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Trevor Cohen: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Firat Tiryaki: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yueling Li: Center for Research in Biological Systems
- Nansu Zong: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Min Jiang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Deevakar Rogith: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Mandana Salimi: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hyeon-Eui Kim: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Claudiu Farcas: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Todd Johnson: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ron Margolis: National Institutes of Health, Bethesda, MD, USA
- Ian M Fore: National Institutes of Health, Bethesda, MD, USA
- Lucila Ohno-Machado: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
12
|
Ozyurt IB, Grethe JS. Foundry: a message-oriented, horizontally scalable ETL system for scientific data integration and enhancement. Database (Oxford) 2018; 2018:5255189. [PMID: 30576493 PMCID: PMC6301337 DOI: 10.1093/database/bay130] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/14/2018] [Revised: 10/18/2018] [Accepted: 11/14/2018] [Indexed: 11/12/2022]
Abstract
Data generated by scientific research enables further advancement in science through reanalyses and pooling of data for novel analyses. With the increasing amounts of scientific data generated by biomedical research providing researchers with more data than they have ever had access to, finding the data matching the researchers' requirements continues to be a major challenge and will only grow more challenging as more data is produced and shared. In this paper, we introduce a horizontally scalable distributed extract-transform-load system to tackle scientific data aggregation, transformation and enhancement for scientific data discovery and retrieval. We also introduce a data transformation language for biomedical curators allowing for the transformation and combination of data/metadata from heterogeneous data sources. Applicability of the system for scientific data is illustrated in biomedical and earth science domains.
Affiliation(s)
- Ibrahim Burak Ozyurt: Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Jeffrey S Grethe: Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
|
13
|
Darlington YF, Naumov A, McOwiti A, Kankanamge WH, Becnel LB, McKenna NJ. Improving the discoverability, accessibility, and citability of omics datasets: a case report. J Am Med Inform Assoc 2017; 24:388-393. [PMID: 27413121 DOI: 10.1093/jamia/ocw096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/02/2015] [Accepted: 05/23/2016] [Indexed: 11/12/2022]
Abstract
Although omics datasets represent valuable assets for hypothesis generation, model testing, and data validation, the infrastructure supporting their reuse lacks organization and consistency. Using nuclear receptor signaling transcriptomic datasets as proof of principle, we developed a model to improve the discoverability, accessibility, and citability of published omics datasets. Primary datasets were retrieved from archives, processed to extract data points, then subjected to metadata enrichment and gap filling. The resulting secondary datasets were exposed on responsive web pages to support mining of gene lists, discovery of related datasets, and single-click citation integration with popular reference managers. Automated processes were established to embed digital object identifier-driven links to the secondary datasets in associated journal articles, small molecule and gene-centric databases, and a dataset search engine. Our model creates multiple points of access to reprocessed and reannotated derivative datasets across the digital biomedical research ecosystem, promoting their visibility and usability across disparate research communities.
Affiliation(s)
- Yolanda F Darlington: Dan L. Duncan Comprehensive Cancer Center Biomedical Informatics Group, Baylor College of Medicine, Houston, Texas, USA
- Alexey Naumov: Dan L. Duncan Comprehensive Cancer Center Biomedical Informatics Group, Baylor College of Medicine, Houston, Texas, USA
- Apollo McOwiti: Dan L. Duncan Comprehensive Cancer Center Biomedical Informatics Group, Baylor College of Medicine, Houston, Texas, USA
- Wasula H Kankanamge: Dan L. Duncan Comprehensive Cancer Center Biomedical Informatics Group, Baylor College of Medicine, Houston, Texas, USA
- Lauren B Becnel: Dan L. Duncan Comprehensive Cancer Center Biomedical Informatics Group, Baylor College of Medicine, Houston, Texas, USA; Clinical Data Interchange Standards Consortium (CDISC), Austin, Texas, USA
- Neil J McKenna: Nuclear Receptor Signaling Atlas Informatics Hub, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
|