1
Halu A, Chelvanambi S, Decano JL, Matamalas JT, Whelan M, Asano T, Kalicharran N, Singh SA, Loscalzo J, Aikawa M. Integrating pharmacogenomics and cheminformatics with diverse disease phenotypes for cell type-guided drug discovery. Genome Med 2025; 17:7. [PMID: 39833831 PMCID: PMC11744892 DOI: 10.1186/s13073-025-01431-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 01/08/2025] [Indexed: 01/22/2025] Open
Abstract
BACKGROUND: Large-scale pharmacogenomic resources, such as the Connectivity Map (CMap), have greatly assisted computational drug discovery. However, despite their widespread use, CMap-based methods have thus far been agnostic to the biological activity of drugs as well as to the genomic effects of drugs in multiple disease contexts. Here, we present a network-based statistical approach, Pathopticon, that uses CMap to build cell type-specific gene-drug perturbation networks and integrates these networks with cheminformatic data and diverse disease phenotypes to prioritize drugs in a cell type-dependent manner.
METHODS: We build cell type-specific gene-drug perturbation networks from CMap data using a statistical procedure we call Quantile-based Instance Z-score Consensus (QUIZ-C). Using these networks and a large-scale disease-gene network consisting of 569 disease signatures from the Enrichr database, we calculate Pathophenotypic Congruity Scores (PACOS) between input gene signatures and drug perturbation signatures and combine these scores with cheminformatic data from ChEMBL to prioritize drugs. We benchmark our approach by calculating area under the receiver operating characteristic curves (AUROC) for 73 gene sets from the Molecular Signatures Database (MSigDB) using target gene expression profiles from the Comparative Toxicogenomics Database (CTD). We validate the drugs predicted in our proofs-of-concept using real-time polymerase chain reaction (qPCR) experiments.
RESULTS: Cell type-specific gene-drug perturbation networks built using QUIZ-C are topologically distinct, reflecting the biological uniqueness of the cell lines in CMap, and are enriched in known drug targets. Pathopticon demonstrates a better prediction performance than solely cheminformatic measures as well as state-of-the-art network and deep learning-based methods. Top predictions made by Pathopticon have high chemical structural diversity, suggesting their potential for building compound libraries. In proof-of-concept applications on vascular diseases, we demonstrate that Pathopticon helps guide in vitro experiments by identifying pathways that are potentially regulated by the predicted therapeutic candidates.
CONCLUSIONS: Our network-based analytical framework integrating pharmacogenomics and cheminformatics (available at https://github.com/r-duh/Pathopticon) provides a feasible blueprint for a cell type-specific drug discovery and repositioning platform with broad implications for the efficiency and success of drug development.
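The benchmark in this abstract is reported as AUROC. As a rough illustration of what that metric measures, the sketch below computes a rank-based AUROC from a list of drug scores and binary "true hit" labels. It is not taken from the Pathopticon code base; the function name `auroc` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def auroc(scores, labels):
    """Rank-based AUROC: the fraction of (positive, negative) pairs in which
    the positive item receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical prioritization scores for five drugs and whether each one is
# a known true hit (1) or not (0). These numbers are made up.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
example_auroc = auroc(scores, labels)
print(f"AUROC = {example_auroc:.3f}")
```

An AUROC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every true hit above every non-hit. The pairwise loop is quadratic and is meant for clarity rather than speed.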
Affiliation(s)
- Arda Halu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA, 02115, USA.
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA.
- Sarvesh Chelvanambi
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Julius L Decano
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Joan T Matamalas
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Mary Whelan
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Takaharu Asano
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Namitra Kalicharran
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Sasha A Singh
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA
- Joseph Loscalzo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA, 02115, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Masanori Aikawa
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA, 02115, USA.
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Center for Life Sciences Boston Bldg., 17th Floor, 3 Blackfan Street, Boston, MA, 02115, USA.
2
Kyoda K, Itoga H, Yamagata Y, Fujisawa E, Wang F, Miranda-Miranda M, Yamamoto H, Nakano Y, Tohsato Y, Onami S. SSBD: an ecosystem for enhanced sharing and reuse of bioimaging data. Nucleic Acids Res 2025; 53:D1716-D1723. [PMID: 39479781 PMCID: PMC11701685 DOI: 10.1093/nar/gkae860] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2024] [Revised: 09/07/2024] [Accepted: 09/21/2024] [Indexed: 01/18/2025] Open
Abstract
SSBD (https://ssbd.riken.jp) is a platform for the sharing and reuse of bioimaging data. As part of efforts to build a bioimaging data ecosystem, SSBD has recently been updated to a two-tiered data resource comprising SSBD:repository, a public repository for the sharing of all types of bioimaging data reported in journals, and SSBD:database, an added-value database for the sharing of curated, highly reusable, metadata-rich data. This update addresses the conflicting demands of rapid data publication and sharing of richly annotated data, thereby promoting bioimaging data sharing and reuse. With this update, SSBD is now positioned as a core repository and database within the foundingGIDE, an international consortium working to establish a global image data ecosystem. Harmonizing metadata between data resources enables cross-searching and data exchange with data resources from other countries and regions.
Affiliation(s)
- Koji Kyoda
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Hiroya Itoga
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yuki Yamagata
- Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Integrated Bioresource Information Division, RIKEN Bioresource Research Center, 3-1-1 Koyadai, Tsukuba, Ibaraki 350-0074, Japan
- Emi Fujisawa
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Fangfang Wang
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Miguel Miranda-Miranda
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Haruna Yamamoto
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yasue Nakano
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yukako Tohsato
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Faculty of Information Science and Engineering, Ritsumeikan University, 2-150 Iwakura-cho, Ibaraki, Osaka 567-8570, Japan
- Shuichi Onami
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
3
Chow T, Humble W, Lucarelli E, Onofrillo C, Choong PF, Di Bella C, Duchi S. Feasibility and barriers to rapid establishment of patient-derived primary osteosarcoma cell lines in clinical management. iScience 2024; 27:110251. [PMID: 39286504 PMCID: PMC11403063 DOI: 10.1016/j.isci.2024.110251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024] Open
Abstract
Osteosarcoma is a highly aggressive primary bone tumor that has seen little improvement in survival rates in the past three decades. Preclinical studies are conducted on a small pool of commercial cell lines which may not fully reflect the genetic heterogeneity of this complex cancer, potentially hindering translatability of in vitro results. Developing a single-site laboratory protocol to rapidly establish patient-derived primary cancer cell lines (PCCL) within a clinically actionable time frame of a few weeks will have significant scientific and clinical ramifications. These PCCL can widen the pool of available cell lines for study while patient-specific data could derive therapeutic correlation. This endeavor is exceedingly challenging considering the proposed time constraints. By proposing key definitions and a clear theoretical framework, this evaluation of osteosarcoma cell line establishment methodology over the past three decades assesses feasibility by identifying barriers and suggesting solutions, thereby facilitating systematic experimentation and optimization.
Affiliation(s)
- Thomas Chow
- Melbourne Medical School, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- William Humble
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Surgery, The University of Melbourne, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Enrico Lucarelli
- Osteoncology, Bone and Soft Tissue Sarcomas and Innovative Therapies Unit, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136 Bologna, Italy
- Carmine Onofrillo
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Surgery, The University of Melbourne, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Peter F Choong
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Surgery, The University of Melbourne, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Claudia Di Bella
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Surgery, The University of Melbourne, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Orthopaedics, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Serena Duchi
- BioFab3D-ACMD, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
- Department of Surgery, The University of Melbourne, St Vincent's Hospital Melbourne, Fitzroy, VIC, Australia
4
Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
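To make the idea of a query that combines location and functional restrictions concrete, here is a minimal in-memory sketch. It does not use the BioGateway schema or its query interface; the `Region` class, the record layout, the identifiers, and the placeholder annotation labels are all hypothetical. It selects enhancers that overlap a given domain and whose target gene carries a given functional annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a, b):
    """True when two regions share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Hypothetical toy records: enhancer id -> (genomic region, target gene).
enhancers = {
    "enh1": (Region("chr1", 100, 200), "GENE_A"),
    "enh2": (Region("chr1", 900, 950), "GENE_B"),
    "enh3": (Region("chr2", 150, 180), "GENE_A"),
}

# Hypothetical functional annotations: gene -> set of placeholder term labels.
gene_annotations = {
    "GENE_A": {"GO:EXAMPLE_1"},
    "GENE_B": {"GO:EXAMPLE_2"},
}


def enhancers_in_region(region, go_term):
    """Location restriction (overlap with region) combined with a functional
    restriction (target gene annotated with go_term)."""
    return sorted(
        enh_id
        for enh_id, (enh_region, gene) in enhancers.items()
        if overlaps(enh_region, region)
        and go_term in gene_annotations.get(gene, set())
    )


tad = Region("chr1", 50, 500)  # hypothetical domain coordinates
hits = enhancers_in_region(tad, "GO:EXAMPLE_1")
print(hits)
```

In a knowledge graph the same two restrictions would be expressed as graph patterns rather than Python filters; the sketch only shows how the location and functional conditions compose.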
Affiliation(s)
- Juan Mulero-Hernández
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
5
Cavalleri E, Cabri A, Soto-Gomez M, Bonfitto S, Perlasca P, Gliozzo J, Callahan TJ, Reese J, Robinson PN, Casiraghi E, Valentini G, Mesiti M. An ontology-based knowledge graph for representing interactions involving RNA molecules. Sci Data 2024; 11:906. [PMID: 39174566 PMCID: PMC11341713 DOI: 10.1038/s41597-024-03673-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 07/23/2024] [Indexed: 08/24/2024] Open
Abstract
The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried by a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the "RNA world". RNA-KG can be both directly explored and visualized, and/or analyzed by applying computational methods to infer bio-medical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the bio-medical problem to be studied.
Affiliation(s)
- Emanuele Cavalleri
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Alberto Cabri
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Mauricio Soto-Gomez
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Sara Bonfitto
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Paolo Perlasca
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Jessica Gliozzo
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Justin Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
- Berlin Institute of Health - Charité, Universitätsmedizin, Berlin, 13353, Germany
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Elena Casiraghi
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Giorgio Valentini
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Marco Mesiti
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy.
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
6
Martinez K, Agirre J, Akune Y, Aoki-Kinoshita KF, Arighi C, Axelsen KB, Bolton E, Bordeleau E, Edwards NJ, Fadda E, Feizi T, Hayes C, Ives CM, Joshi HJ, Krishna Prasad K, Kossida S, Lisacek F, Liu Y, Lütteke T, Ma J, Malik A, Martin M, Mehta AY, Neelamegham S, Panneerselvam K, Ranzinger R, Ricard-Blum S, Sanou G, Shanker V, Thomas PD, Tiemeyer M, Urban J, Vita R, Vora J, Yamamoto Y, Mazumder R. Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. Database (Oxford) 2024; 2024:baae073. [PMID: 39137905 PMCID: PMC11321244 DOI: 10.1093/database/baae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/24/2024] [Accepted: 07/10/2024] [Indexed: 08/15/2024]
Abstract
Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). To gain a better understanding of organisms, there is a need to capture the functional impact of glycans and glycosylation on biological processes. A workshop titled "Functional impact of glycans and their curation" was held in conjunction with the 16th Annual International Biocuration Conference to discuss ongoing worldwide activities related to glycan function curation. This workshop brought together subject matter experts, tool developers, and biocurators from over 20 projects and bioinformatics resources. Participants discussed four key topics for each of their resources: (i) how they curate glycan function-related data from publications and other sources, (ii) what type of data they would like to acquire, (iii) what data they currently have, and (iv) what standards they use. Their answers contributed input that provided a comprehensive overview of state-of-the-art glycan function curation and annotations. This report summarizes the outcome of discussions, including potential solutions and areas where curators, data wranglers, and text mining experts can collaborate to address current gaps in glycan and glycosylation annotations, leveraging each other's work to improve their respective resources and encourage impactful data sharing among resources. Database URL: https://wiki.glygen.org/Glycan_Function_Workshop_2023.
Affiliation(s)
- Karina Martinez
- Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Jon Agirre
- York Structural Biology Laboratory, Department of Chemistry, University of York, Wentworth Way, York YO10 5DD, United Kingdom
- Yukie Akune
- The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Kiyoko F Aoki-Kinoshita
- Glycan and Life Systems Integration Center (GaLSIC), Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
- Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Kristian B Axelsen
- Swiss-Prot Group, Swiss Institute of Bioinformatics (SIB), CMU, 1 rue Michel Servet, Geneva 4 1211, Switzerland
- Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, United States
- Emily Bordeleau
- Michael Smith Laboratories, The University of British Columbia, 2185 East Mall, Vancouver, British Columbia V6T 1Z4, Canada
- Nathan J Edwards
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, 2115 Wisconsin Ave NW, Washington, DC 20007, United States
- Elisa Fadda
- Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Ten Feizi
- The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Catherine Hayes
- Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Callum M Ives
- Department of Chemistry and Hamilton Institute, Maynooth University, Kilcock Road, Maynooth, Co. Kildare W23 AH3Y, Ireland
- Hiren J Joshi
- Copenhagen Center for Glycomics, Department of Cellular and Molecular Medicine, Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3, Copenhagen DK-2200, Denmark
- Khakurel Krishna Prasad
- ELI Beamlines Facility, The Extreme Light Infrastructure ERIC, Za Radnicí 835, Dolní Břežany 25241, Czech Republic
- Sofia Kossida
- IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Frederique Lisacek
- Proteome Informatics Group, Swiss Institute of Bioinformatics (SIB), route de Drize 7, Geneva CH-1227, Switzerland
- Yan Liu
- The Glycosciences Laboratory, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, United Kingdom
- Thomas Lütteke
- Institute of Veterinary Physiology and Biochemistry, Justus-Liebig-University Gießen, Frankfurter Str. 100, Gießen 35392, Germany
- Junfeng Ma
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3900 Reservoir Road NW, Washington, DC 20007, United States
- Adnan Malik
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Akul Y Mehta
- Department of Surgery, Beth Israel Deaconess Medical Center, National Center for Functional Glycomics, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Sriram Neelamegham
- Departments of Chemical & Biological Engineering, Biomedical Engineering and Medicine, University at Buffalo, State University of New York, 906 Furnas Hall, Buffalo, NY 14260, United States
- Kalpana Panneerselvam
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- René Ranzinger
- Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- Sylvie Ricard-Blum
- Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, University Lyon 1, CNRS, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex F-69622, France
- Gaoussou Sanou
- IMGT, The International ImMunoGeneTics Information System, National Center for Scientific Research (CNRS), Institute of Human Genetics (IGH), University of Montpellier (UM), 141 rue de la Cardonille, Montpellier 34 090, France
- Vijay Shanker
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, United States
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, 2001 N Soto Street, Los Angeles, CA 90032, United States
- Michael Tiemeyer
- Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Rd, Athens, GA 30602, United States
- James Urban
- Department of Chemistry and Molecular Biology, University of Gothenburg, Medicinaregatan 7 B, Gothenburg 41390, Sweden
- Randi Vita
- Immune Epitope Database and Analysis Project, La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037, United States
- Jeet Vora
- Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
- Yasunori Yamamoto
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
- Raja Mazumder
- Department of Biochemistry & Molecular Medicine, The George Washington University School of Medicine and Health Sciences, 2300 I St. NW, Washington, DC 20052, United States
7
Faria D, Eugénio P, Contreiras Silva M, Balbi L, Bedran G, Kallor AA, Nunes S, Palkowski A, Waleron M, Alfaro JA, Pesquita C. The Immunopeptidomics Ontology (ImPO). Database (Oxford) 2024; 2024:baae014. [PMID: 38857186 PMCID: PMC11164101 DOI: 10.1093/database/baae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 11/30/2023] [Accepted: 02/22/2024] [Indexed: 06/12/2024]
Abstract
The adaptive immune response plays a vital role in eliminating infected and aberrant cells from the body. This process hinges on the presentation of short peptides by major histocompatibility complex Class I molecules on the cell surface. Immunopeptidomics, the study of peptides displayed on cells, delves into the wide variety of these peptides. Understanding the mechanisms behind antigen processing and presentation is crucial for effectively evaluating cancer immunotherapies. As an emerging domain, immunopeptidomics currently lacks standardization (there is neither an established terminology nor formally defined semantics), a critical concern considering the complexity, heterogeneity, and growing volume of data involved in immunopeptidomics studies. Additionally, there is a disconnection between how the proteomics community delivers the information about antigen presentation and its uptake by the clinical genomics community. Considering the significant relevance of immunopeptidomics in cancer, this shortcoming must be addressed to bridge the gap between research and clinical practice. In this work, we detail the development of the ImmunoPeptidomics Ontology, ImPO, the first effort at standardizing the terminology and semantics in the domain. ImPO aims to encapsulate and systematize data generated by immunopeptidomics experimental processes and bioinformatics analysis. ImPO establishes cross-references to 24 relevant ontologies, including the National Cancer Institute Thesaurus, Mondo Disease Ontology, Logical Observation Identifier Names and Codes and Experimental Factor Ontology. Although ImPO was developed using expert knowledge to characterize a large and representative data collection, it may be readily used to encode other datasets within the domain. Ultimately, ImPO facilitates data integration and analysis, enabling querying, inference and knowledge generation and importantly bridging the gap between the clinical proteomics and genomics communities.
As the field of immunogenomics uses protein-level immunopeptidomics data, we expect ImPO to play a key role in supporting a rich and standardized description of the large-scale data that emerging high-throughput technologies are expected to bring in the near future. Ontology URL: https://zenodo.org/record/10237571 Project GitHub: https://github.com/liseda-lab/ImPO/blob/main/ImPO.owl.
Affiliation(s)
- Daniel Faria
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, Lisboa 1000-029, Portugal
- Patrícia Eugénio
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Marta Contreiras Silva
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Laura Balbi
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Georges Bedran
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Ashwin Adrian Kallor
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Susana Nunes
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Aleksander Palkowski
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Michal Waleron
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Javier A Alfaro
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
  - Department of Biochemistry and Microbiology, University of Victoria, 3800 Finnerty Rd, Victoria, British Columbia, BC V8P 5C2, Canada
  - Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, UK
  - The Canadian Association for Responsible AI in Medicine, Victoria, Canada
- Catia Pesquita
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
8
Vidović D, Waller A, Holmes J, Sklar LA, Schürer SC. Best practices for managing and disseminating resources and outreach and evaluating the impact of the IDG Consortium. Drug Discov Today 2024; 29:103953. [PMID: 38508231 PMCID: PMC11335350 DOI: 10.1016/j.drudis.2024.103953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 03/08/2024] [Accepted: 03/14/2024] [Indexed: 03/22/2024]
Abstract
The Illuminating the Druggable Genome (IDG) consortium generated reagents, biological model systems, data, informatic databases, and computational tools. The Resource Dissemination and Outreach Center (RDOC) played a central administrative role, organized internal meetings, fostered collaboration, and coordinated consortium-wide efforts. The RDOC developed and deployed a Resource Management System (RMS) to enable efficient workflows for collecting, accessing, validating, registering, and publishing resource metadata. IDG policies for repositories and standardized representations of resources were established, adopting the FAIR (findable, accessible, interoperable, reusable) principles. The RDOC also developed metrics of IDG impact. Outreach initiatives included digital content, the Protein Illumination Timeline (representing milestones in generating data and reagents), the Target Watch publication series, the e-IDG Symposium series, and leveraging social media platforms.
Affiliation(s)
- Dušica Vidović
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, USA
- Anna Waller
  - Department of Pathology, Health Sciences Center, University of New Mexico, Albuquerque, NM, USA
- Jayme Holmes
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Larry A Sklar
  - Department of Pathology, Health Sciences Center, University of New Mexico, Albuquerque, NM, USA; Autophagy, Inflammation, & Metabolism (AIM) Center, University of New Mexico, Albuquerque, NM, USA
- Stephan C Schürer
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, USA; Frost Institute for Data Science & Computing, University of Miami, Miami, FL, USA.
9
Claussnitzer M, Parikh VN, Wagner AH, Arbesfeld JA, Bult CJ, Firth HV, Muffley LA, Nguyen Ba AN, Riehle K, Roth FP, Tabet D, Bolognesi B, Glazer AM, Rubin AF. Minimum information and guidelines for reporting a multiplexed assay of variant effect. Genome Biol 2024; 25:100. [PMID: 38641812 PMCID: PMC11027375 DOI: 10.1186/s13059-024-03223-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 03/25/2024] [Indexed: 04/21/2024] Open
Abstract
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Affiliation(s)
- Melina Claussnitzer
  - The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, 02142, USA
- Victoria N Parikh
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Alex H Wagner
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, 43210, USA
- Jeremy A Arbesfeld
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Carol J Bult
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Helen V Firth
  - Wellcome Sanger Institute, Hinxton, Cambridge, UK
  - Dept of Medical Genetics, Cambridge University Hospitals NHS Trust, Cambridge, UK
- Lara A Muffley
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Alex N Nguyen Ba
  - Department of Biology, University of Toronto at Mississauga, Mississauga, ON, Canada
- Kevin Riehle
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Frederick P Roth
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Daniel Tabet
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Benedetta Bolognesi
  - Institute for Bioengineering of Catalunya (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Andrew M Glazer
  - Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Alan F Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
  - Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia.
10
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024] Open
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
11
Caballero-Oteyza A, Crisponi L, Peng XP, Yauy K, Volpi S, Giardino S, Freeman AF, Grimbacher B, Proietti M. GenIA, the Genetic Immunology Advisor database for inborn errors of immunity. J Allergy Clin Immunol 2024; 153:831-843. [PMID: 38040041 DOI: 10.1016/j.jaci.2023.11.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/23/2023] [Accepted: 11/15/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND To date, no publicly accessible platform has captured and synthesized all of the layered dimensions of genotypic, phenotypic, and mechanistic information published in the field of inborn errors of immunity (IEIs). Such a platform would represent the extensive and complex landscape of IEIs and could increase the rate of diagnosis in patients with a suspected IEI, which remains unacceptably low. OBJECTIVE Our aim was to create an expertly curated, patient-centered, multidimensional IEI database that enables aggregation and sophisticated data interrogation and promotes involvement from diverse stakeholders across the community. METHODS The database structure was designed following a subject-centered model and written in Structured Query Language (SQL). The web application is written in Hypertext Preprocessor (PHP), Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. All data stored in the Genetic Immunology Advisor (GenIA) are extracted by manually reviewing published research articles. RESULTS We completed data collection and curation for 24 pilot genes. Using these data, we have exemplified how GenIA can provide quick access to structured, longitudinal, more thorough, comprehensive, and up-to-date IEI knowledge than do currently existing databases, such as ClinGen, Human Phenotype Ontology (HPO), ClinVar, or Online Mendelian Inheritance in Man (OMIM), with which GenIA intends to dovetail. CONCLUSIONS GenIA strives to accurately capture the extensive genetic, mechanistic, and phenotypic heterogeneity found across IEIs, as well as genetic paradigms and diagnostic pitfalls associated with individual genes and conditions. The IEI community's involvement will help promote GenIA as an enduring resource that supports and improves knowledge sharing, research, diagnosis, and care for patients with genetic immune disease.
Affiliation(s)
- Andrés Caballero-Oteyza
  - Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany; Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany.
- Laura Crisponi
  - Institute for Genetic and Biomedical Research, The National Research Council, Monserrato, Cagliari, Italy
- Xiao P Peng
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Md
- Kevin Yauy
  - University of Montpellier, LIRMM, CNRS, Reference Center for Congenital Anomalies, Clinical Genetic Unit, Montpellier University Hospital Center, Montpellier, France
- Stefano Volpi
  - Center for Autoinflammatory Diseases and Immunodeficiencies, Pediatric Rheumatology Clinic, IRCCS Istituto Giannina Gaslini, Genova, and DINOGMI, Università degli Studi di Genova, Genova, Italy
- Stefano Giardino
  - Hematopoietic Stem Cell Transplantation Unit, IRCCS Istituto Giannina Gaslini, Genova, Italy
- Alexandra F Freeman
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Md
- Bodo Grimbacher
  - Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany; Clinic of Rheumatology and Clinical Immunology, Center for Chronic Immunodeficiency, Medical Center, Faculty of Medicine, Albert-Ludwigs University of Freiburg, Freiburg, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Satellite Center Freiburg, Freiburg, Germany; German Center for Infection Research, Satellite Center Freiburg, Freiburg, Germany; Centre for Integrative Biological Signalling Studies, Albert-Ludwigs University of Freiburg, Freiburg, Germany
- Michele Proietti
  - Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany; Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany.
12
Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024; 150:104588. [PMID: 38244957 PMCID: PMC11771130 DOI: 10.1016/j.jbi.2024.104588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]
Affiliation(s)
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois Urbana Champaign, Champaign, IL, USA.
- Faezeh Ensan
  - Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada.
- Bridget McInnes
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
- Lucy Lu Wang
  - Information School, University of Washington, Seattle, WA, USA.
13
Taing L, Dandawate A, L’Yi S, Gehlenborg N, Brown M, Meyer C. Cistrome Data Browser: integrated search, analysis and visualization of chromatin data. Nucleic Acids Res 2024; 52:D61-D66. [PMID: 37971305 PMCID: PMC10767960 DOI: 10.1093/nar/gkad1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/14/2023] [Accepted: 11/02/2023] [Indexed: 11/19/2023] Open
Abstract
The Cistrome Data Browser is a resource of ChIP-seq, ATAC-seq and DNase-seq data from humans and mice. It provides maps of the genome-wide locations of transcription factors, cofactors, chromatin remodelers, histone post-translational modifications and regions of chromatin accessible to endonuclease activity. Cistrome DB v3.0 contains approximately 45 000 human and 44 000 mouse samples with about 32 000 newly collected datasets compared to the previous release. The Cistrome DB v3.0 user interface is implemented as a single page application that unifies menu driven and data driven search functions and provides an embedded genome browser, which allows users to find and visualize data more effectively. Users can find informative chromatin profiles through keyword, menu, and data-driven search tools. Browser search functions can predict the regulators of query genes as well as the cell type and factor dependent functionality of potential cis-regulatory elements. Cistrome DB v3.0 expands the display of quality control statistics, incorporates sequence logos into motif enrichment displays and includes more expansive sample metadata. Cistrome DB v3.0 is available at http://db3.cistrome.org/browser.
Affiliation(s)
- Len Taing
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
- Ariaki Dandawate
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Sehi L’Yi
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Nils Gehlenborg
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Myles Brown
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA
- Clifford A Meyer
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
14
Xu HQ, Xiao H, Bu JH, Hong YF, Liu YH, Tao ZY, Ding SF, Xia YT, Wu E, Yan Z, Zhang W, Chen GX, Zhu F, Tao L. EMNPD: a comprehensive endophytic microorganism natural products database for prompt the discovery of new bioactive substances. J Cheminform 2023; 15:115. [PMID: 38017550 PMCID: PMC10683116 DOI: 10.1186/s13321-023-00779-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
The discovery and utilization of natural products derived from endophytic microorganisms have garnered significant attention in pharmaceutical research. While remarkable progress has been made in this field each year, the absence of dedicated open-access databases for endophytic microorganism natural products research is evident. To address the increasing demand for mining and sharing of data resources related to endophytic microorganism natural products, this study introduces EMNPD, a comprehensive endophytic microorganism natural products database comprising manually curated data. Currently, EMNPD offers 6632 natural products from 1017 endophytic microorganisms, targeting 1286 entities (including 94 proteins, 282 cell lines, and 910 species) with 91 diverse bioactivities. It encompasses the physico-chemical properties of natural products, ADMET information, quantitative activity data with their potency, natural products contents with diverse fermentation conditions, systematic taxonomy, and links to various well-established databases. EMNPD aims to function as an open-access knowledge repository for the study of endophytic microorganisms and their natural products, thereby facilitating drug discovery research and exploration of bioactive substances. The database can be accessed at http://emnpd.idrblab.cn/ without the need for registration, enabling researchers to freely download the data. EMNPD is expected to become a valuable resource in the field of endophytic microorganism natural products and contribute to future drug development endeavors.
Affiliation(s)
- Hong-Quan Xu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Huan Xiao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Jin-Hui Bu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Zi-Yue Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Shu-Fan Ding
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yi-Tong Xia
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- E Wu
  - Rehabilitation and Nursing School, Hangzhou Vocational & Technical College, Hangzhou, 310018, Zhejiang, China
- Zhen Yan
  - The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 310000, China
  - First Clinical Medical Institute, Nanjing University of Chinese Medicine, Nanjing, 210023, Jiangsu, China
- Wei Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
  - Innovation Institute for Affiliated Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Gong-Xing Chen
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
  - Innovation Institute for Affiliated Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
- Lin Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
15
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 98] [Impact Index Per Article: 49.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
  - School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
  - Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
  - ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
  - Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
  - Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
  - School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
  - Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
  - Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | | | - Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
| | - Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
| | - Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
| | - Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | - Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
| | - Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | | | - Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
| | - Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
| | | | - Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
| | - Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
| | - Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
| | - Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
| | - Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
| | - Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Department of Pharmacy, Saarland University, Saarbrücken, Germany.
- German Center for infection research (DZIF), Braunschweig, Germany.
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany.
| | - Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Institute of Biology, Leiden University, Leiden, The Netherlands.
|
16
|
Huang Q, Szklarczyk D, Wang M, Simonovic M, von Mering C. PaxDb 5.0: Curated Protein Quantification Data Suggests Adaptive Proteome Changes in Yeasts. Mol Cell Proteomics 2023; 22:100640. [PMID: 37659604 PMCID: PMC10551891 DOI: 10.1016/j.mcpro.2023.100640] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 06/16/2023] [Revised: 08/25/2023] [Accepted: 08/30/2023] [Indexed: 09/04/2023] Open
Abstract
The "Protein Abundances Across Organisms" database (PaxDb) is an integrative metaresource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts and expresses abundance values for each protein in "parts per million" in relation to all other protein molecules in the cell. The uniform data reprocessing, quality scoring, and integrated orthology relations have made PaxDb one of the preferred tools for comparisons between individual datasets, tissues, or organisms. In describing the latest version 5.0 of PaxDb, we particularly emphasize the data integration from various types of raw data and how we expanded the number of organisms and tissue groups as well as the proteome coverage. The current collection of PaxDb includes 831 original datasets from 170 species, including 22 Archaea, 81 Bacteria, and 67 Eukaryota. Apart from detailing the data update, we also present a comparative analysis of the human proteome subset of PaxDb against the two most widely used human proteome data resources: Human Protein Atlas and Genotype-Tissue Expression. Lastly, through our protein abundance data, we reveal an evolutionary trend in the usage of sulfur-containing amino acids in the proteomes of Fungi.
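The "parts per million" scale mentioned in the abstract can be illustrated with a minimal sketch. The function name `to_ppm` and the input values are assumptions made up for illustration; this is not PaxDb's actual processing pipeline, only the basic rescaling idea of expressing each protein's abundance relative to all protein molecules in the sample.

```python
def to_ppm(raw_abundances):
    """Rescale raw abundance values so that they sum to 1,000,000 (ppm)."""
    total = sum(raw_abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {
        protein: value / total * 1_000_000
        for protein, value in raw_abundances.items()
    }


# Hypothetical raw abundances for three placeholder proteins.
example = {"P1": 50.0, "P2": 30.0, "P3": 20.0}
ppm = to_ppm(example)
```

Because every dataset is brought onto the same relative scale, values from different experiments become comparable in principle, which is the property the abstract highlights.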
Affiliation(s)
- Qingyao Huang
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Damian Szklarczyk
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Mingcong Wang
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Milan Simonovic
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland.
|
17
|
Costanzo MC, von Grotthuss M, Massung J, Jang D, Caulkins L, Koesterer R, Gilbert C, Welch RP, Kudtarkar P, Hoang Q, Boughton AP, Singh P, Sun Y, Duby M, Moriondo A, Nguyen T, Smadbeck P, Alexander BR, Brandes M, Carmichael M, Dornbos P, Green T, Huellas-Bruskiewicz KC, Ji Y, Kluge A, McMahon AC, Mercader JM, Ruebenacker O, Sengupta S, Spalding D, Taliun D, Smith P, Thomas MK, Akolkar B, Brosnan MJ, Cherkas A, Chu AY, Fauman EB, Fox CS, Kamphaus TN, Miller MR, Nguyen L, Parsa A, Reilly DF, Ruetten H, Wholley D, Zaghloul NA, Abecasis GR, Altshuler D, Keane TM, McCarthy MI, Gaulton KJ, Florez JC, Boehnke M, Burtt NP, Flannick J. The Type 2 Diabetes Knowledge Portal: An open access genetic resource dedicated to type 2 diabetes and related traits. Cell Metab 2023; 35:695-710.e6. [PMID: 36963395 PMCID: PMC10231654 DOI: 10.1016/j.cmet.2023.03.001] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 02/03/2022] [Revised: 10/23/2022] [Accepted: 02/28/2023] [Indexed: 03/26/2023]
Abstract
Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results.
Affiliation(s)
- Maria C Costanzo
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Marcin von Grotthuss
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Jeffrey Massung
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Dongkeun Jang
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Lizz Caulkins
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Ryan Koesterer
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Clint Gilbert
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Ryan P Welch
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Parul Kudtarkar
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92161, USA
| | - Quy Hoang
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Andrew P Boughton
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Preeti Singh
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Ying Sun
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92161, USA
| | - Marc Duby
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Annie Moriondo
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Trang Nguyen
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Patrick Smadbeck
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Benjamin R Alexander
- Simulation and Modeling Sciences, Pfizer Worldwide Research, Development and Medical, Cambridge, MA 02139, USA
| | - MacKenzie Brandes
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Mary Carmichael
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Peter Dornbos
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA; Department of Pediatrics, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
| | - Todd Green
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Kenneth C Huellas-Bruskiewicz
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Yue Ji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Alexandria Kluge
- Genomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Aoife C McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Josep M Mercader
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Oliver Ruebenacker
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Sebanti Sengupta
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Dylan Spalding
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Daniel Taliun
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Philip Smith
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
| | - Melissa K Thomas
- Tailored Therapeutics-Diabetes, Eli Lilly and Company, Lilly Corporate Center DC 0545, Indianapolis, IN 46285, USA
| | - Beena Akolkar
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
| | - M Julia Brosnan
- Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA 02139, USA
| | - Andriy Cherkas
- Team Early Projects Type 1 Diabetes, Therapeutic Area Diabetes and Cardiovascular Medicine, Research & Development, Sanofi, Industriepark Höchst-H831, Frankfurt am Main 65926, Germany
| | - Audrey Y Chu
- Merck Research Laboratories, Boston, MA 02115, USA
| | - Eric B Fauman
- Integrative Biology, Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA 02139, USA
| | | | | | - Melissa R Miller
- Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA 02139, USA
| | - Lynette Nguyen
- Foundation for the National Institutes of Health, North Bethesda, MD 20852, USA
| | - Afshin Parsa
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
| | | | - Hartmut Ruetten
- CardioMetabolism & Respiratory Medicine, Boehringer Ingelheim International GmbH, 55216 Ingelheim/Rhein, Germany
| | - David Wholley
- Foundation for the National Institutes of Health, North Bethesda, MD 20852, USA
| | - Norann A Zaghloul
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
| | - Gonçalo R Abecasis
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA
| | - David Altshuler
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA
| | - Thomas M Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Mark I McCarthy
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 9DU, UK; Oxford Centre for Diabetes Endocrinology & Metabolism, University of Oxford, Oxford OX3 7BN, UK
| | - Kyle J Gaulton
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92161, USA
| | - Jose C Florez
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Michael Boehnke
- Department of Biostatistics and The Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Noël P Burtt
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA.
| | - Jason Flannick
- Programs in Metabolism and Medical & Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA 02132, USA; Department of Pediatrics, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA.
|
18
|
Taneja SB, Callahan TJ, Paine MF, Kane-Gill SL, Kilicoglu H, Joachimiak MP, Boyce RD. Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions. J Biomed Inform 2023; 140:104341. [PMID: 36933632 PMCID: PMC10150409 DOI: 10.1016/j.jbi.2023.104341] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/24/2022] [Revised: 01/09/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
BACKGROUND Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
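The idea of searching a graph of subject-relation-object triples for a mechanistic path between a natural product and a drug can be sketched as follows. This is only an illustration of a path search over predications: the triples, node names, and function names are hypothetical placeholders, not content or code from NP-KG, and a plain breadth-first search stands in for whatever search strategy the authors used.

```python
from collections import deque

# Hypothetical subject-relation-object triples (placeholders, not from NP-KG).
TRIPLES = [
    ("natural_product_A", "inhibits", "enzyme_X"),
    ("enzyme_X", "metabolizes", "drug_B"),
    ("natural_product_A", "interacts_with", "transporter_Y"),
    ("transporter_Y", "transports", "drug_C"),
]


def build_graph(triples):
    """Index directed edges by subject: subject -> [(relation, object), ...]."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of (subject, relation, object)
    edges from start to goal, or None if no directed path exists."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


graph = build_graph(TRIPLES)
path = shortest_path(graph, "natural_product_A", "drug_B")
```

In this toy graph the returned path runs through `enzyme_X`, which is the kind of intermediate node (an enzyme or transporter) that a mechanistic explanation would name.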
Affiliation(s)
- Sanya B Taneja
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15206, USA.
| | - Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Mary F Paine
- Department of Pharmaceutical Sciences, College of Pharmacy and Pharmaceutical Sciences, Washington State University, Spokane, WA 99202, USA
| | | | - Halil Kilicoglu
- School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
| | - Marcin P Joachimiak
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Richard D Boyce
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
|
19
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 1245] [Impact Index Per Article: 415.0] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
|
20
|
Pavel A, Saarimäki LA, Möbus L, Federico A, Serra A, Greco D. The potential of a data centred approach & knowledge graph data representation in chemical safety and drug design. Comput Struct Biotechnol J 2022; 20:4837-4849. [PMID: 36147662 PMCID: PMC9464643 DOI: 10.1016/j.csbj.2022.08.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 08/26/2022] [Accepted: 08/26/2022] [Indexed: 11/20/2022] Open
Abstract
Big Data pervades nearly all areas of life sciences, yet the analysis of large integrated data sets remains a major challenge. Moreover, the field of life sciences is highly fragmented and, consequently, so are its data, knowledge, and standards. This, in turn, makes integrated data analysis and knowledge gathering across sub-fields a demanding task. At the same time, the integration of various research angles and data types is crucial for modelling the complexity of organisms and biological processes in a holistic manner. This is especially true in the context of drug development and chemical safety assessment, where computational methods can address the urgent need for fast, effective, and sustainable approaches. Such computational methods, in turn, require the development of methodologies suitable for an integrated and data centred Big Data view. Here we discuss Knowledge Graphs (KGs) as a solution to a data centred analysis approach for drug and chemical development and safety assessment. KGs are knowledge bases, data analysis engines, and knowledge discovery systems all in one, which allows them to be used for tasks ranging from simple data retrieval, through meta-analysis, to complex prediction and knowledge discovery. Therefore, KGs have immense potential to advance the data centred approach and the re-usability and informativity of data. Furthermore, they can improve the power of analysis and the complexity of modelled processes, all while providing knowledge in a natively human-understandable network data model.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
| | - Laura A Saarimäki
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
| | - Lena Möbus
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland
|
21
|
He P, Zhang C, Ji Y, Ge MK, Yu Y, Zhang N, Yang S, Yu JX, Shen SM, Chen GQ. Epithelial cells-enriched lncRNA SNHG8 regulates chromatin condensation by binding to Histone H1s. Cell Death Differ 2022; 29:1569-1581. [PMID: 35140358 PMCID: PMC9345976 DOI: 10.1038/s41418-022-00944-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2021] [Revised: 01/17/2022] [Accepted: 01/17/2022] [Indexed: 12/12/2022] Open
Abstract
Linker histone H1 proteins have many variants in mammals and can stabilize the condensed state of chromatin by binding to nucleosomes and promoting a less accessible DNA structure. However, it is poorly understood how the binding of histone H1s to chromatin DNA is regulated. Having screened a collection of epithelial cell-enriched long non-coding RNAs (lncRNAs), we found that small nucleolar RNA host gene 8 (SNHG8) is a chromatin-localized lncRNA that interacts strongly with histone H1 variants and undergoes phase separation with them. Moreover, SNHG8 binds H1s more strongly than linker DNA does and outcompetes linker DNA for H1 binding. Consequently, loss of SNHG8 increases the amount of H1s bound to chromatin, promotes chromatin condensation, and induces an epithelial differentiation-associated gene expression pattern. Collectively, our results suggest that the highly abundant SNHG8 in epithelial cells keeps histone H1 variants away from nucleosomes and that its loss contributes to epithelial cell differentiation.
Affiliation(s)
- Ping He
- State Key Laboratory of Oncogenes and Related Genes, and Chinese Academy of Medical Sciences Research Unit (NO.2019RU043), Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200127, China
| | - Cheng Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China
| | - Yan Ji
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Meng-Kai Ge
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China
| | - Yun Yu
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China
| | - Na Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China
| | - Shuo Yang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China
| | - Jian-Xiu Yu
- Department of Biochemistry and Molecular Cell Biology, Shanghai Key Laboratory of Tumor Microenvironment and Inflammation, SJTU-SM, Shanghai, 200025, China
| | - Shao-Ming Shen
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China.
| | - Guo-Qiang Chen
- State Key Laboratory of Oncogenes and Related Genes, and Chinese Academy of Medical Sciences Research Unit (NO.2019RU043), Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM), Shanghai, 200127, China.,Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, SJTU-SM, Shanghai, 200025, China.
| |
|
22
|
Figueiredo RQ, Del Ser SD, Raschka T, Hofmann-Apitius M, Kodamullil AT, Mubeen S, Domingo-Fernández D. Elucidating gene expression patterns across multiple biological contexts through a large-scale investigation of transcriptomic datasets. BMC Bioinformatics 2022; 23:231. [PMID: 35705903 PMCID: PMC9202106 DOI: 10.1186/s12859-022-04765-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 06/03/2022] [Indexed: 11/10/2022] Open
Abstract
Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Though some biological processes commonly occur across contexts, by harnessing the vast amounts of available gene expression data, we can decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used for the construction of co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique and common across these subcontexts. First, we focused on patterns at the level of individual nodes and evaluated their functional roles using a human protein-protein interactome as a reference network. Next, within each context, we systematically overlaid the co-expression networks to identify specific and shared correlations as well as relations already described in the scientific literature. Additionally, in a pathway-level analysis, we overlaid node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/, respectively, and developed ContNeXt (https://contnext.scai.fraunhofer.de/), a web application to explore the networks generated in this work.
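The step of keeping only the strongest pairwise gene correlations as network edges can be illustrated with a minimal sketch. It assumes plain Pearson correlation and a hypothetical `top_fraction` cutoff; neither the correlation measure nor the threshold is stated in the abstract, and the function name `coexpression_edges` and the toy data are invented for illustration, not taken from the ContNeXt scripts.

```python
import numpy as np

def coexpression_edges(expr, genes, top_fraction=0.01):
    """Return the strongest pairwise gene-gene correlations as weighted edges.

    expr: array of shape (n_genes, n_samples); genes: list of gene names.
    top_fraction is a hypothetical cutoff, not a value from the paper.
    """
    corr = np.corrcoef(expr)                    # Pearson correlation between rows (genes)
    i, j = np.triu_indices(len(genes), k=1)     # visit each unordered gene pair once
    strength = np.abs(corr[i, j])               # rank pairs by absolute correlation
    n_keep = max(1, int(round(top_fraction * strength.size)))
    keep = np.argsort(strength)[::-1][:n_keep]  # indices of the strongest pairs
    return [(genes[i[k]], genes[j[k]], float(corr[i[k], j[k]])) for k in keep]

# Toy expression matrix: G1 and G2 are nearly identical, G3 and G4 are noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
expr = np.vstack([
    base,
    base + 0.01 * rng.normal(size=50),
    rng.normal(size=50),
    rng.normal(size=50),
])
edges = coexpression_edges(expr, ["G1", "G2", "G3", "G4"], top_fraction=0.2)
print(edges)
```

With four genes there are six candidate pairs, so a cutoff of 0.2 keeps a single edge. Networks built this way for different subcontexts could then be compared as edge sets to find shared and specific correlations.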
Affiliation(s)
- Rebeca Queiroz Figueiredo
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
| | - Sara Díaz Del Ser
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
| | - Tamara Raschka
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany.,Fraunhofer Center for Machine Learning, Sankt Augustin, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
| | - Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany
| | - Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany.,Fraunhofer Center for Machine Learning, Sankt Augustin, Germany
| | - Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757, Sankt Augustin, Germany.,Fraunhofer Center for Machine Learning, Sankt Augustin, Germany.,Enveda Biosciences, Boulder, CO, 80301, USA.
| |
|
23
|
He Y. Development and Applications of Interoperable Biomedical Ontologies for Integrative Data and Knowledge Representation and Multiscale Modeling in Systems Medicine. Methods Mol Biol 2022; 2486:233-244. [PMID: 35437726 DOI: 10.1007/978-1-0716-2265-0_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
Abstract
The data FAIR Guiding Principles state that all data should be Findable, Accessible, Interoperable, and Reusable. Ontology is critical to data integration, sharing, and analysis. Given thousands of ontologies have been developed in the era of artificial intelligence, it is critical to have interoperable ontologies to support standardized data and knowledge presentation and reasoning. For interoperable ontology development, the eXtensible ontology development (XOD) strategy offers four principles including ontology term reuse, semantic alignment, ontology design pattern usage, and community extensibility. Many software programs are available to help implement these principles. As a demonstration, the XOD strategy is applied to developing the interoperable Coronavirus Infectious Disease Ontology (CIDO). Various applications of interoperable ontologies, such as COVID-19 and kidney precision medicine research, are also introduced in this chapter.
Affiliation(s)
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
| |
|
24
|
Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022; 14:1906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022] Open
Abstract
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.
|
25
|
Schröder M, Staehlke S, Groth P, Nebe JB, Spors S, Krüger F. Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation. J Biomed Semantics 2022; 13:4. [PMID: 35101121 PMCID: PMC8802522 DOI: 10.1186/s13326-021-00257-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2021] [Accepted: 12/07/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Electronic Laboratory Notebooks (ELNs) are used to document experiments and investigations in the wet-lab. Protocols in ELNs contain a detailed description of the conducted steps, including the information necessary to understand the procedure and the generated research data as well as to reproduce the research investigation. The purpose of this study is to investigate whether such ELN protocols can be used to create semantic documentation of the provenance of research data by the use of ontologies and linked data methodologies. METHODS Based on an ELN protocol of a biomedical wet-lab experiment, a retrospective provenance model of the generated research data, describing the details of the experiment in a machine-interpretable way, is manually engineered. Furthermore, an automated approach for knowledge acquisition from ELN protocols is derived from these results. This structure-based approach exploits the structure in the experiment's description, such as headings, tables, and links, to translate the ELN protocol into a semantic knowledge representation. To satisfy the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles, a ready-to-publish bundle is created that contains the research data together with their semantic documentation. RESULTS While the manual modelling efforts serve as proof of concept by employing one protocol, the automated structure-based approach demonstrates the potential generalisation with seven ELN protocols. For each of those protocols, a ready-to-publish bundle is created and, by employing the SPARQL query language, it is illustrated that questions about the processes and the obtained research data can be answered. CONCLUSIONS The semantic documentation of research data obtained from the ELN protocols allows for the representation of the retrospective provenance of research data in a machine-interpretable way. Research Object Crate (RO-Crate) bundles including these models enable researchers not only to easily share the research data together with the corresponding documentation, but also to search the experiments and relate them to each other.
Affiliation(s)
- Max Schröder
- Institute of Communications Engineering, University of Rostock, Rostock, Germany
- University Library, University of Rostock, Rostock, Germany
| | - Susanne Staehlke
- Department of Cell Biology, University Medical Center Rostock, Rostock, Germany
| | - Paul Groth
- Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
| | - J. Barbara Nebe
- Department of Cell Biology, University Medical Center Rostock, Rostock, Germany
- Department Life, Light & Matter, University of Rostock, Rostock, Germany
| | - Sascha Spors
- Institute of Communications Engineering, University of Rostock, Rostock, Germany
| | - Frank Krüger
- Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Department Knowledge, Culture & Transformation, University of Rostock, Rostock, Germany
| |
|
26
|
Moreno P, Fexova S, George N, Manning JR, Miao Z, Mohammed S, Muñoz-Pomer A, Fullgrabe A, Bi Y, Bush N, Iqbal H, Kumbham U, Solovyev A, Zhao L, Prakash A, García-Seisdedos D, Kundu D, Wang S, Walzer M, Clarke L, Osumi-Sutherland D, Tello-Ruiz M, Kumari S, Ware D, Eliasova J, Arends M, Nawijn M, Meyer K, Burdett T, Marioni J, Teichmann S, Vizcaíno J, Brazma A, Papatheodorou I. Expression Atlas update: gene and protein expression in multiple species. Nucleic Acids Res 2022; 50:D129-D140. [PMID: 34850121 PMCID: PMC8728300 DOI: 10.1093/nar/gkab1030] [Citation(s) in RCA: 113] [Impact Index Per Article: 37.7] [Received: 09/17/2021] [Revised: 10/11/2021] [Accepted: 11/19/2021] [Indexed: 01/21/2023] Open
Abstract
The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy to visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community developed tools. Each study's metadata are annotated using ontologies. The data are re-analyzed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa.
Affiliation(s)
- Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Jonathan R Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Zhichiao Miao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Suhaib Mohammed
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Alfonso Muñoz-Pomer
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Anja Fullgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Yalan Bi
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Natassja Bush
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Upendra Kumbham
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Andrey Solovyev
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Lingyun Zhao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | | | - Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- USDA ARS NEA, Plant Soil & Nutrition Laboratory Research Unit, Ithaca, NY 14853, USA
| | - Jana Eliasova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Mark J Arends
- Edinburgh Pathology, University of Edinburgh, Institute of Genetics & Cancer, Edinburgh, UK
| | - Martijn C Nawijn
- Department of Pathology and Medical Biology, GRIAC research institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
| | - Kerstin Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Sarah Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| | - Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
| |
|
27
|
Muscolino A, Di Maria A, Rapicavoli RV, Alaimo S, Bellomo L, Billeci F, Borzì S, Ferragina P, Ferro A, Pulvirenti A. NETME: on-the-fly knowledge network construction from biomedical literature. Appl Netw Sci 2022; 7:1. [PMID: 35013714 PMCID: PMC8733431 DOI: 10.1007/s41109-021-00435-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/31/2021] [Accepted: 09/21/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND The rapidly increasing biological literature is a key resource to automatically extract and gain knowledge concerning biological elements and their relations. Knowledge Networks are helpful tools in the context of biological knowledge discovery and modeling. RESULTS We introduce a novel system called NETME, which, starting from a set of full-texts obtained from PubMed, through an easy-to-use web interface, interactively extracts biological elements from ontological databases and then synthesizes a network inferring relations among such elements. The results clearly show that our tool is capable of inferring comprehensive and reliable biological networks. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s41109-021-00435-x.
Affiliation(s)
| | - Antonio Di Maria
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| | | | - Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| | - Lorenzo Bellomo
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Fabrizio Billeci
- Department of Maths and Computer Science, University of Catania, Catania, Italy
| | - Stefano Borzì
- Department of Maths and Computer Science, University of Catania, Catania, Italy
| | - Paolo Ferragina
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| |
|
28
|
Porras P, Orchard S, Licata L. IMEx Databases: Displaying Molecular Interactions into a Single, Standards-Compliant Dataset. Methods Mol Biol 2022; 2449:27-42. [PMID: 35507258 DOI: 10.1007/978-1-0716-2095-3_2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/14/2023]
Abstract
Molecular interaction databases aim to systematically capture and organize the experimental interaction information described in the scientific literature. These data can then be used to perform network analysis, to assign putative roles to uncharacterized proteins and to investigate their involvement in cellular pathways. This chapter gives a brief overview of publicly available molecular interaction databases and focuses on the members of the IMEx Consortium, on their curation policies and standard data formats. The goals achieved by IMEx databases over the last 15 years, the data types provided and the many different ways in which such data can be utilized by the research community are described in detail. The IMEx databases curate molecular interaction data to the highest caliber, following a detailed curation model and supplying rich metadata by employing common curation rules and harmonized standards. The IMEx Consortium provides comprehensively annotated molecular interaction data integrated into a single, non-redundant, open access dataset.
Affiliation(s)
- Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy.
| |
|
29
|
Del Toro N, Shrivastava A, Ragueneau E, Meldal B, Combe C, Barrera E, Perfetto L, How K, Ratan P, Shirodkar G, Lu O, Mészáros B, Watkins X, Pundir S, Licata L, Iannuccelli M, Pellegrini M, Martin MJ, Panni S, Duesbury M, Vallet SD, Rappsilber J, Ricard-Blum S, Cesareni G, Salwinski L, Orchard S, Porras P, Panneerselvam K, Hermjakob H. The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res 2021; 50:D648-D653. [PMID: 34761267 PMCID: PMC8728211 DOI: 10.1093/nar/gkab1006] [Citation(s) in RCA: 155] [Impact Index Per Article: 38.8] [Received: 09/22/2021] [Revised: 10/06/2021] [Accepted: 10/21/2021] [Indexed: 01/18/2023] Open
Abstract
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Affiliation(s)
- Noemi Del Toro
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Anjali Shrivastava
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eliot Ragueneau
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Birgit Meldal
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Colin Combe
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Elisabet Barrera
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Livia Perfetto
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK.,Fondazione Human Technopole, Milan 20157, Italy
| | - Karyn How
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Prashansa Ratan
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Gautam Shirodkar
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Odilia Lu
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Bálint Mészáros
- Gibson Group, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Xavier Watkins
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sangya Pundir
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Luana Licata
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Marta Iannuccelli
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA
| | - Maria Jesus Martin
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Simona Panni
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Rende, Italy
| | - Margaret Duesbury
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK.,UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Sylvain D Vallet
- ICBMS UMR CNRS 5246, University Lyon 1, Lyon, Villeurbanne 69622, France
| | - Juri Rappsilber
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK.,Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin 13355, Germany
| | - Sylvie Ricard-Blum
- ICBMS UMR CNRS 5246, University Lyon 1, Lyon, Villeurbanne 69622, France
| | - Gianni Cesareni
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Lukasz Salwinski
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Kalpana Panneerselvam
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Henning Hermjakob
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| |
|
30
|
Chu ECP, Morin A, Chang THC, Nguyen T, Tsai YC, Sharma A, Liu CC, Pavlidis P. Experiment level curation of transcriptional regulatory interactions in neurodevelopment. PLoS Comput Biol 2021; 17:e1009484. [PMID: 34665801 PMCID: PMC8565786 DOI: 10.1371/journal.pcbi.1009484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2021] [Revised: 11/03/2021] [Accepted: 09/28/2021] [Indexed: 11/23/2022] Open
Abstract
To facilitate the development of large-scale transcriptional regulatory networks (TRNs) that may enable in-silico analyses of disease mechanisms, a reliable catalogue of experimentally verified direct transcriptional regulatory interactions (DTRIs) is needed for training and validation. There has been a long history of using low-throughput experiments to validate single DTRIs. Therefore, we reason that a reliable set of DTRIs could be produced by curating the published literature for such evidence. In our survey of previous curation efforts, we identified the lack of details about the quantity and the types of experimental evidence to be a major gap, despite the theoretical importance of such details for the identification of bona fide DTRIs. We developed a curation protocol to inspect the published literature for support of DTRIs at the experiment level, focusing on genes important to the development of the mammalian nervous system. We sought to record three types of low-throughput experiments: Transcription factor (TF) perturbation, TF-DNA binding, and TF-reporter assays. Using this protocol, we examined a total of 1,310 papers to assemble a collection of 1,499 unique DTRIs, involving 251 TFs and 825 target genes, many of which were not reported in any other DTRI resource. The majority of DTRIs (965; 64%) were supported by two or more types of experimental evidence and 27% were supported by all three. Of the DTRIs with all three types of evidence, 170 had been tested using primary tissues or cells and 44 had been tested directly in the central nervous system. We used our resource to document research biases among reports towards a small number of well-studied TFs. To demonstrate a use case for this resource, we compared our curation to a previously published high-throughput perturbation screen and found significant enrichment of the curated targets among genes differentially expressed in the developing brain in response to Pax6 deletion. 
This study demonstrates a proof-of-concept for the assembly of a high resolution DTRI resource to support the development of large-scale TRNs. The capacity to computationally reconstruct gene regulatory networks using large-scale biological data is currently limited by the absence of a high confidence set of one-to-one regulatory interactions. Given the lengthy history of using small scale experimental assays to investigate individual interactions, we reason that a reliable collection of gene regulatory interactions could be compiled by systematically inspecting the published literature. To this end, we developed a curation protocol to examine and record evidence of regulatory interactions at the individual experiment level. Focusing on the area of brain development, we applied our pipeline to 1,310 publications. We identified 3,601 individual experiments, providing detailed information about 1,499 regulatory interactions. Many of these interactions have verified activity specifically in the embryonic brain. By capturing reports of regulatory interactions at this level of detail, we equip the users with more granular information than other similar resources, enabling more informed assessments of reliability.
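The enrichment of curated targets among differentially expressed genes mentioned above can be illustrated with a one-sided hypergeometric test. This is a minimal sketch, not the authors' analysis: the function name and the toy counts are hypothetical, and the abstract does not state which test was used. Only the 965-of-1,499 figure comes from the abstract.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, targets: int, hits: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    universe: number of genes tested (assumed background)
    targets:  curated targets of the TF within the universe
    hits:     differentially expressed genes within the universe
    overlap:  curated targets that are also differentially expressed
    """
    upper = min(targets, hits)
    total = comb(universe, hits)
    tail = sum(
        comb(targets, k) * comb(universe - targets, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Share of DTRIs supported by two or more evidence types, from the abstract's counts.
share_two_or_more = round(100 * 965 / 1499)  # rounds to 64, matching the reported 64%

# Toy counts, chosen only to show the call; they are not taken from the study.
p_value = hypergeom_enrichment_p(universe=20000, targets=100, hits=500, overlap=10)
```

A small p-value here would indicate that the curated targets overlap the differentially expressed set more than expected by chance, given the assumed background.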
Affiliation(s)
- Eric Ching-Pan Chu
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Alexander Morin
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Tak Hou Calvin Chang
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Tue Nguyen
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Yi-Cheng Tsai
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Aman Sharma
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Chao Chun Liu
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
| |
|
31
|
Wang Z, He Y. Precision omics data integration and analysis with interoperable ontologies and their application for COVID-19 research. Brief Funct Genomics 2021; 20:235-248. [PMID: 34159360 PMCID: PMC8287950 DOI: 10.1093/bfgp/elab029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/25/2021] [Revised: 05/10/2021] [Accepted: 05/24/2021] [Indexed: 12/12/2022] Open
Abstract
Omics technologies are widely used in biomedical research. Precision medicine focuses on individual-level disease treatment and prevention. Here, we propose the term 'precision omics' to represent the combinatorial strategy that applies omics to translate large-scale molecular omics data for precision disease understanding and accurate disease diagnosis, treatment and prevention. Given the complexity of both omics and precision medicine, precision omics requires standardized representation and integration of heterogeneous data types. Ontology has emerged as an important artificial intelligence component that is critical for standard data and metadata representation, standardization and integration. To support precision omics, we propose a precision omics ontology hypothesis, which states that the effectiveness of precision omics is positively correlated with the interoperability of the ontologies used for data and knowledge integration. Therefore, to conduct effective precision omics studies, interoperable ontologies are required to standardize and incorporate heterogeneous data and knowledge in a human- and computer-interpretable manner. Methods for efficient development and application of interoperable ontologies are proposed and illustrated. With interoperable omics data and knowledge, omics tools such as OmicsViz can also evolve to process, integrate, visualize and analyze various omics data, leading to the identification of new knowledge and hypotheses of molecular mechanisms underlying the outcomes of diseases such as COVID-19. Given extensive COVID-19 omics research, we propose the strategy of precision omics supported by interoperable ontologies, accompanied by ontology-based semantic reasoning and machine learning, leading to systematic disease mechanism understanding and rational design of precision treatment and prevention. SHORT ABSTRACT Precision medicine focuses on individual-level disease treatment and prevention. 
Precision omics is a new strategy that applies omics for precision medicine research, which requires standardized representation and integration of individual genetics and phenotypes, experimental conditions, and data analysis settings. Ontology has emerged as an important artificial intelligence component that is critical for standard data and metadata representation, standardization and integration. To support precision omics, interoperable ontologies are required in order to standardize and incorporate heterogeneous data and knowledge in a human- and computer-interpretable manner. With interoperable omics data and knowledge, omics tools such as OmicsViz can also evolve to process, integrate, visualize and analyze various omics data, leading to the identification of new knowledge and hypotheses of molecular mechanisms underlying disease outcomes. The precision COVID-19 omics study is provided as the primary use case to illustrate the rationale and implementation of the precision omics strategy.
Affiliation(s)
| | - Yongqun He
- University of Michigan Medical School, Ann Arbor, MI, USA
| |
|
32
|
Puig RR, Boddie P, Khan A, Castro-Mondragon JA, Mathelier A. UniBind: maps of high-confidence direct TF-DNA interactions across nine species. BMC Genomics 2021; 22:482. [PMID: 34174819 PMCID: PMC8236138 DOI: 10.1186/s12864-021-07760-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 04/29/2021] [Accepted: 05/27/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. It is critical to locate these TF-DNA interactions to understand transcriptional regulation. Efforts to predict bona fide TFBSs benefit from the availability of experimental data mapping DNA binding regions of TFs (chromatin immunoprecipitation followed by sequencing - ChIP-seq). RESULTS In this study, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, this culminated in the prediction of ~56 million TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in >1000 cell lines and tissues. These TFBSs were used to predict >197,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high quality of the TFBSs was reinforced by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity was correlated with the number of TFs with binding sites predicted in these regions. All the data are provided to the community through the UniBind database, which can be accessed through its web interface ( https://unibind.uio.no/ ), a dedicated RESTful API, and as genomic tracks. Finally, we provide an enrichment tool, available as a web service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions. CONCLUSIONS UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.
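The idea of cis-regulatory modules as clusters of binding events can be sketched with a simple gap-based merge of binding-site intervals. This is an illustration only and is not presented as the procedure UniBind uses: the function name, the tuple layout, and the 50 bp gap threshold are all assumptions.

```python
from typing import List, Tuple

Site = Tuple[str, int, int]  # (chromosome, start, end), hypothetical layout


def cluster_sites(sites: List[Site], max_gap: int = 50) -> List[Site]:
    """Merge binding sites on the same chromosome that lie within max_gap bases."""
    clusters: List[Site] = []
    # Sorting by (chromosome, start, end) puts neighbouring sites next to each other.
    for chrom, start, end in sorted(sites):
        if clusters and clusters[-1][0] == chrom and start - clusters[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = clusters[-1]
            # Extend the open cluster to cover the new site.
            clusters[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # Start a new cluster on a new chromosome or after a large gap.
            clusters.append((chrom, start, end))
    return clusters


example_sites = [
    ("chr1", 100, 110),
    ("chr1", 120, 130),
    ("chr1", 500, 510),
    ("chr2", 105, 115),
]
modules = cluster_sites(example_sites, max_gap=50)
```

With these toy coordinates, the first two chr1 sites fall within the gap threshold and merge into one module, while the remaining sites stay separate.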
Affiliation(s)
- Rafael Riudavets Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349, Oslo, Norway
| | - Paul Boddie
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349, Oslo, Norway
| | - Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349, Oslo, Norway
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | | | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349, Oslo, Norway.
- Department of Medical Genetics, Oslo University Hospital, Oslo, 0424, Norway.
| |
|
33
|
Choteau SA, Wagner A, Pierre P, Spinelli L, Brun C. MetamORF: a repository of unique short open reading frames identified by both experimental and computational approaches for gene and metagene analyses. Database (Oxford) 2021; 2021:6307706. [PMID: 34156446 PMCID: PMC8218702 DOI: 10.1093/database/baab032] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/23/2020] [Revised: 04/08/2021] [Accepted: 05/17/2021] [Indexed: 11/12/2022]
Abstract
The development of high-throughput technologies revealed the existence of non-canonical short open reading frames (sORFs) on most eukaryotic ribonucleic acids. They are ubiquitous genetic elements conserved across species and suspected to be involved in numerous cellular processes. MetamORF (https://metamorf.hb.univ-amu.fr/) aims to provide a repository of unique sORFs identified in the human and mouse genomes with both experimental and computational approaches. By gathering publicly available sORF data, normalizing them and summarizing redundant information, we were able to identify a total of 1 162 675 unique sORFs. Despite the usual characterization of ORFs as short, upstream or downstream, there is currently no clear consensus regarding the definition of these categories. Thus, the data have been reprocessed using a normalized nomenclature. MetamORF enables new analyses at locus, gene, transcript and ORF levels, which should make it possible to address new questions regarding sORF functions in the future. The repository is available through a user-friendly web interface, allowing easy browsing, visualization, filtering over multiple criteria and export. sORFs can be searched starting from a gene, a transcript or an ORF ID, by looking in a genome area, or by browsing the whole repository for a species. The database content has also been made available through track hubs at the UCSC Genome Browser. Finally, we demonstrated an enrichment of genes harboring upstream ORFs among genes expressed in response to reticular stress. Database URL https://metamorf.hb.univ-amu.fr/.
Affiliation(s)
- Sebastien A Choteau
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France.,Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
| | - Audrey Wagner
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
| | - Philippe Pierre
- Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France.,Department of Medical Sciences, Institute for Research in Biomedicine (iBiMED) and Ilidio Pinho Foundation, University of Aveiro, Aveiro 3810-193, Portugal.,Shanghai Institute of Immunology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Lionel Spinelli
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France.,Aix-Marseille University, INSERM, CNRS, CIML, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France
| | - Christine Brun
- Aix-Marseille University, INSERM, TAGC, Turing Centre for Living Systems, 163 Avenue de Luminy, Marseille 13009, France.,CNRS, 31 Chemin Joseph Aiguier, Marseille 13009, France
| |
|
34
|
Wang Z, He Y, Huang J, Yang X. Integrative web-based analysis of omics data for study of drugs against SARS-CoV-2. Sci Rep 2021; 11:10763. [PMID: 34031435 PMCID: PMC8144609 DOI: 10.1038/s41598-021-89578-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Accepted: 04/28/2021] [Indexed: 12/27/2022] Open
Abstract
Research on drugs against SARS-CoV-2 (the cause of COVID-19) is currently a major global concern, and abundant research data and findings have accumulated in this field. The effects of drugs on gene expression in cell lines, drug-target and protein-virus receptor networks, and immune cell infiltration of the host may provide useful information for anti-SARS-CoV-2 drug research. To simplify the complex bioinformatics analysis and facilitate the evaluation of the latest research data, we developed OmicsViz ( http://medcode.link/omicsviz ), a web tool that integrates drug-cell line interference data, virus-host protein-protein interactions, and drug-target interactions. To demonstrate the uses of OmicsViz, we analyzed the gene expression data from cell lines treated with chloroquine and ruxolitinib, the drug-target protein networks of 48 anti-coronavirus drugs and drugs bound with ACE2, and the profiles of immune cell infiltration between different COVID-19 patient groups. Our research shows that chloroquine had a regulatory role in the immune response in a renal cell line but not in a lung cell line. The anti-coronavirus drug-target network analysis suggested that the antihistamine promethazine and the dietary supplement zinc might be beneficial when used jointly with antiviral drugs. The immune infiltration analysis indicated that both the COVID-19 patients admitted to the ICU and the elderly with infection showed immune exhaustion status, yet with different molecular mechanisms. The interactive graphic interface of OmicsViz also makes it easier to analyze newly discovered and user-uploaded data, leading to an in-depth understanding of existing findings and an expansion of existing knowledge of SARS-CoV-2. Collectively, OmicsViz is a web program that promotes research on medical agents against SARS-CoV-2 and supports the evaluation of the latest research findings.
Affiliation(s)
- ZhiGang Wang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China
| | - YongQun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48105, USA
| | - Jing Huang
- Department of Respiratory and Critical Care Medicine, Chongqing General Hospital, University of Chinese Academy of Sciences, Chongqing, 400014, China
| | - XiaoLin Yang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China.
| |
|
35
|
Lim N, Tesar S, Belmadani M, Poirier-Morency G, Mancarci BO, Sicherman J, Jacobson M, Leong J, Tan P, Pavlidis P. Curation of over 10 000 transcriptomic studies to enable data reuse. Database (Oxford) 2021; 2021:6143045. [PMID: 33599246 PMCID: PMC7904053 DOI: 10.1093/database/baab006] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/29/2020] [Revised: 12/09/2020] [Accepted: 01/28/2021] [Indexed: 01/07/2023]
Abstract
Vast amounts of transcriptomic data reside in public repositories, but effective reuse remains challenging. Issues include unstructured dataset metadata, inconsistent data processing and quality control, and inconsistent probe-gene mappings across microarray technologies. Thus, extensive curation and data reprocessing are necessary prior to any reuse. The Gemma bioinformatics system was created to help address these issues. Gemma consists of a database of curated transcriptomic datasets, analytical software, a web interface and web services. Here we present an update on Gemma's holdings, data processing and analysis pipelines, our curation guidelines, and software features. As of June 2020, Gemma contains 10 811 manually curated datasets (primarily human, mouse and rat), over 395 000 samples and hundreds of curated transcriptomic platforms (both microarray and RNA sequencing). Dataset topics were represented with 10 215 distinct terms from 12 ontologies, for a total of 54 316 topic annotations (mean topics/dataset = 5.2). While Gemma has broad coverage of conditions and tissues, it captures a large majority of available brain-related datasets, accounting for 34% of its holdings. Users can access the curated data and differential expression analyses through the Gemma website, RESTful service and an R package. Database URL: https://gemma.msl.ubc.ca/home.html.
Affiliation(s)
- Nathaniel Lim
- Genome Science and Technology Graduate Program, University of British Columbia, Vancouver, BC V6T1Z4, Canada,Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Stepan Tesar
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Manuel Belmadani
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Guillaume Poirier-Morency
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Burak Ogan Mancarci
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada,Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V6T1Z4, Canada
| | - Jordan Sicherman
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada,Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC V6T1Z4, Canada
| | - Matthew Jacobson
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Justin Leong
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | - Patrick Tan
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC V6T1Z4, Canada
| | | |
|
36
|
Cabau-Laporta J, Ascensión AM, Arrospide-Elgarresta M, Gerovska D, Araúzo-Bravo MJ. FOntCell: Fusion of Ontologies of Cells. Front Cell Dev Biol 2021; 9:562908. [PMID: 33644039 PMCID: PMC7905052 DOI: 10.3389/fcell.2021.562908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2020] [Accepted: 01/05/2021] [Indexed: 11/25/2022] Open
Abstract
High-throughput cell-data technologies such as single-cell RNA-seq create a demand for algorithms for automatic cell classification and characterization. There exist several cell classification ontologies with complementary information. However, one needs to merge them to synergistically combine their information. The main difficulty in merging is to match the ontologies, since they use different naming conventions. Therefore, we developed an algorithm that merges ontologies by integrating the name matching between class label names with the structure mapping between the ontology elements based on graph convolution. Since the structure mapping is a time-consuming process, we designed two methods to perform the graph convolution: vectorial structure matching and constraint-based structure matching. To perform the vectorial structure matching, we designed a general method to calculate the similarities between vectors of different lengths for different metrics. Additionally, we adapted the slower Blondel method to work for structure matching. We implemented our algorithms into FOntCell, a software module in Python for efficient automatic parallel-computed merging/fusion of ontologies in the same or similar knowledge domains. FOntCell can unify dispersed knowledge from one domain into a unique ontology in OWL format and iteratively reuse it to continuously adapt ontologies with new data endlessly produced by data-driven classification methods, such as those of the Human Cell Atlas. To navigate easily across the merged ontologies, it generates HTML files with tabulated and graphic summaries, and interactive circular Directed Acyclic Graphs. We used FOntCell to merge the CELDA, LifeMap and LungMAP Human Anatomy cell ontologies into a comprehensive cell ontology. 
We compared FOntCell with tools used for the mouse and human anatomy ontology alignment task proposed by the Ontology Alignment Evaluation Initiative (OAEI) and found that the Fβ alignment accuracies of FOntCell are above the geometric mean of the other tools; more importantly, it significantly outperforms the best OAEI tools in cell ontology alignment in terms of Fβ alignment accuracies.
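For reference, the Fβ measure used to score alignments combines precision P and recall R as Fβ = (1 + β²)·P·R / (β²·P + R). The sketch below implements that standard definition. The function name and the example values are illustrative assumptions, and the abstract does not state which β values were used.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; the score is conventionally 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only; they are not results from the paper.
balanced = f_beta(0.5, 0.5)                     # beta = 1 gives the harmonic mean
recall_weighted = f_beta(1.0, 0.5, beta=2.0)    # penalizes the low recall more heavily
```

With β = 1 the score reduces to the harmonic mean of precision and recall, which is why equal precision and recall return that same value.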
Affiliation(s)
- Javier Cabau-Laporta
- Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
| | - Alex M Ascensión
- Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
| | - Mikel Arrospide-Elgarresta
- Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
| | - Daniela Gerovska
- Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain.,Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastián, Spain
| | - Marcos J Araúzo-Bravo
- Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain.,Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastián, Spain.,Basque Foundation for Science (IKERBASQUE), Bilbao, Spain.,Centro de Investigación Biomédica en Red (CIBER) of Frailty and Healthy Aging (CIBERfes), Madrid, Spain.,TransBioNet Thematic Network of Excellence for Transitional Bioinformatics, Barcelona Supercomputing Center, Barcelona, Spain.,Computational Biology and Bioinformatics, Department Cell and Developmental Biology Max Planck Institute for Molecular Biomedicine, Münster, Germany
| |
|
37
|
Lyu C, Chen T, Qiang B, Liu N, Wang H, Zhang L, Liu Z. CMNPD: a comprehensive marine natural products database towards facilitating drug discovery from the ocean. Nucleic Acids Res 2021; 49:D509-D515. [PMID: 32986829 PMCID: PMC7779072 DOI: 10.1093/nar/gkaa763] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 08/07/2020] [Revised: 09/01/2020] [Accepted: 09/03/2020] [Indexed: 12/15/2022] Open
Abstract
Marine organisms are expected to be an important source of inspiration for drug discovery after terrestrial plants and microorganisms. Despite the remarkable progress in the field of marine natural products (MNPs) chemistry, there are only a few open access databases dedicated to MNP research. To meet the growing demand for mining and sharing MNP-related data resources, we developed CMNPD, a comprehensive marine natural products database based on manually curated data. CMNPD currently contains more than 31 000 chemical entities with various physicochemical and pharmacokinetic properties, standardized biological activity data, systematic taxonomy and geographical distribution of source organisms, and detailed literature citations. It is an integrated platform for structure dereplication (assessment of novelty) of (marine) natural products, discovery of lead compounds, data mining of structure-activity relationships and investigation of chemical ecology. Access is available through a user-friendly web interface at https://www.cmnpd.org. We are committed to providing a free data sharing platform for not only professional MNP researchers but also the broader scientific community to facilitate drug discovery from the ocean.
Affiliation(s)
- Chuanyu Lyu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| | - Tong Chen
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Bo Qiang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| | - Ningfeng Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| | - Heyu Wang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| | - Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| | - Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
| |
|
38
|
Xie J, Zi W, Li Z, He Y. Ontology-based Precision Vaccinology for Deep Mechanism Understanding and Precision Vaccine Development. Curr Pharm Des 2021; 27:900-910. [PMID: 33238868 DOI: 10.2174/1381612826666201125112131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/31/2020] [Accepted: 10/08/2020] [Indexed: 11/22/2022]
Abstract
Vaccination is one of the most important innovations in human history. It has also become a hot research area in a new application - the development of new vaccines against non-infectious diseases such as cancers. However, effective and safe vaccines still do not exist for many diseases, and where vaccines exist, their protective immune mechanisms are often unclear. Although licensed vaccines are generally safe, various adverse events, and sometimes severe adverse events, still occur in a small population. Precision medicine tailors medical intervention to the personal characteristics of individual patients or sub-populations of individuals with similar immunity-related characteristics. Precision vaccinology is a new strategy that applies precision medicine to the development, administration, and post-administration analysis of vaccines. Several conditions make this the right time to embark on the development of precision vaccinology. First, the increased level of research in vaccinology has generated voluminous "big data" repositories of vaccinology data. Second, new technologies such as multi-omics and immunoinformatics bring new methods for investigating vaccines and immunology. Finally, the advent of AI and machine learning software now makes possible the marriage of big data to the development of new vaccines in ways not possible before. However, something is missing in this marriage, and that is a common language that facilitates correlation, analysis, and reporting nomenclature for the field of vaccinology. Solving this bioinformatics problem is the domain of applied biomedical ontology. An ontology in the informatics field is a human- and machine-interpretable representation of entities and the relations among entities in a specific domain. 
The Vaccine Ontology (VO) and Ontology of Vaccine Adverse Events (OVAE) have been developed to support the standard representation of vaccines, vaccine components, vaccinations, host responses, and vaccine adverse events. Many other biomedical ontologies have also been developed and can be applied in vaccine research. Here, we review the current status of precision vaccinology and how ontological development will enhance this field, and propose an ontology-based precision vaccinology strategy to support precision vaccine research and development.
Affiliation(s)
- Jiangan Xie
- Chongqing Engineering Research Center of Medical Electronics and Information Technology, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Wenrui Zi
- Chongqing Engineering Research Center of Medical Electronics and Information Technology, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Zhangyong Li
- Chongqing Engineering Research Center of Medical Electronics and Information Technology, School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Yongqun He
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, United States
| |
|
39
|
|
40
|
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
41
|
Issa NT, Stathias V, Schürer S, Dakshanamurthy S. Machine and deep learning approaches for cancer drug repurposing. Semin Cancer Biol 2021; 68:132-142. [PMID: 31904426 PMCID: PMC7723306 DOI: 10.1016/j.semcancer.2019.12.011] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Received: 07/18/2019] [Revised: 10/31/2019] [Accepted: 12/15/2019] [Indexed: 02/07/2023]
Abstract
Knowledge of the underpinnings of cancer initiation, progression and metastasis has increased exponentially in recent years. Advanced "omics" coupled with machine learning and artificial intelligence (deep learning) methods have helped elucidate targets and pathways critical to those processes that may be amenable to pharmacologic modulation. However, the current anti-cancer therapeutic armamentarium continues to lag behind. As the cost of developing a new drug remains prohibitively expensive, repurposing of existing approved and investigational drugs is sought after given known safety profiles and reduction in the cost barrier. Notably, successes in oncologic drug repurposing have been infrequent. Computational in-silico strategies have been developed to aid in modeling biological processes to find new disease-relevant targets and discovering novel drug-target and drug-phenotype associations. Machine and deep learning methods have especially enabled leaps in those successes. This review will discuss these methods as they pertain to cancer biology as well as immunomodulation for drug repurposing opportunities in oncologic diseases.
Affiliation(s)
- Naiem T Issa
  - Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami School of Medicine, Miami, FL, USA
- Vasileios Stathias
  - Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Stephan Schürer
  - Department of Molecular and Cellular Pharmacology, University of Miami School of Medicine, Miami, FL, USA
- Sivanesan Dakshanamurthy
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA
42
Eklund N, Andrianarisoa NH, van Enckevort E, Anton G, Debucquoy A, Müller H, Zaharenko L, Engels C, Ebert L, Neumann M, Geeraert J, T'Joen V, Demski H, Caboux É, Proynova R, Parodi B, Mate S, van Iperen E, Merino-Martinez R, Quinlan PR, Holub P, Silander K. Extending the Minimum Information About BIobank Data Sharing Terminology to Describe Samples, Sample Donors, and Events. Biopreserv Biobank 2020; 18:155-164. [PMID: 32302498 PMCID: PMC7310316 DOI: 10.1089/bio.2019.0129] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022] Open
Abstract
Introduction: The Minimum Information About BIobank data Sharing (MIABIS) was initiated in 2012. MIABIS aims to create a common biobank terminology to facilitate data sharing in biobanks and sample collections. The MIABIS Core terminology consists of three components describing biobanks, sample collections, and studies, in which information on samples and sample donors is provided in aggregated form. However, there is also a need to describe samples and sample donors at an individual level to allow more elaborate queries on available biobank samples and data. Therefore the MIABIS terminology has now been extended with components describing samples and sample donors at an individual level. Materials and Methods: The components were defined according to specific scope and use cases by a large group of experts, and through several cycles of reviews, according to the new MIABIS governance model of BBMRI-ERIC (Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium). The guiding principles applied in developing these components included the following terms: model should consider only samples of human origin, model should be applicable to all types of samples and all sample donors, and model should describe the current status of samples stored in a given biobank. Results: A minimal set of standard attributes for defining samples and sample donors is presented here. We added an "event" component to describe attributes that are not directly describing samples or sample donors but are tightly related to them. To better utilize the generic data model, we suggest a procedure by which interoperability can be promoted, using specific MIABIS profiles. Discussion: The MIABIS sample and donor component extensions and the new generic data model complement the existing MIABIS Core 2.0 components, and substantially increase the potential usability of this terminology for better describing biobank samples and sample donors. They also support the use of individual level data about samples and sample donors to obtain accurate and detailed biobank availability queries.
Affiliation(s)
- Niina Eklund
  - THL Biobank, Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
- Esther van Enckevort
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Heimo Müller
  - Diagnostic and Research Center for Molecular BioMedicine, Medical University of Graz, Graz, Austria
- Michael Neumann
  - Interdisciplinary Bank of Biomaterials and Data Würzburg, University Hospital Würzburg, Würzburg, Germany
- Joachim Geeraert
  - Faculty of Medicine and Health Sciences, University of Ghent/University Hospital Ghent, Ghent, Belgium
- Veronique T'Joen
  - Faculty of Medicine and Health Sciences, University of Ghent/University Hospital Ghent, Ghent, Belgium
- Hans Demski
  - Helmholtz Zentrum München, Neuherberg, Germany
- Sebastian Mate
  - Medical Centre for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Erik van Iperen
  - Amsterdam UMC Biobank, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Philip R Quinlan
  - Digital Research Service, University of Nottingham, Nottingham, United Kingdom
- Kaisa Silander
  - THL Biobank, Department of Public Health Solutions, Finnish Institute for Health and Welfare, Helsinki, Finland
43
Moriya Y, Kawano S, Okuda S, Watanabe Y, Matsumoto M, Takami T, Kobayashi D, Yamanouchi Y, Araki N, Yoshizawa AC, Tabata T, Iwasaki M, Sugiyama N, Tanaka S, Goto S, Ishihama Y. The jPOST environment: an integrated proteomics data repository and database. Nucleic Acids Res 2019; 47:D1218-D1224. [PMID: 30295851 PMCID: PMC6324006 DOI: 10.1093/nar/gky899] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 08/13/2018] [Accepted: 09/24/2018] [Indexed: 01/13/2023] Open
Abstract
Rapid progress is being made in mass spectrometry (MS)-based proteomics, yielding an increasing number of larger datasets with higher quality and higher throughput. To integrate proteomics datasets generated from various projects and institutions, we launched a project named jPOST (Japan ProteOme STandard Repository/Database, https://jpostdb.org/) in 2015. Its proteomics data repository, jPOSTrepo, began operations in 2016 and has accepted more than 10 TB of MS-based proteomics datasets in the past two years. In addition, we have developed a new proteomics database named jPOSTdb in which the published raw datasets in jPOSTrepo are reanalyzed using a standardized protocol. jPOSTdb provides viewers showing the frequency of detected post-translational modifications, the co-occurrence of phosphorylation sites on a peptide and peptide sharing among proteoforms. jPOSTdb also provides basic statistical analysis tools to compare proteomics datasets.
Affiliation(s)
- Yuki Moriya
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Shin Kawano
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Shujiro Okuda
  - Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Yu Watanabe
  - Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Masaki Matsumoto
  - Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan
- Tomoyo Takami
  - Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan
- Daiki Kobayashi
  - Graduate School of Medical Sciences, Faculty of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Yoshinori Yamanouchi
  - Graduate School of Medical Sciences, Faculty of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
  - Kumamoto University Hospital, Kumamoto 860-8556, Japan
- Norie Araki
  - Graduate School of Medical Sciences, Faculty of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Akiyasu C Yoshizawa
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Tsuyoshi Tabata
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
- Mio Iwasaki
  - Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
- Naoyuki Sugiyama
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Susumu Goto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Yasushi Ishihama
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
44
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019; 47:D930-D940. [PMID: 30398643 PMCID: PMC6323927 DOI: 10.1093/nar/gky1075] [Citation(s) in RCA: 1233] [Impact Index Per Article: 246.6] [Received: 09/15/2018] [Accepted: 10/18/2018] [Indexed: 12/31/2022] Open
Abstract
ChEMBL is a large, open-access bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012, 2014 and 2017 Nucleic Acids Research Database Issues. In the last two years, several important improvements have been made to the database and are described here. These include more robust capture and representation of assay details; a new data deposition system, allowing updating of data sets and deposition of supplementary data; and a completely redesigned web interface, with enhanced search and filtering capabilities.
Affiliation(s)
- David Mendez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anna Gaulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jon Chambers
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marleen De Veij
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Eloy Félix
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- María Paula Magariños
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Juan F Mosquera
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Prudence Mutowo
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Michal Nowotka
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- María Gordillo-Marañón
  - Institute of Cardiovascular Science, University College London, Gower Street, London WC1E 6BT, UK
- Fiona Hunter
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Laura Junco
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Grace Mugumbate
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Milagros Rodriguez-Lopez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Francis Atkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nicolas Bosc
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Chris J Radoux
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Aldo Segura-Cabrera
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anne Hersey
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
45
Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, Cantrell JR, Wheeler DK, Sette A, Peters B. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res 2019; 47:D339-D343. [PMID: 30357391 PMCID: PMC6324067 DOI: 10.1093/nar/gky1006] [Citation(s) in RCA: 1285] [Impact Index Per Article: 257.0] [Received: 09/14/2018] [Accepted: 10/11/2018] [Indexed: 12/18/2022] Open
Abstract
The Immune Epitope Database (IEDB, iedb.org) captures experimental data confined in figures, text and tables of the scientific literature, making it freely available and easily searchable to the public. The scope of the IEDB extends across immune epitope data related to all species studied and includes antibody, T cell, and MHC binding contexts associated with infectious, allergic, autoimmune, and transplant related diseases. Having been publicly accessible for >10 years, the recent focus of the IEDB has been improved query and reporting functionality to meet the needs of our users to access and summarize data that continues to grow in quantity and complexity. Here we present an update on our current efforts and future goals.
Affiliation(s)
- Randi Vita
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
- Swapnil Mahajan
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
- Sandeep Kumar Dhanda
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
- Sheridan Martini
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
- Alessandro Sette
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
  - University of California San Diego, Department of Medicine, La Jolla, CA 92093, USA
- Bjoern Peters
  - La Jolla Institute for Allergy and Immunology, Division of Vaccine Discovery, La Jolla, CA 92037, USA
  - University of California San Diego, Department of Medicine, La Jolla, CA 92093, USA
46
Minimum Information and Quality Standards for Conducting, Reporting, and Organizing In Vitro Research. Handb Exp Pharmacol 2020; 257:177-196. [PMID: 31628600 DOI: 10.1007/164_2019_284] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Insufficient description of experimental practices can contribute to difficulties in reproducing research findings. In response to this, "minimum information" guidelines have been developed for different disciplines. These standards help ensure that the complete experiment is described, including both experimental protocols and data processing methods, allowing a critical evaluation of the whole process and the potential recreation of the work. Selected examples of minimum information checklists with relevance for in vitro research are presented here and are collected by and registered at the MIBBI/FAIRsharing Information Resource portal. In addition, to support integrative research and to allow for comparisons and data sharing across studies, ontologies and vocabularies need to be defined and integrated across areas of in vitro research. As examples, this chapter addresses ontologies for cells and bioassays and discusses their importance for in vitro studies. Finally, specific quality requirements for important in vitro research tools (like chemical probes, antibodies, and cell lines) are suggested, and remaining issues are discussed.
47
Rahman RU, Liebhoff AM, Bansal V, Fiosins M, Rajput A, Sattar A, Magruder DS, Madan S, Sun T, Gautam A, Heins S, Liwinski T, Bethune J, Trenkwalder C, Fluck J, Mollenhauer B, Bonn S. SEAweb: the small RNA Expression Atlas web application. Nucleic Acids Res 2020; 48:D204-D219. [PMID: 31598718 PMCID: PMC6943056 DOI: 10.1093/nar/gkz869] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/19/2019] [Revised: 09/14/2019] [Accepted: 10/01/2019] [Indexed: 12/12/2022] Open
Abstract
We present the Small RNA Expression Atlas (SEAweb), a web application that allows for the interactive querying, visualization and analysis of known and novel small RNAs across 10 organisms. It contains sRNA and pathogen expression information for over 4200 published samples with standardized search terms and ontologies. In addition, SEAweb allows for the interactive visualization and re-analysis of 879 differential expression and 514 classification comparisons. SEAweb's user model enables sRNA researchers to compare and re-analyze user-specific and published datasets, highlighting common and distinct sRNA expression patterns. We provide evidence for SEAweb's fidelity by (i) generating a set of 591 tissue specific miRNAs across 29 tissues, (ii) finding known and novel bacterial and viral infections across diseases and (iii) determining a Parkinson's disease-specific blood biomarker signature using novel data. We believe that SEAweb's simple semantic search interface, the flexible interactive reports and the user model with rich analysis capabilities will enable researchers to better understand the potential function and diagnostic value of sRNAs or pathogens across tissues, diseases and organisms.
Affiliation(s)
- Raza-Ur Rahman
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Anna-Maria Liebhoff
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Vikas Bansal
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
  - German Center for Neurodegenerative Diseases, 72076 Tübingen, Germany
- Maksims Fiosins
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
  - German Center for Neurodegenerative Diseases, 72076 Tübingen, Germany
  - Genevention GmbH, 37079 Göttingen, Germany
- Ashish Rajput
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Abdul Sattar
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Daniel S Magruder
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
  - Genevention GmbH, 37079 Göttingen, Germany
- Sumit Madan
  - Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
  - Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany
- Ting Sun
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
  - Department of Neurogenetics, Max Planck Institute of Experimental Medicine, 37075 Göttingen, Germany
- Abhivyakti Gautam
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sven Heins
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Timur Liwinski
  - Department of Medicine, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Jörn Bethune
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
- Claudia Trenkwalder
  - Paracelsus-Elena-Klinik, 34128 Kassel, Germany
  - Department of Neurosurgery, University Medical Center Göttingen, 37075 Göttingen, Germany
- Juliane Fluck
  - Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
  - Institute of Geodesy and Geoinformation, University of Bonn, 53115 Bonn, Germany
  - German National Library of Medicine (ZB MED) - Information Centre for Life Sciences, 53115 Bonn, Germany
- Brit Mollenhauer
  - Paracelsus-Elena-Klinik, 34128 Kassel, Germany
  - Institute of Neurology, University Medical Center Göttingen, 37075 Göttingen, Germany
- Stefan Bonn
  - Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany
  - German Center for Neurodegenerative Diseases, 72076 Tübingen, Germany
48
Abugessaisa I, Noguchi S, Böttcher M, Hasegawa A, Kouno T, Kato S, Tada Y, Ura H, Abe K, Shin JW, Plessy C, Carninci P, Kasukawa T. SCPortalen: human and mouse single-cell centric database. Nucleic Acids Res 2018; 46:D781-D787. [PMID: 29045713 PMCID: PMC5753281 DOI: 10.1093/nar/gkx949] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 08/15/2017] [Accepted: 10/05/2017] [Indexed: 01/07/2023] Open
Abstract
Published single-cell datasets are rich resources for investigators who want to address questions not originally asked by the creators of the datasets. The single-cell datasets might be obtained by different protocols and diverse analysis strategies. The main challenge in utilizing such single-cell data is how to make the various large-scale datasets comparable and reusable in a different context. To address this issue, we developed the single-cell centric database ‘SCPortalen’ (http://single-cell.clst.riken.jp/). The current version of the database covers human and mouse single-cell transcriptomics datasets that are publicly available from the INSDC sites. The original metadata was manually curated and single-cell samples were annotated with standard ontology terms. Following that, common quality assessment procedures were conducted to check the quality of the raw sequence. Furthermore, primary data processing of the raw data followed by advanced analyses and interpretation have been performed from scratch using our pipeline. In addition to the transcriptomics data, SCPortalen provides access to single-cell image files whenever available. The target users of SCPortalen are all researchers interested in specific cell types or population heterogeneity. Through the web interface of SCPortalen, users can easily search, explore and download the single-cell datasets of interest.
Affiliation(s)
- Imad Abugessaisa
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Shuhei Noguchi
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Michael Böttcher
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Akira Hasegawa
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Tsukasa Kouno
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Sachi Kato
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Yuhki Tada
  - RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Hiroki Ura
  - RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Kuniya Abe
  - RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan
- Jay W Shin
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Charles Plessy
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Piero Carninci
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
- Takeya Kasukawa
  - Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies (CLST), Yokohama, Kanagawa 230-0045, Japan
49
Koleti A, Terryn R, Stathias V, Chung C, Cooper DJ, Turner JP, Vidovic D, Forlin M, Kelley TT, D'Urso A, Allen BK, Torre D, Jagodnik KM, Wang L, Jenkins SL, Mader C, Niu W, Fazel M, Mahi N, Pilarczyk M, Clark N, Shamsaei B, Meller J, Vasiliauskas J, Reichard J, Medvedovic M, Ma'ayan A, Pillai A, Schürer SC. Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Res 2018; 46:D558-D566. [PMID: 29140462 PMCID: PMC5753343 DOI: 10.1093/nar/gkx1063] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 08/15/2017] [Accepted: 10/19/2017] [Indexed: 11/21/2022] Open
Abstract
The Library of Integrated Network-based Cellular Signatures (LINCS) program is a national consortium funded by the NIH to generate a diverse and extensive reference library of cell-based perturbation-response signatures, along with novel data analytics tools to improve our understanding of human diseases at the systems level. In contrast to other large-scale data generation efforts, LINCS Data and Signature Generation Centers (DSGCs) employ a wide range of assay technologies cataloging diverse cellular responses. Integration of, and unified access to LINCS data has therefore been particularly challenging. The Big Data to Knowledge (BD2K) LINCS Data Coordination and Integration Center (DCIC) has developed data standards specifications, data processing pipelines, and a suite of end-user software tools to integrate and annotate LINCS-generated data, to make LINCS signatures searchable and usable for different types of users. Here, we describe the LINCS Data Portal (LDP) (http://lincsportal.ccs.miami.edu/), a unified web interface to access datasets generated by the LINCS DSGCs, and its underlying database, LINCS Data Registry (LDR). LINCS data served on the LDP contains extensive metadata and curated annotations. We highlight the features of the LDP user interface that is designed to enable search, browsing, exploration, download and analysis of LINCS data and related curated content.
Affiliation(s)
- Amar Koleti
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Raymond Terryn
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Vasileios Stathias
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA.,Department of Human Genetics and Genomics, Miller School of Medicine, University of Miami, FL, USA
| | - Caty Chung
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Daniel J Cooper
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - John P Turner
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Dušica Vidovic
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Michele Forlin
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Tanya T Kelley
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
| | - Alessandro D'Urso
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
| | - Bryce K Allen
- Center for Computational Science, University of Miami, FL, USA.,BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA.,Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
- Denis Torre
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kathleen M Jagodnik
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lily Wang
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sherry L Jenkins
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Christopher Mader
- Center for Computational Science, University of Miami, FL, USA; BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA
- Wen Niu
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Mehdi Fazel
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Naim Mahi
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Marcin Pilarczyk
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Nicholas Clark
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Behrouz Shamsaei
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Jarek Meller
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Juozas Vasiliauskas
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- John Reichard
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Mario Medvedovic
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati, Cincinnati, OH, USA
- Avi Ma'ayan
- BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ajay Pillai
- Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Stephan C Schürer
- Center for Computational Science, University of Miami, FL, USA; BD2K LINCS Data Coordination and Integration Center, Icahn School of Medicine at Mount Sinai, University of Miami, University of Cincinnati, New York NY, Miami FL, Cincinnati OH, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, FL, USA
|
50
|
T'Joen V, Vaneeckhaute L, Priem S, Van Woensel S, Bekaert S, Berneel E, Van Der Straeten C. Rationalized Development of a Campus-Wide Cell Line Dataset for Implementation in the Biobank LIMS System at Bioresource Center Ghent. Front Med (Lausanne) 2019; 6:137. [PMID: 31294023 PMCID: PMC6603147 DOI: 10.3389/fmed.2019.00137] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/28/2019] [Accepted: 06/04/2019] [Indexed: 11/13/2022]
Abstract
The Bioresource Center Ghent is the central hospital-integrated biobank of Ghent University Hospital. Our mission is to facilitate translational biomedical research by collecting, storing, and providing high-quality biospecimens to researchers. Several of our biobank partners store large numbers of cell lines. As cell lines are highly important in both basic research and preclinical screening phases, good annotation, authentication, and quality of these cell lines are pivotal in translational biomedical science. A Biobank Information Management System (BIMS) was implemented as the sample and data management system for human bodily material. The samples are annotated using defined datasets based on the BRISQ (Biospecimen Reporting for Improved Study Quality) and MIABIS (Minimum Information About Biobank data Sharing) guidelines, complemented with SPREC (Standard PREanalytical Coding) information. However, the defined dataset for human bodily material is not ideal for capturing cell line-specific data. Therefore, we set out to develop a rationalized cell line dataset. By comparing the datasets of different online cell banks (human, animal, and stem cell), we established an extended cell line dataset of 156 data fields, which was further analyzed and reduced to a smaller survey dataset of 54 data fields. The survey dataset was distributed across our campus to all cell line users to rationalize the fields of the dataset and their potential use. Analysis of the survey data revealed only small differences in data field preferences between human, animal, and stem cell lines. Hence, one essential dataset of 33 data fields was compiled for human, animal, and stem cell lines. The essential dataset was prepared for implementation in our BIMS. Good Clinical Data Management Practices formed the basis of our decisions in the implementation phase.
Known standards, reference lists, and ontologies (such as ICD-10-CM, animal taxonomy, cell line ontology…) were considered. The semantics of the data fields were clearly defined, enhancing the data quality of the stored cell lines. We thus created an essential cell line dataset with defined data fields, usable by multiple cell line users.
Affiliation(s)
- Veronique T'Joen
- Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Lieven Vaneeckhaute
- Data Management Unit, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Sara Priem
- Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Steven Van Woensel
- Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Sofie Bekaert
- Department of Public Health and Primary Care, Faculty for Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Elke Berneel
- Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
|