1
O’Cathail C, Ahamed A, Burgin J, Cummins C, Devaraj R, Gueye K, Gupta D, Gupta V, Haseeb M, Ihsan M, Ivanov E, Jayathilaka S, Kadhirvelu V, Kumar M, Lathi A, Leinonen R, McKinnon J, Meszaros L, Pauperio J, Pesant S, Rahman N, Rinck G, Selvakumar S, Suman S, Sunthornyotin Y, Ventouratou M, Waheed Z, Woollard P, Yuan D, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2024. Nucleic Acids Res 2025; 53:D49-D55. [PMID: 39558171 PMCID: PMC11701661 DOI: 10.1093/nar/gkae975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 10/08/2024] [Accepted: 10/14/2024] [Indexed: 11/20/2024]
Abstract
The European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena), maintained at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), provides freely accessible services, both for deposition of, and access to, open nucleotide sequencing data. Open scientific data are of paramount importance to the scientific community and contribute daily to the acceleration of scientific advance. Outlined here are changes to and updates on the ENA service in 2024, aligning with the broad goals of enhancing interoperability, globalisation of the service and scaling the platform to meet current and future needs.
Affiliation(s)
- Colman O’Cathail
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Alisha Ahamed
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Josephine Burgin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Carla Cummins
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Rajkumar Devaraj
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Khadim Gueye
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Dipayan Gupta
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Vikas Gupta
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Muhammad Haseeb
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Maira Ihsan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Eugene Ivanov
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Suran Jayathilaka
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Vishnukumar Kadhirvelu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Manish Kumar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ankur Lathi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Rasko Leinonen
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jasmine McKinnon
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Lili Meszaros
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Joana Pauperio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Stephane Pesant
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nadim Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gabriele Rinck
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sandeep Selvakumar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Swati Suman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Yanisa Sunthornyotin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marianna Ventouratou
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Zahra Waheed
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Peter Woollard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Yuan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ahmad Zyoud
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tony Burdett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Guy Cochrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
2
Perez-Riverol Y, Bandla C, Kundu D, Kamatchinathan S, Bai J, Hewapathirana S, John N, Prakash A, Walzer M, Wang S, Vizcaíno J. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res 2025; 53:D543-D553. [PMID: 39494541 PMCID: PMC11701690 DOI: 10.1093/nar/gkae1011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/05/2024]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools for the last three years. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 534 datasets per month. This has been possible thanks to continuous improvements in infrastructure such as a new file transfer protocol for very large datasets (Globus), a new data resubmission pipeline and an automatic dataset validation process. Additionally, we will highlight novel activities such as the availability of the PRIDE chatbot (based on the use of open-source Large Language Models), and our work to improve support for MS crosslinking datasets. Furthermore, we will describe how we have increased our efforts to reuse, reanalyze and disseminate high-quality proteomics data into added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deepti J Kundu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Selvakumar Kamatchinathan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jingwen Bai
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Suresh Hewapathirana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nithu Sara John
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Ananth Prakash
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shengbo Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
3
Thakur M, Brooksbank C, Finn R, Firth H, Foreman J, Freeberg M, Gurwitz K, Harrison M, Hulcoop D, Hunt S, Leach AR, Levchenko M, Marques D, McDonagh E, Mithani A, Parkinson H, Perez-Riverol Y, Perova Z, Sarkans U, Tirunagari S, Tzampatzopoulou E, Venkatesan A, Vizcaino JA, Wingfield B, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2024. Nucleic Acids Res 2025; 53:D10-D19. [PMID: 39607697 PMCID: PMC11701561 DOI: 10.1093/nar/gkae1089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/16/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024]
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory, Europe's only intergovernmental life sciences organization. This overview summarizes the latest developments in services that EMBL-EBI data resources provide to scientific communities globally (https://www.ebi.ac.uk/services).
Affiliation(s)
- Matthew Thakur
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Catherine Brooksbank
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Helen V Firth
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Cambridge University Hospitals NHS Foundation Trust, East Anglian Medical Genetics Service, Hills Road, Cambridge, CB2 0QQ, UK
- Julia Foreman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mallory Freeberg
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Kim T Gurwitz
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Melissa Harrison
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- David Hulcoop
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Andrew R. Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mariia Levchenko
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Diana Marques
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Ellen M McDonagh
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Aziz Mithani
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Zinaida Perova
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Ugis Sarkans
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Santosh Tirunagari
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Eleni Tzampatzopoulou
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Aravind Venkatesan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Juan-Antonio Vizcaino
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Benjamin Wingfield
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Barbara Zdrazil
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Johanna McEntyre
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SA, UK
4
Ross KE, Bastian FB, Buys M, Cook CE, D’Eustachio P, Harrison M, Hermjakob H, Li D, Lord P, Natale DA, Peters B, Sternberg PW, Su AI, Thakur M, Thomas PD, Bateman A. Perspectives on tracking data reuse across biodata resources. Bioinformatics Advances 2024; 4:vbae057. [PMID: 38721398 PMCID: PMC11076920 DOI: 10.1093/bioadv/vbae057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 03/13/2024] [Accepted: 04/11/2024] [Indexed: 06/14/2024]
Abstract
Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Availability and implementation: Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
Affiliation(s)
- Karen E Ross
  - Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Frederic B Bastian
  - Evolutionary Bioinformatics Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Peter D’Eustachio
  - Department of Biochemistry & Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10012, United States
- Melissa Harrison
  - Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Henning Hermjakob
  - Molecular Systems, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Donghui Li
  - Chan Zuckerberg Initiative, Redwood City, CA 94063, United States
- Phillip Lord
  - School of Computing, Newcastle University, Newcastle upon Tyne NE4 5TG, United Kingdom
- Darren A Natale
  - Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Bjoern Peters
  - Center for Vaccine Innovation, La Jolla Institute of Immunology, La Jolla, CA 92037, United States
- Paul W Sternberg
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Matthew Thakur
  - Data Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90089, United States
- Alex Bateman
  - MSCB, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
5
Joseph SC, Eugin Simon S, Bohm MS, Kim M, Pye ME, Simmons BW, Graves DG, Thomas-Gooch SM, Tanveer UA, Holt JR, Ponnusamy S, Sipe LM, Hayes DN, Cook KL, Narayanan R, Pierre JF, Makowski L. FXR Agonism with Bile Acid Mimetic Reduces Pre-Clinical Triple-Negative Breast Cancer Burden. Cancers (Basel) 2024; 16:1368. [PMID: 38611046 PMCID: PMC11011133 DOI: 10.3390/cancers16071368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/20/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]
Abstract
Bariatric surgery is associated with improved outcomes for several cancers, including breast cancer (BC), although the mechanisms mediating this protection are unknown. We hypothesized that elevated bile acid pools detected after bariatric surgery may be factors that contribute to improved BC outcomes. Patients with greater expression of the bile acid receptor FXR displayed improved survival in specific aggressive BC subtypes. FXR is a nuclear hormone receptor activated by primary bile acids. Therefore, we posited that activating FXR using an established FDA-approved agonist would induce anticancer effects. Using in vivo and in vitro approaches, we determined the anti-tumor potential of bile acid receptor agonism. Indeed, FXR agonism by the bile acid mimetic known commercially as Ocaliva ("OCA"), or Obeticholic acid (INT-747), significantly reduced BC progression and overall tumor burden in a pre-clinical model. The transcriptomic analysis of tumors in mice subjected to OCA treatment revealed differential gene expression patterns compared to vehicle controls. Notably, there was a significant down-regulation of the oncogenic transcription factor MAX (MYC-associated factor X), which interacts with the oncogene MYC. Gene set enrichment analysis (GSEA) further demonstrated a statistically significant downregulation of the Hallmark MYC-related gene set (MYC Target V1) following OCA treatment. In human and murine BC analyses in vitro, agonism of FXR significantly and dose-dependently inhibited proliferation, migration, and viability. In contrast, the synthetic agonism of another common bile acid receptor, the G protein-coupled bile acid receptor TGR5 (GPBAR1), which is mainly activated by secondary bile acids, failed to significantly alter cancer cell dynamics. In conclusion, agonism of FXR by the primary bile acid mimetic OCA yields potent anti-tumor effects, potentially through inhibition of proliferation and migration and reduced cell viability. These findings suggest that FXR is a tumor suppressor gene with a high potential for use in personalized therapeutic strategies for individuals with BC.
Affiliation(s)
- Sydney C. Joseph
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Samson Eugin Simon
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Margaret S. Bohm
  - Department of Microbiology, Immunology and Biochemistry, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Minjeong Kim
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Madeline E. Pye
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Boston W. Simmons
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Dillon G. Graves
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Stacey M. Thomas-Gooch
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Ubaid A. Tanveer
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Jeremiah R. Holt
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Suriyan Ponnusamy
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Laura M. Sipe
  - Department of Biological Sciences, University of Mary Washington, Fredericksburg, VA 22401, USA
- D. Neil Hayes
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - UTHSC Center for Cancer Research, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Katherine L. Cook
  - Department of Cancer Biology, Wake Forest University School of Medicine, Winston Salem, NC 27157, USA
- Ramesh Narayanan
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - UTHSC Center for Cancer Research, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Joseph F. Pierre
  - Department of Nutritional Sciences, College of Agricultural and Life Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Liza Makowski
  - Department of Medicine, Division of Hematology and Oncology, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Department of Microbiology, Immunology and Biochemistry, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - UTHSC Center for Cancer Research, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
6
Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera J, Magarinos M, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento A, Adasme M, Monecke P, Landrum G, Leach A. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 2024; 52:D1180-D1192. [PMID: 37933841 PMCID: PMC10767899 DOI: 10.1093/nar/gkad1004] [Citation(s) in RCA: 266] [Impact Index Per Article: 266.0] [Received: 09/09/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023]
Abstract
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
Affiliation(s)
- Barbara Zdrazil
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eloy Felix
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Fiona Hunter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Emma J Manners
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - James Blackshaw
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sybilla Corbett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Marleen de Veij
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Harris Ioannidis
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - David Mendez Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Juan F Mosquera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Maria Paula Magarinos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Nicolas Bosc
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Ricardo Arcila
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Tevfik Kizilören
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - A Patrícia Bento
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Melissa F Adasme
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Peter Monecke
- Sanofi, R&D, Preclinical Safety, Industriepark Höchst, 65926 Frankfurt am Main, Germany
| | - Gregory A Landrum
- Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
| | - Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| |
|
7
|
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 184] [Impact Index Per Article: 184.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023]
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Affiliation(s)
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
| | - Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dionysios Grigoriadis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tatiana A Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| |
|
8
|
Yuan D, Ahamed A, Burgin J, Cummins C, Devraj R, Gueye K, Gupta D, Gupta V, Haseeb M, Ihsan M, Ivanov E, Jayathilaka S, Kadhirvelu VB, Kumar M, Lathi A, Leinonen R, McKinnon J, Meszaros L, O’Cathail C, Ouma D, Paupério J, Pesant S, Rahman N, Rinck G, Selvakumar S, Suman S, Sunthornyotin Y, Ventouratou M, Vijayaraja S, Waheed Z, Woollard P, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2023. Nucleic Acids Res 2024; 52:D92-D97. [PMID: 37956313 PMCID: PMC10767888 DOI: 10.1093/nar/gkad1067] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 10/18/2023] [Revised: 10/23/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.
Affiliation(s)
- David Yuan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alisha Ahamed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Josephine Burgin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rajkumar Devraj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Khadim Gueye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dipayan Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vikas Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Muhammad Haseeb
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Maira Ihsan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Eugene Ivanov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suran Jayathilaka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Manish Kumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ankur Lathi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rasko Leinonen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jasmine McKinnon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Lili Meszaros
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Colman O’Cathail
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dennis Ouma
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Joana Paupério
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stephane Pesant
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nadim Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Gabriele Rinck
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sandeep Selvakumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Swati Suman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yanisa Sunthornyotin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marianna Ventouratou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Senthilnathan Vijayaraja
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahra Waheed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter Woollard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ahmad Zyoud
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
9
|
Imker HJ, Schackart KE, Istrate AM, Cook CE. A machine learning-enabled open biodata resource inventory from the scientific literature. PLoS One 2023; 18:e0294812. [PMID: 38015968 PMCID: PMC10684096 DOI: 10.1371/journal.pone.0294812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/07/2023] [Indexed: 11/30/2023]
Abstract
Modern biological research depends on data resources. These resources archive difficult-to-reproduce data and provide added-value aggregation, curation, and analyses. Collectively, they constitute a global infrastructure of biodata resources. While the organic proliferation of biodata resources has enabled incredible research, sustained support for the individual resources that make up this distributed infrastructure is a challenge. The Global Biodata Coalition (GBC) was established by research funders in part to aid in developing sustainable funding strategies for biodata resources. An important component of this work is understanding the scope of the resource infrastructure: how many biodata resources there are, where they are, and how they are supported. Existing registries require self-registration and/or extensive curation, and we sought to develop a method for assembling a global inventory of biodata resources that could be periodically updated with minimal human intervention. The approach we developed identifies biodata resources using open data from the scientific literature. Specifically, we used a machine learning-enabled natural language processing approach to identify biodata resources from titles and abstracts of life sciences publications contained in Europe PMC. Pretrained BERT (Bidirectional Encoder Representations from Transformers) models were fine-tuned to classify publications as describing a biodata resource or not and to predict the resource name using named entity recognition. To improve the quality of the resulting inventory, low-confidence predictions and potential duplicates were manually reviewed. Further information about the resources was then obtained using article metadata, such as funder and geolocation information. These efforts yielded an inventory of 3112 unique biodata resources based on articles published from 2011-2021. The code was developed to facilitate reuse and includes automated pipelines. All products of this effort are released under permissive licensing, including the biodata resource inventory itself (CC0) and all associated code (BSD/MIT).
Affiliation(s)
- Heidi J. Imker
- Global Biodata Coalition, Strasbourg, France
- University Library, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Kenneth E. Schackart
- Global Biodata Coalition, Strasbourg, France
- Department of Biosystems Engineering, The University of Arizona, Tucson, Arizona, United States of America
| | - Ana-Maria Istrate
- Chan Zuckerberg Initiative, Redwood City, California, United States of America
| | | |
|
10
|
Xu F, Juty N, Goble C, Jupp S, Parkinson H, Courtot M. Features of a FAIR vocabulary. J Biomed Semantics 2023; 14:6. [PMID: 37264430 DOI: 10.1186/s13326-023-00286-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/18/2022] [Accepted: 04/27/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND: The Findable, Accessible, Interoperable and Reusable (FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify features of FAIR vocabularies, and provide assessment approaches against the features can guide the development of vocabularies. RESULTS: We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features (FVFs). We also design assessment approaches for FAIR vocabularies by mapping the FVFs to existing FAIR assessment indicators. Finally, we demonstrate how they can be used for evaluating and improving vocabularies using exemplary biomedical vocabularies. CONCLUSIONS: Our work proposes features of FAIR vocabularies and corresponding indicators for assessing the FAIR levels of different types of vocabularies, identifies use cases for vocabulary engineers, and guides the evolution of vocabularies.
Affiliation(s)
- Fuqi Xu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
| | - Nick Juty
- The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
| | - Carole Goble
- The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
| | - Simon Jupp
- SciBite BioData Innovation Centre, Wellcome Genome Campus, Hinxton, CB10 1DR, UK
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
| | - Mélanie Courtot
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK.
- Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, M5G 0A3, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada.
| |
|
11
|
Gago F. Computational Approaches to Enzyme Inhibition by Marine Natural Products in the Search for New Drugs. Mar Drugs 2023; 21:100. [PMID: 36827141 PMCID: PMC9961086 DOI: 10.3390/md21020100] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/21/2022] [Revised: 01/26/2023] [Accepted: 01/28/2023] [Indexed: 02/03/2023] Open
Abstract
The exploration of biologically relevant chemical space for the discovery of small bioactive molecules present in marine organisms has led not only to important advances in certain therapeutic areas, but also to a better understanding of many life processes. The still largely untapped reservoir of countless metabolites that play biological roles in marine invertebrates and microorganisms opens new avenues and poses new challenges for research. Computational technologies provide the means to (i) organize chemical and biological information in easily searchable and hyperlinked databases and knowledgebases; (ii) carry out cheminformatic analyses on natural products; (iii) mine microbial genomes for known and cryptic biosynthetic pathways; (iv) explore global networks that connect active compounds to their targets (often including enzymes); (v) solve structures of ligands, targets, and their respective complexes using X-ray crystallography and NMR techniques, thus enabling virtual screening and structure-based drug design; and (vi) build molecular models to simulate ligand binding and understand mechanisms of action in atomic detail. Marine natural products are viewed today not only as potential drugs, but also as an invaluable source of chemical inspiration for the development of novel chemotypes to be used in chemical biology and medicinal chemistry research.
Affiliation(s)
- Federico Gago
- Department of Biomedical Sciences & IQM-CSIC Associate Unit, School of Medicine and Health Sciences, University of Alcalá, E-28805 Madrid, Alcalá de Henares, Spain
| |
|
12
|
Hooft RW, Harrison E, Martin CS. The road to success: drawing parallels between 'road' and 'research data' infrastructures to foster understanding between service providers, funders and policymakers. F1000Res 2023; 12:ELIXIR-88. [PMID: 37065508 PMCID: PMC10102711 DOI: 10.12688/f1000research.128167.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/13/2022] [Indexed: 01/24/2023] Open
Abstract
Background: The work of data research infrastructure operators is poorly understood, yet the services they provide are used by millions of scientists across the planet. Policy and implications: As the data services and the underlying infrastructure are typically funded through the public purse, it is essential that policymakers, research funders, experts reviewing funding proposals, and possibly even end-users are equipped with a good understanding of the daily tasks of service providers. Recommendations: We suggest drawing parallels between research data infrastructure and road infrastructure. To trigger the imagination and foster understanding, this policy brief contains a table of corresponding aspects of the two classes of infrastructure. Conclusions: Just as economists and specialist evaluators are typically brought in to inform policies and funding decisions for road infrastructure, we encourage this to also be done for research infrastructures.
Affiliation(s)
- Rob W.W. Hooft
- Dutch Techcentre for Life Sciences, Utrecht, 3521 AL, The Netherlands
| | | | | |
|
13
|
Hooft RW, Harrison E, Martin CS. The road to success: drawing parallels between 'road' and 'research data' infrastructures to foster understanding between service providers, funders and policymakers. F1000Res 2023; 12:ELIXIR-88. [PMID: 37065508 PMCID: PMC10102711 DOI: 10.12688/f1000research.128167.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/18/2023] [Indexed: 08/25/2023] Open
Abstract
Background: The work of data research infrastructure operators is poorly understood, yet the services they provide are used by millions of scientists across the planet. Policy and implications: As the data services and the underlying infrastructure are typically funded through the public purse, it is essential that policymakers, research funders, experts reviewing funding proposals, and possibly even end-users are equipped with a good understanding of the daily tasks of service providers. Recommendations: We suggest drawing parallels between research data infrastructure and road infrastructure. To trigger the imagination and foster understanding, this policy brief contains a table of corresponding aspects of the two classes of infrastructure, and a table of policy implications. Conclusions: Just as economists and specialist evaluators are typically brought in to inform policies and funding decisions for road infrastructure, we encourage this to also be done for research infrastructures.
Affiliation(s)
- Rob W.W. Hooft
- Dutch Techcentre for Life Sciences, Utrecht, 3521 AL, The Netherlands
| | | | | |
|
14
|
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. PLANT PHYSIOLOGY 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas, that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
| | - Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
| | | | | | - Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
| | - Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
| | - Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
| | - Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
| | - Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
| | - Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
| | - Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
| |
|
15
|
Pereira A, Almeida JR, Lopes RP, Oliveira JL. Querying semantic catalogues of biomedical databases. J Biomed Inform 2023; 137:104272. [PMID: 36563828 DOI: 10.1016/j.jbi.2022.104272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/10/2022] [Revised: 11/03/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for performing a successful observational study is the research question and the approach in advance of executing a study. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset. METHODS We propose a strategy for retrieving biomedical datasets of interest that were semantically annotated, using an interface built by applying a methodology for transforming natural language questions into formal language queries. The advantages of creating biomedical semantic data are enhanced by using natural language interfaces to issue complex queries without manipulating a logical query language. RESULTS Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted data to semantic information format using biomedical ontologies in everyday use in the biomedical community and published it as a FAIR endpoint. We have considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance of the question-answering module we used and its limitations. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/. CONCLUSION We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method uses natural language to formulate questions to be answered by this semantic data without the direct use of formal query languages.
Affiliation(s)
| | - João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
| | - Rui Pedro Lopes
- CeDRI, Polytechnic Institute of Bragança, Bragança, Portugal.
| | | |
|
16
|
Karp PD. Reviewing knowledgebase and database grant proposals in the life sciences: the role of innovation. Database (Oxford) 2022; 2022:6909819. [PMID: 36520791 PMCID: PMC9753974 DOI: 10.1093/database/baac106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2022] [Revised: 11/11/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022]
Abstract
This article offers thoughts on reviewing grant proposals for biological knowledgebases and databases (KDs) in the hope of aiding grant reviewers and applicants in addressing the issue of innovation. Assessing such grant proposals involves a number of subtleties that are worthy of discussion, particularly for new reviewers and applicants. In part, this article is motivated by the release of two funding opportunity announcements by the US National Institutes of Health concerning KDs. We find that the amount of innovation required for different KD projects can vary significantly, particularly depending on where in its life cycle a given project is. Strong innovation is not necessarily required to have an impactful KD project. For example, PubMed has low innovation but high impact. The importance of innovation should be weighted differently for different KD projects depending on the challenges they face and their maturity. The score for the overall impact of a grant proposal might have little dependence on the innovation score, such as for a mature project that is already delivering strong impact.
Affiliation(s)
- Peter D Karp
- *Corresponding author: Tel: +650-859-4358; Fax: +650-859-3735;
| |
|
17
|
Laufs D, Peters M, Schultz C. Data platforms for open life sciences-A systematic analysis of management instruments. PLoS One 2022; 17:e0276204. [PMID: 36282849 PMCID: PMC9595524 DOI: 10.1371/journal.pone.0276204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/02/2022] [Indexed: 11/05/2022] Open
Abstract
Open data platforms are interfaces between data demand of and supply from their users. Yet, data platform providers frequently struggle to aggregate data to suit their users' needs and to establish a high intensity of data exchange in a collaborative environment. Here, using open life science data platforms as an example for a diverse data structure, we systematically categorize these platforms based on their technology intermediation and the range of domains they cover to derive general and specific success factors for their management instruments. Our qualitative content analysis is based on 39 in-depth interviews with experts employed by data platforms and external stakeholders. We thus complement peer initiatives which focus solely on data quality, by additionally highlighting the data platforms' role to enable data utilization for innovative output. Based on our analysis, we propose a clearly structured and detailed guideline for seven management instruments. This guideline helps to establish and operationalize data platforms and to best exploit the data provided. Our findings support further exploitation of the open innovation potential in the life sciences and beyond.
Affiliation(s)
- Daniel Laufs
- Technology Management Research Group, Faculty of Business, Economics and Social Sciences, Kiel University, Kiel, SH, Germany
| | - Mareike Peters
- Technology Management Research Group, Faculty of Business, Economics and Social Sciences, Kiel University, Kiel, SH, Germany
| | - Carsten Schultz
- Technology Management Research Group, Faculty of Business, Economics and Social Sciences, Kiel University, Kiel, SH, Germany
| |
|
18
|
BRIDE v2: A Validated Collection of Genes Involved in the Mammalian Brain Response to Low-Dose Ionizing Radiation. RADIATION 2022. [DOI: 10.3390/radiation2040024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
There is significant interest in the response of the mammalian brain to low-dose ionizing radiation (LDIR), mainly examined by gene or protein expression, with applications in radiation safety on Earth, the atmosphere and outer space. Potential associations of molecular-level responses with sensory or cognitive defects and neurodegenerative diseases are currently under investigation. Previously, we have described a light-weight approach for the storage, analysis and distribution of relevant datasets, with the platform BRIDE. We have re-implemented the platform as BRIDE v2 on the cloud, using the bioinformatics infrastructure ELIXIR. We connected the annotated list of 3174 unique gene records with modern omics resources for downstream computational analysis. BRIDE v2 is a cloud-based platform with capabilities that enable researchers to extract, analyze, visualize as well as export the gene collection. The resource is freely available online at <http://bride-db.eu>.
|
19
|
Mini-review: Recent advances in post-translational modification site prediction based on deep learning. Comput Struct Biotechnol J 2022; 20:3522-3532. [PMID: 35860402 PMCID: PMC9284371 DOI: 10.1016/j.csbj.2022.06.045] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/11/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022] Open
Abstract
Post-translational modifications (PTMs) are closely linked to numerous diseases, playing a significant role in regulating protein structures, activities, and functions. Therefore, the identification of PTMs is crucial for understanding the mechanisms of cell biology and disease therapy. Compared to traditional machine learning methods, the deep learning approaches for PTM prediction provide accurate and rapid screening, guiding the downstream wet experiments to leverage the screened information for focused studies. In this paper, we reviewed the recent works in deep learning to identify phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarized PTM databases and discussed future directions with critical insights.
Key Words
- AAindex, Amino acid index
- ATP, Adenosine triphosphate
- AUC, Area under curve
- Ac, Acetylation
- BE, Binary encoding
- BLOSUM, Blocks substitution matrix
- Bi-LSTM, Bidirectional LSTM
- CKSAAP, Composition of k-spaced amino acid Pairs
- CNN, Convolutional neural network
- CNNOH, CNN with the one-hot encoding
- CNNWE, CNN with the word-embedding encoding
- CNNrgb, CNN red green blue
- CV, Cross-validation
- DC-CNN, Densely connected convolutional neural network
- DL, Deep learning
- DNNs, Deep neural networks
- Deep learning
- E. coli, Escherichia coli
- EBGW, Encoding based on grouped weight
- EGAAC, Enhanced grouped amino acids content
- IG, Information gain
- K, Lysine
- KNN, k nearest neighbor
- LASSO, Least absolute shrinkage and selection operator
- LSTM, Long short-term memory
- LSTMWE, LSTM with the word-embedding encoding
- M.musculus, Mus musculus
- MDC, Modular densely connected convolutional networks
- MDCAN, Multilane dense convolutional attention network
- ML, Machine learning
- MLP, Multilayer perceptron
- MMI, Multivariate mutual information
- Machine learning
- Mass spectrometry
- NMBroto, Normalized Moreau-Broto autocorrelation
- P, Proline
- PSP, PhosphoSitePlus
- PSSM, Position-specific scoring matrix
- PTM, Post-translational modifications
- Ph, Phosphorylation
- Post-translational modification
- Prediction
- PseAAC, Pseudo-amino acid composition
- R, Arginine
- RF, Random forest
- RNN, Recurrent neural network
- ROC, Receiver operating characteristic
- S, Serine
- S. typhimurium, Salmonella typhimurium
- S.cerevisiae, Saccharomyces cerevisiae
- SE, Squeeze and excitation
- SEV, Split to Equal Validation
- ST, Source and target
- SUMO, Small ubiquitin-like modifier
- SVM, Support vector machines
- T, Threonine
- Ub, Ubiquitination
- Y, Tyrosine
- ZSL, Zero-shot learning
|
20
|
Gabella C, Duvaud S, Durinx C. Managing the life cycle of a portfolio of open data resources at the SIB Swiss Institute of Bioinformatics. Brief Bioinform 2022; 23:bbab478. [PMID: 34850820 PMCID: PMC8769900 DOI: 10.1093/bib/bbab478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 11/23/2022] Open
Abstract
Data resources are essential for the long-term preservation of scientific data and the reproducibility of science. The SIB Swiss Institute of Bioinformatics provides the life science community with a portfolio of openly accessible, high-quality databases and software platforms, which vary from expert-curated knowledgebases, such as UniProtKB/Swiss-Prot (part of the UniProt consortium) and STRING, to online platforms such as SWISS-MODEL and SwissDrugDesign. SIB's mission is to ensure that these resources are available in the long term, as long as their return on investment and their scientific impact are high. To this end, SIB provides its resources, in addition to stable financial support, with a range of high-quality, innovative services that are, to our knowledge, unique in the field. Through this first-class management framework with central services, such as user-centric consulting activities, legal support, open-science guidance, knowledge sharing and training efforts, SIB supports the promotion of excellence in resource development and operation. This review presents the ecosystem of data resources at SIB; the process used for the identification, evaluation and development of resources; and the support activities that SIB provides. A set of indicators has been put in place to select the resources and establish quality standards, reflecting their multifaceted nature and complexity. Through this paper, the reader will discover how SIB's leading tools and databases are fostered by the institute, leading them to be best-in-class resources able to tackle the burning matters that society faces from disease outbreaks and cancer to biodiversity and open science.
Affiliation(s)
- Chiara Gabella
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
| | - Severine Duvaud
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
| | - Christine Durinx
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
| |
|
21
|
Bansal P, Morgat A, Axelsen KB, Muthukrishnan V, Coudert E, Aimo L, Hyka-Nouspikel N, Gasteiger E, Kerhornou A, Neto TB, Pozzato M, Blatter MC, Ignatchenko A, Redaschi N, Bridge A. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res 2022; 50:D693-D700. [PMID: 34755880 PMCID: PMC8728268 DOI: 10.1093/nar/gkab1016] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 09/14/2021] [Revised: 10/08/2021] [Accepted: 11/09/2021] [Indexed: 12/15/2022] Open
Abstract
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.
Affiliation(s)
- Parit Bansal
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Kristian B Axelsen
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Venkatesh Muthukrishnan
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Elisabeth Coudert
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Lucila Aimo
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Nevila Hyka-Nouspikel
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Elisabeth Gasteiger
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Arnaud Kerhornou
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Teresa Batista Neto
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Monica Pozzato
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Marie-Claude Blatter
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Alex Ignatchenko
- EMBL-EBI European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicole Redaschi
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Alan Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| |
|
22
|
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Citation(s) in RCA: 3952] [Impact Index Per Article: 1317.3] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anika Frericks-Zipper
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
| | - Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
23
|
Zadissa A, Apweiler R. Data Mining, Quality and Management in the Life Sciences. Methods Mol Biol 2022; 2449:3-25. [PMID: 35507257 DOI: 10.1007/978-1-0716-2095-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
With ever more emphasis placed on open science and its invaluable benefits to the scientific community, a research project no longer simply ends with a scientific publication. The benefits of data sharing and reproducibility of results have taken center stage in life science research, supported by FAIR principles that firmly underline the importance of open data. Current data-intensive multidisciplinary research has also highlighted the significance of how data is mined and managed. Here we describe some of the features adopted by EMBL-EBI data resources to support data mining, data quality, and data management. We also highlight how EMBL-EBI has responded to the current pandemic through its data resources.
Affiliation(s)
- Amonida Zadissa
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Rolf Apweiler
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
|
24
|
Porras P, Orchard S, Licata L. IMEx Databases: Displaying Molecular Interactions into a Single, Standards-Compliant Dataset. Methods Mol Biol 2022; 2449:27-42. [PMID: 35507258 DOI: 10.1007/978-1-0716-2095-3_2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/14/2023]
Abstract
Molecular interaction databases aim to systematically capture and organize the experimental interaction information described in the scientific literature. These data can then be used to perform network analysis, to assign putative roles to uncharacterized proteins and to investigate their involvement in cellular pathways. This chapter gives a brief overview of publicly available molecular interaction databases and focuses on the members of the IMEx Consortium, on their curation policies and standard data formats. All of the goals achieved by IMEx databases over the last 15 years, the data types provided and the many different ways in which such data can be utilized by the research community, are described in detail. The IMEx databases curate molecular interaction data to the highest caliber, following a detailed curation model and supplying rich metadata by employing common curation rules and harmonized standards. The IMEx Consortium provides comprehensively annotated molecular interaction data integrated into a single, non-redundant, open access dataset.
Affiliation(s)
- Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
|
25
|
Chatterjee A, Swierstra T, Kuiper M. Dealing with different conceptions of pollution in the Gene Regulation Knowledge Commons. Biochim Biophys Acta Gene Regul Mech 2022; 1865:194779. [PMID: 34971789 DOI: 10.1016/j.bbagrm.2021.194779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2021] [Revised: 11/28/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
Current research of gene regulatory mechanisms is increasingly dependent on the availability of high-quality information from manually curated databases. Biocurators undertake the task of extracting knowledge claims from scholarly publications, organizing these claims in a meaningful format and making them computable. In doing so, they enhance the value of existing scientific knowledge by making it accessible to the users of their databases. In this capacity, biocurators are well positioned to identify and weed out information that is of insufficient quality. The criteria that define information quality are typically outlined in curation guidelines developed by biocurators. These guidelines have been prudently developed to reflect the needs of the user community the database caters to. The guidelines depict the standard evidence that this community recognizes as sufficient justification for trustworthy data. Additionally, these guidelines determine the process by which data should be organized and maintained to be valuable to users. Following these guidelines, biocurators assess the quality, reliability, and validity of the information they encounter. In this article we explore to what extent different use cases agree with the inclusion criteria that define positive and negative data, implemented by the database. What are the drawbacks to users who have queries that would be well served by results that fall just short of the criteria used by a database? Finally, how can databases (and biocurators) accommodate the needs of such more explorative use cases?
Affiliation(s)
- Anamika Chatterjee
- Department of Philosophy and Religious Studies, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Tsjalling Swierstra
- Department of Philosophy, Maastricht University, Maastricht, the Netherlands
| | - Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
|
26
|
Ming J, Sana SRGL, Deng X. Identification of copper-related biomarkers and potential molecule mechanism in diabetic nephropathy. Front Endocrinol (Lausanne) 2022; 13:978601. [PMID: 36329882 PMCID: PMC9623046 DOI: 10.3389/fendo.2022.978601] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/26/2022] [Accepted: 10/05/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Diabetic nephropathy (DN) is a chronic microvascular complication in patients with diabetes mellitus, which is the leading cause of end-stage renal disease. However, the role of copper-related genes (CRGs) in DN development remains unclear.
MATERIALS AND METHODS CRGs were acquired from the GeneCards and NCBI databases. Based on the GSE96804 and GSE111154 datasets from the GEO repository, we identified hub CRGs for DN progression by taking the intersection of differentially expressed CRGs (DECRGs) and genes in the key module from Weighted Gene Co-expression Network Analysis. The Maximal Clique Centrality algorithm was used to identify the key CRGs from hub CRGs. Transcriptional factors (TFs) and microRNAs (miRNAs) targeting hub CRGs were acquired from publicly available databases. The CIBERSORT algorithm was used to perform comparative immune cell infiltration analysis between normal and DN samples.
RESULTS Eighty-two DECRGs were identified between normal and DN samples, as were 10 hub CRGs, namely PTGS2, DUSP1, JUN, FOS, S100A8, S100A12, NAIP, CLEC4E, CXCR1, and CXCR2. Thirty-nine TFs and 165 miRNAs potentially targeted these 10 hub CRGs. PTGS2 was identified as the key CRG and FOS as the most significant gene among all of DECRGs. RELA was identified as the hub TF interacting with PTGS2 by taking the intersection of potential TFs from the ChEA and JASPAR public databases. let-7b-5p was identified as the hub miRNA targeting PTGS2 by taking the intersection of miRNAs from the miRwalk, RNA22, RNAInter, TargetMiner, miRTarBase, and ENCORI databases. Similarly, CREB1, E2F1, and RELA were revealed as hub TFs for FOS, and miR-338-3p as the hub miRNA. Finally, compared with those in healthy samples, there are more infiltrating memory B cells, M1 macrophages, M2 macrophages, and resting mast cells and fewer infiltrating activated mast cells and neutrophils in DN samples (all p < 0.05).
CONCLUSION The 10 identified hub copper-related genes provide insight into the mechanisms of DN development. It is beneficial to examine and understand the interaction between hub CRGs and potential regulatory molecules in DN. This knowledge may provide a novel theoretical foundation for the development of diagnostic biomarkers and copper-related therapy targets in DN.
Affiliation(s)
- Jie Ming
- Department of Urology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Si Ri Gu Leng Sana
- Department of Anaesthesiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Correspondence: Si Ri Gu Leng Sana
| | - Xijin Deng
- Department of Anaesthesiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
|
27
|
Behzadi P, Gajdács M. Worldwide Protein Data Bank (wwPDB): A virtual treasure for research in biotechnology. Eur J Microbiol Immunol (Bp) 2021; 11:77-86. [PMID: 34908533 PMCID: PMC8830413 DOI: 10.1556/1886.2021.00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/06/2021] [Accepted: 11/23/2021] [Indexed: 12/25/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides a wide range of digital data regarding biology and biomedicine. This huge internet resource involves a wide range of important biological data, obtained from experiments around the globe by different scientists. The Worldwide Protein Data Bank (wwPDB) represents a brilliant collection of 3D structure data associated with important and vital biomolecules including nucleic acids (RNAs and DNAs) and proteins. Moreover, this database accumulates knowledge regarding function and evolution of biomacromolecules which supports different disciplines such as biotechnology. 3D structure, functional characteristics and phylogenetic properties of biomacromolecules give a deep understanding of the biomolecules' characteristics. An important advantage of the wwPDB database is the data updating time, which is done every week. This updating process helps users to have the newest data and information for their projects. The data and information in wwPDB can be a great support to have an accurate imagination and illustrations of the biomacromolecules in biotechnology. As demonstrated by the SARS-CoV-2 pandemic, rapidly reliable and accessible biological data for microbiology, immunology, vaccinology, and drug development are critical to address many healthcare-related challenges that are facing humanity. The aim of this paper is to introduce the readers to wwPDB, and to highlight the importance of this database in biotechnology, with the expectation that the number of scientists interested in the utilization of Protein Data Bank's resources will increase substantially in the coming years.
Affiliation(s)
- Payam Behzadi
- Department of Microbiology, College of Basic Sciences, Shahr-e-Qods Branch, Islamic Azad University, Tehran, 37541-374, Iran
| | - Márió Gajdács
- Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, 6720, Szeged, Hungary. Corresponding author. Tel.: +36-62-342-532.
|
28
|
Del Toro N, Shrivastava A, Ragueneau E, Meldal B, Combe C, Barrera E, Perfetto L, How K, Ratan P, Shirodkar G, Lu O, Mészáros B, Watkins X, Pundir S, Licata L, Iannuccelli M, Pellegrini M, Martin MJ, Panni S, Duesbury M, Vallet SD, Rappsilber J, Ricard-Blum S, Cesareni G, Salwinski L, Orchard S, Porras P, Panneerselvam K, Hermjakob H. The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res 2021; 50:D648-D653. [PMID: 34761267 PMCID: PMC8728211 DOI: 10.1093/nar/gkab1006] [Citation(s) in RCA: 156] [Impact Index Per Article: 39.0] [Received: 09/22/2021] [Revised: 10/06/2021] [Accepted: 10/21/2021] [Indexed: 01/18/2023] Open
Abstract
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Affiliation(s)
- Noemi Del Toro
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Anjali Shrivastava
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eliot Ragueneau
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Birgit Meldal
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Colin Combe
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Elisabet Barrera
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Livia Perfetto
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK; Fondazione Human Technopole, Milan 20157, Italy
| | - Karyn How
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Prashansa Ratan
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Gautam Shirodkar
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Odilia Lu
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Bálint Mészáros
- Gibson Group, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Xavier Watkins
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sangya Pundir
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Luana Licata
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Marta Iannuccelli
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA
| | - Maria Jesus Martin
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Simona Panni
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Rende, Italy
| | - Margaret Duesbury
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK; UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Sylvain D Vallet
- ICBMS UMR CNRS 5246, University Lyon 1, Lyon, Villeurbanne 69622, France
| | - Juri Rappsilber
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK; Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin 13355, Germany
| | - Sylvie Ricard-Blum
- ICBMS UMR CNRS 5246, University Lyon 1, Lyon, Villeurbanne 69622, France
| | - Gianni Cesareni
- Bioinformatics and Computational Biology Unit, Dept. of Molecular Biology, University of Rome Tor Vergata, Rome, Italy
| | - Lukasz Salwinski
- UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Kalpana Panneerselvam
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Henning Hermjakob
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire CB10 1SD, UK
|
29
|
Young RG, Gill R, Gillis D, Hanner RH. Molecular Acquisition, Cleaning and Evaluation in R (MACER) - A tool to assemble molecular marker datasets from BOLD and GenBank. Biodivers Data J 2021; 9:e71378. [PMID: 34594153 PMCID: PMC8443542 DOI: 10.3897/bdj.9.e71378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022] Open
Abstract
Molecular sequence data is an essential component for many biological fields of study. The strength of these data is in their ability to be centralised and compared across research studies. There are many online repositories for molecular sequence data, some of which are very large accumulations of varying data types like NCBI’s GenBank. Due to the size and the complexity of the data in these repositories, challenges arise in searching for data of interest. While data repositories exist for molecular markers, taxa and other specific research interests, repositories may not contain, or be suitable for, more specific applications. Manually accessing, searching, downloading, accumulating, dereplicating and cleaning data to construct project-specific datasets is time-consuming. In addition, the manual assembly of datasets presents challenges with reproducibility. Here, we present the MACER package to assist researchers in assembling molecular datasets and provide reproducibility in the process.
Affiliation(s)
- Robert G Young
- University of Guelph, Guelph, Canada
| | - Rekkab Gill
- University of Guelph, Guelph, Canada
| | - Daniel Gillis
- University of Guelph, Guelph, Canada
| | - Robert H Hanner
- University of Guelph, Guelph, Canada
|
30
|
Touré V, Flobak Å, Niarakis A, Vercruysse S, Kuiper M. The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling. Brief Bioinform 2021; 22:bbaa390. [PMID: 33378765 PMCID: PMC8294520 DOI: 10.1093/bib/bbaa390] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/02/2020] [Revised: 11/26/2020] [Accepted: 11/27/2020] [Indexed: 12/16/2022] Open
Abstract
Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors by system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. It can therefore be challenging to efficiently explore various resources of causal interaction and maintain an overview of recorded contextual information that ensures valid use of the data. This review lists the different types of public resources with causal interactions, the different views on biological processes that they represent, the various data formats they use for data representation and storage, and the data exchange and conversion procedures that are available to extract and download these interactions. This may further raise awareness among the targeted audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, about the abundance and variety of causal molecular interaction data, and the variety of tools and approaches to convert them into one interoperable resource.
Affiliation(s)
- Vasundra Touré
- Department of Biology of the Norwegian University of Science and Technology
| | | | - Anna Niarakis
- Department of Biology, Univ Evry, University of Paris-Saclay, affiliated with the laboratory GenHotel in Genopole campus, and a delegate at the Lifeware Group, INRIA Saclay
| | - Steven Vercruysse
- Researcher in computer science and computational biology and focuses on building a bridge between human and computer understanding
| | - Martin Kuiper
- systems biology at the Department of Biology of the Norwegian University of Science and Technology
|
31
|
Harrow J, Drysdale R, Smith A, Repo S, Lanfear J, Blomberg N. ELIXIR: Providing a Sustainable Infrastructure for Life Science Data at European Scale. Bioinformatics 2021; 37:2506-2511. [PMID: 34175941 PMCID: PMC8388016 DOI: 10.1093/bioinformatics/btab481] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 12/17/2020] [Revised: 02/19/2021] [Accepted: 06/25/2021] [Indexed: 11/12/2022] Open
Affiliation(s)
- Jennifer Harrow
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Rachel Drysdale
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andrew Smith
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Susanna Repo
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jerry Lanfear
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Niklas Blomberg
- ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
32
|
Hall CR, Griffin PC, Lonie AJ, Christiansen JH. Application of a bioinformatics training delivery method for reaching dispersed and distant trainees. PLoS Comput Biol 2021; 17:e1008715. [PMID: 33735276 PMCID: PMC7971692 DOI: 10.1371/journal.pcbi.1008715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022] Open
Abstract
Many initiatives have addressed the global need to upskill biologists in bioinformatics tools and techniques. Australia is not unique in its requirement for such training, but due to its large size and relatively small and geographically dispersed population, Australia faces specific challenges. A combined training approach was implemented by the authors to overcome these challenges. The “hybrid” method combines guidance from experienced trainers with the benefits of both webinar-style delivery and concurrent face-to-face hands-on practical exercises in classrooms. Since 2017, the hybrid method has been used to conduct 9 hands-on bioinformatics training sessions at international scale in which over 800 researchers have been trained in diverse topics on a range of software platforms. The method has become a key tool to ensure scalable and more equitable delivery of short-course bioinformatics training across Australia and can be easily adapted to other locations, topics, or settings.
Affiliation(s)
- Christina R. Hall
- Australian BioCommons, Australia
- EMBL Australia Bioinformatics Resource, Australia
- Melbourne Bioinformatics, University of Melbourne, Victoria, Australia
| | - Philippa C. Griffin
- Australian BioCommons, Australia
- EMBL Australia Bioinformatics Resource, Australia
- Melbourne Bioinformatics, University of Melbourne, Victoria, Australia
| | - Andrew J. Lonie
- Australian BioCommons, Australia
- EMBL Australia Bioinformatics Resource, Australia
- Melbourne Bioinformatics, University of Melbourne, Victoria, Australia
| | - Jeffrey H. Christiansen
- Australian BioCommons, Australia
- EMBL Australia Bioinformatics Resource, Australia
- Research Computing Centre, The University of Queensland, Queensland, Australia
- Queensland Cyber Infrastructure Foundation, Queensland, Australia
|
33
|
Sarkans U, Füllgrabe A, Ali A, Athar A, Behrangi E, Diaz N, Fexova S, George N, Iqbal H, Kurri S, Munoz J, Rada J, Papatheodorou I, Brazma A. From ArrayExpress to BioStudies. Nucleic Acids Res 2021; 49:D1502-D1506. [PMID: 33211879 PMCID: PMC7778911 DOI: 10.1093/nar/gkaa1062] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 09/21/2020] [Revised: 10/16/2020] [Accepted: 10/27/2020] [Indexed: 11/13/2022] Open
Abstract
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments involve multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as European Nucleotide Archive (ENA), as well as hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.
Affiliation(s)
- Ugis Sarkans
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Ahmed Ali
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Awais Athar
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Ehsan Behrangi
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Nestor Diaz
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Sandeep Kurri
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Jhoan Munoz
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Rada
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
|
34
|
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 2021; 49:D605-D612. [PMID: 33237311 PMCID: PMC7779004 DOI: 10.1093/nar/gkaa1074] [Citation(s) in RCA: 4672] [Impact Index Per Article: 1168.0] [Received: 09/14/2020] [Revised: 10/20/2020] [Accepted: 11/23/2020] [Indexed: 12/19/2022] Open
Abstract
Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Katerina C Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - David Lyon
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP Group, Department of Future Technologies, University of Turku, 20014 Turun Yliopisto, Finland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Marc Legeay
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Tao Fang
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| |
|
36
|
Chang A, Jeske L, Ulbrich S, Hofmann J, Koblitz J, Schomburg I, Neumann-Schaal M, Jahn D, Schomburg D. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Res 2021; 49:D498-D508. [PMID: 33211880 PMCID: PMC7779020 DOI: 10.1093/nar/gkaa1025] [Citation(s) in RCA: 381] [Impact Index Per Article: 95.3] [Received: 09/10/2020] [Revised: 10/14/2020] [Accepted: 10/26/2020] [Indexed: 12/31/2022] Open
Abstract
The BRENDA enzyme database (https://www.brenda-enzymes.org), established in 1987, has evolved into the main collection of functional enzyme and metabolism data. In 2018, BRENDA was selected as an ELIXIR Core Data Resource. BRENDA provides reliable data, continuous curation and updates of classified enzymes, and the integration of newly discovered enzymes. The main part contains >5 million data for ∼90 000 enzymes from ∼13 000 organisms, manually extracted from ∼157 000 primary literature references, combined with information of text and data mining, data integration, and prediction algorithms. Supplements comprise disease-related data, protein sequences, 3D structures, genome annotations, ligand information, taxonomic, bibliographic, and kinetic data. BRENDA offers an easy access to enzyme information from quick to advanced searches, text- and structured-based queries for enzyme-ligand interactions, word maps, and visualization of enzyme data. The BRENDA Pathway Maps are completely revised and updated for an enhanced interactive and intuitive usability. The new design of the Enzyme Summary Page provides an improved access to each individual enzyme. A new protein structure 3D viewer was integrated. The prediction of the intracellular localization of eukaryotic enzymes has been implemented. The new EnzymeDetector combines BRENDA enzyme annotations with protein and genome databases for the detection of eukaryotic and prokaryotic enzymes.
Affiliation(s)
- Antje Chang
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Lisa Jeske
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Sandra Ulbrich
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Julia Hofmann
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Julia Koblitz
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
| | - Ida Schomburg
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Meina Neumann-Schaal
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
| | - Dieter Jahn
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| | - Dietmar Schomburg
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
| |
|
37
|
The UniProt Consortium, Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter MC, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh LS, Zhang J, Ruch P, Teodoro D. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 2021; 49:D480-D489. [PMID: 33237286 PMCID: PMC7778908 DOI: 10.1093/nar/gkaa1100] [Citation(s) in RCA: 4192] [Impact Index Per Article: 1048.0] [Received: 09/15/2020] [Revised: 10/21/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023] Open
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
|
38
|
Cantelli G, Cochrane G, Brooksbank C, McDonagh E, Flicek P, McEntyre J, Birney E, Apweiler R. The European Bioinformatics Institute: empowering cooperation in response to a global health crisis. Nucleic Acids Res 2021; 49:D29-D37. [PMID: 33245775 PMCID: PMC7778996 DOI: 10.1093/nar/gkaa1077] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/24/2020] [Revised: 10/20/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023] Open
Abstract
The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform has integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated Training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.
Affiliation(s)
- Gaia Cantelli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cath Brooksbank
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ellen McDonagh
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Open Targets, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
39
|
Digre A, Lindskog C. The Human Protein Atlas-Spatial localization of the human proteome in health and disease. Protein Sci 2021; 30:218-233. [PMID: 33146890 PMCID: PMC7737765 DOI: 10.1002/pro.3987] [Citation(s) in RCA: 123] [Impact Index Per Article: 30.8] [Received: 08/31/2020] [Revised: 10/29/2020] [Accepted: 10/30/2020] [Indexed: 12/11/2022]
Abstract
For a complete understanding of a system's processes and each protein's role in health and disease, it is essential to study protein expression with a spatial resolution, as the exact location of proteins at tissue, cellular, or subcellular levels is tightly linked to protein function. The Human Protein Atlas (HPA) project is a large-scale initiative aiming at mapping the entire human proteome using antibody-based proteomics and integration of various other omics technologies. The publicly available knowledge resource www.proteinatlas.org is one of the world's most visited biological databases and has been extensively updated during the last few years. The current version is divided into six main sections, each focusing on particular aspects of the human proteome: (a) the Tissue Atlas showing the distribution of proteins across all major tissues and organs in the human body; (b) the Cell Atlas showing the subcellular localization of proteins in single cells; (c) the Pathology Atlas showing the impact of protein levels on survival of patients with cancer; (d) the Blood Atlas showing the expression profiles of blood cells and actively secreted proteins; (e) the Brain Atlas showing the distribution of proteins in human, mouse, and pig brain; and (f) the Metabolic Atlas showing the involvement of proteins in human metabolism. The HPA constitutes an important resource for further understanding of human biology, and the publicly available datasets hold much promise for integration with other emerging efforts focusing on single cell analyses, both at transcriptomic and proteomic level.
Affiliation(s)
- Andreas Digre
- Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
| | - Cecilia Lindskog
- Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
| |
|
40
|
Porras P, Barrera E, Bridge A, Del-Toro N, Cesareni G, Duesbury M, Hermjakob H, Iannuccelli M, Jurisica I, Kotlyar M, Licata L, Lovering RC, Lynn DJ, Meldal B, Nanduri B, Paneerselvam K, Panni S, Pastrello C, Pellegrini M, Perfetto L, Rahimzadeh N, Ratan P, Ricard-Blum S, Salwinski L, Shirodkar G, Shrivastava A, Orchard S. Towards a unified open access dataset of molecular interactions. Nat Commun 2020; 11:6144. [PMID: 33262342 PMCID: PMC7708836 DOI: 10.1038/s41467-020-19942-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 04/17/2020] [Accepted: 11/09/2020] [Indexed: 12/16/2022] Open
Abstract
The International Molecular Exchange (IMEx) Consortium provides scientists with a single body of experimentally verified protein interactions curated in rich contextual detail to an internationally agreed standard. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Additionally, we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.
Affiliation(s)
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Elisabet Barrera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Alan Bridge
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, CH-1211, Geneva, Switzerland
| | - Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Gianni Cesareni
- University of Rome Tor Vergata, Rome, Italy
- IRCCS Fondazione Santa Lucia, 00143, Rome, Italy
| | - Margaret Duesbury
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | | | - Igor Jurisica
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
- Departments of Medical Biophysics, and Computer Science, University of Toronto, Toronto, ON, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Max Kotlyar
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
| | | | - Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
| | - David J Lynn
- Computational and Systems Biology Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
- College of Medicine and Public Health, Flinders University, Bedford Park, SA, 5042, Australia
| | - Birgit Meldal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Bindu Nanduri
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, USA
| | - Kalpana Paneerselvam
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Simona Panni
- Università della Calabria, Dipartimento di Biologia, Ecologia e Scienze della Terra, Via Pietro Bucci Cubo 6/C, Rende, CS, Italy
| | - Chiara Pastrello
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
| | - Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, UCLA, Box 951606, Los Angeles, CA, 90095-1606, USA
| | - Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Negin Rahimzadeh
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
| | - Prashansa Ratan
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
| | - Sylvie Ricard-Blum
- ICBMS, UMR 5246 University Lyon 1 - CNRS, Univ. Lyon, 69622, Villeurbanne, France
| | - Lukasz Salwinski
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
| | - Gautam Shirodkar
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
| | - Anjalia Shrivastava
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK.
| |
|
41
|
Adhikari S, Nice EC, Deutsch EW, Lane L, Omenn GS, Pennington SR, Paik YK, Overall CM, Corrales FJ, Cristea IM, Van Eyk JE, Uhlén M, Lindskog C, Chan DW, Bairoch A, Waddington JC, Justice JL, LaBaer J, Rodriguez H, He F, Kostrzewa M, Ping P, Gundry RL, Stewart P, Srivastava S, Srivastava S, Nogueira FCS, Domont GB, Vandenbrouck Y, Lam MPY, Wennersten S, Vizcaino JA, Wilkins M, Schwenk JM, Lundberg E, Bandeira N, Marko-Varga G, Weintraub ST, Pineau C, Kusebauch U, Moritz RL, Ahn SB, Palmblad M, Snyder MP, Aebersold R, Baker MS. A high-stringency blueprint of the human proteome. Nat Commun 2020; 11:5301. [PMID: 33067450 PMCID: PMC7568584 DOI: 10.1038/s41467-020-19045-9] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 12/19/2019] [Accepted: 09/25/2020] [Indexed: 02/07/2023] Open
Abstract
The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and enhancing accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics, and undertook reanalysis of previously deposited community data, continuously increasing the coverage of the human proteome. On the occasion of the HPP's tenth anniversary, we here report a 90.4% complete high-stringency human proteome blueprint. This knowledge is essential for discerning molecular processes in health and disease, as we demonstrate by highlighting potential roles the human proteome plays in our understanding, diagnosis and treatment of cancers, cardiovascular and infectious diseases.
Grants
- WT101477MA Wellcome Trust
- R24 GM127667 NIGMS NIH HHS
- U24 CA210985 NCI NIH HHS
- U19 AG023122 NIA NIH HHS
- U24 CA210967 NCI NIH HHS
- R01 GM087221 NIGMS NIH HHS
- R01 GM114141 NIGMS NIH HHS
- U24 CA115102 NCI NIH HHS
- P30 ES017885 NIEHS NIH HHS
- R01 HL111362 NHLBI NIH HHS
- Wellcome Trust
- 208391/Z/17/Z Wellcome Trust
- International Macquarie Research Excellence Scholarship
- NHMRC 1010303 (MSB, ECN); Cancer Council NSW RG19-04 (MSB, SBA, ECN); Cancer Institute NSW Fellowship 15/ECF/1-38 (SBA), Sydney Vital CINSW Translational Cancer Research Centre grant (MSB, SBA, SA), “Fight on the Beaches” (MSB, SBA, ECN, SA)
- Department of Health | National Health and Medical Research Council (NHMRC)
- Cancer Institute NSW (Cancer Institute New South Wales)
- “Fight on the Beaches” research grant
Affiliation(s)
- Subash Adhikari
- Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia
| | - Edouard C Nice
- Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia
- Faculty of Medicine, Nursing and Health Sciences, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
| | - Eric W Deutsch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109, USA
| | - Lydie Lane
- Faculty of Medicine, SIB-Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, University of Geneva, CMU, Michel-Servet 1, 1211, Geneva, Switzerland
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
| | - Stephen R Pennington
- UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine, University College Dublin, Dublin, Ireland
| | - Young-Ki Paik
- Yonsei Proteome Research Center, 50 Yonsei-ro, Sudaemoon-ku, Seoul, 120-749, South Korea
| | | | - Fernando J Corrales
- Functional Proteomics Laboratory, Centro Nacional de Biotecnología-CSIC, Proteored-ISCIII, 28049, Madrid, Spain
| | - Ileana M Cristea
- Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
| | - Jennifer E Van Eyk
- Cedars Sinai Medical Center, Advanced Clinical Biosystems Research Institute, The Smidt Heart Institute, Los Angeles, CA, 90048, USA
| | - Mathias Uhlén
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 17121, Solna, Sweden
| | - Cecilia Lindskog
- Rudbeck Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75185, Uppsala, Sweden
| | - Daniel W Chan
- Department of Pathology and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21224, USA
| | - Amos Bairoch
- Faculty of Medicine, SIB-Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, University of Geneva, CMU, Michel-Servet 1, 1211, Geneva, Switzerland
| | - James C Waddington
- UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine, University College Dublin, Dublin, Ireland
| | - Joshua L Justice
- Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
| | - Joshua LaBaer
- Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
| | - Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
| | - Markus Kostrzewa
- Bruker Daltonik GmbH, Microbiology and Diagnostics, Fahrenheitstrasse 4, 28359, Bremen, Germany
| | - Peipei Ping
- Cardiac Proteomics and Signaling Laboratory, Department of Physiology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Rebekah L Gundry
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine and Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE, 68198, USA
| | - Peter Stewart
- Department of Chemical Pathology, Royal Prince Alfred Hospital, Camperdown, NSW, Australia
| | | | - Sudhir Srivastava
- Cancer Biomarkers Research Branch, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Suite 5E136, Rockville, MD, 20852, USA
| | - Fabio C S Nogueira
- Proteomics Unit and Laboratory of Proteomics, Institute of Chemistry, Federal University of Rio de Janeiro, Av Athos da Silveira Ramos, 149, 21941-909, Rio de Janeiro, RJ, Brazil
| | - Gilberto B Domont
- Proteomics Unit and Laboratory of Proteomics, Institute of Chemistry, Federal University of Rio de Janeiro, Av Athos da Silveira Ramos, 149, 21941-909, Rio de Janeiro, RJ, Brazil
| | - Yves Vandenbrouck
- University of Grenoble Alpes, Inserm, CEA, IRIG-BGE, U1038, 38000, Grenoble, France
| | - Maggie P Y Lam
- Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Consortium for Fibrosis Research and Translation, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Sara Wennersten
- Division of Cardiology, Department of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Juan Antonio Vizcaino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Marc Wilkins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
| | - Jochen M Schwenk
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 17121, Solna, Sweden
| | - Emma Lundberg
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 17121, Solna, Sweden
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, CA, 92093-0404, USA
| | | | - Susan T Weintraub
- Department of Biochemistry and Structural Biology, University of Texas Health Science Center San Antonio, UT Health, 7703 Floyd Curl Drive, San Antonio, TX, 78229-3900, USA
| | - Charles Pineau
- University of Rennes, Inserm, EHESP, IREST, UMR_S 1085, F-35042, Rennes, France
| | - Ulrike Kusebauch
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109, USA
| | - Robert L Moritz
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109, USA
| | - Seong Beom Ahn
- Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia
| | - Magnus Palmblad
- Leiden University Medical Center, Leiden, 2333, The Netherlands
| | - Michael P Snyder
- Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA
| | - Ruedi Aebersold
- Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109, USA
- Faculty of Science, University of Zurich, Zurich, Switzerland
| | - Mark S Baker
- Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia.
- Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA.
| |
|
42
|
Carretero-Puche C, García-Martín S, García-Carbonero R, Gómez-López G, Al-Shahrour F. How can bioinformatics contribute to the routine application of personalized precision medicine? Expert Review of Precision Medicine and Drug Development 2020. [DOI: 10.1080/23808993.2020.1758062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
Affiliation(s)
- Carlos Carretero-Puche
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laboratorio de Oncología Clínico-Traslacional, Unidad de Investigación en Tumores Digestivos, Instituto de Investigación I+12, Hospital 12 de Octubre, Madrid, Spain
| | | | - Rocío García-Carbonero
- Laboratorio de Oncología Clínico-Traslacional, Unidad de Investigación en Tumores Digestivos, Instituto de Investigación I+12, Hospital 12 de Octubre, Madrid, Spain
| | - Gonzalo Gómez-López
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Fátima Al-Shahrour
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| |
|
43
|
Ostaszewski M, Mazein A, Gillespie ME, Kuperstein I, Niarakis A, Hermjakob H, Pico AR, Willighagen EL, Evelo CT, Hasenauer J, Schreiber F, Dräger A, Demir E, Wolkenhauer O, Furlong LI, Barillot E, Dopazo J, Orta-Resendiz A, Messina F, Valencia A, Funahashi A, Kitano H, Auffray C, Balling R, Schneider R. COVID-19 Disease Map, building a computational repository of SARS-CoV-2 virus-host interaction mechanisms. Sci Data 2020; 7:136. [PMID: 32371892 PMCID: PMC7200764 DOI: 10.1038/s41597-020-0477-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 04/14/2020] [Accepted: 04/24/2020] [Indexed: 11/20/2022] Open
Abstract
Researchers around the world join forces to reconstruct the molecular processes of the virus-host interactions aiming to combat the cause of the ongoing pandemic.
Affiliation(s)
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Alexander Mazein
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- European Institute for Systems Biology and Medicine (EISBM), Vourles, France
| | - Marc E Gillespie
- Ontario Institute for Cancer Research, Toronto, Canada
- College of Pharmacy and Health Sciences, St. John's University, Queens, NY, USA
| | - Inna Kuperstein
- Institut Curie, PSL Research University, Mines Paris Tech, Inserm, Paris, France
| | - Anna Niarakis
- Department of Biology, Univ. Évry, University of Paris-Saclay, Genopole, 91025, Évry, France
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
| | - Egon L Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Chris T Evelo
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology, Maastricht University, Maastricht, The Netherlands
| | - Jan Hasenauer
- Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
| | - Falk Schreiber
- University of Konstanz, Department of Computer and Information Science, Konstanz, Germany
- Monash University, Faculty of Information Technology, Melbourne, Australia
| | - Andreas Dräger
- Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076, Tübingen, Germany
- Department of Computer Science, University of Tübingen, 72076, Tübingen, Germany
- German Center for Infection Research (DZIF), partner site, Tübingen, Germany
| | - Emek Demir
- Department of Molecular and Medical Genetics, School of Medicine, Oregon Health & Science University, Portland, USA
| | - Olaf Wolkenhauer
- Department of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
- Stellenbosch Institute of Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, 7602, Stellenbosch, South Africa
| | - Laura I Furlong
- Research Programme on Biomedical Informatics, Hospital del Mar Medical Research Institute, Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, Spain
| | - Emmanuel Barillot
- Institut Curie, PSL Research University, Mines Paris Tech, Inserm, Paris, France
| | - Joaquin Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud. Hosp. Virgen del Rocío, Sevilla, Spain
- Bioinformatics in Rare Diseases. Centro de Investigación Biomédica en Red de Enfermedades Raras, Fundación Progreso y Salud, Hosp. Virgen del Rocío, Sevilla, Spain
- INB-ELIXIR-es, FPS, Hospital Virgen del Rocío, Sevilla, 41013, Spain
- Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
| | - Aurelio Orta-Resendiz
- HIV, Inflammation and Persistence Unit, Virology Department, Institut Pasteur, Paris, France
- Bio Sorbonne Paris Cité, Université de Paris, Paris, France
| | - Francesco Messina
- Dipartimento di Epidemiologia Ricerca Pre-Clinica e Diagnostica Avanzata, National Institute for Infectious Diseases "Lazzaro Spallanzani" I.R.C.C.S., Rome, Italy
- COVID 19 INMI Network Medicine for IDs Study Group, National Institute for Infectious Diseases "Lazzaro Spallanzani" I.R.C.C.S., Rome, Italy
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institucio Catalana de Recerca I Estudis Avançats (ICREA), Barcelona, Spain
| | - Akira Funahashi
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, Japan
| | - Hiroaki Kitano
- The Systems Biology Institute, Shinagawa, Tokyo, Japan
- Okinawa Institute of Science and Technology Graduate University, Kunigami, Okinawa, Japan
- Sony Computer Science Laboratories, Inc., Tokyo, Japan
| | - Charles Auffray
- European Institute for Systems Biology and Medicine (EISBM), Vourles, France
| | - Rudi Balling
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg.
| |
|
44
|
Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020; 16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC50) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCLP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to await the technical applicability in respect to DARCP capture to see if this opens up connectivity.
Affiliation(s)
- Christopher Southan
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK
- TW2Informatics Ltd, Västra Frölunda, Gothenburg, 42166, Sweden
| |
|
45
|
Wibberg D, Batut B, Belmann P, Blom J, Glöckner FO, Grüning B, Hoffmann N, Kleinbölting N, Rahn R, Rey M, Scholz U, Sharan M, Tauch A, Trojahn U, Usadel B, Kohlbacher O. The de.NBI / ELIXIR-DE training platform - Bioinformatics training in Germany and across Europe within ELIXIR. F1000Res 2019; 8. [PMID: 33163154 PMCID: PMC7607484 DOI: 10.12688/f1000research.20244.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 09/04/2020] [Indexed: 12/25/2022] Open
Abstract
The German Network for Bioinformatics Infrastructure (de.NBI) is a national and academic infrastructure funded by the German Federal Ministry of Education and Research (BMBF). The de.NBI provides (i) service, (ii) training, and (iii) cloud computing to users in life sciences research and biomedicine in Germany and Europe and (iv) fosters the cooperation of the German bioinformatics community with international network structures. The de.NBI members also run the German node (ELIXIR-DE) within the European ELIXIR infrastructure. The de.NBI / ELIXIR-DE training platform, also known as special interest group 3 (SIG 3) ‘Training & Education’, coordinates the bioinformatics training of de.NBI and the German ELIXIR node. The network provides a high-quality, coherent, timely, and impactful training program across its eight service centers. Life scientists learn how to handle and analyze biological big data more effectively by applying tools, standards and compute services provided by de.NBI. Since 2015, more than 300 training courses were carried out with about 6,000 participants and these courses received recommendation rates of almost 90% (status as of July 2020). In addition to face-to-face training courses, online training was introduced on the de.NBI website in 2016 and guidelines for the preparation of e-learning material were established in 2018. In 2016, ELIXIR-DE joined the ELIXIR training platform. Here, the de.NBI / ELIXIR-DE training platform collaborates with ELIXIR in training activities, advertising training courses via TeSS and discussions on the exchange of data for training events essential for quality assessment on both the technical and administrative levels. The de.NBI training program trained thousands of scientists from Germany and beyond in many different areas of bioinformatics.
Affiliation(s)
- Daniel Wibberg
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
| | - Bérénice Batut
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, 79110, Germany
| | - Peter Belmann
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
| | - Jochen Blom
- Bioinformatics and Systems Biology, Justus-Liebig-University Giessen, Giessen, 35392, Germany
| | - Frank Oliver Glöckner
- Alfred-Wegener-Institut - Helmholtz Zentrum für Polar- und Meeresforschung and Jacobs University Bremen, Campus Ring 1, Bremen, 28759, Germany
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, 79110, Germany
| | - Nils Hoffmann
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany
| | - Nils Kleinbölting
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
| | - René Rahn
- Algorithmic Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustraße 9, Berlin, 14195, Germany
| | - Maja Rey
- Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS) gGmbH, Schloss-Wolfsbrunnenweg 35, Heidelberg, 69118, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | - Malvika Sharan
- The Heidelberg Center for Human Bioinformatics (HD-HuB), European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, 69117, Germany
| | - Andreas Tauch
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
| | - Ulrike Trojahn
- The Heidelberg Center for Human Bioinformatics (HD-HuB), European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, 69117, Germany
| | - Björn Usadel
- IBG-2 Plant Sciences, Forschungszentrum Jülich, Jülich, 52428, Germany
| | - Oliver Kohlbacher
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany
- Translational Bioinformatics, University Hospital Tübingen, Tübingen, 72076, Germany
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
| |
|