1
Magateshvaren Saras MA, Mitra MK, Tyagi S. Navigating the Multiverse: a Hitchhiker's guide to selecting harmonization methods for multimodal biomedical data. Biol Methods Protoc 2025; 10:bpaf028. [PMID: 40308831 PMCID: PMC12043205 DOI: 10.1093/biomethods/bpaf028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 03/20/2025] [Accepted: 04/15/2025] [Indexed: 05/02/2025]
Abstract
The application of machine learning (ML) techniques in predictive modelling has greatly advanced our comprehension of biological systems. There is a notable shift towards integration methods that specifically target the simultaneous analysis of multiple modes or types of data, showcasing superior results compared to individual analyses. Despite the availability of diverse ML architectures for researchers interested in embracing a multimodal approach, the current literature lacks a comprehensive taxonomy that includes the pros and cons of these methods to guide the entire process. Closing this gap is imperative, necessitating the creation of a robust framework. This framework should not only categorize the diverse ML architectures suitable for multimodal analysis but also offer insights into their respective advantages and limitations. Additionally, such a framework can serve as a valuable guide for selecting an appropriate workflow for multimodal analysis. This comprehensive taxonomy would provide clear guidance and support informed decision-making within the progressively intricate landscape of biomedical and clinical data analysis. This is an essential step towards advancing personalized medicine. The aims of the work are to comprehensively study and describe the harmonization processes that are performed and reported in the literature and to present a working guide that would enable planning and selecting an appropriate integrative model. We present harmonization as a dual process of representation and integration, each with multiple methods and categories. The various representation and integration methods are classified into six broad categories and detailed with their advantages, disadvantages, and examples. A guide flowchart describing the step-by-step processes that are needed to adopt a multimodal approach is also presented along with examples and references.
This review provides a thorough taxonomy of methods for harmonizing multimodal data and introduces a foundational 10-step guide for newcomers to implement a multimodal workflow.
Affiliation(s)
- Murali Aadhitya Magateshvaren Saras
- IITB-Monash Research Academy, Mumbai, Maharashtra 400076, India
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- Mithun K Mitra
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- Sonika Tyagi
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- School of Computing Technologies, RMIT University, Melbourne, Victoria 3001, Australia
2
Hegde H, Vendetti J, Goutte-Gattat D, Caufield JH, Graybeal JB, Harris NL, Karam N, Kindermann C, Matentzoglu N, Overton JA, Musen MA, Mungall CJ. A change language for ontologies and knowledge graphs. Database (Oxford) 2025; 2025:baae133. [PMID: 39841813 PMCID: PMC11753292 DOI: 10.1093/database/baae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/21/2024] [Accepted: 12/30/2024] [Indexed: 01/24/2025]
Abstract
Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of "apply patch" and "diff" commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful and understood by a variety of stakeholders; for example, ontology edits can be specified by commands like "add synonym 'arm' to 'forelimb'" or "move 'Parkinson disease' under 'neurodegenerative disease'." We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.
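To make the "request a change, then apply it" idea concrete, here is a minimal Python sketch that parses the two example commands quoted in the abstract and applies them to a toy in-memory ontology. It is an illustration only: the regular expressions, the `Change` record, and the `parse_change` and `apply_change` helpers are hypothetical names introduced here, and they do not reproduce the actual KGCL grammar, data model, or tooling.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record for one requested change (not the KGCL data model)."""
    kind: str     # "add_synonym" or "move"
    subject: str  # the term being edited
    value: str    # the new synonym, or the new parent term


# Illustrative patterns covering only the two commands quoted in the abstract.
# Assumption: these are NOT the official KGCL grammar.
_PATTERNS = [
    ("add_synonym",
     re.compile(r"^add synonym '(?P<value>[^']+)' to '(?P<subject>[^']+)'$")),
    ("move",
     re.compile(r"^move '(?P<subject>[^']+)' under '(?P<value>[^']+)'$")),
]


def parse_change(command: str) -> Change:
    """Turn a controlled-natural-language command into a Change record."""
    text = command.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return Change(kind, match.group("subject"), match.group("value"))
    raise ValueError(f"unrecognized change command: {command!r}")


def apply_change(ontology: dict, change: Change) -> None:
    """Apply a change in place to a toy ontology.

    The toy ontology maps each term to {"synonyms": set, "parent": str or None}.
    """
    term = ontology[change.subject]
    if change.kind == "add_synonym":
        term["synonyms"].add(change.value)
    elif change.kind == "move":
        if change.value not in ontology:
            raise KeyError(f"unknown parent term: {change.value!r}")
        term["parent"] = change.value
    else:
        raise ValueError(f"unsupported change kind: {change.kind!r}")
```

Keeping parsing separate from application mirrors the two uses described in the abstract: the same record could describe a requested change or a change that has already happened.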
Affiliation(s)
- Harshad Hegde
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, One Cyclotron Rd., Berkeley, CA 94720, United States
- Jennifer Vendetti
- Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States
- Damien Goutte-Gattat
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, United Kingdom
- J Harry Caufield
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, One Cyclotron Rd., Berkeley, CA 94720, United States
- John B Graybeal
- Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States
- Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, One Cyclotron Rd., Berkeley, CA 94720, United States
- Naouel Karam
- Institute for Applied Informatics (InfAI), Leipzig University, Goerdelerring 9, Leipzig 04109, Germany
- Christian Kindermann
- Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States
- James A Overton
- Knocean Inc., 2 - 107 Quebec Ave., Toronto, Ontario M6P 2T3, Canada
- Mark A Musen
- Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, One Cyclotron Rd., Berkeley, CA 94720, United States
3
Lee WY, Park KI, Bak SB, Lee S, Bae SJ, Kim MJ, Park SD, Kim CO, Kim JH, Kim YW, Kim CE. Evaluating current status of network pharmacology for herbal medicine focusing on identifying mechanisms and therapeutic effects. J Adv Res 2024:S2090-1232(24)00618-0. [PMID: 39730024 DOI: 10.1016/j.jare.2024.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Revised: 12/05/2024] [Accepted: 12/24/2024] [Indexed: 12/29/2024]
Abstract
INTRODUCTION Network pharmacology has gained significant traction as a tool for identifying the mechanisms and therapeutic effects of herbal medicines. However, despite the usefulness of these approaches, their diversity underscores the critical need for a systematic evaluation to ensure consistency and reliability. OBJECTIVES We aimed to evaluate the network pharmacological analyses, focusing on identifying the mechanisms and therapeutic effects of herbal medicines. METHODS We employed a comprehensive approach involving systematic data retrieval, network construction, and analysis. Herbal compounds and their targets were meticulously extracted from five distinct network pharmacology databases to ensure extensive coverage and high data reliability. Advanced network-based methods were used to identify key herbal targets and predict therapeutic effects, thereby enriching the depth and breadth of the analysis. Experimental validation was performed on prostate cancer models to substantiate the computational predictions. RESULTS The results of the recapitulating task for known herbal ingredient targets revealed distinct patterns in performance and coverage based on network construction and aggregation methods. We performed the same analysis to identify herbal targets and found that network centrality, path counts, and downweighted path counts had their own pros and cons. By comparing network-based methods, we found that considering the impact on the multiscale interactome yielded the highest accuracy in discriminating known therapeutic effects. Using optimal conditions, we successfully identified new indications for herbal medicines and validated these findings through follow-up in vitro and in vivo experiments. 
CONCLUSION This study presents the first comprehensive and critical evaluation of the current network pharmacology analyses in the field of herbal medicine and provides valuable guidance for continued advances in the elucidation of the mechanisms and therapeutic effects.
Affiliation(s)
- Won-Yung Lee
- School of Korean Medicine, Wonkwang University, Iksan 54538, Republic of Korea; Research Center of Traditional Korean Medicine, Wonkwang University, Iksan 54538, Republic of Korea; School of Korean Medicine, Woosuk University, Jeonju 54986, Republic of Korea
- Kwang-Il Park
- Department of Veterinary Medicine, Research Institute of Life Science, Gyeongsang National University, Jinju 52828, Republic of Korea
- Seon-Been Bak
- School of Korean Medicine, Dongguk University, Gyeongju 38066, Republic of Korea; Department of Nutritional Science and Food Management, Ewha Womans University, Seoul 03760, Republic of Korea
- Seungho Lee
- School of Korean Medicine, Woosuk University, Jeonju 54986, Republic of Korea
- Su-Jin Bae
- School of Korean Medicine, Wonkwang University, Iksan 54538, Republic of Korea
- Min-Jin Kim
- School of Korean Medicine, Dongguk University, Gyeongju 38066, Republic of Korea
- Sun-Dong Park
- School of Korean Medicine, Dongguk University, Gyeongju 38066, Republic of Korea
- Choon Ok Kim
- Department of Clinical Pharmacology, Severance Hospital, Yonsei University Health System, Seoul 03722, Republic of Korea
- Ji-Hwan Kim
- School of Korean Medicine, Pusan National University, Yangsan-si 50612, Republic of Korea
- Young Woo Kim
- School of Korean Medicine, Dongguk University, Gyeongju 38066, Republic of Korea
- Chang-Eop Kim
- School of Korean Medicine, Gachon University, Seongnam 13110, Republic of Korea
4
Johnson R, Gottlieb U, Shaham G, Eisen L, Waxman J, Devons-Sberro S, Ginder CR, Hong P, Sayeed R, Reis BY, Balicer RD, Dagan N, Zitnik M. Unified Clinical Vocabulary Embeddings for Advancing Precision Medicine. medRxiv 2024:2024.12.03.24318322. [PMID: 39677476 PMCID: PMC11643188 DOI: 10.1101/2024.12.03.24318322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Integrating clinical knowledge into AI remains challenging despite numerous medical guidelines and vocabularies. Medical codes, central to healthcare systems, often reflect operational patterns shaped by geographic factors, national policies, insurance frameworks, and physician practices rather than the precise representation of clinical knowledge. This disconnect hampers AI in representing clinical relationships, raising concerns about bias, transparency, and generalizability. Here, we developed a resource of 67,124 clinical vocabulary embeddings derived from a clinical knowledge graph tailored to electronic health record vocabularies, spanning over 1.3 million edges. Using graph transformer neural networks, we generated clinical vocabulary embeddings that provide a new representation of clinical knowledge by unifying seven medical vocabularies. These embeddings were validated through a phenotype risk score analysis involving 4.57 million patients from Clalit Healthcare Services, effectively stratifying individuals based on survival outcomes. Inter-institutional panels of clinicians evaluated the embeddings for alignment with clinical knowledge across 90 diseases and 3,000 clinical codes, confirming their robustness and transferability. This resource addresses gaps in integrating clinical vocabularies into AI models and training datasets, paving the way for knowledge-grounded population and patient-level models.
Affiliation(s)
- Ruth Johnson
- The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Uri Gottlieb
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Galit Shaham
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Lihi Eisen
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Jacob Waxman
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Stav Devons-Sberro
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Curtis R. Ginder
- Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Peter Hong
- Division of General Pediatrics, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Information Technology, Enterprise Data Analytics and Reporting, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Raheel Sayeed
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Ben Y. Reis
- The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Predictive Medicine Group, Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
- Ran D. Balicer
- The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Faculty of Health Sciences, School of Public Health, Ben Gurion University of the Negev, Be’er Sheva, Israel
- Noa Dagan
- The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Clalit Research Institute, Innovation Division, Clalit Health Services, Ramat-Gan, Israel
- Software and Information Systems Engineering, Ben Gurion University, Be’er Sheva, Israel
- Marinka Zitnik
- The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
5
Brusini L, Dolci G, Pini L, Cruciani F, Pizzagalli F, Provero P, Menegaz G, Boscolo Galazzo I. Morphometric Similarity Patterning of Amyloid-β and Tau Proteins Correlates with Transcriptomics in the Alzheimer's Disease Continuum. Int J Mol Sci 2024; 25:12871. [PMID: 39684582 DOI: 10.3390/ijms252312871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 11/23/2024] [Accepted: 11/26/2024] [Indexed: 12/18/2024]
Abstract
Bridging the gap between cortical morphometric remodeling and gene expression can help to clarify the effects of the selective brain accumulation of Amyloid-β (Aβ) and tau proteins occurring in Alzheimer's disease (AD). To this end, we derived morphometric similarity (MS) networks from 126 Aβ- and tau-positive (Aβ+/tau+) and 172 Aβ-/tau- subjects, and we investigated the association between group-wise regional MS differences and transcriptional correlates using an imaging transcriptomics approach grounded in the Allen Human Brain Atlas (AHBA). The expressed gene with the highest correlation with MS alterations was BCHE, a gene related to Aβ homeostasis. In addition, among the most promising results of the enrichment analysis, we found the immune response as a biological process and astrocytes, microglia, and oligodendrocyte precursors as cell types. In summary, by relating cortical MS and AHBA-derived transcriptomics, we were able to retrieve findings suggesting the biological mechanisms underlying the Aβ- and tau-induced cortical MS alterations in the AD continuum.
Affiliation(s)
- Lorenza Brusini
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
- Giorgio Dolci
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
- Department of Computer Science, University of Verona, 37134 Verona, Italy
- Lorenzo Pini
- Department of Neuroscience, University of Padova, 35121 Padova, Italy
- Federica Cruciani
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
- Istituto Fondazione Oncologia Molecolare Ente del Terzo Settore (IFOM ETS)-The Associazione Italiana per la Ricerca sul Cancro (AIRC) Institute of Molecular Oncology, 20139 Milano, Italy
- Fabrizio Pizzagalli
- Department of Neurosciences "Rita Levi Montalcini", University of Turin, 10126 Turin, Italy
- Paolo Provero
- Department of Neurosciences "Rita Levi Montalcini", University of Turin, 10126 Turin, Italy
- Gloria Menegaz
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
- Ilaria Boscolo Galazzo
- Department of Engineering for Innovation Medicine, University of Verona, 37134 Verona, Italy
6
Soman K, Rose PW, Morris JH, Akbas RE, Smith B, Peetoom B, Villouta-Reyes C, Cerono G, Shi Y, Rizk-Jackson A, Israni S, Nelson CA, Huang S, Baranzini SE. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics 2024; 40:btae560. [PMID: 39288310 PMCID: PMC11441322 DOI: 10.1093/bioinformatics/btae560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 08/29/2024] [Accepted: 09/15/2024] [Indexed: 09/19/2024]
Abstract
MOTIVATION Large language models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains such as biomedicine. Solutions such as pretraining and domain-specific fine-tuning add substantial computational overhead, requiring further domain-expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4, to generate meaningful biomedical text rooted in established knowledge. RESULTS Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion. 
AVAILABILITY AND IMPLEMENTATION SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html. It can also be accessed using REST-API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. Biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository.
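The context-pruning step mentioned in the abstract can be sketched roughly as follows: candidate statements retrieved from the knowledge graph are scored against the question by embedding similarity, and only the closest ones are kept for the prompt. This is a hedged illustration, not the KG-RAG implementation. The bag-of-words `embed` function is a deliberately simple stand-in for a real sentence-embedding model, and `prune_context`, `min_similarity`, and `max_statements` are hypothetical names and parameters chosen for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts.

    Assumption: a real system would use a learned sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def prune_context(question, statements, min_similarity=0.2, max_statements=3):
    """Keep the statements most similar to the question.

    Statements scoring below min_similarity are dropped; the rest are sorted
    by descending similarity and truncated to max_statements.
    """
    query = embed(question)
    scored = [(cosine(query, embed(s)), s) for s in statements]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [statement for _, statement in kept[:max_statements]]


# Usage with made-up placeholder statements (not real biomedical facts):
context = prune_context(
    "Which gene associates with disease D1?",
    [
        "disease D1 associates gene G1",
        "compound C9 treats disease D7",
        "disease D1 presents symptom S4",
    ],
)
print(context)
```

Dropping low-similarity statements before building the prompt is what reduces token consumption; the threshold and cap trade recall of relevant context against prompt length.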
Affiliation(s)
- Karthik Soman
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
- Peter W Rose
- San Diego Supercomputer Center, University of California, San Diego, CA 92093, United States
- John H Morris
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94158, United States
- Rabia E Akbas
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
- Brett Smith
- Institute for Systems Biology, Seattle, WA 98109, United States
- Braian Peetoom
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
- Catalina Villouta-Reyes
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
- Gabriel Cerono
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
- Yongmei Shi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94158, United States
- Angela Rizk-Jackson
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94158, United States
- Sharat Israni
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA 94158, United States
- Charlotte A Nelson
- Mate Bioservices, Inc. Swallowtail Ct., Brisbane, CA 94005, United States
- Sui Huang
- Institute for Systems Biology, Seattle, WA 98109, United States
- Sergio E Baranzini
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, United States
7
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| |
Collapse
|
8
|
Johnson R, Li MM, Noori A, Queen O, Zitnik M. Graph Artificial Intelligence in Medicine. Annu Rev Biomed Data Sci 2024; 7:345-368. [PMID: 38749465 PMCID: PMC11344018 DOI: 10.1146/annurev-biodatasci-110723-024625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data, from patient records to imaging, graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.
Affiliation(s)
- Ruth Johnson
  - Berkowitz Family Living Laboratory, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Ayush Noori
  - Department of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences, Allston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Owen Queen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Marinka Zitnik
  - Harvard Data Science Initiative, Cambridge, Massachusetts, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
|
9
|
2023 Beijing Health Data Science Summit. Health Data Science 2024; 4:0112. [PMID: 38854991 PMCID: PMC11157085 DOI: 10.34133/hds.0112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 06/05/2023] [Indexed: 06/11/2024]
Abstract
The 5th annual Beijing Health Data Science Summit, organized by the National Institute of Health Data Science at Peking University, recently concluded with resounding success. This year, the summit aimed to foster collaboration among researchers, practitioners, and stakeholders in the field of health data science to advance the use of data for better health outcomes. One significant highlight of this year's summit was the introduction of the Abstract Competition, organized by Health Data Science, a Science Partner Journal, which focused on the use of cutting-edge data science methodologies, particularly the application of artificial intelligence in healthcare scenarios. The competition provided a platform for researchers to showcase their groundbreaking work and innovations. In total, the summit received 61 abstract submissions. Following a rigorous evaluation process by the Abstract Review Committee, eight exceptional abstracts were selected to compete in the final round and give presentations in the Abstract Competition. The winners of the Abstract Competition are as follows:
- First Prize: "Interpretable Machine Learning for Predicting Outcomes of Childhood Kawasaki Disease: Electronic Health Record Analysis" presented by researchers from the Chinese Academy of Medical Sciences, Peking Union Medical College, and Chongqing Medical University (presenter Yifan Duan).
- Second Prize: "Survival Disparities among Mobility Patterns of Patients with Cancer: A Population-Based Study" presented by a team from Peking University (presenter Fengyu Wen).
- Third Prize: "Deep Learning-Based Real-Time Predictive Model for the Development of Acute Stroke" presented by researchers from Beijing Tiantan Hospital (presenter Lan Lan).
We extend our heartfelt gratitude to the esteemed panel of judges whose expertise and dedication ensured the fairness and quality of the competition.
The judging panel included Jiebo Luo from the University of Rochester (chair), Shenda Hong from Peking University, Xiaozhong Liu from Worcester Polytechnic Institute, Liu Yang from Hong Kong Baptist University, Ma Jianzhu from Tsinghua University, Ting Ma from Harbin Institute of Technology, and Jian Tang from Mila-Quebec Artificial Intelligence Institute. We wish to convey our deep appreciation to Zixuan He and Haoyang Hong for their invaluable assistance in the meticulous planning and execution of the event. As the 2023 Beijing Health Data Science Summit comes to a close, we look forward to welcoming all participants to join us in 2024. Together, we will continue to advance the frontiers of health data science and work toward a healthier future for all.
|
10
|
Chitnis T, Qureshi F, Gehman VM, Becich M, Bove R, Cree BAC, Gomez R, Hauser SL, Henry RG, Katrib A, Lokhande H, Paul A, Caillier SJ, Santaniello A, Sattarnezhad N, Saxena S, Weiner H, Yano H, Baranzini SE. Inflammatory and neurodegenerative serum protein biomarkers increase sensitivity to detect clinical and radiographic disease activity in multiple sclerosis. Nat Commun 2024; 15:4297. [PMID: 38769309 PMCID: PMC11106245 DOI: 10.1038/s41467-024-48602-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 05/07/2024] [Indexed: 05/22/2024] Open
Abstract
The multifaceted nature of multiple sclerosis requires quantitative biomarkers that can provide insights related to diverse physiological pathways. To this end, proteomic analysis of deeply-phenotyped serum samples, biological pathway modeling, and network analysis were performed to elucidate inflammatory and neurodegenerative processes, identifying sensitive biomarkers of multiple sclerosis disease activity. Here, we evaluated the concentrations of > 1400 serum proteins in 630 samples from three multiple sclerosis cohorts for association with clinical and radiographic new disease activity. Twenty proteins were associated with increased clinical and radiographic multiple sclerosis disease activity for inclusion in a custom assay panel. Serum neurofilament light chain showed the strongest univariate correlation with gadolinium lesion activity, clinical relapse status, and annualized relapse rate. Multivariate modeling outperformed univariate for all endpoints. A comprehensive biomarker panel including the twenty proteins identified in this study could serve to characterize disease activity for a patient with multiple sclerosis.
Affiliation(s)
- Riley Bove
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Bruce A C Cree
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Refujia Gomez
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Stephen L Hauser
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Roland G Henry
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Anu Paul
  - Brigham and Women's Hospital, Boston, MA, USA
- Stacy J Caillier
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Adam Santaniello
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
- Hajime Yano
  - Brigham and Women's Hospital, Boston, MA, USA
- Sergio E Baranzini
  - Department of Neurology. Weill Institute for Neurosciences. University of California San Francisco, San Francisco, CA, USA
|
11
|
Di Maria A, Bellomo L, Billeci F, Cardillo A, Alaimo S, Ferragina P, Ferro A, Pulvirenti A. NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph. Bioinformatics 2024; 40:btae194. [PMID: 38597890 PMCID: PMC11074003 DOI: 10.1093/bioinformatics/btae194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/11/2024] Open
Abstract
MOTIVATION The rapid increase of biomedical literature makes it harder and harder for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks about some user-defined biomedical topics on top of the available literature is still challenging. RESULTS We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (i.e. in the form of full text or abstracts of PubMed Central papers, free texts, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and allows the distilling of well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks with significant Precision-Recall metrics when compared to state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION https://netme.click/.
Affiliation(s)
- Antonio Di Maria
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Fabrizio Billeci
  - Department of Computer Science, University of Catania, Catania, 95125, Italy
- Alfio Cardillo
  - Department of Computer Science, University of Catania, Catania, 95125, Italy
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Paolo Ferragina
  - Department of Computer Science, University of Pisa, Pisa, 56126, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
|
12
|
Yang JJ, Goff A, Wild DJ, Ding Y, Annis A, Kerber R, Foote B, Passi A, Duerksen JL, London S, Puhl AC, Lane TR, Braunstein M, Waddell SJ, Ekins S. Computational drug repositioning identifies niclosamide and tribromsalan as inhibitors of Mycobacterium tuberculosis and Mycobacterium abscessus. Tuberculosis (Edinb) 2024; 146:102500. [PMID: 38432118 PMCID: PMC10978224 DOI: 10.1016/j.tube.2024.102500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/20/2024] [Accepted: 02/24/2024] [Indexed: 03/05/2024]
Abstract
Tuberculosis (TB) is still a major global health challenge, killing over 1.5 million people each year, and hence, there is a need to identify and develop novel treatments for Mycobacterium tuberculosis (M. tuberculosis). The prevalence of infections caused by nontuberculous mycobacteria (NTM) is also increasing and has overtaken TB cases in the United States and much of the developed world. Mycobacterium abscessus (M. abscessus) is one of the most frequently encountered NTM and is difficult to treat. We describe the use of drug-disease association using a semantic knowledge graph approach combined with machine learning models that has enabled the identification of several molecules for testing anti-mycobacterial activity. We established that niclosamide (M. tuberculosis IC90 2.95 μM; M. abscessus IC90 59.1 μM) and tribromsalan (M. tuberculosis IC90 76.92 μM; M. abscessus IC90 147.4 μM) inhibit M. tuberculosis and M. abscessus in vitro. To investigate the mode of action, we determined the transcriptional response of M. tuberculosis and M. abscessus to both compounds in axenic log phase, demonstrating a broad effect on gene expression that differed from known M. tuberculosis inhibitors. Both compounds elicited transcriptional responses indicative of respiratory pathway stress and the dysregulation of fatty acid metabolism.
Affiliation(s)
- Jeremy J Yang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA; Department of Internal Medicine Translational Informatics Division, University of New Mexico, Albuquerque, NM, USA
- Aaron Goff
  - Department of Global Health and Infection, Brighton & Sussex Medical School, University of Sussex, UK
- David J Wild
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA
- Ying Ding
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA; School of Information, Dell Medical School, University of Texas, Austin, TX, USA
- Ayano Annis
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, NC, 27599, USA
- Anurag Passi
  - Department of Pediatrics, UC San Diego, San Diego, CA, USA
- Ana C Puhl
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Thomas R Lane
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Miriam Braunstein
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, NC, 27599, USA
- Simon J Waddell
  - Department of Global Health and Infection, Brighton & Sussex Medical School, University of Sussex, UK
- Sean Ekins
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
|
13
|
Kim JB, Kim SJ, So M, Kim DK, Noh HR, Kim BJ, Choi YR, Kim D, Koo H, Kim T, Woo HG, Park SM. Artificial intelligence-driven drug repositioning uncovers efavirenz as a modulator of α-synuclein propagation: Implications in Parkinson's disease. Biomed Pharmacother 2024; 174:116442. [PMID: 38513596 DOI: 10.1016/j.biopha.2024.116442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/09/2024] [Accepted: 03/15/2024] [Indexed: 03/23/2024] Open
Abstract
Parkinson's disease (PD) is a complex neurodegenerative disorder with an unclear etiology. Despite significant research efforts, developing disease-modifying treatments for PD remains a major unmet medical need. Notably, drug repositioning is becoming an increasingly attractive direction in drug discovery, and computational approaches offer a relatively quick and resource-saving method for identifying testable hypotheses that promote drug repositioning. We used an artificial intelligence (AI)-based drug repositioning strategy to screen an extensive compound library and identify potential therapeutic agents for PD. Our AI-driven analysis revealed that efavirenz and nevirapine, approved for treating human immunodeficiency virus infection, had distinct profiles, suggesting their potential effects on PD pathophysiology. Among these, efavirenz attenuated α-synuclein (α-syn) propagation and associated neuroinflammation in the brain of preformed α-syn fibrils-injected A53T α-syn Tg mice and α-syn propagation and associated behavioral changes in the C. elegans BiFC model. Through in-depth molecular investigations, we found that efavirenz can modulate cholesterol metabolism and mitigate α-syn propagation, a key pathological feature implicated in PD progression by regulating CYP46A1. This study opens new avenues for further investigation into the mechanisms underlying PD pathology and the exploration of additional drug candidates using advanced computational methodologies.
Affiliation(s)
- Jae-Bong Kim
  - Department of Pharmacology, Ajou University School of Medicine, Suwon, Korea; Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Neuroscience Graduate Program, Department of Biomedical Sciences, Ajou University School of Medicine, Suwon, Korea
- Soo-Jeong Kim
  - Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea
- Dong-Kyu Kim
  - Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea
- Hye Rin Noh
  - Department of Pharmacology, Ajou University School of Medicine, Suwon, Korea; Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Neuroscience Graduate Program, Department of Biomedical Sciences, Ajou University School of Medicine, Suwon, Korea
- Beom Jin Kim
  - Department of Pharmacology, Ajou University School of Medicine, Suwon, Korea; Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Neuroscience Graduate Program, Department of Biomedical Sciences, Ajou University School of Medicine, Suwon, Korea
- Yu Ree Choi
  - Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea
- Doyoon Kim
  - Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Department of Physiology, Ajou University School of Medicine, Suwon, Korea
- Hyun Goo Woo
  - Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Department of Physiology, Ajou University School of Medicine, Suwon, Korea
- Sang Myun Park
  - Department of Pharmacology, Ajou University School of Medicine, Suwon, Korea; Center for Convergence Research of Neurological Disorders, Ajou University School of Medicine, Suwon, Korea; Neuroscience Graduate Program, Department of Biomedical Sciences, Ajou University School of Medicine, Suwon, Korea
|
14
|
Saravanan KS, Satish KS, Saraswathy GR, Kuri U, Vastrad SJ, Giri R, Dsouza PL, Kumar AP, Nair G. Innovative target mining stratagems to navigate drug repurposing endeavours. Progress in Molecular Biology and Translational Science 2024; 205:303-355. [PMID: 38789185 DOI: 10.1016/bs.pmbts.2024.03.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The conventional theory linking a single gene with a particular disease and a specific drug contributes to the dwindling success rates of traditional drug discovery. This requires a substantial shift focussing on contemporary drug design or drug repurposing, which entails linking multiple genes to diverse physiological or pathological pathways and drugs. Lately, drug repurposing, the art of discovering new/unlabelled indications for existing drugs or candidates in clinical trials, is gaining attention owing to its success rates. The rate-limiting phase of this strategy lies in target identification, which is generally driven through disease-centric and/or drug-centric approaches. The disease-centric approach is based on exploration of crucial biomolecules such as genes or proteins underlying pathological cascades of the disease of interest. Investigating these pathological interplays aids in the identification of potential drug targets that can be leveraged for novel therapeutic interventions. The drug-centric approach involves various strategies such as exploring the mechanism of adverse drug reactions that can unearth potential targets, as these untoward reactions might be considered desirable therapeutic actions in other disease conditions. Currently, artificial intelligence is an emerging robust tool that can be used to translate the aforementioned intricate biological networks to render interpretable data for extracting precise molecular targets. Integration of multiple approaches, big data analytics, and clinical corroboration are essential for successful target mining. This chapter highlights the contemporary strategies steering target identification and diverse frameworks for drug repurposing. These strategies are illustrated through case studies curated from recent drug repurposing research inclined towards neurodegenerative diseases, cancer, infections, immunological, and cardiovascular disorders.
Affiliation(s)
- Kamatchi Sundara Saravanan
  - Department of Pharmacognosy, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Kshreeraja S Satish
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Ganesan Rajalekshmi Saraswathy
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Ushnaa Kuri
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Soujanya J Vastrad
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Ritesh Giri
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Prizvan Lawrence Dsouza
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Adusumilli Pramod Kumar
  - Department of Pharmacy Practice, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
- Gouri Nair
  - Department of Pharmacology, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
|
15
|
da Silva Rosa SC, Barzegar Behrooz A, Guedes S, Vitorino R, Ghavami S. Prioritization of genes for translation: a computational approach. Expert Rev Proteomics 2024; 21:125-147. [PMID: 38563427 DOI: 10.1080/14789450.2024.2337004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 02/21/2024] [Indexed: 04/04/2024]
Abstract
INTRODUCTION Gene identification for genetic diseases is critical for the development of new diagnostic approaches and personalized treatment options. Prioritization of gene translation is an important consideration in the molecular biology field, allowing researchers to focus on the most promising candidates for further investigation. AREAS COVERED In this paper, we discussed different approaches to prioritize genes for translation, including the use of computational tools and machine learning algorithms, as well as experimental techniques such as knockdown and overexpression studies. We also explored the potential biases and limitations of these approaches and proposed strategies to improve the accuracy and reliability of gene prioritization methods. Although numerous computational methods have been developed for this purpose, there is a need for computational methods that incorporate tissue-specific information to enable more accurate prioritization of candidate genes. Such methods should provide tissue-specific predictions, insights into underlying disease mechanisms, and more accurate prioritization of genes. EXPERT OPINION Using advanced computational tools and machine learning algorithms to prioritize genes, we can identify potential targets for therapeutic intervention of complex diseases. This represents an up-and-coming method for drug development and personalized medicine.
Affiliation(s)
- Simone C da Silva Rosa
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
- Amir Barzegar Behrooz
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
  - Electrophysiology Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
- Sofia Guedes
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Rui Vitorino
  - LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro, Portugal
  - Department of Medical Sciences, Institute of Biomedicine-iBiMED, University of Aveiro, Aveiro, Portugal
  - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto, Portugal
- Saeid Ghavami
  - Department of Human Anatomy and Cell Science, Max Rady College of Medicine, Rady Faculty of Health Science, University of Manitoba, Winnipeg, Canada
  - Faculty of Medicine in Zabrze, Academia of Silesia, Katowice, Poland
  - Research Institute of Oncology and Hematology, Cancer Care Manitoba, University of Manitoba, Winnipeg, Canada
|
16
|
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. Progress in Molecular Biology and Translational Science 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Affiliation(s)
- Sudhir K Ghandikota
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Anil G Jegga
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
|
17
|
Wang Y, Yang Z, Yao Q. Accurate and interpretable drug-drug interaction prediction enabled by knowledge subgraph learning. Communications Medicine 2024; 4:59. [PMID: 38548835 PMCID: PMC10978847 DOI: 10.1038/s43856-024-00486-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/08/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] Open
Abstract
BACKGROUND Discovering potential drug-drug interactions (DDIs) is a long-standing challenge in clinical treatments and drug developments. Recently, deep learning techniques have been developed for DDI prediction. However, they generally require a huge number of samples, while known DDIs are rare. METHODS In this work, we present KnowDDI, a graph neural network-based method that addresses the above challenge. KnowDDI enhances drug representations by adaptively leveraging rich neighborhood information from large biomedical knowledge graphs. Then, it learns a knowledge subgraph for each drug-pair to interpret the predicted DDI, where each of the edges is associated with a connection strength indicating the importance of a known DDI or resembling strength between a drug-pair whose connection is unknown. Thus, the lack of DDIs is implicitly compensated by the enriched drug representations and propagated drug similarities. RESULTS Here we show the evaluation results of KnowDDI on two benchmark DDI datasets. Results show that KnowDDI obtains the state-of-the-art prediction performance with better interpretability. We also find that KnowDDI suffers less than existing works given a sparser knowledge graph. This indicates that the propagated drug similarities play a more important role in compensating for the lack of DDIs when the drug representations are less enriched. CONCLUSIONS KnowDDI nicely combines the efficiency of deep learning techniques and the rich prior knowledge in biomedical knowledge graphs. As an original open-source tool, KnowDDI can help detect possible interactions in a broad range of relevant interaction prediction tasks, such as protein-protein interactions, drug-target interactions and disease-gene interactions, eventually promoting the development of biomedicine and healthcare.
Affiliation(s)
| | - Zaifei Yang
- Baidu Research, Baidu Inc., Beijing, China
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
| | - Quanming Yao
- Department of Electronic Engineering, Tsinghua University, Beijing, China.
|
18
|
Abstract
Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
Affiliation(s)
- Catrin Hasselgren
- Safety Assessment, Genentech, Inc., South San Francisco, California, USA
| | - Tudor I Oprea
- Expert Systems Inc., San Diego, California, USA
- Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, New Mexico, USA
|
19
|
Zietz M, Himmelstein DS, Kloster K, Williams C, Nagle MW, Greene CS. The probability of edge existence due to node degree: a baseline for network-based predictions. Gigascience 2024; 13:giae001. [PMID: 38323677 PMCID: PMC10848215 DOI: 10.1093/gigascience/giae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 09/25/2023] [Accepted: 01/02/2024] [Indexed: 02/08/2024] Open
Abstract
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections using network permutation to generate features that depend only on degree. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
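The permutation idea described in this abstract can be illustrated with a minimal sketch. It assumes an undirected simple graph given as an edge list, and the function names `degree_preserving_swaps` and `degrees` are hypothetical helpers written for this illustration; it is not the API of the authors' xswap package. Each accepted swap rewires two edges (a, b) and (c, d) into (a, d) and (c, b), which changes the connections while leaving every node's degree untouched.

```python
import random


def degree_preserving_swaps(edges, n_swaps, seed=0):
    """Return a permuted copy of an undirected simple edge list.

    Illustrative sketch only: every node keeps its original degree,
    and swaps that would create a self-loop or a duplicate edge are skipped.
    Assumes the input has at least two edges and no duplicates.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Randomly flip the second edge so both rewirings are possible.
        if rng.random() < 0.5:
            c, d = d, c
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        # Reject self-loops and edges that already exist.
        if a == d or c == b or new1 == new2:
            continue
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set.discard(edges[i])
        edge_set.discard(edges[j])
        edge_set.add(new1)
        edge_set.add(new2)
        edges[i], edges[j] = new1, new2
    return edges


def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6), (1, 3)]
permuted = degree_preserving_swaps(original, n_swaps=100, seed=1)
```

Features computed on many such permuted networks depend only on degree, so comparing a predictor's performance on the real and permuted networks gives a rough sense of how much of it is attributable to degree alone.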
Affiliation(s)
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Related Sciences, Denver, CO 80202, USA
| | - Kyle Kloster
- Carbon, Inc., Redwood City, CA 94063, USA
- Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
| | - Christopher Williams
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Michael W Nagle
- Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical, Cambridge, MA 02139, USA
- Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc., Cambridge, MA 02139, USA
- Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
|
20
|
Beasley JMT, Korn DR, Tucker NN, Alves ETM, Muratov EN, Bizon C, Tropsha A. ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research): a user-friendly interface for knowledge graph mining. Bioinformatics 2024; 40:btad779. [PMID: 38175789 PMCID: PMC10812875 DOI: 10.1093/bioinformatics/btad779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 12/05/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024] Open
Abstract
SUMMARY Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogenous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity types and relationships, represented as nodes and edges, respectively, and learn specialized query languages for graph mining and exploration. We have developed a user-friendly interface dubbed ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research) to aid reasoning over biomedical knowledge graphs and assist with data-driven research and hypothesis generation. We explain the key functionalities of ExEmPLAR and demonstrate its use with a case study considering the relationship of Trypanosoma cruzi, the etiological agent of Chagas disease, to frequently associated cardiovascular conditions. AVAILABILITY AND IMPLEMENTATION ExEmPLAR is freely accessible at https://www.exemplar.mml.unc.edu/. For code and instructions for using the application, see: https://github.com/beasleyjonm/AOP-COP-Path-Extractor.
Affiliation(s)
- Jon-Michael T Beasley
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Daniel R Korn
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Nyssa N Tucker
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Erick T M Alves
- Department of Pharmacy, University of São Paulo, São Paulo, SP 05508, Brazil
| | - Eugene N Muratov
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Chris Bizon
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Alexander Tropsha
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
21
|
Frazer SA, Baghbanzadeh M, Rahnavard A, Crandall KA, Oakley TH. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). Gigascience 2024; 13:giae073. [PMID: 39460934 PMCID: PMC11512451 DOI: 10.1093/gigascience/giae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/25/2024] [Accepted: 09/01/2024] [Indexed: 10/28/2024] Open
Abstract
BACKGROUND Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. RESULTS Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites. CONCLUSION The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism's ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
Affiliation(s)
- Seth A Frazer
- Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, USA
| | - Mahdi Baghbanzadeh
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
| | - Ali Rahnavard
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
| | - Keith A Crandall
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
- Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20012, USA
| | - Todd H Oakley
- Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, USA
|
22
|
Cruciani F, Aparo A, Brusini L, Combi C, Storti SF, Giugno R, Menegaz G, Boscolo Galazzo I. Identifying the joint signature of brain atrophy and gene variant scores in Alzheimer's Disease. J Biomed Inform 2024; 149:104569. [PMID: 38104851 DOI: 10.1016/j.jbi.2023.104569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2023] [Revised: 11/20/2023] [Accepted: 12/07/2023] [Indexed: 12/19/2023]
Abstract
The joint modeling of genetic data and brain imaging information allows for determining the pathophysiological pathways of neurodegenerative diseases such as Alzheimer's disease (AD). This task has typically been approached using mass-univariate methods that rely on a complete set of Single Nucleotide Polymorphisms (SNPs) to assess their association with selected image-derived phenotypes (IDPs). However, such methods are prone to multiple comparisons bias and, most importantly, fail to account for potential cross-feature interactions, resulting in insufficient detection of significant associations. Ways to overcome these limitations while reducing the number of traits aim at conveying genetic information at the gene level and capturing the integrated genetic effects of a set of genetic variants, rather than looking at each SNP individually. Their associations with brain IDPs are still largely unexplored in the current literature, though they can uncover new potential genetic determinants for brain modulations in the AD continuum. In this work, we explored an explainable multivariate model to analyze the genetic basis of the grey matter modulations, relying on the AD Neuroimaging Initiative (ADNI) phase 3 dataset. Cortical thicknesses and subcortical volumes derived from T1-weighted Magnetic Resonance were considered to describe the imaging phenotypes. At the same time, the genetic counterpart was represented by gene variant scores extracted by the Sequence Kernel Association Test (SKAT) filtering model. Moreover, transcriptomic analysis was carried out to assess the expression of the resulting genes in the main brain structures as a form of validation. Results highlighted meaningful genotype-phenotype interactions as defined by three latent components showing a significant difference in the projection scores between patients and controls. Among the significant associations, the model highlighted EPHX1 and BCAS1 gene variant scores involved in neurodegenerative and myelination processes, hence relevant for AD. In particular, the first was associated with decreased subcortical volumes and the second with decreased temporal lobe thickness. Notably, BCAS1 is particularly expressed in the dentate gyrus. Overall, the proposed approach allowed capturing genotype-phenotype interactions in a restricted study cohort that was confirmed by transcriptomic analysis, offering insights into the underlying mechanisms of neurodegeneration in AD in line with previous findings and suggesting new potential disease biomarkers.
Affiliation(s)
- Federica Cruciani
- Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy.
| | - Antonino Aparo
- Department of Computer Science, University of Verona, Verona, Italy
| | - Lorenza Brusini
- Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
| | - Carlo Combi
- Department of Computer Science, University of Verona, Verona, Italy
| | - Silvia F Storti
- Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
| | - Rosalba Giugno
- Department of Computer Science, University of Verona, Verona, Italy
| | - Gloria Menegaz
- Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
|
23
|
Boudin M, Diallo G, Drancé M, Mougin F. The OREGANO knowledge graph for computational drug repurposing. Sci Data 2023; 10:871. [PMID: 38057380 PMCID: PMC10700660 DOI: 10.1038/s41597-023-02757-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2023] [Accepted: 11/16/2023] [Indexed: 12/08/2023] Open
Abstract
Drug repositioning is a faster and more affordable solution than traditional drug discovery approaches. From this perspective, computational drug repositioning using knowledge graphs is a very promising direction. Knowledge graphs constructed from drug data and information can be used to generate hypotheses (molecule/drug-target links) through link prediction using machine learning algorithms. However, it remains rare to have a holistically constructed knowledge graph using the broadest possible features and drug characteristics, which is freely available to the community. The OREGANO knowledge graph aims at filling this gap. The purpose of this paper is to present the OREGANO knowledge graph, which includes natural compound-related data. The graph was developed from scratch by retrieving data directly from the knowledge sources to be integrated. We therefore designed the expected graph model and proposed a method for merging nodes between the different knowledge sources, and finally, the data were cleaned. The knowledge graph, as well as the source code for the ETL process, is openly available on the GitHub of the OREGANO project (https://gitub.u-bordeaux.fr/erias/oregano).
Affiliation(s)
- Marina Boudin
- AHeaD team, Bordeaux Population Health Inserm U1219, Univ. Bordeaux, F-33000, Bordeaux, France.
| | - Gayo Diallo
- AHeaD team, Bordeaux Population Health Inserm U1219, Univ. Bordeaux, F-33000, Bordeaux, France
| | - Martin Drancé
- AHeaD team, Bordeaux Population Health Inserm U1219, Univ. Bordeaux, F-33000, Bordeaux, France
| | - Fleur Mougin
- AHeaD team, Bordeaux Population Health Inserm U1219, Univ. Bordeaux, F-33000, Bordeaux, France
|
24
|
Huang L, Chen Q, Lan W. Predicting drug-drug interactions based on multi-view and multichannel attention deep learning. Health Inf Sci Syst 2023; 11:50. [PMID: 37941825 PMCID: PMC10628064 DOI: 10.1007/s13755-023-00250-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/25/2023] [Indexed: 11/10/2023] Open
Abstract
Predicting drug-drug interactions (DDIs) has become a major concern in the drug research field because it helps explore the pharmacological function of drugs and enables the development of new therapeutic drugs. Existing prediction methods simply integrate multiple drug attributes or perform tasks on a biomedical knowledge graph (KG). Though effective, few methods can fully utilize multi-source drug data information. In this paper, a multi-view and multichannel attention deep learning (MMADL) model is proposed, which not only extracts rich drug features containing both drug attributes and drug-related entity information from multi-source databases, but also considers the consistency and complementarity of different drug feature representation learning approaches to improve the effectiveness and accuracy of DDI prediction. A single-layer perceptron encoder is applied to encode multi-source drug information to obtain multi-view drug representation vectors in the same linear space. Then, the multichannel attention mechanism is introduced to obtain the attention weight by adaptively learning the importance of drug features according to their contributions to DDI prediction. Further, the representation vectors of multi-view drug pairs with attention weights are used as inputs of the deep neural network to predict potential DDI. The accuracy and precision-recall curves of MMADL are 93.05 and 95.94, respectively. The results indicate that the proposed method outperforms other state-of-the-art methods.
Affiliation(s)
- Liyu Huang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006 China
| | - Qingfeng Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 China
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, 3086 Australia
| | - Wei Lan
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 China
|
25
|
Yu S, Wang Z, Nan J, Li A, Yang X, Tang X. Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation. JMIR Form Res 2023; 7:e50998. [PMID: 37966892 PMCID: PMC10687686 DOI: 10.2196/50998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/28/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to study complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets. OBJECTIVE The aim of this study was to identify potential schizophrenia risk genes by employing machine learning methods to extract topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network and understand the complex disease-causing property. Consequently, a PPIK-based metagraph representation approach is proposed. METHODS To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented proteins with a series of vectors. We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models. RESULTS Comprehensive experiments demonstrated the good performance of our proposed method with an area under the receiver operating characteristic curve (AUC) value between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network constructed from the baseline models, complementation of keywords in the PPIK network improved the performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average compared with that of the baseline methods. 
According to the comprehensive performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We transformed these proteins to their encoding gene IDs and identified the top 20 genes as the most probable schizophrenia-risk genes, including the EYA3, CNTN4, HSPA8, LRRK2, and AFP genes. We further validated these outcomes against metagraph features and evidence from the literature, performed a features analysis, and exploited evidence from the literature to interpret the correlation between the predicted genes and diseases. CONCLUSIONS The metagraph representation based on the PPIK network framework was found to be effective for potential schizophrenia risk genes identification. The results are quite reliable as evidence can be found in the literature to support our prediction. Our approach can provide more biological insights into the pathogenesis of schizophrenia.
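For readers unfamiliar with the baselines named in this abstract, random walk with restart can be sketched in a few lines. This is a generic textbook formulation under stated assumptions, not the authors' implementation: the graph is an undirected adjacency dictionary in which every node has at least one neighbour, all seeds are nodes of the graph, and the function name, the `restart` value of 0.3, and the toy path graph are hypothetical choices made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, n_iter=100):
    """Score nodes by proximity to a seed set (illustrative sketch).

    adj maps each node to a list of neighbours; seeds is a set of nodes
    that the walker jumps back to with probability `restart` at each step.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                continue
            # Spread the remaining probability evenly over neighbours.
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p


# Toy path graph A - B - C - D with A as the only known disease protein.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adj, seeds={"A"})
```

In this toy example the scores decrease with distance from the seed, which is the behaviour such baselines rely on when ranking candidate disease proteins.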
Affiliation(s)
- Shirui Yu
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| | - Ziyang Wang
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| | - Jiale Nan
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| | - Aihua Li
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| | - Xuemei Yang
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| | - Xiaoli Tang
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
|
26
|
Ratajczak F, Joblin M, Hildebrandt M, Ringsquandl M, Falter-Braun P, Heinig M. Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases. Nat Commun 2023; 14:7206. [PMID: 37938585 PMCID: PMC10632370 DOI: 10.1038/s41467-023-42975-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/27/2023] [Accepted: 10/27/2023] [Indexed: 11/09/2023] Open
Abstract
Understanding phenotype-to-genotype relationships is a grand challenge of 21st century biology with translational implications. The recently proposed "omnigenic" model postulates that effects of genetic variation on traits are mediated by core-genes and -proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation-learning ensemble-approach based on a nested cross-validation to predict core-like genes for diverse diseases using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validations, we demonstrate that core-like genes display several key properties of core genes: Mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties like transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for the interpretation of biological complexity and pave the way for studying core gene properties and future drug development.
Affiliation(s)
- Florin Ratajczak
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
| | - Pascal Falter-Braun
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany.
- Microbe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany.
| | - Matthias Heinig
- Institute of Computational Biology (ICB), Helmholtz Munich, Neuherberg, Germany.
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany.
|
27
|
Velleuer E, Domínguez-Hüttinger E, Rodríguez A, Harris LA, Carlberg C. Concepts of multi-level dynamical modelling: understanding mechanisms of squamous cell carcinoma development in Fanconi anemia. Front Genet 2023; 14:1254966. [PMID: 38028610 PMCID: PMC10652399 DOI: 10.3389/fgene.2023.1254966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023] Open
Abstract
Fanconi anemia (FA) is a rare disease (incidence of 1:300,000) primarily based on the inheritance of pathogenic variants in genes of the FA/BRCA (breast cancer) pathway. These variants ultimately reduce the functionality of different proteins involved in the repair of DNA interstrand crosslinks and DNA double-strand breaks. At birth, individuals with FA might present with typical malformations, particularly radial axis and renal malformations, as well as other physical abnormalities like skin pigmentation anomalies. During the first decade of life, FA mostly causes bone marrow failure due to reduced capacity and loss of the hematopoietic stem and progenitor cells. This often makes hematopoietic stem cell transplantation necessary, but this therapy increases the already intrinsic risk of developing squamous cell carcinoma (SCC) in early adult age. Due to the underlying genetic defect in FA, classical chemo-radiation-based treatment protocols cannot be applied. Therefore, detecting and treating the multi-step tumorigenesis process of SCC in an early stage, or even its progenitors, is the best option for prolonging the life of adult FA individuals. However, the small number of FA individuals makes classical evidence-based medicine approaches based on results from randomized clinical trials impossible. As an alternative, we introduce here the concept of multi-level dynamical modelling using large, longitudinally collected genome, proteome- and transcriptome-wide data sets from a small number of FA individuals. This mechanistic modelling approach is based on the "hallmarks of cancer in FA", which we derive from our unique database of the clinical history of over 750 FA individuals. Multi-omic data from healthy and diseased tissue samples of FA individuals are to be used for training constituent models of a multi-level tumorigenesis model, which will then be used to make experimentally testable predictions. 
In this way, mechanistic models facilitate not only a descriptive but also a functional understanding of SCC in FA. This approach will provide the basis for detecting signatures of SCCs at early stages and their precursors so they can be efficiently treated or even prevented, leading to a better prognosis and quality of life for the FA individual.
Affiliation(s)
- Eunike Velleuer
- Department of Cytopathology, Heinrich Heine University, Düsseldorf, Germany
- Center for Child and Adolescent Health, Helios Klinikum, Krefeld, Germany
| | - Elisa Domínguez-Hüttinger
- Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Ciudad México, Mexico
| | - Alfredo Rodríguez
- Departamento de Medicina Genómica y Toxicología Ambiental, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Ciudad México, Mexico
- Instituto Nacional de Pediatría, Ciudad México, Mexico
| | - Leonard A. Harris
- Department of Biomedical Engineering, University of Arkansas, Fayetteville, AR, United States
- Interdisciplinary Graduate Program in Cell and Molecular Biology, University of Arkansas, Fayetteville, AR, United States
- Cancer Biology Program, Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
| | - Carsten Carlberg
- Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, Olsztyn, Poland
- School of Medicine, Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
|
28
|
Renaux A, Terwagne C, Cochez M, Tiddi I, Nowé A, Lenaerts T. A knowledge graph approach to predict and interpret disease-causing gene interactions. BMC Bioinformatics 2023; 24:324. [PMID: 37644440 PMCID: PMC10463539 DOI: 10.1186/s12859-023-05451-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogenous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. CONCLUSION Our method, built with interpretability in mind, leverages heterogenous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. 
This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
Affiliation(s)
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
| | - Chloé Terwagne
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
| | - Michael Cochez
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
| | - Ilaria Tiddi
- Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
29
|
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the crossover application of multiple high-throughput screening technologies represented by genomics, transcriptomics, single-cell transcriptomics, proteomics and metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most of the current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction of multi-omics. This review outlines the existing technical categories of multi-omics, cautions for experimental design, focuses on the integrated analysis methods of multi-omics, especially the approach of machine learning and deep learning in multi-omics data integration and the corresponding tools, and the application of multi-omics in medical researches (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases, and finally, discusses the challenges and future directions of multi-omics integration and application in precision medicine. With the development of high-throughput technologies and data integration algorithms, as important directions of multi-omics for future disease research, single-cell multi-omics and spatial multi-omics also provided a detailed introduction. This review will provide important guidance for researchers, especially who are just entering into multi-omics medical research.
Affiliation(s)
- Chongyang Chen
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Co-innovation Center of Neurodegeneration, Nantong University, Nantong, China
| | - Jing Wang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
| | - Donghui Pan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Xinyu Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Yuping Xu
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Junjie Yan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Lizhen Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Xifei Yang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
| | - Min Yang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
| | - Gong-Ping Liu
- Co-innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
30
|
Gospodarska E, Ghosh Dastidar R, Carlberg C. Intervention Approaches in Studying the Response to Vitamin D3 Supplementation. Nutrients 2023; 15:3382. [PMID: 37571318 PMCID: PMC10420637 DOI: 10.3390/nu15153382] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/12/2023] [Revised: 07/20/2023] [Accepted: 07/28/2023] [Indexed: 08/13/2023] Open
Abstract
Vitamin D intervention studies are designed to evaluate the impact of the micronutrient vitamin D3 on health and disease. The appropriate design of studies is essential for their quality, successful execution, and interpretation. Randomized controlled trials (RCTs) are considered the "gold standard" for intervention studies. However, the most recent large-scale (up to 25,000 participants), long-term RCTs involving vitamin D3 did not provide any statistically significant primary results. This may be because they are designed similarly to RCTs of a therapeutic drug but not of a nutritional compound and that only a limited set of parameters per individual were determined. We propose an alternative concept using the segregation of study participants into different groups of responsiveness to vitamin D3 supplementation and in parallel measuring a larger set of genome-wide parameters over multiple time points. This is in accordance with recently developed mechanistic modeling approaches that do not require a large number of study participants, as in the case of statistical modeling of the results of a RCT. Our experience is based on the vitamin D intervention trials VitDmet, VitDbol, and VitDHiD, which allowed us to distinguish the study participants into high, mid, and low vitamin D responders. In particular, investigating the vulnerable group of low vitamin D responders will provide future studies with more conclusive results both on the clinical and molecular benefits of vitamin D3 supplementation. In conclusion, our approach suggests a paradigm shift towards detailed investigations of transcriptome and epigenome-wide parameters of a limited set of individuals, who, due to a longitudinal design, can act as their own controls.
Affiliation(s)
- Emilia Gospodarska
- Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, PL-10-748 Olsztyn, Poland
| | - Ranjini Ghosh Dastidar
- Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, PL-10-748 Olsztyn, Poland
| | - Carsten Carlberg
- Institute of Animal Reproduction and Food Research, Polish Academy of Sciences, PL-10-748 Olsztyn, Poland
- School of Medicine, Institute of Biomedicine, University of Eastern Finland, FI-70211 Kuopio, Finland
31
|
Han CD, Wang CC, Huang L, Chen X. MCFF-MTDDI: multi-channel feature fusion for multi-typed drug-drug interaction prediction. Brief Bioinform 2023; 24:bbad215. [PMID: 37291761 DOI: 10.1093/bib/bbad215] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/24/2023] [Revised: 05/11/2023] [Accepted: 05/21/2023] [Indexed: 06/10/2023] Open
Abstract
Adverse drug-drug interactions (DDIs) have become an increasingly serious problem in the medical and health system. Recently, the effective application of deep learning and biomedical knowledge graphs (KGs) have improved the DDI prediction performance of computational models. However, the problems of feature redundancy and KG noise also arise, bringing new challenges for researchers. To overcome these challenges, we proposed a Multi-Channel Feature Fusion model for multi-typed DDI prediction (MCFF-MTDDI). Specifically, we first extracted drug chemical structure features, drug pairs' extra label features, and KG features of drugs. Then, these different features were effectively fused by a multi-channel feature fusion module. Finally, multi-typed DDIs were predicted through the fully connected neural network. To our knowledge, we are the first to integrate the extra label information into KG-based multi-typed DDI prediction; besides, we innovatively proposed a novel KG feature learning method and a State Encoder to obtain target drug pairs' KG-based features which contained more abundant and more key drug-related KG information with less noise; furthermore, a Gated Recurrent Unit-based multi-channel feature fusion module was proposed in an innovative way to yield more comprehensive feature information about drug pairs, effectively alleviating the problem of feature redundancy. We experimented with four datasets in the multi-class and the multi-label prediction tasks to comprehensively evaluate the performance of MCFF-MTDDI for predicting interactions of known-known drugs, known-new drugs and new-new drugs. In addition, we further conducted ablation studies and case studies. All the results fully demonstrated the effectiveness of MCFF-MTDDI.
Affiliation(s)
- Chen-Di Han
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Science, Jiangnan University, Wuxi, 214122, China
| | - Li Huang
- The Future Laboratory, Tsinghua University, Beijing, 100084, China
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- School of Science, Jiangnan University, Wuxi, 214122, China
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
32
|
Chitnis T, Qureshi F, Gehman VM, Becich M, Bove R, Cree BAC, Gomez R, Hauser SL, Henry RG, Katrib A, Lokhande H, Paul A, Caillier SJ, Santaniello A, Sattarnezhad N, Saxena S, Weiner H, Yano H, Baranzini SE. Inflammatory and neurodegenerative serum protein biomarkers increase sensitivity to detect disease activity in multiple sclerosis. medRxiv 2023:2023.06.28.23291157. [PMID: 37461671 PMCID: PMC10350151 DOI: 10.1101/2023.06.28.23291157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Background/Objectives Serum proteomic analysis of deeply-phenotyped samples, biological pathway modeling and network analysis were performed to elucidate the inflammatory and neurodegenerative processes of multiple sclerosis (MS) and identify sensitive biomarkers of MS disease activity (DA). Methods Over 1100 serum proteins were evaluated in >600 samples from three MS cohorts to identify biomarkers of clinical and radiographic (gadolinium-enhancing lesions) new MS DA. Protein levels were analyzed and associated with presence of gadolinium-enhancing lesions, clinical relapse status (CRS), and annualized relapse rate (ARR) to create a custom assay panel. Results Twenty proteins were associated with increased clinical and radiographic MS DA. Serum neurofilament light chain (NfL) showed the strongest univariate correlation with radiographic and clinical DA measures. Multivariate modeling significantly outperformed univariate NfL to predict gadolinium lesion activity, CRS and ARR. Discussion These findings provide insight regarding correlations between inflammatory and neurodegenerative biomarkers and clinical and radiographic MS DA. Funding Octave Bioscience, Inc (Menlo Park, CA).
33
|
Noori A, Li MM, Tan ALM, Zitnik M. Metapaths: similarity search in heterogeneous knowledge graphs via meta-paths. Bioinformatics 2023; 39:btad297. [PMID: 37140542 PMCID: PMC10209523 DOI: 10.1093/bioinformatics/btad297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 03/10/2023] [Accepted: 04/29/2023] [Indexed: 05/05/2023] Open
Abstract
SUMMARY Heterogeneous knowledge graphs (KGs) have enabled the modeling of complex systems, from genetic interaction graphs and protein-protein interaction networks to networks representing drugs, diseases, proteins, and side effects. Analytical methods for KGs rely on quantifying similarities between entities, such as nodes, in the graph. However, such methods must consider the diversity of node and edge types contained within the KG via, for example, defined sequences of entity types known as meta-paths. We present metapaths, the first R software package to implement meta-paths and perform meta-path-based similarity search in heterogeneous KGs. The metapaths package offers various built-in similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. Indeed, evaluation of these methods on an open-source biomedical KG recovered meaningful drug and disease-associated relationships, including those in Alzheimer's disease. The metapaths framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across KG learning. AVAILABILITY AND IMPLEMENTATION The metapaths R package is available via GitHub at https://github.com/ayushnoori/metapaths and is released under MPL 2.0 (Zenodo DOI: 10.5281/zenodo.7047209). Package documentation and usage examples are available at https://www.ayushnoori.com/metapaths.
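As a rough illustration of the meta-path idea summarized in this abstract, the following Python sketch counts path instances along a typed sequence of nodes and derives a PathSim-style score from those counts. It is not the metapaths R package or its API: the toy graph, the function names (`build_adjacency`, `metapath_counts`, `pathsim`) and the choice of PathSim as the similarity metric are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: an undirected edge list plus a node-type lookup.
toy_edges = [("D1", "S1"), ("D2", "S1"), ("D2", "S2"), ("D3", "S2")]
toy_types = {"D1": "Drug", "D2": "Drug", "D3": "Drug", "S1": "Disease", "S2": "Disease"}


def build_adjacency(edges, node_type):
    """Map each node to its neighbours, grouped by the neighbour's type."""
    adj = defaultdict(lambda: defaultdict(set))
    for u, v in edges:
        adj[u][node_type[v]].add(v)
        adj[v][node_type[u]].add(u)
    return adj


def metapath_counts(adj, node_type, start, metapath):
    """Count path instances from `start` that follow `metapath`, keyed by end node."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first type in the meta-path")
    frontier = {start: 1}
    for next_type in metapath[1:]:
        reached = defaultdict(int)
        for node, n_paths in frontier.items():
            for neighbour in adj[node][next_type]:
                reached[neighbour] += n_paths
        frontier = reached
    return dict(frontier)


def pathsim(adj, node_type, x, y, metapath):
    """PathSim score for a symmetric meta-path: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    from_x = metapath_counts(adj, node_type, x, metapath)
    from_y = metapath_counts(adj, node_type, y, metapath)
    denominator = from_x.get(x, 0) + from_y.get(y, 0)
    return 2 * from_x.get(y, 0) / denominator if denominator else 0.0


toy_adj = build_adjacency(toy_edges, toy_types)
drug_disease_drug = ["Drug", "Disease", "Drug"]
print(pathsim(toy_adj, toy_types, "D1", "D2", drug_disease_drug))
```

In this toy graph, D1 and D2 share one disease, so they receive a non-zero score under the Drug-Disease-Drug meta-path, whereas D1 and D3 share none and score zero.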
Affiliation(s)
- Ayush Noori
- Harvard College, Cambridge, MA 02138, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
34
|
Morris JH, Soman K, Akbas RE, Zhou X, Smith B, Meng EC, Huang CC, Cerono G, Schenk G, Rizk-Jackson A, Harroud A, Sanders L, Costes SV, Bharat K, Chakraborty A, Pico AR, Mardirossian T, Keiser M, Tang A, Hardi J, Shi Y, Musen M, Israni S, Huang S, Rose PW, Nelson CA, Baranzini SE. The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information. Bioinformatics 2023; 39:btad080. [PMID: 36759942 PMCID: PMC9940622 DOI: 10.1093/bioinformatics/btad080] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 07/26/2022] [Revised: 01/17/2023] [Accepted: 02/08/2023] [Indexed: 02/11/2023] Open
Abstract
MOTIVATION Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KG presents a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Karthik Soman
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Rabia E Akbas
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Xiaoyuan Zhou
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Brett Smith
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Elaine C Meng
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Conrad C Huang
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Gabriel Cerono
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Gundolf Schenk
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Angela Rizk-Jackson
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Adil Harroud
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Lauren Sanders
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Sylvain V Costes
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Krish Bharat
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Arjun Chakraborty
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Taline Mardirossian
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-2550, USA
| | - Michael Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-2550, USA
| | - Alice Tang
- UCSF-UC Berkeley Bioengineering Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Josef Hardi
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, USA
| | - Yongmei Shi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Mark Musen
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, USA
| | - Sharat Israni
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sui Huang
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Peter W Rose
- San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Charlotte A Nelson
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sergio E Baranzini
- Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
35
|
Hong E, Jeon J, Kim HU. Recent development of machine learning models for the prediction of drug-drug interactions. Korean J Chem Eng 2023; 40:276-285. [PMID: 36748027 PMCID: PMC9894510 DOI: 10.1007/s11814-023-1377-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/11/2022] [Revised: 12/09/2022] [Accepted: 12/16/2022] [Indexed: 02/05/2023]
Abstract
Polypharmacy, the co-administration of multiple drugs, has become an area of concern as the elderly population grows and an unexpected infection, such as COVID-19 pandemic, keeps emerging. However, it is very costly and time-consuming to experimentally examine the pharmacological effects of polypharmacy. To address this challenge, machine learning models that predict drug-drug interactions (DDIs) have actively been developed in recent years. In particular, the growing volume of drug datasets and the advances in machine learning have facilitated the model development. In this regard, this review discusses the DDI-predicting machine learning models that have been developed since 2018. Our discussion focuses on dataset sources used to develop the models, featurization approaches of molecular structures and biological information, and types of DDI prediction outcomes from the models. Finally, we make suggestions for research opportunities in this field.
Affiliation(s)
- Eujin Hong
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141 Korea
36
|
Zietz M, Himmelstein DS, Kloster K, Williams C, Nagle MW, Greene CS. The probability of edge existence due to node degree: a baseline for network-based predictions. bioRxiv 2023:2023.01.05.522939. [PMID: 36711569 PMCID: PMC9881952 DOI: 10.1101/2023.01.05.522939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
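The permutation idea in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the xswap package's implementation or API: the toy drug-target edges, the function names (`degree_preserving_permutation`, `edge_prior`) and the numbers of permutations and swap attempts are all assumptions.

```python
import random
from collections import Counter

# Hypothetical toy bipartite network of (drug, target) edges with no duplicates.
toy_edges = [("d1", "t1"), ("d1", "t2"), ("d2", "t1"), ("d3", "t2")]


def degree_preserving_permutation(edges, n_swaps, rng):
    """Shuffle edges by swapping targets between edge pairs; every node keeps its degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        new_first, new_second = (a, d), (c, b)
        # Skip swaps that pick the same edge twice or would duplicate an existing edge.
        if i == j or new_first in present or new_second in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new_first, new_second}
        edges[i], edges[j] = new_first, new_second
    return edges


def edge_prior(edges, n_permutations=200, swaps_per_edge=10, seed=0):
    """Estimate, per node pair, the fraction of permuted networks that contain that edge."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_permutations):
        counts.update(degree_preserving_permutation(edges, swaps_per_edge * len(edges), rng))
    # Pairs missing from the result never appeared in any permutation (estimated prior of 0).
    return {edge: n / n_permutations for edge, n in counts.items()}


prior = edge_prior(toy_edges)
for edge in sorted(prior):
    print(edge, round(prior[edge], 2))
```

In this toy network, `d1` has degree 2 and there are only two targets, so both of its edges are forced by degree alone and their estimated prior is 1. The remaining pairs are only partly determined by degree, which is the kind of nonspecific signal the edge prior is meant to expose as a baseline.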
Affiliation(s)
- Michael Zietz
- Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Kyle Kloster
- Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Christopher Williams
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Michael W Nagle
- Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
37
|
Soman K, Nelson CA, Cerono G, Baranzini SE. Time-aware Embeddings of Clinical Data using a Knowledge Graph. Pacific Symposium on Biocomputing 2023; 28:97-108. [PMID: 36540968 PMCID: PMC9782808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
Meaningful representations of clinical data using embedding vectors is a pivotal step to invoke any machine learning (ML) algorithm for data inference. In this article, we propose a time-aware embedding approach of electronic health records onto a biomedical knowledge graph for creating machine readable patient representations. This approach not only captures the temporal dynamics of patient clinical trajectories, but also enriches it with additional biological information from the knowledge graph. To gauge the predictivity of this approach, we propose an ML pipeline called TANDEM (Temporal and Non-temporal Dynamics Embedded Model) and apply it on the early detection of Parkinson's disease. TANDEM results in a classification AUC score of 0.85 on unseen test dataset. These predictions are further explained by providing a biological insight using the knowledge graph. Taken together, we show that temporal embeddings of clinical data could be a meaningful predictive representation for downstream ML pipelines in clinical decision-making.
38
|
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022; 12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Related Sciences, Denver, CO 80202, USA
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Vincent Rubinetti
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Kyle Kloster
  - Carbon, Inc., Redwood City, CA 94063, USA
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
- Benjamin J Heil
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Faisal Alquaddoomi
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Dongbo Hu
  - Department of Pathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- David N Nicholson
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yun Hao
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Blair D Sullivan
  - School of Computing, University of Utah, Salt Lake City, UT 84112, USA
- Michael W Nagle
  - Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, MA 02139, USA
  - Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
39
Santangelo BE, Gillenwater LA, Salem NM, Hunter LE. Molecular cartooning with knowledge graphs. Frontiers in Bioinformatics 2022; 2:1054578. [PMID: 36568701 PMCID: PMC9772836 DOI: 10.3389/fbinf.2022.1054578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/27/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] Open
Abstract
Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.
40
An automatic hypothesis generation for plausible linkage between xanthium and diabetes. Sci Rep 2022; 12:17547. [PMID: 36266295 PMCID: PMC9585073 DOI: 10.1038/s41598-022-20752-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 09/19/2022] [Indexed: 01/13/2023] Open
Abstract
There has been a significant increase in text mining implementation for biomedical literature in recent years. Previous studies introduced the implementation of text mining and literature-based discovery to generate hypotheses of potential candidates for drug development. By conducting a hypothesis-generation step and using evidence from published journal articles or proceedings, previous studies have managed to reduce experimental time and costs. First, we applied the closed discovery approach from Swanson's ABC model to collect publications related to 36 Xanthium compounds or diabetes. Second, we extracted biomedical entities and relations using a knowledge extraction engine, the Public Knowledge Discovery Engine for Java or PKDE4J. Third, we built a knowledge graph using the obtained bio entities and relations and then generated paths with Xanthium compounds as source nodes and diabetes as the target node. Lastly, we employed graph embeddings to rank each path and evaluated the results based on domain experts' opinions and literature. Among 36 Xanthium compounds, 35 had direct paths to five diabetes-related nodes. We ranked 2,740,314 paths in total between 35 Xanthium compounds and three diabetes-related phrases: type 1 diabetes, type 2 diabetes, and diabetes mellitus. Based on the top five percentile paths, we concluded that adenosine, choline, beta-sitosterol, rhamnose, and scopoletin were potential candidates for diabetes drug development using natural products. Our framework for hypothesis generation employs a closed discovery from Swanson's ABC model that has proven very helpful in discovering biological linkages between bio entities. The PKDE4J tools we used to capture bio entities from our document collection could label entities into five categories: genes, compounds, phenotypes, biological processes, and molecular functions. 
Using the BioPREP model, we managed to interpret the semantic relatedness between two nodes and provided paths containing valuable hypotheses. Lastly, using a graph-embedding algorithm in our path-ranking analysis, we exploited the semantic relatedness while preserving the graph structure properties.
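The closed discovery step of Swanson's ABC model, in which both the start concept A and the end concept C are fixed and the intermediate B terms are sought, can be sketched as follows. This is a toy illustration only: the documents, the placeholder entity names, and the function `closed_discovery` are hypothetical and are not taken from PKDE4J or from the study.

```python
from collections import defaultdict

# Hypothetical records: (document id, set of entities mentioned in that document).
DOCS = [
    ("doc1", {"compound_A", "gene_B1"}),
    ("doc2", {"gene_B1", "disease_C"}),
    ("doc3", {"compound_A", "gene_B2"}),
    ("doc4", {"gene_B2", "disease_C"}),
    ("doc5", {"compound_A", "gene_B3"}),  # gene_B3 is never linked to disease_C
]


def closed_discovery(docs, a_term, c_term):
    """Return B terms that co-occur with both A and C, with the supporting documents."""
    with_a = defaultdict(set)  # B term -> documents where it co-occurs with A
    with_c = defaultdict(set)  # B term -> documents where it co-occurs with C
    for doc_id, entities in docs:
        for term, bucket in ((a_term, with_a), (c_term, with_c)):
            if term in entities:
                for other in entities - {a_term, c_term}:
                    bucket[other].add(doc_id)
    shared = with_a.keys() & with_c.keys()
    return {b: (sorted(with_a[b]), sorted(with_c[b])) for b in sorted(shared)}


links = closed_discovery(DOCS, "compound_A", "disease_C")
for b_term, (a_docs, c_docs) in links.items():
    print(f"compound_A -> {b_term} -> disease_C  (A-B: {a_docs}, B-C: {c_docs})")
```

Each surviving B term yields a candidate A-B-C path. In a fuller pipeline, these paths would then be ranked, for example with graph embeddings as the abstract describes.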
41
Fernández-Torras A, Duran-Frigola M, Bertoni M, Locatelli M, Aloy P. Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. Nat Commun 2022; 13:5304. [PMID: 36085310 PMCID: PMC9463154 DOI: 10.1038/s41467-022-33026-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 05/06/2022] [Accepted: 08/30/2022] [Indexed: 12/25/2022] Open
Abstract
Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph, displaying more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, including 12 biological entities (e.g., genes, diseases, drugs) linked by 67 types of associations (e.g., 'drug treats disease', 'gene interacts with gene'). We show how Bioteque descriptors facilitate the assessment of high-throughput protein-protein interactome data, the prediction of drug response and new repurposing opportunities, and demonstrate that they can be used off-the-shelf in downstream machine learning tasks without loss of performance with respect to using original data. The Bioteque thus offers a thoroughly processed, tractable, and highly optimized assembly of the biomedical knowledge available in the public domain.
Affiliation(s)
- Adrià Fernández-Torras
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Miquel Duran-Frigola
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Ersilia Open Source Initiative, Cambridge, UK
- Martino Bertoni
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Martina Locatelli
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Patrick Aloy
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
42
Asaad C, Ghogho M. AsthmaKGxE: An asthma-environment interaction knowledge graph leveraging public databases and scientific literature. Comput Biol Med 2022; 148:105933. [DOI: 10.1016/j.compbiomed.2022.105933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Revised: 06/11/2022] [Accepted: 07/30/2022] [Indexed: 11/03/2022]
43
Poleksic A. Overcoming Sparseness of Biomedical Networks to Identify Drug Repositioning Candidates. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2377-2384. [PMID: 33591920 DOI: 10.1109/tcbb.2021.3059807] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Modeling complex biological systems is necessary to understand the biochemical interactions behind the pharmacological effects of drugs. Successful in silico drug repurposing relies on exploration of diverse biochemical concepts and their relationships, including adverse drug reactions, drug targets, disease symptoms, as well as disease-associated genes and their pathways, to name a few. We present a computational method for inferring drug-disease associations from complex but incomplete and biased biological networks. Our method employs matrix completion to overcome the sparseness of biomedical data and to enrich the set of relationships between different biomedical entities. We present a strategy for identifying network paths supportive of drug efficacy as well as a computational procedure capable of combining different network patterns to better distinguish treatments from non-treatments. The algorithm is available at http://bioinfo.cs.uni.edu/AEONET.html.
44
Gupta C, Chandrashekar P, Jin T, He C, Khullar S, Chang Q, Wang D. Bringing machine learning to research on intellectual and developmental disabilities: taking inspiration from neurological diseases. J Neurodev Disord 2022; 14:28. [PMID: 35501679 PMCID: PMC9059371 DOI: 10.1186/s11689-022-09438-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Accepted: 04/07/2022] [Indexed: 12/31/2022] Open
Abstract
Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or in early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remains challenging mainly due to overlapping factors and comorbidity. Advances in high-throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the "big data" revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a whole new paradigm shift in computational biology. Evidently, the ML-driven approach to clinical diagnoses has the potential to augment classical methods that use symptoms and external observations, hoping to push the personalized treatment plan forward. Therefore, integrative analyses and applications of ML technology have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between various factors underlying IDDs. In this review, we introduced fast-increasing multi-modal data types, highlighted example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs, as well as recent advances in ML technologies and their applications to IDDs and other neurological diseases. 
We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers to formulate AI and ML-based approaches to investigate IDDs and related conditions.
Affiliation(s)
- Chirag Gupta
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Pramod Chandrashekar
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Ting Jin
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Chenfeng He
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Saniya Khullar
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Qiang Chang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Medical Genetics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Neurology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Daifeng Wang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA
45
Santos A, Colaço AR, Nielsen AB, Niu L, Strauss M, Geyer PE, Coscia F, Albrechtsen NJW, Mundt F, Jensen LJ, Mann M. A knowledge graph to interpret clinical proteomics data. Nat Biotechnol 2022; 40:692-702. [PMID: 35102292 PMCID: PMC9110295 DOI: 10.1038/s41587-021-01145-6] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Received: 11/18/2020] [Accepted: 11/01/2021] [Indexed: 12/14/2022]
Abstract
Implementing precision medicine hinges on the integration of omics data, such as proteomics, into the clinical decision-making process, but the quantity and diversity of biomedical data, and the spread of clinically relevant knowledge across multiple biomedical databases and publications, pose a challenge to data integration. Here we present the Clinical Knowledge Graph (CKG), an open-source platform currently comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature. The graph structure provides a flexible data model that is easily extendable to new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that accelerate the analysis and interpretation of typical proteomics workflows. Using a set of proof-of-concept biomarker studies, we show how the CKG might augment and enrich proteomics data and help inform clinical decision-making.
Affiliation(s)
- Alberto Santos
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - Li-Ka Shing Big Data Institute, University of Oxford, Oxford, UK
  - Center for Health Data Science, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Ana R Colaço
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Annelaura B Nielsen
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Lili Niu
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Maximilian Strauss
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - OmicEra Diagnostics GmbH, Planegg, Germany
- Philipp E Geyer
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - OmicEra Diagnostics GmbH, Planegg, Germany
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Fabian Coscia
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Nicolai J Wewer Albrechtsen
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department for Clinical Biochemistry, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Filip Mundt
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Lars Juhl Jensen
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Matthias Mann
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
46
Gogleva A, Polychronopoulos D, Pfeifer M, Poroshin V, Ughetto M, Martin MJ, Thorpe H, Bornot A, Smith PD, Sidders B, Dry JR, Ahdesmäki M, McDermott U, Papa E, Bulusu KC. Knowledge graph-based recommendation framework identifies drivers of resistance in EGFR mutant non-small cell lung cancer. Nat Commun 2022; 13:1667. [PMID: 35351890 PMCID: PMC8964738 DOI: 10.1038/s41467-022-29292-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 12/25/2022] Open
Abstract
Resistance to EGFR inhibitors (EGFRi) presents a major obstacle in treating non-small cell lung cancer (NSCLC). One of the most exciting new ways to find potential resistance markers involves running functional genetic screens, such as CRISPR, followed by manual triage of significantly enriched genes. This triage process to identify 'high value' hits resulting from the CRISPR screen involves manual curation that requires specialized knowledge and can take even experts several months to comprehensively complete. To find key drivers of resistance faster we build a recommendation system on top of a heterogeneous biomedical knowledge graph integrating pre-clinical, clinical, and literature evidence. The recommender system ranks genes based on trade-offs between diverse types of evidence linking them to potential mechanisms of EGFRi resistance. This unbiased approach identifies 57 resistance markers from >3,000 genes, reducing hit identification time from months to minutes. In addition to reproducing known resistance markers, our method identifies previously unexplored resistance mechanisms that we prospectively validate.
Affiliation(s)
- Anna Gogleva
  - Biological Insight Knowledge Graph (BIKG), AI Engineering, R&D IT, AstraZeneca, Cambridge, UK
- Dimitris Polychronopoulos
  - Early Computational Oncology, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Matthias Pfeifer
  - Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Michaël Ughetto
  - Biological Insight Knowledge Graph (BIKG), AI Engineering, R&D IT, AstraZeneca, Gothenburg, Sweden
- Matthew J Martin
  - Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Hannah Thorpe
  - Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Aurelie Bornot
  - Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Paul D Smith
  - Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Ben Sidders
  - Early Computational Oncology, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Jonathan R Dry
  - Early Computational Oncology, Research and Early Development, Oncology R&D, AstraZeneca, Waltham, MA, USA
- Miika Ahdesmäki
  - Early Computational Oncology, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Ultan McDermott
  - Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Eliseo Papa
  - Biological Insight Knowledge Graph (BIKG), AI Engineering, R&D IT, AstraZeneca, Cambridge, UK
- Krishna C Bulusu
  - Early Computational Oncology, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
47
Ratajczak F, Joblin M, Ringsquandl M, Hildebrandt M. Task-driven knowledge graph filtering improves prioritizing drugs for repurposing. BMC Bioinformatics 2022; 23:84. [PMID: 35246025 PMCID: PMC8894843 DOI: 10.1186/s12859-022-04608-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2021] [Accepted: 12/09/2021] [Indexed: 02/07/2023] Open
Abstract
Background: Drug repurposing aims at finding new targets for already developed drugs. It becomes more relevant as the cost of discovering new drugs steadily increases. To find new potential targets for a drug, an abundance of methods and existing biomedical knowledge from different domains can be leveraged. Recently, knowledge graphs have emerged in the biomedical domain that integrate information about genes, drugs, diseases and other biological domains. Knowledge graphs can be used to predict new connections between compounds and diseases, leveraging the interconnected biomedical data around them. While real-world use cases such as drug repurposing are only interested in one specific relation type, widely used knowledge graph embedding models simultaneously optimize over all relation types in the graph. This can lead the models to underfit the data that is most relevant for the desired relation type. For example, if we want to learn embeddings to predict links between compounds and diseases but almost the entirety of relations in the graph is incident to other pairs of entity types, then the resulting embeddings are likely not optimised to predict links between compounds and diseases. We propose a method that leverages domain knowledge in the form of metapaths and uses them to filter two biomedical knowledge graphs (Hetionet and DRKG) for the purpose of improving performance on the prediction task of drug repurposing while simultaneously increasing computational efficiency. Results: We find that our method reduces the number of entities by 60% on Hetionet and 26% on DRKG, while leading to an improvement in prediction performance of up to 40.8% on Hetionet and 14.2% on DRKG, with an average improvement of 20.6% on Hetionet and 8.9% on DRKG. Additionally, prioritization of antiviral compounds for SARS-CoV-2 improves after task-driven filtering is applied. 
Conclusion: Knowledge graphs contain facts that are counterproductive for specific tasks, in our case drug repurposing. We also demonstrate that these facts can be removed, resulting in improved performance in that task and a more efficient learning process. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04608-y.
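A simplified sketch of metapath-driven filtering is shown here. It assumes that a triple is kept whenever its (head type, relation, tail type) pattern appears as a step in at least one chosen metapath; this is an illustrative rule and may differ from the authors' procedure. The triples, the metapaths, and the function `filter_by_metapaths` are hypothetical.

```python
# Hypothetical typed triples: (head, head_type, relation, tail, tail_type).
TRIPLES = [
    ("c1", "Compound", "binds", "g1", "Gene"),
    ("g1", "Gene", "associates", "d1", "Disease"),
    ("g1", "Gene", "expressed_in", "a1", "Anatomy"),
    ("c1", "Compound", "causes", "s1", "SideEffect"),
    ("c1", "Compound", "treats", "d1", "Disease"),
]

# Hypothetical metapaths linking compounds to diseases; each step is
# a (head_type, relation, tail_type) pattern.
METAPATHS = [
    [("Compound", "binds", "Gene"), ("Gene", "associates", "Disease")],
    [("Compound", "treats", "Disease")],
]


def filter_by_metapaths(triples, metapaths):
    """Keep only triples whose typed relation is a step of at least one metapath."""
    allowed = {step for path in metapaths for step in path}
    return [t for t in triples if (t[1], t[2], t[4]) in allowed]


def entities(triples):
    """Collect every head and tail entity appearing in the triples."""
    return {e for t in triples for e in (t[0], t[3])}


kept = filter_by_metapaths(TRIPLES, METAPATHS)
print(f"triples: {len(TRIPLES)} -> {len(kept)}")
print(f"entities: {len(entities(TRIPLES))} -> {len(entities(kept))}")
```

The filtered triple list would then be passed to the embedding model in place of the full graph, so training effort concentrates on relations relevant to the compound-disease prediction task.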
Affiliation(s)
- Florin Ratajczak
  - Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Munich, Germany
  - Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany
48
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0. Big Data and Cognitive Computing 2022; 6. [PMID: 35936510 PMCID: PMC9351549 DOI: 10.3390/bdcc6010027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/10/2022]
Abstract
Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities.
49
Zahoránszky-Kőhalmi G, Siramshetty VB, Kumar P, Gurumurthy M, Grillo B, Mathew B, Metaxatos D, Backus M, Mierzwa T, Simon R, Grishagin I, Brovold L, Mathé EA, Hall MD, Michael SG, Godfrey AG, Mestres J, Jensen LJ, Oprea TI. A Workflow of Integrated Resources to Catalyze Network Pharmacology Driven COVID-19 Research. J Chem Inf Model 2022; 62:718-729. [PMID: 35057621 PMCID: PMC10790216 DOI: 10.1021/acs.jcim.1c00431] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
In the event of an outbreak due to an emerging pathogen, time is of the essence to contain or to mitigate the spread of the disease. Drug repositioning is one of the strategies that has the potential to deliver therapeutics relatively quickly. The SARS-CoV-2 pandemic has shown that integrating critical data resources to drive drug-repositioning studies, involving host-host, host-pathogen, and drug-target interactions, remains a time-consuming effort that translates to a delay in the development and delivery of a life-saving therapy. Here, we describe a workflow we designed for a semiautomated integration of rapidly emerging data sets that can be generally adopted in a broad network pharmacology research setting. The workflow was used to construct a COVID-19 focused multimodal network that integrates 487 host-pathogen, 63 278 host-host protein, and 1221 drug-target interactions. The resultant Neo4j graph database named "Neo4COVID19" is made publicly accessible via a web interface and via API calls based on the Bolt protocol. Details for accessing the database are provided on a landing page (https://neo4covid19.ncats.io/). We believe that our Neo4COVID19 database will be a valuable asset to the research community and will catalyze the discovery of therapeutics to fight COVID-19.
Collapse
Affiliation(s)
- Vishal B. Siramshetty
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Praveen Kumar
  - Department of Internal Medicine, University of New Mexico School of Medicine, 1 University of New Mexico, Albuquerque, NM 87131, USA
  - Department of Computer Science, University of New Mexico, 1 University of New Mexico, Albuquerque, NM 87131, USA
- Manideep Gurumurthy
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Busola Grillo
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Biju Mathew
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Dimitrios Metaxatos
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Mark Backus
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Tim Mierzwa
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Reid Simon
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Ivan Grishagin
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
  - Rancho BioSciences LLC., 16955 Via Del Campo Suite 200, San Diego, CA 92127, USA
- Laura Brovold
  - Rancho BioSciences LLC., 16955 Via Del Campo Suite 200, San Diego, CA 92127, USA
- Ewy A. Mathé
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Matthew D. Hall
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Samuel G. Michael
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Alexander G. Godfrey
  - National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Jordi Mestres
  - Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Lars J. Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Tudor I. Oprea
  - Department of Internal Medicine, University of New Mexico School of Medicine, 1 University of New Mexico, Albuquerque, NM 87131, USA
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
  - UNM Comprehensive Cancer Center, 1201 Camino de Salud NE, Albuquerque, NM 87102, USA
  - Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, Box 480, 40530 Gothenburg, Sweden
|
50
|
Machine learning prediction and tau-based screening identifies potential Alzheimer's disease genes relevant to immunity. Commun Biol 2022; 5:125. [PMID: 35149761 PMCID: PMC8837797 DOI: 10.1038/s42003-022-03068-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/06/2020] [Accepted: 01/21/2022] [Indexed: 12/19/2022] Open
Abstract
With increased research funding for Alzheimer's disease (AD) and related disorders across the globe, large amounts of data are being generated. Several studies employed machine learning methods to understand the ever-growing omics data to enhance early diagnosis, map complex disease networks, or uncover potential drug targets. We describe results based on a Target Central Resource Database protein knowledge graph and evidence paths transformed into vectors by metapath matching. We extracted features between specific genes and diseases, then trained and optimized our model using XGBoost, termed MPxgb(AD). To determine our MPxgb(AD) prediction performance, we examined the top twenty predicted genes through an experimental screening pipeline. Our analysis identified potential AD risk genes: FRRS1, CTRAM, SCGB3A1, FAM92B/CIBAR2, and TMEFF2. FRRS1 and FAM92B are considered dark genes, while CTRAM, SCGB3A1, and TMEFF2 are connected to TREM2-TYROBP, IL-1β-TNFα, and MTOR-APP AD-risk nodes, suggesting relevance to the pathogenesis of AD.
|