1
Raju S, Turner ME, Cao C, Abdul-Samad M, Punwasi N, Blaser MC, Cahalane RME, Botts SR, Prajapati K, Patel S, Wu R, Gustafson D, Galant NJ, Fiddes L, Chemaly M, Hedin U, Matic L, Seidman MA, Subasri V, Singh SA, Aikawa E, Fish JE, Howe KL. Multiomic Landscape of Extracellular Vesicles in Human Carotid Atherosclerotic Plaque Reveals Endothelial Communication Networks. Arterioscler Thromb Vasc Biol 2025; 45:1277-1305. [PMID: 40438929 DOI: 10.1161/atvbaha.124.322324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Accepted: 04/29/2025] [Indexed: 06/28/2025]
Abstract
BACKGROUND Carotid atherosclerosis is orchestrated by cell-cell communication that drives progression along a clinical continuum (asymptomatic to symptomatic). Extracellular vesicles (EVs) are cell-derived nanoparticles representing a new paradigm in cellular communication. Little is known about their biological cargo, cellular origin/destination, and functional roles in human atherosclerotic plaque. METHODS EVs were enriched via size exclusion chromatography from human carotid endarterectomy samples dissected into paired plaque and marginal zones (symptomatic n=16, asymptomatic n=13). EV-cargos were assessed via whole transcriptome microRNA-sequencing and mass spectrometry-based proteomics. EV multiomics was integrated with bulk and single-cell RNA-sequencing datasets to predict EV cellular origin and ligand-receptor interactions, and multimodal biological network integration of EV-cargo was completed. EV functional impact was assessed with endothelial angiogenesis assays. RESULTS Carotid plaques contained more EVs than adjacent marginal zones, with differential enrichment for EV-microRNAs and EV-proteins in key atherogenic pathways. EV cellular origin analysis suggested that tissue EV signatures originated from endothelial cells, smooth muscle cells, and immune cells. Integrated tissue vesiculomics and single-cell RNA-sequencing indicated complex EV-vascular cell communication that changed with disease progression and plaque vulnerability (ie, symptomatic disease). Plaques from symptomatic patients, but not asymptomatic patients, were characterized by increased involvement of endothelial pathways and more complex ligand-receptor interactions, relative to their marginal zones. Plaque EVs were predicted to mediate communication with endothelial cells. Pathway enrichment analysis delineated an endothelial signature with roles in angiogenesis and neovascularization, well-known indices of plaque instability. 
This was validated functionally, wherein human carotid symptomatic plaque EVs induced sprouting angiogenesis in comparison to their matched marginal zones. CONCLUSIONS Our findings indicate that EVs may drive dynamic changes in plaques through EV-vascular cell communication and effector functions that typify vulnerability to rupture, precipitating symptomatic disease. The discovery of endothelial-directed angiogenic processes mediated by EVs creates new therapeutic avenues for atherosclerosis.
Affiliation(s)
- Sneha Raju
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., S.R.B., J.E.F., K.L.H.)
- Division of Vascular Surgery, University Health Network, Department of Surgery, University of Toronto, ON, Canada (S.R., K.L.H.)
- Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., C.C., S.R.B., L.F., K.L.H.)
- Mandy E Turner
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (M.E.T., M.C.B., R.M.E.C., S.A.S., E.A.)
- Christian Cao
- Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., C.C., S.R.B., L.F., K.L.H.)
- Majed Abdul-Samad
- Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada (M.A.-S., K.P., R.W., J.E.F., K.L.H.)
- Neil Punwasi
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada (N.P., M.A.S., V.S., J.E.F., K.L.H.)
- Mark C Blaser
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (M.E.T., M.C.B., R.M.E.C., S.A.S., E.A.)
- Rachel M E Cahalane
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (M.E.T., M.C.B., R.M.E.C., S.A.S., E.A.)
- Mechanobiology and Medical Device Research Group (MMDRG), Biomedical Engineering, College of Science and Engineering, University of Galway, Ireland (R.M.E.C)
- Steven R Botts
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., S.R.B., J.E.F., K.L.H.)
- Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., C.C., S.R.B., L.F., K.L.H.)
- Kamalben Prajapati
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada (M.A.-S., K.P., R.W., J.E.F., K.L.H.)
- Sarvatit Patel
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Ruilin Wu
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada (M.A.-S., K.P., R.W., J.E.F., K.L.H.)
- Dakota Gustafson
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Lindsey Fiddes
- Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., C.C., S.R.B., L.F., K.L.H.)
- Melody Chemaly
- Vascular Surgery Division, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institut, Stockholm, Sweden (M.C., U.H., L.M.)
- Ulf Hedin
- Vascular Surgery Division, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institut, Stockholm, Sweden (M.C., U.H., L.M.)
- Ljubica Matic
- Vascular Surgery Division, Department of Molecular Medicine and Surgery, Karolinska University Hospital and Karolinska Institut, Stockholm, Sweden (M.C., U.H., L.M.)
- Michael A Seidman
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada (N.P., M.A.S., V.S., J.E.F., K.L.H.)
- Laboratory Medicine Program, University Health Network, Toronto, ON, Canada (M.A.S.)
- Vallijah Subasri
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada (N.P., M.A.S., V.S., J.E.F., K.L.H.)
- Sasha A Singh
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (M.E.T., M.C.B., R.M.E.C., S.A.S., E.A.)
- Elena Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (M.E.T., M.C.B., R.M.E.C., S.A.S., E.A.)
- Center for Excellence in Vascular Biology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (E.A.)
- Jason E Fish
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., S.R.B., J.E.F., K.L.H.)
- Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada (M.A.-S., K.P., R.W., J.E.F., K.L.H.)
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada (N.P., M.A.S., V.S., J.E.F., K.L.H.)
- Kathryn L Howe
- Toronto General Hospital Research Institute, University Health Network, ON, Toronto, Canada (S.R., S.R.B., K.P., S.P., R.W., D.G., J.E.F., K.L.H.)
- Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., S.R.B., J.E.F., K.L.H.)
- Division of Vascular Surgery, University Health Network, Department of Surgery, University of Toronto, ON, Canada (S.R., K.L.H.)
- Temerty Faculty of Medicine, University of Toronto, ON, Canada (S.R., C.C., S.R.B., L.F., K.L.H.)
- Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada (M.A.-S., K.P., R.W., J.E.F., K.L.H.)
- Peter Munk Cardiac Centre, University Health Network, Toronto, Canada (N.P., M.A.S., V.S., J.E.F., K.L.H.)
2
Billmann M. Who controls the tariffs of a human cell? Mol Syst Biol 2025:10.1038/s44320-025-00112-6. [PMID: 40355753 DOI: 10.1038/s44320-025-00112-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2025] [Accepted: 04/25/2025] [Indexed: 05/15/2025] Open
Affiliation(s)
- Maximilian Billmann
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn, 53127, Germany.
3
Chhibbar P, Das J. Machine learning approaches enable the discovery of therapeutics across domains. Mol Ther 2025; 33:2269-2278. [PMID: 40186352 DOI: 10.1016/j.ymthe.2025.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2025] [Revised: 03/21/2025] [Accepted: 04/01/2025] [Indexed: 04/07/2025] Open
Abstract
Multi-modal datasets have grown exponentially in the last decade. This has created an enormous demand for machine learning models that can predict complex outcomes by leveraging cellular, molecular, and humoral profiles. Corresponding inference of mechanisms can help to uncover new therapeutic targets. Here, we discuss how biological principles guide the design of predictive models and how interpretable machine learning can lead to novel mechanistic insights. We provide descriptions of multiple learning techniques and how suited they are to domain adaptations. Finally, we talk about broad learning capabilities of foundation models on large datasets and whether they can be used to provide meaningful inference about biological datasets.
Affiliation(s)
- Prabal Chhibbar
- Centre for Systems Immunology, Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology PhD Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
- Jishnu Das
- Centre for Systems Immunology, Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
4
Kitani A, Matsui Y. Integrative network analysis reveals novel moderators of Aβ-Tau interaction in Alzheimer's disease. Alzheimers Res Ther 2025; 17:70. [PMID: 40176187 PMCID: PMC11967117 DOI: 10.1186/s13195-025-01705-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Accepted: 02/25/2025] [Indexed: 04/04/2025]
Abstract
BACKGROUND Although interactions between amyloid-beta and tau proteins have been implicated in Alzheimer's disease (AD), the precise mechanisms by which these interactions contribute to disease progression are not yet fully understood. Moreover, despite the growing application of deep learning in various biomedical fields, its application in integrating networks to analyze disease mechanisms in AD research remains limited. In this study, we employed BIONIC, a deep learning-based network integration method, to integrate proteomics and protein-protein interaction data, with an aim to uncover factors that moderate the effects of the Aβ-tau interaction on mild cognitive impairment (MCI) and early-stage AD. METHODS Proteomic data from the ROSMAP cohort were integrated with protein-protein interaction (PPI) data using a deep learning-based model. Linear regression analysis was applied to histopathological and gene expression data, and mutual information was used to detect moderating factors. Statistical significance was determined using the Benjamini-Hochberg correction (p < 0.05). RESULTS Our results suggested that astrocytes and GPNMB+ microglia moderate the Aβ-tau interaction. Based on linear regression with histopathological and gene expression data, GFAP and IBA1 levels and GPNMB gene expression positively contributed to the interaction of tau with Aβ in non-dementia cases, replicating the results of the network analysis. CONCLUSIONS These findings suggest that GPNMB+ microglia moderate the Aβ-tau interaction in early AD and therefore are a novel therapeutic target. To facilitate further research, we have made the integrated network available as a visualization tool for the scientific community (URL: https://igcore.cloud/GerOmics/AlzPPMap).
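The abstract above names the Benjamini-Hochberg correction as its significance criterion. As a minimal sketch of that procedure (not the authors' code), the function below computes BH-adjusted p-values and flags those at or below a threshold. The function name, the example p-values, and the choice to apply the 0.05 cutoff to the adjusted values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags), both in input order."""
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values, one per tested candidate moderator.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20]
adj, rej = benjamini_hochberg(p_vals)
print(adj, rej)
```

In this made-up example only the two smallest p-values survive correction, even though four of the five raw values are below 0.05.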
Affiliation(s)
- Akihiro Kitani
- Department of Integrated Health Science, Biomedical and Health Informatics Unit, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Yusuke Matsui
- Department of Integrated Health Science, Biomedical and Health Informatics Unit, Nagoya University Graduate School of Medicine, Nagoya, Japan.
- Institute for Glyco-Core Research (Igcore), Nagoya University, Nagoya, Aichi, 461-8673, Japan.
5
Nayar G, Altman RB. Heterogeneous network approaches to protein pathway prediction. Comput Struct Biotechnol J 2024; 23:2727-2739. [PMID: 39035835 PMCID: PMC11260399 DOI: 10.1016/j.csbj.2024.06.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
Understanding protein-protein interactions (PPIs) and the pathways they comprise is essential for comprehending cellular functions and their links to specific phenotypes. Despite the prevalence of molecular data generated by high-throughput sequencing technologies, a significant gap remains in translating this data into functional information regarding the series of interactions that underlie phenotypic differences. In this review, we present an in-depth analysis of heterogeneous network methodologies for modeling protein pathways, highlighting the critical role of integrating multifaceted biological data. It outlines the process of constructing these networks, from data representation to machine learning-driven predictions and evaluations. The work underscores the potential of heterogeneous networks in capturing the complexity of proteomic interactions, thereby offering enhanced accuracy in pathway prediction. This approach not only deepens our understanding of cellular processes but also opens up new possibilities in disease treatment and drug discovery by leveraging the predictive power of comprehensive proteomic data analysis.
Affiliation(s)
- Gowri Nayar
- Department of Biomedical Data Science, Stanford University, United States
- Russ B. Altman
- Department of Biomedical Data Science, Stanford University, United States
- Department of Genetics, Stanford University, United States
- Department of Medicine, Stanford University, United States
- Department of Bioengineering, Stanford University, United States
6
Tasnina N, Murali TM. ICoN: integration using co-attention across biological networks. Bioinform Adv 2024; 5:vbae182. [PMID: 39801779 PMCID: PMC11723530 DOI: 10.1093/bioadv/vbae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 09/24/2024] [Accepted: 11/14/2024] [Indexed: 01/16/2025]
Abstract
Motivation Molecular interaction networks are powerful tools for studying cellular functions. Integrating diverse types of networks enhances performance in downstream tasks such as gene module detection and protein function prediction. The challenge lies in extracting meaningful protein feature representations due to varying levels of sparsity and noise across these heterogeneous networks. Results We propose ICoN, a novel unsupervised graph neural network model that takes multiple protein-protein association networks as inputs and generates a feature representation for each protein that integrates the topological information from all the networks. A key contribution of ICoN is exploiting a mechanism called "co-attention" that enables cross-network communication during training. The model also incorporates a denoising training technique, introducing perturbations to each input network and training the model to reconstruct the original network from its corrupted version. Our experimental results demonstrate that ICoN surpasses individual networks across three downstream tasks: gene module detection, gene coannotation prediction, and protein function prediction. Compared to existing unsupervised network integration models, ICoN exhibits superior performance across the majority of downstream tasks and shows enhanced robustness against noise. This work introduces a promising approach for effectively integrating diverse protein-protein association networks, aiming to achieve a biologically meaningful representation of proteins. Availability and implementation The ICoN software is available under the GNU Public License v3 at https://github.com/Murali-group/ICoN.
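The denoising idea described in the abstract above (perturb each input network, then train the model to reconstruct the original) can be illustrated with a small sketch of the corruption step. The abstract does not specify how ICoN perturbs its inputs, so the edge-swap scheme, the function name `corrupt_network`, and the `noise_rate` parameter below are all hypothetical choices rather than the authors' implementation.

```python
import random


def corrupt_network(edges, nodes, noise_rate=0.2, seed=0):
    """Drop a fraction of true edges and add the same number of random non-edges."""
    rng = random.Random(seed)
    # Store undirected edges as sorted pairs so (a, b) and (b, a) coincide.
    original = {tuple(sorted(e)) for e in edges}
    n_swap = int(len(original) * noise_rate)
    removed = set(rng.sample(sorted(original), n_swap))
    kept = original - removed
    node_list = sorted(nodes)
    # Every node pair that is not an edge in the clean network.
    non_edges = [
        (u, v)
        for i, u in enumerate(node_list)
        for v in node_list[i + 1:]
        if (u, v) not in original
    ]
    added = set(rng.sample(non_edges, min(n_swap, len(non_edges))))
    return kept | added


# Toy protein-protein association network (made-up node names).
proteins = ["A", "B", "C", "D", "E", "F"]
clean = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")}
noisy = corrupt_network(clean, proteins, noise_rate=0.4)
print(sorted(noisy))
```

In a denoising setup, `noisy` would be the model input and `clean` the reconstruction target, so the learned representation cannot simply copy its input.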
Affiliation(s)
- Nure Tasnina
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
7
Zhang Y, Wang Y, Wu C, Zhan L, Wang A, Cheng C, Zhao J, Zhang W, Chen J, Li P. Drug-target interaction prediction by integrating heterogeneous information with mutual attention network. BMC Bioinformatics 2024; 25:361. [PMID: 39563226 DOI: 10.1186/s12859-024-05976-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 11/05/2024] [Indexed: 11/21/2024] Open
Abstract
BACKGROUND Identification of drug-target interactions is an indispensable part of drug discovery. While conventional shallow machine learning and recent deep learning methods based on chemogenomic properties of drugs and target proteins have pushed prediction performance to a new level, these methods still adapt poorly to novel structures. Alternatively, large-scale biological and pharmacological data provide new ways to accelerate drug-target interaction prediction. METHODS Here, we propose DrugMAN, a deep learning model for predicting drug-target interactions by integrating multiplex heterogeneous functional networks with a mutual attention network (MAN). DrugMAN uses a graph attention network-based integration algorithm to learn network-specific low-dimensional features for drugs and target proteins by integrating four drug networks and seven gene/protein networks, respectively, each collected under certain screening conditions. DrugMAN then captures interaction information between drug and target representations with a mutual attention network to improve drug-target prediction. RESULTS DrugMAN achieved the best performance compared with the cheminformatics-based methods SVM, RF, and DeepPurpose and the network-based deep learning methods DTINet and NeoDT in four different scenarios, especially in real-world scenarios. Compared with SVM, RF, DeepPurpose, DTINet, and NeoDT, DrugMAN showed the smallest decrease in AUROC, AUPRC, and F1-score from warm-start to both-cold scenarios. This result is attributed to DrugMAN's learning from heterogeneous data and indicates that DrugMAN has good generalization ability. Taken together, DrugMAN spotlights heterogeneous information to mine drug-target interactions and can be a powerful tool for drug discovery and drug repurposing.
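The abstract above contrasts a warm-start scenario with a both-cold scenario, in which test pairs involve drugs and targets that the model never saw in training. The abstract does not give the exact splitting protocol, so the following is only a sketch of one common way to build such a split. The function name `both_cold_split`, the `test_fraction` parameter, and the toy identifiers are assumptions.

```python
import random


def both_cold_split(pairs, test_fraction=0.5, seed=0):
    """Split (drug, target) pairs so test drugs and test targets are unseen in training."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _ in pairs})
    targets = sorted({t for _, t in pairs})
    n_test_drugs = max(1, int(len(drugs) * test_fraction))
    n_test_targets = max(1, int(len(targets) * test_fraction))
    test_drugs = set(rng.sample(drugs, n_test_drugs))
    test_targets = set(rng.sample(targets, n_test_targets))
    # Training pairs use only non-held-out drugs and non-held-out targets.
    train = [(d, t) for d, t in pairs if d not in test_drugs and t not in test_targets]
    # Test pairs need BOTH a held-out drug and a held-out target.
    test = [(d, t) for d, t in pairs if d in test_drugs and t in test_targets]
    # Pairs mixing a seen and an unseen entity are discarded in this sketch.
    return train, test


# Toy data: every combination of four drugs and four targets.
all_pairs = [(f"drug{i}", f"target{j}") for i in range(4) for j in range(4)]
train_pairs, test_pairs = both_cold_split(all_pairs)
print(len(train_pairs), len(test_pairs))
```

A warm-start split, by contrast, would sample test pairs at random, so the same drugs and targets appear on both sides. The gap in AUROC or AUPRC between the two settings is what the abstract reports as the measure of generalization.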
Affiliation(s)
- Yuanyuan Zhang
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Yingdong Wang
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Chaoyong Wu
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, 100029, China
- Lingmin Zhan
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Aoyi Wang
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Caiping Cheng
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Jinzhong Zhao
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China
- Wuxia Zhang
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China.
- Jianxin Chen
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, 100029, China.
- Peng Li
- Shanxi Key Lab for Modernization of TCVM, College of Basic Sciences, Shanxi Agricultural University, Taigu, 030801, China.
8
Kitani A, Matsui Y. Integrative Network Analysis Reveals Novel Moderators of Aβ-Tau Interaction in Alzheimer's Disease. bioRxiv 2024:2024.06.14.599092. [PMID: 39554095 PMCID: PMC11565825 DOI: 10.1101/2024.06.14.599092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Background Although interactions between amyloid-beta and tau proteins have been implicated in Alzheimer's disease (AD), the precise mechanisms by which these interactions contribute to disease progression are not yet fully understood. Moreover, despite the growing application of deep learning in various biomedical fields, its application in integrating networks to analyze disease mechanisms in AD research remains limited. In this study, we employed BIONIC, a deep learning-based network integration method, to integrate proteomics and protein-protein interaction data, with an aim to uncover factors that moderate the effects of the Aβ-tau interaction on mild cognitive impairment (MCI) and early-stage AD. Methods Proteomic data from the ROSMAP cohort were integrated with protein-protein interaction (PPI) data using a Deep Learning-based model. Linear regression analysis was applied to histopathological and gene expression data, and mutual information was used to detect moderating factors. Statistical significance was determined using the Benjamini-Hochberg correction (p < 0.05). Results Our results suggested that astrocytes and GPNMB+ microglia moderate the Aβ-tau interaction. Based on linear regression with histopathological and gene expression data, GFAP and IBA1 levels and GPNMB gene expression positively contributed to the interaction of tau with Aβ in non-dementia cases, replicating the results of the network analysis. Conclusions These findings indicate that GPNMB+ microglia moderate the Aβ-tau interaction in early AD and therefore are a novel therapeutic target. To facilitate further research, we have made the integrated network available as a visualization tool for the scientific community (URL: https://igcore.cloud/GerOmics/AlzPPMap).
Affiliation(s)
- Akihiro Kitani
- Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Yusuke Matsui
- Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Institute for Glyco-core Research (iGCORE), Nagoya University, 461-8673 Nagoya, Aichi, Japan
9
Hu M, Ideker T. Putting proteins in context. Cell Syst 2024; 15:891-892. [PMID: 39418999 PMCID: PMC12119128 DOI: 10.1016/j.cels.2024.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Accepted: 09/20/2024] [Indexed: 10/19/2024]
Abstract
Proteins exhibit cell-type-specific functions and interactions, yet most ways of representing proteins lack any biological or environmental context. To address this gap, recent work by Li et al.1 introduces PINNACLE, a geometric deep learning approach that generates contextualized representations of proteins by combined analysis of protein interactions and multiorgan single-cell transcriptomics.
Affiliation(s)
- Mengzhou Hu
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Trey Ideker
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA.
10
Li Q, Hu Z, Wang Y, Li L, Fan Y, King I, Jia G, Wang S, Song L, Li Y. Progress and opportunities of foundation models in bioinformatics. Brief Bioinform 2024; 25:bbae548. [PMID: 39461902 PMCID: PMC11512649 DOI: 10.1093/bib/bbae548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/20/2024] [Accepted: 10/12/2024] [Indexed: 10/29/2024] Open
Abstract
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across various downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to conduct a general investigation and summary of FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our primary focus is on elucidating the application of FMs to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks like sequence analysis, structure prediction, and function annotation. Each section delves into the intricacies of the targeted challenges, contrasting the architectures and advancements of FMs with conventional methods and showcasing their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints encountered by FMs in biology, including issues of data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may exhibit suboptimal performance. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, facilitating ongoing innovation in the field. This comprehensive examination not only serves as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.
Affiliation(s)
- Qing Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Yixuan Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Lei Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Yimin Fan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Gengjie Jia
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China
- Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
- Shenzhen Institute of Advanced Technology, Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen, Guangdong, 518055, China
- Le Song
- BioMap, Zhongguancun Life Science Park, Haidian District, Beijing, 100085, China
- Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
11
Nasser R, Schaffer LV, Ideker T, Sharan R. Multi-modal contrastive learning of subcellular organization using DICE. Bioinformatics 2024; 40:ii105-ii110. [PMID: 39230695 PMCID: PMC11520230 DOI: 10.1093/bioinformatics/btae387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
The data deluge in biology calls for computational approaches that can integrate multiple datasets of different types to build a holistic view of biological processes or structures of interest. An emerging paradigm in this domain is the unsupervised learning of data embeddings that can be used for downstream clustering and classification tasks. While such approaches for integrating data of similar types are becoming common, there is scarcer work on consolidating different data modalities such as network and image information. Here, we introduce DICE (Data Integration through Contrastive Embedding), a contrastive learning model for multi-modal data integration. We apply this model to study the subcellular organization of proteins by integrating protein-protein interaction data and protein image data measured in HEK293 cells. We demonstrate the advantage of data integration over any single modality and show that our framework outperforms previous integration approaches. Availability: https://github.com/raminass/protein-contrastive Contact: raminass@gmail.com.
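The contrastive integration described in the abstract above pairs two views of the same protein, one from interaction data and one from image data, and pulls their embeddings together while pushing apart embeddings of different proteins. The abstract does not state which loss DICE uses. The sketch below substitutes a generic InfoNCE-style objective in pure Python, and its function names, temperature value, and toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where positives[i] is the matching view of anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate matches.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Made-up embeddings for three proteins in two modalities.
network_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
image_emb = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
aligned_loss = info_nce(network_emb, image_emb)
# Mismatch the pairing to see the loss rise.
shuffled_loss = info_nce(network_emb, image_emb[1:] + image_emb[:1])
print(aligned_loss, shuffled_loss)
```

The loss is low when each protein's two views are closest to each other and high when the pairing is scrambled, which is the signal a contrastive encoder would be trained on.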
Affiliation(s)
- Rami Nasser
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
| | - Leah V Schaffer
- Department of Medicine, University of California, San Diego, San Diego, CA 92037, United States
| | - Trey Ideker
- Department of Medicine, University of California, San Diego, San Diego, CA 92037, United States
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA 92037, United States
- Moores Cancer Center, University of California, San Diego, San Diego, CA 92037, United States
- Department of Bioengineering, University of California, San Diego, San Diego, CA 92037, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
| |
|
12
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinform Adv 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| |
|
13
|
Li MM, Huang Y, Sumathipala M, Liang MQ, Valdeolivas A, Ananthakrishnan AN, Liao K, Marbach D, Zitnik M. Contextual AI models for single-cell protein biology. Nat Methods 2024; 21:1546-1557. [PMID: 39039335 PMCID: PMC11310085 DOI: 10.1038/s41592-024-02341-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/17/2023] [Accepted: 06/10/2024] [Indexed: 07/24/2024]
Abstract
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
Affiliation(s)
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yepeng Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marissa Sumathipala
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Man Qing Liang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Alberto Valdeolivas
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Ashwin N Ananthakrishnan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, USA
| | - Katherine Liao
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel Marbach
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
| |
|
14
|
Cesnik A, Schaffer LV, Gaur I, Jain M, Ideker T, Lundberg E. Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes. Annu Rev Biomed Data Sci 2024; 7:369-389. [PMID: 38748859 PMCID: PMC11343683 DOI: 10.1146/annurev-biodatasci-102423-113534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.
Affiliation(s)
- Anthony Cesnik
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Leah V Schaffer
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Ishan Gaur
- Department of Bioengineering, Stanford University, Stanford, California, USA
| | - Mayank Jain
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Trey Ideker
- Departments of Computer Science and Engineering and Bioengineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Emma Lundberg
- Chan Zuckerberg Biohub, San Francisco, California, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Pathology, Stanford University, Palo Alto, California, USA
- Department of Bioengineering, Stanford University, Stanford, California, USA
| |
|
15
|
Zhan L, Wang Y, Wang A, Zhang Y, Cheng C, Zhao J, Zhang W, Chen J, Li P. A genome-scale deep learning model to predict gene expression changes of genetic perturbations from multiplex biological networks. Brief Bioinform 2024; 25:bbae433. [PMID: 39226889 PMCID: PMC11370636 DOI: 10.1093/bib/bbae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 07/17/2024] [Accepted: 08/19/2024] [Indexed: 09/05/2024] Open
Abstract
Systematic characterization of biological effects to genetic perturbation is essential to the application of molecular biology and biomedicine. However, the experimental exhaustion of genetic perturbations on the genome-wide scale is challenging. Here, we show TranscriptionNet, a deep learning model that integrates multiple biological networks to systematically predict transcriptional profiles to three types of genetic perturbations based on transcriptional profiles induced by genetic perturbations in the L1000 project: RNA interference, clustered regularly interspaced short palindromic repeat, and overexpression. TranscriptionNet performs better than existing approaches in predicting inducible gene expression changes for all three types of genetic perturbations. TranscriptionNet can predict transcriptional profiles for all genes in existing biological networks and increases perturbational gene expression changes for each type of genetic perturbation from a few thousand to 26 945 genes. TranscriptionNet demonstrates strong generalization ability when comparing predicted and true gene expression changes on different external tasks. Overall, TranscriptionNet can systemically predict transcriptional consequences induced by perturbing genes on a genome-wide scale and thus holds promise to systemically detect gene function and enhance drug development and target discovery.
Affiliation(s)
- Lingmin Zhan
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Yingdong Wang
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Aoyi Wang
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Yuanyuan Zhang
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Caiping Cheng
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Jinzhong Zhao
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Wuxia Zhang
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| | - Jianxin Chen
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, 11 North Third Ring Road East, Chaoyang District, Beijing 100029, China
| | - Peng Li
- College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
| |
|
16
|
Raju S, Turner ME, Cao C, Abdul-Samad M, Punwasi N, Blaser MC, Cahalane RM, Botts SR, Prajapati K, Patel S, Wu R, Gustafson D, Galant NJ, Fiddes L, Chemaly M, Hedin U, Matic L, Seidman M, Subasri V, Singh SA, Aikawa E, Fish JE, Howe KL. Multiomics unveils extracellular vesicle-driven mechanisms of endothelial communication in human carotid atherosclerosis. bioRxiv 2024:2024.06.21.599781. [PMID: 38979218 PMCID: PMC11230219 DOI: 10.1101/2024.06.21.599781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background: Carotid atherosclerosis is orchestrated by cell-cell communication that drives progression along a clinical continuum (asymptomatic to symptomatic). Extracellular vesicles (EVs) are cell-derived nanoparticles representing a new paradigm in cellular communication. Little is known about their biological cargo, cellular origin/destination, and functional roles in human atherosclerotic plaque. Methods: EVs were enriched via size exclusion chromatography from human carotid endarterectomy samples dissected into paired plaque and marginal zones (symptomatic n=16, asymptomatic n=13). EV cargos were assessed via whole transcriptome miRNA sequencing and mass spectrometry-based proteomics. EV multi-omics were integrated with bulk and single cell RNA-sequencing (scRNA-seq) datasets to predict EV cellular origin and ligand-receptor interactions, and multi-modal biological network integration of EV-cargo was completed. EV functional impact was assessed with endothelial angiogenesis assays. Results: Carotid plaques contained more EVs than adjacent marginal zones, with differential enrichment for EV-miRNAs and EV-proteins in key atherogenic pathways. EV cellular origin analysis suggested that tissue EV signatures originated from endothelial cells (EC), smooth muscle cells (SMC), and immune cells. Integrated tissue vesiculomics and scRNA-seq indicated complex EV-vascular cell communication that changed with disease progression and plaque vulnerability (i.e., symptomatic disease). Plaques from symptomatic patients, but not asymptomatic patients, were characterized by increased involvement of endothelial pathways and more complex ligand-receptor interactions, relative to their marginal zones. Plaque-EVs were predicted to mediate communication with ECs. Pathway enrichment analysis delineated an endothelial signature with roles in angiogenesis and neovascularization - well-known indices of plaque instability. This was validated functionally, wherein human carotid symptomatic plaque EVs induced sprouting angiogenesis in comparison to their matched marginal zones. Conclusion: Our findings indicate that EVs may drive dynamic changes in plaques through EV-vascular cell communication and effector functions that typify vulnerability to rupture, precipitating symptomatic disease. The discovery of endothelial-directed angiogenic processes mediated by EVs creates new therapeutic avenues for atherosclerosis.
|
17
|
Song W, Xu L, Han C, Tian Z, Zou Q. Drug-target interaction predictions with multi-view similarity network fusion strategy and deep interactive attention mechanism. Bioinformatics 2024; 40:btae346. [PMID: 38837345 PMCID: PMC11164831 DOI: 10.1093/bioinformatics/btae346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 05/06/2024] [Accepted: 05/28/2024] [Indexed: 06/07/2024] Open
Abstract
MOTIVATION Accurately identifying the drug-target interactions (DTIs) is one of the crucial steps in the drug discovery and drug repositioning process. Currently, many computational-based models have already been proposed for DTI prediction and achieved some significant improvement. However, these approaches pay little attention to fusing the multi-view similarity networks related to drugs and targets in an appropriate way. Besides, how to fully incorporate the known interaction relationships to accurately represent drugs and targets is not well investigated. Therefore, there is still a need to improve the accuracy of DTI prediction models. RESULTS In this study, we propose a novel approach that employs Multi-view similarity network fusion strategy and deep Interactive attention mechanism to predict Drug-Target Interactions (MIDTI). First, MIDTI constructs multi-view similarity networks of drugs and targets with their diverse information and integrates these similarity networks effectively in an unsupervised manner. Then, MIDTI obtains the embeddings of drugs and targets from multi-type networks simultaneously. After that, MIDTI adopts the deep interactive attention mechanism to further learn their discriminative embeddings comprehensively with the known DTI relationships. Finally, we feed the learned representations of drugs and targets to the multilayer perceptron model and predict the underlying interactions. Extensive results indicate that MIDTI significantly outperforms other baseline methods on the DTI prediction task. The results of the ablation experiments also confirm the effectiveness of the attention mechanism in the multi-view similarity network fusion strategy and the deep interactive attention mechanism. AVAILABILITY AND IMPLEMENTATION https://github.com/XuLew/MIDTI.
Affiliation(s)
- Wei Song
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
| | - Lewen Xu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
| | - Chenguang Han
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
| | - Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| |
|
18
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| |
|
19
|
Liu C, Xiao K, Yu C, Lei Y, Lyu K, Tian T, Zhao D, Zhou F, Tang H, Zeng J. A probabilistic knowledge graph for target identification. PLoS Comput Biol 2024; 20:e1011945. [PMID: 38578805 PMCID: PMC11034645 DOI: 10.1371/journal.pcbi.1011945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 04/22/2024] [Accepted: 02/24/2024] [Indexed: 04/07/2024] Open
Abstract
Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
Affiliation(s)
- Chang Liu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Kaimin Xiao
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Joint Graduate Program of Peking-Tsinghua-NIBS, School of Life Sciences, Tsinghua University, Beijing, China
| | - Cuinan Yu
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
| | - Yipin Lei
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Kangbo Lyu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, Jilin Province, China
| | - Haidong Tang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
| | - Jianyang Zeng
- School of Engineering, Westlake University, Hangzhou, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- Research Center for Industries of the Future and School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
| |
|
20
|
Li P, Jiang Z, Liu T, Liu X, Qiao H, Yao X. Improving drug response prediction via integrating gene relationships with deep learning. Brief Bioinform 2024; 25:bbae153. [PMID: 38600666 PMCID: PMC11006795 DOI: 10.1093/bib/bbae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/05/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet remains challenging due to tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which adopts self-supervised techniques to integrate multiple valuable sources of information, including gene interaction relationships, gene expression profiles and molecular topologies, to enhance prediction accuracy and robustness. We demonstrated the superior performance of DIPK compared to existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. In addition, DIPK extends its applicability to single-cell RNA sequencing data, showcasing its capability for single-cell-level response prediction and cell identification. Further, we assess the applicability of DIPK on clinical data. DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group compared to the residual disease group, affirming the better response of the pCR group to the chemotherapy compound. We believe that the integration of DIPK into clinical decision-making processes has the potential to enhance individualized treatment strategies for cancer patients.
Affiliation(s)
- Pengyong Li
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, 519020 Macau, China
- Zhengxiang Jiang
- School of Electronic Engineering, Xidian University, 710126 Xi’an, Shaanxi, China
- Tianxiao Liu
- School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- Xinyu Liu
- Beijing Laboratory of Biomedical Materials, Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, 100081 Beijing, China
- Hui Qiao
- Department of Oncology, Tai’an Municipal Hospital, 271021 Tai’an, Shandong, China
- Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
21
|
Wang B, Vartak R, Zaltsman Y, Naing ZZC, Hennick KM, Polacco BJ, Bashir A, Eckhardt M, Bouhaddou M, Xu J, Sun N, Lasser MC, Zhou Y, McKetney J, Guiley KZ, Chan U, Kaye JA, Chadha N, Cakir M, Gordon M, Khare P, Drake S, Drury V, Burke DF, Gonzalez S, Alkhairy S, Thomas R, Lam S, Morris M, Bader E, Seyler M, Baum T, Krasnoff R, Wang S, Pham P, Arbalaez J, Pratt D, Chag S, Mahmood N, Rolland T, Bourgeron T, Finkbeiner S, Swaney DL, Bandyopadhay S, Ideker T, Beltrao P, Willsey HR, Obernier K, Nowakowski TJ, Hüttenhain R, State MW, Willsey AJ, Krogan NJ. A foundational atlas of autism protein interactions reveals molecular convergence. bioRxiv 2024:2023.12.03.569805. [PMID: 38076945 PMCID: PMC10705567 DOI: 10.1101/2023.12.03.569805] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/20/2023]
Abstract
Translating high-confidence (hc) autism spectrum disorder (ASD) genes into viable treatment targets remains elusive. We constructed a foundational protein-protein interaction (PPI) network in HEK293T cells involving 100 hcASD risk genes, revealing over 1,800 PPIs (87% novel). Interactors, expressed in the human brain and enriched for ASD but not schizophrenia genetic risk, converged on protein complexes involved in neurogenesis, tubulin biology, transcriptional regulation, and chromatin modification. A PPI map of 54 patient-derived missense variants identified differential physical interactions, and we leveraged AlphaFold-Multimer predictions to prioritize direct PPIs and specific variants for interrogation in Xenopus tropicalis and human forebrain organoids. A mutation in the transcription factor FOXP1 led to reconfiguration of DNA binding sites and altered development of deep cortical layer neurons in forebrain organoids. This work offers new insights into molecular mechanisms underlying ASD and describes a powerful platform to develop and test therapeutic strategies for many genetically-defined conditions.
22
Yao D, Zhang B, Li X, Zhan X, Zhan X, Zhang B. Applying negative sample denoising and multi-view feature for lncRNA-disease association prediction. Front Genet 2024; 14:1332273. [PMID: 38264213 PMCID: PMC10803626 DOI: 10.3389/fgene.2023.1332273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 12/22/2023] [Indexed: 01/25/2024] Open
Abstract
Increasing evidence indicates that mutations and dysregulation of long non-coding RNA (lncRNA) play a crucial role in the pathogenesis and prognosis of complex human diseases. Computational methods for predicting the association between lncRNAs and diseases have gained increasing attention. However, these methods face two key challenges: obtaining reliable negative samples and incorporating lncRNA-disease association (LDA) information from multiple perspectives. This paper proposes a method called NDMLDA, which combines multi-view feature extraction, unsupervised negative sample denoising, and a stacking ensemble classifier. Firstly, an unsupervised method (K-means) is used to design a negative sample denoising module to alleviate the imbalance of samples and the impact of potential noise in the negative samples on model performance. Secondly, graph attention networks are employed to extract multi-view features of both lncRNAs and diseases, thereby enhancing the learning of association information between them. Finally, lncRNA-disease association prediction is implemented through a stacking ensemble classifier. Existing research datasets are integrated to evaluate performance, and 5-fold cross-validation is conducted on this dataset. Experimental results demonstrate that NDMLDA achieves an AUC of 0.9907 and an AUPR of 0.9927, with a 5-fold cross-validation variance of less than 0.1%. These results outperform the baseline methods. Additionally, case studies further illustrate the model's potential in cancer diagnosis and precision medicine implementation.
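The abstract does not spell out how the K-means denoising module is built, so the following is only a minimal sketch of one plausible reading, not the NDMLDA implementation: cluster known positive and unlabeled lncRNA-disease pairs together, then draw negatives only from the cluster that holds the fewest known positives, capped at the number of positives to balance the classes. The function names, the farthest-point initialisation, and the toy two-dimensional features are all assumptions made for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Centers are seeded deterministically: the first point, then repeatedly
    the point farthest from all centers chosen so far (an assumption made
    here to keep the sketch reproducible).
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_reliable_negatives(positives, unlabeled, k=2):
    """Keep unlabeled samples from the cluster with the fewest known positives.

    At most len(positives) samples are returned so the classes stay balanced.
    """
    labels = kmeans_labels(positives + unlabeled, k)
    n_pos = len(positives)
    pos_counts = [labels[:n_pos].count(c) for c in range(k)]
    cleanest = min(range(k), key=lambda c: pos_counts[c])
    candidates = [u for u, lab in zip(unlabeled, labels[n_pos:]) if lab == cleanest]
    return candidates[:n_pos]


# Toy features: known associations sit near (1, 1); one unlabeled pair
# ([1.0, 0.95]) looks like a hidden positive and should be filtered out.
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
unlabeled = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.05], [1.0, 0.95]]
negatives = select_reliable_negatives(positives, unlabeled)
```

In this toy run the unlabeled pair that clusters with the known positives is never selected as a negative, which is the kind of label noise such a module is meant to reduce.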
Affiliation(s)
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Bo Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiangkui Li
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaojuan Zhan
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Xiaorong Zhan
- Department of Endocrinology and Metabolism, Hospital of South University of Science and Technology, Shenzhen, China
- Binbin Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
23
Mancuso CA, Johnson KA, Liu R, Krishnan A. Joint representation of molecular networks from multiple species improves gene classification. PLoS Comput Biol 2024; 20:e1011773. [PMID: 38198480 PMCID: PMC10805316 DOI: 10.1371/journal.pcbi.1011773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/23/2024] [Accepted: 12/20/2023] [Indexed: 01/12/2024] Open
Abstract
Network-based machine learning (ML) has the potential for predicting novel genes associated with nearly any health and disease context. However, this approach often uses network information from only the single species under consideration even though networks for most species are noisy and incomplete. While some recent methods have begun addressing this shortcoming by using networks from more than one species, they lack one or more key desirable properties: handling networks from more than two species simultaneously, incorporating many-to-many orthology information, or generating a network representation that is reusable across different types of and newly-defined prediction tasks. Here, we present GenePlexusZoo, a framework that casts molecular networks from multiple species into a single reusable feature space for network-based ML. We demonstrate that this multi-species network representation improves both gene classification within a single species and knowledge-transfer across species, even in cases where the inter-species correspondence is undetectable based on shared orthologous genes. Thus, GenePlexusZoo enables effectively leveraging the high evolutionary molecular, functional, and phenotypic conservation across species to discover novel genes associated with diverse biological contexts.
Affiliation(s)
- Christopher A. Mancuso
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Kayla A. Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
24
Xiao H, Rosen A, Chhibbar P, Moise L, Das J. From bench to bedside via bytes: Multi-omic immunoprofiling and integration using machine learning and network approaches. Hum Vaccin Immunother 2023; 19:2282803. [PMID: 38100557 PMCID: PMC10730168 DOI: 10.1080/21645515.2023.2282803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/15/2023] [Accepted: 11/09/2023] [Indexed: 12/17/2023] Open
Abstract
A significant surge in research endeavors leverages the vast potential of high-throughput omic technology platforms for broad profiling of biological responses to vaccines and cutting-edge immunotherapies and stem-cell therapies under development. These profiles capture different aspects of core regulatory and functional processes at different scales of resolution from molecular and cellular to organismal. Systems approaches capture the complex and intricate interplay between these layers and scales. Here, we summarize experimental data modalities, for characterizing the genome, epigenome, transcriptome, proteome, metabolome, and antibody-ome, that enable us to generate large-scale immune profiles. We also discuss machine learning and network approaches that are commonly used to analyze and integrate these modalities, to gain insights into correlates and mechanisms of natural and vaccine-mediated immunity as well as therapy-induced immunomodulation.
Affiliation(s)
- Hanxi Xiao
- Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Aaron Rosen
- Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Prabal Chhibbar
- Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Jishnu Das
- Center for Systems Immunology, Departments of Immunology and Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
25
Han X, Wang B, Situ C, Qi Y, Zhu H, Li Y, Guo X. scapGNN: A graph neural network-based framework for active pathway and gene module inference from single-cell multi-omics data. PLoS Biol 2023; 21:e3002369. [PMID: 37956172 PMCID: PMC10681325 DOI: 10.1371/journal.pbio.3002369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2023] [Revised: 11/27/2023] [Accepted: 10/07/2023] [Indexed: 11/15/2023] Open
Abstract
Although advances in single-cell technologies have enabled the characterization of multiple omics profiles in individual cells, extracting functional and mechanistic insights from such information remains a major challenge. Here, we present scapGNN, a graph neural network (GNN)-based framework that creatively transforms sparse single-cell profile data into the stable gene-cell association network for inferring single-cell pathway activity scores and identifying cell phenotype-associated gene modules from single-cell multi-omics data. Systematic benchmarking demonstrated that scapGNN was more accurate, robust, and scalable than state-of-the-art methods in various downstream single-cell analyses such as cell denoising, batch effect removal, cell clustering, cell trajectory inference, and pathway or gene module identification. scapGNN was developed as a systematic R package that can be flexibly extended and enhanced for existing analysis processes. It provides a new analytical platform for studying single cells at the pathway and network levels.
Affiliation(s)
- Xudong Han
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Bing Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Chenghao Situ
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Yaling Qi
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Hui Zhu
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Yan Li
- Department of Clinical Laboratory, Sir Run Run Hospital, Nanjing Medical University, Nanjing, China
- Xuejiang Guo
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
26
Hartman E, Scott AM, Karlsson C, Mohanty T, Vaara ST, Linder A, Malmström L, Malmström J. Interpreting biologically informed neural networks for enhanced proteomic biomarker discovery and pathway analysis. Nat Commun 2023; 14:5359. [PMID: 37660105 PMCID: PMC10475049 DOI: 10.1038/s41467-023-41146-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/13/2023] [Accepted: 08/22/2023] [Indexed: 09/04/2023] Open
Abstract
The incorporation of machine learning methods into proteomics workflows improves the identification of disease-relevant biomarkers and biological pathways. However, machine learning models, such as deep neural networks, typically suffer from a lack of interpretability. Here, we present a deep learning approach to combine biological pathway analysis and biomarker identification to increase the interpretability of proteomics experiments. Our approach integrates a priori knowledge of the relationships between proteins and biological pathways and biological processes into sparse neural networks to create biologically informed neural networks. We employ these networks to differentiate between clinical subphenotypes of septic acute kidney injury and COVID-19, as well as acute respiratory distress syndrome of different aetiologies. To gain biological insight into the complex syndromes, we utilize feature attribution methods to introspect the networks for the identification of proteins and pathways important for distinguishing between subtypes. The algorithms are implemented in a freely available open-source Python package (https://github.com/InfectionMedicineProteomics/BINN).
Affiliation(s)
- Erik Hartman
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Aaron M Scott
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Christofer Karlsson
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Tirthankar Mohanty
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Suvi T Vaara
- Department of Perioperative and Intensive Care, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Adam Linder
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Lars Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
27
Wu X, Jia W. Multimodal deep learning as a next challenge in nutrition research: tailoring fermented dairy products based on cytidine diphosphate-diacylglycerol synthase-mediated lipid metabolism. Crit Rev Food Sci Nutr 2023; 64:12272-12283. [PMID: 37615630 DOI: 10.1080/10408398.2023.2248633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2023]
Abstract
Deep learning is evolving in nutritional epidemiology to address challenges including precise nutrition and data-driven disease modeling. Consumption of fermented dairy products, as a specific dietary priority, contributes to a lower risk of all-cause mortality, cardiovascular disease, and obesity. Various lipid types play different roles in cardiometabolic health, and the fermentation process changes the lipid profile of dairy products. Leveraging multiple biological datasets can provide mechanistic insights into how proteins impact lipid pathways and establish connections among fermentation, lipid biomarkers, and proteins. Recent advances in deep learning have been applied to food category recognition, agro-food freshness detection, and food flavor prediction and regulation. The proposed multimodal deep learning method includes four steps: (i) forming data matrices from data generated by different omics layers; (ii) decomposing high-dimensional omics data with a self-attention mechanism; (iii) constructing a View Correlation Discovery Network to learn cross-omics correlations and integrate different omics datasets; and (iv) depicting a biological network for lipid metabolism-centered quantitative multi-omics data analysis. Relying on cytidine diphosphate-diacylglycerol synthase-mediated lipid metabolism effectively regulates the glycerophospholipid composition of fermented dairy products. Innovative processing strategies, including ohmic heating and pulsed electric field, improve the sensory qualities and nutritional characteristics of the products.
Affiliation(s)
- Xixuan Wu
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Wei Jia
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an, China
28
Yang L, Chen R, Melendy T, Goodison S, Sun Y. Identifying Significantly Perturbed Subnetworks in Cancer Using Multiple Protein-Protein Interaction Networks. Cancers (Basel) 2023; 15:4090. [PMID: 37627118 PMCID: PMC10452419 DOI: 10.3390/cancers15164090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 08/03/2023] [Accepted: 08/12/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND The identification of cancer driver genes and key molecular pathways has been the focus of large-scale cancer genome studies. Network-based methods detect significantly perturbed subnetworks as putative cancer pathways by incorporating genomics data with the topological information of PPI networks. However, commonly used PPI networks have distinct topological structures, making the results of the same method vary widely when applied to different networks. Furthermore, emerging context-specific PPI networks often have incomplete topological structures, which pose serious challenges for existing subnetwork detection algorithms. METHODS In this paper, we propose a novel method, referred to as MultiFDRnet, to address the above issues. The basic idea is to model a set of PPI networks as a multiplex network to preserve the topological structure of individual networks, while introducing dependencies among them, and, then, to detect significantly perturbed subnetworks on the modeled multiplex network using all the structural information simultaneously. RESULTS To illustrate the effectiveness of the proposed approach, an extensive benchmark analysis was conducted on both simulated and real cancer data. The experimental results showed that the proposed method is able to detect significantly perturbed subnetworks jointly supported by multiple PPI networks and to identify novel modular structures in context-specific PPI networks.
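The multiplex idea in the METHODS sentence can be illustrated with a small sketch. This is not MultiFDRnet itself: the abstract does not say how the dependencies among networks are modeled or weighted, so the code below assumes the simplest coupling, an unweighted link between the copies of the same gene in every pair of layers. The function name, layer names, and gene labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_multiplex(layers):
    """Combine several PPI edge lists into one multiplex adjacency map.

    layers: dict mapping a layer name to an iterable of (gene_a, gene_b) edges.
    Each node of the result is a (gene, layer) pair, so the topology of every
    individual network is preserved inside its own layer.
    """
    adj = defaultdict(set)
    genes_by_layer = {}
    for name, edges in layers.items():
        genes = set()
        for a, b in edges:
            # Intra-layer edge, stored in both directions (undirected network).
            adj[(a, name)].add((b, name))
            adj[(b, name)].add((a, name))
            genes.update((a, b))
        genes_by_layer[name] = genes
    # Inter-layer coupling (assumed form): link copies of the same gene
    # across every pair of layers in which it appears.
    for l1, l2 in combinations(genes_by_layer, 2):
        for g in genes_by_layer[l1] & genes_by_layer[l2]:
            adj[(g, l1)].add((g, l2))
            adj[(g, l2)].add((g, l1))
    return dict(adj)


# Toy example with two hypothetical PPI networks sharing genes g1 and g2.
multiplex = build_multiplex({
    "ppi_a": [("g1", "g2"), ("g1", "g3")],
    "ppi_b": [("g1", "g2"), ("g2", "g4")],
})
```

A subnetwork search run on such a structure can follow an interaction that is present in only one network while still moving between layers through the shared genes.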
Affiliation(s)
- Le Yang
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Runpu Chen
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Thomas Melendy
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steve Goodison
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL 32224, USA
- Yijun Sun
- Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY 14203, USA
29
Nasser R, Sharan R. BERTwalk for integrating gene networks to predict gene- to pathway-level properties. Bioinform Adv 2023; 3:vbad086. [PMID: 37448813 PMCID: PMC10336298 DOI: 10.1093/bioadv/vbad086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 06/14/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023]
Abstract
Motivation Graph representation learning is a fundamental problem in the field of data science with applications to integrative analysis of biological networks. Previous work in this domain was mostly limited to shallow representation techniques. A recent deep representation technique, BIONIC, has achieved state-of-the-art results in a variety of tasks but used arbitrarily defined components. Results Here, we present BERTwalk, an unsupervised learning scheme that combines the BERT masked language model with a network propagation regularization for graph representation learning. The transformation from networks to texts allows our method to naturally integrate different networks and provide features that inform not only nodes or edges but also pathway-level properties. We show that our BERTwalk model outperforms BIONIC, as well as four other recent methods, on two comprehensive benchmarks in yeast and human. We further show that our model can be utilized to infer functional pathways and their effects. Availability and implementation Code and data are available at https://github.com/raminass/BERTwalk. Contact roded@tauex.tau.ac.il.
Affiliation(s)
- Rami Nasser
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
30
Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling. Bioinformatics 2023; 39:i504-i512. [PMID: 37387142 DOI: 10.1093/bioinformatics/btad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. RESULTS To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains. AVAILABILITY AND IMPLEMENTATION Gemini can be accessed at: https://github.com/MinxZ/Gemini.
Affiliation(s)
- Addie Woicik
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Mingxin Zhang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
31
Koyyada P, Mishra S. A systematic computational analysis of Mycobacterium tuberculosis H37Rv and human CD34+ genomic expression reveals crucial molecular entities involved in infection progression. J Biomol Struct Dyn 2023; 41:13332-13347. [PMID: 36744528 DOI: 10.1080/07391102.2023.2175257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 01/19/2023] [Indexed: 02/07/2023]
Abstract
The co-evolution of Mycobacterium tuberculosis H37Rv along with its host systems enables the pathogenic bacterium to emerge as a multi-drug resistant form. This creates challenges for a more efficacious treatment strategy that can mitigate the infection. Working towards the same, our study followed a mathematical and statistical approach proposing that mycobacterial transcription factors regulating virulence and adaptation, host cell cytoplasmic component metabolism, oxidoreductase activity and respiratory ETC would be targets for antibiotics against Mycobacterium tuberculosis. Simultaneously, extending the statistical study on Mycobacterium-infected human cord blood CD34+ cells revealed that the human CD34+ genes, S100A8 and FGR (tyrosine-protein kinase, Src2), might be affected in the infection pathogenesis by Mycobacterium. Further, the deduced Mycobacterium-human gene interaction network proposed that mycobacterial coregulators Rv0452 (MarR family regulator) and Rv3862c (WhiB6) triggered genes controlling bacterial metabolism, which influences human immunological pathways involving TLR2 and CXCL8/MAPK8. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Praveena Koyyada
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Seema Mishra
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India