1
Yan H, Weng D, Li D, Gu Y, Ma W, Liu Q. Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration. Brief Bioinform 2024; 25:bbae184. [PMID: 38670157 PMCID: PMC11052635 DOI: 10.1093/bib/bbae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 04/06/2024] [Indexed: 04/28/2024]
Abstract
The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. Prior knowledge, such as gene regulatory networks and pathway information, harbors useful information about gene-gene interactions and gene functional modules. To effectively integrate multi-omics data and make full use of this prior knowledge, we propose the Multilevel-graph neural network (Multilevel-GNN), a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy than existing methods. Furthermore, key factors nonlinearly associated with tumor pathogenesis are prioritized at the gene and pathway levels by two neural-network interpretation algorithms, GNN-Explainer and IGscore, respectively. The top genes and pathways show strong associations with disease in survival analyses, and many of them, such as SEC61G and CYP27B1, have previously been reported in the literature.
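The abstract does not give the layer equations of Multilevel-GNN. As a hedged illustration of how a prior-knowledge graph such as a gene regulatory network can constrain feature extraction, the sketch below implements one standard graph-convolution (GCN) propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The function name `gcn_layer`, the toy three-gene path graph and the fixed weights are assumptions for illustration, not the authors' architecture.

```python
import math


def gcn_layer(adj, features, weights):
    """One generic graph-convolution step over a gene network (illustrative only).

    Computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where
      adj:      n x n symmetric 0/1 adjacency matrix (list of lists), e.g. a
                hypothetical gene regulatory network
      features: n x f matrix H of per-gene input features (e.g. omics values)
      weights:  f x k matrix W (would be learned during training in a real model)
    """
    n = len(adj)
    # Add self-loops so each gene keeps part of its own signal.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f, k = len(features[0]), len(weights[0])
    # Aggregate neighbour features: norm @ H.
    agg = [[sum(norm[i][m] * features[m][c] for m in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ W).
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f)))
             for o in range(k)] for i in range(n)]


# Hypothetical path graph g0 - g1 - g2 with an input signal only on g0.
out = gcn_layer([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [[1.0], [0.0], [0.0]], [[1.0]])
```

Because g2 is two hops from g0, it receives no signal after a single layer; stacking layers widens each gene's receptive field over the network.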
Affiliation(s)
- Hongxi Yan
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
- Dawei Weng
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Dongguo Li
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Yu Gu
- School of Biomedical Engineering, Capital Medical University, 10 You An Men WaiXi Tou Tiao, 100069, Beijing, China
- Wenji Ma
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025, Shanghai, China
- Qingjie Liu
- Department of Computer Science, Beihang University, XueYuan Road, 100191, Beijing, China
2
Latapiat V, Saez M, Pedroso I, Martin AJM. Unraveling patient heterogeneity in complex diseases through individualized co-expression networks: a perspective. Front Genet 2023; 14:1209416. [PMID: 37636264 PMCID: PMC10449456 DOI: 10.3389/fgene.2023.1209416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
This perspective highlights the potential of individualized networks as a novel strategy for studying complex diseases through patient stratification, enabling advancements in precision medicine. We emphasize the impact of interpatient heterogeneity resulting from genetic and environmental factors and discuss how individualized networks improve our ability to develop treatments and enhance diagnostics. Systems biology integration of multimodal information, such as genomic and clinical data, has reached a tipping point that allows biological networks to be inferred at single-individual resolution. This approach generates a specific biological network per sample, representing the individual from which the sample originated. The availability of individualized networks enables applications in personalized medicine, such as identifying malfunctions and selecting tailored treatments. In essence, reliable individualized networks can expedite research on drug response variability by modeling heterogeneity among individuals and enabling the personalized selection of pharmacological targets for treatment. Therefore, developing diverse and cost-effective approaches for generating these networks is crucial for their widespread application in clinical services.
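The perspective does not prescribe a specific inference method. One published way to obtain a network per sample is the LIONESS approach, which estimates a sample's edge weight by comparing an aggregate network built from all N samples with one built after leaving that sample out: e_q = N (e_all - e_minus_q) + e_minus_q. The minimal sketch below assumes Pearson correlation between two genes' expression values as the aggregate edge measure; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (needs nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lioness_edge(x, y, q):
    """Sample-specific weight of the edge between two genes for sample q.

    x, y: lists of expression values for the two genes across N samples.
    LIONESS-style estimate: e_q = N * (e_all - e_minus_q) + e_minus_q,
    where e_all uses all N samples and e_minus_q leaves sample q out.
    """
    n = len(x)
    e_all = pearson(x, y)
    e_minus_q = pearson(x[:q] + x[q + 1:], y[:q] + y[q + 1:])
    return n * (e_all - e_minus_q) + e_minus_q


# Hypothetical data: the last sample breaks the otherwise perfect co-expression.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 2.0, 3.0, 4.0, -10.0]
per_sample = [lioness_edge(gene_a, gene_b, q) for q in range(len(gene_a))]
```

The factor N amplifies small leave-one-out changes, so a sample that deviates from the population trend (here the last one) receives a markedly different edge weight from the others.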
Affiliation(s)
- Verónica Latapiat
- Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- Mauricio Saez
- Centro de Oncología de Precisión, Facultad de Medicina y Ciencias de la Salud, Universidad Mayor, Santiago, Chile
- Laboratorio de Investigación en Salud de Precisión, Departamento de Procesos Diagnósticos y Evaluación, Facultad de Ciencias de la Salud, Universidad Católica de Temuco, Temuco, Chile
- Inti Pedroso
- Vicerrectoría de Investigación, Universidad Mayor, Santiago, Chile
- Alberto J. M. Martin
- Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile
- Escuela de Ingeniería, Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
3
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND There is increasing interest in the use of Deep Learning (DL)-based methods as a supporting analytical framework in oncology. However, most direct applications of DL yield models with limited transparency and explainability, which constrains their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology, with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, which are fundamental properties in the biomedical domain. To this end, we retrieved and analyzed 42 studies, focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models towards integrating prior biological relational and network knowledge (e.g. pathways or protein-protein interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models that can integrate mechanistic and statistical inference. We introduce the concept of bio-centric interpretability and, following its taxonomy, discuss representational methodologies for integrating domain prior knowledge into such models. CONCLUSIONS The paper provides a critical outlook on contemporary methods for explainability and interpretability used in DL for cancer. The analysis points towards a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, an important step towards formalising the biological interpretability of DL models and developing methods that are less problem- or application-specific.
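The review discusses encoding pathway knowledge into model structure in general terms. A common pattern, assumed here for illustration, is a sparse layer in which each hidden unit corresponds to a pathway and is connected only to its member genes, so every unit has a direct biological reading. The sketch below applies such a mask in plain Python; the function name `masked_pathway_layer`, the membership format and the weights are hypothetical.

```python
def masked_pathway_layer(gene_values, membership, weights):
    """Map gene-level inputs to pathway-level units through a sparse, masked layer.

    gene_values: list of per-gene input values.
    membership:  membership[p] lists the gene indices annotated to pathway p;
                 only these gene -> pathway connections exist (the mask).
    weights:     weights[p][g] is the (normally learned) weight of gene g for
                 pathway p; entries for genes outside pathway p are never used.
    """
    out = []
    for p, genes in enumerate(membership):
        z = sum(weights[p][g] * gene_values[g] for g in genes)
        out.append(max(0.0, z))  # ReLU activation per pathway unit
    return out


# Hypothetical example: 3 genes, 2 pathways ({g0, g1} and {g2}).
genes = [1.0, 2.0, 3.0]
pathways = [[0, 1], [2]]
w = [[1.0, 1.0, 100.0], [100.0, 100.0, 0.5]]
print(masked_pathway_layer(genes, pathways, w))  # [3.0, 1.5]
```

The large weights outside each pathway's gene set have no effect on the output, which is exactly what makes each hidden unit attributable to one pathway.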
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920 Martigny, Switzerland
- Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920 Martigny, Switzerland
- Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH-1920 Martigny, Switzerland
4
Timakum T, Song M, Kim G. Integrated entitymetrics analysis for health information on bipolar disorder using social media data and scientific literature. ASLIB J INFORM MANAG 2022. [DOI: 10.1108/ajim-02-2022-0090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: This study aimed to examine mental health information entities and the associations between the biomedical, psychological and social domains of bipolar disorder (BD) by analyzing social media data and scientific literature. Design/methodology/approach: Reddit posts and full-text papers from PubMed Central (PMC) were collected. Text analysis was used to create a psychological dictionary. Text mining tools were applied to extract BD entities and their relationships from the datasets using a dictionary- and rule-based approach. Lastly, social network analysis and visualization were employed to examine the associations. Findings: Mental health information on the drug side effects entity was frequently detected in both datasets. In the affective category, the most frequent entities were “depressed” in the social media data and “severe” in the PMC data. Social and personal concern entities related to friends, family, self-attitude and economy were found repeatedly in the Reddit data. Relationships between biomedical and psychological processes were identified often: “afraid” and “Lithium” in the social media data, and “schizophrenia” and “suicidal” in the PMC data. Originality/value: Mental health information is increasingly sought after, and BD is a mental illness with a clinical picture shaped by complicated factors. This paper makes an original contribution to understanding the biological, psychological and social factors of BD. Importantly, these results highlight the benefit of mental health informatics that draws on both laboratory and social media data.
5
McInnis MG, Andreassen OA, Andreazza AC, Alon U, Berk M, Brister T, Burdick KE, Cui D, Frye M, Leboyer M, Mitchell PB, Merikangas K, Nierenberg AA, Nurnberger JI, Pham D, Vieta E, Yatham LN, Young AH. Strategies and foundations for scientific discovery in longitudinal studies of bipolar disorder. Bipolar Disord 2022; 24:499-508. [PMID: 35244317 PMCID: PMC9440950 DOI: 10.1111/bdi.13198] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/02/2022]
Abstract
Bipolar disorder (BD) is a complex and dynamic condition with a typical onset in late adolescence or early adulthood, followed by an episodic course with intervening periods of subthreshold symptoms or euthymia. It is complicated by the accumulation of comorbid medical and psychiatric disorders. The etiology of BD remains unknown, and no reliable biological markers have yet been identified. This is likely due to the lack of a comprehensive ontological framework and, most importantly, to the fact that most studies have been based on small, nonrepresentative clinical samples with cross-sectional designs. We propose establishing large, global longitudinal cohorts of BD studied consistently in a multidimensional and multidisciplinary manner to determine etiology and help improve treatment. Herein we propose collecting a broad range of data that reflect the heterogeneous phenotypic manifestations of BD, including dimensional and categorical measures of mood, neurocognitive, personality, behavior, sleep and circadian, life-story, and outcome domains. Combined with genetic and biological information, such an approach promotes the integration and harmonization of data within and across current ontology systems while supporting a paradigm shift that will facilitate discovery and become the basis for novel hypotheses.
Affiliation(s)
- Ole A. Andreassen
- NORMENT Centre, University of Oslo and Oslo University Hospital, Oslo, Norway
- Ana C. Andreazza
- Department of Pharmacology & Toxicology, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Michael Berk
- Deakin University, IMPACT – the Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Barwon Health, Geelong, Australia
- Orygen, The National Centre of Excellence in Youth Mental Health, Centre for Youth Mental Health, Florey Institute for Neuroscience and Mental Health and the Department of Psychiatry, The University of Melbourne, Melbourne, Australia
- Teri Brister
- National Alliance on Mental Illness, Arlington, Virginia, USA
- Donghong Cui
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Marion Leboyer
- Département de psychiatrie, Université Paris Est Creteil (UPEC), AP-HP, Hôpitaux Universitaires H. Mondor, DMU IMPACT, INSERM, translational Neuropsychiatry, Fondation FondaMental, Creteil, France
- Kathleen Merikangas
- Intramural Research Program, National Institute of Mental Health, Bethesda, Maryland, USA
- Daniel Pham
- Milken Institute, Center for Strategic Philanthropy, Washington, District of Columbia, USA
- Eduard Vieta
- Bipolar and Depressive Disorders Unit, Hospital Clinic, Institute of Neuroscience, University of Barcelona, IDIBAPS, CIBERSAM, Barcelona, Catalonia, Spain
- Allan H. Young
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London & South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Beckenham, Kent, UK
6
Adipokines in Sleep Disturbance and Metabolic Dysfunction: Insights from Network Analysis. Clocks Sleep 2022; 4:321-331. [PMID: 35892989 PMCID: PMC9326621 DOI: 10.3390/clockssleep4030027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Revised: 06/18/2022] [Accepted: 06/20/2022] [Indexed: 11/17/2022]
Abstract
Adipokines are a growing group of secreted proteins that play important roles in obesity, sleep disturbance, and metabolic derangements. Due to the complex interplay between adipokines, sleep, and metabolic regulation, an integrated approach is required to better understand the significance of adipokines in these processes. In the present study, we created and analyzed a network of six adipokines and their molecular partners involved in sleep disturbance and metabolic dysregulation. This network represents information flow from regulatory factors, adipokines, and physiologic pathways to disease processes in metabolic dysfunction. Analyses using network metrics revealed that obesity and obstructive sleep apnea were major drivers of sleep-associated metabolic dysregulation. Two adipokines, leptin and adiponectin, were found to have higher degrees than the other adipokines, indicating their central roles in the network. These adipokines signal through major metabolic pathways such as insulin signaling, inflammation, food intake, and energy expenditure, and exert their functions in cardiovascular, reproductive, and autoimmune diseases. Leptin, AMP-activated protein kinase (AMPK), and fatty acid oxidation were found to have global influence in the network and represent potentially important interventional targets for metabolic and sleep disorders. These findings underscore the great potential of network-based approaches for gaining new insights and identifying pharmaceutical targets in metabolic and sleep disorders.
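Degree, the number of edges incident to a node, is the metric behind the statement that leptin and adiponectin have higher degrees than the other adipokines. A minimal sketch of computing and ranking node degree from an edge list follows. The toy network and the function name `degree_ranking` are hypothetical and are not taken from the study, and edges are treated as undirected even though the study's network models directed information flow.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Return (node, degree) pairs sorted by degree, highest first.

    Each (u, v) edge counts once toward the degree of both endpoints
    (undirected view). Ties are broken by node name so the output is
    deterministic.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network, not the network analyzed in the study.
toy_edges = [("X", "p1"), ("X", "p2"), ("X", "p3"), ("Y", "p1"), ("Z", "p2")]
print(degree_ranking(toy_edges))
# [('X', 3), ('p1', 2), ('p2', 2), ('Y', 1), ('Z', 1), ('p3', 1)]
```

Degree captures only local connectivity; claims of "global influence" in a network usually rest on path-based measures such as betweenness or closeness centrality.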
7
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023]
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of, and potential therapeutic interventions for, human diseases, especially complex diseases in which large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. Phenotypes, in contrast, are by definition directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to studying different aspects of human diseases.
8
9
Zeng H, Zhang J, Preising GA, Rubel T, Singh P, Ritz A. Graphery: interactive tutorials for biological network algorithms. Nucleic Acids Res 2021; 49:W257-W262. [PMID: 34037782 PMCID: PMC8262715 DOI: 10.1093/nar/gkab420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2021] [Revised: 04/19/2021] [Accepted: 05/03/2021] [Indexed: 11/14/2022]
Abstract
Networks have been an excellent framework for modeling complex biological information, but the methodological details of network-based tools are often described for a technical audience. We have developed Graphery, an interactive tutorial webserver that illustrates foundational graph concepts frequently used in network-based methods. Each tutorial describes a graph concept along with executable Python code that can be interactively run on a graph. Users navigate each tutorial using their choice of real-world biological networks that highlight the diverse applications of network algorithms. Graphery also allows users to modify the code within each tutorial or write new programs, all of which can be executed without requiring an account. Graphery accepts ideas for new tutorials and datasets, which will be shaped by both computational and biological researchers as it grows into a community-contributed learning platform. Graphery is available at https://graphery.reedcompbio.org/.
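Graphery pairs each graph concept with executable Python. As an illustration of the kind of foundational concept such tutorials explain (this is not code taken from Graphery), breadth-first search finds a path with the fewest edges between two nodes of an unweighted network, for example a shortest interaction chain between two proteins. The adjacency-dict format, the function name `bfs_shortest_path` and the node names are assumptions.

```python
from collections import deque


def bfs_shortest_path(adj, source, target):
    """Return one path with the fewest edges from source to target, or None.

    adj maps each node to an iterable of its neighbours. Nodes are explored
    level by level, so the first time target is reached the path is shortest.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None  # target is unreachable from source


# Hypothetical directed toy network.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_shortest_path(toy, "A", "E"))  # ['A', 'B', 'D', 'E']
```

On weighted networks (e.g. edges scored by interaction confidence) BFS no longer gives the optimal path, and a weighted algorithm such as Dijkstra's is the usual replacement.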
Affiliation(s)
- Heyuan Zeng
- Computer Science Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Jinbiao Zhang
- Information and Communication Technology Department, Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, 43900 Sepang, Selangor Darul Ehsan, Malaysia
- Gabriel A Preising
- Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Tobias Rubel
- Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Pramesh Singh
- Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Anna Ritz
- Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
10
Pirch S, Müller F, Iofinova E, Pazmandi J, Hütter CVR, Chiettini M, Sin C, Boztug K, Podkosova I, Kaufmann H, Menche J. The VRNetzer platform enables interactive network analysis in Virtual Reality. Nat Commun 2021; 12:2432. [PMID: 33893283 PMCID: PMC8065164 DOI: 10.1038/s41467-021-22570-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/29/2020] [Accepted: 03/09/2021] [Indexed: 12/17/2022]
Abstract
Networks provide a powerful representation of interacting components within complex systems, making them ideal for visually and analytically exploring big data. However, the size and complexity of many networks render static visualizations on typically sized paper or screens impractical, resulting in proverbial ‘hairballs’. Here, we introduce a Virtual Reality (VR) platform that overcomes these limitations by facilitating thorough visual and interactive exploration of large networks. Our platform allows maximal customization and extensibility through the import of custom code for data analysis, integration of external databases, and design of arbitrary user interface elements, among other features. As a proof of concept, we show how our platform can be used to interactively explore genome-scale molecular networks to identify genes associated with rare diseases and understand how they might contribute to disease development. Our platform is a general-purpose, VR-based data exploration platform for large and diverse data types, providing an interface that facilitates interaction between human intuition and state-of-the-art analysis methods. Data-rich networks can be difficult to interpret beyond a certain size. Here, the authors introduce a platform that uses virtual reality to allow the visual exploration of large networks, while interfacing with data repositories and other analytical methods to improve the interpretation of big data.
Affiliation(s)
- Sebastian Pirch
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Felix Müller
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Eugenia Iofinova
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Julia Pazmandi
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- Christiane V R Hütter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Martin Chiettini
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Celine Sin
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Kaan Boztug
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- St. Anna Children's Hospital, Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
- Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria
- Iana Podkosova
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
- Hannes Kaufmann
- Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria
- Jörg Menche
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Faculty of Mathematics, University of Vienna, Vienna, Austria
11
Liu R, Krishnan A. PecanPy: a fast, efficient, and parallelized Python implementation of node2vec. Bioinformatics 2021; 37:3377-3379. [PMID: 33760066 PMCID: PMC8504639 DOI: 10.1093/bioinformatics/btab202] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/23/2020] [Revised: 03/07/2021] [Accepted: 03/23/2021] [Indexed: 11/12/2022]
Abstract
Summary: Learning low-dimensional representations (embeddings) of nodes in large graphs is key to applying machine learning to massive biological networks. Node2vec is the most widely used method for node embedding. However, its original Python and C++ implementations scale poorly with network density, failing on dense biological networks with hundreds of millions of edges. We have developed PecanPy, a new Python implementation of node2vec that uses cache-optimized compact graph data structures and precomputation/parallelization to produce fast, high-quality node embeddings for biological networks of all sizes and densities. Availability and implementation: PecanPy software is freely available at https://github.com/krishnanlab/PecanPy. Supplementary information: Supplementary data are available at Bioinformatics online.
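PecanPy's speed comes from its data structures and parallelization; the walks themselves follow node2vec's second-order bias. As a hedged sketch of that rule (not PecanPy's API), after stepping from node t to node v, the unnormalized weight of moving on to a neighbour x is w(v, x) / p if x is t, w(v, x) if x is also a neighbour of t, and w(v, x) / q otherwise. The function names and the dict-of-dicts graph format are assumptions, and the skip-gram embedding step that consumes the walks is omitted.

```python
import random


def node2vec_weights(adj, prev, cur, p, q):
    """Unnormalized node2vec transition weights for the step after prev -> cur.

    adj maps each node to a dict {neighbour: edge_weight} (undirected graph).
    The bias is 1/p for returning to prev, 1 for neighbours of prev, 1/q otherwise.
    """
    weights = {}
    for nxt, w in adj[cur].items():
        if nxt == prev:
            alpha = 1.0 / p   # return to the previous node
        elif nxt in adj[prev]:
            alpha = 1.0       # stay within distance 1 of prev (BFS-like)
        else:
            alpha = 1.0 / q   # move further away from prev (DFS-like)
        weights[nxt] = alpha * w
    return weights


def node2vec_walk(adj, start, length, p, q, rng):
    """Generate one biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not adj[cur]:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node: plain edge-weighted choice.
            nodes, ws = zip(*adj[cur].items())
        else:
            nodes, ws = zip(*node2vec_weights(adj, walk[-2], cur, p, q).items())
        walk.append(rng.choices(nodes, weights=ws, k=1)[0])
    return walk


# Hypothetical toy graph: triangle a-b-c plus a tail b-d.
g = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0, "c": 1.0, "d": 1.0},
     "c": {"a": 1.0, "b": 1.0}, "d": {"b": 1.0}}
print(node2vec_weights(g, "a", "b", p=2.0, q=0.5))  # {'a': 0.5, 'c': 1.0, 'd': 2.0}
```

Recomputing these weights at every step is what makes naive implementations slow on dense graphs; precomputing or caching the transition probabilities is one of the trade-offs a fast implementation has to manage.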
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI
- Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI
12
Levi H, Elkon R, Shamir R. DOMINO: a network-based active module identification algorithm with reduced rate of false calls. Mol Syst Biol 2021; 17:e9593. [PMID: 33471440 PMCID: PMC7816759 DOI: 10.15252/msb.20209593] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/24/2020] [Revised: 11/09/2020] [Accepted: 11/11/2020] [Indexed: 01/18/2023]
Abstract
Algorithms for active module identification (AMI) are central to the analysis of omics data. Such algorithms receive a gene network and nodes' activity scores as input and report subnetworks that show significant over-representation of accrued activity signal ("active modules"), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression (GE) and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched in modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation-based method that empirically evaluates GO terms reported by AMI methods. We used the method to devise five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir-Lab.
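As a hedged sketch of the generic building blocks behind permutation-based evaluation (not DOMINO's actual procedure, which reruns AMI methods on permuted data and compares GO enrichment), the code below shuffles node activity scores across genes while keeping the network fixed, and converts a null distribution into an empirical p-value with the usual +1 correction. The function names, the two-gene "module" and its scoring by summed activity are hypothetical.

```python
import random


def permuted_scores(scores, rng):
    """Shuffle activity scores across genes; the gene set (and network) is unchanged."""
    genes = list(scores)
    values = [scores[g] for g in genes]
    rng.shuffle(values)
    return dict(zip(genes, values))


def empirical_p_value(observed, null_scores):
    """Empirical p-value with +1 correction: (1 + #null >= observed) / (1 + n)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


# Hypothetical toy: a "module" of two genes scored by its summed activity.
rng = random.Random(0)
scores = {"g1": 5.0, "g2": 4.0, "g3": 0.1, "g4": 0.2, "g5": 0.3}
module = ["g1", "g2"]
observed = sum(scores[g] for g in module)
null = [sum(permuted_scores(scores, rng)[g] for g in module) for _ in range(999)]
p = empirical_p_value(observed, null)
```

The +1 in numerator and denominator counts the observed data as one of the permutations, so the p-value is never exactly zero and stays valid with a finite number of permutations.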
Affiliation(s)
- Hagai Levi
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ran Elkon
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
13
Vulliard L, Menche J. Complex Networks in Health and Disease. Systems Medicine 2021. [PMCID: PMC7263184 DOI: 10.1016/b978-0-12-801238-3.11640-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
From protein interactions to signal transduction, from metabolism to the nervous system: virtually all processes in health and disease rely on the careful orchestration of a large number of diverse individual components ranging from molecules to cells and entire organs. Networks provide a powerful framework for describing and understanding these complex systems holistically. They offer a unique combination of a highly intuitive qualitative description and a plethora of analytical, quantitative tools. Here we provide a brief introduction to the emerging field of network medicine. After an overview of the core concepts for connecting network characteristics to biological functions, we review commonly used networks, ranging from the molecular interaction networks that form the basis of all biological processes in the cell to the global transportation networks that govern the spread of epidemics worldwide. Lastly, we highlight current conceptual and practical challenges.
14
Hudson IL. Data Integration Using Advances in Machine Learning in Drug Discovery and Molecular Biology. Methods Mol Biol 2021; 2190:167-184. [PMID: 32804365 DOI: 10.1007/978-1-0716-0826-5_7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
While the term artificial intelligence and the concept of deep learning are not new, recent advances in high-performance computing, the availability of large annotated data sets required for training, and novel frameworks for implementing deep neural networks have led to an unprecedented acceleration of the field of molecular (network) biology and pharmacogenomics. The need to align biological data with innovative machine learning has stimulated developments in both data integration (fusion) and knowledge representation, in the form of heterogeneous, multiplex, and biological networks or graphs. In this chapter we briefly introduce several popular neural network architectures used in deep learning, namely the fully connected deep neural network, the recurrent neural network, the convolutional neural network, and the autoencoder. Deep learning predictors, classifiers, and generators used in modern feature extraction may well aid interpretability and thus make AI tools more explainable, potentially adding insights and advancements in novel chemistry and biology discovery. The capability of learning representations directly from structures, without using any predefined structure descriptor, is an important feature distinguishing deep learning from other machine learning methods and makes the traditional feature selection and reduction procedures unnecessary. In this chapter we briefly show how these technologies are applied for data integration (fusion) and analysis in drug discovery research, covering these areas: (1) application of convolutional neural networks to predict ligand-protein interactions; (2) application of deep learning in compound property and activity prediction; (3) de novo design through deep learning.
We also: (1) discuss some aspects of future development of deep learning in drug discovery/chemistry; (2) provide references to published information; (3) provide recently advocated recommendations on using artificial intelligence and deep learning in -omics research and drug discovery.
Affiliation(s)
- Irene Lena Hudson
- Mathematical Sciences, School of Science, RMIT University, Melbourne, VIC, Australia.
15
Network integration and modelling of dynamic drug responses at multi-omics levels. Commun Biol 2020; 3:573. [PMID: 33060801 PMCID: PMC7567116 DOI: 10.1038/s42003-020-01302-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/17/2020] [Accepted: 09/14/2020] [Indexed: 12/25/2022] [Open access]
Abstract
Uncovering cellular responses from heterogeneous genomic data is crucial for molecular medicine, in particular for drug safety. This can be realized by integrating the molecular activities in networks of interacting proteins. As a proof of concept, we challenge network modeling with time-resolved proteome, transcriptome and methylome measurements in iPSC-derived human 3D cardiac microtissues to elucidate adverse mechanisms of anthracycline cardiotoxicity measured with four different drugs (doxorubicin, epirubicin, idarubicin and daunorubicin). Dynamic molecular analysis at in vivo drug exposure levels reveals a network of 175 disease-associated proteins and identifies common modules of anthracycline cardiotoxicity in vitro, related to mitochondrial and sarcomere function as well as remodeling of the extracellular matrix. These in vitro-identified modules are transferable and are evaluated with biopsies of cardiomyopathy patients. This study, to our knowledge the most comprehensive on anthracycline cardiotoxicity, demonstrates a reproducible workflow for molecular medicine and serves as a template for detecting adverse drug responses from complex omics data. Using a network propagation approach with integrated multi-omic data, Selevsek et al. develop a reproducible workflow for identifying drug toxicity effects in cellular systems. This is demonstrated with the analysis of anthracycline cardiotoxicity in cardiac microtissues under the effect of multiple drugs.
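The abstract names network propagation as the core technique but gives no details. As an illustration only, one widely used form of network propagation is random walk with restart (RWR): seed scores (for example, per-protein omics signals) are repeatedly spread to neighbours in an interaction network while a fixed fraction of mass returns to the seeds. The function name, the restart probability and the toy graph below are assumptions for this sketch, not the workflow used by Selevsek et al.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over a network by random walk with restart (sketch).

    adj: symmetric (n, n) adjacency matrix of a hypothetical interaction network.
    seeds: length-n vector of non-negative initial node scores.
    restart: probability of jumping back to the seed distribution at each step.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = adj / col_sums                     # column-normalised transition matrix
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()                     # seed distribution sums to 1
    p = p0.copy()
    for _ in range(max_iter):
        # spread (1 - restart) of the mass to neighbours, return the rest to seeds
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop when the L1 change is negligible
            return p_next
        p = p_next
    return p

# Toy path graph 0-1-2-3 seeded at node 0: scores fall off with distance.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(random_walk_with_restart(path, [1, 0, 0, 0]))  # approx. [0.578 0.311 0.089 0.022]
```

Higher restart values keep the signal closer to the seeds; lower values let it diffuse further through the network, which is the main tuning choice in this family of methods.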
16
Li W, Zhang S, Yang G. Dynamic organization of intracellular organelle networks. WIREs Mech Dis 2020; 13:e1505. [PMID: 32865347 DOI: 10.1002/wsbm.1505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/04/2020] [Revised: 06/06/2020] [Accepted: 07/09/2020] [Indexed: 01/07/2023]
Abstract
Intracellular organelles are membrane-bound and biochemically distinct compartments constructed to serve specialized functions in eukaryotic cells. Through extensive interactions, they form networks to coordinate and integrate their specialized functions for cell physiology. A fundamental property of these organelle networks is that they constantly undergo dynamic organization via membrane fusion and fission to remodel their internal connections and to mediate direct material exchange between compartments. The dynamic organization not only enables them to serve critical physiological functions adaptively but also differentiates them from many other biological networks such as gene regulatory networks and cell signaling networks. This review examines this fundamental property of the organelle networks from a systems point of view. The focus is exclusively on homotypic networks formed by mitochondria, lysosomes, endosomes, and the endoplasmic reticulum, respectively. First, key mechanisms that drive the dynamic organization of these networks are summarized. Then, several distinct organizational properties of these networks are highlighted. Next, spatial properties of the dynamic organization of these networks are emphasized, and their functional implications are examined. Finally, some representative molecular machineries that mediate the dynamic organization of these networks are surveyed. Overall, the dynamic organization of intracellular organelle networks is emerging as a fundamental and unifying paradigm in the internal organization of eukaryotic cells. This article is categorized under: Metabolic Diseases > Molecular and Cellular Physiology.
Affiliation(s)
- Wenjing Li
- Laboratory of Computational Biology and Machine Intelligence, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Shuhao Zhang
- Laboratory of Computational Biology and Machine Intelligence, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, Nankai University, Tianjin, China
- Ge Yang
- Laboratory of Computational Biology and Machine Intelligence, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
17
Ye C, Paccanaro A, Gerstein M, Yan KK. The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data. BMC Bioinformatics 2020; 21:222. [PMID: 32471347 PMCID: PMC7260828 DOI: 10.1186/s12859-020-03545-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/17/2020] [Accepted: 05/11/2020] [Indexed: 11/11/2022] [Open access]
Abstract
BACKGROUND: Genome-wide ligation-based assays such as Hi-C provide us with an unprecedented opportunity to investigate the spatial organization of the genome. Results of a typical Hi-C experiment are often summarized in a chromosomal contact map, a matrix whose elements reflect the co-location frequencies of genomic loci. To elucidate the complex structural and functional interactions between those genomic loci, networks offer a natural and powerful framework.
RESULTS: We propose a novel graph-theoretical framework, the Corrected Gene Proximity (CGP) map, to study the effect of the 3D spatial organization of genes in transcriptional regulation. The starting point of the CGP map is a weighted network, the gene proximity map, whose weights are based on the contact frequencies between genes extracted from genome-wide Hi-C data. We derive a null model for the network based on the signal contributed by the 1D genomic distance and use it to "correct" the gene proximity for cell type 3D specific arrangements. The CGP map therefore provides a network framework for the 3D structure of the genome on a global scale. On human cell lines, we show that the CGP map can detect and quantify gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies. Analyzing the expression pattern of metabolic pathways of two hematopoietic cell lines, we find that the relative positioning of the genes, as captured and quantified by the CGP, is highly correlated with their expression change. We further show that the CGP map can be used to form an inter-chromosomal proximity map that allows large-scale abnormalities, such as chromosomal translocations, to be identified.
CONCLUSIONS: The Corrected Gene Proximity map is a map of the 3D structure of the genome on a global scale. It allows the simultaneous analysis of intra- and inter-chromosomal interactions and of gene co-regulation and co-localization more effectively than the map obtained by raw contact frequencies, thus revealing hidden associations between global spatial positioning and gene expression. The flexible graph-based formalism of the CGP map can be easily generalized to study any existing Hi-C dataset.
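The abstract describes correcting raw contact frequencies with a null model driven by 1D genomic distance, without giving its form. A common, simpler way to express the same idea is an observed/expected ratio, where the expected contact for two loci is the mean contact over all pairs at the same genomic separation. The sketch below assumes that observed/expected approach on a binned intra-chromosomal matrix; it illustrates distance correction in general and is not the CGP null model itself.

```python
import numpy as np

def distance_corrected_contacts(contacts):
    """Observed/expected contact matrix using a per-distance mean as the null (sketch).

    contacts: symmetric (n, n) intra-chromosomal contact matrix over n ordered bins.
    Values > 1 mean two loci touch more often than their 1D separation alone predicts.
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    corrected = np.zeros_like(contacts)
    for d in range(n):
        diag = np.diagonal(contacts, offset=d)   # all bin pairs exactly d bins apart
        expected = diag.mean()                   # null expectation at distance d
        if expected > 0:
            idx = np.arange(n - d)
            corrected[idx, idx + d] = diag / expected
            corrected[idx + d, idx] = diag / expected  # keep the matrix symmetric
    return corrected

toy = [[10, 4, 1], [4, 10, 2], [1, 2, 10]]       # hypothetical 3-bin contact map
print(distance_corrected_contacts(toy))
```

In the toy map, bins 0 and 1 get a corrected value of 4/3 while bins 1 and 2 get 2/3, even though both pairs are adjacent: the correction separates pairs that are unusually close in 3D from those that are merely close on the chromosome.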
Affiliation(s)
- Cheng Ye
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham, TW20 0EX, UK
- Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham, TW20 0EX, UK
- School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Brazil
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Department of Molecular Biophysics and Biochemistry, Department of Computer Science, Department of Statistics and Data Science, Yale University, New Haven, CT, 06520, USA
- Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105-3678, USA
18
Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A Guide to Conquer the Biological Network Era Using Graph Theory. Front Bioeng Biotechnol 2020; 8:34. [PMID: 32083072 PMCID: PMC7004966 DOI: 10.3389/fbioe.2020.00034] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 10/11/2019] [Accepted: 01/15/2020] [Indexed: 12/24/2022] [Open access]
Abstract
Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.
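The review covers data structures for storing graphs and widely used topological features. As a small illustration under stated assumptions, the sketch below stores an undirected network as an adjacency list (a dictionary of neighbour sets) and computes two basic features, node degree and the local clustering coefficient. The function names and the toy edge list are hypothetical and are not taken from the review.

```python
from itertools import combinations

def build_adjacency(edges):
    """Store an undirected graph as an adjacency list: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of neighbours of `node` (0 if the node is absent)."""
    return len(adj.get(node, ()))

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0                        # undefined for k < 2; report 0 by convention
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy interaction network: a triangle A-B-C plus a pendant node D.
g = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(degree(g, "C"), local_clustering(g, "A"), local_clustering(g, "C"))
```

An adjacency list keeps memory proportional to the number of edges, which suits the sparse networks typical in biology; an adjacency matrix gives constant-time edge lookup at quadratic memory cost.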
Affiliation(s)
- Mikaela Koutrouli
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Department of Informatics and Telecommunications, University of Athens, Athens, Greece
- David Paez-Espino
- Lawrence Berkeley National Laboratory, Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
19
Zhou Y, Lauschke VM. Pharmacogenomic network analysis of the gene-drug interaction landscape underlying drug disposition. Comput Struct Biotechnol J 2020; 18:52-58. [PMID: 31890144 PMCID: PMC6921140 DOI: 10.1016/j.csbj.2019.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2019] [Revised: 11/22/2019] [Accepted: 11/22/2019] [Indexed: 11/30/2022] [Open access]
Abstract
In recent decades the identification of pharmacogenomic gene-drug associations has evolved tremendously. Despite this progress, a major fraction of the heritable inter-individual variability remains elusive. Higher-dimensional phenomena, such as gene-gene-drug interactions, in which variability in multiple genes synergizes to precipitate an observable phenotype, have been suggested to account for at least part of this missing heritability. However, the identification of such intricate relationships remains difficult, partly because of analytical challenges associated with the complexity explosion of the problem. To facilitate the identification of such combinatorial pharmacogenetic associations, we propose a network analysis strategy. Specifically, we analyzed the landscape of drug metabolizing enzymes and transporters for the 100 top-selling drugs as well as all compounds with pharmacogenetic germline labels or dosing guidelines. Based on these data, we calculated, for all pharmacogene pairs (i, j), the posterior probability that gene i is involved in the metabolism, transport or toxicity of a given drug given that gene j is involved. Interestingly, these analyses revealed significant patterns between individual genes and across pharmacogene families that provide insights into metabolic interactions. To visualize the gene-drug interaction landscape, we use multidimensional scaling to collapse this similarity matrix into a two-dimensional network. We suggest that the Euclidean distance between nodes can inform about the likelihood of epistatic interactions and thus might provide a useful tool to reduce the search space and facilitate the identification of combinatorial pharmacogenomic associations.
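The two computational steps in this abstract, a conditional probability P(gene i involved | gene j involved) and multidimensional scaling of the resulting similarity matrix, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the drug-by-gene 0/1 table is hypothetical, the probability is estimated as a simple co-occurrence frequency, and the symmetrisation S = (P + Pᵀ)/2 with distance D = 1 − S is an assumed transform, since the abstract does not state how similarities were converted to distances. Classical (Torgerson) MDS is used as the embedding.

```python
import numpy as np

def conditional_involvement(incidence):
    """Estimate P(gene i involved | gene j involved) from a drug x gene 0/1 matrix.

    incidence[d, g] = 1 if gene g is involved in metabolism, transport or
    toxicity of drug d (hypothetical input). Entry [i, j] of the result is
    (#drugs involving both i and j) / (#drugs involving j).
    """
    X = np.asarray(incidence, dtype=float)
    co = X.T @ X                          # co[i, j] = drugs involving both genes
    per_gene = np.diag(co).copy()         # drugs involving each gene
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(per_gene > 0, co / per_gene, 0.0)  # divides column j by per_gene[j]

def classical_mds(D, dims=2):
    """Embed points in `dims` dimensions from a distance matrix (Torgerson MDS)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]
    scale = np.sqrt(np.clip(vals[order], 0, None))  # drop negative (non-Euclidean) parts
    return vecs[:, order] * scale

def gene_map(incidence, dims=2):
    """Hypothetical pipeline: conditional probabilities -> distances -> 2D coordinates."""
    P = conditional_involvement(incidence)
    S = (P + P.T) / 2                     # assumed symmetrisation
    D = 1.0 - S                           # assumed similarity-to-distance transform
    np.fill_diagonal(D, 0.0)
    return classical_mds(D, dims)

# Hypothetical table: 4 drugs (rows) x 3 genes (columns).
X = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1]])
print(conditional_involvement(X))
print(gene_map(X))
```

Note that P is asymmetric (here P(gene 0 | gene 1) = 1 but P(gene 1 | gene 0) = 2/3), so some symmetric summary is needed before MDS; the choice made above is one option among several.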
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm 171 77, Sweden
- Volker M. Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm 171 77, Sweden
- Corresponding author at: Department of Physiology and Pharmacology, Karolinska Institutet, SE-171 77 Stockholm, Sweden.
20
Zewde NT. Multiscale Solutions to Quantitative Systems Biology Models. Front Mol Biosci 2019; 6:119. [PMID: 31737643 PMCID: PMC6831518 DOI: 10.3389/fmolb.2019.00119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2019] [Accepted: 10/14/2019] [Indexed: 11/13/2022] [Open access]
Affiliation(s)
- Nehemiah T Zewde
- Department of Bioengineering, University of California, Riverside, Riverside, CA, United States
21
Chagoyen M, Ranea JAG, Pazos F. Applications of molecular networks in biomedicine. Biol Methods Protoc 2019; 4:bpz012. [PMID: 32395629 PMCID: PMC7200821 DOI: 10.1093/biomethods/bpz012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/30/2019] [Revised: 08/20/2019] [Accepted: 08/28/2019] [Indexed: 12/12/2022] [Open access]
Abstract
Due to the large interdependence between the molecular components of living systems, many phenomena, including those related to pathologies, cannot be explained in terms of a single gene or a small number of genes. Molecular networks, representing different types of relationships between molecular entities, embody these large sets of interdependences in a framework that allows them to be mined from a systemic point of view. These networks, often generated from high-throughput omics datasets, are used to study the complex phenomena of human pathologies from a systemic point of view. Complementing the reductionist approach of molecular biology, based on the detailed study of a small number of genes, systemic approaches to human diseases consider that these are better reflected in large and intricate networks of relationships between genes. These networks, and not the single genes, provide both better markers for diagnosing diseases and targets for treating them. Network approaches are being used to gain insight into the molecular basis of complex diseases and to interpret the large datasets associated with them, such as genomic variants. The network formalism is also suitable for integrating large, heterogeneous and multilevel datasets associated with diseases, from the molecular level to organismal and epidemiological scales. Many of these approaches are available to nonexpert users through standard software packages.
Affiliation(s)
- Monica Chagoyen
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
22
Navarro FCP, Mohsen H, Yan C, Li S, Gu M, Meyerson W, Gerstein M. Genomics and data science: an application within an umbrella. Genome Biol 2019; 20:109. [PMID: 31142351 PMCID: PMC6540394 DOI: 10.1186/s13059-019-1724-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 01/09/2023] [Open access]
Abstract
Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural “exports” and “imports” between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.
Affiliation(s)
- Fábio C P Navarro
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Hussein Mohsen
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Shantao Li
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA, 94305, USA
- Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- William Meyerson
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT, 06520, USA
23
Mura C, Draizen EJ, Bourne PE. Structural biology meets data science: does anything change? Curr Opin Struct Biol 2018; 52:95-102. [PMID: 30267935 DOI: 10.1016/j.sbi.2018.09.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/24/2018] [Revised: 08/31/2018] [Accepted: 09/07/2018] [Indexed: 01/22/2023]
Abstract
Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider the above question from the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.
Affiliation(s)
- Cameron Mura
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Eli J Draizen
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Philip E Bourne
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Data Science Institute, University of Virginia, Charlottesville, VA 22904, USA