1
Mongardi S, Cascianelli S, Masseroli M. Biologically weighted LASSO: enhancing functional interpretability in gene expression data analysis. Bioinformatics 2024; 40:btae605. [PMID: 39412436 PMCID: PMC11639179 DOI: 10.1093/bioinformatics/btae605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 09/12/2024] [Accepted: 10/14/2024] [Indexed: 11/01/2024] Open
Abstract
MOTIVATION: Feature selection approaches are widely used in gene expression data analysis to identify the most relevant features and boost performance in regression and classification tasks. However, such algorithms solely consider each feature's quantitative contribution to the task, possibly limiting the biological interpretability of the results. Feature-related prior knowledge, such as functional annotations and pathways information, can be incorporated into feature selection algorithms to potentially improve model performance and interpretability.
RESULTS: We propose an embedded integrative approach to feature selection that combines weighted LASSO feature selection and prior biological knowledge in a single step, by means of a novel score of biological relevance that summarizes information extracted from popular biological knowledge bases. Findings from the performed experiments indicate that our proposed approach is able to identify the most predictive genes while simultaneously enhancing the biological interpretability of the results compared to the standard LASSO regularized model.
AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/DEIB-GECO/GIS-weigthed_LASSO.
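The weighted LASSO idea summarized in this abstract can be illustrated with a minimal sketch. It minimizes (1/(2n))·||y − Xβ||² + λ·Σ_j w_j·|β_j| by coordinate descent, so a gene with a larger weight w_j is penalized more strongly. The solver, the function names, the toy data, and the weights are assumptions made for illustration; they are not taken from the authors' repository, and the abstract does not state how the biological relevance score is mapped to penalty weights.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam * sum_j weights[j]*|b_j|.

    X is a list of rows, y a list of responses, weights a per-feature penalty
    factor (hypothetical: smaller for genes assumed to be more relevant).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the partial residual (feature j excluded)
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * weights[j]) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy data: two orthogonal "genes" that contribute equally to y.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1.0, 1.0, -1.0, -1.0]
uniform = weighted_lasso(X, y, weights=[1.0, 1.0], lam=0.2)
biased = weighted_lasso(X, y, weights=[1.0, 5.0], lam=0.2)
```

On this toy data, uniform weights keep both coefficients (about 0.6 each), whereas raising the penalty weight of the second gene drives its coefficient to zero. That is the mechanism by which prior knowledge can steer which equally predictive genes are retained.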
Affiliation(s)
- Sofia Mongardi
  - Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milan 20133, Italy
- Silvia Cascianelli
  - Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milan 20133, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milan 20133, Italy
2
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
  - Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
- Jonathan L Hess
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Victoria S Marshe
  - Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Michelle M Barnett
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
  - Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Anne-Christin Hauschild
  - Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
- Malgorzata Maciukiewicz
  - Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
  - Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
  - Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
- Samar S M Elsheikh
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Xiaoyu Men
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Emanuel Schwarz
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Yannis J Trakadis
  - Department Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Michael S Breen
  - Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Eric J Barnett
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Yanli Zhang-James
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Mehmet Eren Ahsen
  - Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
  - Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Han Cao
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Junfang Chen
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Jiahui Hou
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Asif Salekin
  - Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
- Ping-I Lin
  - Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
  - Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
- Andreas Meyer-Lindenberg
  - Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Isabelle Bichindaritz
  - Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
  - Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
- Stephen V Faraone
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Murray J Cairns
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
  - Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Daniel J Müller
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
  - Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
- Stephen J Glatt
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
3
Shi J, Bendig D, Vollmar HC, Rasche P. Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study. J Med Internet Res 2023; 25:e45815. [PMID: 38064255 PMCID: PMC10746970 DOI: 10.2196/45815] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/20/2023] [Revised: 08/16/2023] [Accepted: 09/30/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND: Artificial intelligence (AI), conceived in the 1950s, has permeated numerous industries, intensifying in tandem with advancements in computing power. Despite the widespread adoption of AI, its integration into medicine trails other sectors. However, medical AI research has experienced substantial growth, attracting considerable attention from researchers and practitioners.
OBJECTIVE: In the absence of an existing framework, this study aims to outline the current landscape of medical AI research and provide insights into its future developments by examining all AI-related studies within PubMed over the past 2 decades. We also propose potential data acquisition and analysis methods, developed using Python (version 3.11) and to be executed in Spyder IDE (version 5.4.3), for future analogous research.
METHODS: Our dual-pronged approach involved (1) retrieving publication metadata related to AI from PubMed (spanning 2000-2022) via Python, including titles, abstracts, authors, journals, country, and publishing years, followed by keyword frequency analysis and (2) classifying relevant topics using latent Dirichlet allocation, an unsupervised machine learning approach, and defining the research scope of AI in medicine. In the absence of a universal medical AI taxonomy, we used an AI dictionary based on the European Commission Joint Research Centre AI Watch report, which emphasizes 8 domains: reasoning, planning, learning, perception, communication, integration and interaction, service, and AI ethics and philosophy.
RESULTS: From 2000 to 2022, a comprehensive analysis of 307,701 AI-related publications from PubMed highlighted a 36-fold increase. The United States emerged as a clear frontrunner, producing 68,502 of these articles. Despite its substantial contribution in terms of volume, China lagged in terms of citation impact. Diving into specific AI domains, as the Joint Research Centre AI Watch report categorized, the learning domain emerged dominant. Our classification analysis meticulously traced the nuanced research trajectories across each domain, revealing the multifaceted and evolving nature of AI's application in the realm of medicine.
CONCLUSIONS: The research topics have evolved as the volume of AI studies increases annually. Machine learning remains central to medical AI research, with deep learning expected to maintain its fundamental role. Empowered by predictive algorithms, pattern recognition, and imaging analysis capabilities, the future of AI research in medicine is anticipated to concentrate on medical diagnosis, robotic intervention, and disease management. Our topic modeling outcomes provide a clear insight into the focus of AI research in medicine over the past decades and lay the groundwork for predicting future directions. The domains that have attracted considerable research attention, primarily the learning domain, will continue to shape the trajectory of AI in medicine. Given the observed growing interest, the domain of AI ethics and philosophy also stands out as a prospective area of increased focus.
Affiliation(s)
- Jin Shi
  - Institute for Entrepreneurship, University of Münster, Münster, Germany
- David Bendig
  - Institute for Entrepreneurship, University of Münster, Münster, Germany
- Peter Rasche
  - Department of Healthcare, University of Applied Science - Hochschule Niederrhein, Krefeld, Germany
4
Qumsiyeh E, Salah Z, Yousef M. miRGediNET: A comprehensive examination of common genes in miRNA-Target interactions and disease associations: Insights from a grouping-scoring-modeling approach. Heliyon 2023; 9:e22666. [PMID: 38090011 PMCID: PMC10711121 DOI: 10.1016/j.heliyon.2023.e22666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/15/2023] [Accepted: 11/16/2023] [Indexed: 06/15/2024] Open
Abstract
In the broad and complex field of biological data analysis, researchers frequently gather information from a single source or database. Despite being a widespread practice, this has disadvantages. Relying exclusively on a single source can limit our comprehension as it may omit various perspectives that could be obtained by combining multiple knowledge bases. Acknowledging this shortcoming, we report on miRGediNET, a novel approach combining information from three biological databases. Our investigation focuses on microRNAs (miRNAs), small non-coding RNA molecules that regulate gene expression post-transcriptionally. We delve deeply into the knowledge of these miRNAs' interactions with genes and the possible effects these interactions may have on different diseases. The scientific community has long recognized a direct correlation between the progression of specific diseases and miRNAs, as well as the genes they target. By using miRGediNET, we go beyond simply acknowledging this relationship. Rather, we actively look for the critical genes that could act as links between the actions of miRNAs and the mechanisms underlying disease. Our methodology, which carefully identifies and investigates these important genes, is supported by a strategic framework that may open up new possibilities for comprehending diseases and creating treatments. We have developed a tool on the KNIME platform as a concrete application of our research. This tool serves as both a validation of our study and an invitation to the larger community to interact with, investigate, and build upon our findings. miRGediNET is publicly accessible on GitHub at https://github.com/malikyousef/miRGediNET, providing a collaborative environment for additional research and innovation for enthusiasts and fellow researchers.
Affiliation(s)
- Emma Qumsiyeh
  - Department of Computer Science and Information Technology, Al-Quds University, Palestine
- Zaidoun Salah
  - Molecular Genetics and Genetic Toxicology, Arab American University, Ramallah, Palestine
- Malik Yousef
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
5
Ersoz NS, Bakir-Gungor B, Yousef M. GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning. Front Genet 2023; 14:1139082. [PMID: 37671046 PMCID: PMC10476493 DOI: 10.3389/fgene.2023.1139082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 07/05/2023] [Indexed: 09/07/2023] Open
Abstract
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.
Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
Affiliation(s)
- Nur Sebnem Ersoz
  - Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gul University, Kayseri, Türkiye
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye
  - Department of Bioengineering, Faculty of Life and Natural Sciences, Abdullah Gul University, Kayseri, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
6
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023] Open
Abstract
With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
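The "group similar features, then pick a representative from each group" idea described in this abstract can be sketched as follows. This is only one of many possible variants and is not a method taken from the review: the greedy grouping by absolute Pearson correlation with a group's first member, the 0.9 threshold, and the choice of representative by correlation with the target are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def group_features(columns, threshold=0.9):
    """Greedily assign each feature to the first group whose seed it correlates with."""
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            seed = columns[group[0]]
            if abs(pearson(col, seed)) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])  # no similar group found: start a new one
    return groups


def select_representatives(columns, target, threshold=0.9):
    """Keep, from each group, the feature most correlated (in absolute value) with the target."""
    groups = group_features(columns, threshold)
    return [max(g, key=lambda j: abs(pearson(columns[j], target))) for g in groups]


# Toy example: features 0 and 1 are nearly collinear, feature 2 is distinct.
columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.5], [4.0, 1.0, 3.0, 2.0]]
target = [2.0, 4.0, 6.0, 8.5]
selected = select_representatives(columns, target)
```

Here the first two features fall into one group and only one of them is kept, which reduces redundancy while still representing every group.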
Affiliation(s)
- Cihan Kuzudisli
  - Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
  - Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Nurten Bulut
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Bahjat Qaqish
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
7
Ahn SH, Kim JH. Factor-specific generative pattern from large-scale drug-induced gene expression profile. Sci Rep 2023; 13:6339. [PMID: 37072452 PMCID: PMC10113368 DOI: 10.1038/s41598-023-33061-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 04/06/2023] [Indexed: 05/03/2023] Open
Abstract
Drug discovery is a complex and interdisciplinary field that requires the identification of potential drug targets for specific diseases. In this study, we present FacPat, a novel approach that identifies the optimal factor-specific pattern explaining the drug-induced gene expression profile. FacPat uses a genetic algorithm based on pattern distance to mine the optimal factor-specific pattern for each gene in the LINCS L1000 dataset. We applied Benjamini-Hochberg correction to control the false discovery rate and identified significant and interpretable factor-specific patterns consisting of 480 genes, 7 chemical compounds, and 38 human cell lines. Using our approach, we identified genes that show context-specific effects related to chemical compounds and/or human cell lines. Furthermore, we performed functional enrichment analysis to characterize biological features. We demonstrate that FacPat can be used to reveal novel relationships among drugs, diseases, and genes.
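The Benjamini-Hochberg correction mentioned in this abstract is a standard step-up procedure for controlling the false discovery rate. A minimal sketch is given below; the function name and the example p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight tested genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(p_values, alpha=0.05)
```

With these example values only the first two hypotheses are rejected: the third p-value (0.039) exceeds its threshold of 3/8 × 0.05 = 0.01875, and no larger rank passes either.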
Affiliation(s)
- Se Hwan Ahn
  - Department of Biomedical Sciences, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Republic of Korea
- Ju Han Kim
  - Department of Biomedical Sciences, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Republic of Korea
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Republic of Korea
8
Unlu Yazici M, Marron JS, Bakir-Gungor B, Zou F, Yousef M. Invention of 3Mint for feature grouping and scoring in multi-omics. Front Genet 2023; 14:1093326. [PMID: 37007972 PMCID: PMC10050723 DOI: 10.3389/fgene.2023.1093326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 02/27/2023] [Indexed: 03/17/2023] Open
Abstract
Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and of targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death worldwide in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these 3-omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes using fewer genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, while achieving similar performance metrics (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.
Affiliation(s)
- Miray Unlu Yazici
  - Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
- J. S. Marron
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, United States
- Burcu Bakir-Gungor
  - Department of Bioengineering, Abdullah Gül University, Kayseri, Türkiye
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Türkiye
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
  - *Correspondence: Malik Yousef
9
Perscheid C. Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets. BMC Bioinformatics 2021; 22:401. [PMID: 34384353 PMCID: PMC8361636 DOI: 10.1186/s12859-021-04308-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2021] [Accepted: 07/27/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Reproducible benchmarking is important for assessing the effectiveness of novel feature selection approaches applied on gene expression data, especially for prior knowledge approaches that incorporate biological information from online knowledge bases. However, no full-fledged benchmarking system exists that is extensible, provides built-in feature selection approaches, and a comprehensive result assessment encompassing classification performance, robustness, and biological relevance. Moreover, the particular needs of prior knowledge feature selection approaches, i.e. uniform access to knowledge bases, are not addressed. As a consequence, prior knowledge approaches are not evaluated amongst each other, leaving open questions regarding their effectiveness.
RESULTS: We present the Comprior benchmark tool, which facilitates the rapid development and effortless benchmarking of feature selection approaches, with a special focus on prior knowledge approaches. Comprior is extensible by custom approaches, offers built-in standard feature selection approaches, enables uniform access to multiple knowledge bases, and provides a customizable evaluation infrastructure to compare multiple feature selection approaches regarding their classification performance, robustness, runtime, and biological relevance.
CONCLUSION: Comprior allows reproducible benchmarking especially of prior knowledge approaches, which facilitates their applicability and for the first time enables a comprehensive assessment of their effectiveness.
Affiliation(s)
- Cindy Perscheid
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
10
Yousef M, Ülgen E, Uğur Sezerman O. CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis. PeerJ Comput Sci 2021; 7:e336. [PMID: 33816987 PMCID: PMC7959595 DOI: 10.7717/peerj-cs.336] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/17/2020] [Accepted: 11/23/2020] [Indexed: 05/04/2023]
Abstract
Most of the traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes, since the ultimate goal is to determine features that optimize model performance metrics, not to build a biologically meaningful model. Therefore, there is an imminent need for new computational tools that integrate the biological knowledge about the data in the process of gene selection and machine learning. Integrative gene selection enables incorporation of biological domain knowledge from external biological resources. In this study, we propose a new computational approach named CogNet, an integrative gene selection tool that exploits biological knowledge for grouping the genes for the computational modeling tasks of ranking and classification. In CogNet, pathfindR serves as the biological grouping tool to allow the main algorithm to rank active-subnetwork-oriented KEGG pathway enrichment analysis results to build a biologically relevant model. CogNet provides a list of significant KEGG pathways that can classify the data with a very high accuracy. The list also provides the differentially expressed genes belonging to these pathways, which are used as features in the classification problem. The list facilitates deep analysis and better interpretability of the role of KEGG pathways in classification of the data, thus better establishing the biological relevance of these differentially expressed genes. Even though the main aim of our study is not to improve the accuracy of any existing tool, CogNet outperforms a similar approach called maTE while obtaining similar performance compared to other similar tools including SVM-RCE. CogNet was tested on 13 gene expression datasets concerning a variety of diseases.
Affiliation(s)
- Malik Yousef
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
- Ege Ülgen
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- Osman Uğur Sezerman
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
|
11
|
Yousef M, Kumar A, Bakir-Gungor B. Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data. Entropy (Basel, Switzerland) 2020; 23:E2. [PMID: 33374969 PMCID: PMC7821996 DOI: 10.3390/e23010002] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 12/14/2020] [Accepted: 12/16/2020] [Indexed: 12/19/2022]
Abstract
In the last two decades, there have been massive advancements in high throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This problem refers to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. Thus, integrative approaches that utilize the biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data, and the biological background information which is provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of the biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to enlighten us on disease state dynamics, mechanisms of their onset and progression. The integration of various types of biological information will necessitate the development of novel techniques for integration and data analysis. Another aim of this review is to boost the bioinformatics community to develop new approaches for searching and determining significant groups/clusters of features based on one or more biological grouping functions.
Affiliation(s)
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat 13206, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat 13206, Israel
- Abhishek Kumar
  - Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
  - Manipal Academy of Higher Education (MAHE), Manipal 576104, India
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri 38080, Turkey
|
12
|
Perscheid C. Integrative biomarker detection on high-dimensional gene expression data sets: a survey on prior knowledge approaches. Brief Bioinform 2020; 22:5881664. [PMID: 32761115 DOI: 10.1093/bib/bbaa151] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/17/2020] [Revised: 06/15/2020] [Accepted: 06/16/2020] [Indexed: 02/06/2023] Open
Abstract
Gene expression data provide the expression levels of tens of thousands of genes from several hundred samples. These data are analyzed to detect biomarkers that can be of prognostic or diagnostic use. Traditionally, biomarker detection for gene expression data is the task of gene selection. The vast number of genes is reduced to a few relevant ones that achieve the best performance for the respective use case. Traditional approaches select genes based on their statistical significance in the data set. This results in issues of robustness, redundancy and true biological relevance of the selected genes. Integrative analyses typically address these shortcomings by integrating multiple data artifacts from the same objects, e.g. gene expression and methylation data. When only gene expression data are available, integrative analyses instead use curated information on biological processes from public knowledge bases. With knowledge bases providing an ever-increasing amount of curated biological knowledge, such prior knowledge approaches become more powerful. This paper provides a thorough overview on the status quo of biomarker detection on gene expression data with prior biological knowledge. We discuss current shortcomings of traditional approaches, review recent external knowledge bases, provide a classification and qualitative comparison of existing prior knowledge approaches and discuss open challenges for this kind of gene selection.
Affiliation(s)
- Cindy Perscheid
  - Hasso Plattner Institute, University of Potsdam, Potsdam, 14482, Germany
|
13
|
Perscheid C, Grasnick B, Uflacker M. Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. J Integr Bioinform 2018; 16. [PMID: 30785707 PMCID: PMC6798862 DOI: 10.1515/jib-2018-0064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/21/2018] [Accepted: 11/12/2018] [Indexed: 12/30/2022] Open
Abstract
The advance of high-throughput RNA-Sequencing techniques enables researchers to analyze the complete gene activity in particular cells. From the insights of such analyses, researchers can identify disease-specific expression profiles, thus understand complex diseases like cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges to its computational analysis, which is addressed with measures of gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which implies several drawbacks when it comes to accurately identifying the underlying biological processes. In turn, integrative approaches include curated information on biological processes from external knowledge bases during gene selection, which promises to lead to better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrate external knowledge with traditional gene selection approaches. We introduce a framework enabling the automatic external knowledge integration, gene selection, and evaluation. Evaluation results prove our framework to be a useful tool for evaluation and show that integration of external knowledge improves overall analysis results.
Affiliation(s)
- Cindy Perscheid
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Bastien Grasnick
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Matthias Uflacker
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
|
14
|
Lopez C, Tucker S, Salameh T, Tucker C. An unsupervised machine learning method for discovering patient clusters based on genetic signatures. J Biomed Inform 2018; 85:30-39. [PMID: 30016722 PMCID: PMC6621561 DOI: 10.1016/j.jbi.2018.07.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/21/2017] [Revised: 06/22/2018] [Accepted: 07/07/2018] [Indexed: 01/04/2023]
Abstract
INTRODUCTION Many chronic disorders have genomic etiology, disease progression, clinical presentation, and response to treatment that vary on a patient-to-patient basis. Such variability creates a need to identify characteristics within patient populations that have clinically relevant predictive value in order to advance personalized medicine. Unsupervised machine learning methods are suitable to address this type of problem, in which no a priori class label information is available to guide this search. However, it is challenging for existing methods to identify cluster memberships that are not just a result of natural sampling variation. Moreover, most of the current methods require researchers to provide specific input parameters a priori. METHOD This work presents an unsupervised machine learning method to cluster patients based on their genomic makeup without providing input parameters a priori. The method implements internal validity metrics to algorithmically identify the number of clusters, as well as statistical analyses to test for the significance of the results. Furthermore, the method takes advantage of the high degree of linkage disequilibrium between single nucleotide polymorphisms. Finally, a gene pathway analysis is performed to identify potential relationships between the clusters in the context of known biological knowledge. DATASETS AND RESULTS The method is tested with a cluster validation and a genomic dataset previously used in the literature. Benchmark results indicate that the proposed method provides the greatest performance out of the methods tested. Furthermore, the method is implemented on a sample genome-wide study dataset of 191 multiple sclerosis patients. The results indicate that the method was able to identify genetically distinct patient clusters without the need to select parameters a priori. 
Additionally, variants identified as significantly different between clusters are shown to be enriched for protein-protein interactions, especially in immune processes and cell adhesion pathways, via Gene Ontology term analysis. CONCLUSION Once links are drawn between clusters and clinically relevant outcomes, Immunochip data can be used to classify high-risk and newly diagnosed chronic disease patients into known clusters for predictive value. Further investigation can extend beyond pathway analysis to evaluate these clusters for clinical significance of genetically related characteristics such as age of onset, disease course, heritability, and response to treatment.
Affiliation(s)
- Christian Lopez
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Scott Tucker
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
  - Engineering Science and Mechanics, The Pennsylvania State University, University Park, PA 16802, USA
- Tarik Salameh
  - Hershey College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA
- Conrad Tucker
  - Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
  - Engineering Design Technology and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA
  - Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
15
|
|
16
|
López B, Torrent-Fontbona F, Viñas R, Fernández-Real JM. Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction. Artif Intell Med 2017; 85:43-49. [PMID: 28943335 DOI: 10.1016/j.artmed.2017.09.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/10/2017] [Revised: 09/04/2017] [Indexed: 10/18/2022]
Abstract
OBJECTIVE The use of artificial intelligence techniques to find out which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is one of the features of medical research, as such techniques may potentially aid early diagnosis and help in the prescription of preventive measures. In particular, the aim is to help physicians to identify the relevant SNPs related to Type 2 diabetes, and to build a decision-support tool for risk prediction. METHODS We use the Random Forest (RF) technique in order to search for the most important attributes (SNPs) related to diabetes, giving a weight (degree of importance), ranging between 0 and 1, to each attribute. Support Vector Machines and Logistic Regression have also been used since they are two other machine learning techniques that are well-established in the health community. Their performance has been compared to that achieved by RF. Furthermore, the relevance of the attributes obtained through the use of RF has then been used to perform predictions with the k-Nearest Neighbour method, weighting attributes in the similarity measure according to their RF-estimated relevance. RESULTS Testing is performed on a set of 677 subjects. RF is able to handle the complexity of features' interactions, overfitting, and unknown attribute values, providing the SNPs' relevance with an area under the ROC curve of up to 0.89 in terms of risk prediction. RF outperforms all the other tested machine learning techniques in terms of prediction accuracy, and in terms of the stability of the estimated relevance of the attributes. CONCLUSIONS The Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.
Affiliation(s)
- Beatriz López
  - University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain
- Ramón Viñas
  - University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain
- José Manuel Fernández-Real
  - Biomedical Research Institute of Girona, Avda. de França, s/n, 17007 Girona, Spain
  - CIBERobn Pathophysiology of Obesity and Nutrition, Instituto de Salud Carlos III, Madrid, Spain
|
17
|
Andel M, Masri F, Klema J, Krejcik Z, Belickova M. Sparse omics-network regularization to increase interpretability and performance of linear classification models. 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2015:615-620. [DOI: 10.1109/bibm.2015.7359754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
18
|
Kessler RC, Warner CH, Ivany C, Petukhova MV, Rose S, Bromet EJ, Brown M, Cai T, Colpe LJ, Cox KL, Fullerton CS, Gilman SE, Gruber MJ, Heeringa SG, Lewandowski-Romps L, Li J, Millikan-Bell AM, Naifeh JA, Nock MK, Rosellini AJ, Sampson NA, Schoenbaum M, Stein MB, Wessely S, Zaslavsky AM, Ursano RJ. Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study To Assess Risk and rEsilience in Servicemembers (Army STARRS). JAMA Psychiatry 2015; 72:49-57. [PMID: 25390793 PMCID: PMC4286426 DOI: 10.1001/jamapsychiatry.2014.1754] [Citation(s) in RCA: 275] [Impact Index Per Article: 27.5] [Indexed: 01/14/2023]
Abstract
IMPORTANCE The US Army experienced a sharp increase in soldier suicides beginning in 2004. Administrative data reveal that among those at highest risk are soldiers in the 12 months after inpatient treatment of a psychiatric disorder. OBJECTIVE To develop an actuarial risk algorithm predicting suicide in the 12 months after US Army soldier inpatient treatment of a psychiatric disorder to target expanded posthospitalization care. DESIGN, SETTING, AND PARTICIPANTS There were 53,769 hospitalizations of active duty soldiers from January 1, 2004, through December 31, 2009, with International Classification of Diseases, Ninth Revision, Clinical Modification psychiatric admission diagnoses. Administrative data available before hospital discharge abstracted from a wide range of data systems (sociodemographic, US Army career, criminal justice, and medical or pharmacy) were used to predict suicides in the subsequent 12 months using machine learning methods (regression trees and penalized regressions) designed to evaluate cross-validated linear, nonlinear, and interactive predictive associations. MAIN OUTCOMES AND MEASURES Suicides of soldiers hospitalized with psychiatric disorders in the 12 months after hospital discharge. RESULTS Sixty-eight soldiers died by suicide within 12 months of hospital discharge (12.0% of all US Army suicides), equivalent to 263.9 suicides per 100,000 person-years compared with 18.5 suicides per 100,000 person-years in the total US Army. 
The strongest predictors included sociodemographics (male sex [odds ratio (OR), 7.9; 95% CI, 1.9-32.6] and late age of enlistment [OR, 1.9; 95% CI, 1.0-3.5]), criminal offenses (verbal violence [OR, 2.2; 95% CI, 1.2-4.0] and weapons possession [OR, 5.6; 95% CI, 1.7-18.3]), prior suicidality [OR, 2.9; 95% CI, 1.7-4.9], aspects of prior psychiatric inpatient and outpatient treatment (eg, number of antidepressant prescriptions filled in the past 12 months [OR, 1.3; 95% CI, 1.1-1.7]), and disorders diagnosed during the focal hospitalizations (eg, nonaffective psychosis [OR, 2.9; 95% CI, 1.2-7.0]). A total of 52.9% of posthospitalization suicides occurred after the 5% of hospitalizations with highest predicted suicide risk (3824.1 suicides per 100,000 person-years). These highest-risk hospitalizations also accounted for significantly elevated proportions of several other adverse posthospitalization outcomes (unintentional injury deaths, suicide attempts, and subsequent hospitalizations). CONCLUSIONS AND RELEVANCE The high concentration of risk of suicide and other adverse outcomes might justify targeting expanded posthospitalization interventions to soldiers classified as having highest posthospitalization suicide risk, although final determination requires careful consideration of intervention costs, comparative effectiveness, and possible adverse effects.
Affiliation(s)
- Ronald C. Kessler
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Christopher H. Warner
  - Department of Behavioral Medicine, Blanchfield Army Community Hospital, Fort Campbell, Kentucky
- Maria V. Petukhova
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Sherri Rose
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Evelyn J. Bromet
  - Department of Psychiatry, Stony Brook University School of Medicine, Stony Brook, New York
- Millard Brown
  - US Army Office of the Surgeon General, Falls Church, Virginia
- Tianxi Cai
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
- Lisa J. Colpe
  - National Institute of Mental Health, Bethesda, Maryland
- Kenneth L. Cox
  - US Army Public Health Command, Aberdeen Proving Ground, Maryland
- Carol S. Fullerton
  - Center for the Study of Traumatic Stress, Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Stephen E. Gilman
  - Department of Social and Behavioral Sciences, Harvard School of Public Health, Boston, Massachusetts
  - Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
- Michael J. Gruber
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Junlong Li
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
- James A. Naifeh
  - Center for the Study of Traumatic Stress, Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Matthew K. Nock
  - Department of Psychology, Harvard University, Cambridge, Massachusetts
- Nancy A. Sampson
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Murray B. Stein
  - Department of Psychiatry, University of California, San Diego, La Jolla
  - Department of Family and Preventive Medicine, University of California, San Diego, La Jolla
  - Veterans Affairs San Diego Healthcare System, San Diego, California
- Simon Wessely
  - King’s Centre for Military Health Research, King’s College London, London, United Kingdom
- Alan M. Zaslavsky
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
- Robert J. Ursano
  - Center for the Study of Traumatic Stress, Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, Maryland
|
19
|
Reboiro-Jato M, Díaz F, Glez-Peña D, Fdez-Riverola F. A novel ensemble of classifiers that use biological relevant gene sets for microarray classification. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.01.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
|
20
|
Reboiro-Jato M, Arrais JP, Oliveira JL, Fdez-Riverola F. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification. BMC Bioinformatics 2014; 15:31. [PMID: 24475928 PMCID: PMC3909759 DOI: 10.1186/1471-2105-15-31] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/18/2012] [Accepted: 01/27/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. RESULTS geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. CONCLUSIONS geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Affiliation(s)
- Florentino Fdez-Riverola
  - Escuela Superior de Ingeniería Informática, Universidade de Vigo, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain
|
21
|
Cheng X, Hu C, Li Y. A Semantic-Driven Knowledge Representation Model for the Materials Engineering Application. Data Science Journal 2014. [DOI: 10.2481/dsj.13-061] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
|
22
|
Masegosa AR, Moral S. An interactive approach for Bayesian network learning using domain/expert knowledge. Int J Approx Reason 2013. [DOI: 10.1016/j.ijar.2013.03.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/26/2022]
|
23
|
Baty F, Rüdiger J, Miglino N, Kern L, Borger P, Brutsche M. Exploring the transcription factor activity in high-throughput gene expression data using RLQ analysis. BMC Bioinformatics 2013; 14:178. [PMID: 23742070 PMCID: PMC3686578 DOI: 10.1186/1471-2105-14-178] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/21/2012] [Accepted: 05/30/2013] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Interpretation of gene expression microarray data in the light of external information on both columns and rows (experimental variables and gene annotations) facilitates the extraction of pertinent information hidden in these complex data. Biologists classically interpret genes of interest after retrieving functional information from a subset of genes of interest. Transcription factors play an important role in orchestrating the regulation of gene expression. Their activity can be deduced by examining the presence of putative transcription factor binding sites in the gene promoter regions. RESULTS In this paper we present the multivariate statistical method RLQ, which aims to analyze microarray data where additional information is available on both genes and samples. As an illustrative example, we applied RLQ methodology to analyze transcription factor activity associated with the time-course effect of steroids on the growth of primary human lung fibroblasts. RLQ could successfully predict transcription factor activity, and could integrate various other sources of external information in the main frame of the analysis. The approach was validated by means of alternative statistical methods and biological validation. CONCLUSIONS RLQ provides an efficient way of extracting and visualizing structures present in a gene expression dataset by directly modeling the link between experimental variables and gene annotations.
Affiliation(s)
- Florent Baty
  - Division of Pulmonary Medicine, Cantonal Hospital St. Gallen, Rorschacherstrasse 95, CH-9007 St. Gallen, Switzerland
|
24
|
Gade S, Porzelius C, Fälth M, Brase JC, Wuttig D, Kuner R, Binder H, Sültmann H, Beißbarth T. Graph based fusion of miRNA and mRNA expression data improves clinical outcome prediction in prostate cancer. BMC Bioinformatics 2011; 12:488. [PMID: 22188670 PMCID: PMC3471479 DOI: 10.1186/1471-2105-12-488] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 09/23/2011] [Accepted: 12/21/2011] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer diseases are described to reflect clinical characteristics like staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort there is a need for statistical methods to integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction. RESULTS Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs we used the correlations between them as well as the target prediction information to build a bipartite graph representing the relations between miRNAs and mRNAs. This graph was used to guide the feature selection in order to improve the prediction. The method is illustrated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. The biochemical relapse was used as clinical endpoint. It could be shown that the bipartite graph in combination with both data sets could improve prediction performance as well as the stability of the feature selection. CONCLUSIONS Fusion of mRNA and miRNA expression data into one prediction model improves clinical outcome prediction in terms of prediction error and stable feature selection. The R source code of the proposed method is available in the supplement.
Affiliation(s)
- Stephan Gade
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Christine Porzelius
  - Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, 79104 Freiburg, Germany
- Maria Fälth
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Jan C Brase
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Daniela Wuttig
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Ruprecht Kuner
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Harald Binder
  - Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, 79104 Freiburg, Germany
  - Institute of Medical Biometry, Epidemiology and Informatics (IMBEI), Working Group Medical Biometry, University Medical Center Johannes Gutenberg University Mainz, 55101 Mainz, Germany
- Holger Sültmann
  - German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Tim Beißbarth
  - University Medical Center Göttingen, Medical Statistics, 37099 Göttingen, Germany
|
25
|
Cano A, Masegosa AR, Moral S. A Method for Integrating Expert Knowledge When Learning Bayesian Networks From Data. IEEE Trans Syst Man Cybern B Cybern 2011; 41:1382-94. [PMID: 21659034 DOI: 10.1109/tsmcb.2011.2148197] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Affiliation(s)
- Andrés Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
|
26
|
Human papilloma virus strain detection utilising custom-designed oligonucleotide microarrays. Methods Mol Biol 2011; 688:75-95. [PMID: 20938834 DOI: 10.1007/978-1-60761-947-5_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/10/2023]
Abstract
Within the past 15 years, the utilisation of microarray technology for the detection of specific pathogen strains has increased rapidly. Presently, it is possible to simply purchase a pre-manufactured "off the shelf" oligonucleotide microarray bearing a wide variety of known signature DNA sequences previously identified in the organism being studied. Consequently, a hybridisation analysis may be used to pinpoint which strain or strains are present in any given clinical sample. However, there exists a problem if the study necessitates the identification of novel sequences which are not represented in commercially available microarray chips. Ideally, such investigations require an in situ oligonucleotide microarray platform with the capacity to synthesise microarrays bearing probe sequences designed solely by the researcher. This chapter will focus on the employment of the Combimatrix® B3 CustomArray™ for the synthesis of reusable, bespoke microarrays for the purpose of discerning multiple Human Papilloma Virus strains.
|
27
|
Bellazzi R, Diomidous M, Sarkar IN, Takabayashi K, Ziegler A, McCray AT. Data analysis and data mining: current issues in biomedical informatics. Methods Inf Med 2011; 50:536-44. [PMID: 22146916 PMCID: PMC3233983 DOI: 10.3414/me11-06-0002] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 12/23/2022]
Abstract
BACKGROUND Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. OBJECTIVES To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. METHODS On the occasion of the 50th year of Methods of Information in Medicine a symposium was organized, which reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. RESULTS The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. CONCLUSIONS Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers.
Affiliation(s)
- R Bellazzi
- University of Pavia, Dipartimento di Informatica e Sistemistica, Via Ferrata 1, 27100 Pavia (PV), Italy.
|
28
|
Johannes M, Brase JC, Fröhlich H, Gade S, Gehrmann M, Fälth M, Sültmann H, Beissbarth T. Integration of pathway knowledge into a reweighted recursive feature elimination approach for risk stratification of cancer patients. Bioinformatics 2010; 26:2136-44. [PMID: 20591905 DOI: 10.1093/bioinformatics/btq345] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 12/17/2022]
Abstract
MOTIVATION One of the main goals of high-throughput gene-expression studies in cancer research is to identify prognostic gene signatures, which have the potential to predict the clinical outcome. It is common practice to investigate these questions using classification methods. However, standard methods merely rely on gene-expression data and assume the genes to be independent. Including pathway knowledge a priori into the classification process has recently been indicated as a promising way to increase classification accuracy as well as the interpretability and reproducibility of prognostic gene signatures. RESULTS We propose a new method called Reweighted Recursive Feature Elimination. It is based on the hypothesis that a gene with a low fold-change should have an increased influence on the classifier if it is connected to differentially expressed genes. We used a modified version of Google's PageRank algorithm to alter the ranking criterion of the SVM-RFE algorithm. Evaluations of our method on an integrated breast cancer dataset comprising 788 samples showed an improvement of the area under the receiver operator characteristic curve as well as in the reproducibility and interpretability of selected genes. AVAILABILITY The R code of the proposed algorithm is given in Supplementary Material.
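The idea in this abstract, that a gene with a low fold-change gains influence when it is connected to differentially expressed genes, can be illustrated with a small network-propagation sketch. This is not the authors' modified PageRank, which the abstract does not specify, and it leaves out the SVM-RFE step entirely. It is a plain personalized PageRank in which the restart distribution is proportional to a per-gene evidence score. The function name, the toy network, and the evidence values are all hypothetical.

```python
def personalized_pagerank(adjacency, evidence, damping=0.85, iters=200, tol=1e-12):
    """Propagate per-gene evidence over an undirected gene network.

    adjacency: dict mapping each gene to a list of neighbouring genes.
    evidence:  dict mapping each gene to a non-negative score
               (for example an absolute log fold-change; an assumption here).
    Returns a dict of scores that sum to 1.
    """
    genes = sorted(adjacency)
    total = sum(evidence[g] for g in genes)
    # Restart distribution: proportional to each gene's own evidence.
    restart = {g: evidence[g] / total for g in genes}
    rank = dict(restart)
    for _ in range(iters):
        new = {g: (1.0 - damping) * restart[g] for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            if neighbours:
                # Split this gene's mass evenly among its neighbours.
                share = damping * rank[g] / len(neighbours)
                for n in neighbours:
                    new[n] += share
            else:
                # Isolated gene: return its mass to the restart distribution.
                for n in genes:
                    new[n] += damping * rank[g] * restart[n]
        delta = sum(abs(new[g] - rank[g]) for g in genes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy network (hypothetical): B and D have the same low evidence,
# but B sits between two strongly changed genes and D is isolated.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
evidence = {"A": 2.0, "B": 0.1, "C": 2.0, "D": 0.1}
rank = personalized_pagerank(adjacency, evidence)
```

In this toy example B ends up ranked well above D even though both start with the same evidence, which is the behaviour the hypothesis describes. In a full pipeline such scores would be used to reweight the per-gene ranking criterion of a recursive feature elimination loop; how that is done in the cited method is not stated in the abstract.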
Affiliation(s)
- Marc Johannes
- German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 280, 69120 Heidelberg.
|
29
|
SoFoCles: Feature filtering for microarray classification based on Gene Ontology. J Biomed Inform 2010; 43:1-14. [DOI: 10.1016/j.jbi.2009.06.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/25/2008] [Revised: 11/08/2008] [Accepted: 06/28/2009] [Indexed: 11/16/2022]
|
30
|
Lee WP, Tzou WS. Computational methods for discovering gene networks from expression data. Brief Bioinform 2009; 10:408-23. [PMID: 19505889 DOI: 10.1093/bib/bbp028] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022] Open
Abstract
Designing and conducting experiments are routine practices for modern biologists. The real challenge, especially in the post-genome era, usually comes not from acquiring data, but from subsequent activities such as data processing, analysis, knowledge generation and gaining insight into the research question of interest. The approach of inferring gene regulatory networks (GRNs) has been flourishing for many years, and new methods from mathematics, information science, engineering and social sciences have been applied. We review different kinds of computational methods biologists use to infer networks of varying levels of accuracy and complexity. The primary concern of biologists is how to translate the inferred network into hypotheses that can be tested with real-life experiments. Taking the biologists' viewpoint, we scrutinized several methods for predicting GRNs in mammalian cells, and more importantly show how the power of different knowledge databases of different types can be used to identify modules and subnetworks, thereby reducing complexity and facilitating the generation of testable hypotheses.
Affiliation(s)
- Wei-Po Lee
- Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan.
|
31
|
Glez-Peña D, Díaz F, Hernández JM, Corchado JM, Fdez-Riverola F. geneCBR: a translational tool for multiple-microarray analysis and integrative information retrieval for aiding diagnosis in cancer research. BMC Bioinformatics 2009; 10:187. [PMID: 19538727 PMCID: PMC2703634 DOI: 10.1186/1471-2105-10-187] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 12/09/2008] [Accepted: 06/18/2009] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and contributing to the treatment and cure of diseases. Although different microarray techniques have been successfully used to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a suitable paradigm specially intended for the development of biomedical informatics applications and decision support systems, given the support and collaboration involved in such a translational development. With the goals of removing barriers against multi-disciplinary collaboration and facilitating the dissemination and transfer of knowledge to real practice, case-based reasoning systems have the potential to be applied to translational research mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own practice of clinical medicine. RESULTS In addressing the issue of bridging the existing gap between biomedical researchers and clinicians who work in the domain of cancer diagnosis, prognosis and treatment, we have developed and made accessible a common interactive framework. Our geneCBR system implements a freely available software tool that allows the use of combined techniques that can be applied to gene selection, clustering, knowledge extraction and prediction for aiding diagnosis in cancer research. For biomedical researchers, geneCBR expert mode offers a core workbench for designing and testing new techniques and experiments. For pathologists or oncologists, geneCBR diagnostic mode implements an effective and reliable system that can diagnose cancer subtypes based on the analysis of microarray data using a CBR architecture. For programmers, geneCBR programming mode includes an advanced edition module for run-time modification of previously coded techniques. CONCLUSION geneCBR is a new translational tool that can effectively support the integrative work of programmers, biomedical researchers and clinicians working together in a common framework. The code is freely available under the GPL license and can be obtained at http://www.genecbr.org.
Affiliation(s)
- Daniel Glez-Peña
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain
- Fernando Díaz
- Computer Science Department, University of Valladolid, Escuela Universitaria de Informática, Plaza Santa Eulalia, 9–11, 40005, Segovia, Spain
- Jesús M Hernández
- Hematological Service, Hospital Universitario de Salamanca and Centro de Investigación del Cáncer (CIC), University of Salamanca-CSIC, Campus Miguel de Unamuno, 37007, Salamanca, Spain
- Juan M Corchado
- Computer Science Department, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain
- Florentino Fdez-Riverola
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain
|
32
|
Dotan-Cohen D, Kasif S, Melkman AA. Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering. Bioinformatics 2009; 25:1789-95. [PMID: 19497934 PMCID: PMC2705235 DOI: 10.1093/bioinformatics/btp327] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity. Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein–protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein–protein interaction data. Contact: dotna@cs.bgu.ac.il Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dikla Dotan-Cohen
- Department of Computer Science, Ben-Gurion University, Beer Sheva, Israel 84105.
|
33
|
Abstract
Data mining is the process of selecting, exploring, and modeling large amounts of data to discover unknown patterns or relationships useful to the data analyst. This article describes applications of data mining for the analysis of blood glucose and diabetes mellitus data. The diabetes management context is particularly well suited to a data mining approach. The availability of electronic health records and monitoring facilities, including telemedicine programs, is leading to accumulating huge data sets that are accessible to physicians, practitioners, and health care decision makers. Moreover, because diabetes is a lifelong disease, even data available for an individual patient may be massive and difficult to interpret. Finally, the capability of interpreting blood glucose readings is important not only in diabetes monitoring but also when monitoring patients in intensive care units. This article describes and illustrates work that has been carried out in our institutions in two areas in which data mining has a significant potential utility to researchers and clinical practitioners: analysis of (i) blood glucose home monitoring data of diabetes mellitus patients and (ii) blood glucose monitoring data from hospitalized intensive care unit patients.
Affiliation(s)
- Riccardo Bellazzi
- Dipartimento di Informatica e Sistemistica, Università di Pavia, Pavia, Italy.
|
34
|
Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009; 5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023] Open
Abstract
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
Affiliation(s)
- Sonia M. Leach
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Hannah Tipney
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Weiguo Feng
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- William A. Baumgartner
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Priyanka Kasliwal
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Ronald P. Schuyler
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Trevor Williams
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- Richard A. Spritz
- Human Medical Genetics Program, University of Colorado at Denver, Denver, Colorado, United States of America
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
|
35
|
|
36
|
Intelligent data analysis in biomedicine. J Biomed Inform 2007; 40:605-8. [PMID: 17959422 DOI: 10.1016/j.jbi.2007.10.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/04/2007] [Accepted: 10/05/2007] [Indexed: 11/21/2022]
|