Reference Citation Analysis

For: Lustgarten JL, Balasubramanian JB, Visweswaran S, Gopalakrishnan V. Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure. Data. 2017;2:5. [PMID: 28331847 PMCID: PMC5358670 DOI: 10.3390/data2010005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Cited by Other Article(s)

Balasubramanian JB, Boes RD, Gopalakrishnan V. A novel approach to modeling multifactorial diseases using Ensemble Bayesian Rule classifiers. J Biomed Inform 2020;107:103455. [PMID: 32497685 DOI: 10.1016/j.jbi.2020.103455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2019] [Revised: 03/26/2020] [Accepted: 05/10/2020] [Indexed: 10/24/2022]

Abstract

Modeling the factors that influence disease phenotypes from biomarker profiling study datasets is a critical task in biomedicine. Such datasets are typically generated from high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they study are also complex because many diseases are multifactorial, resulting from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) infers a rule model by learning Bayesian networks from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three types of ensemble model combination strategies with BRL: uniform combination (UC; same as Bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC), collectively called Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results, using twenty-five public, high-dimensional gene expression datasets of multifactorial diseases, suggest that EBRL models using UC and BMC both achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process. Together, EBRL and BREVity provide researchers with a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets, one that leverages the strengths of ensemble methods for predictive performance while also providing interpretable explanations for its predictions.
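To make the difference between two of the combination strategies concrete, here is a minimal sketch, not the authors' implementation. It assumes each ensemble member reports a predicted class probability and a log score (for example, a log marginal likelihood), and that the model prior is uniform. The function names and the toy numbers are hypothetical. BMC, which searches over weightings of the ensemble, is not shown.

```python
import math

def uniform_combination(probs):
    """UC: every model gets the same weight (as in Bagging)."""
    return sum(probs) / len(probs)

def bma_weights(log_scores):
    """BMA weights: posterior model probabilities under a uniform model prior.

    Subtracting the maximum log score before exponentiating avoids underflow.
    """
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def bma_combination(probs, log_scores):
    """BMA: weight each model's prediction by its posterior probability."""
    weights = bma_weights(log_scores)
    return sum(w * p for w, p in zip(weights, probs))

# Toy ensemble (hypothetical values): P(class = 1) from three models
# and each model's log score on the training data.
probs = [0.9, 0.6, 0.3]
log_scores = [-10.0, -12.0, -20.0]

uc_prediction = uniform_combination(probs)
bma_prediction = bma_combination(probs, log_scores)
```

In this toy case the BMA weights concentrate on the best-scoring model, so the BMA prediction sits close to that single model's output, whereas UC treats all three models equally.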

Lustgarten JL, Zehnder A, Shipman W, Gancher E, Webb TL. Veterinary informatics: forging the future between veterinary medicine, human medicine, and One Health initiatives-a joint paper by the Association for Veterinary Informatics (AVI) and the CTSA One Health Alliance (COHA). JAMIA Open 2020;3:306-317. [PMID: 32734172 PMCID: PMC7382640 DOI: 10.1093/jamiaopen/ooaa005] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/14/2019] [Revised: 12/26/2019] [Accepted: 02/26/2020] [Indexed: 12/25/2022]

Abstract

Objectives

This manuscript reviews the current state of veterinary medical electronic health records and the ability to aggregate and analyze large datasets from multiple organizations and clinics. We also review analytical techniques as well as research efforts into veterinary informatics with a focus on applications relevant to human and animal medicine. Our goal is to provide references and context for these resources so that researchers can identify resources of interest and translational opportunities to advance the field.

Methods and Results

This review briefly covers various methods of veterinary informatics, including natural language processing and machine learning techniques, as well as various ongoing and future projects. After detailing techniques and sources of data, we describe some of the challenges and opportunities within veterinary informatics and review common One Health techniques and specific applications that affect both humans and animals.

Discussion

Current limitations in the field of veterinary informatics include limited sources of training data for developing machine learning and artificial intelligence algorithms, siloed data between academic institutions, corporate institutions, and many small private practices, and inconsistent data formats that make many integration problems difficult. Despite those limitations, there have been significant advancements in the field in the last few years and continued development of a few, key, large data resources that are available for interested clinicians and researchers. These real-world use cases and applications show current and significant future potential as veterinary informatics grows in importance. Veterinary informatics can forge new possibilities within veterinary medicine and between veterinary medicine, human medicine, and One Health initiatives.

Cai C, Cooper GF, Lu KN, Ma X, Xu S, Zhao Z, Chen X, Xue Y, Lee AV, Clark N, Chen V, Lu S, Chen L, Yu L, Hochheiser HS, Jiang X, Wang QJ, Lu X. Systematic discovery of the functional impact of somatic genome alterations in individual tumors through tumor-specific causal inference. PLoS Comput Biol 2019;15:e1007088. [PMID: 31276486 PMCID: PMC6650088 DOI: 10.1371/journal.pcbi.1007088] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/25/2019] [Revised: 07/23/2019] [Accepted: 05/09/2019] [Indexed: 02/07/2023]

Abstract

Cancer is mainly caused by somatic genome alterations (SGAs). Precision oncology involves identifying and targeting tumor-specific aberrations resulting from causative SGAs. We developed a novel tumor-specific computational framework that finds the likely causative SGAs in an individual tumor and estimates their impact on oncogenic processes, which suggests the disease mechanisms that are acting in that tumor. This information can be used to guide precision oncology. We report a tumor-specific causal inference (TCI) framework, which estimates causative SGAs by modeling causal relationships between SGAs and molecular phenotypes (e.g., transcriptomic, proteomic, or metabolomic changes) within an individual tumor. We applied the TCI algorithm to tumors from The Cancer Genome Atlas (TCGA) and estimated for each tumor the SGAs that causally regulate the differentially expressed genes (DEGs) in that tumor. Overall, TCI identified 634 SGAs that are predicted to cause cancer-related DEGs in a significant number of tumors, including most of the previously known drivers and many novel candidate cancer drivers. The inferred causal relationships are statistically robust and biologically sensible, and multiple lines of experimental evidence support the predicted functional impact of both the well-known and the novel candidate drivers that are predicted by TCI. TCI provides a unified framework that integrates multiple types of SGAs and molecular phenotypes to estimate which genome perturbations are causally influencing one or more molecular/cellular phenotypes in an individual tumor. By identifying major candidate drivers and revealing their functional impact in an individual tumor, TCI sheds light on the disease mechanisms of that tumor, which can serve to advance our basic knowledge of cancer biology and to support precision oncology that provides tailored treatment of individual tumors.

Affiliation(s)

Chunhui Cai: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Gregory F. Cooper: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Kevin N. Lu: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Xiaojun Ma: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
Shuping Xu: Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, United States of America
Zhenlong Zhao: Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, United States of America
Xueer Chen: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Yifan Xue: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Adrian V. Lee: Center for Causal Discovery, Pittsburgh, PA, United States of America; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, United States of America; Magee Women’s Cancer Research Center, Pittsburgh, PA, United States of America; UPMC Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, United States of America
Nathan Clark: Center for Causal Discovery, Pittsburgh, PA, United States of America; Department of Computational Biology and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
Vicky Chen: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Songjian Lu: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Lujia Chen: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Liyue Yu: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Harry S. Hochheiser: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Xia Jiang: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America
Q. Jane Wang (corresponding author): Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, United States of America
Xinghua Lu (corresponding author): Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America; Center for Causal Discovery, Pittsburgh, PA, United States of America; UPMC Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, United States of America

Balasubramanian JB, Gopalakrishnan V. Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery. World J Clin Oncol 2018;9:98-109. [PMID: 30254965 PMCID: PMC6153126 DOI: 10.5306/wjco.v9.i5.98] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/26/2018] [Revised: 07/24/2018] [Accepted: 08/05/2018] [Indexed: 02/06/2023]

Abstract

AIM

To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine.

METHODS

Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRL_p. The structure prior has a λ hyperparameter that allows the user to tune the degree to which prior knowledge is incorporated in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model's predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine.
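The role of λ can be illustrated with a small sketch. The abstract does not give the functional form of the structure prior, so the prior below is an assumption made purely for illustration: it penalises each edge on which a candidate network disagrees with the expected edges, scaled by λ. All function names, gene names other than EGFR, and log marginal likelihood values are hypothetical.

```python
def log_structure_prior(edges, expected_edges, lam):
    """Hypothetical prior: -lam for every edge that differs from prior knowledge."""
    disagreements = len(set(edges) ^ set(expected_edges))
    return -lam * disagreements

def log_posterior_score(log_marginal_likelihood, edges, expected_edges, lam):
    """Bayesian score in log space: data fit plus the structure prior."""
    return log_marginal_likelihood + log_structure_prior(edges, expected_edges, lam)

def best_structure(candidates, expected_edges, lam):
    """Return the name of the candidate network with the highest score."""
    best = max(
        candidates,
        key=lambda c: log_posterior_score(c["log_ml"], c["edges"], expected_edges, lam),
    )
    return best["name"]

# Two toy candidate networks: one fits the data slightly better,
# the other contains the edge suggested by prior knowledge.
candidates = [
    {"name": "data_only", "edges": [("GENE_A", "Class")], "log_ml": -100.0},
    {"name": "with_egfr", "edges": [("EGFR", "Class")], "log_ml": -103.0},
]
expected_edges = [("EGFR", "Class")]

choice_without_prior = best_structure(candidates, expected_edges, lam=0.0)
choice_with_prior = best_structure(candidates, expected_edges, lam=2.0)
```

With λ = 0 the prior has no effect and the better-fitting network is chosen; with a larger λ the network that agrees with the prior knowledge is preferred, even though its fit to the data is slightly worse.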

RESULTS

Using simulated data, we evaluated the degree of incorporation of prior knowledge into BRL_p by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. With a large value of λ, BRL_p returned the true model. This also led to a gain in predictive performance, measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC.
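The two measurements named above can be sketched as follows. This is a simplified illustration, not the evaluation code used in the study. The edit distance assumes both networks share the same labeled nodes, so only edge insertions and deletions are counted. The AUC uses the pairwise (Mann-Whitney) formulation. Gene names other than EGFR and all scores are hypothetical.

```python
def edge_edit_distance(true_edges, learned_edges):
    """Number of edge insertions/deletions needed to turn one graph into the other.

    Simplification: node sets are assumed identical, so the distance is the
    size of the symmetric difference of the two edge sets.
    """
    return len(set(true_edges) ^ set(learned_edges))

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy graphs: the learned model recovers one true edge and adds one spurious edge.
true_edges = {("EGFR", "Class"), ("GENE_B", "Class")}
learned_edges = {("EGFR", "Class"), ("GENE_C", "Class")}
distance = edge_edit_distance(true_edges, learned_edges)

# Toy predictions: one positive example is ranked below one negative example.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc_value = auc(labels, scores)
```

A distance of zero would mean the learned network has exactly the edges of the true model, which is the outcome the abstract reports for large λ.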

CONCLUSION

BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated using lung cancer biomarker data.