1
Zeng H, Yin C, Chai C, Wang Y, Dai Q, Sun H. Cancer gene identification through integrating causal prompting large language model with omics data-driven causal inference. Brief Bioinform 2025; 26:bbaf113. [PMID: 40072848 PMCID: PMC11899576 DOI: 10.1093/bib/bbaf113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 01/13/2025] [Accepted: 02/26/2025] [Indexed: 03/14/2025] Open
Abstract
Identifying genes causally linked to cancer from a multi-omics perspective is essential for understanding the mechanisms of cancer and improving therapeutic strategies. Traditional statistical and machine-learning methods that rely on generalized correlation approaches to identify cancer genes often produce redundant, biased predictions with limited interpretability, largely due to overlooking confounding factors, selection biases, and the nonlinear activation function in neural networks. In this study, we introduce a novel framework for identifying cancer genes across multiple omics domains, named ICGI (Integrative Causal Gene Identification), which leverages a large language model (LLM) prompted with causal contextual cues, in conjunction with data-driven causal feature selection. This approach demonstrates the effectiveness and potential of LLMs in uncovering cancer genes and comprehending disease mechanisms, particularly at the genomic level. However, our findings also highlight that current LLMs may not capture comprehensive information across all omics levels. Applied to transcriptomic datasets from six cancer types in The Cancer Genome Atlas and compared with state-of-the-art methods, the proposed causal feature selection module demonstrates superior capability in identifying cancer genes that distinguish between cancerous and normal samples. Additionally, we have developed an online service platform that allows users to input a gene of interest and a specific cancer type. The platform provides automated results indicating whether the gene plays a significant role in cancer, along with clear and accessible explanations. Moreover, the platform summarizes the inference outcomes obtained from data-driven causal learning methods.
Affiliation(s)
- Haolong Zeng
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
- Chaoyi Yin
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
- Chunyang Chai
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
- Yuezhu Wang
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
- Qi Dai
  - College of Life Science and Medicine, Zhejiang Sci-Tech University, Second Street 928, Qiantang District, Hangzhou 310018, Zhejiang Province, China
- Huiyan Sun
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
  - International Center of Future Science, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China
  - Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University, 3003 Qianjin Street, Changchun 130012, Jilin Province, China

2
Wu S, Yin C, Wang Y, Sun H. Identifying cancer prognosis genes through causal learning. Brief Bioinform 2024; 26:bbae721. [PMID: 39808115 PMCID: PMC11729728 DOI: 10.1093/bib/bbae721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 11/22/2024] [Accepted: 12/30/2024] [Indexed: 01/16/2025] Open
Abstract
Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models. Subsequently, an iterative conditional independence test combined with graph pruning is utilized to infer the causal skeleton, thereby pinpointing prognosis-related genes. Experiments on transcriptomic data from 18 cancer types sourced from The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validations on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrate CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, obviating the need for gene combination enumeration for survival time estimation. These genes are also shown to be closely linked to crucial biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and exhibits insensitivity to the order of data shuffling. Overall, CPCG is a powerful tool for extracting cancer prognostic biomarkers, offering interpretability, generalizability, and robustness. CPCG holds promise for facilitating targeted interventions in clinical treatment strategies.
Affiliation(s)
- Siwei Wu
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China
- Chaoyi Yin
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China
- Yuezhu Wang
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China
- Huiyan Sun
  - School of Artificial Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China
  - International Center of Future Science, Jilin University, 3003 Qianjin Street, 130012 Changchun, China
  - Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University, 3003 Qianjin Street, 130012 Changchun, China

3
Zhang H, Yan C, Xia Y, Guan J, Zhou S. Causal Gene Identification Using Non-Linear Regression-Based Independence Tests. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:185-195. [PMID: 35139025 DOI: 10.1109/tcbb.2022.3149864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
With the development of biomedical techniques in the past decades, causal gene identification has become one of the most promising applications in human genome-based business, which can help doctors to evaluate the risk of certain genetic diseases and provide further treatment recommendations for potential patients. When no controlled experiments can be applied, machine learning techniques like causal inference-based methods are generally used to identify causal genes. Unfortunately, most of the existing methods detect disease-related genes by ranking-based strategies or feature selection techniques, which generally return a superset of the corresponding real causal genes. There are also some causal inference-based methods that can identify a part of the real causal genes from those supersets, but they return only a few causal genes. This is contrary to our knowledge, as many results from controlled experiments have demonstrated that a certain disease, especially cancer, is usually related to dozens or hundreds of genes. In this work, we present an effective approach for identifying causal genes from gene expression data by using a new search strategy based on non-linear regression-based independence tests, which is able to greatly reduce the search space and simultaneously establish the causal relationships from the candidate genes to the disease variable. Extensive experiments on real-world cancer datasets show that our method is superior to the existing causal inference-based methods in three aspects: 1) our method can identify dozens of causal genes, and 1/3 ∼ 1/2 of the discovered causal genes can be verified by existing work as directly related to the corresponding disease; 2) the discovered causal genes are able to distinguish the status or disease subtype of the target patient; 3) most of the discovered causal genes are closely relevant to the disease variable.

4
Multi-label causal feature selection based on neighbourhood mutual information. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01609-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]

5
Zhang H, Zhou S, Yan C, Guan J, Wang X, Zhang J, Huan J. Learning Causal Structures Based on Divide and Conquer. IEEE Transactions on Cybernetics 2022; 52:3232-3243. [PMID: 32780709 DOI: 10.1109/tcyb.2020.3010004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
This article addresses two important issues of causal inference in the high-dimensional situation. One is how to reduce redundant conditional independence (CI) tests, which heavily impact the efficiency and accuracy of existing constraint-based methods. Another is how to construct the true causal graph from a set of Markov equivalence classes returned by these methods. For the first issue, we design a recursive decomposition approach where the original data (a set of variables) are first decomposed into two small subsets, each of which is then recursively decomposed into two smaller subsets until none of these subsets can be decomposed further. Redundant CI tests can be reduced by inferring causalities from these subsets. The advantage of this decomposition scheme lies in two aspects: 1) it requires only low-order CI tests and 2) it does not violate d -separation. The complete causality can be reconstructed by merging all the partial results of the subsets. For the second issue, we employ regression-based CI tests to check CIs in linear non-Gaussian additive noise cases, which can identify more causal directions by [Formula: see text] (or [Formula: see text]). Consequently, causal direction learning is no longer limited by the number of returned V -structures and consistent propagation. Extensive experiments show that the proposed method can not only substantially reduce redundant CI tests but also effectively distinguish the equivalence classes.
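To make the notion of a low-order CI test concrete, the following is a minimal sketch of a first-order test based on partial correlation. It is a generic illustration under a linear-Gaussian assumption, not the regression-based test the article describes; the chain X → Y → Z, the sample size, and all function names are assumptions chosen for the example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, z, y):
    """First-order partial correlation of x and z given a single variable y."""
    r_xz, r_xy, r_zy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


# Simulated chain X -> Y -> Z (hypothetical data, fixed seed for repeatability).
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
z = [yi + rng.gauss(0, 1) for yi in y]

r_marginal = pearson(x, z)        # clearly non-zero: X and Z are dependent
r_partial = partial_corr(x, z, y)  # near zero: X and Z are independent given Y
print(f"marginal r = {r_marginal:.3f}, partial r given Y = {r_partial:.3f}")
```

A constraint-based method would remove the X-Z edge here because a conditioning set of size one already separates the two variables, which is why restricting attention to low-order tests keeps each test cheap and statistically stable.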

6
Learning causal structures using hidden compact representation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

7
Zhang H, Yan C, Zhou S, Guan J, Zhang J. Combined cause inference: Definition, model and performance. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

8
Understanding the causal structure among the tags in marketing systems. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05552-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]

9
Yan C, Zhou S. Effective and scalable causal partitioning based on low-order conditional independent tests. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]

10
Cai R, Zhang Z, Hao Z, Winslett M. Sophisticated Merging Over Random Partitions: A Scalable and Robust Causal Discovery Approach. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3623-3635. [PMID: 28858816 DOI: 10.1109/tnnls.2017.2734804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/07/2023]
Abstract
Scalable causal discovery is an essential technology to a wide spectrum of applications, including biomedical studies and social network evolution analysis. To tackle the difficulty of high dimensionality, a number of solutions are proposed in the literature, generally dividing the original variable domain into smaller subdomains by computation intensive partitioning strategies. These approaches usually suffer significant structural errors when the partitioning strategies fail to recognize true causal edges across the output subdomains. Such a structural error accumulates quickly with the growing depth of recursive partitioning, due to the lack of correction mechanism over causally connected variables when they are wrongly divided into two subdomains, finally jeopardizing the robustness of the integrated results. This paper proposes a completely different strategy to solve the problem, powered by a lightweight random partitioning scheme together with a carefully designed merging algorithm over results from the random partitions. Based on the randomness properties of the partitioning scheme, we design a suite of tricks for the merging algorithm, in order to support propagation-based significance enhancement, maximal acyclic subgraph causal ordering, and order-sensitive redundancy elimination. Theoretical studies as well as empirical evaluations verify the genericity, effectiveness, and scalability of our proposal on both simulated and real-world causal structures when the scheme is used in combination with a variety of causal solvers known effective on smaller domains.

11
Hong Y, Hao Z, Mai G, Huang H, Kumar Sangaiah A. Causal Discovery Combining K2 with Brain Storm Optimization Algorithm. Molecules 2018; 23:molecules23071729. [PMID: 30012940 PMCID: PMC6100085 DOI: 10.3390/molecules23071729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2018] [Revised: 07/07/2018] [Accepted: 07/09/2018] [Indexed: 11/16/2022] Open
Abstract
Exploring and detecting the causal relations among variables have shown huge practical value in recent years, with numerous opportunities for scientific discovery, and have been commonly seen as the core of data science. Among all possible causal discovery methods, causal discovery based on a constraint approach can recover the causal structures from passive observational data in general cases, and has shown extensive prospects in numerous real-world applications. However, when the graph is sufficiently large, it does not work well. To alleviate this problem, an improved causal structure learning algorithm, K2-BSO, which combines K2 with brain storm optimization (BSO), is presented in this paper. Here BSO is used to search for the optimal topological order of nodes instead of searching the graph space. This paper assumes that the dataset is generated by conforming to a causal diagram in which each variable is generated from its parent based on a causal mechanism. We designed an elaborate distance function for the clustering step in BSO according to the mechanism of K2. The graph space is therefore reduced to a smaller topological order space, and the order space can be further reduced by an efficient clustering method. The experimental results on various real-world datasets showed that our method outperformed the traditional search-and-score methods and the state-of-the-art genetic algorithm-based methods.
Affiliation(s)
- Yinghan Hong
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.
  - School of Physics and Electronic Engineering, Hanshan Normal University, Chaozhou 521041, China.
- Zhifeng Hao
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.
  - School of Mathematics and Big Data, Foshan University, Foshan 528000, China.
- Guizhen Mai
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China.
- Han Huang
  - School of Software Engineering, South China University of Technology, Guangzhou 510006, China.
- Arun Kumar Sangaiah
  - School of Computing Science and Engineering, Vellore Institute of Technology, Vellore-632014, Tamil Nadu, India.

12
Cai R, Liu M, Hu Y, Melton BL, Matheny ME, Xu H, Duan L, Waitman LR. Identification of adverse drug-drug interactions through causal association rule discovery from spontaneous adverse event reports. Artif Intell Med 2017; 76:7-15. [PMID: 28363289 DOI: 10.1016/j.artmed.2017.01.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 12/03/2016] [Revised: 01/29/2017] [Accepted: 01/31/2017] [Indexed: 12/29/2022]
Abstract
OBJECTIVE: Drug-drug interaction (DDI) is of serious concern, causing over 30% of all adverse drug reactions and resulting in significant morbidity and mortality. Early discovery of adverse DDI is critical to prevent patient harm. Spontaneous reporting systems, which routinely collect adverse event reports from patients and healthcare professionals, have been a major resource for drug safety surveillance. In this study, we present a novel approach to discover DDIs from the Food and Drug Administration's adverse event reporting system.
METHODS: Data-driven discovery of DDI is an extremely challenging task because higher-order associations require analysis of all combinations of drugs and adverse events, and accurate estimation of the relationships between drug combinations and adverse events requires cause-and-effect inference. To efficiently identify causal relationships, we introduce the causal concept into association rule mining by developing a method called Causal Association Rule Discovery (CARD). The properties of V-structures in Bayesian Networks are utilized in the search for causal associations. To demonstrate feasibility, CARD is compared to the traditional association rule mining (AR) method in DDI identification.
RESULTS: Based on physician evaluation of 100 randomly selected higher-order associations generated by CARD and AR, CARD is demonstrated to be more accurate than AR in identifying known drug interactions (20% vs. 10%, respectively). Moreover, CARD yielded a lower proportion of drug combinations that are unknown to interact, i.e., 50% for CARD and 79% for AR.
CONCLUSION: Evaluation analysis demonstrated that CARD is more likely to identify true causal drug variables and associations to adverse events.
Affiliation(s)
- Ruichu Cai
  - Faculty of Computer Science, Guangdong University of Technology, Guangzhou, People's Republic of China
- Mei Liu
  - Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, 66160, USA
- Yong Hu
  - Big Data Decision Institute, Jinan University, Guangzhou, People's Republic of China
- Michael E Matheny
  - Geriatric Research Education & Clinical Care, Tennessee Valley Healthcare System, Veteran's Health Administration, Nashville, USA; Department of Biomedical Informatics, Department of Medicine, Division of General Internal Medicine, & Department of Biostatistics, Vanderbilt University, Nashville, USA
- Hua Xu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA
- Lian Duan
  - Department of Information Systems and Business Analytics, Hofstra University, Hempstead, USA
- Lemuel R Waitman
  - Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, 66160, USA

13
Wei L, Bowen Z, Zhiyong C, Gao X, Liao M. Exploring local discriminative information from evolutionary profiles for cytokine–receptor interaction prediction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.078] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]

14
Multiple-cause discovery combined with structure learning for high-dimensional discrete data and application to stock prediction. Soft Comput 2016. [DOI: 10.1007/s00500-015-1764-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]

15
Chen J, Wang X, Liu B. iMiRNA-SSF: Improving the Identification of MicroRNA Precursors by Combining Negative Sets with Different Distributions. Sci Rep 2016; 6:19062. [PMID: 26753561 PMCID: PMC4709562 DOI: 10.1038/srep19062] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 10/21/2015] [Accepted: 12/02/2015] [Indexed: 11/09/2022] Open
Abstract
The identification of microRNA precursors (pre-miRNAs) helps in understanding regulator in biological processes. The performance of computational predictors depends on their training sets, in which the negative sets play an important role. In this regard, we investigated the influence of benchmark datasets on the predictive performance of computational predictors in the field of miRNA identification, and found that the negative samples have significant impact on the predictive results of various methods. We constructed a new benchmark set with different data distributions of negative samples. Trained with this high quality benchmark dataset, a new computational predictor called iMiRNA-SSF was proposed, which employed various features extracted from RNA sequences. Experimental results showed that iMiRNA-SSF outperforms three state-of-the-art computational methods. For practical applications, a web-server of iMiRNA-SSF was established at the website http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/.
Affiliation(s)
- Junjie Chen
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China

16
Lotfi E, Keshavarz A. Gene expression microarray classification using PCA–BEL. Comput Biol Med 2014; 54:180-7. [DOI: 10.1016/j.compbiomed.2014.09.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 02/21/2014] [Revised: 09/13/2014] [Accepted: 09/16/2014] [Indexed: 01/15/2023]

17
Wang Y, Chen K, Zhang J, Yao L, Li K, Jin Z, Ye Q, Guo X. Aging Influence on Gray Matter Structural Associations within the Default Mode Network Utilizing Bayesian Network Modeling. Front Aging Neurosci 2014; 6:105. [PMID: 24910613 PMCID: PMC4038778 DOI: 10.3389/fnagi.2014.00105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/08/2013] [Accepted: 05/14/2014] [Indexed: 01/21/2023] Open
Abstract
Recent neuroimaging studies have revealed normal aging-related alterations in functional and structural brain networks such as the default mode network (DMN). However, less is understood about specific brain structural dependencies or interactions between brain regions within the DMN in the normal aging process. In this study, using Bayesian network (BN) modeling, we analyzed gray matter volume data from 109 young and 82 old subjects to characterize the influence of aging on associations between core brain regions within the DMN. Furthermore, we investigated the discriminability of the aging-associated BN models for the young and old groups. Compared to their young counterparts, the old subjects showed significant reductions in connections from right inferior temporal cortex (ITC) to medial prefrontal cortex (mPFC), right hippocampus (HP) to right ITC, and mPFC to posterior cingulate cortex and increases in connections from left HP to mPFC and right inferior parietal cortex to right ITC. Moreover, the classification results showed that the aging-related BN models could predict group membership with 88.48% accuracy, 88.07% sensitivity, and 89.02% specificity. Our findings suggest that structural associations within the DMN may be affected by normal aging and provide crucial information about aging effects on brain structural networks.
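The three reported classification figures can be cross-checked against a confusion matrix. The sketch below assumes that the young group is treated as the positive class and that 96 of the 109 young subjects and 73 of the 82 old subjects were classified correctly. These counts are an inference made here because they reproduce the reported percentages; the abstract does not state them.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100 * (tp + tn) / total
    sensitivity = 100 * tp / (tp + fn)   # true positive rate
    specificity = 100 * tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Assumed counts (not given in the abstract): 109 young = 96 TP + 13 FN,
# 82 old = 73 TN + 9 FP.
acc, sens, spec = classification_metrics(tp=96, fn=13, tn=73, fp=9)
print(f"accuracy={acc:.2f}% sensitivity={sens:.2f}% specificity={spec:.2f}%")
```

Under these assumed counts the accuracy is 169/191, the sensitivity 96/109, and the specificity 73/82, which round to the 88.48%, 88.07%, and 89.02% quoted above.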
Affiliation(s)
- Yan Wang
  - College of Information Science and Technology, Beijing Normal University, Beijing, China
- Kewei Chen
  - Banner Alzheimer’s Institute and Banner Good Samaritan PET Center, Phoenix, AZ, USA
- Jiacai Zhang
  - College of Information Science and Technology, Beijing Normal University, Beijing, China
- Li Yao
  - College of Information Science and Technology, Beijing Normal University, Beijing, China
  - State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Ke Li
  - Laboratory of Magnetic Resonance Imaging, The 306th Hospital of People’s Liberation Army, Beijing, China
- Zhen Jin
  - Laboratory of Magnetic Resonance Imaging, The 306th Hospital of People’s Liberation Army, Beijing, China
- Qing Ye
  - College of Information Science and Technology, Beijing Normal University, Beijing, China
- Xiaojuan Guo
  - College of Information Science and Technology, Beijing Normal University, Beijing, China
  - State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China

18
Liu M, Cai R, Hu Y, Matheny ME, Sun J, Hu J, Xu H. Determining molecular predictors of adverse drug reactions with causality analysis based on structure learning. J Am Med Inform Assoc 2013; 21:245-51. [PMID: 24334612 DOI: 10.1136/amiajnl-2013-002051] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE: Adverse drug reaction (ADR) can have dire consequences. However, our current understanding of the causes of drug-induced toxicity is still limited. Hence it is of paramount importance to determine molecular factors of adverse drug responses so that safer therapies can be designed.
METHODS: We propose a causality analysis model based on structure learning (CASTLE) for identifying factors that contribute significantly to ADRs from an integration of chemical and biological properties of drugs. This study aims to address two major limitations of the existing ADR prediction studies. First, ADR prediction is mostly performed by assessing the correlations between the input features and ADRs, and the identified associations may not indicate causal relations. Second, most predictive models lack biological interpretability.
RESULTS: CASTLE was evaluated in terms of prediction accuracy on 12 organ-specific ADRs using 830 approved drugs. The prediction was carried out by first extracting causal features with structure learning and then applying them to a support vector machine (SVM) for classification. Through rigorous experimental analyses, we observed significant increases in both macro and micro F1 scores compared with the traditional SVM classifier, from 0.88 to 0.89 and 0.74 to 0.81, respectively. Most importantly, identified links between the biological factors and organ-specific drug toxicities were partially supported by evidence in Online Mendelian Inheritance in Man.
CONCLUSIONS: The proposed CASTLE model not only performed better in prediction than the baseline SVM but also produced more interpretable results (ie, biological factors responsible for ADRs), which is critical to discovering molecular activators of ADRs.
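Because the results are reported as both macro and micro F1, it may help to see how the two averages differ on a multi-label task. The following is a minimal sketch using the standard definitions; the per-label counts are hypothetical and are not taken from the study.

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per label."""
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts) / len(counts)
    # Micro: pool the counts first, so frequent labels dominate.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro = f1(tp, fp, fn)
    return macro, micro


# Hypothetical counts: one frequent, well-predicted label and one rare, poorly predicted label.
counts = [(90, 10, 10), (5, 5, 15)]
macro, micro = macro_micro_f1(counts)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the rare label pulls the macro average well below the micro average, which illustrates why the two figures can move by different amounts when a model changes.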
Affiliation(s)
- Mei Liu
  - Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA