1
Yonekawa KE, Zhou C, Haaland WL, Wright DR. Nephrotoxin-Related Acute Kidney Injury and Predicting High-Risk Medication Combinations in the Hospitalized Child. J Hosp Med 2019; 14:462-467. [PMID: 30986180 DOI: 10.12788/jhm.3196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
BACKGROUND In the hospitalized patient, nephrotoxin exposure is one potentially modifiable risk factor for acute kidney injury (AKI). Clinical decision support based on nephrotoxin ordering was developed at our hospital to assist inpatient providers with the prevention or mitigation of nephrotoxin-related AKI. The initial decision support algorithm (Algorithm 1) was modified in order to align with a national AKI collaborative (Algorithm 2). OBJECTIVE Our first aim was to determine the impact of this alignment on the sensitivity and specificity of our nephrotoxin-related AKI detection system. Second, if the system efficacy was found to be suboptimal, we then sought to develop an improved model. DESIGN A retrospective cohort study in hospitalized patients between December 1, 2013 and November 30, 2015 (N = 14,779) was conducted. INTERVENTIONS With the goal of increasing nephrotoxin-related AKI detection sensitivity, a novel model based on the identification of combinations of high-risk medications was developed. RESULTS Application of the algorithms to our nephrotoxin use and AKI data resulted in sensitivities of 46.9% (Algorithm 1) and 43.3% (Algorithm 2, P = .22) and specificities of 73.6% and 89.3%, respectively (P < .001). Our novel AKI detection model was able to deliver a sensitivity of 74% and a specificity of 70%. CONCLUSIONS Modifications to our AKI detection system by adopting Algorithm 2, which included an expanded list of nephrotoxins and equally weighting each medication, did not improve our nephrotoxin-related AKI detection. It did improve our system's specificity. Sensitivity increased by >50% when we applied a novel algorithm based on observed data with identification of key medication combinations.
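The sensitivity and specificity quoted above follow the usual confusion-matrix definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP). The Python below is a minimal sketch, not the authors' model. It computes both metrics for a binary alert, implements a hypothetical alert that fires on any listed pair of co-administered drugs, and checks the conclusion's arithmetic: moving from 46.9% to 74% sensitivity is a relative increase of (74 - 46.9) / 46.9 ≈ 0.58, consistent with ">50%". The drug names, pairs, and patient records are invented placeholders.

```python
from itertools import combinations

def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean alerts and outcomes.

    Assumes at least one positive and one negative outcome (no zero-division guard).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical high-risk pairs: placeholders, NOT the combinations identified in the study.
HIGH_RISK_PAIRS = {
    frozenset({"drug_a", "drug_b"}),
    frozenset({"drug_a", "drug_c"}),
}

def combination_alert(medications):
    """Fire if any two co-administered medications form a listed high-risk pair."""
    return any(frozenset(pair) in HIGH_RISK_PAIRS
               for pair in combinations(set(medications), 2))

def relative_change(new, old):
    """Relative change from old to new; 0.5 means a 50% increase."""
    return (new - old) / old

# Toy data invented for illustration: medication sets and observed AKI.
patients = [{"drug_a", "drug_b"}, {"drug_a"}, {"drug_a", "drug_c", "drug_d"}, {"drug_b", "drug_c"}]
aki = [True, False, True, True]
alerts = [combination_alert(m) for m in patients]
print(alerts)                                # [True, False, True, False]
print(sensitivity_specificity(alerts, aki))  # (0.6666666666666666, 1.0)
print(round(relative_change(74, 46.9), 3))   # 0.578
```

Using frozensets makes each pair order-independent, so "drug_a with drug_b" and "drug_b with drug_a" match the same rule.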
Affiliation(s)
- Karyn E Yonekawa
- Department of Pediatrics, University of Washington School of Medicine, Seattle, Washington
- Chuan Zhou
- Department of Pediatrics, University of Washington School of Medicine, Seattle, Washington
- Seattle Children's Research Institute Center for Child Health, Behavior, and Development, Seattle, Washington
- Wren L Haaland
- Seattle Children's Research Institute Center for Child Health, Behavior, and Development, Seattle, Washington
- Davene R Wright
- Department of Pediatrics, University of Washington School of Medicine, Seattle, Washington
- Seattle Children's Research Institute Center for Child Health, Behavior, and Development, Seattle, Washington
2
Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci Rep 2016; 6:36812. [PMID: 27876821 PMCID: PMC5120272 DOI: 10.1038/srep36812] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/25/2016] [Accepted: 09/27/2016] [Indexed: 02/02/2023]
Abstract
Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present ‘Logic Optimization for Binary Input to Continuous Output’ (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.
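The core idea above, a small logic formula over binary features scored against a continuous response, can be illustrated with a brute-force search. This is a simplified stand-in, not LOBICO's own optimisation: it assumes formulas of at most two literals joined by AND or OR, labels a sample positive when its response falls below a threshold (as for IC50-style values), and weights each misclassification by the sample's distance from that threshold, which is one way of letting the continuous information shape the model. Feature names and data are hypothetical.

```python
from itertools import combinations, product

def literal(X, j, negated):
    """Truth values of feature j (optionally negated) across all samples."""
    return [(not row[j]) if negated else bool(row[j]) for row in X]

def weighted_error(pred, y, threshold):
    """Sum of |y - threshold| over misclassified samples.

    Assumed convention: a sample is positive ("sensitive") when its response is
    below the threshold, as for IC50-style drug-response values.
    """
    return sum((abs(v - threshold) for p, v in zip(pred, y) if p != (v < threshold)), 0.0)

def best_logic_model(X, y, threshold, names):
    """Score every single literal and every two-literal AND/OR formula exhaustively.

    Returns (weighted_error, description) for the lowest-error formula.
    """
    def label(j, neg):
        return ("NOT " if neg else "") + names[j]

    candidates = []
    for j in range(len(names)):
        for neg in (False, True):
            candidates.append((weighted_error(literal(X, j, neg), y, threshold), label(j, neg)))
    for j, k in combinations(range(len(names)), 2):
        for nj, nk in product((False, True), repeat=2):
            a, b = literal(X, j, nj), literal(X, k, nk)
            for op in ("AND", "OR"):
                pred = [(u and v) if op == "AND" else (u or v) for u, v in zip(a, b)]
                candidates.append((weighted_error(pred, y, threshold),
                                   f"{label(j, nj)} {op} {label(k, nk)}"))
    return min(candidates)

# Hypothetical toy panel: three binary mutation features, continuous response.
names = ["mutA", "mutB", "mutC"]
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [0.1, 0.2, 0.9, 0.8, 0.95, 0.7]
print(best_logic_model(X, y, 0.5, names))  # (0.0, 'mutA AND mutB')
```

No single mutation separates the two responsive samples here, while the combination mutA AND mutB does, which mirrors the abstract's point that logic combinations can outperform single-gene predictors.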
3
Tran D, Verma K, Ward K, Diaz D, Kataria E, Torabi A, Almeida A, Malfoy B, Stratford EW, Mitchell DC, Bryan BA. Functional genomics analysis reveals a MYC signature associated with a poor clinical prognosis in liposarcomas. Am J Pathol 2015; 185:717-28. [PMID: 25622542 DOI: 10.1016/j.ajpath.2014.11.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/25/2014] [Revised: 11/17/2014] [Accepted: 11/20/2014] [Indexed: 11/19/2022]
Abstract
Liposarcomas, which are malignant fatty tumors, are the second most common soft-tissue sarcomas. Several histologically defined liposarcoma subtypes exist, yet little is known about the molecular pathology that drives the diversity in these tumors. We used functional genomics to classify a panel of diverse liposarcoma cell lines based on hierarchical clustering of their gene expression profiles, indicating that liposarcoma gene expression profiles and histologic classification are not directly correlated. Boolean probability approaches based on cancer-associated properties identified differential expression in multiple genes, including MYC, as potentially affecting liposarcoma signaling networks and cancer outcome. We confirmed our method with a large panel of lipomatous tumors, revealing that MYC protein expression is correlated with patient survival. These data encourage increased reliance on genomic features in conjunction with histologic features for liposarcoma clinical characterization and lay the groundwork for using Boolean-based probabilities to identify prognostic biomarkers for clinical outcome in tumor patients.
Affiliation(s)
- Dat Tran
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Kundan Verma
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Kristin Ward
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Dolores Diaz
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Esha Kataria
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Alireza Torabi
- Department of Pathology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Eva W Stratford
- Cancer Stem Cell Innovation Centre and the Department of Tumor Biology, Institute of Cancer Research, Oslo University Hospital, Norwegian Radium Hospital, Oslo, Norway
- Dianne C Mitchell
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
- Brad A Bryan
- Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas
4
Qiu M, Khisamutdinov E, Zhao Z, Pan C, Choi JW, Leontis NB, Guo P. RNA nanotechnology for computer design and in vivo computation. Philos Trans A Math Phys Eng Sci 2013; 371:20120310. [PMID: 24000362 PMCID: PMC3758167 DOI: 10.1098/rsta.2012.0310] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 05/03/2023]
Abstract
Molecular-scale computing has been explored since 1989 owing to the foreseeable limitation of Moore's law for silicon-based computation devices. With the potential for massive parallelism, low energy consumption and the capability of working in vivo, molecular-scale computing promises a new computational paradigm. Inspired by concepts from the electronic computer, DNA computing has realized basic Boolean functions and has progressed into multi-layered circuits. Recently, RNA nanotechnology has emerged as an alternative approach. Owing to the newly discovered thermodynamic stability of a special RNA motif (Shu et al. 2011 Nat. Nanotechnol. 6, 658-667 (doi:10.1038/nnano.2011.105)), RNA nanoparticles are emerging as another promising medium for nanodevices and nanomedicine as well as molecular-scale computing. Like DNA, RNA sequences can be designed to form desired secondary structures in a straightforward manner, but RNA is structurally more versatile and more thermodynamically stable owing to its non-canonical base-pairing, tertiary interactions and base-stacking property. A 90-nucleotide RNA can exhibit 4⁹⁰ nanostructures, and its loops and tertiary architecture can serve as a mounting dovetail that eliminates the need for external linking dowels. Its enzymatic and fluorogenic activity creates diversity in computational design. Various small RNAs can work cooperatively, synergistically or antagonistically to carry out computational logic circuits. Its riboswitch and ribozyme activities and its special in vivo attributes offer great potential for in vivo computation. Unique features in transcription, termination, self-assembly, self-processing and acid resistance enable in vivo production of RNA nanoparticles that harbour various regulators for intracellular manipulation. With all these advantages, RNA computation is promising, but it is still in its infancy. Many challenges still exist. Collaborations between RNA nanotechnologists and computer scientists are necessary to advance this nascent technology.
Affiliation(s)
- Meikang Qiu
- Department of Computer Engineering, San Jose State University, San Jose, CA 95192, USA
- Emil Khisamutdinov
- Department of Pharmaceutical Science, University of Kentucky, Lexington, KY 40506, USA
- Zhengyi Zhao
- Department of Pharmaceutical Science, University of Kentucky, Lexington, KY 40506, USA
- Cheryl Pan
- Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40506, USA
- Jeong-Woo Choi
- Department of Chemical and Biomolecular Engineering, Sogang University, Seoul 121-742, Korea
- Neocles B. Leontis
- Department of Chemistry, Bowling Green State University, Bowling Green, OH 43403, USA
- Peixuan Guo
- Department of Pharmaceutical Science, University of Kentucky, Lexington, KY 40506, USA
5
Kumar G, Breen EJ, Ranganathan S. Identification of ovarian cancer associated genes using an integrated approach in a Boolean framework. BMC Syst Biol 2013; 7:12. [PMID: 23383610 PMCID: PMC3605242 DOI: 10.1186/1752-0509-7-12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/17/2012] [Accepted: 01/23/2013] [Indexed: 12/31/2022]
Abstract
Background Cancer is a complex disease whose molecular mechanisms remain elusive. A systems approach is needed to integrate diverse biological information for prognosis and therapy risk assessment, using a mechanistic approach to understand gene interactions in pathways and networks and functional attributes to unravel the biological behaviour of tumors. Results We weighted the functional attributes based on various functional properties observed between cancerous and non-cancerous genes reported in the literature. This weighting scheme was then encoded in a Boolean logic framework to rank differentially expressed genes. We identified 17 differentially expressed genes from a total of 11,173 genes, of which ten are reported to be down-regulated via epigenetic inactivation and seven are up-regulated. Here, we report that the overexpressed genes IRAK1, CHEK1 and BUB1 may play an important role in ovarian cancer. We also show that these 17 genes can be used to form an ovarian cancer signature to distinguish normal from ovarian cancer subjects, and that the set of three genes, CHEK1, AR, and LYN, can be used to classify good and poor prognostic tumors. Conclusion We provided a workflow using a Boolean logic schema for the identification of differentially expressed genes by integrating diverse biological information. This integrated approach resulted in the identification of genes as potential biomarkers in ovarian cancer.
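A weighted-attribute scheme combined with a Boolean gate can be sketched as follows. The attribute names and weights are hypothetical placeholders, not those derived in the paper, and the single rule used here (keep a gene only if it is differentially expressed, then rank by summed attribute weights) is an illustrative simplification of the workflow described.

```python
# Hypothetical attribute weights: placeholders, NOT the weights derived by Kumar et al.
WEIGHTS = {"kinase": 0.3, "secreted": 0.2, "transcription_factor": 0.25, "methylated": 0.25}

def rank_genes(genes, weights=WEIGHTS):
    """Filter with a Boolean condition, then rank by summed attribute weights.

    genes maps a gene name to (is_differentially_expressed, set_of_attributes).
    A gene is kept only if it is differentially expressed (the Boolean gate);
    kept genes are ranked by the total weight of the attributes they carry.
    """
    ranked = []
    for gene, (is_de, attrs) in genes.items():
        if not is_de:
            continue
        # Rounding avoids float noise such as 0.5499999999999999 in the printed scores.
        score = round(sum((weights.get(a, 0.0) for a in attrs), 0.0), 6)
        ranked.append((gene, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Invented example genes.
genes = {
    "G1": (True, {"kinase", "methylated"}),
    "G2": (True, {"secreted"}),
    "G3": (False, {"kinase", "secreted", "transcription_factor"}),
    "G4": (True, set()),
}
print(rank_genes(genes))  # [('G1', 0.55), ('G2', 0.2), ('G4', 0.0)]
```

G3 carries the most attributes but is excluded because it fails the expression gate, which is the practical effect of combining a Boolean condition with a weighted score.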
Affiliation(s)
- Gaurav Kumar
- ARC Centre of Excellence in Bioinformatics and Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
6
McDermott JE, Wang J, Mitchell H, Webb-Robertson BJ, Hafen R, Ramey J, Rodland KD. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data. Expert Opin Med Diagn 2012; 7:37-51. [PMID: 23335946 DOI: 10.1517/17530059.2012.718329] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Indexed: 12/22/2022]
Abstract
INTRODUCTION: The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. AREAS COVERED: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. EXPERT OPINION: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers.
7
Wang G, Rong Y, Chen H, Pearson C, Du C, Simha R, Zeng C. Process-driven inference of biological network structure: feasibility, minimality, and multiplicity. PLoS One 2012; 7:e40330. [PMID: 22815739 PMCID: PMC3399897 DOI: 10.1371/journal.pone.0040330] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/05/2012] [Accepted: 06/07/2012] [Indexed: 12/30/2022]
Abstract
A common problem in molecular biology is to use experimental data, such as microarray data, to infer knowledge about the structure of interactions between important molecules in subsystems of the cell. By approximating the state of each molecule as “on” or “off”, it becomes possible to simplify the problem, and exploit the tools of Boolean analysis for such inference. Amongst Boolean techniques, the process-driven approach has shown promise in being able to identify putative network structures, as well as stability and modularity properties. This paper examines the process-driven approach more formally, and makes four contributions about the computational complexity of the inference problem, under the “dominant inhibition” assumption of molecular interactions. The first is a proof that the feasibility problem (does there exist a network that explains the data?) can be solved in polynomial-time. Second, the minimality problem (what is the smallest network that explains the data?) is shown to be NP-hard, and therefore unlikely to result in a polynomial-time algorithm. Third, a simple polynomial-time heuristic is shown to produce near-minimal solutions, as demonstrated by simulation. Fourth, the theoretical framework explains how multiplicity (the number of network solutions to realize a given biological process), which can take exponential-time to compute, can instead be accurately estimated by a fast, polynomial-time heuristic.
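The feasibility question above asks whether some network reproduces an observed process. A minimal sketch of the underlying check follows, assuming the simplest form of dominant inhibition: a node is ON at the next step if and only if at least one of its activators is ON and none of its inhibitors is ON (the paper's exact update rule may include further cases). It verifies that each observed transition follows from one synchronous update. The three-node network and trajectory are invented.

```python
def step(network, state):
    """One synchronous update under an assumed dominant-inhibition rule:
    a node is ON next step iff at least one activator is ON and no inhibitor is ON."""
    return {
        node: any(state[a] for a in activators) and not any(state[i] for i in inhibitors)
        for node, (activators, inhibitors) in network.items()
    }

def explains(network, trajectory):
    """True if every observed transition s -> t is reproduced by one update of the network."""
    return all(step(network, s) == t for s, t in zip(trajectory, trajectory[1:]))

# Hypothetical network: node -> (activators, inhibitors).
network = {"A": ({"C"}, set()), "B": ({"A"}, set()), "C": ({"B"}, {"A"})}
trajectory = [
    {"A": True, "B": False, "C": False},
    {"A": False, "B": True, "C": False},
    {"A": False, "B": False, "C": True},
    {"A": True, "B": False, "C": False},
]
print(explains(network, trajectory))  # True

smaller = dict(network, C=({"B"}, set()))  # drop the "A inhibits C" edge
print(explains(smaller, trajectory))  # True

wrong = dict(network, B=({"C"}, set()))  # B activated by C instead of A
print(explains(wrong, trajectory))  # False
```

Deleting the A-inhibits-C edge still explains the trajectory, which is the kind of multiplicity the abstract describes and why searching for a minimal network is a separate, harder problem than checking feasibility.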
Affiliation(s)
- Guanyu Wang
- Department of Physics, George Washington University, Washington, D.C., United States of America
- Yongwu Rong
- Department of Mathematics, George Washington University, Washington, D.C., United States of America
- Hao Chen
- Department of Physics, George Washington University, Washington, D.C., United States of America
- Carl Pearson
- Department of Physics, George Washington University, Washington, D.C., United States of America
- Chenghang Du
- Department of Physics, George Washington University, Washington, D.C., United States of America
- Rahul Simha
- Department of Computer Science, George Washington University, Washington, D.C., United States of America
- Chen Zeng
- Department of Physics, George Washington University, Washington, D.C., United States of America
- Department of Physics, Huazhong University of Science and Technology, Wuhan, China
8
Hill SM, Neve RM, Bayani N, Kuo WL, Ziyad S, Spellman PT, Gray JW, Mukherjee S. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology. BMC Bioinformatics 2012; 13:94. [PMID: 22578440 PMCID: PMC3503557 DOI: 10.1186/1471-2105-13-94] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/13/2011] [Accepted: 04/19/2012] [Indexed: 01/21/2023]
Abstract
Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.
Affiliation(s)
- Steven M Hill
- The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands.
9
A Boolean-based systems biology approach to predict novel genes associated with cancer: Application to colorectal cancer. BMC Syst Biol 2011; 5:35. [PMID: 21352556 PMCID: PMC3051904 DOI: 10.1186/1752-0509-5-35] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 10/01/2010] [Accepted: 02/26/2011] [Indexed: 12/21/2022]
Abstract
Background Cancer has remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. We introduce a systems biology approach to study cancer that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge on cancer biology and, as a proof of concept, we apply it to colorectal cancer. Results We first classified all the genes in the human genome into cancer-associated and non-cancer-associated genes based on extensive literature mining. We then selected a set of functional attributes proven to be highly relevant to cancer biology that includes protein kinases, secreted proteins, transcription factors, post-translational modifications of proteins, DNA methylation and tissue specificity. These cancer-associated genes were used to extract 'common cancer fingerprints' through these molecular attributes, and a Boolean logic was implemented in such a way that both the expression data and functional attributes could be rationally integrated, allowing for the generation of a guilt-by-association algorithm to identify novel cancer-associated genes. Finally, these candidate genes are interlaced with the known cancer-related genes in a network analysis aimed at identifying highly conserved gene interactions that impact cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a test case and identify several novel candidate genes that are classified according to their functional attributes. These genes include the following: 1) secreted proteins as potential biomarkers for the early detection of colorectal cancer (FXYD1, GUCA2B, REG3A); 2) kinases as potential drug candidates to prevent tumor growth (CDC42BPB, EPHB3, TRPM6); and 3) potential oncogenic transcription factors (CDK8, MEF2C, ZIC2). 
Conclusion We argue that this is a holistic approach that faithfully mimics cancer characteristics, efficiently predicts novel cancer-associated genes and has universal applicability to the study and advancement of cancer research.
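The guilt-by-association step can be pictured as matching candidate genes against attribute patterns that recur among known cancer genes. The sketch below assumes exact matching of whole attribute sets and a minimum-support cutoff; both, along with the attribute names and genes, are hypothetical simplifications rather than the paper's algorithm.

```python
from collections import Counter

def common_fingerprints(known_cancer_genes, min_support):
    """Attribute sets carried by at least `min_support` known cancer-associated genes."""
    counts = Counter(frozenset(attrs) for attrs in known_cancer_genes.values())
    return {fp for fp, n in counts.items() if n >= min_support}

def guilt_by_association(genes, fingerprints):
    """Return genes that are deregulated AND whose attribute set matches a fingerprint.

    genes maps a gene name to (is_deregulated, set_of_attributes).
    """
    return sorted(name for name, (dereg, attrs) in genes.items()
                  if dereg and frozenset(attrs) in fingerprints)

# Hypothetical data; attribute names echo categories listed in the abstract.
known = {
    "K1": {"kinase", "methylated"},
    "K2": {"kinase", "methylated"},
    "K3": {"secreted"},
}
fps = common_fingerprints(known, min_support=2)
genes = {
    "X": (True, {"kinase", "methylated"}),
    "Y": (False, {"kinase", "methylated"}),
    "Z": (True, {"secreted"}),
}
print(guilt_by_association(genes, fps))  # ['X']
```

Y matches the fingerprint but is not deregulated, and Z is deregulated but its pattern is too rare among known genes, so only X satisfies both conditions of the Boolean AND.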
10
de Ridder J, Gerrits A, Bot J, de Haan G, Reinders M, Wessels L. Inferring combinatorial association logic networks in multimodal genome-wide screens. Bioinformatics 2010; 26:i149-57. [PMID: 20529900 PMCID: PMC2881395 DOI: 10.1093/bioinformatics/btq211] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
MOTIVATION We propose an efficient method to infer combinatorial association logic networks from multiple genome-wide measurements from the same sample. We demonstrate our method on a genetical genomics dataset, in which we search for Boolean combinations of multiple genetic loci that associate with transcript levels. RESULTS Our method provably finds the global solution and is very efficient with runtimes of up to four orders of magnitude faster than the exhaustive search. This enables permutation procedures for determining accurate false positive rates and allows selection of the most parsimonious model. When applied to transcript levels measured in myeloid cells from 24 genotyped recombinant inbred mouse strains, we discovered that nine gene clusters are putatively modulated by a logical combination of trait loci rather than a single locus. A literature survey supports and further elucidates one of these findings. Due to our approach, optimal solutions for multi-locus logic models and accurate estimates of the associated false discovery rates become feasible. Our algorithm, therefore, offers a valuable alternative to approaches employing complex, albeit suboptimal optimization strategies to identify complex models. AVAILABILITY The MATLAB code of the prototype implementation is available on: http://bioinformatics.tudelft.nl/ or http://bioinformatics.nki.nl/.
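For scale, the exhaustive baseline the method is compared against can be written in a few lines for the two-locus case. This sketch is not the authors' algorithm: it assumes a score equal to the absolute difference in mean transcript level between strains where the AND/OR combination is true and false, and uses label permutation to estimate how often a score that large arises by chance. Genotypes and expression values are invented.

```python
import random
from itertools import combinations
from statistics import mean

def score(groups, expr):
    """Absolute difference in mean expression between strains where the rule is true vs false."""
    on = [e for g, e in zip(groups, expr) if g]
    off = [e for g, e in zip(groups, expr) if not g]
    if not on or not off:
        return 0.0
    return abs(mean(on) - mean(off))

def best_pair(genotypes, expr):
    """Exhaustive search over locus pairs combined with AND or OR.

    genotypes: one list of 0/1 alleles per strain. Returns (score, (i, op, j)).
    """
    best = (0.0, None)
    for i, j in combinations(range(len(genotypes[0])), 2):
        for op in ("AND", "OR"):
            groups = [(g[i] and g[j]) if op == "AND" else (g[i] or g[j]) for g in genotypes]
            s = score(groups, expr)
            if s > best[0]:
                best = (s, (i, op, j))
    return best

def permutation_p(genotypes, expr, n_perm=200, seed=0):
    """Fraction of expression shuffles whose best score reaches the observed one (add-one corrected)."""
    rng = random.Random(seed)
    observed = best_pair(genotypes, expr)[0]
    hits = 0
    for _ in range(n_perm):
        shuffled = list(expr)
        rng.shuffle(shuffled)
        if best_pair(genotypes, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented example: 8 strains x 3 loci; expression is high only when loci 0 AND 1 carry allele 1.
genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
expr = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0, 5.1]
print(best_pair(genotypes, expr))  # best rule (0, 'AND', 1), score about 4.05
print(permutation_p(genotypes, expr))
```

The permutation loop repeats the full search for every shuffle, so its cost multiplies the search cost; that is why the abstract stresses that a much faster search is what makes accurate false positive rates affordable. With only eight strains, the resulting p-value here is coarse.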
Affiliation(s)
- Jeroen de Ridder
- Delft Bioinformatics Lab, Delft University of Technology, 2628 CD Delft, The Netherlands
11
Bailly-Bechet M, Braunstein A, Pagnani A, Weigt M, Zecchina R. Inference of sparse combinatorial-control networks from gene-expression data: a message passing approach. BMC Bioinformatics 2010; 11:355. [PMID: 20587029 PMCID: PMC2909222 DOI: 10.1186/1471-2105-11-355] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 10/30/2009] [Accepted: 06/29/2010] [Indexed: 11/18/2022]
Abstract
Background Transcriptional gene regulation is one of the most important mechanisms in controlling many essential cellular processes, including cell development, cell-cycle control, and the cellular response to variations in environmental conditions. Genes are regulated by transcription factors and other genes/proteins via a complex interconnection network. Such regulatory links may be predicted using microarray expression data, but most regulation models suppose transcription factor independence, which leads to spurious links when many genes have highly correlated expression levels. Results We propose a new algorithm to infer combinatorial control networks from gene-expression data. Based on a simple model of combinatorial gene regulation, it includes a message-passing approach which avoids explicit sampling over putative gene-regulatory networks. This algorithm is shown to recover the structure of a simple artificial cell-cycle network model for baker's yeast. It is then applied to a large-scale yeast gene expression dataset in order to identify combinatorial regulations, and to a data set of direct medical interest, namely the Pleiotropic Drug Resistance (PDR) network. Conclusions The algorithm we designed is able to recover biologically meaningful interactions, as shown by recent experimental results [1]. Moreover, new cases of combinatorial control are predicted, showing how simple models taking this phenomenon into account can lead to informative predictions and allow the extraction of more putative regulatory interactions from microarray databases.
Affiliation(s)
- Marc Bailly-Bechet
- ISI Foundation Viale Settimio Severo 65, Villa Gualino, I-10133 Torino, Italy
12
Park I, Lee KH, Lee D. Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets. Bioinformatics 2010; 26:1506-12. [PMID: 20410052 DOI: 10.1093/bioinformatics/btq207] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
MOTIVATION Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. RESULTS We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. AVAILABILITY Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
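The significance measure above rests on the class entropy of the samples a rule selects: a rule that picks out only cancer samples has entropy 0, while an even mix of two classes has entropy 1 bit. A minimal sketch follows, with gene-set states and labels invented for illustration (the rule mirrors the glutathione/prostaglandin example in the abstract); how the paper converts this entropy into a significance level is not reproduced here.

```python
from collections import Counter
from math import log2

def class_entropy(labels):
    """Shannon entropy (bits) of the class labels; 0.0 means a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    # p * log2(1/p) summed over classes; this form avoids a -0.0 result for pure sets.
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def rule_entropy(samples, labels, rule):
    """Entropy of class labels among samples selected by a Boolean rule, plus the count selected."""
    selected = [lab for s, lab in zip(samples, labels) if rule(s)]
    return class_entropy(selected), len(selected)

# Invented gene-set activity states for four samples.
samples = [
    {"glutathione": "down", "prostaglandin": "down"},
    {"glutathione": "down", "prostaglandin": "down"},
    {"glutathione": "down", "prostaglandin": "up"},
    {"glutathione": "up", "prostaglandin": "down"},
]
labels = ["cancer", "cancer", "normal", "normal"]

def rule(s):
    return s["glutathione"] == "down" and s["prostaglandin"] == "down"

print(class_entropy(labels))                 # 1.0
print(rule_entropy(samples, labels, rule))   # (0.0, 2)
```

Neither gene set alone isolates the cancer samples in this toy data, but their conjunction selects a pure group, which is the sense in which gene sets can contribute synergistically to a rule.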
Affiliation(s)
- Inho Park
- Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea