1
|
Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther 2022; 7:156. [PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 07/04/2021] [Revised: 03/14/2022] [Accepted: 04/05/2022] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
|
2
|
Zannotti A, Greco S, Pellegrino P, Giantomassi F, Delli Carpini G, Goteri G, Ciavattini A, Ciarmela P. Macrophages and Immune Responses in Uterine Fibroids. Cells 2021; 10:cells10050982. [PMID: 33922329 PMCID: PMC8146588 DOI: 10.3390/cells10050982] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/25/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 12/12/2022] Open
Abstract
Uterine fibroids are the most common benign tumors of the uterus and are considered a typical fibrotic disorder: the extracellular matrix (ECM) proteins, above all collagen 1A1, fibronectin and versican, are upregulated in this pathology. The etiology of uterine fibroids has not yet been clarified, which remains an important obstacle to their resolution. A model has been proposed according to which the formation of an altered ECM could be the result of excessive wound healing, in turn driven by a dysregulated inflammation process. Many molecules act in the complex inflammatory response. Macrophages have great flexibility, since they can assume different phenotypes leading to the tissue repair process. The dysregulation of macrophage proliferation, accumulation and infiltration could lead to uncontrolled tissue repair and to the consequent pathological fibrosis. In addition, molecules such as monocyte chemoattractant protein-1 (MCP-1), granulocyte macrophage-colony-stimulating factor (GM-CSF), transforming growth factor-beta (TGF-β), activin A and tumor necrosis factor-alpha (TNF-α) were demonstrated to play an important role in macrophage action within the uncontrolled tissue repair that contributes to the pathological fibrosis typical of uterine fibroids.
Affiliation(s)
- Alessandro Zannotti
  - Department of Specialist and Odontostomatological Clinical Sciences, Università Politecnica delle Marche, 60126 Ancona, Italy (A.Z., G.D.C., A.C.)
  - Department of Experimental and Clinical Medicine, Università Politecnica delle Marche, 60126 Ancona, Italy (S.G., P.P.)
- Stefania Greco
  - Department of Experimental and Clinical Medicine, Università Politecnica delle Marche, 60126 Ancona, Italy (S.G., P.P.)
- Pamela Pellegrino
  - Department of Experimental and Clinical Medicine, Università Politecnica delle Marche, 60126 Ancona, Italy (S.G., P.P.)
- Federica Giantomassi
  - Department of Biomedical Sciences and Public Health, Università Politecnica delle Marche, 60126 Ancona, Italy (F.G., G.G.)
- Giovanni Delli Carpini
  - Department of Specialist and Odontostomatological Clinical Sciences, Università Politecnica delle Marche, 60126 Ancona, Italy (A.Z., G.D.C., A.C.)
- Gaia Goteri
  - Department of Biomedical Sciences and Public Health, Università Politecnica delle Marche, 60126 Ancona, Italy (F.G., G.G.)
- Andrea Ciavattini
  - Department of Specialist and Odontostomatological Clinical Sciences, Università Politecnica delle Marche, 60126 Ancona, Italy (A.Z., G.D.C., A.C.)
- Pasquapina Ciarmela
  - Department of Experimental and Clinical Medicine, Università Politecnica delle Marche, 60126 Ancona, Italy (S.G., P.P.)
  - Correspondence: Tel.: +39-071-220-6270
|
3
|
Nersisyan S, Galatenko A, Galatenko V, Shkurnikov M, Tonevitsky A. miRGTF-net: Integrative miRNA-gene-TF network analysis reveals key drivers of breast cancer recurrence. PLoS One 2021; 16:e0249424. [PMID: 33852600 PMCID: PMC8046230 DOI: 10.1371/journal.pone.0249424] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/29/2020] [Accepted: 03/17/2021] [Indexed: 12/14/2022] Open
Abstract
Analysis of regulatory networks is a powerful framework for identification and quantification of intracellular interactions. We introduce miRGTF-net, a novel tool for construction of miRNA-gene-TF networks. We consider multiple transcriptional and post-transcriptional interaction types, including regulation of gene and miRNA expression by transcription factors, gene silencing by miRNAs, and co-expression of host genes with their intronic miRNAs. The underlying algorithm uses information on experimentally validated interactions as well as integrative miRNA/mRNA expression profiles in a given set of samples. The latter ensures simultaneous tissue-specificity and biological validity of interactions. We applied miRGTF-net to paired miRNA/mRNA-sequencing data of breast cancer samples from The Cancer Genome Atlas (TCGA). Together with topological analysis of the constructed network we showed that considered players can form reliable prognostic gene signatures for ER-positive breast cancer. A number of signatures demonstrated remarkably high accuracy on transcriptomic data obtained by both microarrays and RNA sequencing from several independent patient cohorts. Furthermore, an essential part of prognostic genes were identified as direct targets of transcription factor E2F1. The putative interplay between estrogen receptor alpha and E2F1 was suggested as a potential recurrence factor in patients treated with tamoxifen. Source codes of miRGTF-net are available at GitHub (https://github.com/s-a-nersisyan/miRGTF-net).
Affiliation(s)
- Stepan Nersisyan
  - Faculty of Biology and Biotechnology, HSE University, Moscow, Russia
- Alexei Galatenko
  - Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia
  - Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia
- Vladimir Galatenko
  - Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia
- Maxim Shkurnikov
  - P.A. Hertsen Moscow Oncology Research Center, Branch of National Medical Research Radiological Center, Ministry of Health of the Russian Federation, Moscow, Russia
|
4
|
Whole Genome 5'-Methylcytosine Level Quantification in Cirrhotic HCV-Infected Egyptian Patients with and without Hepatocellular Carcinoma. Int J Genomics 2020; 2020:1769735. [PMID: 33083446 PMCID: PMC7556053 DOI: 10.1155/2020/1769735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2020] [Revised: 08/26/2020] [Accepted: 09/20/2020] [Indexed: 11/18/2022] Open
Abstract
DNA methylation is an epigenetic mechanism that cells use to control gene expression; it is a commonly used epigenetic signaling tool that can hold genes in the “off” position. Chronic infection with hepatitis C virus (HCV) is considered a major risk factor for chronic liver impairment and is the most common cause of hepatocellular carcinoma (HCC). The present work is aimed at studying whole genome 5′-methylcytosine levels in cirrhotic HCV-infected Egyptian patients. The study included 120 Egyptian adults, divided into two groups: group I (40 apparently healthy control subjects) and group II (80 HCV-infected patients). Group II was further subdivided into two subgroups according to the presence of HCC: group IIa (HCV without HCC) and group IIb (HCV with HCC). In all studied subjects, the level of 5′-methylcytosine% was measured in peripheral blood. The median 5′-methylcytosine% was 2.5 in the control group (group I), 2.45 in the HCV group (group IIa), and 2.25 in the HCC group (group IIb). A stepwise decrease in 5′-methylcytosine% from the control group (group I) toward the HCC group (group IIb) was observed, although this stepwise global hypomethylation was not statistically significant (p = 0.811). There was a negative correlation between ALT and 5′-methylcytosine% (p = −0.029). From this study, we can conclude that global DNA 5′-methylcytosine% does not differ in HCV-infected cirrhotic patients and HCC patients when compared to normal controls. Consequently, we conclude that there is no impact of 5′-methylcytosine% on the development of liver cirrhosis or HCC. Moreover, the negative correlation between 5′-methylcytosine% and serum ALT level denotes a trend of decreasing 5′-methylcytosine% with more liver damage.
|
5
|
Lu K, Yang K, Niyongabo E, Shu Z, Wang J, Chang K, Zou Q, Jiang J, Jia C, Liu B, Zhou X. Integrated network analysis of symptom clusters across disease conditions. J Biomed Inform 2020; 107:103482. [PMID: 32535270 DOI: 10.1016/j.jbi.2020.103482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/24/2019] [Revised: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 10/24/2022]
Abstract
Identifying symptom clusters (two or more related symptoms) with shared underlying molecular mechanisms is a vital analysis task for promoting symptom science and precision health. Related studies have applied clustering algorithms (e.g., k-means, latent class models) to detect symptom clusters, mostly from various kinds of clinical data. In addition, they focused on identifying symptom clusters (SCs) for a specific disease and were mainly concerned with clinical regularities for symptom management. Here, we utilized a network-based clustering algorithm (i.e., BigCLAM) to obtain 208 typical SCs across disease conditions on a large-scale symptom network derived from integrated high-quality disease-symptom associations. Furthermore, we evaluated the underlying shared molecular mechanisms for SCs, i.e., shared genes, protein-protein interactions (PPIs) and gene functional annotations, using integrated networks and similarity measures. We found that the symptoms in the same SC tend to share more genes and PPIs and to have higher functional homogeneity. In addition, we found that most SCs have related symptoms with shared underlying molecular mechanisms (e.g., enriched pathways) across different disease conditions. Our work demonstrated that the integrated network analysis method could be used for identifying robust SCs and investigating the molecular mechanisms of these SCs, which would be valuable for symptom science and precision health.
Affiliation(s)
- Kezhi Lu
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kuo Yang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Edouard Niyongabo
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Zixin Shu
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jingjing Wang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kai Chang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Qunsheng Zou
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jiyue Jiang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Caiyan Jia
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Baoyan Liu
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
- Xuezhong Zhou
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
|
6
|
Guo L, Zhang A, Xiong J. Identification of specific microRNA-messenger RNA regulation pairs in four subtypes of breast cancer. IET Syst Biol 2020; 14:120-126. [PMID: 32406376 PMCID: PMC8687302 DOI: 10.1049/iet-syb.2019.0086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2019] [Revised: 10/04/2019] [Accepted: 12/13/2019] [Indexed: 01/01/2023] Open
Abstract
Four subtypes of breast cancer (luminal A, luminal B, basal-like, and human epidermal growth factor receptor-enriched) have been identified based on gene expression profiles of human tumours. The goal of this study is to find whether the same group of genes exhibits different regulatory networks among the four subtypes. Differentially expressed genes between each of the four subtypes and the normal samples were identified. The overlaps between the four groups of differentially expressed genes were used to construct regulatory networks for each of the four subtypes. Univariate and multivariate Cox regressions were employed to test the genes in the four regulatory networks. This study demonstrated that the genes common to the four subtypes showed different regulation. Also, the hsa-miR-182 and decorin pair performs different functions among the four subtypes of breast cancer. The results indicated that the heterogeneity of breast cancer is reflected not only in the different expression patterns among different genes, but also in the different regulatory networks of the same group of genes.
Affiliation(s)
- Ling Guo
  - College of Electrical Engineering, Northwest University for Nationalities, Lanzhou, 730030, People's Republic of China
- Aihua Zhang
  - College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, 730050, People's Republic of China
- Jie Xiong
  - Department of Applied Mathematics, Changsha University, Changsha, 410022, People's Republic of China
|
7
|
Mallik S, Bandyopadhyay S. WeCoMXP: Weighted Connectivity Measure Integrating Co-Methylation, Co-Expression and Protein-Protein Interactions for Gene-Module Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:690-703. [PMID: 30183644 DOI: 10.1109/tcbb.2018.2868348] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
The identification of modules (groups of several tightly interconnected genes) in a gene interaction network is an essential task for better understanding the architecture of the whole network. In this article, we develop a novel weighted connectivity measure integrating co-methylation, co-expression, and protein-protein interactions (called WeCoMXP) to detect gene-modules in multi-omics datasets. The proposed measure goes beyond the fundamental degree centrality measure by considering a formulation of higher-order connections. Thereafter, we apply the average linkage clustering method using the corresponding dissimilarity (distance) values of WeCoMXP scores, and utilize a dynamic tree cut method for identifying gene-modules. We validate the modules through literature search, KEGG pathway, and gene-ontology analyses on the genes representing the modules. Furthermore, we identify the top 10 TFs/miRNAs that are connected with the maximum number of gene-modules and that regulate/target the maximum number of genes from these connected gene-modules. Moreover, our proposed method performs better than the existing methods on several cluster-validity indices in most cases.
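The clustering step named in the WeCoMXP abstract above (average linkage on dissimilarity values) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the similarity matrix is made up, the conversion `1 - score` is an assumed way to turn a connectivity score into a distance, and a fixed number of clusters stands in for the dynamic tree cut, which is not reproduced here.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a symmetric distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical connectivity scores between four genes (higher = more connected).
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
# Assumed conversion from a similarity score to a dissimilarity (distance).
distance = [[1.0 - s for s in row] for row in similarity]
modules = average_linkage(distance, n_clusters=2)
```

With these toy scores, genes 0 and 1 end up in one module and genes 2 and 3 in the other. For real data, a library routine such as hierarchical clustering in SciPy would be the more practical choice.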
|
8
|
Wu HY, Wei Y, Pan SL. Down-regulation and clinical significance of miR-7-2-3p in papillary thyroid carcinoma with multiple detecting methods. IET Syst Biol 2019; 13:225-233. [PMID: 31538956 PMCID: PMC8687168 DOI: 10.1049/iet-syb.2019.0025] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/07/2019] [Revised: 05/30/2019] [Accepted: 06/10/2019] [Indexed: 04/05/2024] Open
Abstract
Altered miRNA expression participates in the biological progression of thyroid carcinoma and can function as a diagnostic marker or therapeutic agent. However, the role of miR-7-2-3p is currently unclear. The authors' study was the first investigation of miR-7-2-3p expression level and diagnostic ability in several public databases. Potential target genes were obtained from DIANA Tools, and function enrichment analysis was then performed. Furthermore, the authors examined expression levels of potential targets in the Human Protein Atlas (HPA) and The Cancer Genome Atlas (TCGA). Finally, the potential transcription factors (TFs) were predicted by JASPAR. TCGA, GSE62054, GSE73182, GSE40807, and GSE55780 revealed that miR-7-2-3p expression in papillary thyroid carcinoma (PTC) tissues was notably lower than in non-tumour tissues, while its expression in E-MTAB-736 showed no remarkable difference. Function enrichment analysis showed that 698 genes were enriched in pathways including pathways in cancer and glioma. CCND1, GSK3B, and ITGAV, from pathways in cancer, were inversely correlated with miR-7-2-3p at both the post-transcriptional and protein levels. According to the TF prediction, the prospective upstream TFs of miR-7-2-3p were ISX, SPI1, PRRX1, and BARX1. MiR-7-2-3p was significantly down-regulated and may act on PTC progression through crucial pathways. However, the mechanisms of miR-7-2-3p need further investigation.
Affiliation(s)
- Hua-Yu Wu
  - Department of Cell Biology and Genetics, School of Pre-clinical Medicine, Guangxi Medical University, Nanning, 530021, Guangxi Zhuang Autonomous Region, People's Republic of China
- Yi Wei
  - Department of Pathophysiology, School of Pre-clinical Medicine, Guangxi Medical University, Nanning, 530021, Guangxi Zhuang Autonomous Region, People's Republic of China
- Shang-Ling Pan
  - Department of Pathophysiology, School of Pre-clinical Medicine, Guangxi Medical University, Nanning, 530021, Guangxi Zhuang Autonomous Region, People's Republic of China
|
9
|
Shamsizadeh S, Goliaei S, Razaghi Moghadam Z. CAMIRADA: Cancer microRNA association discovery algorithm, a case study on breast cancer. J Biomed Inform 2019; 94:103180. [PMID: 31039404 DOI: 10.1016/j.jbi.2019.103180] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/26/2019] [Revised: 04/04/2019] [Accepted: 04/17/2019] [Indexed: 12/18/2022]
Abstract
Recent studies have identified microRNAs, a class of non-protein-coding RNAs, as biomarkers for the early diagnosis and treatment of cancer, which can decrease cancer mortality. A microRNA may target hundreds or thousands of genes and a gene may regulate several microRNAs, so determining which microRNA is associated with which cancer is a big challenge. Many computational methods have been developed to detect microRNA associations with cancer, but more accurate methods are still needed. A growing body of research has shown that the relationship between microRNAs and TFs plays a significant role in the diagnosis of cancer. Therefore, we developed a new computational framework (CAMIRADA) to identify cancer-related microRNAs based on the relationship between microRNAs and disease genes (DG) in the protein network, the functional relationships between microRNAs and Transcription Factors (TF) on the co-expression network, and the relationship between microRNAs and Differentially Expressed Genes (DEG) on the co-expression network. CAMIRADA was applied to assess breast cancer data from two databases, HMDD and miR2Disease. In this study, the AUC for the top 65 microRNAs in the ranked list was 0.95, which was more accurate than the similar methods used to detect microRNAs associated with the cancer artery.
Affiliation(s)
- Sepideh Shamsizadeh
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran.
- Sama Goliaei
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran.
- Zahra Razaghi Moghadam
  - Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran; Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
|
10
|
Diseases and their clinical heterogeneity – Are we ignoring the SNiPers and micRomaNAgers? An illustration using Beta-thalassemia clinical spectrum and fetal hemoglobin levels. Genomics 2019; 111:67-75. [DOI: 10.1016/j.ygeno.2018.01.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/31/2017] [Revised: 12/18/2017] [Accepted: 01/03/2018] [Indexed: 12/18/2022]
|
11
|
Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
|
12
|
Li C, Dinu V. miR2Pathway: A novel analytical method to discover MicroRNA-mediated dysregulated pathways involved in hepatocellular carcinoma. J Biomed Inform 2018; 81:31-40. [PMID: 29578099 DOI: 10.1016/j.jbi.2018.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2017] [Revised: 02/10/2018] [Accepted: 03/21/2018] [Indexed: 10/17/2022]
Abstract
MicroRNAs (miRNAs) are small, non-coding RNAs involved in the regulation of gene expression at a post-transcriptional level. Recent studies have shown miRNAs as key regulators of a variety of biological processes, such as proliferation, differentiation, apoptosis, metabolism, etc. Aberrantly expressed miRNAs influence individual gene expression level, but rewired miRNA-mRNA connections can influence the activity of biological pathways. Here, we define rewired miRNA-mRNA connections as the differential (rewiring) effects on the activity of biological pathways between hepatocellular carcinoma (HCC) and normal phenotypes. Our work presented here uses a PageRank-based approach to measure the degree of miRNA-mediated dysregulation of biological pathways between HCC and normal samples based on rewired miRNA-mRNA connections. In our study, we regard the degree of miRNA-mediated dysregulation of biological pathways as disease risk of biological pathways. Therefore, we propose a new method, miR2Pathway, to measure and rank the degree of miRNA-mediated dysregulation of biological pathways by measuring the total differential influence of miRNAs on the activity of pathways between HCC and normal states. miR2Pathway proposed here systematically shows the first evidence for a mechanism of biological pathways being dysregulated by rewired miRNA-mRNA connections, and provides new insight into exploring mechanisms behind HCC. Thus, miR2Pathway is a novel method to identify and rank miRNA-dysregulated pathways in HCC.
Affiliation(s)
- Chaoxing Li
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA.
- Valentin Dinu
  - Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85255, USA.
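The miR2Pathway abstract above describes a PageRank-based comparison of miRNA-mRNA networks between HCC and normal samples. The sketch below shows the general idea only, under stated assumptions: the node names and edges are invented, `rewiring_score` is a hypothetical helper, and summing absolute PageRank differences over a pathway's genes is an illustrative stand-in, not the scoring defined in the paper.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    out = {n: [] for n in nodes}
    for s, t in edges:
        out[s].append(t)
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1.0 - damping) / n_nodes + damping * dangling / n_nodes
               for n in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank


def rewiring_score(normal_edges, tumor_edges, nodes, pathway):
    """Hypothetical score: total absolute PageRank change over a pathway's genes."""
    r_normal = pagerank(normal_edges, nodes)
    r_tumor = pagerank(tumor_edges, nodes)
    return sum(abs(r_tumor[g] - r_normal[g]) for g in pathway)


# Toy networks: in the "tumor" network miR-B loses its target and miR-A gains one.
nodes = ["miR-A", "miR-B", "G1", "G2", "G3"]
normal = [("miR-A", "G1"), ("miR-B", "G2"), ("G1", "G3")]
tumor = [("miR-A", "G1"), ("miR-A", "G2"), ("G1", "G3")]
score = rewiring_score(normal, tumor, nodes, ["G1", "G2", "G3"])
```

Ranking pathways by such a score would put the most heavily rewired ones first; an unchanged network yields a score of zero.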
|
13
|
Bandyopadhyay S, Mallik S. Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:673-687. [PMID: 28114033 DOI: 10.1109/tcbb.2016.2636207] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 06/06/2023]
Abstract
Identification of combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers ( s) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous data as well as a (post-discretized) boolean data based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., ) is introduced which is computed on the integrated continuous data for identifying initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified using a well-known biclustering algorithm applied on the integrated boolean data of the determined non-redundant set of genes. A novel sample-based weighted support ( ) is then proposed that is consecutively calculated on the integrated boolean data of the determined non-redundant set of genes in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential s. Since our proposed method generates a smaller number of significant non-redundant genesets than those by other popular methods, the method is much faster than the others. Application of the proposed technique on an expression and a methylation data for Uterine tumor or Prostate Carcinoma produces a set of significant combination of markers. We expect that such a combination of markers will produce lower false positives than individual markers.
|
14
|
Mallik S, Zhao Z. Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles. QUANTITATIVE BIOLOGY 2017; 5:302-327. [PMID: 30221015 DOI: 10.1007/s40484-017-0119-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Background: Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles. Methods: We first collect the genes having both expression and methylation values in AML as well as PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. Thus, we now have four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, "mRMR", is then utilized on each part. This results in a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) on the expression/methylation data for the top selected normally distributed genes and non-normally distributed genes, respectively. We then use a recent weighted ARM method, "RANWAR", to combine all/specific resultant genes to generate top oncogenic rules along with respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene-Ontology (GO) analyses using the Enrichr database for in silico validation of the prioritized oncogenes as the markers and for labeling the markers as existing or novel. Results: The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 as up-regulated, & KRT17 as down-regulated), and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both as hypo-methylated, & KRT17 and DYDC1 both as down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 as up-regulated and hypo-methylated, & APBA2 and C4orf31 both as down-regulated and hyper-methylated). Conclusion: The identified novel markers might have critical roles in AML as well as PC. The approach can be applied to other complex diseases.
Affiliation(s)
- Saurav Mallik
  - Computer Science & Engineering, Aliah University, Newtown, Newtown 700156, India
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
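Two of the statistical steps named in the Mallik and Zhao abstract above, the Jarque-Bera normality test and the Welch t-test, follow standard textbook formulas that can be sketched with the Python standard library. The function names and sample values are illustrative assumptions, not data or code from the paper; the Shrink t-test, mRMR and RANWAR steps are not covered, and the Welch function returns only the statistic and degrees of freedom (a p-value would need a t-distribution routine such as the one in SciPy).

```python
import math
from statistics import mean, variance


def jarque_bera(x):
    """Jarque-Bera statistic and asymptotic p-value (chi-square with 2 df)."""
    n = len(x)
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up values for one gene: a symmetric sample and a heavily skewed one.
symmetric = [1, 2, 3, 4, 5]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]
jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)

# Made-up expression values in two groups (e.g. tumor vs. normal).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In a pipeline of the kind the abstract describes, genes whose Jarque-Bera p-value stays above a chosen threshold would be routed to the parametric test and the rest to a non-parametric alternative. The asymptotic p-value is unreliable for samples as small as these toy ones.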
|
15
|
Lu S, Ma S, Wang Y, Huang T, Zhu Z, Zhao G. Mus musculus-microRNA-449a ameliorates neuropathic pain by decreasing the level of KCNMA1 and TRPA1, and increasing the level of TPTE. Mol Med Rep 2017; 16:353-360. [DOI: 10.3892/mmr.2017.6559] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/25/2016] [Accepted: 03/13/2017] [Indexed: 11/06/2022] Open
|
16
|
Mallik S, Bhadra T, Maulik U. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data. IEEE Trans Nanobioscience 2017; 16:3-10. [PMID: 28092570 DOI: 10.1109/tnb.2017.2650217] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/08/2022]
Abstract
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. Firstly, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top-ranked genes. For statistical validation, we apply the t-test to both the expression and methylation data consisting of only the normally distributed top-ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package to perform a non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top-ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method and other methods based on classification performances obtained using several well-known classifiers.
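The maximal-relevance, minimal-redundancy (mRMR) criterion mentioned in the abstract above can be sketched as a greedy selection over discretized features. This is the plain difference form of mRMR, not the variable-weighted variant the authors propose; the gene names and binary values are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR: maximize relevance minus mean redundancy with selected features."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    while len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical discretized profiles (0 = low, 1 = high) for eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],      # strongly relevant
    "g1_dup": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of g1, fully redundant
    "g3": [0, 0, 1, 0, 1, 1, 1, 0],      # weaker but complementary
}
top_genes = mrmr(features, labels, k=2)
```

Here the duplicate of `g1` is skipped in favor of `g3`, even though the duplicate is individually more relevant, which is the behavior the redundancy penalty is meant to produce.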
|
17
|
Mallik S, Sen S, Maulik U. IDPT: Insights into potential intrinsically disordered proteins through transcriptomic analysis of genes for prostate carcinoma epigenetic data. Gene 2016; 586:87-96. [DOI: 10.1016/j.gene.2016.03.056] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/02/2015] [Revised: 02/22/2016] [Accepted: 03/30/2016] [Indexed: 12/13/2022]
|