1. Wang R, Fang L, Wang Y, Jin J. Identifying Effect Modification of Latent Population Characteristics on Risk Factors with a Sparse Varying Coefficient Regression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.30.626101. [PMID: 39677704 PMCID: PMC11642784 DOI: 10.1101/2024.11.30.626101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Leveraging observational data to understand the associations between risk factors and disease outcomes, and to conduct disease risk prediction, is a common task in epidemiology. While traditional linear regression and other machine learning models have been extensively applied to this task, they typically treat the associations between risk factors and disease outcomes as fixed. In many cases, however, such associations may vary with underlying features of the individuals, such as subpopulation characteristics and environmental factors. Although data on these latent features may not be available, the observed risk factor data may capture part of the variation in these features. Extracting latent factors from the risk factors and incorporating this effect modification into the model may therefore better capture the underlying data structure and improve inference. We develop a novel regression model in which some coefficients vary as functions of latent features extracted from the risk factors. Simulation studies demonstrate the superiority of our approach in various data settings. In an application to a dataset of lung cancer patients from The Cancer Genome Atlas (TCGA) Program, our approach led to a 6% to 118% increase in (AUC − 0.5) for distinguishing between lung cancer stages compared with the classic lasso and elastic net regressions, and it identified interesting latent effect modifications associated with certain gene pathways.
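The (AUC − 0.5) metric in this abstract measures discrimination above the chance level of 0.5, so a relative increase in it is larger than the corresponding relative increase in AUC itself. The Python sketch below assumes the metric is computed as (AUC_new − 0.5)/(AUC_ref − 0.5) − 1 and estimates AUC with the pairwise (Mann-Whitney) comparison formula; the function names and example numbers are hypothetical, and the paper's exact computation (for example, across several stage comparisons) may differ.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def relative_gain_over_chance(auc_new, auc_ref):
    """Relative increase in (AUC - 0.5), i.e. in discrimination above random guessing.

    Assumed definition for illustration; the reference model must beat chance.
    """
    if auc_ref <= 0.5:
        raise ValueError("reference AUC must exceed 0.5")
    return (auc_new - 0.5) / (auc_ref - 0.5) - 1.0


if __name__ == "__main__":
    # Hypothetical AUCs, not values from the paper.
    print(relative_gain_over_chance(0.62, 0.56))
```

For instance, hypothetical AUCs of 0.62 versus 0.56 give a 100% increase in (AUC − 0.5), even though AUC itself rises by only about 11%.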
2. Jin J, Wang Y. T2-DAG: a powerful test for differentially expressed gene pathways via graph-informed structural equation modeling. Bioinformatics 2022; 38:1005-1014. [PMID: 34755844 PMCID: PMC8796375 DOI: 10.1093/bioinformatics/btab770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 11/01/2021] [Accepted: 11/04/2021] [Indexed: 02/04/2023]
Abstract
MOTIVATION A major task in genetic studies is to identify genes related to human diseases and traits in order to understand the functional characteristics of genetic mutations and enhance patient diagnosis. Compared with marginal analyses of individual genes, identification of gene pathways, i.e. sets of genes with known interactions that collectively contribute to specific biological functions, can provide more biologically meaningful results. Such gene pathway analysis can be formulated as a high-dimensional two-sample testing problem. Given the typically limited sample size of gene expression datasets, most existing two-sample tests tend to have compromised power because they ignore, or only inefficiently incorporate, the auxiliary pathway information on gene interactions. RESULTS We propose T2-DAG, a Hotelling's T2-type test for detecting differentially expressed gene pathways, which efficiently leverages the auxiliary pathway information on gene interactions from existing pathway databases through a linear structural equation model. We further establish its asymptotic distribution under pertinent assumptions. Simulation studies under various scenarios show that T2-DAG outperforms several representative existing methods, with well-controlled type-I error rates and substantially improved power, even with incomplete or inaccurate pathway information or unadjusted confounding effects. We also illustrate the performance of T2-DAG in an application to detect differentially expressed KEGG pathways between different stages of lung cancer. AVAILABILITY AND IMPLEMENTATION The R (R Development Core Team, 2021) package T2DAG, which implements the proposed T2-DAG test, is available on GitHub at https://github.com/Jin93/T2DAG. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
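T2-DAG is described as a Hotelling's T2-type test. For orientation, the Python sketch below computes the classical two-sample Hotelling's T² statistic with a pooled covariance, T² = n1·n2/(n1 + n2) · dᵀ S_p⁻¹ d with d the difference of group means, together with its F transform F = (n1 + n2 − p − 1)/(p(n1 + n2 − 2)) · T², which follows F(p, n1 + n2 − p − 1) under multivariate normality with equal covariances. This is not the T2-DAG statistic, which additionally uses pathway structure through a structural equation model (the authors' implementation is the T2DAG package). The classical statistic needs an invertible pooled covariance, which fails when the number of genes p exceeds n1 + n2 − 2, one reason small expression samples are problematic. Function names are illustrative, and only the standard library is used, so no p-value is computed.

```python
def mean_vector(X):
    """Column means of a list-of-rows data matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def scatter(X, m):
    """Sum of outer products of deviations, i.e. (n - 1) times the sample covariance."""
    p = len(m)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - m[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("pooled covariance is singular (e.g. p > n1 + n2 - 2)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def hotelling_t2(X1, X2):
    """Classical two-sample Hotelling's T^2; returns (T^2, F, (df1, df2))."""
    n1, n2, p = len(X1), len(X2), len(X1[0])
    if n1 + n2 - p - 1 <= 0:
        raise ValueError("need n1 + n2 > p + 1")
    m1, m2 = mean_vector(X1), mean_vector(X2)
    S1, S2 = scatter(X1, m1), scatter(X2, m2)
    # Pooled covariance: combined scatter divided by n1 + n2 - 2.
    Sp = [[(S1[i][j] + S2[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    d = [m1[j] - m2[j] for j in range(p)]
    w = solve(Sp, d)  # w = Sp^{-1} d without forming the inverse
    t2 = (n1 * n2 / (n1 + n2)) * sum(d[j] * w[j] for j in range(p))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)
```

With a single variable (p = 1), T² reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.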
Affiliation(s)
- Jin Jin
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Yue Wang
- School of Mathematical and Natural Sciences, Arizona State University, Glendale, AZ 85306, USA
3. Moncho-Amor V, Pintado-Berninches L, Ibañez de Cáceres I, Martín-Villar E, Quintanilla M, Chakravarty P, Cortes-Sempere M, Fernández-Varas B, Rodriguez-Antolín C, de Castro J, Sastre L, Perona R. Role of Dusp6 Phosphatase as a Tumor Suppressor in Non-Small Cell Lung Cancer. Int J Mol Sci 2019; 20:ijms20082036. [PMID: 31027181 PMCID: PMC6514584 DOI: 10.3390/ijms20082036] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/11/2019] [Revised: 04/09/2019] [Accepted: 04/22/2019] [Indexed: 02/06/2023]
Abstract
DUSP6/MKP3 is a dual-specificity phosphatase that regulates the activity of the extracellular signal-regulated kinases ERK1/2 and ERK5, with an increasingly recognized role as a tumor suppressor. In silico studies of the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases reveal poor prognosis in non-small cell lung cancer (NSCLC) patients with low expression levels of DUSP6. Consistent with these data, here we show that DUSP6 plays a major role in the regulation of cell migration, motility and tumor growth. We found upregulated expression of several genes involved in epithelial-to-mesenchymal transition (EMT) in DUSP6-depleted NSCLC cells. RNA-seq studies carried out in DUSP6-depleted cells identified the EGFR, TGF-β and WNT signaling pathways and several genes, such as VAV3, RUNXR2, LEF1 and FGFR2, whose expression is upregulated in these cells, thereby affecting cellular functions such as integrin-mediated cell adhesion, focal adhesion and motility. Furthermore, in DUSP6-depleted cells the EGF signaling pathway is activated via ERK5 rather than ERK1/2, and TGF-β signaling via SMAD2/3. In summary, DUSP6 is a tumor suppressor in NSCLC, and re-establishing its expression may be a potential strategy to reverse poor outcomes in NSCLC patients.
Affiliation(s)
- Verónica Moncho-Amor
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- The Francis Crick Institute, London NW1 1ST, UK.
- Laura Pintado-Berninches
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Inmaculada Ibañez de Cáceres
- Cancer Epigenetics Laboratory, INGEMM, Hospital Universitario La Paz, 28046 Madrid, Spain.
- Biomarkers and Experimental Therapeutics in Cancer, IdiPAZ, 28046 Madrid, Spain.
- Ester Martín-Villar
- Departamento de Biotecnología-Instituto de Investigaciones Biosanitarias, Facultad de Ciencias Experimentales, Universidad Francisco de Vitoria, 28223 Madrid, Spain.
- Miguel Quintanilla
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Probir Chakravarty
- Bioinformatics, Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK.
- María Cortes-Sempere
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Beatriz Fernández-Varas
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Carlos Rodriguez-Antolín
- Cancer Epigenetics Laboratory, INGEMM, Hospital Universitario La Paz, 28046 Madrid, Spain.
- Biomarkers and Experimental Therapeutics in Cancer, IdiPAZ, 28046 Madrid, Spain.
- Javier de Castro
- Biomarkers and Experimental Therapeutics in Cancer, IdiPAZ, 28046 Madrid, Spain.
- Department of Oncology, Hospital Universitario La Paz, 28046 Madrid, Spain.
- Leandro Sastre
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Biomarkers and Experimental Therapeutics in Cancer, IdiPAZ, 28046 Madrid, Spain.
- CIBER de Enfermedades Raras (CIBERER), 28029 Madrid, Spain.
- Rosario Perona
- Department of Experimental Models of Human Diseases, Instituto de Investigaciones Biomédicas C.S.I.C./U.A.M, 28029 Madrid, Spain.
- Biomarkers and Experimental Therapeutics in Cancer, IdiPAZ, 28046 Madrid, Spain.
- CIBER de Enfermedades Raras (CIBERER), 28029 Madrid, Spain.
4. Mallik S, Zhao Z. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis. Genes (Basel) 2017; 9:E7. [PMID: 29283433 PMCID: PMC5793160 DOI: 10.3390/genes9010007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/02/2017] [Revised: 12/12/2017] [Accepted: 12/12/2017] [Indexed: 01/18/2023]
Abstract
For transcriptomic analysis, there are numerous microarray-based genomic datasets, especially those generated for cancer research. A typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining discovers interesting item sets through a rule-based methodology and therefore has advantages for finding causal relationships between transcripts. In this work, we introduce two new rule-based similarity measures, the weighted rank-based Jaccard and Cosine measures, and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through an association rule-based learning system and the weighted similarity scores. In practice, the resulting list of condensed markers, which includes both singular and complex markers, depends on the condensed gene sets in the antecedents or consequents of the rules in the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we first identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then used to determine the association rules from these genes. Based on these rules, we computed the integrated similarity scores of the rule-based similarity measures for each pair of rules, and the resulting scores were used for clustering to identify co-expressed rule modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
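The abstract does not define the weighting used in the weighted rank-based Jaccard and Cosine measures, so the Python sketch below shows only the plain, unweighted set versions: each association rule is reduced to the set of genes in its antecedent and consequent, and a pairwise similarity matrix is built that could serve as clustering input. The (antecedent, consequent) tuple representation and the gene labels are hypothetical; this is not the authors' weighted measure, their score integration, or RANWAR.

```python
from math import sqrt


def rule_genes(rule):
    """Genes appearing anywhere in a rule given as (antecedent, consequent)."""
    antecedent, consequent = rule
    return set(antecedent) | set(consequent)


def jaccard(a, b):
    """Unweighted Jaccard similarity |A & B| / |A | B| of two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Unweighted set cosine |A & B| / sqrt(|A| * |B|) (binary-vector cosine)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similarity_matrix(rules, measure):
    """Pairwise rule-by-rule similarity matrix for a chosen measure."""
    sets = [rule_genes(r) for r in rules]
    return [[measure(x, y) for y in sets] for x in sets]


if __name__ == "__main__":
    # Hypothetical rules: {g1, g2} => {g3} and {g2} => {g3, g4}.
    rules = [(["g1", "g2"], ["g3"]), (["g2"], ["g3", "g4"])]
    print(similarity_matrix(rules, jaccard))
    print(similarity_matrix(rules, cosine))
```

Reducing a rule to a gene set ignores which side of the rule a gene is on; a weighted or rank-based variant, as the paper proposes, would carry more information than this baseline.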
Affiliation(s)
- Saurav Mallik
- Department of Computer Science & Engineering, Aliah University, Newtown, WB-700156, India.
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
5. Wu ZY, Li JR, Huang MH, Cheng JJ, Li H, Chen JH, Lv XQ, Peng ZG, Jiang JD. Internal driving factors leading to extrahepatic manifestation of the hepatitis C virus infection. Int J Mol Med 2017; 40:1792-1802. [PMID: 29039494 PMCID: PMC5716440 DOI: 10.3892/ijmm.2017.3175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/27/2017] [Accepted: 09/26/2017] [Indexed: 02/07/2023]
Abstract
Hepatitis C virus (HCV) infection is associated with various extrahepatic manifestations, which are correlated with poor outcomes and thus increase the morbidity and mortality of chronic hepatitis C (CHC). Understanding the internal linkages between systemic manifestations and HCV infection is therefore helpful for the treatment of CHC. Yet the mechanism by which the virus evokes systemic diseases remains to be elucidated. In the present study, a comprehensive analysis of mRNA microarray data from HCV-infected and uninfected Huh7.5 cells was conducted using gene set enrichment analysis (GSEA) and signaling pathway impact analysis (SPIA), and significantly activated or inhibited signaling pathways, as well as molecules commonly important in those pathways, were selected. Forty signaling pathways were selected using GSEA, and eight using SPIA. These pathways are associated with cancer, metabolism, environmental information processing and organismal systems, and they provide important information for further clarifying the intrinsic associations between syndromes of HCV infection. Seven of these pathways had not been previously reported: basal transcription factors, pathogenic Escherichia coli infection, shigellosis, gastric acid secretion, dorso-ventral axis formation, amoebiasis and cholinergic synapse. Ten genes whose expression may act as key internal driving factors (SOS1, RAF1, IFNA2, IFNG, MTHFR, IGF1, CALM3, UBE2B, TP53 and BMP7) were selected using the online tool Anni 2.1. Furthermore, the present study demonstrated the internal linkages between systemic manifestations and HCV infection and presented potential molecules that are key to those linkages.
Affiliation(s)
- Zhou-Yi Wu
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Jian-Rui Li
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Meng-Hao Huang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Jun-Jun Cheng
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Hu Li
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Jin-Hua Chen
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Xiao-Qin Lv
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Zong-Gen Peng
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
- Jian-Dong Jiang
- Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, P.R. China
6. Porrello A, Piergentili RB. Contextualizing the Genes Altered in Bladder Neoplasms in Pediatric and Teen Patients Allows Identifying Two Main Classes of Biological Processes Involved and New Potential Therapeutic Targets. Curr Genomics 2016; 17:33-61. [PMID: 27013923 PMCID: PMC4780474 DOI: 10.2174/1389202916666151014222603] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/11/2015] [Revised: 06/29/2015] [Accepted: 07/08/2015] [Indexed: 12/19/2022]
Abstract
Research on bladder neoplasms in pediatric and teen patients (BNPTP) has described 21 genes that are variously involved in this disease and are mostly responsible for deregulated cell proliferation. However, owing to the limited number of publications on this subject, it is still unclear what relationships exist among these genes and how likely it is that, while having different molecular functions, they i) act as downstream effector genes of well-known pro- or anti-proliferative stimuli and/or interplay with biochemical pathways of oncological relevance, or ii) are specific and, possibly, early biomarkers of these pathologies. A Gene Ontology (GO)-based analysis showed that these 21 genes are involved in biological processes that can be split into two main classes: cell regulation-based and differentiation/development-based. To understand their involvement in, and overlap with, the main cancer-related pathways, we performed a meta-analysis based on the 189 oncogenic signatures of the Molecular Signatures Database (OSMSD) curated by the Broad Institute. We generated a binary matrix with the 53 gene signatures having at least one hit; this analysis i) suggests that some genes of the original list show inconsistencies and might need to be experimentally reassessed or evaluated as biomarkers (in particular, ACTA2) and ii) allows the hypothesis that important (proto)oncogenes (E2F3, ERBB2/HER2, CCND1, WNT1, and YAP1) and (putative) tumor suppressors (BRCA1, RBBP8/CTIP, and RB1-RBL2/p130) may participate in the onset of this disease or worsen the observed phenotype, thus expanding the list of possible molecular targets for the treatment of BNPTP.
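The binary gene-by-signature matrix described in this abstract can be sketched in Python as follows, assuming the signatures are available as a mapping from signature name to member genes. Only signatures sharing at least one gene with the input list are kept as columns, matching the "at least one hit" criterion; the gene and signature names in the example are hypothetical placeholders, not real Molecular Signatures Database contents, and the authors' actual procedure may differ.

```python
def binary_hit_matrix(genes, signatures):
    """Build a gene x signature 0/1 matrix, keeping only signatures with at least one hit.

    genes: list of gene symbols (rows, in the given order).
    signatures: dict mapping signature name -> iterable of member genes.
    """
    gene_set = set(genes)
    # Keep signatures in their given order; drop those sharing no gene with the list.
    kept = [name for name, members in signatures.items() if gene_set & set(members)]
    member_sets = {name: set(signatures[name]) for name in kept}
    matrix = [[1 if g in member_sets[name] else 0 for name in kept] for g in genes]
    return kept, matrix


def hits_per_signature(matrix):
    """Column sums: how many listed genes fall in each kept signature."""
    return [sum(col) for col in zip(*matrix)]


if __name__ == "__main__":
    # Hypothetical placeholder data, not real signature memberships.
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    signatures = {
        "SIG_1": ["GENE_A", "GENE_X"],
        "SIG_2": ["GENE_Y"],
        "SIG_3": ["GENE_A", "GENE_B"],
    }
    kept, matrix = binary_hit_matrix(genes, signatures)
    print(kept, matrix, hits_per_signature(matrix))
```

In this representation, a row of all zeros marks a gene that none of the kept signatures contains, and column sums show how strongly each signature overlaps the gene list.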
Affiliation(s)
- A. Porrello
- Comprehensive Cancer Center (LCCC), University of North Carolina (UNC)-Chapel Hill, Chapel Hill, 27599 NC, USA
- R. B. Piergentili
- Institute of Molecular Biology and Pathology at CNR (CNR-IBPM); Department of Biology and Biotechnologies, Sapienza – Università di Roma, Italy
7. Chang JTH, Lee YM, Huang RS. The impact of the Cancer Genome Atlas on lung cancer. Transl Res 2015; 166:568-85. [PMID: 26318634 PMCID: PMC4656061 DOI: 10.1016/j.trsl.2015.08.001] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 06/30/2015] [Accepted: 08/03/2015] [Indexed: 12/11/2022]
Abstract
The Cancer Genome Atlas (TCGA) has profiled more than 10,000 samples derived from 33 types of cancer to date, with the goal of improving our understanding of the molecular basis of cancer and advancing our ability to diagnose, treat, and prevent cancer. This review focuses on lung cancer because it is the leading cause of cancer-related mortality worldwide in both men and women. In particular, non-small cell lung cancers (including lung adenocarcinoma and lung squamous cell carcinoma) were evaluated. Our goal was to demonstrate the impact of TCGA on lung cancer research under 4 themes: diagnostic markers, disease progression markers, novel therapeutic targets, and novel tools. Examples are given related to DNA mutation, copy number variation, messenger RNA and microRNA expression, and methylation profiling.
Affiliation(s)
- Jeremy T-H Chang
- Biological Sciences Collegiate Division, The University of Chicago, Chicago, Ill
- Yee Ming Lee
- Center for Personalized Therapeutics, The University of Chicago, Chicago, Ill