1. Lovis C, Zhang K, Li C, Jiang X, Kim Y. Scalable Causal Structure Learning: Scoping Review of Traditional and Deep Learning Algorithms and New Opportunities in Biomedicine. JMIR Med Inform 2023; 11:e38266. [PMID: 36649070 PMCID: PMC9890349 DOI: 10.2196/38266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 08/30/2022] [Accepted: 09/18/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND: Causal structure learning is the process of identifying causal structures from observational data, and it has multiple applications in biomedicine and health care.
OBJECTIVE: This paper provides a practical review and tutorial on scalable causal structure learning models, with examples on real-world data, to help health care audiences understand and apply them.
METHODS: We reviewed traditional (combinatorial and score-based) methods for causal structure discovery and machine learning-based schemes. Various traditional approaches have been studied to tackle this problem, the most important among these being the Peter-Clark (PC) algorithm, named after Peter Spirtes and Clark Glymour. We then analyzed the literature on score-based methods, which are computationally faster. Because acyclicity can be expressed as a continuous constraint, new deep learning approaches to the problem have emerged alongside the traditional and score-based methods. Such methods can also offer scalability, particularly when there is a large amount of data involving multiple variables. Using our own evaluation metrics and experiments on linear, nonlinear, and benchmark Sachs data, we aimed to highlight the advantages and disadvantages of these methods for the health care community. We also highlighted recent developments in biomedicine where causal structure learning can be applied to discover structures such as gene networks, brain connectivity networks, and those in cancer epidemiology.
RESULTS: We compared the performance of traditional and machine learning-based algorithms for causal discovery on benchmark data sets. On the Sachs data set, Directed Acyclic Graph-Graph Neural Network (DAG-GNN) had the lowest structural Hamming distance (19) and false positive rate (0.13), whereas Greedy Equivalence Search and Max-Min Hill Climbing had the best false discovery rate (0.68) and true positive rate (0.56), respectively.
CONCLUSIONS: Machine learning-based approaches, including deep learning, have many advantages over traditional approaches: they scale to a greater number of variables and can potentially be applied in a wide range of biomedical applications, such as genetics, if sufficient data are available. Furthermore, these models are more flexible than traditional models and are poised to positively affect many applications in the future.
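The results in this abstract are reported as structural Hamming distance, false positive rate, false discovery rate, and true positive rate. As a rough illustration of how such graph-recovery metrics can be computed, the sketch below compares an estimated directed graph with a reference graph. It follows one common convention (a reversed edge counts once toward the distance and also as a false discovery); the review states that it uses its own evaluation metrics, so its exact definitions may differ. The function names and the 4-node toy graphs are hypothetical.

```python
def edge_set(adj):
    """Return the directed edges (i, j) for which adj[i][j] is nonzero."""
    return {(i, j) for i, row in enumerate(adj) for j, v in enumerate(row) if v}


def graph_metrics(true_adj, est_adj):
    """Compare two directed acyclic graphs given as square 0/1 adjacency matrices."""
    d = len(true_adj)
    true_e, est_e = edge_set(true_adj), edge_set(est_adj)

    correct = true_e & est_e                      # right edge, right direction
    wrong = est_e - true_e                        # estimated but not in the truth
    reversed_e = {(i, j) for (i, j) in wrong if (j, i) in true_e}
    extra = wrong - reversed_e                    # no true edge in either direction
    missing = {(i, j) for (i, j) in true_e - est_e if (j, i) not in est_e}

    # Assumed convention: negatives are the node pairs with no true edge.
    n_non_edges = d * (d - 1) // 2 - len(true_e)
    false_pos = len(extra) + len(reversed_e)

    return {
        "shd": len(extra) + len(missing) + len(reversed_e),
        "tpr": len(correct) / max(len(true_e), 1),
        "fdr": false_pos / max(len(est_e), 1),
        "fpr": false_pos / max(n_non_edges, 1),
    }


# Hypothetical toy graphs: the truth is the chain 0 -> 1 -> 2 -> 3.
true_adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# The estimate keeps 0 -> 1, reverses 1 -> 2, adds 0 -> 3, and misses 2 -> 3.
est_adj = [
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

metrics = graph_metrics(true_adj, est_adj)
print(metrics)
```

Lower is better for the structural Hamming distance, false discovery rate, and false positive rate, and higher is better for the true positive rate, which is why different algorithms can each be "best" on a different metric.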
Affiliation(s)
- Kai Zhang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Can Li
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Yejin Kim
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
2. A Novel Machine Learning 13-Gene Signature: Improving Risk Analysis and Survival Prediction for Clear Cell Renal Cell Carcinoma Patients. Cancers (Basel) 2022; 14:2111. [PMID: 35565241 PMCID: PMC9103317 DOI: 10.3390/cancers14092111] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/10/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 02/05/2023]
Abstract
Simple Summary: Clear cell renal cell carcinoma is a type of kidney cancer that accounts for the majority of renal cell carcinomas. Many efforts have been made to identify biomarkers that could help health care professionals better treat this kind of cancer. With extensive public data available, we conducted a machine learning study to determine a gene signature that could indicate patient survival with high accuracy. Using the Min-Redundancy and Max-Relevance (mRMR) algorithm, we generated a signature of 13 genes highly correlated with patient outcomes. These findings reveal potential strategies for personalized medicine in clinical practice.
Abstract: Patients with clear cell renal cell carcinoma (ccRCC) have poor survival outcomes, especially if the disease has metastasized. It is of paramount importance to identify biomarkers in genomic data that could help predict the aggressiveness of ccRCC and its resistance to drugs. Thus, we conducted a study with the aims of evaluating existing gene signatures and proposing a novel one with higher predictive power and better generalization than the former signatures. Using the ccRCC cohorts of The Cancer Genome Atlas (TCGA-KIRC) and the International Cancer Genome Consortium (ICGC-RECA), we evaluated linear survival models of Cox regression with 14 signatures and six methods of feature selection, and performed functional analysis and differential gene expression analysis. In this study, we established a 13-gene signature (AR, AL353637.1, DPP6, FOXJ1, GNB3, HHLA2, IL4, LIMCH1, LINC01732, OTX1, SAA1, SEMA3G, ZIC2) whose expression levels are able to predict distinct outcomes of patients with ccRCC. Moreover, we compared our signature with others from the literature. The best-performing gene signature was achieved using the ensemble method Min-Redundancy and Max-Relevance (mRMR). This signature has unique features in comparison to the others, such as generalization across different cohorts and functional enrichment in significant pathways: urothelial carcinoma, chronic kidney disease, transitional cell carcinoma, and nephrolithiasis. Of the 13 genes in our signature, eight are known to be correlated with ccRCC patient survival and four are immune-related. Our model achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.82 and generalized well between the cohorts. Our findings revealed two clusters of genes, one with high expression (SAA1, OTX1, ZIC2, LINC01732, GNB3, and IL4) and one with low expression (AL353637.1, AR, HHLA2, LIMCH1, SEMA3G, DPP6, and FOXJ1), both of which are correlated with poor prognosis. This signature can potentially be used in clinical practice to support patient treatment, care, and follow-up.
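The idea behind Min-Redundancy and Max-Relevance selection can be sketched in a few lines: greedily add the feature that is most related to the outcome while being least related to the features already chosen. The sketch below is an illustration only and makes several assumptions: it uses absolute Pearson correlation as a stand-in for the mutual information typically used in mRMR, it uses the difference form of the criterion, and the feature names (`gene_a`, `gene_a_dup`, `gene_b`) and values are hypothetical toy data, not the study's genes or pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr(features, target, k):
    """Greedy mRMR sketch: features maps name -> values; returns k selected names."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        # Relevance to the target minus mean redundancy with already-selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the target depends on two independent signals.
features = {
    "gene_a": [1, 1, 1, 1, -1, -1, -1, -1],
    "gene_a_dup": [1, 1, 1, 2, -1, -1, -1, -2],  # near-copy of gene_a
    "gene_b": [1, 1, -1, -1, 1, 1, -1, -1],      # uncorrelated with gene_a
}
target = [3, 3, 1, 1, -1, -1, -3, -3]            # equals 2 * gene_a + gene_b

chosen = mrmr(features, target, k=2)
print(chosen)
```

In this toy case a ranking by relevance alone would pick `gene_a` and its near-copy, whereas the redundancy penalty steers the second pick toward `gene_b`, which carries information the first feature lacks.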
3. Wang HC, Chiang CJ, Liu TC, Wu CC, Chen YT, Chang JG, Shieh GS. Immunohistochemical Expression of Five Protein Combinations Revealed as Prognostic Markers in Asian Oral Cancer. Front Genet 2021; 12:643461. [PMID: 33936170 PMCID: PMC8083901 DOI: 10.3389/fgene.2021.643461] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 03/01/2021] [Indexed: 12/24/2022]
Abstract
Oral squamous cell carcinoma (OSCC) has a high mortality rate (∼50%), and the 5-year overall survival rate is not optimal. Cyto- and histopathological examination of cancer tissues is the main strategy for diagnosis and treatment. In the present study, we aimed to uncover immunohistochemical (IHC) markers for prognosis in Asian OSCC. From 742 collected synthetic lethal gene pairs (of various cancer types), we first filtered genes relevant to OSCC, performed 29 IHC stains at different cellular compartments, and combined these IHC stains into 398 distinct pairs. Next, we identified novel IHC prognostic markers for OSCC in a Taiwanese population from the single and paired IHC stains by univariate Cox regression analysis. Increased nuclear expression of RB1 [RB1(N)↑], CDH3(C)↑-STK17A(N)↑, and FLNA(C)↑-KRAS(C)↑ were associated with survival, but not independently of tumor stage, where C and N denote cytoplasm and nucleus, respectively. Furthermore, multivariate Cox regression analyses revealed that CSNK1E(C)↓-SHC1(N)↓ (P = 5.9 × 10⁻⁵; recommended for clinical use), BRCA1(N)↓-SHC1(N)↓ (P = 0.030), CSNK1E(C)↓-RB1(N)↑ (P = 0.045), [CSNK1E(C)-SHC1(N), FLNA(C)-KRAS(C)] (P < 0.001), and [BRCA1(N)-SHC1(N), FLNA(C)-KRAS(C)] (P = 0.020) were significant factors of poor prognosis, independent of lymph node metastasis, stage, and alcohol consumption. An external data set from The Cancer Genome Atlas HNSCC cohort confirmed that CDH3↑-STK17A↑ was a significant predictor of poor survival. Our approach identified prognostic markers whose components are involved in different pathways and revealed IHC marker pairs in which neither single IHC stain was itself a marker; it thus improves on the current state of the art for identifying IHC markers.
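The pairing step in this abstract can be sketched as enumerating unordered pairs of stains. With 29 stains there are at most 406 unordered pairs, so the reported 398 implies that a few pairs were excluded by a rule the abstract does not state. In the sketch below, skipping pairs of stains of the same protein is purely an assumption made for illustration; the helper name and the toy stain list (including a cytoplasmic RB1 stain) are hypothetical.

```python
from itertools import combinations
from math import comb


def stain_pairs(stains):
    """Unordered pairs of (protein, compartment) stains from different proteins.

    Excluding same-protein pairs is an assumed filter, not the authors' stated rule.
    """
    return [(a, b) for a, b in combinations(stains, 2) if a[0] != b[0]]


# Hypothetical toy list; "N" is nucleus and "C" is cytoplasm, as in the abstract.
stains = [("RB1", "N"), ("RB1", "C"), ("SHC1", "N"), ("CSNK1E", "C")]

pairs = stain_pairs(stains)
upper_bound = comb(29, 2)  # maximum number of unordered pairs from 29 stains

print(len(pairs), upper_bound)
```

Each resulting pair would then be scored as a candidate marker in the same way as a single stain, for example by fitting a survival model on the combined expression pattern.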
Affiliation(s)
- Hui-Ching Wang
  - Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Division of Hematology and Oncology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ta-Chih Liu
  - Department of Hematology-Oncology, Chang Bing Show Chwan Memorial Hospital, Changhua, Taiwan
- Chun-Chieh Wu
  - Department of Pathology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Yi-Ting Chen
  - Department of Pathology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Jan-Gowth Chang
  - Epigenome Research Center, China Medical University Hospital, Taichung, Taiwan
  - Department of Laboratory Medicine, China Medical University Hospital, Taichung, Taiwan
  - Center for Precision Medicine, China Medical University Hospital, Taichung, Taiwan
  - School of Medicine, China Medical University, Taichung, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan
- Grace S Shieh
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
  - Data Science Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
4. Chang JG, Chen CC, Wu YY, Che TF, Huang YS, Yeh KT, Shieh GS, Yang PC. Uncovering synthetic lethal interactions for therapeutic targets and predictive markers in lung adenocarcinoma. Oncotarget 2016; 7:73664-73680. [PMID: 27655641 PMCID: PMC5342006 DOI: 10.18632/oncotarget.12046] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/18/2016] [Accepted: 08/24/2016] [Indexed: 12/28/2022]
Abstract
Two genes are called synthetic lethal (SL) if their simultaneous mutation leads to cell death but mutation of either gene alone does not. Targeting SL partners of mutated cancer genes can selectively kill cancer cells while leaving normal cells intact. We present an integrated approach to uncover SL gene pairs as novel therapeutic targets in lung adenocarcinoma (LADC). Of 24 predicted SL pairs, PARP1-TP53 was validated by RNAi knockdown to have synergistic toxicity in H1975 and invasive CL1-5 LADC cells; additionally, FEN1-RAD54B, BRCA1-TP53, BRCA2-TP53, and RB1-TP53 were consistent with the literature. Because metastasis remains a bottleneck in cancer treatment and inhibitors of PARP1 have been developed, this result may have therapeutic potential for LADC, in which TP53 is commonly mutated. We also demonstrated that silencing PARP1 enhanced the cell death induced by the platinum-based chemotherapy drug carboplatin in lung cancer cells (CL1-5 and H1975). The IHC markers RAD54B↑, BRCA1↓-RAD54B↑, FEN1(N)↑-RAD54B↑, and PARP1↑-RAD54B↑ were shown to be prognostic for 131 Asian LADC patients, and all markers except BRCA1↓-RAD54B↑ were further confirmed by three independent gene expression data sets (a total of 426 patients), including The Cancer Genome Atlas (TCGA) cohort of LADC. Importantly, we identified POLB-TP53 and POLB as predictive markers for the TCGA cohort (230 subjects), independent of age and stage. Thus, POLB and POLB-TP53 may be used to stratify future non-Asian LADC patients for therapeutic strategies.
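The definition of synthetic lethality given in this abstract can be written as a small predicate over the four combinations of mutation states. This is a toy sketch of the definition only, not the authors' prediction method; the function name and the viability tables are hypothetical.

```python
def is_synthetic_lethal(viable):
    """Check the synthetic lethal pattern for a gene pair.

    viable maps (gene_a_mutated, gene_b_mutated) -> True if the cell survives.
    The pair is synthetic lethal when only the double mutant dies.
    """
    return (
        viable[(False, False)]      # no mutation: cell survives
        and viable[(True, False)]   # only gene A mutated: survives
        and viable[(False, True)]   # only gene B mutated: survives
        and not viable[(True, True)]  # both mutated: cell dies
    )


# Hypothetical viability tables for two gene pairs.
sl_pair = {
    (False, False): True,
    (True, False): True,
    (False, True): True,
    (True, True): False,
}
essential_gene_pair = {
    (False, False): True,
    (True, False): False,  # gene A alone is already lethal, so not synthetic lethal
    (False, True): True,
    (True, True): False,
}

print(is_synthetic_lethal(sl_pair), is_synthetic_lethal(essential_gene_pair))
```

The therapeutic argument follows from the same table: if a tumor already carries a mutation in one gene of the pair, inhibiting the partner puts tumor cells in the lethal double-hit state, while normal cells remain in a viable single-hit state.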
Affiliation(s)
- Jan-Gowth Chang
  - Department of Laboratory Medicine and Epigenome Research Center, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Chia-Cheng Chen
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Yi-Ying Wu
  - Graduate Institute of Clinical Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Ting-Fang Che
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Yi-Syuan Huang
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Kun-Tu Yeh
  - Department of Pathology, Changhua Christian Hospital, Changhua, Taiwan
  - Department of Pathology, School of Medicine, Chung Shan Medical University, Taichung, Taiwan
- Grace S. Shieh
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Pan-Chyr Yang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
  - Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan