1
Yaldız B, Erdoğan O, Rafatov S, Iyigün C, Aydın Son Y. Revealing third-order interactions through the integration of machine learning and entropy methods in genomic studies. BioData Min 2024; 17:3. [PMID: 38291454 PMCID: PMC10826120 DOI: 10.1186/s13040-024-00355-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 01/16/2024] [Indexed: 02/01/2024]
Abstract
BACKGROUND: Non-linear relationships at the genotype level are essential to understanding the genetic interactions of complex disease traits. Genome-wide association studies (GWAS) have revealed statistical associations of SNPs with many complex diseases. Because GWAS results could not thoroughly reveal the genetic background of these disorders, genome-wide interaction studies have started to gain importance. In recent years, various statistical approaches, such as entropy-based methods, have been suggested for revealing these non-additive interactions between variants. This study presents a novel prioritization workflow integrating two-step Random Forest (RF) modeling and entropy analysis after PLINK filtering. The PLINK-RF-RF workflow is followed by an entropy-based 3-way interaction information (3WII) method to capture the hidden patterns resulting from non-linear relationships between genotypes in Late-Onset Alzheimer Disease (LOAD) and to discover early and differential diagnosis markers.
RESULTS: Three models from different datasets are developed by integrating PLINK-RF-RF analysis with the entropy-based 3WII calculation method, which enables the detection of third-order interactions that are not primarily considered in epistatic interaction studies. A reduced SNP set is selected for all three datasets by 3WII analysis after PLINK filtering and SNP prioritization with RF-RF modeling, which is promising as a model minimization approach. Among the SNPs revealed by 3WII, 4 of 19 from GenADA, 1 of 27 from ADNI, and 4 of 106 from NCRAD map to genes directly associated with Alzheimer Disease. Additionally, several SNPs are associated with other neurological disorders. The genes that the variants map to in all datasets are also significantly enriched in the calcium ion binding, extracellular matrix, external encapsulating structure, and "RUNX1 regulates estrogen receptor-mediated transcription" pathways. Therefore, these functional pathways are proposed for further examination for a possible LOAD association. In addition, all 3WII variants are proposed as candidate biomarkers for genotyping-based LOAD diagnosis.
CONCLUSION: The entropy approach performed in this study reveals the complex genetic interactions that significantly contribute to LOAD risk. We benefited from the entropy-based 3WII as a model minimization step and determined the significant 3-way interactions between the SNPs prioritized by PLINK-RF-RF. This framework is a promising approach for disease association studies and can also be modified by integrating other machine learning and entropy-based interaction methods.
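The 3-way interaction information named in this abstract can be illustrated with a minimal Python sketch. It assumes one common sign convention, II(A;B;C) = I(A,B;C) − I(A;C) − I(B;C), estimated from empirical frequencies; the abstract does not give the paper's estimator or say which variables play which role, so the function names and the toy data below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint Shannon entropy (in bits) of one or more aligned columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((k / n) * log2(k / n) for k in Counter(joint).values())


def interaction_information(a, b, c):
    """II(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), written in terms of entropies.

    Under this (assumed) convention, positive values indicate synergy and
    negative values indicate redundancy.
    """
    return (entropy(a, b) + entropy(a, c) + entropy(b, c)
            - entropy(a) - entropy(b) - entropy(c)
            - entropy(a, b, c))


# Toy data (hypothetical): c is the XOR of a and b, so neither a nor b alone
# says anything about c, but together they determine it.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
synergy = interaction_information(a, b, c)     # 1 bit of pure synergy

# Fully redundant variables give a negative value under this convention.
redundancy = interaction_information([0, 1], [0, 1], [0, 1])   # -1 bit
```

In a genotype setting, `a` and `b` might be SNP genotype codes and `c` a third SNP or the case/control label; that mapping is an assumption made here for illustration only.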
Affiliation(s)
- Burcu Yaldız
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Onur Erdoğan
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Sevda Rafatov
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Cem Iyigün
- Department of Industrial Engineering, METU, Ankara, Turkey
- Yeşim Aydın Son
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey.
- Graduate School of Informatics, ODTU-NOROM, METU, Ankara, Turkey.
2
Nazari E, Naderi H, Tabadkani M, ArefNezhad R, Farzin AH, Dashtiahangar M, Khazaei M, Ferns GA, Mehrabian A, Tabesh H, Avan A. Breast cancer prediction using different machine learning methods applying multi factors. J Cancer Res Clin Oncol 2023; 149:17133-17146. [PMID: 37773467 DOI: 10.1007/s00432-023-05388-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 09/01/2023] [Indexed: 10/01/2023]
Abstract
OBJECTIVE: Breast cancer (BC) is a multifactorial disease and one of the most common cancers globally. This study aimed to compare different machine learning (ML) techniques to develop a comprehensive breast cancer risk prediction model based on features from various factors.
METHODS: The population sample contained 810 records (115 cancer patients and 695 healthy individuals). Based on expert opinion, 45 of 85 attributes were selected, covering genetic, biochemical, biomarker, gender, demographic, and pathological factors. Thirteen machine learning models were trained on the selected attributes, and the attribute coefficients and internal relationships were calculated.
RESULT: Random forest (RF) outperformed the other methods (accuracy 99.26%, precision 99%, and area under the curve (AUC) 99%). Assessing the impact and correlation of variables using the PCA-based RF method indicated that pathology, biomarker, biochemistry, gene, and demographic factors, with coefficients of 0.35, 0.23, 0.15, 0.14, and 0.13, respectively, affected the risk of BC (r² = 0.54).
CONCLUSION: Breast cancer has several risk factors. Medical experts use these risk factors for early diagnosis. Therefore, identifying related risk factors and their effects can increase the accuracy of diagnosis. Considering a broad set of features for predicting breast cancer leads to a comprehensive prediction model. In this study, a breast cancer prediction model with 99.3% accuracy was developed from multifactorial features using the RF technique.
Affiliation(s)
- Elham Nazari
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hamid Naderi
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahla Tabadkani
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Reza ArefNezhad
- Halal Research Center of IRI, FDA, Tehran, Iran
- Department of Anatomy, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Majid Khazaei
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
- Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, BN1 9PH, Sussex, UK
- Amin Mehrabian
- Warwick Medical School, University of Warwick, Coventry, UK
- Hamed Tabesh
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran.
- Amir Avan
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
- Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.
- College of Medicine, University of Warith Al-Anbiyaa, Karbala, Iraq.
3
Susmitha P, Kumar P, Yadav P, Sahoo S, Kaur G, Pandey MK, Singh V, Tseng TM, Gangurde SS. Genome-wide association study as a powerful tool for dissecting competitive traits in legumes. Frontiers in Plant Science 2023; 14:1123631. [PMID: 37645459 PMCID: PMC10461012 DOI: 10.3389/fpls.2023.1123631] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 06/08/2023] [Indexed: 08/31/2023]
Abstract
Legumes are extremely valuable because of their high protein content and several other nutritional components. The major challenge lies in maintaining the quantity and quality of protein and other nutritional compounds under changing climate conditions. The global need for plant-based proteins has increased the demand for seeds with a high protein content that includes essential amino acids. Genome-wide association studies (GWAS) have evolved as a standard approach in agricultural genetics for examining such intricate characters. Recent developments in machine learning methods show promising applications for dimensionality reduction, which is a major challenge in GWAS. With advances in biotechnology, sequencing, and bioinformatics tools, estimating linkage disequilibrium (LD) based associations between a genome-wide collection of single-nucleotide polymorphisms (SNPs) and desired phenotypic traits has become accessible. The markers from GWAS could be used for genomic selection (GS) to predict superior lines by calculating genomic estimated breeding values (GEBVs). For prediction accuracy, an assortment of statistical models could be used, such as ridge regression best linear unbiased prediction (rrBLUP), genomic best linear unbiased predictor (gBLUP), Bayesian models, and random forest (RF). Both naturally diverse germplasm panels and family-based breeding populations can be used for association mapping, depending on the nature of the breeding system (inbred or outbred) in the plant species. MAGIC, MCILs, RIAILs, NAM, and ROAM are being used for association mapping in several crops. Several modifications of NAM, such as doubled haploid NAM (DH-NAM), backcross NAM (BC-NAM), and advanced backcross NAM (AB-NAM), have also been used in crops like rice, wheat, maize, barley, and mustard. For reliable marker-trait associations (MTAs), phenotyping accuracy is as important as genotyping. High-throughput genotyping, phenomics, and computational techniques have advanced during the past few years, making it possible to explore such enormous datasets. Each population has unique virtues and flaws at the genomics and phenomics levels, which are covered in more detail in this review. The review covers the use of elite breeding lines as association mapping populations, optimization of the choice of GWAS selection, population size, hurdles in phenotyping, and statistical methods for analyzing competitive traits in legume breeding.
Affiliation(s)
- Pusarla Susmitha
- Regional Agricultural Research Station, Acharya N.G. Ranga Agricultural University, Andhra Pradesh, India
- Pawan Kumar
- Department of Genetics and Plant Breeding, College of Agriculture, Chaudhary Charan Singh (CCS) Haryana Agricultural University, Hisar, India
- Pankaj Yadav
- Department of Bioscience and Bioengineering, Indian Institute of Technology, Rajasthan, India
- Smrutishree Sahoo
- Department of Genetics and Plant Breeding, School of Agriculture, Gandhi Institute of Engineering and Technology (GIET) University, Odisha, India
- Gurleen Kaur
- Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Manish K. Pandey
- Department of Genomics, Prebreeding and Bioinformatics, International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Varsha Singh
- Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, United States
- Te Ming Tseng
- Department of Plant and Soil Sciences, Mississippi State University, Starkville, MS, United States
- Sunil S. Gangurde
- Department of Plant Pathology, University of Georgia, Tifton, GA, United States
4
Sha Z, Chen Y, Hu T. NSPA: characterizing the disease association of multiple genetic interactions at single-subject resolution. Bioinformatics Advances 2023; 3:vbad010. [PMID: 36818729 PMCID: PMC9927570 DOI: 10.1093/bioadv/vbad010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/02/2023] [Accepted: 02/02/2023] [Indexed: 02/10/2023]
Abstract
Motivation: The interaction between genetic variables is one of the major barriers to characterizing the genetic architecture of complex traits. To account for epistasis, network science approaches are increasingly being used to elucidate the genetic architecture of complex diseases. These approaches associate the disease susceptibility of genetic variables with their topological importance in the network. However, such a network only represents genetic interactions and does not describe how these interactions contribute to disease association at the subject scale. We propose the Network-based Subject Portrait Approach (NSPA) and an accompanying feature transformation method to determine the collective risk impact of multiple genetic interactions for each subject.
Results: The feature transformation method converts the genetic variants of subjects into new values that capture how genetic variables interact with others to contribute to a subject's disease association. We apply this approach to synthetic and genetic datasets and learn that (1) the disease association can be captured using multiple disjoint sets of genetic interactions and (2) the feature transformation method based on NSPA improves predictive performance compared with using the original genetic variables. Our findings confirm the role of genetic interaction in complex disease and provide a novel approach for gene-disease association studies to identify genetic architecture in the context of epistasis.
Availability and implementation: The code for NSPA is available at https://github.com/MIB-Lab/Network-based-Subject-Portrait-Approach.
Contact: ting.hu@queensu.ca.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Zhendong Sha
- School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Yuanzhu Chen
- School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Ting Hu
- To whom correspondence should be addressed.
5
Abd El Hamid MM, Omar YM, Shaheen M, Mabrouk MS. Discovering epistasis interactions in Alzheimer's disease using deep learning model. Gene Reports 2022. [DOI: 10.1016/j.genrep.2022.101673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
6
Abd El Hamid MM, Shaheen M, Mabrouk MS, Omar YMK. Machine learning for detecting epistasis interactions and its relevance to personalized medicine in Alzheimer’s disease: systematic review. Biomedical Engineering: Applications, Basis and Communications 2021; 33. [DOI: 10.4015/s1016237221500472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
Abstract
Alzheimer’s disease (AD) is a progressive disease that attacks the brain’s neurons and causes problems with memory, thinking, and reasoning skills. Personalized Medicine (PM) needs a better and more accurate understanding of the relationship between human genetic data and complex diseases like AD. The goal of PM is to tailor treatment to a patient's individual characteristics. PM requires the prediction of a person’s disease from genetic data, and its success depends on the accurate detection of genetic biomarkers. Single nucleotide polymorphisms (SNPs) are considered the most prevalent type of variation in the human genome. Epistasis is biologically relevant to complex diseases and has an important impact on PM. Detecting the most significant epistasis interactions associated with complex diseases is a major challenge. This paper reviews several machine learning techniques and algorithms for detecting the most significant epistasis interactions in Alzheimer’s disease. We discuss machine learning techniques that can be used for detecting SNP combinations, such as Random Forests, Support Vector Machines, Multifactor Dimensionality Reduction, Neural Networks, and Deep Learning. This review highlights the pros and cons of these techniques and explains how they can be applied in an efficient framework for knowledge discovery and data mining in AD.
Affiliation(s)
- Marwa M. Abd El Hamid
- The Higher Institute of Computer Science & Information Technology, El-Shorouk Academy, El Shorouk City, Cairo, Egypt
- College of Computing and Information Technology AASTMT, Egypt
- Mohamed Shaheen
- College of Computing and Information Technology AASTMT, Egypt
- Mai S. Mabrouk
- Biomedical Engineering Department Misr University for Science and Technology 6th of October City, Egypt
7
Deng Y, Song Z, Huang L, Guo Z, Tong B, Sun M, Zhao J, Zhang H, Zhang Z, Li G. Tumor purity as a prognosis and immunotherapy relevant feature in cervical cancer. Aging (Albany NY) 2021; 13:24768-24785. [PMID: 34844217 PMCID: PMC8660621 DOI: 10.18632/aging.203714] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/30/2021] [Accepted: 06/23/2021] [Indexed: 01/05/2023]
Abstract
Background: Tumor purity plays a vital role in the biological process of solid tumors, but its function in gynecologic cancers remains unclear. This study explored the correlation between tumor purity and immune function of gynecological cancers and its reliability as a prognostic indicator of immunotherapy. Methods: Gynecological cancer-related datasets were downloaded from The Cancer Genome Atlas (TCGA). Tumor purity was calculated by the ESTIMATE algorithm. A LASSO Cox regression analysis was performed to construct the risk score model. A Kaplan–Meier Plotter was used to explore the relationships between tumor purity and cancer prognosis. We performed the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Set Enrichment Analysis (GSEA) to explore the pathways in the subgroups. A nomogram was used to quantitatively assess the cancer prognosis. Results: Tumor purity was negatively correlated with B cell infiltration in cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC). Approximately 420 genes were positively associated with B cell infiltration and CESC prognosis and were enriched in immune-related signaling pathways. There were 11 key genes used to construct a risk score model. The low-risk group had a higher immune score and better prognosis than the high-risk group. A nomogram based on risk score, T stage, and clinical-stage had good predictive value in quantitatively evaluating CESC prognosis. Conclusions: This study is the first to reveal the correlation between tumor purity and immunity in CESC and suggests that low-risk patients may be more sensitive to immunotherapy. This provides a theoretical basis for the clinical treatment of CESC.
Affiliation(s)
- Yali Deng
- The Second Xiangya Hospital, Central South University, Changsha, Hunan, China
- Zewen Song
- Department of Oncology, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Li Huang
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Zhenni Guo
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Binghua Tong
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Meiqing Sun
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Jin Zhao
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Huina Zhang
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China
- Zhen Zhang
- Department of Oncology, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Guoyin Li
- College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan, China; Academy of Medical Science, Zhengzhou University, Zhengzhou, Henan, China
8
Manavalan R, Priya S. Genetic interactions effects for cancer disease identification using computational models: a review. Med Biol Eng Comput 2021; 59:733-758. [PMID: 33839998 DOI: 10.1007/s11517-021-02343-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/20/2020] [Accepted: 03/10/2021] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies (GWAS) provide clear insight into the genetic variations and environmental influences responsible for various human diseases. Cancer identification through genetic interactions (epistasis) is a significant area of ongoing GWAS research. The growth of cancer cells emerges from multi-locus and complex genetic interactions. It is impractical for a physician to detect cancer by manually examining SNP interactions. Because of the importance of the problem, several computational approaches have been developed to infer epistasis effects. This article presents a comprehensive and multifaceted review of all relevant genetic studies published between 2001 and 2020. The computational methods reviewed, together with their evaluation results, are multifactor dimensionality reduction-based approaches, statistical strategies, machine learning, and optimization-based techniques. The strengths and limitations of these approaches are also described. The issues behind the computational methods for identifying cancer through genetic interactions, and the various evaluation parameters used by researchers, are analyzed. This review helps researchers and medical professionals learn the techniques adapted to discover epistasis and aids the design of novel automatic epistasis detection systems with strong robustness and high efficiency.
Affiliation(s)
- R Manavalan
- Department of Computer Science, Arignar Anna Government Arts College, Villupuram, Tamil Nadu, 605602, India.
- S Priya
- Computer Science, Arignar Anna Government Arts College, Villupuram, Tamil Nadu, India
9
Kafaie S, Xu L, Hu T. Statistical methods with exhaustive search in the identification of gene-gene interactions for colorectal cancer. Genet Epidemiol 2020; 45:222-234. [PMID: 33231893 DOI: 10.1002/gepi.22372] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/09/2020] [Revised: 10/10/2020] [Accepted: 11/09/2020] [Indexed: 12/16/2022]
Abstract
Though additive forms of heritability are primarily studied in genetics, nonlinear, non-additive gene-gene interactions, that is, epistasis, could explain a portion of the missing heritability in complex human diseases including cancer. In recent years, powerful computational methods have been introduced to understand multivariable genetic factors of these complex human diseases in extremely high-dimensional genome-wide data. In this study, we investigated the performance of three powerful methods, BOolean Operation-based Screening and Testing (BOOST), FastEpistasis, and Tree-based Epistasis Association Mapping (TEAM) to identify interacting genetic risk factors of colorectal cancer (CRC) for genome-wide association studies (GWAS). After quality-control based data preprocessing, we applied these three algorithms to a CRC GWAS data set, and selected the top-ranked 100 single-nucleotide polymorphism (SNP) pairs identified by each method (251 SNPs in total), among which 74 pairs were common between FastEpistasis and BOOST. The identified SNPs by BOOST, FastEpistasis, and TEAM mapped to 58, 57, and 62 genes, respectively. Some genes highlighted by our study, including MACF1, USP49, SMAD2, SMAD3, TGFBR1, and RHOA, have been detected in previous CRC-related research. We also identified some new genes with potential biological relevance to CRC such as CCDC32. Furthermore, we constructed the network of these top SNP pairs for three methods, and the patterns identified in the networks show that some SNPs including rs2412531, rs349699, and rs17142011 play a crucial role in the classification of disease status in our study.
Affiliation(s)
- Somayeh Kafaie
- Department of Computer Science, Memorial University, St. John's, Newfoundland, Canada
- Ling Xu
- Department of Computer Science, Memorial University, St. John's, Newfoundland, Canada
- Ting Hu
- Department of Computer Science, Memorial University, St. John's, Newfoundland, Canada; School of Computing, Queen's University, Kingston, Ontario, Canada
10
Investigation of gene-gene interactions in cardiac traits and serum fatty acid levels in the LURIC Health Study. PLoS One 2020; 15:e0238304. [PMID: 32915819 PMCID: PMC7485803 DOI: 10.1371/journal.pone.0238304] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 08/13/2020] [Indexed: 01/25/2023]
Abstract
Epistasis analysis elucidates the effects of gene-gene interactions (G×G) between multiple loci for complex traits. However, large computational demands and a high multiple testing burden impede the discovery of such interactions. Here, we illustrate the use of two methods, main effect filtering based on individual GWAS results and biological knowledge-based modeling through the Biofilter software, to reduce the number of interactions tested among single nucleotide polymorphisms (SNPs) for 15 cardiac-related traits and 14 fatty acids. We performed interaction analyses using the two filtering methods, adjusting for age, sex, body mass index (BMI), waist-hip ratio, and the first three principal components from genetic data, among 2,824 samples from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study. Using Biofilter, one interaction nearly met Bonferroni significance: an interaction between rs7735781 in XRCC4 and rs10804247 in XRCC5 was identified for venous thrombosis with a Bonferroni-adjusted likelihood ratio test (LRT) p = 0.0627. A total of 57 interactions were identified from main effect filtering for the cardiac traits G×G (10) and fatty acids G×G (47) at Bonferroni-adjusted LRT p < 0.05. For cardiac traits, the top interaction involved SNPs rs1383819 in SNTG1 and rs1493939 (138 kb from the 5′ end of SAMD12), with Bonferroni-adjusted LRT p = 0.0228, and was significantly associated with history of arterial hypertension. For fatty acids, the top interaction, between rs4839193 in KCND3 and rs10829717 in LOC107984002 with Bonferroni-adjusted LRT p = 2.28×10⁻⁵, was associated with 9-trans 12-trans octadecanoic acid, an omega-6 trans fatty acid. The model inflation factor for the interactions under different filtering methods was evaluated from the standard median and the linear regression approach. Here, we applied filtering approaches to identify numerous genetic interactions related to cardiac-related outcomes as potential targets for therapy. The approaches described offer ways to detect epistasis in complex traits and to improve precision medicine capability.
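The Bonferroni adjustment referred to in this abstract can be sketched in a few lines of Python. This is the textbook form of the correction (multiply each raw p-value by the number of tests and cap at 1), not code from the study; the function name and the raw p-values are hypothetical, and in a filtered interaction analysis the number of tests would be the number of SNP pairs that survive filtering.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)  # number of tests performed
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three interaction tests.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)

# An interaction is called significant when its adjusted p-value is below 0.05.
significant = [p < 0.05 for p in adjusted]
```

Filtering before testing reduces the number of tests, which directly shrinks the multiplier applied to every raw p-value.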
11
Shayesteh SP, Alikhassi A, Fard Esfahani A, Miraie M, Geramifar P, Bitarafan-Rajabi A, Haddad P. Neo-adjuvant chemoradiotherapy response prediction using MRI based ensemble learning method in rectal cancer patients. Phys Med 2019; 62:111-119. [PMID: 31153390 DOI: 10.1016/j.ejmp.2019.03.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/31/2018] [Revised: 02/23/2019] [Accepted: 03/17/2019] [Indexed: 02/08/2023]
Abstract
OBJECTIVES: The aim of this study was to investigate and validate the performance of individual and ensemble machine learning models (EMLMs) based on magnetic resonance imaging (MRI) to predict neo-adjuvant chemoradiation therapy (nCRT) response in rectal cancer patients. We also aimed to study the effect of the Laplacian of Gaussian (LOG) filter on EMLM predictive performance.
METHODS: A total of 98 rectal cancer patients were divided into a training set (n = 53) and a validation set (n = 45). All patients underwent MRI a week before nCRT. Several features from intensity, shape, and texture feature sets were extracted from the MR images. SVM, Bayesian network, neural network, and KNN classifiers were used individually and together for response prediction. Predictive performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC).
RESULTS: Patients' nCRT responses included 17 patients with Grade 0, 28 with Grade 1, 34 with Grade 2, and 19 with Grade 3 according to AJCC/CAP pathologic grading. Without MR image preprocessing, the best result was obtained by the Bayesian network classifier, with an AUC and accuracy of 75.2% and 80.9%, respectively, which was confirmed in the validation set with an AUC and accuracy of 74% and 79%, respectively. Among the EMLMs, the best result was obtained by the four-classifier ensemble (SVM, NN, BN, KNN), with an AUC and accuracy of 97.8% and 92.8% in testing and 95% and 90% in the validation set, respectively.
CONCLUSIONS: We observed that machine learning methods can be used to predict nCRT response in patients with rectal cancer. Preprocessing with LOG filters and ensemble learning (EL) models can improve the prediction process.
Affiliation(s)
- Sajad P Shayesteh
- Department of Physiology, Pharmacology and Medical Physics, Faculty of Medicine, Alborz University of Medical Sciences, Karaj, Iran
- Afsaneh Alikhassi
- Department of Radiology, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Armaghan Fard Esfahani
- Research Center for Nuclear Medicine, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
- M Miraie
- Cancer Research Centre & Radiation Oncology Department, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran
- Parham Geramifar
- Research Center for Nuclear Medicine, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Ahmad Bitarafan-Rajabi
- Cardiovascular Intervention Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran; Echocardiography Research Center, Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
- Peiman Haddad
- Radiation Oncology Research Center, Cancer Institute, Tehran University of Medical Sciences, Tehran, Iran.
12
Kafaie S, Chen Y, Hu T. A network approach to prioritizing susceptibility genes for genome-wide association studies. Genet Epidemiol 2019; 43:477-491. [PMID: 30859622 DOI: 10.1002/gepi.22198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/29/2018] [Revised: 01/31/2019] [Accepted: 02/25/2019] [Indexed: 12/22/2022]
Abstract
The heritability of complex diseases including cancer is often attributed to multiple interacting genetic alterations. Such a non-linear, non-additive gene-gene interaction effect, that is, epistasis, renders univariable analysis methods ineffective for genome-wide association studies. In recent years, network science has seen increasing applications in modeling epistasis to characterize the complex relationships between a large number of genetic variations and the phenotypic outcome. In this study, by constructing a statistical epistasis network of colorectal cancer (CRC), we proposed to use multiple network measures to prioritize genes that influence the disease risk of CRC through synergistic interaction effects. We computed and analyzed several global and local properties of the large CRC epistasis network. We utilized topological properties of network vertices such as the edge strength, vertex centrality, and occurrence at different graphlets to identify genes that may be of potential biological relevance to CRC. We found 512 top-ranked single-nucleotide polymorphisms, among which COL22A1, RGS7, WWOX, and CELF2 were the four susceptibility genes prioritized by all described metrics as the most influential on CRC.
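One of the vertex measures mentioned in this abstract, edge strength, can be illustrated with a small Python sketch. It assumes the epistasis network is given as a list of weighted SNP pairs and takes a vertex's strength to be the sum of the weights of its incident edges; the SNP labels, weights, and function name are hypothetical and do not come from the study.

```python
from collections import defaultdict


def vertex_strength(edges):
    """Sum the weights of all edges incident to each vertex."""
    strength = defaultdict(float)
    for u, v, weight in edges:
        strength[u] += weight
        strength[v] += weight
    return dict(strength)


# Hypothetical pairwise interaction strengths between three SNPs.
edges = [
    ("snp_a", "snp_b", 0.5),
    ("snp_a", "snp_c", 0.25),
    ("snp_b", "snp_c", 0.125),
]

# Rank vertices from strongest to weakest as a simple prioritization.
ranked = sorted(vertex_strength(edges).items(), key=lambda kv: kv[1], reverse=True)
```

A fuller prioritization along the lines the abstract describes would combine several such measures (for example centrality and graphlet occurrence) rather than rely on strength alone.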
Affiliation(s)
- Somayeh Kafaie
- Department of Computer Science, Memorial University, St. John's, NL, Canada
- Yuanzhu Chen
- Department of Computer Science, Memorial University, St. John's, NL, Canada
- Ting Hu
- Department of Computer Science, Memorial University, St. John's, NL, Canada