1
Hernández-Lemus E, Ochoa S. Methods for multi-omic data integration in cancer research. Front Genet 2024; 15:1425456. [PMID: 39364009 PMCID: PMC11446849 DOI: 10.3389/fgene.2024.1425456] [Received: 04/29/2024] [Accepted: 08/28/2024]
Abstract
Multi-omics data integration refers to the process of combining and analyzing data from different omic experimental sources, such as genomics, transcriptomics, methylation assays, and microRNA sequencing, among others. Such data integration approaches have the potential to provide a more comprehensive functional understanding of biological systems and have numerous applications in areas such as disease diagnosis, prognosis and therapy. However, quantitative integration of multi-omic data is a complex task that requires the use of highly specialized methods and approaches. Here, we discuss a number of data integration methods that have been developed with multi-omics data in view, including statistical methods, machine learning approaches, and network-based approaches. We also discuss the challenges and limitations of such methods and provide examples of their applications in the literature. Overall, this review aims to provide an overview of the current state of the field and highlight potential directions for future research.
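The review covers statistical, machine learning and network-based integration methods. As a minimal, generic illustration of the simplest strategy, often described as early or concatenation-based integration, the Python sketch below standardizes every feature within each omic block and then joins the blocks sample by sample. It is not a method taken from the review; the function names and toy data are hypothetical, and it assumes every block lists the same samples in the same order.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of a samples x features block."""
    n_features = len(block[0])
    columns = [[row[j] for row in block] for j in range(n_features)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    # Constant features have zero spread; map them to 0.0 instead of dividing by zero.
    return [
        [(row[j] - m) / s if s > 0 else 0.0 for j, (m, s) in enumerate(stats)]
        for row in block
    ]


def early_integrate(blocks):
    """Concatenate standardized omic blocks sample by sample (hypothetical helper)."""
    n_samples = len(blocks[0])
    if any(len(block) != n_samples for block in blocks):
        raise ValueError("all omic blocks must list the same samples in the same order")
    standardized = [zscore_columns(block) for block in blocks]
    # For each sample, join its standardized feature vectors from every block.
    return [sum((block[i] for block in standardized), []) for i in range(n_samples)]


# Hypothetical toy data: 3 samples, 2 expression features, 1 methylation feature.
expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
methylation = [[0.1], [0.5], [0.9]]
integrated = early_integrate([expression, methylation])
```

Per-block standardization keeps a block with large raw values from dominating a block measured on a 0 to 1 scale, such as methylation beta values; it does not, however, correct for blocks that contribute very different numbers of features.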
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
2
Jung AM, Furlong MA, Goodrich JM, Cardenas A, Beitel SC, Littau SR, Caban-Martinez AJ, Gulotta JJ, Wallentine DD, Urwin D, Gabriel J, Hughes J, Graber JM, Grant C, Burgess JL. Associations Between Epigenetic Age Acceleration and microRNA Expression Among U.S. Firefighters. Epigenet Insights 2023; 16:25168657231206301. [PMID: 37953967 PMCID: PMC10634256 DOI: 10.1177/25168657231206301] [Received: 02/16/2023] [Accepted: 09/20/2023]
Abstract
Epigenetic changes may be biomarkers of health. Epigenetic age acceleration (EAA), the discrepancy between epigenetic age measured via epigenetic clocks and chronological age, is associated with morbidity and mortality. However, the intersection of epigenetic clocks with microRNAs (miRNAs) and corresponding miRNA-based health implications have not been evaluated. We analyzed DNA methylation and miRNA profiles from blood samples of 332 individuals enrolled across 2 U.S.-based firefighter occupational studies (2015-2018 and 2018-2020). We considered 7 measures of EAA in leukocytes (PhenoAge, GrimAge, Horvath, skin-blood, and Hannum epigenetic clocks, and extrinsic and intrinsic epigenetic age acceleration). We identified miRNAs associated with EAA using individual linear regression models, adjusted for sex, race/ethnicity, chronological age, and cell type estimates, and investigated downstream effects of associated miRNAs with miRNA enrichment analyses and genomic annotations. On average, participants were 38 years old, 88% male, and 75% non-Hispanic white. We identified 183 of 798 miRNAs associated with EAA (FDR q < 0.05); 126 with PhenoAge, 59 with GrimAge, 1 with Horvath, and 1 with the skin-blood clock. Among miRNAs associated with Horvath and GrimAge, there were 61 significantly enriched disease annotations including age-related metabolic and cardiovascular conditions and several cancers. Enriched pathways included those related to proteins and protein modification. We identified miRNAs associated with EAA of multiple epigenetic clocks. PhenoAge had more associations with individual miRNAs, but GrimAge and Horvath had greater implications for miRNA-associated pathways. Understanding the relationship between these epigenetic markers could contribute to our understanding of the molecular underpinnings of aging and aging-related diseases.
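The abstract defines EAA as the discrepancy between clock-estimated and chronological age and reports miRNAs significant at FDR q < 0.05, without naming the exact procedures. One common way to compute EAA is as the residual from regressing epigenetic age on chronological age, and a standard way to obtain q-values is the Benjamini-Hochberg procedure; the Python sketch below shows both under those assumptions. The study's models also adjusted for sex, race/ethnicity and cell type estimates, which this toy version omits; the function names and numbers are hypothetical.

```python
def eaa_residuals(epigenetic_age, chronological_age):
    """EAA as residuals of a least-squares fit: epigenetic_age ~ a + b * chronological_age."""
    n = len(chronological_age)
    mx = sum(chronological_age) / n
    my = sum(epigenetic_age) / n
    sxx = sum((x - mx) ** 2 for x in chronological_age)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chronological_age, epigenetic_age))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual: epigenetically "older" than expected for that chronological age.
    return [y - (intercept + slope * x) for x, y in zip(chronological_age, epigenetic_age)]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical toy values.
ages = [30.0, 40.0, 50.0, 60.0]
clock = [32.0, 41.0, 49.0, 62.0]
eaa = eaa_residuals(clock, ages)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [i for i, qi in enumerate(q) if qi < 0.05]
```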
Affiliation(s)
- Alesia M Jung
- Department of Community, Environment & Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
- Department of Pharmacology & Toxicology, R. Ken Coit College of Pharmacy, College of Public Health, Tucson, AZ, USA
- Melissa A Furlong
- Department of Community, Environment & Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
- Jaclyn M Goodrich
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Andres Cardenas
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
- Shawn C Beitel
- Department of Community, Environment & Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
- Sally R Littau
- Department of Community, Environment & Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
- Alberto J Caban-Martinez
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, USA
- Derek Urwin
- Los Angeles County Fire Department, Los Angeles, CA, USA
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, CA, USA
- Division of Health Safety and Medicine, International Association of Fire Fighters, Washington, DC, USA
- Jamie Gabriel
- Los Angeles County Fire Department, Los Angeles, CA, USA
- Judith M Graber
- Department of Biostatistics & Epidemiology, School of Public Health, Rutgers University, Piscataway, NJ, USA
- Casey Grant
- Fire Protection Research Foundation, Quincy, MA, USA
- Jefferey L Burgess
- Department of Community, Environment & Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, USA
3
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
4
Shivakumar M, Han S, Lee Y, Kim D. Epigenetic interplay between methylation and miRNA in bladder cancer: focus on isoform expression. BMC Genomics 2021; 22:754. [PMID: 34674656 PMCID: PMC8529714 DOI: 10.1186/s12864-021-08052-9] [Received: 09/20/2021] [Accepted: 09/24/2021]
Abstract
BACKGROUND Various epigenetic factors are responsible for the non-genetic regulation of gene expression. Oncogenes or tumor suppressors that are epigenetically dysregulated by miRNA and/or DNA methylation are often observed in cancer cells. Each of these epigenetic regulators has been well studied in cancer progression; however, their mutual regulatory relationship in cancer remains unclear. In this study, we propose an integrative framework to systematically investigate epigenetic interactions between miRNA and methylation at the level of alternatively spliced mRNA in bladder cancer. RESULTS The integrative analyses yielded 136 significant combinations (methylation, miRNA and isoform). Further, overall survival analysis of the 136 combinations, using high and low groups defined by methylation and miRNA levels, identified 13 combinations associated with survival. Additionally, different interaction patterns were examined. CONCLUSIONS Our study provides higher-resolution molecular insight into the crosstalk between two epigenetic factors, DNA methylation and miRNA. Given the importance of epigenetic interactions and alternative splicing in cancer, it is timely to identify and understand the underlying mechanisms based on epigenetic markers and their interactions in cancer, leading to alternative splicing with primary functional impact.
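The abstract reports survival analysis on high and low groups but does not name the statistical test. Assuming a median split and a two-group log-rank test (a common choice, not confirmed by the abstract), the comparison could be sketched in Python as follows; the function names and data are hypothetical.

```python
import math


def median_split(values):
    """Label each sample 1 ("high") if above the median, otherwise 0 ("low")."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    groups: 1 = "high" group, 0 = "low" group.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(groups[i] for i in at_risk)
        d = sum(events[i] for i in at_risk if times[i] == t)
        d1 = sum(events[i] for i in at_risk if times[i] == t and groups[i] == 1)
        # Observed minus expected events in group 1 at this event time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    stat = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical methylation levels and survival data for four patients.
groups = median_split([5.0, 7.0, 1.0, 2.0])
stat, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], groups)
```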
Affiliation(s)
- Manu Shivakumar
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Seonggyun Han
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, USA
- Younghee Lee
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, USA
- Huntsman Cancer Institute, Salt Lake City, USA
- Dokyoon Kim
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
5
Kim SY, Choe EK, Shivakumar M, Kim D, Sohn KA. Multi-layered network-based pathway activity inference using directed random walks: application to predicting clinical outcomes in urologic cancer. Bioinformatics 2021; 37:2405-2413. [PMID: 33543748 PMCID: PMC8388033 DOI: 10.1093/bioinformatics/btab086] [Received: 07/22/2020] [Revised: 12/11/2020] [Accepted: 02/02/2021]
Abstract
MOTIVATION To better understand the molecular features of cancers, comprehensive analyses using multi-omics data have been conducted. Additionally, a pathway activity inference method has been developed to capture the integrative effects of multiple genes. In this respect, we recently proposed a novel integrative pathway activity inference approach, iDRW, and demonstrated its effectiveness in dichotomizing patients into two survival groups. However, it had several limitations, such as a lack of generality. In this study, we designed a directed gene-gene graph using pathway information by assigning interactions between genes in multiple layers of networks. RESULTS As a proof-of-concept study, the approach was evaluated using three genomic profiles of urologic cancer patients. The proposed integrative approach achieved improved outcome prediction performance compared with a single genomic profile alone and with other existing pathway activity inference methods. The integrative approach also identified common and cancer-specific candidate driver pathways as predictive prognostic features in urologic cancers. Furthermore, it provides better biological insights into the prioritized pathways and genes in an integrated view using a multi-layered gene-gene network. Our framework is not specifically designed for urologic cancers and is generally applicable to various datasets. AVAILABILITY iDRW is implemented as an R software package. The source code is available at https://github.com/sykim122/iDRW. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
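In directed-random-walk (DRW) approaches, a pathway's activity in a sample is typically summarized as a weighted, sign-adjusted combination of the standardized values of its member genes, with node weights coming from the random walk. The Python sketch below assumes that general form (sum of weight times sign times z-score, divided by the Euclidean norm of the weights); it is not the iDRW package's code, and the weights, signs and gene names are hypothetical inputs.

```python
import math


def pathway_activity(z_scores, weights, signs, pathway_genes):
    """Weighted, sign-adjusted activity score of one pathway in one sample.

    z_scores: gene -> standardized measurement in this sample
    weights: gene -> non-negative node weight (e.g. from a random walk on the graph)
    signs: gene -> +1 or -1, direction of the gene's association with the outcome
           (genes missing from `signs` are treated as +1)
    pathway_genes: genes annotated to the pathway
    """
    # Only genes that were both measured and weighted contribute.
    genes = [g for g in pathway_genes if g in z_scores and g in weights]
    norm = math.sqrt(sum(weights[g] ** 2 for g in genes))
    if norm == 0:
        return 0.0  # no measured, weighted genes: no evidence of activity
    weighted_sum = sum(weights[g] * signs.get(g, 1) * z_scores[g] for g in genes)
    return weighted_sum / norm


# Hypothetical pathway with three genes, one of them unmeasured ("C").
activity = pathway_activity(
    z_scores={"A": 1.0, "B": -1.0},
    weights={"A": 3.0, "B": 4.0},
    signs={"A": 1, "B": -1},
    pathway_genes=["A", "B", "C"],
)
```

Normalizing by the weight norm keeps scores comparable between pathways with many and few measured genes.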
Affiliation(s)
- So Yeon Kim
- Department of Software and Computer Engineering, Ajou University, Suwon 16499, South Korea
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Eun Kyung Choe
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Surgery, Seoul National University Hospital Healthcare System Gangnam Center, Seoul 06236, South Korea
- Manu Shivakumar
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Dokyoon Kim
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- To whom correspondence should be addressed.
- Kyung-Ah Sohn
- Department of Software and Computer Engineering, Ajou University, Suwon 16499, South Korea
- Department of Artificial Intelligence, Ajou University, Suwon 16499, South Korea
- To whom correspondence should be addressed.
6
Tong D, Tian Y, Zhou T, Ye Q, Li J, Ding K, Li J. Improving prediction performance of colon cancer prognosis based on the integration of clinical and multi-omics data. BMC Med Inform Decis Mak 2020; 20:22. [PMID: 32033604 PMCID: PMC7006213 DOI: 10.1186/s12911-020-1043-1] [Received: 10/06/2019] [Accepted: 01/31/2020]
Abstract
BACKGROUND Colon cancer is common worldwide and is a leading cause of cancer-related death. Multiple levels of omics data are available due to the development of sequencing technologies. In this study, we proposed an integrative prognostic model for colon cancer based on the integration of clinical and multi-omics data. METHODS In total, 344 patients were included in this study. Clinical, gene expression, DNA methylation and miRNA expression data were retrieved from The Cancer Genome Atlas (TCGA). To accommodate the high dimensionality of omics data, unsupervised clustering was used as a dimension reduction method. The bias-corrected Harrell's concordance index was used to determine which clustering result provided the best prognostic performance. Finally, we proposed a prognostic prediction model based on the integration of clinical data and multi-omics data. Uno's concordance index with cross-validation was used to compare the discriminative performance of the prognostic model constructed with different covariates. RESULTS Combinations of clinical and multi-omics data can improve prognostic performance, as shown by the increase in the bias-corrected Harrell's concordance index of the prognostic model from 0.7424 (clinical features only) to 0.7604 (clinical features and three types of omics features). Additionally, 2-year, 3-year and 5-year Uno's concordance statistics increased from 0.7329, 0.7043, and 0.7002 (clinical features only) to 0.7639, 0.7474 and 0.7597 (clinical features and three types of omics features), respectively. CONCLUSION This study successfully combined clinical and multi-omics data for better prediction of colon cancer prognosis.
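The study compared models with the bias-corrected Harrell's concordance index and Uno's concordance statistic. As a reference point, the plain (uncorrected, unweighted) Harrell's C can be computed from follow-up times, event indicators and predicted risk scores as in the Python sketch below; it implements neither the bias correction nor Uno's censoring weights, and the data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when subject i also has the higher
    predicted risk; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]  # the third subject is censored at time 3
c_perfect = harrell_c_index(times, events, [4.0, 3.0, 2.0, 1.0])
c_reversed = harrell_c_index(times, events, [1.0, 2.0, 3.0, 4.0])
c_constant = harrell_c_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the reported 0.7424 and 0.7604 should be read.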
Affiliation(s)
- Danyang Tong
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Yu Tian
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Tianshu Zhou
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Qiancheng Ye
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Jun Li
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 31009, Zhejiang Province, China
- Kefeng Ding
- Department of Surgical Oncology, Second Affiliated Hospital, Zhejiang University School of Medicine, No. 88 Jiefang Road, Hangzhou, 31009, Zhejiang Province, China
- Jingsong Li
- Engineering Research Center of EMR and Intelligent Expert System, Ministry of Education, College of Biomedical Engineering and Instrument Science, Zhejiang University, No. 38 Zheda Road, Hangzhou, 310027, Zhejiang Province, China
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou, China
7
Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019]
Abstract
Cancer is a complex disease at many different levels. The molecular phenomenology of cancer is also quite rich. The mutational and genomic origins of cancer and their downstream effects on processes such as the reprogramming of gene regulatory control, and on the molecular pathways that depend on such control, have been recognized as central to the characterization of the disease. More important, though, is understanding their causes, prognosis, and therapeutics. A multitude of factors is associated with anomalous control of gene expression in cancer. Many of these factors can now be studied comprehensively by means of experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven insufficient to present a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that can be used, as well as the computational genomic analyses needed to track down these factors. Then we present theoretical and computational frameworks developed to integrate the diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Helena Reyes-Gopar
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
8
Lin X, Pavani KC, Smits K, Deforce D, Heindryckx B, Van Soom A, Peelman L. Bta-miR-10b Secreted by Bovine Embryos Negatively Impacts Preimplantation Embryo Quality. Front Genet 2019; 10:757. [PMID: 31507632 PMCID: PMC6713719 DOI: 10.3389/fgene.2019.00757] [Received: 05/13/2019] [Accepted: 07/17/2019]
Abstract
In a previous study, we found miR-10b to be more abundant in a conditioned culture medium of degenerate embryos compared to that of blastocysts. Here, we show that miR-10b mimics added to the culture medium can be taken up by embryos. This uptake results in an increase in embryonic cell apoptosis and aberrant expression of DNA methyltransferases (DNMTs). Using several algorithms, Homeobox A1 (HOXA1) was identified as one of the potential miR-10b target genes and dual-luciferase assay confirmed HOXA1 as a direct target of miR-10b. Microinjection of si-HOXA1 into embryos also resulted in an increase in embryonic cell apoptosis and downregulation of DNMTs. Cell progression analysis using Madin–Darby bovine kidney cells (MDBKs) showed that miR-10b overexpression and HOXA1 knockdown result in suppressed cell cycle progression and decreased cell viability. Overall, this work demonstrates that miR-10b negatively influences embryo quality and might do this through targeting HOXA1 and/or influencing DNA methylation.
Affiliation(s)
- Xiaoyuan Lin
- Department of Nutrition, Genetics and Ethology, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
- Katrien Smits
- Reproduction, Obstetrics and Herd Health, Ghent University, Ghent, Belgium
- Dieter Deforce
- Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Björn Heindryckx
- Department for Reproductive Medicine, Ghent University Hospital, Ghent, Belgium
- Ann Van Soom
- Reproduction, Obstetrics and Herd Health, Ghent University, Ghent, Belgium
- Luc Peelman
- Department of Nutrition, Genetics and Ethology, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
9
Kim TR, Jeong HH, Sohn KA. Topological integration of RPPA proteomic data with multi-omics data for survival prediction in breast cancer via pathway activity inference. BMC Med Genomics 2019; 12:94. [PMID: 31296204 PMCID: PMC6624183 DOI: 10.1186/s12920-019-0511-x]
Abstract
BACKGROUND The analysis of integrated multi-omics data enables the identification of disease-related biomarkers that cannot be identified from a single omics profile. Although protein-level data reflects the cellular status of cancer tissue more directly than gene-level data, past studies have mainly focused on multi-omics integration using gene-level data as opposed to protein-level data. However, the use of protein-level data (such as mass spectrometry data) in multi-omics integration has some limitations. For example, the correlation between the characteristics of gene-level data (such as mRNA) and protein-level data is weak, and it is difficult to detect low-abundance signaling proteins that are used to target cancer. The reverse phase protein array (RPPA) is a highly sensitive antibody-based quantification method for signaling proteins. However, the number of protein features in RPPA data is extremely low compared to the number of gene features in gene-level data. In this study, we present a new method for integrating RPPA profiles with RNA-Seq and DNA methylation profiles for survival prediction based on the integrative directed random walk (iDRW) framework proposed in our previous study. In the iDRW framework, each omics profile is merged into a single pathway profile that reflects the topological information of the pathway. In order to address the sparsity of RPPA profiles, we employ the random walk with restart (RWR) approach on the pathway network. RESULTS Our model was validated using survival prediction analysis for a breast cancer dataset from The Cancer Genome Atlas. Our proposed model exhibited improved performance compared with other methods that utilize pathway information and also outperformed models that did not include the RPPA data utilized in our study. The risk pathways identified for breast cancer in this study were closely related to well-known breast cancer risk pathways. 
CONCLUSIONS Our results indicated that RPPA data is useful for survival prediction for breast cancer patients under our framework. We also observed that iDRW effectively integrates RNA-Seq, DNA methylation, and RPPA profiles, while variation in the composition of the omics data can affect both prediction performance and risk pathway identification. These results suggest that omics data composition is a critical parameter for iDRW.
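The abstract uses random walk with restart (RWR) on the pathway network to cope with the sparsity of RPPA profiles. A generic RWR, in which measured nodes seed the walk and unmeasured neighbours receive propagated scores, can be sketched in Python as below. The restart probability of 0.5, the handling of nodes without out-edges and the toy graph are assumptions, not details from the study.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph.

    adjacency: node -> list of out-neighbours
    seeds: node -> non-negative starting score (e.g. a measured protein level)
    restart: probability of jumping back to the seed distribution at each step
    Returns node -> stationary visiting probability (sums to 1).
    """
    nodes = set(adjacency) | set(seeds)
    for neighbours in adjacency.values():
        nodes.update(neighbours)
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("at least one seed must have a positive score")
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            out = adjacency.get(u, [])
            if out:
                # Spread the non-restarting mass evenly over out-edges.
                share = (1 - restart) * p[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: return its walking mass to the seed distribution.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
        converged = sum(abs(nxt[v] - p[v]) for v in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical 3-gene cycle where only "A" was measured (sparse profile).
scores = random_walk_with_restart({"A": ["B"], "B": ["C"], "C": ["A"]}, {"A": 1.0})
```

Nodes closer to the measured seed receive higher scores, which is how propagation can fill in pathway members that the array did not quantify.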
Affiliation(s)
- Tae Rim Kim
- Department of Computer Engineering, Ajou University, Suwon, 16499 South Korea
- Hyun-Hwan Jeong
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030 USA
- Kyung-Ah Sohn
- Department of Computer Engineering, Ajou University, Suwon, 16499 South Korea
10
Kim SY, Jeong HH, Kim J, Moon JH, Sohn KA. Robust pathway-based multi-omics data integration using directed random walks for survival prediction in multiple cancer studies. Biol Direct 2019; 14:8. [PMID: 31036036 PMCID: PMC6489180 DOI: 10.1186/s13062-019-0239-8] [Received: 10/16/2018] [Accepted: 04/10/2019]
Abstract
Background Integrating the rich information from multi-omics data has been a popular approach to survival prediction and biomarker identification in several cancer studies. To facilitate the integrative analysis of multiple genomic profiles, several studies have suggested utilizing pathway information rather than using individual genomic profiles. Methods We have recently proposed an integrative directed random walk-based method utilizing pathway information (iDRW) for more robust and effective genomic feature extraction. In this study, we applied iDRW to multiple genomic profiles for two different cancers, and designed a directed gene-gene graph which reflects the interaction between gene expression and copy number data. In the experiments, the performances of the iDRW method and four state-of-the-art pathway-based methods were compared using a survival prediction model which classifies samples into two survival groups. Results The results show that the integrative analysis guided by pathway information not only improves prediction performance, but also provides better biological insights into the top pathways and genes prioritized by the model in both the neuroblastoma and the breast cancer datasets. The pathways and genes selected by the iDRW method were shown to be related to the corresponding cancers. Conclusions In this study, we demonstrated the effectiveness of a directed random walk-based multi-omics data integration method applied to gene expression and copy number data for both breast cancer and neuroblastoma datasets. We revamped a directed gene-gene graph considering the impact of copy number variation on gene expression and redefined the weight initialization and gene-scoring method. The benchmark result for iDRW with four pathway-based methods demonstrated that the iDRW method improved survival prediction performance and jointly identified cancer-related pathways and genes for two different cancer datasets. 
Reviewers This article was reviewed by Helena Molina-Abril and Marta Hidalgo.
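The abstract describes a directed gene-gene graph that reflects how copy number data interact with gene expression. One plausible construction, sketched in Python below, creates one node per (layer, gene) pair, adds an edge from each gene's copy-number node to its expression node, and keeps pathway edges within the expression layer. The layer labels, edge rules and example genes are hypothetical; the abstract does not say whether pathway edges are also placed in the copy-number layer.

```python
def build_two_layer_graph(pathway_edges, cnv_genes, expression_genes):
    """Directed graph whose nodes are (layer, gene) pairs (hypothetical design).

    pathway_edges: iterable of (source_gene, target_gene) pairs from a pathway database
    cnv_genes / expression_genes: sets of genes measured in each profile
    Returns node -> set of successor nodes.
    """
    graph = {}

    def add_edge(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())  # make sure sink nodes are listed too

    # Cross-layer edges: a gene's copy number may influence its own expression.
    for gene in cnv_genes & expression_genes:
        add_edge(("cnv", gene), ("exp", gene))
    # Within-layer edges: pathway interactions among expressed genes.
    for source, target in pathway_edges:
        if source in expression_genes and target in expression_genes:
            add_edge(("exp", source), ("exp", target))
    return graph


graph = build_two_layer_graph(
    pathway_edges=[("A", "B"), ("B", "C")],
    cnv_genes={"A", "B"},
    expression_genes={"A", "B", "C"},
)
```

A graph of this shape could then serve as the substrate for a random walk, so that copy-number signal reaches expression nodes before spreading along pathway edges.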
Affiliation(s)
- So Yeon Kim
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Hyun-Hwan Jeong
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Jaesik Kim
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Jeong-Hyeon Moon
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Kyung-Ah Sohn
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
11
El-Manzalawy Y, Hsieh TY, Shivakumar M, Kim D, Honavar V. Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data. BMC Med Genomics 2018; 11:71. [PMID: 30255801 PMCID: PMC6157248 DOI: 10.1186/s12920-018-0388-0]
Abstract
BACKGROUND Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise and potential of precision medicine in cancer prevention, diagnosis, and treatment by substantially improving our understanding of underlying mechanisms as well as the discovery of novel biomarkers for different types of cancers. However, such analyses present a number of challenges, including the heterogeneity and high dimensionality of omics data. METHODS We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting. RESULTS We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods. CONCLUSIONS Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.
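MRMR-mv adapts the single-view Min-Redundancy and Maximum-Relevance algorithm to the multi-view setting. The Python sketch below shows only the classic single-view MRMR it builds on, using the difference criterion (relevance to the label minus mean mutual information with already selected features) on discretized features; it is not MRMR-mv, and the features are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy MRMR with the difference criterion: relevance minus mean redundancy.

    features: name -> list of discretized values; target: list of class labels.
    Ties are broken by feature name so the result is deterministic.
    """
    relevance = {name: mutual_information(v, target) for name, v in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretized features; "a_copy" duplicates "a" and adds no new information.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
label = [0, 0, 0, 0, 0, 0, 1, 1]  # label is 1 only when both a and b are 1
chosen = mrmr({"a": a, "a_copy": list(a), "b": b}, label, k=2)
```

The duplicated feature is skipped in favour of the complementary one, which is the behaviour the redundancy term is meant to produce.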
Affiliation(s)
- Yasser El-Manzalawy
- Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
- The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
- Tsung-Yu Hsieh
- Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
- The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
- Manu Shivakumar
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Dokyoon Kim
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant Honavar
- Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
- The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
- The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
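The MRMR criterion that El-Manzalawy et al. adapt to the multi-view setting can be illustrated with a minimal single-view sketch, assuming discretized features and a discretized outcome. It greedily picks the feature whose mutual information with the outcome, minus its mean mutual information with the features already chosen, is largest. This is not the authors' MRMR-mv algorithm; the view prefixes in the feature names (`mrna:`, `meth:`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy MRMR (difference form): maximise relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values.
    target:   list of discrete outcome values (e.g. a binned survival class).
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: a 4-level outcome explained by two independent binary
# features; "mrna:A_copy" duplicates "mrna:A" and is therefore fully redundant.
outcome = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "mrna:A": [0, 0, 0, 0, 1, 1, 1, 1],
    "mrna:A_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "meth:B": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr(features, outcome, k=2))
```

The duplicate `mrna:A_copy` is exactly as relevant as `mrna:A` but carries no new information once `mrna:A` is selected, so the redundancy penalty makes the complementary `meth:B` the second pick.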
12
Kim SY, Kim TR, Jeong HH, Sohn KA. Integrative pathway-based survival prediction utilizing the interaction between gene expression and DNA methylation in breast cancer. BMC Med Genomics 2018; 11:68. [PMID: 30255812 PMCID: PMC6157196 DOI: 10.1186/s12920-018-0389-z]
Abstract
Background Integrative analysis of multi-omics data has gained much attention recently. To investigate the interactive effect of gene expression and DNA methylation on cancer, we propose a directed random walk (DRW)-based approach on an integrated gene-gene graph that is guided by pathway information. Methods Our approach first extracts a single pathway profile matrix from the gene expression and DNA methylation data by performing the random walk over the integrated graph. We then apply a denoising autoencoder (DA) to the pathway profile to further identify important pathway features and genes. The extracted features are validated in the survival prediction task for breast cancer patients. Results The results show that the proposed method substantially improves survival prediction performance compared with other pathway-based prediction methods, revealing that the combined effect of gene expression and methylation data is well reflected in the integrated gene-gene graph combined with pathway information. Furthermore, we show that our joint analysis of the methylation features and gene expression profile identifies cancer-specific pathways with genes related to breast cancer. Conclusions In this study, we proposed a DRW-based method on an integrated gene-gene graph with expression and methylation profiles in order to utilize the interactions between them. The results showed that the constructed integrated gene-gene graph can successfully reflect the combined effect of methylation features on gene expression profiles. We also found that the features selected by the DA can effectively extract topologically important pathways and genes specifically related to breast cancer.
Affiliation(s)
- So Yeon Kim
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Tae Rim Kim
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Hyun-Hwan Jeong
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Kyung-Ah Sohn
- Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea
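The random-walk step in the Kim et al. abstract can be sketched with a generic random walk with restart on a directed graph, assuming each gene carries a restart weight derived from some gene-level score. This is a stand-in for the core update only: the three-gene graph, the restart weights, and the restart probability `alpha=0.7` are hypothetical, and the authors' construction of the integrated expression/methylation graph and of the pathway profiles is not reproduced.

```python
def random_walk_with_restart(successors, restart, alpha=0.7, tol=1e-10, max_iter=1000):
    """Stationary scores of a random walk with restart on a directed graph.

    successors: dict node -> list of nodes it points to (every node must be a key of restart).
    restart:    dict node -> restart probability (values sum to 1).
    alpha:      probability of jumping back to the restart distribution at each step.
    Update: p_next = alpha * r + (1 - alpha) * (mass pushed along out-edges).
    """
    nodes = list(restart)
    p = dict(restart)
    for _ in range(max_iter):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            outs = successors.get(u, [])
            if outs:
                # split the walker's mass at u evenly over its out-edges
                share = (1 - alpha) * p[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # dangling node: return its mass to the restart distribution
                for v in nodes:
                    nxt[v] += (1 - alpha) * p[u] * restart[v]
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical three-gene chain A -> B -> C with restart weights from a gene-level score.
graph = {"A": ["B"], "B": ["C"], "C": []}
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
scores = random_walk_with_restart(graph, weights)
print({g: round(s, 3) for g, s in scores.items()})
```

Downstream genes (here `C`) gain score from upstream neighbours, which is the property DRW-style methods exploit to propagate evidence along pathway topology; total mass stays at 1 because dangling mass is recycled through the restart vector.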
13
Doostparast Torshizi A, Petzold LR. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification. J Am Med Inform Assoc 2018; 25:99-108. [PMID: 28505320 PMCID: PMC7647127 DOI: 10.1093/jamia/ocx032] [Received: 11/16/2016] [Revised: 02/08/2017] [Accepted: 03/14/2017]
Abstract
Objective Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. Methods In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. Results The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Conclusions Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms.
Affiliation(s)
- Linda R Petzold
- Department of Computer Science, University of California, Santa Barbara, CA, USA
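Graph-based semi-supervised classification of the kind described by Doostparast Torshizi and Petzold can be illustrated with a generic label-spreading iteration (the local-and-global-consistency scheme of Zhou et al.), used here as a swapped-in stand-in rather than the authors' algorithm. It assumes a symmetric sample-similarity graph is already available; the six-sample graph, the `alpha=0.9` value, and the labels are hypothetical, and the paper's graph construction from condition-responsive genes is not reproduced.

```python
from math import sqrt


def label_spreading(weights, labels, n_classes, alpha=0.9, iters=200):
    """Propagate labels over a graph: F <- alpha * S F + (1 - alpha) * Y.

    weights: symmetric n x n list of non-negative edge weights (0 on the diagonal).
    labels:  class index for labeled samples, None for unlabeled ones.
    Returns the predicted class index for every sample.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # symmetric normalisation S = D^-1/2 W D^-1/2
    s = [[weights[i][j] / sqrt(degree[i] * degree[j]) if weights[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # one-hot seed matrix Y (all zeros for unlabeled samples)
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Hypothetical graph: two tight clusters {0,1,2} and {3,4,5} joined by a weak 2-3 edge;
# only sample 0 (class 0) and sample 5 (class 1) are labeled.
w = [[0.0] * 6 for _ in range(6)]
for a, b, wt in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                 (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    w[a][b] = w[b][a] = wt
pred = label_spreading(w, [0, None, None, None, None, 1], n_classes=2)
print(pred)
```

Unlabeled samples inherit the label of the cluster they are densely connected to; the weak bridge lets only a small amount of the other class leak across, which is the intuition behind using unlabeled samples in the graph.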
14
Kim D, Li R, Lucas A, Verma SS, Dudek SM, Ritchie MD. Using knowledge-driven genomic interactions for multi-omics data analysis: metadimensional models for predicting clinical outcomes in ovarian carcinoma. J Am Med Inform Assoc 2017; 24:577-587. [PMID: 28040685 DOI: 10.1093/jamia/ocw165] [Received: 03/13/2016] [Accepted: 12/02/2016]
Abstract
It is common that cancer patients have different molecular signatures even though they have similar clinical features, such as histology, due to the heterogeneity of tumors. To overcome this variability, we previously developed a new approach incorporating prior biological knowledge that identifies knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we have proposed such a novel methodological framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test the utility of the proposed framework, we applied it to an ovarian cancer dataset including multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. We found that each knowledge-driven genomic interaction model, based on different genomic datasets, contains different sets of pathway features, which suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. From the MKGI models, many interactions between pathways associated with outcomes were found, including the mitogen-activated protein kinase (MAPK) signaling pathway and the gonadotropin-releasing hormone (GnRH) signaling pathway, which are known to play important roles in cancer pathogenesis. The beauty of incorporating biological knowledge into the model based on multi-omics data is the ability to improve diagnosis and prognosis and provide better interpretability. Thus, determining variability in molecular signatures based on these interactions between pathways may lead to better diagnostic/treatment strategies for better precision medicine.
Affiliation(s)
- Dokyoon Kim
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Ruowang Li
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Anastasia Lucas
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Shefali S Verma
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA
- Scott M Dudek
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
15
Lee G, Bang L, Kim SY, Kim D, Sohn KA. Identifying subtype-specific associations between gene expression and DNA methylation profiles in breast cancer. BMC Med Genomics 2017; 10:28. [PMID: 28589855 PMCID: PMC5461552 DOI: 10.1186/s12920-017-0268-z]
Abstract
BACKGROUND Breast cancer is a complex disease in which different genomic patterns exist depending on the subtype. Recent research shows that multiple subtypes of breast cancer occur at different rates and play a crucial role in treatment planning. To better understand the biological mechanisms underlying breast cancer subtypes, it is desirable to investigate the gene regulatory system specific to each subtype. METHODS Gene expression, as an intermediate phenotype, is estimated from methylation profiles to identify the impact of epigenomic features on transcriptomic changes in breast cancer. We propose a kernel-weighted l1-regularized regression model to incorporate tumor subtype information and further reveal gene regulation affected by different breast cancer subtypes. For proper control of subtype-specific estimation, samples from different breast cancer subtypes are learned at different rates based on the target estimates. The Kolmogorov-Smirnov test is used to determine the learning rate of each sample from a different subtype. RESULTS Genes that may be sensitive to breast cancer subtype show improved prediction when estimated with the proposed method. Compared with a standard method, overall performance is also enhanced by incorporating tumor subtypes. In addition, we identified subtype-specific network structures based on the associations between gene expression and DNA methylation. CONCLUSIONS In this study, a kernel-weighted lasso model is proposed for identifying subtype-specific associations between gene expression and DNA methylation profiles. Identifying subtype-specific gene expression associated with epigenomic changes may help in planning treatment and developing new therapies.
Affiliation(s)
- Garam Lee
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Lisa Bang
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- So Yeon Kim
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Kyung-Ah Sohn
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
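The two ingredients named in the Lee et al. abstract, a Kolmogorov-Smirnov comparison between subtypes and a sample-weighted l1-regularized regression, can be sketched as follows. This is a hedged illustration, not the authors' model: the rule that turns the KS statistic into a sample weight (`1 - d`), the toy expression and methylation values, and `lam=0.01` are all hypothetical, and the paper's kernel is not reproduced. The lasso is solved by plain coordinate descent with soft-thresholding.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # step both empirical CDFs past every copy of x before comparing them
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def soft_threshold(z, t):
    return max(z - t, 0.0) - max(-z - t, 0.0)


def weighted_lasso(x, y, w, lam, iters=200):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i . beta)^2 + lam * ||beta||_1."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta for the current beta (all zeros)
    for _ in range(iters):
        for j in range(p):
            xj = [row[j] for row in x]
            denom = sum(w[i] * xj[i] ** 2 for i in range(n))
            if denom == 0:
                continue
            # weighted correlation of feature j with the partial residual
            rho = sum(w[i] * xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n))
            new = soft_threshold(rho, lam) / denom
            for i in range(n):
                resid[i] -= xj[i] * (new - beta[j])
            beta[j] = new
    return beta


# Hypothetical toy data: expression of one gene in a target subtype and in another subtype.
target = [1.0, 1.2, 0.9, 1.1]
other = [2.0, 2.3, 1.9, 2.1]
d = ks_statistic(target, other)
# Hypothetical weighting rule: samples from the other subtype count (1 - d) as much.
w = [1.0] * len(target) + [1.0 - d] * len(other)
meth = [[0.5], [0.6], [0.45], [0.55], [0.2], [0.25], [0.15], [0.22]]
beta = weighted_lasso(meth, target + other, w, lam=0.01)
print(d, beta)
```

When the two subtypes' distributions do not overlap (`d == 1.0` here), the other subtype's samples receive zero weight and the fit is driven entirely by the target subtype; partially overlapping subtypes would contribute in between.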
16
Shivakumar M, Lee Y, Bang L, Garg T, Sohn KA, Kim D. Identification of epigenetic interactions between miRNA and DNA methylation associated with gene expression as potential prognostic markers in bladder cancer. BMC Med Genomics 2017; 10:30. [PMID: 28589857 PMCID: PMC5461531 DOI: 10.1186/s12920-017-0269-y]
Abstract
Background One of the fundamental challenges in cancer research is to detect the regulators of gene expression changes during cancer progression. Through transcriptional silencing of critical cancer-related genes, epigenetic changes such as DNA methylation play a crucial role in cancer. In addition, miRNA, another major component of the epigenome, is a post-transcriptional regulator that modulates transcriptome changes. However, the mechanistic role of synergistic interactions between DNA methylation and miRNA as epigenetic regulators of transcriptomic changes, and their association with clinical outcomes such as survival, have remained largely unexplored in cancer. Methods In this study, we propose an integrative framework to identify epigenetic interactions between methylation and miRNA associated with transcriptomic changes. To test the utility of the proposed framework, the bladder cancer data set from The Cancer Genome Atlas (TCGA), including DNA methylation, miRNA expression, and gene expression data, was analyzed. Results First, we found 120 genes associated with interactions between the two epigenomic components. Then, 11 significant epigenetic interactions between miRNA and methylation, which target E2F3, CCND1, UTP6, CDADC1, SLC35E3, METRNL, TPCN2, NACC2, VGLL4, and PTEN, were found to be associated with survival. Overall, exploration of TCGA bladder cancer data identified epigenetic interactions associated with survival as potential prognostic markers in bladder cancer. Conclusions Given the importance and prevalence of these interacting epigenetic events in bladder cancer, it is timely to further understand how different epigenetic components interact and influence each other. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-017-0269-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Manu Shivakumar
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Younghee Lee
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Lisa Bang
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Tullika Garg
- Mowad Urology Department, Geisinger Health System, Danville, PA, USA
- Kyung-Ah Sohn
- Department of Software and Computer Engineering, Ajou University, Suwon, South Korea
- Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
17
Abstract
Background A biological system is a multi-layered structure of omics (genome, epigenome, transcriptome, metabolome, proteome, etc.) and can be further extended to clinical/medical layers such as the diseasome, drugs, and symptoms. One advantage of omics is that an unknown component or trait can be inferred from known omics components, either at the same level of omics or at different levels. Methods Implementing this inference process requires an algorithm that can be applied to a multi-layered complex system. In this study, we develop a semi-supervised learning algorithm for such multi-layered systems. To verify the validity of the inference, it was applied to predicting disease co-occurrence with a two-layered network composed of a symptom layer and a disease layer. Results The symptom-disease layered network achieved a fairly high AUC of 0.74, a noticeable improvement over the AUC of 0.59 obtained with the single-layered disease network. If extended to the whole layered structure of omics, the proposed method is expected to produce more promising results. Conclusion The novelty of this research is a new integrative algorithm that incorporates the vertical structure of omics data, in contrast to existing methods that integrate the data in parallel. The results can provide enhanced guidelines for disease co-occurrence prediction and thereby serve as a valuable tool for inference in multi-layered biological systems.
Affiliation(s)
- Myungjun Kim
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
- Yonghyun Nam
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
- Hyunjung Shin
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
18
Hassanzadeh HR, Phan JH, Wang MD. A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2016; 2016:184-189. [PMID: 32655981 DOI: 10.1109/bibm.2016.7822516]
Abstract
Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that help predict different clinical endpoints. We use multiple modalities of data derived from RNA deep sequencing (RNA-seq) to predict the survival of cancer patients. Despite the wealth of information available in the expression profiles of cancer tumors, this objective remains a major challenge, largely because of the small number of samples relative to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can make maximal use of all available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients' survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances no single modality per se yields the best accuracy, and that by fusing different models via a stacked generalization strategy we may boost accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.
Affiliation(s)
- Hamid Reza Hassanzadeh
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332
- John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia 30332
- May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia 30332
19
Nam Y, Kim M, Lee K, Shin H. CLASH: Complementary Linkage with Anchoring and Scoring for Heterogeneous biomolecular and clinical data. BMC Med Inform Decis Mak 2016; 16 Suppl 3:72. [PMID: 27454118 PMCID: PMC4959382 DOI: 10.1186/s12911-016-0315-2]
Abstract
Background Disease-disease associations have increasingly been viewed and analyzed as a network, in which the connections between diseases are configured using source information on interactome maps of biomolecules such as genes, proteins, and metabolites. Although abundant source information leads to tighter connections between diseases in the network, for certain groups of diseases, such as metabolic diseases, few connections occur because of insufficient source information: a large proportion of their associated genes are still unknown. One way to circumvent the lack of source information is to integrate available external information using an up-to-date integration or fusion method. However, if one wants a disease network that places strong emphasis on the original data source while using external sources only to complement it, integration may not be appropriate. Interpretation of the integrated network would be ambiguous: the meanings conferred on edges would be vague because of the fused information. Methods In this study, we propose a network-based algorithm that complements the original network with external information while preserving the network’s originality. The proposed algorithm links disconnected nodes to the disease network by using complementary information from an external data source through four steps: anchoring, connecting, scoring, and stopping. Results When applied to a network of metabolic diseases sourced from protein-protein interaction data, the proposed algorithm recovered 97% of the connections and improved the AUC up to 0.71 (from 0.55) by using external information obtained from text mining of the PubMed comorbidity literature. Experimental results also show that the proposed algorithm is robust to noisy external information.
Conclusion The novelty of this research is that the proposed algorithm preserves the network’s originality while complementing it with external information. Furthermore, it can be used both to recover original associations and to discover novel associations in a disease network.
Affiliation(s)
- Yonghyun Nam
- Department of Industrial Engineering, Ajou University, Wonchun-dong, Yeongtong-gu, Suwon, 443-749, South Korea
- Myungjun Kim
- Department of Industrial Engineering, Ajou University, Wonchun-dong, Yeongtong-gu, Suwon, 443-749, South Korea
- Kyungwon Lee
- Department of Digital Media, Ajou University, Wonchun-dong, Yeongtong-gu, 443-749, Suwon, South Korea
- Hyunjung Shin
- Department of Industrial Engineering, Ajou University, Wonchun-dong, Yeongtong-gu, Suwon, 443-749, South Korea
20
Świtnicki MP, Juul M, Madsen T, Sørensen KD, Pedersen JS. PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification. Bioinformatics 2016; 32:1353-65. [PMID: 26740525 DOI: 10.1093/bioinformatics/btv758] [Received: 06/11/2015] [Accepted: 12/17/2015]
Abstract
MOTIVATION Cancer development and progression are driven by a complex pattern of genomic and epigenomic perturbations. Both types of perturbations can affect gene expression levels and disease outcome. Integrative analysis of cancer genomics data may therefore improve detection of perturbed genes and prediction of disease state. As different data types are usually dependent, analyses based on independence assumptions make inefficient use of the data and can potentially lead to false conclusions. MODEL Here, we present PINCAGE (Probabilistic INtegration of CAncer GEnomics data), a method that uses probabilistic integration of cancer genomics data for combined evaluation of RNA-seq gene expression and 450k array DNA methylation measurements of promoters as well as gene bodies. It models the dependence between expression and methylation using modular graphical models, which also allow future inclusion of additional data types. RESULTS We apply our approach to a Breast Invasive Carcinoma dataset from The Cancer Genome Atlas consortium, which includes 82 adjacent normal and 730 cancer samples. We identify new biomarker candidates of breast cancer development (PTF1A, RABIF, RAG1AP1, TIMM17A, LOC148145) and progression (SERPINE3, ZNF706). PINCAGE discriminates better between normal and tumour tissue and between progressing and non-progressing tumours in comparison with established methods that assume independence between tested data types, especially when using evidence from multiple genes. Our method can be applied to any type of cancer or, more generally, to any genomic disease for which a sufficient amount of molecular data is available. AVAILABILITY AND IMPLEMENTATION R scripts available at http://moma.ki.au.dk/prj/pincage/ CONTACT: michal.switnicki@clin.au.dk or jakob.skou@clin.au.dk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakob S Pedersen
- Department of Molecular Medicine (MOMA) Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, 8000, Denmark
21
Abstract
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
Affiliation(s)
- Ulf Schmitz
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
- Olaf Wolkenhauer
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
22
Jeong HH, Leem S, Wee K, Sohn KA. Integrative network analysis for survival-associated gene-gene interactions across multiple genomic profiles in ovarian cancer. J Ovarian Res 2015; 8:42. [PMID: 26138921 PMCID: PMC4491426 DOI: 10.1186/s13048-015-0171-1] [Received: 12/23/2014] [Accepted: 06/24/2015]
Abstract
BACKGROUND Recent advances in high-throughput technology and the emergence of large-scale genomic datasets have enabled detection of genomic features that affect clinical outcomes. Although many previous computational studies have analysed the effect of each single gene or the additive effects of multiple genes on the clinical outcome, less attention has been devoted to identifying gene-gene interactions of a general type that are associated with the clinical outcome. Moreover, the integration of information from multiple molecular profiles adds another challenge to this problem. Recently, network-based approaches have gained huge popularity. However, previous network construction methods have focused on the relationships between features only, rather than on the effect of feature interactions on clinical outcome. METHODS We propose a mutual information-based integrative network analysis framework (MINA) that identifies gene pairs associated with clinical outcome and systematically analyses the resulting networks over multiple genomic profiles. We implement an efficient non-parametric testing scheme to assess the significance of detected gene interactions. We develop a tool named MINA that automates the proposed analysis scheme of identifying outcome-associated gene interactions and generating various networks from those interacting pairs for downstream analysis. RESULTS We demonstrate the proposed framework using real data from ovarian cancer patients in The Cancer Genome Atlas (TCGA). Statistically significant gene pairs associated with survival were identified from multiple genomic profiles, including many individual genes that have weak or no effect on survival. Moreover, we show that integrated networks, constructed by merging networks from multiple genomic profiles, demonstrate better topological properties and biological significance than individual networks.
CONCLUSIONS We have developed a simple but powerful analysis tool that is able to detect gene-gene interactions associated with clinical outcome on multiple genomic profiles. By being network-based, our approach provides a better insight into the underlying gene-gene interaction mechanisms that affect the clinical outcome of cancer patients.
Affiliation(s)
- Hyun-Hwan Jeong
- Department of Information and Computer Engineering, Ajou University, Suwon, 443-749, Republic of Korea
- Sangseob Leem
- Department of Information and Computer Engineering, Ajou University, Suwon, 443-749, Republic of Korea
- Kyubum Wee
- Department of Information and Computer Engineering, Ajou University, Suwon, 443-749, Republic of Korea
- Kyung-Ah Sohn
- Department of Information and Computer Engineering, Ajou University, Suwon, 443-749, Republic of Korea
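The MINA idea of scoring gene pairs by mutual information with the outcome and testing them non-parametrically can be sketched as below, assuming genes discretized into levels (e.g. high/low) and a discretized outcome. The pair score used here, the interaction gain I(G1,G2;Y) - I(G1;Y) - I(G2;Y), is a swapped-in choice; the exact statistic MINA uses may differ. The permutation count, seed, and XOR-style toy data are hypothetical.

```python
import random
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def interaction_gain(g1, g2, outcome):
    """I(G1,G2; Y) - I(G1; Y) - I(G2; Y): positive when the pair is jointly informative."""
    joint = list(zip(g1, g2))
    return (mutual_information(joint, outcome)
            - mutual_information(g1, outcome)
            - mutual_information(g2, outcome))


def permutation_p_value(g1, g2, outcome, n_perm=200, seed=0):
    """Observed gain and a permutation p-value from shuffling the outcome labels."""
    rng = random.Random(seed)
    observed = interaction_gain(g1, g2, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if interaction_gain(g1, g2, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: the outcome is the XOR of two binarised genes, so neither
# gene alone carries information but the pair determines the outcome exactly.
gene1 = [0, 0, 1, 1] * 10
gene2 = [0, 1, 0, 1] * 10
status = [a ^ b for a, b in zip(gene1, gene2)]
gain, p = permutation_p_value(gene1, gene2, status)
print(gain, p)
```

This is the situation the abstract highlights: each gene has no marginal effect on the outcome, yet the pair is strongly associated with it, and only a pairwise score detects that.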
23
Kim D, Li R, Dudek SM, Ritchie MD. Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer. J Biomed Inform 2015; 56:220-8. [PMID: 26048077 DOI: 10.1016/j.jbi.2015.05.019] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/06/2015] [Revised: 05/15/2015] [Accepted: 05/27/2015] [Indexed: 12/27/2022]
Abstract
Evaluating survival models that predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty predicting survival directly for two reasons: observations are censored, and predictive power depends on the threshold used to define the two classes. The traditional Cox regression approach, for its part, does not allow for the identification of interactions between genomic features, which could play key roles in cancer prognosis. In addition, data integration is regarded as an important route to improving the predictive power of survival models. Cancer can arise from multiple alterations across meta-dimensional genomic data, including the genome, epigenome, transcriptome, and proteome. Here we propose a new integrative framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; and (3) identifying interactions within and between meta-dimensional genomic features associated with survival. To predict censored survival time, martingale residuals were calculated as a new continuous outcome. We also implemented a new fitness function for the grammatical evolution neural network (GENN), based on the mean absolute difference of martingale residuals. To test the utility of the proposed framework, we first conducted a simulation study. We then analysed meta-dimensional omics data for breast cancer retrieved from The Cancer Genome Atlas (TCGA), covering copy number, gene expression, DNA methylation, and protein expression. In the breast cancer dataset, we identified survival-associated interactions both within a single dimension of genomic data and between meta-dimensional omics data.
Notably, our best meta-dimensional model reached a predictive power of 73%, outperforming all of the other models built on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease. The high levels of genomic diversity within and between breast tumors could affect therapeutic response and the risk of disease progression. Thus, identifying interactions within and between meta-dimensional omics data associated with survival in breast cancer is expected to guide the development of improved meta-dimensional prognostic biomarkers and therapeutic targets.
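The martingale-residual step can be sketched in a few lines of Python under two assumptions:

- The residuals come from a null Cox model (no covariates). For subject i the martingale residual then reduces to M_i = d_i - H(t_i), where d_i is the event indicator and H is the Nelson-Aalen estimate of the cumulative hazard.
- The GENN fitness is the mean absolute difference between a candidate model's predictions and these residuals, with lower values being better.

Both are assumptions for illustration. The paper's residual model may include covariates, the GENN search itself is not shown, and the function names are hypothetical.

```python
def nelson_aalen(times, events):
    """Return a step function t -> Nelson-Aalen cumulative hazard H(t)."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    steps = []
    cum = 0.0
    for tj in event_times:
        # Subjects still under observation at tj (censored at tj counts as at risk).
        at_risk = sum(1 for t in times if t >= tj)
        deaths = sum(1 for t, d in zip(times, events) if d and t == tj)
        cum += deaths / at_risk
        steps.append((tj, cum))

    def cumulative_hazard(t):
        value = 0.0
        for tj, c in steps:
            if tj <= t:
                value = c
            else:
                break
        return value

    return cumulative_hazard


def martingale_residuals(times, events):
    """Assumed null-model residuals: M_i = event_i - H(t_i)."""
    H = nelson_aalen(times, events)
    return [d - H(t) for t, d in zip(times, events)]


def mad_fitness(predicted, residuals):
    """Hypothetical GENN fitness: mean absolute difference between model
    predictions and martingale residuals (lower is better)."""
    return sum(abs(p - r) for p, r in zip(predicted, residuals)) / len(residuals)


if __name__ == "__main__":
    times = [2, 3, 3, 5, 8]
    events = [1, 1, 0, 1, 0]  # 1 = death observed, 0 = censored
    residuals = martingale_residuals(times, events)
    print([round(r, 2) for r in residuals])
    print(round(mad_fitness([0.0] * len(times), residuals), 2))
```

For a null model the residuals always sum to zero. Summing each subject's estimated cumulative hazard over the cohort adds back exactly the total number of observed events, so the sum is a convenient sanity check. Positive residuals mark subjects who had the event sooner than the null model expects, and censored subjects always have residuals of zero or below.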
Affiliation(s)
- Dokyoon Kim
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Ruowang Li
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Scott M Dudek
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA; Geisinger Health System, Danville, PA, USA.
24
Affiliation(s)
- Luonan Chen
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.