1
Al-Harazi O, El Allali A, Kaya N, Colak D. Identification of Diagnostic and Prognostic Subnetwork Biomarkers for Women with Breast Cancer Using Integrative Genomic and Network-Based Analysis. Int J Mol Sci 2024; 25:12779. [PMID: 39684488 DOI: 10.3390/ijms252312779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 12/18/2024] Open
Abstract
Breast cancer remains a major global health concern and a leading cause of cancer-related deaths among women. Early detection and effective treatment are essential in improving patient survival. Advances in omics technologies have provided deeper insights into the molecular mechanisms underlying breast cancer. This study aimed to identify subnetwork markers with diagnostic and prognostic potential by integrating genome-wide gene expression data with protein-protein interaction networks. We identified four significant subnetworks revealing potentially important hub genes, including VEGFA, KIF4A, ZWINT, PTPRU, IKBKE, STYK1, CENPO, and UBE2C. The diagnostic and prognostic potentials of these subnetworks were validated using independent datasets. Unsupervised principal component analysis demonstrated a clear separation of breast cancer patients from healthy controls across multiple datasets. A KNN classification model, based on these subnetworks, achieved an accuracy of 97%, sensitivity of 98%, specificity of 94%, and area under the curve (AUC) of 96%. Moreover, the prognostic significance of these subnetwork markers was validated using independent transcriptomic datasets comprising over 4000 patients. These findings suggest that subnetwork markers derived from integrated genomic network analyses can enhance our understanding of the molecular landscape of breast cancer, potentially leading to improved diagnostic, prognostic, and therapeutic strategies.
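The accuracy, sensitivity, and specificity quoted in this abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are typically computed; the labels are made up for illustration and are not the study's data, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical labels: 1 = tumour sample, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, `y_pred` would come from a classifier such as KNN evaluated on held-out samples.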
Affiliation(s)
- Olfat Al-Harazi
- Molecular Oncology Department, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Achraf El Allali
- Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Benguerir 43150, Morocco
- Namik Kaya
- Translational Genomics Department, Center for Genomic Medicine, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Dilek Colak
- Molecular Oncology Department, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
2
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Zhenbo Gao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
3
Mandal K, Sarmah R, Bhattacharyya DK. POPBic: Pathway-Based Order Preserving Biclustering Algorithm Towards the Analysis of Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2659-2670. [PMID: 32175872 DOI: 10.1109/tcbb.2020.2980816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
To understand the underlying biological mechanisms of gene expression data, it is important to discover the groups of genes that have similar expression patterns under certain subsets of conditions. Biclustering algorithms have been effective in analyzing large-scale gene expression data. Recently, traditional biclustering has been improved by introducing biological knowledge along with the expression data during the biclustering process. In this paper, we propose the Pathway-based Order Preserving Biclustering (POPBic) algorithm by incorporating the Kyoto Encyclopedia of Genes and Genomes (KEGG), based on the hypothesis that two genes sharing similar pathways are likely to be similar. The basic principle of the POPBic approach is to apply the concept of Longest Common Subsequence between a pair of genes which have a high number of common pathways. The algorithm identifies the expression patterns from data using two major steps: (i) selection of significant seed genes and (ii) extraction of biclusters. We perform exhaustive experimentation with the POPBic algorithm using a synthetic dataset to evaluate the bicluster model, finding its robustness in the presence of noise and identifying overlapping biclusters. We demonstrate that POPBic is able to discover biologically significant biclusters for four cancer microarray gene expression datasets. POPBic has been found to perform consistently well in comparison to its closest competitors.
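The Longest Common Subsequence idea mentioned in this abstract can be illustrated on two genes. The sketch below is not the POPBic algorithm itself: it assumes that each gene is represented by the order of its conditions sorted by increasing expression, and it uses the textbook dynamic-programming LCS to find conditions on which both genes rise in the same order. The function names and expression values are hypothetical.

```python
def condition_order(expression):
    """Condition indices sorted by increasing expression value."""
    return sorted(range(len(expression)), key=lambda i: expression[i])


def longest_common_subsequence(a, b):
    """Textbook dynamic-programming LCS; returns one longest subsequence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one optimal subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


# Hypothetical expression values of two genes over five conditions.
gene_a = [2.0, 5.0, 1.0, 7.0, 3.0]
gene_b = [1.5, 4.0, 0.5, 9.0, 6.0]
shared = longest_common_subsequence(condition_order(gene_a), condition_order(gene_b))
print(len(shared), "conditions share a common ordering:", shared)
```

A long common subsequence suggests that the two genes follow the same trend over that subset of conditions, which is the kind of evidence an order-preserving bicluster is built from.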
4
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS: A Journal of Integrative Biology 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower researchers to analyze, integrate, and make sense of big data. There are currently a large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of the most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes and Search Tool for The Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia; Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Achraf El Allali
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Dilek Colak
- Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
5
NBIA: a network-based integrative analysis framework - applied to pathway analysis. Sci Rep 2020; 10:4188. [PMID: 32144346 PMCID: PMC7060280 DOI: 10.1038/s41598-020-60981-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/08/2019] [Accepted: 02/19/2020] [Indexed: 02/08/2023] Open
Abstract
With the explosion of high-throughput data, effective integrative analyses are needed to decipher the knowledge accumulated in biological databases. Existing meta-analysis approaches in systems biology often focus on hypothesis testing and neglect real expression changes, i.e. effect sizes, across independent studies. In addition, most integrative tools completely ignore the topological order of gene regulatory networks that hold key characteristics in understanding biological processes. Here we introduce a novel meta-analysis framework, Network-Based Integrative Analysis (NBIA), that transforms the challenging meta-analysis problem into a set of standard pathway analysis problems that have been solved efficiently. NBIA utilizes techniques from classical and modern meta-analysis, as well as a network-based analysis, in order to identify patterns of genes and networks that are consistently impacted across multiple studies. We assess the performance of NBIA by comparing it with nine meta-analysis approaches: Impact Analysis, GSA, and GSEA combined with classical meta-analysis methods (Fisher’s and the additive method), plus the three MetaPath approaches that employ multiple datasets. The 10 approaches have been tested on 1,737 samples from 27 expression datasets related to Alzheimer’s disease, acute myeloid leukemia (AML), and influenza. For all of the three diseases, NBIA consistently identifies biological pathways relevant to the underlying diseases while the other 9 methods fail to capture the key phenomena. The identified AML signature is also validated on a completely independent cohort of 167 AML patients. In this independent cohort, the proposed signature identifies two groups of patients that have significantly different survival profiles (Cox p-value 2 × 10⁻⁶). The NBIA framework will be included in the next release of BLMA Bioconductor package (http://bioconductor.org/packages/release/bioc/html/BLMA.html).
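Fisher's method, one of the classical meta-analysis techniques named in this abstract, combines independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below shows only this combination step, not the NBIA framework; the per-study p-values are invented, and the function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    For the even degrees of freedom that arise here (2k), the chi-square
    upper tail has a closed form, so only the standard library is needed.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


# Hypothetical p-values for one pathway across three independent studies.
per_study_p = [0.04, 0.10, 0.03]
combined = fisher_combined_p(per_study_p)
print(f"combined p-value: {combined:.4g}")
```

The method assumes the studies are independent; correlated studies would make the combined p-value too optimistic.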
6
Ulgen E, Ozisik O, Sezerman OU. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front Genet 2019; 10:858. [PMID: 31608109 PMCID: PMC6773876 DOI: 10.3389/fgene.2019.00858] [Citation(s) in RCA: 278] [Impact Index Per Article: 46.3] [Received: 09/17/2018] [Accepted: 08/16/2019] [Indexed: 12/13/2022] Open
Abstract
Pathway analysis is often the first choice for studying the mechanisms underlying a phenotype. However, conventional methods for pathway analysis do not take into account complex protein-protein interaction information, resulting in incomplete conclusions. Previously, numerous approaches that utilize protein-protein interaction information to enhance pathway analysis yielded superior results compared to conventional methods. Here, we present pathfindR, another approach exploiting protein-protein interaction information and the first R package for active-subnetwork-oriented pathway enrichment analyses for class comparison omics experiments. Using the list of genes obtained from an omics experiment comparing two groups of samples, pathfindR identifies active subnetworks in a protein-protein interaction network. It then performs pathway enrichment analyses on these identified subnetworks. To further reduce the complexity, it provides functionality for clustering the resulting pathways. Moreover, through a scoring function, the overall activity of each pathway in each sample can be estimated. We illustrate the capabilities of our pathway analysis method on three gene expression datasets and compare our results with those obtained from three popular pathway analysis tools. The results demonstrate that literature-supported disease-related pathways ranked higher in our approach compared to the others. Moreover, pathfindR identified additional pathways relevant to the conditions that were not identified by other tools, including pathways named after the conditions.
Affiliation(s)
- Ege Ulgen
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- Ozan Ozisik
- Department of Computer Engineering, Electrical & Electronics Faculty, Yildiz Technical University, Istanbul, Turkey
- Osman Ugur Sezerman
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
7
Tian S, Wang C, Wang B. Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures. BioMed Research International 2019; 2019:2497509. [PMID: 31073522 PMCID: PMC6470448 DOI: 10.1155/2019/2497509] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/23/2018] [Accepted: 03/07/2019] [Indexed: 12/29/2022]
Abstract
To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods being regarded as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail and the importance of penalization in such methods. Last, we point out the potential uses of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.
Affiliation(s)
- Suyan Tian
- Division of Clinical Research, The First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China
- Chi Wang
- Department of Biostatistics, Markey Cancer Center, The University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
- Bing Wang
- School of Life Science, Jilin University, 2699 Qianjin Street, Changchun, Jilin 130012, China
8
Schönbach C, Verma C, Wee LJK, Bond PJ, Ranganathan S. 2016 update on APBioNet's annual international conference on bioinformatics (InCoB). BMC Genomics 2016; 17:1036. [PMID: 28155656 PMCID: PMC5259860 DOI: 10.1186/s12864-016-3362-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Since its inception in 2002, InCoB has become one of the largest annual bioinformatics conferences in the Asia-Pacific region, with attendance ranging between 150 and 250 delegates depending on the venue location. InCoB 2016 in Singapore was attended by almost 220 delegates. This year, sessions on structural bioinformatics, sequence and sequencing, and next-generation sequencing fielded the highest number of oral presentations. Forty-four out of 96 oral presentations were associated with an accepted manuscript in supplemental issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics or BMC Systems Biology. Articles with a genomics focus are reviewed in this editorial. Next year's InCoB will be held in Shenzhen, China from September 20 to 22, 2017.
Affiliation(s)
- Christian Schönbach
- International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811 Japan
- Chandra Verma
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Lawrence Jin Kiat Wee
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, 138632 Singapore
- Peter John Bond
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore, 138671 Singapore
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109 Australia