1. Razalli II, Abdullah-Zawawi MR, Tamizi AA, Harun S, Zainal-Abidin RA, Jalal MIA, Ullah MA, Zainal Z. Accelerating crop improvement via integration of transcriptome-based network biology and genome editing. PLANTA 2025; 261:92. [PMID: 40095140 DOI: 10.1007/s00425-025-04666-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 03/03/2025] [Indexed: 03/19/2025]
Abstract
MAIN CONCLUSION Big data and network biology infer functional coupling between genes. In combination with machine learning, network biology can dramatically accelerate the pace of gene discovery using modern transcriptomics approaches, and candidate genes can be validated via genome editing technology to improve crop tolerance to stresses. Unlike other living things, plants are sessile and frequently face various environmental challenges due to climate change. The cumulative effects of combined stresses can significantly influence both plant growth and yields. In navigating the complexities of climate change, ensuring the nourishment of our growing population hinges on implementing precise agricultural systems. Conventional breeding methods have been commonly employed; however, their efficacy has been impeded by limitations in terms of time, cost, and infrastructure. Cutting-edge tools focussing on big data are being championed to usher in a new era in stress biology, aiming to cultivate crops that exhibit enhanced resilience to multifactorial stresses. Transcriptomics, combined with network biology and machine learning, is proving to be a powerful approach for identifying potential genes to target for gene editing, specifically to enhance stress tolerance. The integration of transcriptomic data with genome editing can yield significant benefits, such as gaining insights into gene function by modifying or manipulating specific genes in the target plant. This review provides valuable insights into the use of transcriptomics platforms and the application of biological network analysis and machine learning in the discovery of novel genes, thereby enhancing the understanding of plant responses to combined or sequential stress. Transcriptomics as a forefront omics platform, its use through biological networks and machine learning to discover novel genes for producing multi-stress-tolerant crops, and current limitations and future directions are also discussed.
Affiliation(s)
- Izreen Izzati Razalli
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
- Muhammad-Redha Abdullah-Zawawi
  - UKM Medical Molecular Biology Institute (UMBI), UKM Medical Centre, Jalan Ya'acob Latiff, Bandar Tun Razak, 56000, Cheras, Kuala Lumpur, Malaysia
- Amin-Asyraf Tamizi
  - Malaysian Agricultural Research and Development Institute (MARDI), 43400, Serdang, Selangor, Malaysia
- Sarahani Harun
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
- Muhammad Irfan Abdul Jalal
  - UKM Medical Molecular Biology Institute (UMBI), UKM Medical Centre, Jalan Ya'acob Latiff, Bandar Tun Razak, 56000, Cheras, Kuala Lumpur, Malaysia
- Mohammad Asad Ullah
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
  - Bangladesh Institute of Nuclear Agriculture (BINA), BAU Campus, Mymensingh, 2202, Bangladesh
- Zamri Zainal
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
2. Orfanoudaki G, Psatha K, Aivaliotis M. Insight into Mantle Cell Lymphoma Pathobiology, Diagnosis, and Treatment Using Network-Based and Drug-Repurposing Approaches. Int J Mol Sci 2024; 25:7298. [PMID: 39000404 PMCID: PMC11242097 DOI: 10.3390/ijms25137298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/25/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024]
Abstract
Mantle cell lymphoma (MCL) is a rare, incurable, and aggressive B-cell non-Hodgkin lymphoma (NHL). Early MCL diagnosis and treatment is critical and puzzling due to inter/intra-tumoral heterogeneity and limited understanding of the underlying molecular mechanisms. We developed and applied a multifaceted analysis of selected publicly available transcriptomic data of well-defined MCL stages, integrating network-based methods for pathway enrichment analysis, co-expression module alignment, drug repurposing, and prediction of effective drug combinations. We demonstrate the "butterfly effect" emerging from a small set of initially differentially expressed genes, rapidly expanding into numerous deregulated cellular processes, signaling pathways, and core machineries as MCL becomes aggressive. We explore pathogenicity-related signaling circuits by detecting common co-expression modules in MCL stages, pointing out, among others, the role of VEGFA and SPARC proteins in MCL progression and recommend further study of precise drug combinations. Our findings highlight the benefit that can be leveraged by such an approach for better understanding pathobiology and identifying high-priority novel diagnostic and prognostic biomarkers, drug targets, and efficacious combination therapies against MCL that should be further validated for their clinical impact.
Affiliation(s)
- Georgia Orfanoudaki
  - Functional Proteomics and Systems Biology (FunPATh), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, GR-54124 Thessaloniki, Greece
  - Institute of Molecular Biology and Biotechnology Foundation for Research and Technology-Hellas, GR-70013 Heraklion, Greece
- Konstantina Psatha
  - Functional Proteomics and Systems Biology (FunPATh), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, GR-54124 Thessaloniki, Greece
  - Institute of Molecular Biology and Biotechnology Foundation for Research and Technology-Hellas, GR-70013 Heraklion, Greece
  - Laboratory of Medical Biology-Genetics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, GR-54124 Thessaloniki, Greece
- Michalis Aivaliotis
  - Functional Proteomics and Systems Biology (FunPATh), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, GR-54124 Thessaloniki, Greece
  - Institute of Molecular Biology and Biotechnology Foundation for Research and Technology-Hellas, GR-70013 Heraklion, Greece
  - Basic and Translational Research Unit, Special Unit for Biomedical Research and Education, School of Medicine, Aristotle University of Thessaloniki, GR-54124 Thessaloniki, Greece
  - Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, GR-54124 Thessaloniki, Greece
3. Al-Kuhali HA, Shan M, Hael MA, Al-Hada EA, Al-Murisi SA, Al-Kuhali AA, Aldaifl AAQ, Amin ME. Multiview clustering of multi-omics data integration by using a penalty model. BMC Bioinformatics 2022; 23:288. [PMID: 35864439 PMCID: PMC9306064 DOI: 10.1186/s12859-022-04826-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 06/20/2022] [Indexed: 11/10/2022]
Abstract
Background Methods for the multiview clustering and integration of multi-omics data have been developed recently to solve problems caused by data noise or limited sample size and to integrate multi-omics data with consistent (common) and differential cluster patterns. However, the integration of such data still suffers from limited performance and low accuracy. Results In this study, a computational framework for a multiview clustering method based on a penalty model is presented to overcome the challenges of low accuracy and limited performance when integrating multi-omics data with consistent (common) and differential cluster patterns. The performance of the proposed method was evaluated on synthetic data and four real multi-omics datasets and then compared with approaches presented in the literature under different scenarios. The results imply that our method exhibits competitive performance compared with recently developed techniques when the underlying clusters in the synthetic data are consistent. In the case of differential clusters, the proposed method also shows enhanced performance. In addition, with regard to real omics data, the developed method exhibits better performance, demonstrating its ability to provide more detailed information within each data type and to better integrate multi-omics data with consistent (common) and differential cluster patterns. This study shows that the proposed method offers more significant differences in survival times across all types of cancer. Conclusions A new multiview clustering method is proposed in this study and evaluated on synthetic and real data. This method performs better than other techniques previously presented in the literature in terms of integrating multi-omics data with consistent and differential cluster patterns and determining the significance of differences in survival times.
Affiliation(s)
- Hamas A Al-Kuhali
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ma Shan
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Eman A Al-Hada
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ammar A Q Aldaifl
  - School of Information Engineering, Wuhan University of Technology, Wuhan, China
- Mohammed Elmustafa Amin
  - Department of Mathematics, Faculty of Science and Technology, Omdurman Islamic University, Khartoum, Sudan
4. Li S, Wang X, Wang T, Zhang H, Lu X, Liu L, Li L, Bo C, Kong X, Xu S, Ning S, Wang J, Wang L. Identification of the regulatory role of lncRNA HCG18 in myasthenia gravis by integrated bioinformatics and experimental analyses. J Transl Med 2021; 19:468. [PMID: 34794447 PMCID: PMC8600732 DOI: 10.1186/s12967-021-03138-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 11/03/2021] [Indexed: 01/29/2023]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs), functioning as competing endogenous RNAs (ceRNAs), have been reported to play important roles in the pathogenesis of autoimmune diseases. However, little is known about the regulatory roles of lncRNAs underlying the mechanism of myasthenia gravis (MG). The aim of the present study was to explore the roles of lncRNAs as ceRNAs associated with the progression of MG. METHODS MG risk genes and miRNAs were obtained from public databases. Protein-protein interaction (PPI) network analysis and module analysis were performed. A lncRNA-mediated module-associated ceRNA (LMMAC) network, which integrated risk genes in modules, risk miRNAs and predicted lncRNAs, was constructed to systematically explore the regulatory roles of lncRNAs in MG. By performing random walk with restart on the network, the HCG18/miR-145-5p/CD28 ceRNA axis was found to potentially play important roles in MG. The expression of HCG18 in MG patients was detected using RT-PCR. The effects of HCG18 knockdown on cell proliferation and apoptosis were determined by CCK-8 assay and flow cytometry. The interactions among HCG18, miR-145-5p and CD28 were explored by luciferase assay, RT-PCR and western blot assay. RESULTS Based on the PPI network, we identified 9 modules. Functional enrichment analyses revealed that these modules were enriched in immune-related signaling pathways. We then constructed the LMMAC network, containing 25 genes, 50 miRNAs, and 64 lncRNAs. Using the bioinformatics algorithm, we found that lncRNA HCG18, acting as a ceRNA, might play important roles in MG. Further experiments indicated that HCG18 was overexpressed in MG patients and was a target of miR-145-5p. Functional assays illustrated that HCG18 suppressed Jurkat cell apoptosis and promoted cell proliferation. Mechanistically, knockdown of HCG18 inhibited the CD28 mRNA and protein expression levels in Jurkat cells, while miR-145-5p inhibitor blocked the reduction of CD28 expression induced by HCG18 suppression.
CONCLUSION We have reported a novel HCG18/miR-145-5p/CD28 ceRNA axis in MG. Our findings will contribute to a deeper understanding of the molecular mechanism of MG and provide a novel potential therapeutic target for the disease.
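The random walk with restart step named in this abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected network and a restart probability of 0.7; the node names, the toy edges, and the function name are hypothetical and are not taken from the study.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adj:   dict mapping node -> list of neighbours (undirected, unweighted)
    seeds: set of nodes the walker jumps back to with probability `restart`
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Every node first receives its share of the restart mass.
        nxt = {n: restart * p0[n] for n in nodes}
        # The remaining mass is spread evenly over each node's neighbours.
        for n in nodes:
            if not adj[n]:
                continue
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy ceRNA-style network (a simple chain of six nodes).
toy_network = {
    "lncRNA_A": ["miRNA_1"],
    "miRNA_1": ["lncRNA_A", "gene_X"],
    "gene_X": ["miRNA_1", "gene_Y"],
    "gene_Y": ["gene_X", "miRNA_2"],
    "miRNA_2": ["gene_Y", "lncRNA_B"],
    "lncRNA_B": ["miRNA_2"],
}

scores = random_walk_with_restart(toy_network, seeds={"gene_X"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

Nodes that are close to the seed set in the network end up with higher scores, which is the property that makes this kind of walk useful for ranking candidate regulators.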
Affiliation(s)
- Shuang Li
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xu Wang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Tianfeng Wang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Huixue Zhang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xiaoyu Lu
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Li Liu
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Lifang Li
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Chunrui Bo
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xiaotong Kong
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Si Xu
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Shangwei Ning
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Jianjian Wang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Lihua Wang
  - Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, Heilongjiang, China
5. Abood A, Farber CR. Using "-omics" Data to Inform Genome-wide Association Studies (GWASs) in the Osteoporosis Field. Curr Osteoporos Rep 2021; 19:369-380. [PMID: 34125409 PMCID: PMC8767463 DOI: 10.1007/s11914-021-00684-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 05/22/2021] [Indexed: 01/12/2023]
Abstract
PURPOSE OF REVIEW Osteoporosis constitutes a major societal health problem. Genome-wide association studies (GWASs) have identified over 1100 loci influencing bone mineral density (BMD); however, few of the causal genes have been identified. Here, we review approaches that use "-omics" data and genetic- and systems genetics-based analytical strategies to facilitate causal gene discovery. RECENT FINDINGS The bone field is beginning to adopt approaches that are commonplace in other disease disciplines. The slower progress has been due in part to the lack of large-scale "omics" data on bone and bone cells. This is however changing, and approaches such as eQTL colocalization, transcriptome-wide association studies (TWASs), network, and integrative approaches are beginning to provide significant insight into the genes responsible for BMD GWAS associations. The use of "-omics" data to inform BMD GWASs has increased in recent years, leading to the identification of novel regulators of BMD in humans. The ultimate goal will be to use this information to develop more effective therapies to treat and ultimately prevent osteoporosis.
Affiliation(s)
- Abdullah Abood
  - Center for Public Health Genomics, University of Virginia, 800717, Charlottesville, VA, 22908, USA
  - Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
- Charles R Farber
  - Center for Public Health Genomics, University of Virginia, 800717, Charlottesville, VA, 22908, USA
  - Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
6. Chaari N, Akdağ HC, Rekik I. Estimation of gender-specific connectional brain templates using joint multi-view cortical morphological network integration. Brain Imaging Behav 2021; 15:2081-2100. [PMID: 33089469 PMCID: PMC8413178 DOI: 10.1007/s11682-020-00404-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/21/2020] [Indexed: 12/02/2022]
Abstract
The estimation of a connectional brain template (CBT) integrating a population of brain networks while capturing shared and differential connectional patterns across individuals remains unexplored in gender fingerprinting. This paper presents the first study to estimate gender-specific CBTs using multi-view cortical morphological networks (CMNs) estimated from conventional T1-weighted magnetic resonance imaging (MRI). Specifically, each CMN view is derived from a specific cortical attribute (e.g. thickness), encoded in a network quantifying the dissimilarity in morphology between pairs of cortical brain regions. To this aim, we propose the Multi-View Clustering and Fusion Network (MVCF-Net), a novel multi-view network fusion method, which can jointly identify consistent and differential clusters of multi-view datasets in order to capture simultaneously similar and distinct connectional traits of samples. Our MVCF-Net method estimates representative and well-centered CBTs for male and female populations, independently, to eventually identify their fingerprinting regions of interest (ROIs) in four main steps. First, we apply a multi-view network clustering model based on manifold optimization, which groups CMNs into shared and differential clusters while preserving their alignment across views. Second, for each view, we linearly fuse CMNs belonging to each cluster, producing local CBTs. Third, for each cluster, we non-linearly integrate the local CBTs across views, producing a cluster-specific CBT. Finally, by linearly fusing the cluster-specific centers, we estimate a final CBT of the input population. MVCF-Net produced the most centered and representative CBTs for male and female populations and identified the most discriminative ROIs marking gender differences. The two most gender-discriminative ROIs involved the lateral occipital cortex and pars opercularis in the left hemisphere and the middle temporal gyrus and lingual gyrus in the right hemisphere.
Affiliation(s)
- Nada Chaari
  - BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
- Islem Rekik
  - BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
  - Computing, School of Science and Engineering, University of Dundee, Dundee, UK
7. Lu X, Zhu Z, Peng X, Miao Q, Luo Y, Chen X. InFun: a community detection method to detect overlapping gene communities in biological network. SIGNAL, IMAGE AND VIDEO PROCESSING 2021; 15:681-686. [DOI: 10.1007/s11760-020-01638-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 01/08/2020] [Indexed: 01/03/2025]
8. Tian J, Zhao J, Zheng C. Clustering of cancer data based on Stiefel manifold for multiple views. BMC Bioinformatics 2021; 22:268. [PMID: 34034643 PMCID: PMC8152349 DOI: 10.1186/s12859-021-04195-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/24/2021] [Accepted: 05/12/2021] [Indexed: 12/23/2022]
Abstract
Background In recent years, various sequencing techniques have been used to collect biomedical omics datasets. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research, and it is helpful to reveal data structures from multiple collections. Nevertheless, clustering of omics data poses many challenges. The primary challenges in omics data analysis come from the high dimensionality of the data and the small sample size. Therefore, it is difficult to find a suitable integration method for structural analysis of multiple datasets. Results In this paper, a multi-view clustering method based on the Stiefel manifold (MCSM) is proposed. The MCSM method comprises three core steps. Firstly, we established a binary optimization model for the simultaneous clustering problem. Secondly, we solved the optimization problem using a linear search algorithm on the Stiefel manifold. Finally, we integrated the clustering results obtained from three omics data types by using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods, which depend on the hypothesis that the underlying omics cluster class is the same. Conclusion In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
Affiliation(s)
- Jing Tian
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jianping Zhao
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Chunhou Zheng
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
  - School of Computer Science and Technology, Anhui University, Hefei, China
9. Khan ZH, Agarwal S, Rai A, Memaya MB, Mehrotra S, Mehrotra R. Co-expression network analysis of protein phosphatase 2A (PP2A) genes with stress-responsive genes in Arabidopsis thaliana reveals 13 key regulators. Sci Rep 2020; 10:21480. [PMID: 33293553 PMCID: PMC7722862 DOI: 10.1038/s41598-020-77746-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/06/2020] [Accepted: 10/26/2020] [Indexed: 12/17/2022]
Abstract
Abiotic and biotic stresses adversely affect plant growth and development, eventually resulting in lower yield and threatening food security worldwide. In plants, several studies have been carried out to understand molecular responses to abiotic and biotic stresses. However, the complete circuitry of stress-responsive genes that plants utilise in response to those environmental stresses is still unknown. The protein phosphatase 2A (PP2A) gene is known to have a crucial role in abiotic and biotic stresses, but how it regulates the stress response in plants is still not completely known. In this study, we constructed gene co-expression networks of PP2A genes with stress-responsive gene datasets from cold, drought, heat, osmotic, genotoxic, salt, and wounding stresses to unveil their relationships with PP2A under different stress conditions. The graph analysis identified 13 hub genes and several influential genes based on closeness centrality score (CCS). Our findings also revealed the count of unique genes present in different settings of stresses and subunits. We also formed clusters of influential genes based on the stress, CCS, and co-expression value. Analysis of cis-regulatory elements (CREs) recurring in the promoters of these genes was also performed. Our study has led to the identification of 16 conserved CREs.
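To make the closeness centrality score mentioned in this abstract concrete, here is a minimal sketch for an unweighted network. It uses the common definition (number of reachable nodes divided by the sum of shortest-path lengths to them), which may differ from the exact variant the authors used; the gene names and edges are hypothetical.

```python
from collections import deque


def closeness_centrality(adj):
    """Closeness of each node: reachable nodes / sum of shortest-path lengths."""
    scores = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical toy co-expression network: gene_A touches most other genes.
toy_graph = {
    "gene_A": ["gene_B", "gene_C", "gene_D"],
    "gene_B": ["gene_A"],
    "gene_C": ["gene_A"],
    "gene_D": ["gene_A", "gene_E"],
    "gene_E": ["gene_D"],
}

ccs = closeness_centrality(toy_graph)
for gene, score in sorted(ccs.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {score:.3f}")
```

Genes with the highest score sit, on average, the fewest steps away from every other gene, which is why such a score is a natural way to shortlist influential nodes.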
Affiliation(s)
- Zaiba Hasan Khan
  - Department of Biological Sciences, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
- Swati Agarwal
  - Department of Computer Science and Information Systems, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
- Atul Rai
  - Department of Computer Science and Information Systems, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
- Mounil Binal Memaya
  - Department of Computer Science and Information Systems, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
- Sandhya Mehrotra
  - Department of Biological Sciences, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
- Rajesh Mehrotra
  - Department of Biological Sciences, K.K. Birla Goa Campus, BITS-Pilani, Goa, India
10. Yu Y, Zhang LH, Zhang S. Simultaneous clustering of multiview biomedical data using manifold optimization. Bioinformatics 2020; 35:4029-4037. [PMID: 30918942 DOI: 10.1093/bioinformatics/btz217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/29/2018] [Revised: 12/26/2018] [Accepted: 03/26/2019] [Indexed: 02/07/2023]
Abstract
MOTIVATION Multiview clustering has attracted much attention in recent years. Several models and algorithms have been proposed for finding the clusters. However, these methods are developed either to find the consistent/common clusters across different views, or to identify the differential clusters among different views. In reality, both consistent and differential clusters may exist in multiview datasets. Thus, development of simultaneous clustering methods such that both the consistent and the differential clusters can be identified is of great importance. RESULTS In this paper, we propose a method for simultaneous clustering of multiview data based on manifold optimization. The binary optimization model for finding the clusters is relaxed to a real-valued optimization problem on the Stiefel manifold, which is solved by a line-search algorithm on the manifold. We applied the proposed method to both simulation data and four real datasets from TCGA. Both studies show that when the underlying clusters are consistent, our method performs competitively with the state-of-the-art algorithms. When there are differential clusters, our method performs much better. In the real data study, we performed experiments on cancer stratification and differential cluster (module) identification across multiple cancer subtypes. For the patients of different subtypes, both consistent clusters and differential clusters are identified at the same time. The proposed method identifies more clusters that are enriched by gene ontology and KEGG pathways. The differential clusters could be used to explain the different mechanisms for the cancer development in the patients of different subtypes. AVAILABILITY AND IMPLEMENTATION Codes can be downloaded from: http://homepage.fudan.edu.cn/sqzhang/files/2018/12/MVCMOcode.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
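The relaxation described in this abstract can be pictured with a generic, spectral-style objective for a single view. This is a sketch under assumptions, not the authors' exact model: $S$ is a hypothetical symmetric similarity matrix, $n$ is the number of samples, $k$ the number of clusters, and $Z$ a binary cluster indicator matrix.

```latex
% Binary model: Z assigns each sample to exactly one cluster, and the
% normalized indicator X = Z (Z^T Z)^{-1/2} has orthonormal columns.
\max_{Z \in \{0,1\}^{n \times k},\; Z \mathbf{1}_k = \mathbf{1}_n}
  \operatorname{tr}\!\left( X^{\top} S X \right),
\qquad X = Z \left( Z^{\top} Z \right)^{-1/2}

% Relaxed model: drop the discrete structure and keep only orthonormality,
% so X becomes a real-valued point on the Stiefel manifold St(n, k).
\max_{X \in \mathrm{St}(n,k)} \operatorname{tr}\!\left( X^{\top} S X \right),
\qquad
\mathrm{St}(n,k) = \left\{ X \in \mathbb{R}^{n \times k} : X^{\top} X = I_k \right\}
```

In general, a line-search method on such a manifold picks a search direction in the tangent space at the current iterate and maps each trial step back onto the constraint set, so orthonormality of the columns is preserved throughout.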
Affiliation(s)
- Yun Yu
  - School of Mathematical Sciences, Fudan University, Shanghai, China
- Lei-Hong Zhang
  - School of Mathematical Sciences, Soochow University, Suzhou, China
- Shuqin Zhang
  - School of Mathematical Sciences, Fudan University, Shanghai, China
  - Center for Computational Systems Biology, Fudan University, Shanghai, China
  - Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence of Ministry of Education, Fudan University, Shanghai, China
11. Liu Y, Ng MK, Wu S. Multi-Domain Networks Association for Biological Data Using Block Signed Graph Clustering. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:435-448. [PMID: 29994480 DOI: 10.1109/tcbb.2018.2848904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
Multi-domain biological network association and clustering have attracted a lot of attention in biological data integration and understanding, as they can provide a more global and accurate understanding of biological phenomena. In many problems, different domains may have different cluster structures. Due to the rapid growth of data collection from different sources, some domains may be strongly or weakly associated with the other domains. A key challenge is how to determine the degree of association among different domains, and to achieve accurate clustering results by data integration. In this paper, we propose an unsupervised learning approach for multi-domain network association by using block signed graph clustering. In particular, by calculating consistency weights, the proposed algorithm automatically identifies domains that are strongly (or weakly) relevant to each other by assigning them larger (or smaller) weights. This approach not only significantly improves clustering accuracy but also helps in understanding multi-domain network association. In each iteration of the proposed algorithm, we update consistency weights based on the cluster structure of each domain, and then make use of different sets of eigenvectors to obtain different cluster structures in each domain. Experimental results on both synthetic data sets and real data sets (including neuron activity data and gene expression data) empirically demonstrate the effectiveness of the proposed algorithm in clustering performance and in domain association capability.
12. Sircar S, Parekh N. Meta-analysis of drought-tolerant genotypes in Oryza sativa: A network-based approach. PLoS One 2019; 14:e0216068. [PMID: 31059518 PMCID: PMC6502313 DOI: 10.1371/journal.pone.0216068] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/15/2018] [Accepted: 04/12/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND Drought is a severe environmental stress. It is estimated that about 50% of the world rice production is affected mainly by drought. Apart from conventional breeding strategies to develop drought-tolerant crops, innovative computational approaches may provide insights into the underlying molecular mechanisms of stress response and identify drought-responsive markers. Here we propose a network-based computational approach involving a meta-analytic study of seven drought-tolerant rice genotypes under drought stress. RESULTS Co-expression networks enable large-scale analysis of gene-pair associations and tightly coupled clusters that may represent coordinated biological processes. Considering differentially expressed genes in the co-expressed modules and supplementing external information such as resistance/tolerance QTLs, transcription factors, network-based topological measures, we identify and prioritize drought-adaptive co-expressed gene modules and potential candidate genes. Using the candidate genes that are well-represented across the datasets as 'seed' genes, two drought-specific protein-protein interaction networks (PPINs) are constructed with up- and down-regulated genes. Cluster analysis of the up-regulated PPIN revealed ABA signalling pathway as a central process in drought response with a probable crosstalk with energy metabolic processes. Tightly coupled gene clusters representing up-regulation of core cellular respiratory processes and enhanced degradation of branched chain amino acids and cell wall metabolism are identified. Cluster analysis of down-regulated PPIN provides a snapshot of major processes associated with photosynthesis, growth, development and protein synthesis, most of which are shut down during drought. Differential regulation of phytohormones, e.g., jasmonic acid, cell wall metabolism, signalling and posttranslational modifications associated with biotic stress are elucidated. 
Functional characterization of topologically important, drought-responsive uncharacterized genes that may play a role in important processes such as ABA signalling, calcium signalling, photosynthesis and cell wall metabolism is discussed. Further transgenic studies on these genes may help in elucidating their biological role under stress conditions. CONCLUSION Currently, a large number of resources for rice functional genomics exist which are mostly underutilized by the scientific community. In this study, a computational approach integrating information from various resources such as gene co-expression networks, protein-protein interactions and pathway-level information is proposed to provide a systems-level view of complex drought-responsive processes across the drought-tolerant genotypes.
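The co-expression step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a hard Pearson-correlation threshold and treats connected components of the resulting graph as modules, and the gene names, expression values, and the function names `pearson` and `coexpression_modules` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_modules(expr, threshold=0.8, min_size=2):
    """Link genes whose |r| >= threshold, then return connected components.

    expr maps a gene name to its expression values across samples.
    The hard threshold and component-based modules are illustrative
    assumptions, not the procedure used in the study.
    """
    genes = sorted(expr)
    neighbours = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], set()
        while stack:  # depth-first walk over the co-expression graph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_size:  # drop genes with no co-expressed partner
            modules.append(sorted(component))
    return modules


# Toy data with hypothetical gene names; values are made up for illustration.
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],
    "geneC": [3, 5, 7, 9, 11, 13],
    "geneD": [5, 1, 4, 2, 6, 3],
    "geneE": [11, 3, 9, 5, 13, 7],
    "geneF": [3, 3, 4, 3, 4, 3],
}
modules = coexpression_modules(expr)
```

On this toy input, `modules` holds two groups, `geneA` to `geneC` and `geneD` with `geneE`, while `geneF` is left out because none of its correlations reaches the threshold. In practice a soft-thresholded, weighted network is a common alternative to the hard cutoff used here.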
Affiliation(s)
- Sanchari Sircar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
13
|
Gligorijevic V, Panagakis Y, Zafeiriou S. Non-Negative Matrix Factorizations for Multiplex Network Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:928-940. [PMID: 29993651 DOI: 10.1109/tpami.2018.2821146] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 06/08/2023]
Abstract
Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains. One of the most important aspects of network analysis is community detection or network clustering. Until recently, the major focus has been on discovering community structure in single (i.e., monoplex) networks. However, with the advent of relational data with multiple modalities, multiplex networks, i.e., networks composed of multiple layers representing different aspects of relations, have emerged. Consequently, community detection in multiplex networks, i.e., detecting clusters of nodes shared by all layers, has become a new challenge. In this paper, we propose Network Fusion for Composite Community Extraction (NF-CCE), a new class of algorithms, based on four different non-negative matrix factorization models, capable of extracting composite communities in multiplex networks. Each algorithm works in two steps: first, it finds a non-negative, low-dimensional feature representation of each network layer; then, it fuses the feature representations of the layers into a common non-negative, low-dimensional feature representation via collective factorization. The composite clusters are extracted from the common feature representation. We demonstrate the superior performance of our algorithms over the state-of-the-art methods on various types of multiplex networks, including biological, social, economic, citation, phone communication, and brain multiplex networks.
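The two-step idea in this abstract (per-layer features, then a shared representation) can be sketched as follows. This is not NF-CCE: it assumes plain NMF with Lee-Seung multiplicative updates for each layer and replaces the collective factorization with an NMF of the concatenated layer features. The function names `nmf` and `composite_communities` and the toy adjacency matrices are hypothetical.

```python
import numpy as np


def nmf(X, k, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix X ~ W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))
    H = rng.uniform(0.1, 1.0, size=(k, m))
    for _ in range(iters):
        # Updates keep W and H non-negative; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def composite_communities(layers, k):
    """Two-step sketch: per-layer features, then one shared representation."""
    # Step 1: a non-negative n x k feature matrix for every layer.
    features = [nmf(A, k, seed=i)[0] for i, A in enumerate(layers)]
    # Step 2 (assumed stand-in for collective factorization): factor the
    # column-wise concatenation of the layer features.
    fused = np.hstack(features)
    common, _ = nmf(fused, k, seed=len(layers))
    # Assign each node to the component with the largest loading.
    return common, common.argmax(axis=1)


# Two toy layers over the same 8 nodes (made-up adjacency matrices).
block = np.ones((4, 4))
zeros = np.zeros((4, 4))
layer1 = np.block([[block, zeros], [zeros, block]])
layer2 = np.block([[2 * block, zeros], [zeros, 2 * block]])
common, labels = composite_communities([layer1, layer2], k=2)
```

Because NMF starts from a random initialization and can stop at a local optimum, the labels may vary with the seed; running several restarts and keeping the factorization with the lowest reconstruction error is a common precaution.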
|
14
|
Zhang K, Geng W, Zhang S. Network-based logistic regression integration method for biomarker identification. BMC SYSTEMS BIOLOGY 2018; 12:135. [PMID: 30598085 PMCID: PMC6311907 DOI: 10.1186/s12918-018-0657-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023]
Abstract
Background Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, the biomarkers inferred from different datasets suffer from a lack of reproducibility due to the heterogeneity of the data generated from different platforms or laboratories. This motivates us to develop robust biomarker identification methods by integrating multiple datasets. Methods In this paper, we developed an integrative method for classification based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences of the constant terms within the same dataset, both the homogeneity within the same dataset and the heterogeneity in multiple datasets can be preserved. The model is formulated as an optimization problem with a network penalty measuring the differences of the constant terms. The L1 penalty, elastic penalty and network related penalties are added to the objective function for the biomarker discovery purpose. Algorithms based on the proximal Newton method are proposed to solve the optimization problem. Results We first applied the proposed method to the simulated datasets. Both the AUC of the prediction and the biomarker identification accuracy are improved. We then applied the method to two breast cancer gene expression datasets. By integrating both datasets, the prediction AUC is improved over directly merging the datasets and MetaLasso. It is also comparable to the best AUC when doing biomarker identification in an individual dataset. The identified biomarkers using network related penalty for variables were further analyzed. Meaningful subnetworks enriched by breast cancer were identified. Conclusion A network-based integrative logistic regression model is proposed in the paper. It improves both the prediction and biomarker identification accuracy.
Electronic supplementary material The online version of this article (10.1186/s12918-018-0657-8) contains supplementary material, which is available to authorized users.
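To make the idea of combining an L1 penalty with a network penalty concrete, here is a minimal sketch. It is a simplification rather than the model in the paper: it penalizes squared differences between coefficients of linked features instead of per-sample constant terms, and it uses plain proximal gradient descent in place of the proximal Newton method. The data, the edge list, the penalty weights, and the function names `objective` and `fit` are assumptions made for illustration.

```python
from math import exp, log


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def objective(beta, b0, X, y, edges, lam1, lam2):
    """Mean logistic loss + L1 penalty + network (edge-difference) penalty."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = b0 + sum(b * v for b, v in zip(beta, xi))
        loss += log(1.0 + exp(-z)) if yi == 1 else log(1.0 + exp(z))
    loss /= len(y)
    l1 = lam1 * sum(abs(b) for b in beta)
    net = lam2 * sum((beta[i] - beta[j]) ** 2 for i, j in edges)
    return loss + l1 + net


def fit(X, y, edges, lam1=0.01, lam2=0.1, step=0.1, iters=500):
    """Proximal gradient: gradient step on the smooth part, then soft-threshold."""
    n, d = len(X), len(X[0])
    beta, b0 = [0.0] * d, 0.0
    for _ in range(iters):
        grad, grad0 = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            grad0 += err / n
            for j in range(d):
                grad[j] += err * xi[j] / n
        for i, j in edges:  # gradient of lam2 * (beta_i - beta_j)^2
            diff = 2.0 * lam2 * (beta[i] - beta[j])
            grad[i] += diff
            grad[j] -= diff
        b0 -= step * grad0  # the intercept is not penalized
        for j in range(d):
            v = beta[j] - step * grad[j]
            shrink = max(abs(v) - step * lam1, 0.0)  # soft-thresholding for L1
            beta[j] = shrink if v > 0 else -shrink
    return beta, b0


# Made-up data: features 0 and 1 are linked in the network and drive the label.
X = [[1.0, 0.8, 0.1], [0.9, 1.1, -0.2], [1.2, 0.9, 0.3],
     [-1.0, -0.9, 0.2], [-0.8, -1.1, -0.1], [-1.1, -1.0, 0.0]]
y = [1, 1, 1, 0, 0, 0]
edges = [(0, 1)]
beta, b0 = fit(X, y, edges)
```

The network term pulls the coefficients of linked features toward each other, while the soft-thresholding step can set weakly informative coefficients exactly to zero, which is what makes the selected features usable as candidate biomarkers.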
Affiliation(s)
- Ke Zhang
  - School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
- Wei Geng
  - School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
- Shuqin Zhang
  - Center for Computational Systems Biology, Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
|
15
|
Zhang S. Comparisons of gene coexpression network modules in breast cancer and ovarian cancer. BMC SYSTEMS BIOLOGY 2018; 12:8. [PMID: 29671401 PMCID: PMC5907153 DOI: 10.1186/s12918-018-0530-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Background Breast cancer and ovarian cancer are hormone driven and are known to have some predisposition genes in common, such as the two well-known cancer genes BRCA1 and BRCA2. The objective of this study is to compare the coexpression network modules of both cancers, so as to infer the potential cancer-related modules. Methods We applied eigen-decomposition to the matrix that integrates the gene coexpression networks of both breast cancer and ovarian cancer. With hierarchical clustering of the related eigenvectors, we obtained the network modules of both cancers simultaneously. Enrichment analysis on Gene Ontology (GO), KEGG pathway, Disease Ontology (DO), and Gene Set Enrichment Analysis (GSEA) in the identified modules was performed. Results We identified 43 modules that are enriched by at least one of the four types of enrichments. 31, 25, and 18 modules are enriched by GO terms, KEGG pathways, and DO terms, respectively. The structure of 29 modules differs significantly between the two cancers (p-values less than 0.05), of which 25 modules have larger densities in ovarian cancer. One module was found to be significantly enriched by the terms related to breast cancer from GO, KEGG and DO enrichment. One module was found to be significantly enriched by ovarian cancer related terms. Conclusion Breast cancer and ovarian cancer share some common properties on the module level. Integration of both cancers helps identify the potential cancer-associated modules. Electronic supplementary material The online version of this article (10.1186/s12918-018-0530-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shuqin Zhang
  - Center for Computational Systems Biology, Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China
|
16
|
Liu G, Wang H, Chu H, Yu J, Zhou X. Functional diversity of topological modules in human protein-protein interaction networks. Sci Rep 2017; 7:16199. [PMID: 29170401 PMCID: PMC5701033 DOI: 10.1038/s41598-017-16270-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/12/2017] [Accepted: 11/09/2017] [Indexed: 01/18/2023]
Abstract
A large-scale molecular interaction network of protein-protein interactions (PPIs) enables the automatic detection of molecular functional modules through a computational approach. However, the functional modules that are typically detected by topological community detection algorithms may be diverse in functional homogeneity and are empirically considered to be default functional modules. Thus, a significant challenge that has been described but not elucidated is investigating the relationship between topological modules and functional modules. We systematically investigated this issue by initially using seven widely used community detection algorithms to partition the PPI network into communities. Four homogeneity measures were subsequently implemented to evaluate the functional homogeneity of the protein communities. We determined that a significant portion of topological modules with heterogeneous functionality exists and should be further investigated; moreover, these findings indicated that topologically based functional module detection approaches must be reconsidered. Furthermore, we found that the functional homogeneity of topological modules is positively correlated with their edge densities, degree of association with diseases and general Gene Ontology (GO) terms. Thus, topologically based module detection approaches should be used with caution in the identification of functional modules with high homogeneity.
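The relationship between edge density and functional homogeneity mentioned in this abstract can be made concrete with a small sketch. The edge density below follows the usual definition (connected pairs over possible pairs). The homogeneity score is a hypothetical stand-in, since the abstract does not name the four measures used in the study. Protein names, annotations, and function names are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def edge_density(module, edges):
    """Fraction of possible node pairs inside the module that are connected."""
    nodes = set(module)
    if len(nodes) < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    present = sum(
        1 for pair in combinations(sorted(nodes), 2) if frozenset(pair) in edge_set
    )
    possible = len(nodes) * (len(nodes) - 1) // 2
    return present / possible


def majority_homogeneity(module, annotation):
    """Share of annotated members carrying the module's most common term.

    A hypothetical measure chosen for simplicity, not one taken from the paper.
    """
    terms = [annotation[p] for p in module if p in annotation]
    if not terms:
        return 0.0
    return Counter(terms).most_common(1)[0][1] / len(terms)


# Toy PPI network with made-up protein names and functional labels.
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p3", "p4"), ("p4", "p5")]
annotation = {"p1": "kinase", "p2": "kinase", "p3": "kinase",
              "p4": "transport", "p5": "kinase"}
module_a = ["p1", "p2", "p3"]  # fully connected, one shared label
module_b = ["p3", "p4", "p5"]  # sparser, mixed labels

dense = (edge_density(module_a, edges), majority_homogeneity(module_a, annotation))
sparse = (edge_density(module_b, edges), majority_homogeneity(module_b, annotation))
```

In this toy case the fully connected module scores 1.0 on both measures, while the sparser module scores 2/3 on both, which mirrors the positive correlation reported in the abstract without demonstrating it.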
Affiliation(s)
- Guangming Liu
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Huixin Wang
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Hongwei Chu
  - Dalian University of Technology, Dalian, 116024, China; Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Jian Yu
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
- Xuezhong Zhou
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, 100044, China
|
17
|
|
18
|
Peng C, Li A. A Heterogeneous Network Based Method for Identifying GBM-Related Genes by Integrating Multi-Dimensional Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:713-720. [PMID: 28113912 DOI: 10.1109/tcbb.2016.2555314] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]
Abstract
The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases, thereby improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network based method by integrating multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method lies in combining the multi-dimensional GBM data from the TCGA dataset, which provide comprehensive information on genes, with protein-protein interactions to construct a weighted heterogeneous network that reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of comprehensive performance evaluation show that the proposed method significantly outperforms the network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/.
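Scoring and ranking genes by propagation over a weighted network can be illustrated with a generic random walk with restart. This sketch does not reproduce the resistance mechanism of HNMD, whose details are not given in the abstract. The gene names, edge weights, restart probability, and the function name `propagate` are assumptions.

```python
def propagate(weights, seeds, restart=0.5, iters=100):
    """Random walk with restart on a weighted, undirected gene network.

    weights maps (gene_a, gene_b) to a positive edge weight; seeds is the
    set of known disease genes where the walk restarts.
    """
    nodes = sorted({g for pair in weights for g in pair})
    adj = {g: {} for g in nodes}
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    strength = {g: sum(adj[g].values()) for g in nodes}

    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {}
        for g in nodes:
            # Score flowing into g from each neighbour, split by edge weight.
            inflow = sum(p[h] * w / strength[h] for h, w in adj[g].items())
            nxt[g] = (1.0 - restart) * inflow + restart * p0[g]
        p = nxt
    return sorted(p.items(), key=lambda kv: kv[1], reverse=True)


# Made-up network: g1 is the only seed gene.
weights = {("g1", "g2"): 1.0, ("g2", "g3"): 0.5, ("g3", "g4"): 1.0, ("g1", "g3"): 0.2}
ranking = propagate(weights, seeds={"g1"})
```

The scores always sum to one, and with a restart probability of 0.5 the seed gene keeps the top rank; the remaining genes are ordered by how strongly they are connected to the seed through weighted paths.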
|
19
|
Lei M, Xu J, Huang LC, Wang L, Li J. Network module-based model in the differential expression analysis for RNA-seq. Bioinformatics 2017; 33:2699-2705. [DOI: 10.1093/bioinformatics/btx214] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/15/2016] [Accepted: 04/11/2017] [Indexed: 12/16/2022]
Affiliation(s)
- Mingli Lei
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Jia Xu
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Li-Ching Huang
  - Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA
- Lily Wang
  - Department of Public Health Sciences, Division of Biostatistics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jing Li
  - Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
|
20
|
OncoBinder facilitates interpretation of proteomic interaction data by capturing coactivation pairs in cancer. Oncotarget 2017; 7:17608-15. [PMID: 26872056 PMCID: PMC4951236 DOI: 10.18632/oncotarget.7305] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/01/2015] [Accepted: 01/29/2016] [Indexed: 11/25/2022]
Abstract
High-throughput methods such as co-immunoprecipitation mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) screening have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advances in cancer genomic research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high-confidence interactions (annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.
|