1
Chen J, Min W. sTPLS: identifying common and specific correlated patterns under multiple biological conditions. Brief Bioinform 2025; 26:bbaf195. [PMID: 40285361 PMCID: PMC12031727 DOI: 10.1093/bib/bbaf195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 03/19/2025] [Accepted: 04/07/2025] [Indexed: 04/29/2025]
Abstract
The rapidly emerging large-scale data in diverse biological research fields present valuable opportunities to explore the underlying mechanisms of tissue development and disease progression. However, few existing methods can simultaneously capture common and condition-specific associations between different types of features across different biological conditions, such as cancer types or cell populations. Therefore, we developed the sparse tensor-based partial least squares (sTPLS) method, which integrates multiple pairs of datasets that contain two types of features but are derived from different biological conditions. We demonstrated the effectiveness and versatility of sTPLS through a simulation study and three biological applications. By integrating pairwise pharmacogenomic data, sTPLS identified 11 gene-drug comodules with high biological functional relevance specific to seven cancer types and two comodules shared across multiple cancer types, such as breast, ovarian, and colorectal cancers. When applied to single-cell data, it uncovered nine gene-peak comodules representing transcriptional regulatory relationships specific to five cell types and three comodules shared across similar cell types, such as intermediate and naïve B cells. Furthermore, sTPLS can be directly applied to tensor-structured data, successfully revealing shared and distinct cell communication patterns mediated by the MK signaling pathway in coronavirus disease 2019 patients and healthy controls. These results highlight the effectiveness of sTPLS in identifying biologically meaningful relationships across diverse conditions, making it useful for multi-omics integrative analysis.
Affiliation(s)
- Jinyu Chen
  - School of Mathematics, Statistics and Mechanics, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China
- Wenwen Min
  - School of Information Science and Engineering, Yunnan University, East Outer Ring Road, Chenggong District, Kunming 650500, China
2
Anwardeen NR, Naja K, Elrayess MA. Advancements in precision medicine: multi-omics approach for tailored metformin treatment in type 2 diabetes. Front Pharmacol 2024; 15:1506767. [PMID: 39669200 PMCID: PMC11634602 DOI: 10.3389/fphar.2024.1506767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2024] [Accepted: 11/20/2024] [Indexed: 12/14/2024]
Abstract
Metformin has become the frontline treatment in addressing the significant global health challenge of type 2 diabetes due to its proven effectiveness in lowering blood glucose levels. However, many patients struggle to achieve their glycemic targets with the medication, and the cause of this variability has not been investigated thoroughly. While genetic factors account for only about a third of this response variability, the potential influence of metabolomics and the gut microbiome on drug efficacy opens new avenues for investigation. This review explores the different molecular signatures to uncover how the complex interplay between genetics, metabolic profiles, and gut microbiota can shape individual responses to metformin. By highlighting insights from recent studies and identifying knowledge gaps regarding the metformin-microbiota interplay, we aim to chart a path toward more personalized and effective diabetes management strategies that move beyond the one-size-fits-all approach.
Affiliation(s)
- Khaled Naja
  - Biomedical Research Center, Qatar University, Doha, Qatar
- Mohamed A. Elrayess
  - Biomedical Research Center, Qatar University, Doha, Qatar
  - College of Medicine, QU Health, Qatar University, Doha, Qatar
3
Huang RH, Ge ZL, Xu G, Zeng QM, Jiang B, Xiao GC, Xia W, Wu YT, Liao YF. Prognosis and diagnosis of prostate cancer based on hypergraph regularization sparse least partial squares regression algorithm. Aging (Albany NY) 2024; 16:9599-9624. [PMID: 38829766 PMCID: PMC11210239 DOI: 10.18632/aging.205889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/29/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND: Prostate cancer (PCa) is a malignant tumor of the male reproductive system, and its incidence has increased significantly in recent years. This study aimed to further identify candidate biomarkers with prognostic and diagnostic significance by integrating gene expression and DNA methylation data from PCa patients through association analysis. MATERIAL AND METHODS: To this end, this paper proposes a sparse partial least squares regression algorithm based on hypergraph regularization (HR-SPLS) that integrates and clusters the two kinds of data. Next, module 2, which had the most significant weight, was selected for further analysis according to the weight of each module related to DNA methylation and mRNAs. Based on the DNA methylation sites in module 2, this paper uses multiple machine learning methods to construct a PCa diagnosis-related model of 10 DNA methylation sites. RESULTS: Receiver Operating Characteristic (ROC) analysis showed that the DNA methylation-related diagnostic model we constructed could diagnose PCa patients with high accuracy. Subsequently, based on the mRNAs in module 2, we constructed a prognostic model of 7 mRNAs (MYH11, ACTG2, DDR2, CDC42EP3, MARCKSL1, LMOD1, and MYLK) using multivariate Cox regression analysis. The prognostic model could predict the disease-free survival of PCa patients with moderate to high accuracy (area under the curve (AUC) = 0.761). In addition, Gene Set Enrichment Analysis (GSEA) and immune analysis indicated that the prognosis of patients in the risk group might be related to immune cell infiltration. CONCLUSIONS: Our findings may provide new methods and insights for identifying disease-related biomarkers by integrating DNA methylation and gene expression data.
Affiliation(s)
- Ruo-Hui Huang
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Zi-Lu Ge
  - First Clinical Medical College, Gannan Medical University, Ganzhou, Jiangxi, China
- Gang Xu
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Qing-Ming Zeng
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Bo Jiang
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Guan-Cheng Xiao
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Wei Xia
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Yu-Ting Wu
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
- Yun-Feng Liao
  - Department of Urology, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi, China
4
Tan H, Guo M, Chen J, Wang J, Yu G. HetFCM: functional co-module discovery by heterogeneous network co-clustering. Nucleic Acids Res 2024; 52:e16. [PMID: 38088228 PMCID: PMC10853805 DOI: 10.1093/nar/gkad1174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 10/31/2023] [Accepted: 11/23/2023] [Indexed: 02/10/2024]
Abstract
Functional molecular module (i.e., gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) analysis can dissect complex regulations underlying etiology or phenotypes. However, current module detection methods neither use multi-omics data appropriately nor model the cross-layer regulations of heterogeneous molecules effectively, which causes the loss of critical genetic information and degrades detection performance. In this study, we propose a heterogeneous network co-clustering framework (HetFCM) to detect functional co-modules. HetFCM introduces an attributed heterogeneous network to jointly model interplays and multi-type attributes of different molecules, and applies multiple variational graph autoencoders on the network to generate cross-layer association matrices. It then performs adaptive weighted co-clustering on the association matrices and attribute data to identify co-modules of heterogeneous molecules. An empirical study on Human and Maize datasets reveals that HetFCM can find co-modules with denser topology and more significant functions, which are associated with human breast cancer (subtypes) and maize phenotypes (i.e., lipid storage, drought tolerance and oil content). HetFCM is a useful tool to detect co-modules and can be applied to multi-layer functional modules, yielding novel insights for analyzing molecular mechanisms. We also developed a user-friendly module detection and analysis tool and shared it at http://www.sdu-idea.cn/FMDTool.
Affiliation(s)
- Haojiang Tan
  - School of Software, Shandong University, Jinan 250101, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Jian Chen
  - College of Agronomy & Biotechnology, China Agricultural University, Beijing 100193, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan 250101, Shandong, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
5
Tang X, Mo Z, Chang C, Qian X. Group-shrinkage feature selection with a spatial network for mining DNA methylation data. Comput Biol Med 2023; 154:106573. [PMID: 36706568 DOI: 10.1016/j.compbiomed.2023.106573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/09/2022] [Revised: 01/05/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Identifying disease-related biomarkers from high-dimensional DNA methylation data helps in reducing early screening costs and inferring pathogenesis mechanisms. Good discovery results have been achieved through spatial correlation methods of methylation sites, group-based regularization, and network constraints. However, these methods still have some key limitations as they cannot exclude isolated differential sites and only consider adjacent site ordering. Therefore, we propose a group-shrinkage feature selection algorithm to encourage the selection of clustered sites and discourage the selection of isolated differential sites. Specifically, a network-guided group-shrinkage strategy is developed to penalize weakly correlated isolated methylation sites through a network structure constraint. The spatial network is constructed based on spatial correlation information of DNA methylation sites, where this information accounts for the uneven site distribution. The experimental simulations and applications demonstrated that the proposed method outperforms the advanced regularization methods, especially in rejecting isolated methylation sites; hence this study provides an efficient and clinically valuable method for biomarker candidate discovery in DNA methylation data. Additionally, the proposed method exhibits enhanced reliability because it introduces biological prior knowledge into a regularization-based feature selection framework, and it could promote more research on integrating biological prior knowledge with classical feature selection methods, thus facilitating their clinical application. Our source codes will be released at https://github.com/SJTUBME-QianLab/Group-shrinkage-Spatial-Network once this manuscript is accepted for publication.
Affiliation(s)
- Xinlu Tang
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhanfeng Mo
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore.
- Cheng Chang
  - Department of Nuclear Medicine, Shanghai Chest Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200030, China.
- Xiaohua Qian
  - Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
6
Chen J, Han G, Xu A, Akutsu T, Cai H. Identifying miRNA-Gene Common and Specific Regulatory Modules for Cancer Subtyping by a High-Order Graph Matching Model. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:421-431. [PMID: 35320104 DOI: 10.1109/tcbb.2022.3161635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Identifying regulatory modules between miRNAs and genes is crucial in cancer research. It promotes a comprehensive understanding of the molecular mechanisms of cancer. The genomic data collected from subjects usually relate to different cancer statuses, such as different TNM Classifications of Malignant Tumors (TNM) or histological subtypes. Simple integrated analyses generally identify the core of the tumorigenesis (common modules) but miss the subtype-specific regulatory mechanisms (specific modules). In contrast, separate analyses can only report the differences and ignore important common modules. Therefore, there is an urgent need to develop a novel method to jointly analyze miRNA and gene data of different cancer statuses to identify common and specific modules. To that end, we developed a High-Order Graph Matching model to identify Common and Specific modules (HOGMCS) between miRNA and gene data of different cancer statuses. We first demonstrate the superiority of HOGMCS through a comparison with four state-of-the-art techniques using a set of simulated data. Then, we apply HOGMCS on stomach adenocarcinoma data with four TNM stages and two histological types, and breast invasive carcinoma data with four PAM50 subtypes. The experimental results demonstrate that HOGMCS can accurately extract common and subtype-specific miRNA-gene regulatory modules, where many identified miRNA-gene interactions have been confirmed in several public databases.
7
Wang Y, Guan T, Zhou G, Zhao H, Gao J. SOJNMF: Identifying Multidimensional Molecular Regulatory Modules by Sparse Orthogonality-Regularized Joint Non-Negative Matrix Factorization Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3695-3703. [PMID: 34546925 DOI: 10.1109/tcbb.2021.3114146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Cancer is not only a very aggressive but also a very diverse disease. Recent advances in high-throughput omics technologies have given biomedical researchers more opportunities to study its multi-level biological regulatory mechanisms. However, few methods explore the underlying mechanism of cancer by identifying its multidimensional molecular regulatory modules from multidimensional omics data. In this paper, we propose a sparse orthogonality-regularized joint non-negative matrix factorization (SOJNMF) algorithm which can integratively analyze multidimensional omics data. This method can not only identify multidimensional molecular regulatory modules, but also reduce the overlap rate of features among the multidimensional modules while ensuring the sparsity of the coefficient matrix after decomposition. Gene expression data, miRNA expression data and gene methylation data of liver cancer are integratively analyzed using the SOJNMF algorithm, yielding 238 multidimensional molecular regulatory modules. Permutation test results indicate that different omics features within these modules are statistically significantly correlated. Meanwhile, the results of functional enrichment analysis show that these multidimensional modules are significantly related to the underlying mechanism of the occurrence and development of liver cancer.
8
Sun J, Kong Q, Xu Z. Deep alternating non-negative matrix factorisation. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
9
Min W, Wan X, Chang TH, Zhang S. A Novel Sparse Graph-Regularized Singular Value Decomposition Model and Its Application to Genomic Data Analysis. IEEE Trans Neural Netw Learn Syst 2022; 33:3842-3856. [PMID: 33556027 DOI: 10.1109/tnnls.2021.3054635] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Learning the gene coexpression pattern is a central challenge for high-dimensional gene expression analysis. Recently, sparse singular value decomposition (SVD) has been used to achieve this goal. However, this model ignores the structural information between variables (e.g., a gene network). The typical graph-regularized penalty can be used to incorporate such prior graph information to achieve more accurate discovery and better interpretability. However, the existing approach fails to consider the opposite effect of variables with negative correlations. In this article, we propose a novel sparse graph-regularized SVD model with absolute operator (AGSVD) for high-dimensional gene expression pattern discovery. The key of AGSVD is to impose a novel graph-regularized penalty ($|u|^{T} L |u|$). However, such a penalty is a nonconvex and nonsmooth function, so it brings new challenges to model solving. We show that the nonconvex problem can be efficiently handled in a convex fashion by adopting an alternating optimization strategy. The simulation results on synthetic data show that our method is more effective than the existing SVD-based ones. In addition, the results on several real gene expression data sets show that the proposed methods can discover more biologically interpretable expression patterns by incorporating the prior gene network.
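To make the penalty in this abstract concrete, the following minimal sketch compares the usual graph-regularized quadratic form u^T L u with the absolute-value form |u|^T L |u| on a toy two-gene network. It assumes L is the unnormalized graph Laplacian D - A of a symmetric adjacency matrix with a zero diagonal; the function names and the toy loadings are hypothetical and are not taken from the AGSVD paper, which may define L differently.

```python
def laplacian(adjacency):
    """Return the unnormalized graph Laplacian L = D - A (assumes zero diagonal)."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(matrix, v):
    """Compute v^T M v using plain Python lists."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))


def absolute_graph_penalty(matrix, u):
    """Compute |u|^T L |u|, where |u| is the element-wise absolute value of u."""
    return quadratic_form(matrix, [abs(x) for x in u])


# Hypothetical toy example: two linked genes with opposite-sign loadings.
L = laplacian([[0.0, 1.0], [1.0, 0.0]])
u = [0.7, -0.7]
standard = quadratic_form(L, u)          # equals (u1 - u2)^2, so the sign flip is penalized
absolute = absolute_graph_penalty(L, u)  # equals (|u1| - |u2|)^2, so the sign flip is not penalized
```

In this toy case the standard penalty is large because the two loadings differ in sign, whereas the absolute form treats two linked, negatively correlated variables of equal magnitude as smooth, which is the behavior the abstract motivates.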
10
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Brief Bioinform 2022; 23:bbac193. [PMID: 35649346 PMCID: PMC9310531 DOI: 10.1093/bib/bbac193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022]
Abstract
With the advances in high-throughput biotechnologies, high-dimensional multi-layer omics data become increasingly available. They can provide both confirmatory and complementary information to disease risk and thus have offered unprecedented opportunities for risk prediction studies. However, the high-dimensionality and complex inter/intra-relationships among multi-omics data have brought tremendous analytical challenges. Here we present a computationally efficient penalized linear mixed model with generalized method of moments estimator (MpLMMGMM) for the prediction analysis on multi-omics data. Our method extends the widely used linear mixed model proposed for genomic risk predictions to model multi-omics data, where kernel functions are used to capture various types of predictive effects from different layers of omics data and penalty terms are introduced to reduce the impact of noise. Compared with existing penalized linear mixed models, the proposed method adopts the generalized method of moments estimator and it is much more computationally efficient. Through extensive simulation studies and the analysis of positron emission tomography imaging outcomes, we have demonstrated that MpLMMGMM can simultaneously consider a large number of variables and efficiently select those that are predictive from the corresponding omics layers. It can capture both linear and nonlinear predictive effects and achieves better prediction performance than competing methods.
Affiliation(s)
- Xiaqiong Wang
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Yalu Wen
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
11
Chen J, Huang J, Liao Y, Zhu L, Cai H. Identify Multiple Gene-Drug Common Modules Via Constrained Graph Matching. IEEE J Biomed Health Inform 2022; 26:4794-4805. [PMID: 35788454 DOI: 10.1109/jbhi.2022.3188503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
Identifying gene-drug interactions is vital to understanding biological mechanisms and achieving precise drug repurposing. High-throughput technologies produce a large amount of pharmacological and genomic data, providing an opportunity to explore the associations between oncogenic genes and therapeutic drugs. However, most studies only focus on "one-to-one" or "one-to-many" interactions, ignoring the multivariate patterns between genes and drugs. In this article, a high-order graph matching model with hypergraph constraints is proposed to discover the gene-drug common regulatory modules. Moreover, the prior knowledge is formulated into hypergraph constraints to reveal their multiple correspondences, penalizing the tensor matching process. The experimental results on the synthetic data demonstrate that the proposed model is robust to noise contamination and outlier corruption, achieving better performance than four state-of-the-art methods. We then evaluate the statistical power of our proposed method on the pharmacogenomics data. Our identified gene-drug common modules not only show significantly enriched pathways associated with cancer but also exhibit close gene-drug interactions.
12
Shan X, Chen J, Dong K, Zhou W, Zhang S. Deciphering the Spatial Modular Patterns of Tissues by Integrating Spatial and Single-Cell Transcriptomic Data. J Comput Biol 2022; 29:650-663. [PMID: 35727094 DOI: 10.1089/cmb.2021.0617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to analyze the expression level of tissues at a cellular resolution. However, it cannot capture the spatial organization of cells in a tissue. The spatially resolved transcriptomics technologies (ST) have been developed to address this issue. However, the emerging STs are still inefficient at single-cell resolution and/or fail to capture sufficient reads. To this end, we adopted a partial least squares-based method (spatial modular patterns [SpaMOD]) to simultaneously integrate the two data modalities, as well as the networks related to cells and spots, to identify the cell-spot comodules for deciphering the SpaMOD of tissues. We applied SpaMOD to three paired scRNA-seq and ST datasets, derived from the mouse brain, granuloma, and pancreatic ductal adenocarcinoma, respectively. The identified cell-spot comodules provide detailed biological insights into the spatial relationships between cell populations and their spatial locations in the tissue.
Affiliation(s)
- Xu Shan
  - Department of Software Engineering, Yunnan University, Kunming, China
- Jinyu Chen
  - College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
- Kangning Dong
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Wei Zhou
  - Department of Software Engineering, Yunnan University, Kunming, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
13
Fang K, Chen Y, Ma S, Zhang Q. Biclustering analysis of functionals via penalized fusion. J Multivar Anal 2022; 189:104874. [PMID: 36817965 PMCID: PMC9937451 DOI: 10.1016/j.jmva.2021.104874] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
In biomedical data analysis, clustering is commonly conducted. Biclustering analysis conducts clustering in both the sample and covariate dimensions and can more comprehensively describe data heterogeneity. In most of the existing biclustering analyses, scalar measurements are considered. In this study, motivated by time-course gene expression data and other examples, we take the "natural next step" and consider the biclustering analysis of functionals under which, for each covariate of each sample, a function (to be exact, its values at discrete measurement points) is present. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, giving the proposed approach a strong theoretical grounding. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression datasets, demonstrates the practical effectiveness of the proposed approach.
Affiliation(s)
- Kuangnan Fang
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Yuanxing Chen
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, United States of America
- Qingzhao Zhang (corresponding author)
  - MOE Key Laboratory of Econometrics, Department of Statistics and Data Science, School of Economics, Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University, China
14
Huang H, Wu N, Liang Y, Peng X, Jun S. SLNL: A novel method for gene selection and phenotype classification. Int J Intell Syst 2022. [DOI: 10.1002/int.22844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Affiliation(s)
- HaiHui Huang
  - School of Information Engineering, Shaoguan University, Shaoguan, China
- NaiQi Wu
  - Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- Yong Liang
  - The Peng Cheng Laboratory, Shenzhen, China
- XinDong Peng
  - School of Information Engineering, Shaoguan University, Shaoguan, China
- Shu Jun
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
15
Liu J, Chen H, Yang Y. Prediction models with graph kernel regularization for network data. J Appl Stat 2022; 50:1400-1417. [PMID: 37025276 PMCID: PMC10071950 DOI: 10.1080/02664763.2022.2028745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Traditional regression methods typically consider only covariate information and assume that the observations are mutually independent samples. However, in many modern applications, samples come from individuals connected by a network. We present a risk minimization formulation for learning from both covariates and network structure in the context of graph kernel regularization. The formulation involves a loss function with a penalty term. This penalty not only encourages similarity between linked nodes but also leads to improvement over traditional regression models. Furthermore, the penalty can be used with many loss-based predictive methods, such as linear regression with squared loss and logistic regression with log-likelihood loss. Simulations evaluating the model in low-dimensional and high-dimensional settings show that our proposed approach outperforms all other benchmarks. We verify this for uniform graph, nonuniform graph, balanced-sample, and unbalanced-sample datasets. The approach was applied to predicting the response values on a 'follow' social network of Tencent Weibo users and on two citation networks (Cora and CiteSeer). Each instance verifies that the proposed method, which combines covariate information and link structure through graph kernel regularization, can improve predictive performance.
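The "loss function with a penalty term" described in this abstract can be illustrated with a minimal sketch. It assumes squared loss and a Laplacian-style smoothness penalty that sums squared differences between predictions on linked nodes; this is a simple stand-in for the paper's graph kernel regularizer, whose exact form is not given in the abstract. The function name, the parameter `lam`, and the toy values are hypothetical.

```python
def graph_regularized_loss(y, preds, edges, lam):
    """Squared loss plus lam times a smoothness penalty over linked nodes.

    y     : observed responses, one per node
    preds : model predictions, one per node
    edges : list of (i, j) index pairs for linked nodes
    lam   : non-negative weight of the network penalty (assumed tuning parameter)
    """
    squared_loss = sum((yi - fi) ** 2 for yi, fi in zip(y, preds))
    # The penalty is small when linked nodes receive similar predictions.
    smoothness = sum((preds[i] - preds[j]) ** 2 for i, j in edges)
    return squared_loss + lam * smoothness


# Hypothetical toy network: a chain of three nodes, 0 - 1 - 2.
y = [1.0, 2.0, 4.0]
preds = [1.0, 2.0, 3.0]
edges = [(0, 1), (1, 2)]
total = graph_regularized_loss(y, preds, edges, lam=0.5)
```

Setting `lam` to zero recovers the ordinary squared loss, which is one way to see how the penalty adds network information on top of a traditional regression objective.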
Affiliation(s)
- Jie Liu
  - International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, People's Republic of China
- Haojie Chen
  - International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, People's Republic of China
- Yang Yang
  - School of Statistics and Mathematics, Nanjing Audit University, Nanjing, People's Republic of China
16
Huang HH, Liang Y. Integrating molecular interactions and gene expression to identify biomarkers and network modules of chronic obstructive pulmonary disease. Technol Health Care 2022; 30:135-142. [PMID: 35124591 PMCID: PMC9028746 DOI: 10.3233/thc-228013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Chronic obstructive pulmonary disease (COPD) causes chronic obstructive conditions, chronic bronchitis, and emphysema, and is a major cause of death worldwide. Although several efforts to identify biomarkers and pathways have been made, the specific causal mechanism of COPD remains unknown. OBJECTIVE: This study combined biological interaction data with gene expression data for a better understanding of the biological processes and network modules of COPD. METHODS: Using a sparse network-based method, we selected 49 genes from peripheral blood mononuclear cell expression data of 136 subjects, including 42 ex-smoking controls and 94 subjects with COPD. RESULTS: These 49 genes might influence biological processes and molecular functions related to COPD. For example, our result suggests that FoxO signaling may contribute to the atrophy of COPD peripheral muscle tissues via oxidative stress. CONCLUSIONS: Our approach enhances the existing understanding of COPD pathogenesis and predicts new genetic markers and pathways that may influence COPD pathogenesis.
Affiliation(s)
- Hai-Hui Huang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
| | - Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
| |
|
17
|
He MF, Liang Y, Huang HH. Integrating molecular interactions and gene expression to identify biomarkers to predict response to tumor necrosis factor inhibitor therapies in rheumatoid arthritis patients. Technol Health Care 2022; 30:451-457. [PMID: 35124619 PMCID: PMC9028654 DOI: 10.3233/thc-thc228041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
BACKGROUND Targeted therapy using anti-TNF (tumor necrosis factor) is the first option for patients with rheumatoid arthritis (RA). Anti-TNF therapy, however, does not lead to meaningful clinical improvement in many RA patients. To predict which patients will not benefit from anti-TNF therapy, clinical tests should be performed before treatment begins. OBJECTIVE Although various efforts have been made to identify biomarkers and pathways that may help predict the response to anti-TNF treatment, gaps remain in clinical use due to the low predictive power of the selected biomarkers. METHODS In this paper, we used a network-based computational method to select predictive biomarkers to guide the treatment of RA patients. RESULTS We selected 69 genes from peripheral blood expression data from 46 subjects using a sparse network-based method. The results show that the selected 69 genes might influence biological processes and molecular functions related to the treatment. CONCLUSIONS Our approach improves the prediction of anti-TNF therapy response and provides new genetic markers and pathways that may influence the treatment.
Affiliation(s)
- Min-Fan He
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- School of Mathematics and Big Data, Foshan University, Foshan, China
| | - Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
| | - Hai-Hui Huang
- Provincial Demonstration Software Institute, Shaoguan University, Shaoguan, China
| |
|
18
|
Zhu F, Li J, Liu J, Min W. Network-based cancer genomic data integration for pattern discovery. BMC Genom Data 2021; 22:54. [PMID: 34886811 PMCID: PMC8662848 DOI: 10.1186/s12863-021-01004-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Since genes involved in the same biological modules usually present correlated expression profiles, many computational methods have been proposed to identify gene functional modules based on expression profile data. Recently, the Sparse Singular Value Decomposition (SSVD) method has been proposed to bicluster gene expression data to identify gene modules. However, this model handles only gene expression data and cannot integrate gene interaction information. Ignoring the prior gene interaction information may make the identified gene modules hard to interpret biologically. RESULTS In this paper, we develop a Sparse Network-regularized SVD (SNSVD) method that integrates a prior gene interaction network from a protein-protein interaction network and gene expression data to identify underlying gene functional modules. The results on a set of simulated data show that SNSVD is more effective than the traditional SVD-based methods. Further experimental results on real cancer genomic data show that most co-expressed modules are not only significantly enriched in GO/KEGG pathways, but also correspond to dense sub-networks in the prior gene interaction network. We also use our method to identify ten differentially co-expressed miRNA-gene modules by integrating matched miRNA and mRNA expression data of breast cancer from The Cancer Genome Atlas (TCGA). Several important breast cancer-related miRNA-gene modules are discovered. CONCLUSIONS All the results demonstrate that SNSVD can overcome the drawbacks of SSVD and capture more biologically relevant functional modules by incorporating a prior gene interaction network. These identified functional modules may provide a new perspective for understanding the diagnosis, occurrence and progression of cancer.
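The sparse SVD idea mentioned in this abstract can be illustrated with a minimal sketch. The code below is not the SNSVD algorithm of the cited paper: it assumes a plain L1-penalized rank-one decomposition solved by alternating soft-thresholded power iterations, and it omits the network regularization entirely. The function names, the toy matrix `X`, and the penalty values `lam_u` and `lam_v` are hypothetical choices made for illustration.

```python
import math

def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]

def normalize(values):
    """Scale a vector to unit Euclidean length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)

def sparse_rank_one_svd(X, lam_u, lam_v, n_iter=50):
    """Alternate soft-thresholded updates of the left (u) and right (v) vectors."""
    n_rows, n_cols = len(X), len(X[0])
    v = normalize([1.0] * n_cols)  # simple deterministic starting point
    u = [0.0] * n_rows
    for _ in range(n_iter):
        Xv = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(soft_threshold(Xv, lam_u))
        Xtu = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(Xtu, lam_v))
    d = sum(u[i] * X[i][j] * v[j] for i in range(n_rows) for j in range(n_cols))
    return u, d, v

# Toy matrix (hypothetical): a strong 2x2 block in the top-left corner plus weak noise.
X = [
    [5.0, 5.0, 0.1, 0.0],
    [5.0, 5.0, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.2],
]
u, d, v = sparse_rank_one_svd(X, lam_u=0.5, lam_v=0.5)
row_module = [i for i, value in enumerate(u) if value != 0.0]
col_module = [j for j, value in enumerate(v) if value != 0.0]
print(row_module, col_module)
```

The nonzero entries of `u` and `v` pick out the rows and columns of one bicluster. A network-regularized variant would add a graph-based penalty to each update, which this sketch does not attempt.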
Affiliation(s)
- Fangfang Zhu
- State Key Laboratory of Nuclear Resources and Environment and School of Water Resources and Environmental Engineering, East China University of Technology, Nanchang, 330013, China
- State Key Laboratory of Nuclear Resources and Environment and School of Chemistry, Biology and Materials Science, East China University of Technology, Nanchang, 330013, China
| | - Jiang Li
- State Key Laboratory of Nuclear Resources and Environment and School of Chemistry, Biology and Materials Science, East China University of Technology, Nanchang, 330013, China
| | - Juan Liu
- School of Computer Science, Wuhan University, Wuhan, 430072, China
| | - Wenwen Min
- School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang, 330038, China
- Information School, Yunnan University, Kunming, 650091, China
| |
|
19
|
Ouadfel S, Abd Elaziz M. A multi-objective gradient optimizer approach-based weighted multi-view clustering. Engineering Applications of Artificial Intelligence 2021; 106:104480. [DOI: 10.1016/j.engappai.2021.104480] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 09/01/2023]
|
20
|
Alcalá-Corona SA, Sandoval-Motta S, Espinal-Enríquez J, Hernández-Lemus E. Modularity in Biological Networks. Front Genet 2021; 12:701331. [PMID: 34594357 PMCID: PMC8477004 DOI: 10.3389/fgene.2021.701331] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/27/2021] [Accepted: 08/23/2021] [Indexed: 01/13/2023] Open
Abstract
Network modeling, from the ecological to the molecular scale, has become an essential tool for studying the structure, dynamics and complex behavior of living systems. Graph representations of the relationships between biological components open up a wide variety of methods for discovering the mechanistic and functional properties of biological systems. Many biological networks are organized into a modular structure, so methods to discover such modules are essential if we are to understand the biological system as a whole. However, most of the methods used in biology to this end have limited applicability, as they are very specific to the system they were developed for. Conversely, from the statistical physics and network science perspective, graph modularity has been theoretically studied and several methods of a very general nature have been developed. It is our perspective that, in particular for the modularity detection problem, biology and theoretical physics/network science are less connected than they should be. The central goal of this review is to provide the necessary background and present the most applicable and pertinent methods for community detection in a way that motivates their further usage in biological research.
Affiliation(s)
- Sergio Antonio Alcalá-Corona
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Santiago Sandoval-Motta
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico; National Council on Science and Technology, Mexico City, Mexico
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
|
21
|
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular components with functional interdependencies in human cells form a complicated biological network. Diseases are mostly caused by perturbations of the composite of multiple interacting biomolecules, rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel biological entity relationships. Hence, more and more biologists focus on studying the complex biological system instead of the individual biological components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and the biological system considered as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids in drug development and explores the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns among pharmacogenomics and cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, China
| | - Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Chee-Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| |
|
22
|
Huang HH, Liang Y. A Novel Cox Proportional Hazards Model for High-Dimensional Genomic Data in Cancer Prognosis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1821-1830. [PMID: 31870990 DOI: 10.1109/tcbb.2019.2961667] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
The Cox proportional hazards model is a popular method to study the connection between features and survival time. Because of the high dimensionality of genomic data, existing Cox models trained on any specific dataset often generalize poorly to other independent datasets. In this paper, we suggest a novel strategy for the Cox model. This strategy includes a new learning technique, self-paced learning (SPL), and a new gene selection method, the SCAD-Net penalty. The SPL method is adopted to help build a more accurate predictor through its built-in mechanism of learning from easy samples first and adaptively learning from hard samples. The SCAD-Net penalty addresses the SCAD method's lack of an inherent mechanism for fusing prior graphical information. We combined SPL and the SCAD-Net penalty with the Cox model (SSNC). The simulation shows that SSNC outperforms the benchmark in terms of prediction and gene selection. The analysis of a large-scale experiment across several cancer datasets shows that the SSNC method not only results in higher prediction accuracies but also identifies markers that show satisfactory stability on another validation dataset. The demo code for the proposed method is provided in the supplemental file.
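The "easy samples first" mechanism of self-paced learning described in this abstract can be sketched on a toy problem. The code below is not the SSNC method: it assumes a one-parameter least-squares fit (y ≈ slope · x) in place of the Cox model, uses the hard-threshold form of SPL (a sample is included only while its loss is below an age parameter `lam`), and leaves out the SCAD-Net penalty. The names `fit_slope` and `self_paced_slope`, the median-based initial `lam`, and the `growth` schedule are hypothetical choices for illustration.

```python
import statistics

def fit_slope(xs, ys, weights):
    """Weighted least-squares slope for the no-intercept model y = slope * x."""
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den

def self_paced_slope(xs, ys, growth=1.5, rounds=5):
    """Alternate between selecting easy samples (loss < lam) and refitting."""
    weights = [1.0] * len(xs)
    slope = fit_slope(xs, ys, weights)  # initial fit on all samples
    lam = None
    for _ in range(rounds):
        losses = [(y - slope * x) ** 2 for x, y in zip(xs, ys)]
        if lam is None:
            lam = statistics.median(losses)  # start with roughly the easier half
        new_weights = [1.0 if loss < lam else 0.0 for loss in losses]
        if any(new_weights):  # keep the previous fit if nothing qualifies
            weights = new_weights
            slope = fit_slope(xs, ys, weights)
        lam *= growth  # raise the threshold so harder samples can enter later
    return slope, weights

# Toy data (hypothetical): y = 2x with one gross outlier in the last sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 50.0]
slope, weights = self_paced_slope(xs, ys)
print(slope, weights)
```

In this toy run the outlier's loss stays above the growing threshold, so it is never admitted and the fit recovers the slope of the clean samples. With a longer schedule it would eventually be included, which is why the growth rate and stopping point matter in practice.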
|
23
|
Hu C, Jia W. Multi-omics profiling: the way towards precision medicine in metabolic diseases. J Mol Cell Biol 2021; 13:mjab051. [PMID: 34406397 PMCID: PMC8697344 DOI: 10.1093/jmcb/mjab051] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/08/2021] [Revised: 06/19/2021] [Accepted: 06/21/2021] [Indexed: 12/12/2022] Open
Abstract
Metabolic diseases including type 2 diabetes mellitus (T2DM), non-alcoholic fatty liver disease (NAFLD), and metabolic syndrome (MetS) are alarming health burdens around the world, while therapies for these diseases are far from satisfactory because their etiologies are not yet completely clear. T2DM, NAFLD, and MetS are all complex and multifactorial metabolic disorders based on the interactions between genetics and environment. Omics studies such as genetics, transcriptomics, epigenetics, proteomics, and metabolomics are all promising approaches for accurately characterizing these diseases. The most effective treatments for individuals can be achieved via omics pathways, which is the theme of precision medicine. In this review, we summarized the multi-omics studies of T2DM, NAFLD, and MetS in recent years, provided a theoretical basis for their pathogenesis and their effective prevention and treatment, and highlighted the biomarkers and future strategies for precision medicine.
Affiliation(s)
- Cheng Hu
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Institute for Metabolic Disease, Fengxian Central Hospital, The Third School of Clinical Medicine, Southern Medical University, Shanghai 201499, China
| | - Weiping Jia
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
| |
|
24
|
Genome-wide discovery of hidden genes mediating known drug-disease association using KDDANet. NPJ Genom Med 2021; 6:50. [PMID: 34131148 PMCID: PMC8206141 DOI: 10.1038/s41525-021-00216-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Accepted: 05/25/2021] [Indexed: 11/09/2022] Open
Abstract
Many genes mediating Known Drug-Disease Associations (KDDAs) escape experimental detection. Identifying these genes (hidden genes) is of great significance for understanding disease pathogenesis and guiding drug repurposing. Here, we present a novel computational tool, called KDDANet, for systematically and accurately uncovering the hidden genes mediating KDDAs from the perspective of a genome-wide functional gene interaction network. KDDANet demonstrated competitive sensitivity and specificity in identifying genes mediating KDDAs compared with existing state-of-the-art methods. Case studies on Alzheimer's disease (AD) and obesity uncovered the mechanistic relevance of KDDANet predictions. Furthermore, when applied to multiple types of cancer-omics datasets, KDDANet not only recapitulated known genes mediating KDDAs related to cancer, but also revealed novel candidates that offer new biological insights. Importantly, KDDANet can be used to discover the shared genes mediating multiple KDDAs. KDDANet can be accessed at http://www.kddanet.cn and the code can be freely downloaded at https://github.com/huayu1111/KDDANet.
|
25
|
Huang H, Peng X, Liang Y. SPLSN: An efficient tool for survival analysis and biomarker selection. Int J Intell Syst 2021. [DOI: 10.1002/int.22532] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023]
Affiliation(s)
- Hai‐Hui Huang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
| | - Xin‐Dong Peng
- School of Information Engineering, Shaoguan University, Shaoguan, China
| | - Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China
| |
|
26
|
TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers. PLoS Comput Biol 2021; 17:e1009044. [PMID: 34061840 PMCID: PMC8195367 DOI: 10.1371/journal.pcbi.1009044] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/05/2020] [Revised: 06/11/2021] [Accepted: 05/05/2021] [Indexed: 12/22/2022] Open
Abstract
Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identifying microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.
MicroRNAs (miRNAs) are a class of small non-coding RNAs. Previous studies have revealed that miRNA-gene regulatory modules play key roles in the occurrence and development of cancer. However, little has been done to discover miRNA-gene regulatory modules from a pan-cancer view. Thus, new methods are urgently needed to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data of multiple cancers.
To build the connections between miRNA-gene regulatory modules across different cancer types, we propose a tensor sparse canonical correlation analysis (TSCCA) method. Our specific contributions are two-fold: (1) We propose a sparse statistical learning model TSCCA and an efficient block-coordinate descent algorithm to solve it. (2) We apply TSCCA to a multi-omics data set of 33 cancer types from TCGA and identify some cancer-related miRNA-gene modules with important biological functions and statistical significance.
|
27
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| | - Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| |
|
28
|
Lin Y, Ma X. Predicting lincRNA-Disease Association in Heterogeneous Networks Using Co-regularized Non-negative Matrix Factorization. Front Genet 2021; 11:622234. [PMID: 33510774 PMCID: PMC7835800 DOI: 10.3389/fgene.2020.622234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/28/2020] [Accepted: 12/03/2020] [Indexed: 02/02/2023] Open
Abstract
Long intergenic non-coding ribonucleic acids (lincRNAs) are critical regulators for many complex diseases, and identification of disease-lincRNA associations is both costly and time-consuming. Therefore, it is necessary to design computational approaches to predict the disease-lincRNA associations that shed light on the mechanisms of diseases. In this study, we develop a co-regularized non-negative matrix factorization (Cr-NMF) to identify potential disease-lincRNA associations by integrating the gene expression of lincRNAs, the genetic interaction network for mRNA genes, gene-lincRNA associations, and disease-gene associations. The Cr-NMF algorithm factorizes the disease-lincRNA associations, while the other associations/interactions are integrated using regularization. Furthermore, the regularization not only preserves the topological structure of the lincRNA co-expression network, but also maintains the links "lincRNA → gene → disease." Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy in predicting the disease-lincRNA associations. The model and algorithm provide an effective way to explore disease-lincRNA associations.
Affiliation(s)
- Yong Lin
- School of Physics and Electronic Information Engineering, Ningxia Normal University, Guyuan, China
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
29
|
Koukouli E, Wang D, Dondelinger F, Park J. A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response. PLoS Comput Biol 2021; 17:e1008066. [PMID: 33493149 PMCID: PMC7920352 DOI: 10.1371/journal.pcbi.1008066] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/12/2020] [Revised: 03/01/2021] [Accepted: 12/17/2020] [Indexed: 11/18/2022] Open
Abstract
Cancer treatments can be highly toxic and frequently only a subset of the patient population will benefit from a given treatment. Tumour genetic makeup plays an important role in cancer drug sensitivity. We suspect that gene expression markers could be used as a decision aid for treatment selection or dosage tuning. Using in vitro cancer cell line dose-response and gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) project, we build a dose-varying regression model. Unlike existing approaches, this allows us to estimate dosage-dependent associations with gene expression. We include the transcriptomic profiles as dose-invariant covariates into the regression model and assume that their effect varies smoothly over the dosage levels. A two-stage variable selection algorithm (variable screening followed by penalized regression) is used to identify genetic factors that are associated with drug response over the varying dosages. We evaluate the effectiveness of our method using simulation studies focusing on the choice of tuning parameters and cross-validation for predictive accuracy assessment. We further apply the model to data from five BRAF targeted compounds applied to different cancer cell lines under different dosage levels. We highlight the dosage-dependent dynamics of the associations between the selected genes and drug response, and we perform pathway enrichment analysis to show that the selected genes play an important role in pathways related to tumorigenesis and DNA damage response.
Affiliation(s)
- Evanthia Koukouli
- Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK
| | - Dennis Wang
- Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
- Department of Computer Science, University of Sheffield, Sheffield, UK
| | - Frank Dondelinger
- Centre for Health Informatics and Statistics, Lancaster Medical School, Lancaster University, Bailrigg, Lancaster, UK
| | - Juhyun Park
- Department of Mathematics and Statistics, Fylde College, Lancaster University, Bailrigg, Lancaster, UK
| |
|
30
|
Zhou Z, Huang H, Liang Y. Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model. Technol Health Care 2021; 29:287-295. [PMID: 33682765 PMCID: PMC8150479 DOI: 10.3233/thc-218026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/02/2022]
Abstract
BACKGROUND In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical explanation and yields classification probabilities for the class labels. However, it cannot perform biomarker selection. OBJECTIVE The aim of this paper is to give the model efficient gene selection capability. METHODS In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUCs of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
Affiliation(s)
- Zhiming Zhou
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
| | - Haihui Huang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Shaoguan University, Shaoguan, Guangdong, China
| | - Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
| |
|
31
|
Li J, Lu Q, Wen Y. Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2020; 36:1785-1794. [PMID: 31693075 DOI: 10.1093/bioinformatics/btz822] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/10/2019] [Revised: 10/08/2019] [Accepted: 11/01/2019] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION The use of human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented data resources for prediction studies, their corresponding analytical methods are much less developed. RESULTS We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework that extends the standard linear mixed models widely used in genomic risk prediction, for multi-omics data analysis. MKpLMM can capture not only the predictive effects from each layer of omics data but also their interactions via using multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, the analyses of PET-imaging outcomes from the Alzheimer's Disease Neuroimaging Initiative study, and the analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. AVAILABILITY AND IMPLEMENTATION The R-package is available at https://github.com/YaluWen/OmicPred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Li
  - Department of Thoracic Surgery, Dalian Municipal Central Hospital Affiliated of Dalian Medical University, Dalian 116000, China
- Qing Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
32
|
Wang M, Huang TZ, Fang J, Calhoun VD, Wang YP. Integration of Imaging (epi)Genomics Data for the Study of Schizophrenia Using Group Sparse Joint Nonnegative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1671-1681. [PMID: 30762565 PMCID: PMC7781159 DOI: 10.1109/tcbb.2019.2899568] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 05/06/2023]
Abstract
Schizophrenia (SZ) is a complex disease. Single nucleotide polymorphism (SNP), brain activity measured by functional magnetic resonance imaging (fMRI) and DNA methylation are all important biomarkers that can be used for the study of SZ. To our knowledge, there has been little effort to combine these three datasets together. In this study, we propose a group sparse joint nonnegative matrix factorization (GSJNMF) model to integrate SNP, fMRI, and DNA methylation for the identification of multi-dimensional modules associated with SZ, which can be used to study regulatory mechanisms underlying SZ at multiple levels. The proposed GSJNMF model projects multiple types of data onto a common feature space, in which heterogeneous variables with large coefficients on the same projected bases are used to identify multi-dimensional modules. We also incorporate group structure information available from each dataset. The genomic factors in such modules have significant correlations or functional associations with several brain activities. At the end, we have applied the method to the analysis of real data collected from the Mind Clinical Imaging Consortium (MCIC) for the study of SZ and identified significant biomarkers. These biomarkers were further used to discover genes and corresponding brain regions, which were confirmed to be significantly associated with SZ.
Affiliation(s)
- Min Wang
  - School of Mathematical Sciences/Research Center for Image and Vision Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
  - School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, Jiangxi, 330013, China
- Ting-Zhu Huang
  - School of Mathematical Sciences/Research Center for Image and Vision Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
- Jian Fang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
- Vince D. Calhoun
  - The Mind Research Network, University of New Mexico, NM 87131, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
  - Corresponding author.
|
33
|
Mitra S, Saha S, Hasanuzzaman M. Multi-view clustering for multi-omics data using unified embedding. Sci Rep 2020; 10:13654. [PMID: 32788601 PMCID: PMC7423957 DOI: 10.1038/s41598-020-70229-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/31/2019] [Accepted: 07/13/2020] [Indexed: 12/14/2022]
Abstract
In real world applications, data sets are often comprised of multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across different views and generates a unified embedding in a low-dimensional space to optimally preserve neighbourhood identity. Probability distributions generated for each point for each view are combined by conflation method to create a single unified distribution. The goal is to approximate this unified distribution as much as possible when a similar operation is performed on the embedded space. As a cost function, the sum of Kullback-Leibler divergence over the samples is used, which leads to a simple gradient adjusting the position of the samples in the embedded space. The proposed methodology can generate embedding from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2−3% over state-of-the-art models when evaluated on 10 omics data sets.
Affiliation(s)
- Sayantan Mitra
  - Department of Computer Science, Indian Institute of Technology Patna, Bihta, Bihar, 801103, India
- Sriparna Saha
  - Department of Computer Science, Indian Institute of Technology Patna, Bihta, Bihar, 801103, India
|
34
|
Huang J, Chen J, Zhang B, Zhu L, Cai H. Evaluation of gene-drug common module identification methods using pharmacogenomics data. Brief Bioinform 2020; 22:5860683. [PMID: 32591780 DOI: 10.1093/bib/bbaa087] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/30/2020] [Revised: 04/06/2020] [Accepted: 04/23/2020] [Indexed: 01/21/2023]
Abstract
Accurately identifying the interactions between genomic factors and the response of cancer drugs plays important roles in drug discovery, drug repositioning and cancer treatment. A number of studies revealed that interactions between genes and drugs were 'many-genes-to-many drugs' interactions, i.e. common modules, opposed to 'one-gene-to-one-drug' interactions. Such modules fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules among pharmacogenomics data remain to be improved. In this paper, we aim to provide a detailed evaluation of three categories of state-of-the-art common module identification techniques from a machine learning perspective, including non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the real experimental results from the aspect of biological process term enrichment, gene-drug and drug-drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.
Affiliation(s)
- Jie Huang
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Jiazhou Chen
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Bin Zhang
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Lei Zhu
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Hongmin Cai
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
|
35
|
Chen J, Han G, Xu A, Cai H. Identification of Multidimensional Regulatory Modules Through Multi-Graph Matching With Network Constraints. IEEE Trans Biomed Eng 2020; 67:987-998. [DOI: 10.1109/tbme.2019.2927157] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
|
36
|
Oh M, Park S, Kim S, Chae H. Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations. Brief Bioinform 2020; 22:66-76. [PMID: 32227074 DOI: 10.1093/bib/bbaa032] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/21/2019] [Revised: 02/05/2020] [Accepted: 02/25/2020] [Indexed: 02/06/2023]
Abstract
Gene expressions are subtly regulated by quantifiable measures of genetic molecules such as interaction with other genes, methylation, mutations, transcription factor and histone modifications. Integrative analysis of multi-omics data can help scientists understand the condition or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data needs quite large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring reperforming analysis from different perspectives. Many of the aforementioned technical issues can be nicely handled when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation study, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues when the cloud is used for the analysis of multi-omics data for the gene regulation study.
Affiliation(s)
- Minsik Oh
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
- Sungjoon Park
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
  - Bioinformatics Institute, Seoul National University, Seoul, 08826, Korea
- Heejoon Chae
  - Division of Computer Science, Sookmyung Women's University, Seoul, 04310, Korea
|
37
|
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022]
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
|
38
|
Xiao Q, Luo J, Liang C, Li G, Cai J, Ding P, Liu Y. Identifying lncRNA and mRNA Co-Expression Modules from Matched Expression Data in Ovarian Cancer. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:623-634. [PMID: 30106686 DOI: 10.1109/tcbb.2018.2864129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been shown to be involved in multiple biological processes and play critical roles in tumorigenesis. Numerous lncRNAs have been discovered in diverse species, but the functions of most lncRNAs still remain unclear. Meanwhile, their expression patterns and regulation mechanisms are also far from being fully understood. With the advances of high-throughput technologies, the increasing availability of genomic data creates opportunities for deciphering the molecular mechanism and underlying pathogenesis of human diseases. Here, we develop an integrative framework called JONMF to identify lncRNA-mRNA co-expression modules based on the sample-matched lncRNA and mRNA expression profiles. We formulate the module detection task as an optimization problem with joint orthogonal non-negative matrix factorization that could effectively prevent multicollinearity and produce a good modularity interpretation. The constructed lncRNA-mRNA co-expression network and the gene-gene interaction network are used as the network-regularized constraints to improve the module accuracy, while the sparsity constraints are simultaneously utilized to achieve modular sparse solutions. We applied JONMF to human ovarian cancer dataset and the experiment results demonstrate that the proposed method can effectively discover biologically functional co-expression modules, which may provide insights into the function of lncRNAs and molecular mechanism of human diseases.
|
39
|
Ma Y, Liu G, Ma Y, Chen Q. Integrative Analysis for Identifying Co-Modules of Microbe-Disease Data by Matrix Tri-Factorization With Phylogenetic Information. Front Genet 2020; 11:83. [PMID: 32153643 PMCID: PMC7048008 DOI: 10.3389/fgene.2020.00083] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/22/2019] [Accepted: 01/24/2020] [Indexed: 12/29/2022]
Abstract
Microbe-disease association relationship mining is drawing more and more attention due to its potential in capturing disease-related microbes. Hence, it is essential to develop new tools or algorithms to study the complex pathogenic mechanism of microbe-related diseases. However, previous research studies mainly focused on the paradigm of “one disease, one microbe,” rarely investigated the cooperation and associations between microbes, diseases or microbe-disease co-modules from system level. In this study, we propose a novel two-level module identifying algorithm (MDNMF) based on nonnegative matrix tri-factorization which integrates two similarity matrices (disease and microbe similarity matrices) and one microbe-disease association matrix into the objective of MDNMF. MDNMF can identify the modules from different levels and reveal the connections between these modules. In order to improve the efficiency and effectiveness of MDNMF, we also introduce human symptoms-disease network and microbial phylogenetic distance into this model. Furthermore, we applied it to HMDAD dataset and compared it with two NMF-based methods to demonstrate its effectiveness. The experimental results show that MDNMF can obtain better performance in terms of enrichment index (EI) and the number of significantly enriched taxon sets. This demonstrates the potential of MDNMF in capturing microbial modules that have significantly biological function implications.
Affiliation(s)
- Yuanyuan Ma
  - School of Computer and Information Engineering, Anyang Normal University, Anyang, China
- Guoying Liu
  - School of Computer and Information Engineering, Anyang Normal University, Anyang, China
- Yingjun Ma
  - School of Computer, Central China Normal University, Wuhan, China
- Qianjun Chen
  - School of Computer, Central China Normal University, Wuhan, China
  - School of Life Science, Hubei University, Wuhan, China
|
40
|
Cai J, Cai H, Chen J, Yang X. Identifying "Many-to-Many" Relationships between Gene-Expression Data and Drug-Response Data via Sparse Binary Matching. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:165-176. [PMID: 29994482 DOI: 10.1109/tcbb.2018.2849708] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Identifying gene-drug patterns is a critical step in pharmacology for unveiling disease mechanisms and drug discovery. The availability of high-throughput technologies accumulates massive large-scale pharmacological and genomic data, and thus provides a new substantial opportunity to deeply understand how the oncogenic genes and the therapeutic drugs relate to each other. However, most previous studies merely used the pharmacological and genomic datasets without any prior knowledge to infer the gene-drug patterns. Here, we proposed a novel network-guided sparse binary matching model (NSBM) to decode these relationships hidden in the datasets. Not only the large-scale gene-expression data and drug-response data are jointly analyzed in our method, but also the additional prior information of genes and drugs are integrated into the form of network-based regularization. The essential structure of the NSBM model is a convex quadratic minimization problem with network-based penalties. It was demonstrated to be superior when compared with two benchmark methods through extensive experiments on both synthetic and empirical data. Posterior validation, including gene-ontology and enrichment analysis, confirmed the effectiveness of NSBM in revealing gene-drug patterns on a large-scale heterogeneous data source.
|
41
|
Hao Y, Cai M, Li L. Drug repositioning via matrix completion with multi-view side information. IET Syst Biol 2019; 13:267-275. [PMID: 31538961 PMCID: PMC8687211 DOI: 10.1049/iet-syb.2018.5129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/26/2018] [Revised: 06/14/2019] [Accepted: 07/02/2019] [Indexed: 11/29/2022]
Abstract
In the process of drug discovery and disease treatment, drug repositioning is broadly studied to identify biological targets for existing drugs. Many methods have been proposed for drug-target interaction prediction by taking into account different kinds of data sources. However, most of the existing methods only use one side information for drugs or targets to predict new targets for drugs. Some recent works have improved the prediction accuracy by jointly considering multiple representations of drugs and targets. In this work, the authors propose a drug-target prediction approach by matrix completion with multi-view side information (MCM) of drugs and proteins from both structural view and chemical view. Different from existing studies for drug-target prediction, they predict drug-target interaction by directly completing the interaction matrix between them. The experimental results show that the MCM method could obtain significantly higher accuracies than the comparison methods. They finally report new drug-target interactions for 26 FDA-approved drugs, and biologically discuss these targets using existing references.
Affiliation(s)
- Yunda Hao
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, People's Republic of China
- Menglan Cai
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, People's Republic of China
- Limin Li
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, People's Republic of China
|
42
|
Li L, Cai M. Drug Target Prediction by Multi-View Low Rank Embedding. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1712-1721. [PMID: 28541222 DOI: 10.1109/tcbb.2017.2706267] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/07/2023]
Abstract
Drug repositioning has been a key problem in drug development, and heterogeneous data sources are used to predict drug-target interactions by different approaches. However, most of studies focus on a single representation of drugs or proteins. It has been shown that integrating multi-view representations of drugs and proteins can strengthen the prediction ability. For example, a drug can be represented by its chemical structure, or by its chemical response in different cells. A protein can be represented by its sequence, or by its gene expression values in different cells. The docking of drugs and proteins based on their structure can be considered as one view (structural view), and the chemical performance of them based on gene expression and drug response can be considered as another view (chemical view). In this work, we first propose a single-view approach of SLRE based on low rank embedding for an arbitrary view, and then extend it to a multi-view approach of MLRE, which could integrate both views. Our experiments show that our methods perform significantly better than baseline methods including single-view methods and multi-view methods. We finally report predicted drug-target interactions for 30 FDA-approved drugs.
|
43
|
Shi Q, Hu B, Zeng T, Zhang C. Multi-view Subspace Clustering Analysis for Aggregating Multiple Heterogeneous Omics Data. Front Genet 2019; 10:744. [PMID: 31497031 PMCID: PMC6712585 DOI: 10.3389/fgene.2019.00744] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/21/2018] [Accepted: 07/16/2019] [Indexed: 12/18/2022]
Abstract
Integration of distinct biological data types could provide a comprehensive view of biological processes or complex diseases. The combinations of molecules responsible for different phenotypes form multiple embedded (expression) subspaces, thus identifying the intrinsic data structure is challenging by regular integration methods. In this paper, we propose a novel framework of “Multi-view Subspace Clustering Analysis (MSCA),” which could measure the local similarities of samples in the same subspace and obtain the global consensus sample patterns (structures) for multiple data types, thereby comprehensively capturing the underlying heterogeneity of samples. Applied to various synthetic datasets, MSCA performs effectively to recognize the predefined sample patterns, and is robust to data noises. Given a real biological dataset, i.e., Cancer Cell Line Encyclopedia (CCLE) data, MSCA successfully identifies cell clusters of common aberrations across cancer types. A remarkable superiority over the state-of-the-art methods, such as iClusterPlus, SNF, and ANF, has also been demonstrated in our simulation and case studies.
Affiliation(s)
- Qianqian Shi
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Bing Hu
  - Department of Applied Mathematics, College of Science, Zhejiang University of Technology, Hangzhou, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
|
44
|
Chen J, Zhang S. Discovery of two-level modular organization from matched genomic data via joint matrix tri-factorization. Nucleic Acids Res 2019; 46:5967-5976. [PMID: 29878151 PMCID: PMC6158745 DOI: 10.1093/nar/gky440] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/29/2018] [Accepted: 05/08/2018] [Indexed: 12/22/2022]
Abstract
With the rapid development of biotechnology, multi-dimensional genomic data are available for us to study the regulatory associations among multiple levels. Thus, it is essential to develop a tool to identify not only the modular patterns from multiple levels, but also the relationships among these modules. In this study, we adopt a novel non-negative matrix factorization framework (NetNMF) to integrate pairwise genomic data in a network manner. NetNMF could reveal the modules of each dimension and the connections within and between both types of modules. We first demonstrated the effectiveness of NetNMF using a set of simulated data and compared it with two typical NMF methods. Further, we applied it to two different types of pairwise genomic datasets including microRNA (miRNA) and gene expression data from The Cancer Genome Atlas and gene expression and pharmacological data from the Cancer Genome Project. We respectively identified a two-level miRNA–gene module network and a two-level gene–drug module network. Not only have the majority of identified modules significantly functional implications, but also the three types of module pairs have closely biological associations. This module discovery tool provides us comprehensive insights into the mechanisms of how the two levels of molecules cooperate with each other.
Affiliation(s)
- Jinyu Chen
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
45
|
Barrot CC, Woillard JB, Picard N. Big data in pharmacogenomics: current applications, perspectives and pitfalls. Pharmacogenomics 2019; 20:609-620. [PMID: 31190620 DOI: 10.2217/pgs-2018-0184] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/19/2023]
Abstract
The efficiency of new generation sequencing methods and the reduction of their cost has led pharmacogenomics to gradually supplant pharmacogenetics, leading to new applications in personalized medicine along with new perspectives in drug design or identification of drug response factors. The amount of data generated in genomics fits the definition of big data, and need a specific bioinformatics processing following standard steps: data collection, processing, analysis and interpretation. Pitfalls of pharmacogenomics studies are directly related to these steps. This review aims to describe these steps from a pharmacogenomic point of view, focusing on bioinformatics aspects.
Affiliation(s)
- Claire-Cécile Barrot
  - INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
- Jean-Baptiste Woillard
  - INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
- Nicolas Picard
  - INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
|
46
|
Tang H, Zeng T, Chen L. High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis. Front Genet 2019; 10:371. [PMID: 31080457 PMCID: PMC6497731 DOI: 10.3389/fgene.2019.00371] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/21/2018] [Accepted: 04/09/2019] [Indexed: 12/19/2022]
Abstract
Quantifying or labeling the sample type with high quality is a challenging task, which is a key step for understanding complex diseases. Reducing noise pollution to data and ensuring the extracted intrinsic patterns in concordance with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix combined with pattern fusion analysis (PFA) to realize high-dimensional data feature extraction. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, the PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish the cell types, and we found that HCI is capable of identifying the prior-known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Secondly, we also integrated heterogeneous omics data from TCGA datasets and GEO datasets including bulk RNA-seq data, which outperformed the other methods at identifying distinct cancer subtypes. In an additional case study, we also constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weight estimated from HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations. All these results support that HCI has extensive flexibility and applicability in sample clustering with different types and organizations of RNA-seq data.
Affiliation(s)
- Hui Tang
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Luonan Chen
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
|
47
|
Xiao Q, Luo J, Liang C, Cai J, Li G, Cao B. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics 2019; 20:67. [PMID: 30732558 PMCID: PMC6367773 DOI: 10.1186/s12859-019-2654-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/06/2017] [Accepted: 01/24/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Non-coding RNAs (ncRNAs) are emerging as key regulators and play critical roles in tumorigenesis across a wide range of cancers. Recent studies have suggested that long non-coding RNAs (lncRNAs) could interact with microRNAs (miRNAs) and indirectly regulate miRNA targets through competing interactions. Therefore, uncovering the competing endogenous RNA (ceRNA) regulatory mechanism of lncRNAs, miRNAs and mRNAs at the post-transcriptional level will aid in deciphering the underlying pathogenesis of human polygenic diseases and may unveil new diagnostic and therapeutic opportunities. However, the functional roles of the vast majority of cancer-specific ncRNAs and their combinatorial regulation patterns are still insufficiently understood. RESULTS Here we develop an integrative framework called CeModule to discover lncRNA, miRNA and mRNA-associated regulatory modules. We fully utilize the matched expression profiles of lncRNAs, miRNAs and mRNAs and establish a model based on joint orthogonality non-negative matrix factorization for identifying modules. Meanwhile, we incorporate the experimentally verified miRNA-lncRNA interactions, the validated miRNA-mRNA interactions and the weighted gene-gene network into this framework to improve the module accuracy through network-based penalties. Sparse regularizations are also used to help this model obtain modular sparse solutions. Finally, an iterative multiplicative updating algorithm is adopted to solve the optimization problem. CONCLUSIONS We applied CeModule to two cancer datasets including ovarian cancer (OV) and uterine corpus endometrial carcinoma (UCEC) obtained from TCGA. The modular analysis indicated that the identified modules involving lncRNAs, miRNAs and mRNAs are significantly associated and functionally enriched in cancer-related biological processes and pathways, which may provide new insights into the complex regulatory mechanism of human diseases at the system level.
Affiliation(s)
- Qiu Xiao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
  - Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, 410081, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Cheng Liang
  - College of Information Science and Engineering, Shandong Normal University, Jinan, 250000, China
- Jie Cai
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Guanghui Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Buwen Cao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
|
48
|
Yang X, Han G, Chen J, Cai H. Finding Correlated Patterns via High-Order Matching for Multiple Sourced Biological Data. IEEE Trans Biomed Eng 2018; 66:1017-1025. [PMID: 30130172 DOI: 10.1109/tbme.2018.2866266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
OBJECTIVE The emergence of multidimensional genomic data poses new challenges in data analysis. Finding correlated patterns within multiple-sourced biological data is useful in understanding potential interactions between the multimodal genomic data. METHODS Multidimensional genomic data contain multiple genomic data types, and different types of genomic data have different scales and units. These data cannot simply be aggregated for analysis. To address this issue, a correlated pattern discovery model incorporating prior knowledge is proposed. Tensor similarity is used to measure the correlation between common patterns. The model is combined with prior knowledge, the expression of which is transformed into constraints. Efficient numerical solutions are designed and analyzed. RESULTS The proposed method is shown to perform robustly and effectively with both simulated data and real biological data. We conduct experiments on five real cancer data sets to reveal various cancer subtypes. A survival analysis of these subtypes confirms the effectiveness of the model. CONCLUSION We introduce a correlated pattern discovery model incorporating prior knowledge. This model is meaningful for the realization of personalized diagnoses by doctors in the treatment of cancer and other diseases. SIGNIFICANCE The problem of finding correlated patterns from multiple-sourced biological data was formulated as a high-order graph matching problem, and the prior knowledge data were seamlessly incorporated into the matching model.
|
49
|
Chen J, Peng H, Han G, Cai H, Cai J. HOGMMNC: a higher order graph matching with multiple network constraints model for gene–drug regulatory modules identification. Bioinformatics 2018; 35:602-610. [DOI: 10.1093/bioinformatics/bty662] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/23/2018] [Revised: 05/18/2018] [Accepted: 07/23/2018] [Indexed: 11/14/2022] Open
Affiliation(s)
- Jiazhou Chen
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China
- Hong Peng
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Guoqiang Han
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Hongmin Cai
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  - Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China
- Jiulun Cai
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
|
50
|
Shi Q, Zhang C, Peng M, Yu X, Zeng T, Liu J, Chen L. Pattern fusion analysis by adaptive alignment of multiple heterogeneous omics data. Bioinformatics 2018; 33:2706-2714. [PMID: 28520848 DOI: 10.1093/bioinformatics/btx176] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 10/15/2016] [Accepted: 03/27/2017] [Indexed: 12/20/2022] Open
Abstract
Motivation Integrating different omics profiles is a challenging task, which provides a comprehensive way to understand complex diseases in a multi-view manner. One key for such an integration is to extract intrinsic patterns in concordance with data structures, so as to discover consistent information across various data types even with noise pollution. Thus, we proposed a novel framework called 'pattern fusion analysis' (PFA), which performs automated information alignment and bias correction, to fuse local sample-patterns (e.g. from each data type) into a global sample-pattern corresponding to phenotypes (e.g. across most data types). In particular, PFA can identify significant sample-patterns from different omics profiles by optimally adjusting the effects of each data type to the patterns, thereby alleviating the problems of processing heterogeneous data from different platforms and with different reliability levels. Results To validate the effectiveness of our method, we first tested PFA on various synthetic datasets, and found that PFA can not only capture the intrinsic sample clustering structures from the multi-omics data in contrast to the state-of-the-art methods, such as iClusterPlus, SNF and moCluster, but also provide an automatic weight-scheme to measure the corresponding contributions by data types or even samples. In addition, the computational results show that PFA can reveal shared and complementary sample-patterns across data types with distinct signal-to-noise ratios in Cancer Cell Line Encyclopedia (CCLE) datasets, and outperforms other works at identifying clinically distinct cancer subtypes in The Cancer Genome Atlas (TCGA) datasets. Availability and implementation PFA has been implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/PFApackage_0.1.rar. Contact lnchen@sibs.ac.cn, liujuan@whu.edu.cn or zengtao@sibs.ac.cn. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qianqian Shi
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Chuanchao Zhang
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China
- Minrui Peng
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Xiangtian Yu
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Tao Zeng
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Juan Liu
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China
- Luonan Chen
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
|