101
Abstract
Efforts to compile the phenotypic effects of drugs and environmental chemicals offer the opportunity to adopt a chemo-centric view of human health that does not require detailed mechanistic information. Here, we consider thousands of chemicals and analyze the relationship of their structures with adverse and therapeutic responses. Our study includes molecules related to the etiology of 934 health threatening conditions and used to treat 835 diseases. We first identify chemical moieties that could be independently associated with each phenotypic effect. Using these fragments, we build accurate predictors for approximately 400 clinical phenotypes, finding many privileged and liable structures. Finally, we connect two diseases if they relate to similar chemical structures. The resulting networks of human conditions are able to predict disease comorbidities, as well as identifying potential drug side effects and opportunities for drug repositioning, and show a remarkable coincidence with clinical observations.
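The last step of the abstract, linking two diseases when they relate to similar chemical structures, can be illustrated with a minimal sketch. It assumes each disease is described by a set of fragment identifiers, that similarity is measured with the Jaccard index, and that an edge is drawn above a fixed threshold. The similarity measure, the threshold, the function names and the toy data are all assumptions made for illustration; the abstract does not state how the authors compute similarity.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_network(fragments_by_disease, threshold=0.5):
    """Return (disease_a, disease_b, similarity) for every pair of diseases
    whose fragment sets reach the similarity threshold (hypothetical rule)."""
    edges = []
    for d1, d2 in combinations(sorted(fragments_by_disease), 2):
        sim = jaccard(fragments_by_disease[d1], fragments_by_disease[d2])
        if sim >= threshold:
            edges.append((d1, d2, sim))
    return edges


# Made-up toy data: fragment identifiers associated with each disease.
toy = {
    "disease_A": {"frag1", "frag2", "frag3"},
    "disease_B": {"frag2", "frag3", "frag4"},
    "disease_C": {"frag9"},
}
edges = disease_network(toy, threshold=0.5)
print(edges)
```

In this toy case only disease_A and disease_B share enough fragments (two of four distinct fragments) to be connected.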
102
Li ZC, Lai YH, Chen LL, Xie Y, Dai Z, Zou XY. Identifying and prioritizing disease-related genes based on the network topological features. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:2214-21. [PMID: 25183318 DOI: 10.1016/j.bbapap.2014.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/21/2014] [Revised: 07/22/2014] [Accepted: 08/14/2014] [Indexed: 11/26/2022]
Abstract
Identifying and prioritizing disease-related genes are the most important steps in understanding pathogenesis and discovering therapeutic targets. Experimental examination of these genes is very expensive and laborious, and usually has a high false-positive rate. Therefore, it is highly desirable to develop computational methods for the identification and prioritization of disease-related genes. In this study, we develop a powerful method to identify and prioritize candidate disease genes. Novel network topological features carrying local and global information are proposed and adopted to characterize genes. The performance of these features is verified by 10-fold cross-validation and leave-one-out cross-validation tests. The proposed features are compared with published features, and a fusion strategy that combines the proposed features with the published ones is investigated. These combined features are also used to identify and prioritize Parkinson's disease-related genes. The results indicate that the identified genes are highly related to certain molecular processes and biological functions, which provides new clues for research into the pathogenesis of Parkinson's disease. The Matlab source code is freely available on request from the authors.
Affiliation(s)
- Zhan-Chao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Yan-Hua Lai
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Li-Li Chen
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Yun Xie
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Zong Dai
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Xiao-Yong Zou
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
103
Yang P, Li X, Chua HN, Kwoh CK, Ng SK. Ensemble positive unlabeled learning for disease gene identification. PLoS One 2014; 9:e97079. [PMID: 24816822 PMCID: PMC4016241 DOI: 10.1371/journal.pone.0097079] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 12/18/2013] [Accepted: 04/14/2014] [Indexed: 11/24/2022] Open
Abstract
An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.
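The general idea of learning from a positive set P and an unlabeled set U can be illustrated with a small sketch. The code below is not the EPU method described in the abstract; it is a generic bagging-style PU scheme, chosen here as an assumption, in which each round treats a random sample of U as provisional negatives, fits a nearest-centroid classifier, and votes on the unlabeled points left out of that sample. The function names, the classifier and the toy feature vectors are all hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(x, c):
    """Squared Euclidean distance between two vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def bagging_pu_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Fraction of out-of-bag rounds in which each unlabeled point lies
    closer to the positive centroid than to the provisional-negative one.
    Requires at least two unlabeled points."""
    rng = random.Random(seed)
    k = min(len(positives), len(unlabeled) - 1)
    votes = [0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    pos_c = centroid(positives)
    for _ in range(n_rounds):
        # Provisional negatives: a random sample of the unlabeled set.
        sampled = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in sampled])
        for i, x in enumerate(unlabeled):
            if i in sampled:
                continue  # score only points outside this round's sample
            counts[i] += 1
            if sq_dist(x, pos_c) < sq_dist(x, neg_c):
                votes[i] += 1
    return [v / c if c else 0.0 for v, c in zip(votes, counts)]


# Made-up 2-D features: the first unlabeled point resembles the positives.
P = [[1.0, 1.0], [0.9, 1.1]]
U = [[1.0, 0.9], [0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
scores = bagging_pu_scores(P, U)
print(scores)
```

Averaging votes over many random samples of U is what makes the scheme an ensemble: no single choice of provisional negatives decides the score of a candidate.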
Affiliation(s)
- Peng Yang
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - * E-mail: (PY); (XL)
- Xiaoli Li
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - * E-mail: (PY); (XL)
- Hon-Nian Chua
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Chee-Keong Kwoh
  - Bioinformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
- See-Kiong Ng
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
104
Wang P, Lai WF, Li MJ, Xu F, Yalamanchili HK, Lovell-Badge R, Wang J. Inference of gene-phenotype associations via protein-protein interaction and orthology. PLoS One 2013; 8:e77478. [PMID: 24194887 PMCID: PMC3806783 DOI: 10.1371/journal.pone.0077478] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/27/2013] [Accepted: 08/30/2013] [Indexed: 01/23/2023] Open
Abstract
One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.
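How interaction and orthology evidence might be combined to suggest phenotypes for a gene can be shown with a toy sketch. The scoring rule below is an assumption made for illustration and is not the algorithm behind the web server: a phenotype gains the share of the gene's interaction partners annotated with it, plus a fixed bonus of 1.0 if any ortholog carries it. All gene names, phenotype names and the function name are hypothetical.

```python
def infer_phenotypes(gene, ppi, orthologs, known):
    """Score candidate phenotypes for `gene` from two evidence sources:
    annotated interaction partners and annotated orthologs."""
    partners = ppi.get(gene, set())
    scores = {}
    # Interaction evidence: each annotated partner adds 1/len(partners).
    for partner in partners:
        for pheno in known.get(partner, set()):
            scores[pheno] = scores.get(pheno, 0.0) + 1.0 / len(partners)
    # Orthology evidence: a phenotype seen in any ortholog adds 1.0 once.
    ortholog_phenos = set()
    for ortholog in orthologs.get(gene, set()):
        ortholog_phenos |= known.get(ortholog, set())
    for pheno in ortholog_phenos:
        scores[pheno] = scores.get(pheno, 0.0) + 1.0
    return scores


# Made-up toy data.
ppi = {"geneX": {"g1", "g2"}}
orthologs = {"geneX": {"mouse_geneX"}}
known = {
    "g1": {"phenoA"},
    "g2": {"phenoA", "phenoB"},
    "mouse_geneX": {"phenoB"},
}
scores = infer_phenotypes("geneX", ppi, orthologs, known)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Ranking the resulting scores gives a candidate list for a gene; inverting the same tables would give candidate genes for a phenotype.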
Affiliation(s)
- Panwen Wang
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Wing-Fu Lai
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Mulin Jun Li
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Feng Xu
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Hari Krishna Yalamanchili
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
- Robin Lovell-Badge
  - Division of Developmental Genetics, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom
- Junwen Wang
  - Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - * E-mail:
105
de Ridder D, de Ridder J, Reinders MJT. Pattern recognition in bioinformatics. Brief Bioinform 2013; 14:633-47. [DOI: 10.1093/bib/bbt020] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022] Open