1. Raj S, Namdeo V, Singh P, Srivastava A. Identification and prioritization of disease candidate genes using biomedical named entity recognition and random forest classification. Comput Biol Med 2025; 192:110320. [PMID: 40349579 DOI: 10.1016/j.compbiomed.2025.110320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 04/13/2025] [Accepted: 04/30/2025] [Indexed: 05/14/2025]
Abstract
BACKGROUND AND OBJECTIVE The elucidation of candidate genes is fundamental to comprehending intricate diseases, vital for early diagnosis, personalized treatment, and drug discovery. Traditional Disease Gene Identification methods encounter limitations, necessitating substantial sample sizes and statistical power, particularly challenging for complex diseases. Conversely, Disease Gene Prioritization methods leverage biological knowledge but rely on computational predictions, often lacking experimental validation. Addressing existing tool challenges, this study introduces an innovative two-tier machine-learning protocol that distils Disease Gene Association details from disease-specific abstracts, incorporating diverse findings. Employing advanced text mining, the model classifies disease-gene associations from the abstracts into Positive, Negative, and Ambiguous classes. METHODS Leveraging Random Forest as a robust text classification tool, this study demonstrates its efficacy in navigating complexities within biomedical texts. In the developed 2-tiered protocol, the level 1 classifier categorizes information into two classes, distinguished by the presence or absence of disease-gene associations, whereas the level 2 classifier further classifies into three classes: Positive, Negative, and Ambiguous associations. The developed classifier underwent rigorous training and cross-validation on different gold standard datasets - Alzheimer's, Breast Cancer and Type 2 Diabetes. Its performance across these varied disease contexts underscores its versatility and robustness without succumbing to overfitting. RESULTS Achieving an average accuracy of 97.29 % and 98.14 % for level 1 and level 2 classification, the protocol successfully extracted 2769, 3220 and 740 genes associated positively with Alzheimer's, Breast Cancer and Type 2 Diabetes. 
From the identified positive genes, a substantial number (1008, 670, and 165 genes, respectively) were not reported in established databases, thus expanding the genetic exploration of these diseases. These identified genes offer promising opportunities for targeted interventions, while ambiguous genes warrant further investigation to unravel deeper disease associations. CONCLUSIONS This research significantly contributes to the understanding of genetic diseases by offering a comprehensive roadmap for their intricate exploration. Beyond the study's focus on Alzheimer's, Breast Cancer, and Type 2 Diabetes, the protocol's applicability extends to diverse biomedical landscapes, demonstrating its versatility and impactful potential for comprehensive disease exploration.
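The two-tier routing described in the abstract above can be sketched as a simple cascade. This is only an illustration of the control flow: the study uses Random Forest classifiers, whereas the keyword rules below (`toy_level1`, `toy_level2`, and the cue lists) are hypothetical stand-ins, and the gene names are placeholders.

```python
from typing import Callable, Optional

# Hypothetical cue lists; NOT taken from the study.
NEGATION_CUES = ("not associated", "no association")
HEDGE_CUES = ("may", "might", "unclear")


def toy_level1(sentence: str) -> bool:
    """Stand-in for the level 1 classifier: is any association stated?"""
    return "associat" in sentence.lower()


def toy_level2(sentence: str) -> str:
    """Stand-in for the level 2 classifier: which kind of association?"""
    text = sentence.lower()
    if any(cue in text for cue in NEGATION_CUES):
        return "Negative"
    if any(cue in text for cue in HEDGE_CUES):
        return "Ambiguous"
    return "Positive"


def two_tier_classify(
    sentence: str,
    level1: Callable[[str], bool],
    level2: Callable[[str], str],
) -> Optional[str]:
    """Run level 2 only on sentences that level 1 accepts."""
    if not level1(sentence):
        return None  # no disease-gene association present
    return level2(sentence)


examples = [
    "GENE_A is associated with Alzheimer's disease.",
    "GENE_B was not associated with Type 2 Diabetes.",
    "GENE_C may be associated with Breast Cancer.",
    "The cohort was recruited in 2019.",
]
labels = [two_tier_classify(s, toy_level1, toy_level2) for s in examples]
```

In a real pipeline both callables would wrap trained models; keeping them as parameters lets the cascade logic stay unchanged when the classifiers are swapped.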
Affiliation(s)
- Sushrutha Raj
  - Amity Institute of Integrative Sciences and Health, Amity University Haryana, Amity Education Valley, Gurgaon, 122413, India
- Vindhya Namdeo
  - Sri Innovation and Research Foundation, Ghaziabad, Uttar Pradesh, 201009, India
- Payal Singh
  - Sri Innovation and Research Foundation, Ghaziabad, Uttar Pradesh, 201009, India
- Alok Srivastava
  - Sri Innovation and Research Foundation, Ghaziabad, Uttar Pradesh, 201009, India; L V Prasad Eye Institute, Hyderabad, Telangana, 500034, India
2. Tie D, He M, Li W, Xiang Z. Advances in the application of network analysis methods in traditional Chinese medicine research. Phytomedicine: International Journal of Phytotherapy and Phytopharmacology 2025; 136:156256. [PMID: 39615211 DOI: 10.1016/j.phymed.2024.156256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 11/03/2024] [Accepted: 11/11/2024] [Indexed: 01/16/2025]
Abstract
OBJECTIVE This review aims at evaluating the role and potential applications of network analysis methods in the medicinal substances of traditional Chinese medicine (TCM), theories of TCM compatibility, properties of herbs, and TCM syndromes. METHODS Literature was retrieved from databases, such as CNKI, PubMed, and Web of Science, using keywords, including "network analysis," "network biology," "network pharmacology," and "network medicine." The extracted literature included the biological network construction (including ingredient-target and target-disease relations), analysis of network topology characteristics (including node degree, clustering coefficient, and path length), network modularization analysis, functional annotation and so on. These studies were categorized and organized based on their research methods, application domains, and other relevant characteristics. RESULTS Network analysis algorithms, such as network distance, random walk, matrix factorization, graph embedding, and graph neural networks, are widely applied in fields related to the properties, compatibility, and mechanisms of TCM. They effectively reflect the interactive relations within the complex systems of TCM and elucidate and clarify theories, such as the effective substances, the principles of TCM compatibility, the TCM syndromes, and the properties of TCM. CONCLUSION The network analysis method is a powerful mathematical and computational tool that reveals the structure, dynamics, and functions of complex systems by analyzing the elements and their relations. This approach has effectively promoted the modernization of TCM, providing essential theoretical and practical tools for personalized treatment and scientific research on TCM. It also offers a significant methodological framework for the modernization and internationalization of TCM.
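The topology characteristics named in the abstract above (node degree, clustering coefficient, path length) can be computed directly on a small graph. The following is a minimal sketch on an undirected, unweighted adjacency dictionary; the toy ingredient-target network and its node names are hypothetical and not drawn from the review.

```python
from collections import deque


def degree(graph, node):
    """Number of neighbours of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


# Hypothetical toy network (undirected: every edge is listed both ways).
graph = {
    "ingredient_A": {"target_1", "target_2"},
    "ingredient_B": {"target_2"},
    "target_1": {"ingredient_A", "target_2"},
    "target_2": {"ingredient_A", "ingredient_B", "target_1"},
}
```

For example, `degree(graph, "target_2")` counts its three neighbours, and `shortest_path_length(graph, "ingredient_B", "target_1")` walks through `target_2`.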
Affiliation(s)
- Defu Tie
  - Medical School, Hangzhou City University, Hangzhou, 310015, China; College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Mulan He
  - Medical School, Hangzhou City University, Hangzhou, 310015, China; College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Wenlong Li
  - College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
- Zheng Xiang
  - Medical School, Hangzhou City University, Hangzhou, 310015, China
3. Abumalloh RA, Nilashi M, Samad S, Ahmadi H, Alghamdi A, Alrizq M, Alyami S. Parkinson's disease diagnosis using deep learning: A bibliometric analysis and literature review. Ageing Res Rev 2024; 96:102285. [PMID: 38554785 DOI: 10.1016/j.arr.2024.102285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 03/20/2024] [Accepted: 03/24/2024] [Indexed: 04/02/2024]
Abstract
Parkinson's Disease (PD) is a progressive neurodegenerative illness triggered by decreased dopamine secretion. Deep Learning (DL) has gained substantial attention in PD diagnosis research, with an increase in the number of published papers in this discipline. PD detection using DL has presented more promising outcomes as compared with common machine learning approaches. This article aims to conduct a bibliometric analysis and a literature review focusing on the prominent developments taking place in this area. To achieve the target of the study, we retrieved and analyzed the available research papers in the Scopus database. Following that, we conducted a bibliometric analysis to inspect the structure of keywords, authors, and countries in the surveyed studies by providing visual representations of the bibliometric data using VOSviewer software. The study also provides an in-depth review of the literature focusing on different indicators of PD, deployed approaches, and performance metrics. The outcomes indicate the firm development of PD diagnosis using DL approaches over time and a large diversity of studies worldwide. Additionally, the literature review presented a research gap in DL approaches related to incremental learning, particularly in relation to big data analysis.
Affiliation(s)
- Rabab Ali Abumalloh
  - Department of Computer Science and Engineering, Qatar University, Doha 2713, Qatar
- Mehrbakhsh Nilashi
  - Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; School of Computer Science, Duy Tan University, Da Nang, Vietnam; UCSI Graduate Business School, UCSI University, No. 1 Jalan Menara Gading, UCSI Heights, Cheras, Kuala Lumpur 56000, Malaysia; Centre for Global Sustainability Studies (CGSS), Universiti Sains Malaysia, Penang 11800, Malaysia
- Sarminah Samad
  - Faculty of Business, UNITAR International University, Tierra Crest, Jalan SS6/3, Petaling Jaya, Selangor 47301, Malaysia
- Hossein Ahmadi
  - Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth PL4 8AA, UK
- Abdullah Alghamdi
  - Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia; AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia
- Mesfer Alrizq
  - Information Systems Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia; AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia
- Sultan Alyami
  - AI Lab, Scientific and Engineering Research Center (SERC), Najran University, Najran, Saudi Arabia; Computer Science Dept., College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
4. Kong X, Diao L, Jiang P, Nie S, Guo S, Li D. DDK-Linker: a network-based strategy identifies disease signals by linking high-throughput omics datasets to disease knowledge. Brief Bioinform 2024; 25:bbae111. [PMID: 38517698 PMCID: PMC10959161 DOI: 10.1093/bib/bbae111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/24/2024] Open
Abstract
The high-throughput genomic and proteomic scanning approaches allow investigators to quantify genome-wide genes (or gene products) under certain disease conditions, which plays an essential role in promoting the discovery of disease mechanisms. The high-throughput approaches often generate a large list of genes of interest (GOIs), such as differentially expressed genes/proteins. However, researchers have to perform manual triage and validation to explore the most promising, biologically plausible linkages between the known disease genes and GOIs (disease signals) for further study. Here, to address this challenge, we proposed a network-based strategy, DDK-Linker, to facilitate the exploration of disease signals hidden in omics data by linking GOIs to known disease genes. Specifically, it reconstructed gene distances in the protein-protein interaction (PPI) network through six network methods (random walk with restart, Deepwalk, Node2Vec, LINE, HOPE, Laplacian) to discover disease signals in omics data that have shorter distances to disease genes. Furthermore, drawing on the knowledge base we established, abundant bioinformatics annotations were provided for each candidate disease signal. To assist in omics data interpretation and facilitate usage, we have developed this strategy into an application that users can access through a website or download as an R package. We believe DDK-Linker will accelerate the exploration of disease genes and drug targets in a variety of omics data, such as genomics, transcriptomics and proteomics data, and provide clues for complex disease mechanism and pharmacological research. DDK-Linker is freely accessible at http://ddklinker.ncpsb.org.cn/.
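Of the six network methods listed in the abstract above, random walk with restart is the easiest to show in a few lines. The sketch below is a generic textbook formulation, not DDK-Linker's implementation; the restart probability, the function name, and the toy PPI graph are assumptions made for illustration, and the graph is assumed undirected with no isolated nodes.

```python
def random_walk_with_restart(graph, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    graph: dict mapping each node to a set of neighbours (undirected,
           every node must have at least one neighbour).
    seeds: set of known disease genes where the walker restarts.
    """
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node spreads its non-restarting mass evenly to neighbours.
            share = (1.0 - restart) * p[u] / len(graph[u])
            for v in graph[u]:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical chain: disease_gene - g1 - g2 - g3
ppi = {
    "disease_gene": {"g1"},
    "g1": {"disease_gene", "g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2"},
}
scores = random_walk_with_restart(ppi, seeds={"disease_gene"})
ranked_candidates = sorted(
    (n for n in ppi if n != "disease_gene"), key=scores.get, reverse=True
)
```

Candidates closer to the seed accumulate more visiting probability, which is the sense in which a higher score corresponds to a shorter network distance to known disease genes.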
Affiliation(s)
- Xiangren Kong
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Lihong Diao
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
- Peng Jiang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Shiyan Nie
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Shuzhen Guo
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
- Dong Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
5. Cui Z, Guo FY, Li L, Lu F, Jin CH, Wang X, Liu F. Brazilin-7-acetate, a novel potential drug of Parkinson's disease, hinders the formation of α-synuclein fibril, mitigates cytotoxicity, and decreases oxidative stress. Eur J Med Chem 2024; 264:115965. [PMID: 38056304 DOI: 10.1016/j.ejmech.2023.115965] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/25/2023] [Revised: 11/06/2023] [Accepted: 11/13/2023] [Indexed: 12/08/2023]
Abstract
Parkinson's disease (PD) is a prevalent neurodegenerative disorder characterized by the accumulation of α-synuclein (α-Syn) aggregates. However, there are currently no effective therapies for PD. Brazilin, an inhibitor of α-Syn aggregation, is unstable and toxic. Therefore, we have developed and synthesized derivatives of brazilin. One of these derivatives, called brazilin-7-acetate (B-7-A), has shown reduced toxicity and a stronger effect on inhibiting α-Syn aggregation. It showed that B-7-A prevented the formation of α-Syn fibers and disrupted existing fibers in a dosage-dependent manner. Additionally, B-7-A significantly reduced the cytotoxicity of α-Syn aggregates and alleviated oxidative stress in PC12 cells. The beneficial effects of B-7-A were also confirmed using the Caenorhabditis elegans model. These effects included preventing the accumulation of α-Syn clumps, improving behavior disorder, increasing lifespan, reducing oxidative stress, and protecting against lipid oxidation and loss. Finally, B-7-A showed good ADMET properties in silico. Based on these findings, B-7-A exhibits potential as a prospective treatment for PD.
Affiliation(s)
- Zhan Cui
  - College of Biotechnology, Tianjin University of Science & Technology, Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
- Fang-Yan Guo
  - Key Laboratory of Natural Medicines of the Changbai Mountain, Ministry of Education, College of Pharmacy, Yanbian University, Yanji, Jilin Province, China
- Li Li
  - College of Science, Tianjin University of Science & Technology, China
- Fuping Lu
  - College of Biotechnology, Tianjin University of Science & Technology, Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
- Cheng-Hua Jin
  - Key Laboratory of Natural Medicines of the Changbai Mountain, Ministry of Education, College of Pharmacy, Yanbian University, Yanji, Jilin Province, China
- Xiangming Wang
  - Department of Cell Biology, School of Basic Medical Science, Capital Medical University, Beijing, China
- Fufeng Liu
  - College of Biotechnology, Tianjin University of Science & Technology, Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
6. Lu Y, Chen Z, Pan Y, Qi F. Identification of Drug Compounds for Capsular Contracture Based on Text Mining and Deep Learning. Plast Reconstr Surg 2023; 152:779e-790e. [PMID: 36862957 DOI: 10.1097/prs.0000000000010350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
BACKGROUND Capsular contracture is a common and unpredictable complication after breast implant placement. Currently, the pathogenesis of capsular contracture is unclear, and the effectiveness of nonsurgical treatment is still doubtful. The authors' study aimed to investigate new drug therapies for capsular contracture by using computational methods. METHODS Genes related to capsular contracture were identified by text mining and GeneCodis. Then, the candidate key genes were selected through protein-protein interaction analysis in Search Tool for the Retrieval of Interacting Genes/Proteins and Cytoscape. Drugs targeting the candidate genes with relation to capsular contracture were screened out in Pharmaprojects. Based on the drug-target interaction analysis by DeepPurpose, candidate drugs with highest predicted binding affinity were obtained eventually. RESULTS The authors' study identified 55 genes related to capsular contracture. Gene set enrichment analysis and protein-protein interaction analysis generated eight candidate genes. One hundred drugs targeting the candidate genes were selected. The seven candidate drugs with the highest predicted binding affinity were determined by DeepPurpose, including tumor necrosis factor alpha antagonist, estrogen receptor agonist, insulin-like growth factor 1 receptor, tyrosine kinase inhibitor, and matrix metallopeptidase 1 inhibitor. CONCLUSION Text mining and DeepPurpose can be used as a promising tool for drug discovery in exploring nonsurgical treatment to capsular contracture. CLINICAL QUESTION/LEVEL OF EVIDENCE Therapeutic, V.
Affiliation(s)
- Yeheng Lu
  - Department of Plastic Surgery, Zhongshan Hospital
- Zhiwei Chen
  - Big Data and Artificial Intelligence Center, Zhongshan Hospital, Fudan University
- Yuyan Pan
  - Department of Plastic Surgery, Zhongshan Hospital
- Fazhi Qi
  - Department of Plastic Surgery, Zhongshan Hospital
7. Salcedo MV, Gravel N, Keshavarzi A, Huang LC, Kochut KJ, Kannan N. Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding. PeerJ 2023; 11:e15815. [PMID: 37868056 PMCID: PMC10590106 DOI: 10.7717/peerj.15815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2023] [Accepted: 07/10/2023] [Indexed: 10/24/2023] Open
Abstract
The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied "dark" members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.
Affiliation(s)
- Mariah V. Salcedo
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
- Nathan Gravel
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Abbas Keshavarzi
  - School of Computing, University of Georgia, Athens, GA, United States of America
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Krzysztof J. Kochut
  - School of Computing, University of Georgia, Athens, GA, United States of America
- Natarajan Kannan
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
8. Pasquier C, Guerlais V, Pallez D, Rapetti-Mauss R, Soriani O. A network embedding approach to identify active modules in biological interaction networks. Life Sci Alliance 2023; 6:e202201550. [PMID: 37339804 PMCID: PMC10282331 DOI: 10.26508/lsa.202201550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 06/06/2023] [Accepted: 06/06/2023] [Indexed: 06/22/2023] Open
Abstract
The identification of condition-specific gene sets from transcriptomic experiments is important to reveal regulatory and signaling mechanisms associated with a given cellular response. Statistical methods of differential expression analysis, designed to assess individual gene variations, have trouble highlighting modules of small varying genes whose interaction is essential to characterize phenotypic changes. To identify these highly informative gene modules, several methods have been proposed in recent years, but they have many limitations that make them of little use to biologists. Here, we propose an efficient method for identifying these active modules that operates on a data embedding combining gene expressions and interaction data. Applications carried out on real datasets show that our method can identify new groups of genes of high interest corresponding to functions not revealed by traditional approaches. Software is available at https://github.com/claudepasquier/amine.
Affiliation(s)
- Claude Pasquier
  - Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Vincent Guerlais
  - Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Denis Pallez
  - Laboratoire d'Informatique, Signaux et Systèmes de Sophia-Antipolis, I3S - UMR7271 - UNS CNRS, Les Algorithmes - bât. Euclide B, Sophia Antipolis, France
- Raphaël Rapetti-Mauss
  - iBV - Institut de Biologie Valrose, Université Nice Sophia Antipolis, Faculté des Sciences, Parc Valrose, Nice cedex 2, France
- Olivier Soriani
  - iBV - Institut de Biologie Valrose, Université Nice Sophia Antipolis, Faculté des Sciences, Parc Valrose, Nice cedex 2, France
9. Doumari SA, Berahmand K, Ebadi MJ. Early and High-Accuracy Diagnosis of Parkinson's Disease: Outcomes of a New Model. Computational and Mathematical Methods in Medicine 2023; 2023:1493676. [PMID: 37304324 PMCID: PMC10256450 DOI: 10.1155/2023/1493676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 03/02/2023] [Accepted: 03/06/2023] [Indexed: 06/13/2023]
Abstract
Parkinson's disease (PD) is a significant and common neurological disorder of the current age that causes uncontrollable movements like shaking, stiffness, and difficulty. The early clinical diagnosis of this disease is essential for preventing the progression of PD. Hence, an innovative method is proposed here based on combining the crow search algorithm and decision tree (CSADT) for early PD diagnosis. This approach is used on four crucial Parkinson's datasets, including meander, spiral, voice, and speech-Sakar. Using the presented method, PD is effectively diagnosed by evaluating each dataset's critical features and extracting the primary practical outcomes. The proposed algorithm was compared with other machine learning algorithms, namely k-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), multilayer perceptron (MLP), decision tree (DT), random tree, logistic regression, support vector machine with radial basis functions (SVM of RBFs), and a combined classifier, in terms of accuracy, recall, and the combined F1 measure. The analytical results emphasize the proposed algorithm's superiority over the other selected ones. The proposed model yields nearly 100% accuracy through various trials on the datasets. Notably, the method is fast, with a lowest detection time of 2.6 seconds. The main novelty of this paper is attributed to the accuracy of the presented PD diagnosis method, which is much higher than its counterparts.
Affiliation(s)
- Sajjad Amiri Doumari
  - Department of Mathematics and Computer Science, Sirjan University of Technology, Sirjan, Iran
- Kamal Berahmand
  - Department of Information Technology and Communications, Azarbaijan Shahid Madani University, Tabriz, Iran
- M. J. Ebadi
  - Department of Mathematics, Chabahar Maritime University, Chabahar, Iran
10. Zhang DY, Cui WQ, Hou L, Yang J, Lyu LY, Wang ZY, Linghu KG, He WB, Yu H, Hu YJ. Expanding potential targets of herbal chemicals by node2vec based on herb-drug interactions. Chin Med 2023; 18:64. [PMID: 37264453 PMCID: PMC10233865 DOI: 10.1186/s13020-023-00763-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 05/01/2023] [Indexed: 06/03/2023] Open
Abstract
BACKGROUND The identification of chemical-target interaction is key to pharmaceutical research and development, but the unclear materials basis and complex mechanisms of traditional medicine (TM) make it difficult, especially for low-content chemicals which are hard to test in experiments. In this research, we aim to apply the node2vec algorithm in the context of drug-herb interactions for expanding potential targets and taking advantage of molecular docking and experiments for verification. METHODS Regarding the widely reported risks between cardiovascular drugs and herbs, Salvia miltiorrhiza (Danshen, DS) and Ligusticum chuanxiong (Chuanxiong, CX), which are widely used in the treatment of cardiovascular disease (CVD), and approved drugs for CVD form the new dataset as an example. Three data groups DS-drug, CX-drug, and DS-CX-drug were applied to serve as the context of drug-herb interactions for link prediction. Three types of datasets were set under three groups, containing information from chemical-target connection (CTC), chemical-chemical connection (CCC) and protein-protein interaction (PPI) in increasing steps. Five algorithms, including node2vec, were applied as comparisons. Molecular docking and pharmacological experiments were used for verification. RESULTS Node2vec achieved the best performance with average AUROC and AP values of 0.91 on the datasets "CTC, CCC, PPI". Targets of 32 herbal chemicals were identified within 43 predicted edges of herbal chemicals and drug targets. Among them, 11 potential chemical-drug target interactions showed better binding affinity by molecular docking. Further pharmacological experiments indicated caffeic acid increased the thermal stability of the protein GGT1 and ligustilide and low-content chemical neocryptotanshinone induced mRNA change of FGF2 and MTNR1A, respectively.
CONCLUSIONS The analytical framework and methods established in the study provide an important reference for researchers in discovering herb-drug interactions, alerting clinical risks, and understanding complex mechanisms of TM.
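The abstract above reports link-prediction quality as AUROC and average precision (AP). As a reminder of what those two numbers measure, here is a minimal pure-Python sketch using the standard definitions; the labels and scores are made-up toy values, not data from the study, and ties in AP are broken by input order.

```python
def auroc(labels, scores):
    """Probability that a random true edge outscores a random false edge."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy candidate edges: 1 = real chemical-target link, 0 = non-link.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)          # 3 of 4 pairs ordered correctly
toy_ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

AUROC is insensitive to class balance, while AP rewards placing true links near the top of the ranking, which is why link-prediction studies often report both.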
Affiliation(s)
- Dai-Yan Zhang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Wen-Qing Cui
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ling Hou
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Jing Yang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Li-Yang Lyu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ze-Yu Wang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ke-Gang Linghu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Wen-Bin He
  - Shanxi Key Laboratory of Chinese Medicine Encephalopathy, Shanxi University of Chinese Medicine, Taiyuan, China
- Hua Yu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Yuan-Jia Hu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
  - DPM, Faculty of Health Sciences, University of Macau, Macao, China
11. Kumar N, Mukhtar MS. Ranking Plant Network Nodes Based on Their Centrality Measures. Entropy (Basel, Switzerland) 2023; 25:e25040676. [PMID: 37190464 PMCID: PMC10137616 DOI: 10.3390/e25040676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 04/14/2023] [Accepted: 04/16/2023] [Indexed: 05/17/2023]
Abstract
Biological networks are often large and complex, making it difficult to accurately identify the most important nodes. Node prioritization algorithms are used to identify the most influential nodes in a biological network by considering their relationships with other nodes. These algorithms can help us understand the functioning of the network and the role of individual nodes. We developed CentralityCosDist, an algorithm that ranks nodes based on a combination of centrality measures and seed nodes. We applied this and four other algorithms to protein-protein interactions and co-expression patterns in Arabidopsis thaliana using pathogen effector targets as seed nodes. The accuracy of the algorithms was evaluated through functional enrichment analysis of the top 10 nodes identified by each algorithm. Most enriched terms were similar across algorithms, except for DIAMOnD. CentralityCosDist identified more plant-pathogen interactions and related functions and pathways compared to the other algorithms.
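As a rough illustration of ranking nodes by how closely their centrality profile matches that of the seed nodes, the sketch below computes two centralities (degree and closeness) on a toy undirected graph and sorts the non-seed nodes by cosine distance to the mean seed profile. It is not the published CentralityCosDist implementation: the choice of centralities, the averaging of seed profiles, the toy graph, and all function names are assumptions made for illustration.

```python
import math
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by summed BFS distance, per node."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_seed_similarity(adj, seeds):
    """Rank non-seed nodes by cosine distance to the mean seed profile."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    features = {v: (deg[v], clo[v]) for v in adj}
    # Hypothetical choice: summarise the seeds by their mean centrality vector.
    seed_profile = tuple(
        sum(features[s][i] for s in seeds) / len(seeds) for i in range(2)
    )
    candidates = [v for v in adj if v not in seeds]
    return sorted(
        candidates, key=lambda v: cosine_distance(features[v], seed_profile)
    )


# Toy path graph a - b - c - d with "a" as the only seed node.
toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ranking = rank_by_seed_similarity(toy_graph, {"a"})
```

In the toy path graph, node "d" occupies the same structural position as the seed "a", so it is ranked first. A real analysis would use more centrality measures and a genome-scale network.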
Affiliation(s)
- Nilesh Kumar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- M Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
12
Liu R, Hirn M, Krishnan A. Accurately modeling biased random walks on weighted networks using node2vec+. Bioinformatics 2023; 39:btad047. [PMID: 36688699 PMCID: PMC9891245 DOI: 10.1093/bioinformatics/btad047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/14/2022] [Revised: 01/16/2023] [Accepted: 01/20/2023] [Indexed: 01/24/2023]
Abstract
MOTIVATION Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using all the information in the network. RESULTS Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+. AVAILABILITY AND IMPLEMENTATION The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
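To make the limitation described above concrete, the following sketch computes the second-order transition probabilities of the original node2vec walk, where the bias for each candidate next node depends only on whether an edge to the previous node exists, not on how strong that edge is. This is plain node2vec, not node2vec+; the node2vec+ bias is not reproduced here. The graph representation (a dict of neighbor-to-weight dicts) and the function name are assumptions for illustration.

```python
def node2vec_transition_probs(graph, prev, cur, p=1.0, q=1.0):
    """Normalised probabilities of stepping from `cur` to each neighbor,
    given that the walk arrived at `cur` from `prev`.

    graph: dict mapping node -> {neighbor: edge weight}
    p: return parameter (large p discourages stepping back to `prev`)
    q: in-out parameter (small q encourages moving away from `prev`)
    """
    unnormalised = {}
    for nxt, weight in graph[cur].items():
        if nxt == prev:
            bias = 1.0 / p  # return to the previous node
        elif nxt in graph[prev]:
            bias = 1.0      # any edge to prev counts, however weak
        else:
            bias = 1.0 / q  # move outward, away from prev
        unnormalised[nxt] = bias * weight
    total = sum(unnormalised.values())
    return {node: value / total for node, value in unnormalised.items()}


# Toy weighted graph: x1 is linked to t, x2 is not.
toy_graph = {
    "t": {"v": 1.0, "x1": 1.0},
    "v": {"t": 1.0, "x1": 1.0, "x2": 1.0},
    "x1": {"t": 1.0, "v": 1.0},
    "x2": {"v": 1.0},
}
probs = node2vec_transition_probs(toy_graph, prev="t", cur="v", p=2.0, q=0.5)
```

Note that the `elif` branch treats an edge of weight 0.001 between `nxt` and `prev` exactly like an edge of weight 1.0. In a dense weighted graph nearly every candidate falls into that branch, so the `q` parameter has little effect. That is the gap the abstract says node2vec+ addresses.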
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
- Matthew Hirn
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Center for Quantum Computing, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
13
Voitalov I, Zhang L, Kilpatrick C, Withers JB, Saleh A, Akmaev VR, Ghiassian SD. The module triad: a novel network biology approach to utilize patients' multi-omics data for target discovery in ulcerative colitis. Sci Rep 2022; 12:21685. [PMID: 36522454 PMCID: PMC9755270 DOI: 10.1038/s41598-022-26276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Tumor necrosis factor-α inhibitors (TNFi) have been a standard treatment in ulcerative colitis (UC) for nearly 20 years. However, insufficient response rate to TNFi therapies along with concerns around their immunogenicity and inconvenience of drug delivery through injections calls for development of UC drugs targeting alternative proteins. Here, we propose a multi-omic network biology method for prioritization of protein targets for UC treatment. Our method identifies network modules on the Human Interactome (a network of protein-protein interactions in human cells) consisting of genes contributing to the predisposition to UC (Genotype module), genes whose expression needs to be modulated to achieve low disease activity (Response module), and proteins whose perturbation alters expression of the Response module genes to a healthy state (Treatment module). Targets are prioritized based on their topological relevance to the Genotype module and functional similarity to the Treatment module. We demonstrate utility of our method in UC and other complex diseases by efficiently recovering the protein targets associated with compounds in clinical trials and on the market. The proposed method may help to reduce cost and time of drug development by offering a computational screening tool for identification of novel and repurposing therapeutic opportunities in UC and other complex diseases.
Affiliation(s)
- Ivan Voitalov
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Lixia Zhang
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Casey Kilpatrick
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Johanna B. Withers
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Alif Saleh
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
14
Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023]
Abstract
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
- Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
15
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [PMID: 36407327 PMCID: PMC9672476 DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 12/09/2024]
Abstract
BACKGROUND AND CONTRIBUTION In network biology, molecular functions can be characterized by network-based inference, or "guilt-by-associations." PageRank-like tools have been applied in the study of biomolecular interaction networks to obtain further the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process. RESULTS We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes the input of any molecular interaction network data and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding "non-seed" molecules to the original biomolecular interaction network consisting of "seed" molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking noises in several network permutation experiments. We have found that node degree-preservation randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a more significant proportion of the WINNER-ranked genes was associated with disease biology than existing methods such as PageRank. 
We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritization software tools, including Ingenuity Pathway Analysis (IPA) and DIAMOnD. CONCLUSION WINNER ranking strongly correlates with other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as there is available gene/protein/metabolic network information.
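The abstract compares WINNER against PageRank-like ranking. For readers unfamiliar with that baseline, here is a minimal sketch of weighted PageRank by power iteration on a small interaction network. It is generic PageRank, not WINNER's scoring or its p-value computation; the graph format, parameter defaults, and function name are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, max_iter=200, tol=1e-10):
    """Weighted PageRank by power iteration.

    graph: dict mapping node -> {neighbor: edge weight}; every neighbor
    must also appear as a key. For an undirected network, list each edge
    in both directions.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}

    for _ in range(max_iter):
        # Teleportation term shared by all nodes.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        for v in nodes:
            new_rank[v] += damping * dangling / n
            if out_weight[v] == 0:
                continue
            for u, w in graph[v].items():
                new_rank[u] += damping * rank[v] * w / out_weight[v]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy star network: hub "h" interacts with three leaf nodes.
toy_network = {
    "h": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"h": 1.0},
    "b": {"h": 1.0},
    "c": {"h": 1.0},
}
scores = pagerank(toy_network)
top_node = max(scores, key=scores.get)
```

In this toy network the hub receives the highest score. Judging whether such a ranking is statistically meaningful, for example against degree-preserving randomized networks as the abstract describes, would require additional steps that are not shown here.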
Affiliation(s)
- Thanh Nguyen
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
- Zongliang Yue
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Radomir Slominski
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Robert Welner
- Comprehensive Arthritis, Musculoskeletal, Bone and Autoimmunity Center (CAMBAC), School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
- Jianyi Zhang
- Department of Biomedical Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
- Informatics Institute in School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States
16
Xie F, Yang Z, Song J, Dai Q, Duan X. DHNLDA: A Novel Deep Hierarchical Network Based Method for Predicting lncRNA-Disease Associations. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3395-3403. [PMID: 34543201 DOI: 10.1109/tcbb.2021.3113326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent studies have found that lncRNA (long non-coding RNA), a class of ncRNA (non-coding RNA), is not only involved in many biological processes but also abnormally expressed in many complex diseases. Accurate identification of lncRNA-disease associations is of great significance for understanding lncRNA function and disease mechanisms. In this paper, a deep learning framework named DHNLDA, consisting of a stacked autoencoder (SAE), a multi-scale ResNet and a stacked ensemble module, was constructed to predict lncRNA-disease associations; it integrates multiple biological data sources and constructs feature matrices. The integrated biological data include the similarities and interactions of lncRNAs, diseases and miRNAs. The feature matrices are obtained by node2vec embedding and feature extraction, respectively. Then, the SAE and the multi-scale ResNet are used to learn the complementary information between nodes, and the high-level features of node attributes are obtained. Finally, the fused high-level features are input into the stacked ensemble module to obtain the prediction results of lncRNA-disease associations. The experimental results of five-fold cross-validation show that the AUC of DHNLDA reaches 0.975, better than that of existing methods. Case studies of stomach cancer, breast cancer and lung cancer demonstrate the ability of DHNLDA to discover potential lncRNA-disease associations.
17
Aborageh M, Krawitz P, Fröhlich H. Genetics in Parkinson's disease: From better disease understanding to machine learning based precision medicine. Front Mol Med 2022; 2:933383. [PMID: 39086979 PMCID: PMC11285583 DOI: 10.3389/fmmed.2022.933383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 08/30/2022] [Indexed: 08/02/2024]
Abstract
Parkinson's Disease (PD) is a neurodegenerative disorder with highly heterogeneous phenotypes. Accordingly, it has been challenging to robustly identify genetic factors associated with disease risk, prognosis and therapy response via genome-wide association studies (GWAS). In this review we first provide an overview of existing statistical methods to detect associations between genetic variants and the disease phenotypes in existing PD GWAS. Secondly, we discuss the potential of machine learning approaches to better quantify disease phenotypes and to move beyond disease understanding towards a better-personalized treatment of the disease.
Affiliation(s)
- Mohamed Aborageh
- Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Holger Fröhlich
- Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
18
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Pooya Zakeri
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
- Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Mira Mousa
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Halima Alnaqbi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Fatima Yousif Alshamsi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Leo Bettoni
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Ernesto Damiani
- Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Yvan Saeys
- Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Peter Carmeliet
- Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
19
Sahu M, Gupta R, Ambasta RK, Kumar P. Artificial intelligence and machine learning in precision medicine: A paradigm shift in big data analysis. Prog Mol Biol Transl Sci 2022; 190:57-100. [PMID: 36008002 DOI: 10.1016/bs.pmbts.2022.03.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/18/2022]
Abstract
The integration of artificial intelligence in precision medicine has revolutionized healthcare delivery. Precision medicine identifies the phenotype of particular patients with less-common responses to treatment. Recent studies have demonstrated that translational research exploring the convergence between artificial intelligence and precision medicine will help solve the most difficult challenges facing precision medicine. Here, we discuss different aspects of artificial intelligence in precision medicine that improve healthcare delivery. First, we discuss how artificial intelligence changes the landscape of precision medicine and the evolution of artificial intelligence in precision medicine. Second, we highlight the synergies between artificial intelligence and precision medicine and promises of artificial intelligence and precision medicine in healthcare delivery. Third, we briefly explain the promise of big data analytics and the integration of nanomaterials in precision medicine. Last, we highlight the challenges and opportunities of artificial intelligence in precision medicine.
Affiliation(s)
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
20
Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Brief Bioinform 2022; 23:6604995. [PMID: 35679537 DOI: 10.1093/bib/bbac224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Revised: 04/18/2022] [Accepted: 05/11/2022] [Indexed: 11/12/2022]
Abstract
Identifying miRNA-disease associations is an important task for revealing the pathogenic mechanisms of complex diseases. Different computational methods have been proposed. Although these methods achieve encouraging performance in detecting missing associations between known miRNAs and diseases, accurately predicting associated diseases for new miRNAs remains difficult. To address this, a ranking framework named idenMD-NRF is proposed for miRNA-disease association identification. idenMD-NRF treats miRNA-disease association identification as an information retrieval task. Given a novel query miRNA, idenMD-NRF employs a Learning to Rank algorithm to rank associated diseases based on high-level association features and various predictors. The experimental results on two independent test datasets indicate that idenMD-NRF is superior to the other compared predictors. A user-friendly web server for the idenMD-NRF predictor is freely available at http://bliulab.net/idenMD-NRF/.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
21
Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph. PLoS One 2022; 17:e0271395. [PMID: 35830458 PMCID: PMC9278741 DOI: 10.1371/journal.pone.0271395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/14/2022] [Accepted: 06/29/2022] [Indexed: 12/24/2022]
Abstract
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. With most of these SNPs located on the non-coding part of the genome, it is currently assumed that these SNPs influence the expression of nearby genes on the genome. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as “disease genes”. Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare our performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
22
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
23
Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree. Sci Rep 2022; 12:10004. [PMID: 35705654 PMCID: PMC9200794 DOI: 10.1038/s41598-022-14127-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2022] [Accepted: 06/01/2022] [Indexed: 11/10/2022]
Abstract
Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis, which plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of these PD techniques are developed to identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid in an early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the University of California Santa Cruz (UCSC) genome browser and removed the redundancies. Second, we extracted some significant features of DNA FASTA sequences using the PyFeat method with AdaBoost for feature selection. These selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to the gradient-boosted decision tree (GBDT) to diagnose different tested cases. Seven performance metrics were used to evaluate the performance of the proposed system. The proposed system achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-rank protein and lncRNA genes are verified based on a literature review.
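Five of the seven metrics listed above (accuracy, F1-score, MCC, sensitivity, and specificity) follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The function name and the example counts are made up for illustration and are not taken from the paper. AUC and AUPR are omitted because they need ranked prediction scores, not just counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Unlike the other metrics, MCC ranges from -1 to 1 and is not a percentage, which is why the abstract reports it as 0.575 alongside percentage values.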
|
24
|
Parkinson’s disease diagnosis using neural networks: Survey and comprehensive evaluation. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.102909] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
|
25
|
Zhang J, Zhu M, Qian Y. protein2vec: Predicting Protein-Protein Interactions Based on LSTM. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1257-1266. [PMID: 32750870 DOI: 10.1109/tcbb.2020.3003941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The semantic similarity of gene ontology (GO) terms is widely used to predict protein-protein interactions (PPIs). The traditional semantic similarity measures are based mainly on manually crafted features, which may ignore some important hidden information of the gene ontology. Moreover, those methods usually obtain the similarity between proteins from the similarity between GO terms by simple statistical rules, such as MAX and BMA (best-match average), oversimplifying the possibly complex relationship between the proteins and the GO terms annotated to them. To overcome these two deficiencies, we propose a new method named protein2vec, which characterizes a protein with a vector based on the GO terms annotated to it and combines the information of both the GO and known PPIs. We first apply a network embedding algorithm to the GO network to generate a feature vector for each GO term. Then, a Long Short-Term Memory (LSTM) network encodes the feature vectors of the GO terms annotated to a protein into another vector (called the protein vector). Finally, two protein vectors are fed into a feedforward neural network to predict the interaction between the two corresponding proteins. The experimental results show that protein2vec outperforms almost all commonly used traditional semantic similarity methods.
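The MAX and BMA rules that the abstract describes as baselines can be written in a few lines. This is a minimal sketch assuming the commonly used definitions (MAX takes the best term pair; BMA averages the row-wise and column-wise best matches). The term identifiers `t1`..`t3`, the toy similarity table and the helper `toy_sim` are hypothetical and not taken from the paper.

```python
def max_similarity(terms_a, terms_b, sim):
    """MAX rule: the highest similarity over all GO-term pairs."""
    return max(sim(a, b) for a in terms_a for b in terms_b)


def bma_similarity(terms_a, terms_b, sim):
    """Best-match average: mean of each term's best match, in both directions."""
    best_for_a = [max(sim(a, b) for b in terms_b) for a in terms_a]
    best_for_b = [max(sim(a, b) for a in terms_a) for b in terms_b]
    mean_a = sum(best_for_a) / len(best_for_a)
    mean_b = sum(best_for_b) / len(best_for_b)
    return 0.5 * (mean_a + mean_b)


# Hypothetical term-to-term similarities (symmetric, keyed by sorted pair).
TOY_TABLE = {
    ("t1", "t1"): 1.0, ("t1", "t2"): 0.5, ("t1", "t3"): 0.2,
    ("t2", "t2"): 1.0, ("t2", "t3"): 0.4, ("t3", "t3"): 1.0,
}


def toy_sim(a, b):
    """Look up the toy similarity of two term identifiers."""
    return TOY_TABLE[tuple(sorted((a, b)))]


protein_a = ["t1", "t2"]   # GO terms annotated to a hypothetical protein A
protein_b = ["t2", "t3"]   # GO terms annotated to a hypothetical protein B
max_score = max_similarity(protein_a, protein_b, toy_sim)
bma_score = bma_similarity(protein_a, protein_b, toy_sim)
print(max_score, bma_score)
```

Both rules collapse a whole term-by-term similarity matrix into one number, which is the oversimplification the abstract argues a learned protein vector can avoid.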
|
26
|
Yu T. AIME: Autoencoder-based integrative multi-omics data embedding that allows for confounder adjustments. PLoS Comput Biol 2022; 18:e1009826. [PMID: 35081109 PMCID: PMC8820645 DOI: 10.1371/journal.pcbi.1009826] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/01/2021] [Revised: 02/07/2022] [Accepted: 01/11/2022] [Indexed: 11/29/2022] Open
Abstract
In the integrative analyses of omics data, it is often of interest to extract a data representation from one data type that best reflects its relations with another data type. This task is traditionally fulfilled by linear methods such as canonical correlation analysis (CCA) and partial least squares (PLS). However, information contained in one data type pertaining to the other data type may be complex and in nonlinear form. Deep learning provides a convenient alternative to extract low-dimensional nonlinear data embedding. In addition, the deep learning setup can naturally incorporate the effects of clinical confounding factors into the integrative analysis. Here we report a deep learning setup, named Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract data representation for omics data integrative analysis. The method can adjust for confounder variables, achieve informative data embedding, rank features in terms of their contributions, and find pairs of features from the two data types that are related to each other through the data embedding. In simulation studies, the method was highly effective in the extraction of major contributing features between data types. Using two real microRNA-gene expression datasets, one with confounder variables and one without, we show that AIME excluded the influence of confounders, and extracted biologically plausible novel information. The R package based on Keras and the TensorFlow backend is available at https://github.com/tianwei-yu/AIME.
Integrative analysis, i.e. jointly analyzing two or more data matrices, is becoming more and more common in omics research. One type of integrative analysis measures the association between two groups of variables by finding low-dimensional spaces that maximize certain measures of agreement between the data matrices. Representative methods in this area include Canonical Correlation Analysis (CCA), Partial Least Squares (PLS), Multi-Omics Factor Analysis (MOFA), integrative clustering (iCluster), Similarity Network Fusion (SNF), joint Singular Value Decomposition (jSVD), etc. Here we present a new method: Autoencoder-based Integrative Multi-omics data Embedding (AIME). The method jointly analyzes two data matrices. It finds a data embedding from the input data matrix that best preserves its relation with the output data matrix. It has several characteristics: (1) it is based on a neural network, hence it can detect nonlinear associations between the data matrices; (2) it can adjust for confounding variables such as age, gender, ethnicity, etc., to remove their effects in the low-dimensional space; (3) it estimates pairwise relations between variables in the two data matrices. It is a useful addition to the tools for integrative analysis.
Affiliation(s)
- Tianwei Yu
- School of Data Science, The Chinese University of Hong Kong–Shenzhen, Shenzhen, Guangdong, China
- Shenzhen Research Institute of Big Data, Shenzhen, Guangdong, China
- Warshel Institute for Computational Biology, Shenzhen, Guangdong, China
|
27
|
Yang K, Zheng Y, Lu K, Chang K, Wang N, Shu Z, Yu J, Liu B, Gao Z, Zhou X. PDGNet: Predicting Disease Genes Using a Deep Neural Network With Multi-View Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:575-584. [PMID: 32750864 DOI: 10.1109/tcbb.2020.3002771] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/11/2023]
Abstract
The knowledge of phenotype-genotype associations is crucial for the understanding of disease mechanisms. Numerous studies have focused on developing efficient and accurate computing approaches to predict disease genes. However, owing to the sparseness and complexity of medical data, developing an efficient deep neural network model to identify disease genes remains a huge challenge. Therefore, we develop a novel deep neural network model that fuses the multi-view features of phenotypes and genotypes to identify disease genes (termed PDGNet). Our model integrates the multi-view features of diseases and genes and leverages the feedback information of training samples to optimize the parameters of the deep neural network and obtain the deep vector features of diseases and genes. The evaluation experiments on a large data set indicated that PDGNet obtained higher performance than the state-of-the-art method (precision and recall improved by 9.55 and 9.63 percent). The analysis results for the candidate genes indicated that the predicted genes have strong functional homogeneity and dense interactions with known genes. We validated the top predicted genes of Parkinson's disease based on external curated data and published medical literature, which indicated that the candidate genes have a huge potential to guide the selection of causal genes in the 'wet experiment'. The source codes and the data of PDGNet are available at https://github.com/yangkuoone/PDGNet.
|
28
|
Du J, Lin D, Yuan R, Chen X, Liu X, Yan J. Graph Embedding Based Novel Gene Discovery Associated With Diabetes Mellitus. Front Genet 2021; 12:779186. [PMID: 34899863 PMCID: PMC8657768 DOI: 10.3389/fgene.2021.779186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2021] [Accepted: 10/20/2021] [Indexed: 11/25/2022] Open
Abstract
Diabetes mellitus is a group of complex metabolic disorders that has affected hundreds of millions of patients worldwide. The underlying pathogenesis of various types of diabetes is still unclear, which hinders the development of more efficient therapies. Although many genes have been found to be associated with diabetes mellitus, more novel genes still need to be discovered to obtain a complete picture of the underlying mechanism. With the development of complex molecular networks, network-based disease-gene prediction methods have been widely proposed. However, most existing methods are based on the hypothesis of guilt-by-association and often handcraft node features based on local topological structures. Advances in graph embedding techniques have enabled automatic global feature extraction from molecular networks. Inspired by the successful applications of cutting-edge graph embedding methods on complex diseases, we proposed a computational framework to investigate novel genes associated with diabetes mellitus. There are three main steps in the framework: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best workflow for predicting genes associated with diabetes mellitus. Functional enrichment analysis based on Human Phenotype Ontology (HPO), KEGG, and GO biological process, together with a publication search, further evaluated the predicted novel genes.
Affiliation(s)
- Jing Yan
- Zhejiang Hospital, Hangzhou, China
- Zhejiang Provincial Key Lab of Geriatrics, Zhejiang Hospital, Hangzhou, China
|
29
|
Zhang H, Ferguson A, Robertson G, Jiang M, Zhang T, Sudlow C, Smith K, Rannikmae K, Wu H. Benchmarking network-based gene prioritization methods for cerebral small vessel disease. Brief Bioinform 2021; 22:bbab006. [PMID: 33634312 PMCID: PMC8425308 DOI: 10.1093/bib/bbab006] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/19/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 12/25/2022] Open
Abstract
Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms resulted in seven representative algorithms to be benchmarked. Performance of algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed best LOOCV performance, with median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had most GWAS-confirmable genes in top 200 predictions, while RWRH had best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has overall better performance for application in cSVD despite its susceptibility to bias caused by degree centrality. Choice of algorithms should be determined before applying to specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized for other diseases.
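The core idea behind random walk with restart (RWR) can be sketched on a toy network. The example below is plain RWR on a single homogeneous, undirected network, not the heterogeneous RWRH variant benchmarked in the paper; the gene names, the restart probability of 0.3 and the function name are hypothetical choices for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.

    adj maps each node to a list of neighbours (assumed undirected, with no
    isolated nodes); W spreads a node's probability evenly over its neighbours.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}       # restart at the seeds
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share                          # walk step u -> v
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical path-shaped network G1 - G2 - G3 - G4 with one known disease gene.
toy_network = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
known_genes = {"G1"}
scores = random_walk_with_restart(toy_network, known_genes)
# Rank candidates (non-seed genes) by their stationary visiting probability.
ranked = sorted((g for g in scores if g not in known_genes),
                key=lambda g: scores[g], reverse=True)
print(ranked)
```

Candidates closer to the seed gene receive higher scores, which also illustrates the abstract's caveat: a purely network-based ranking can only favour genes connected to those already known.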
Affiliation(s)
- Huayu Zhang
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Amy Ferguson
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Grant Robertson
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Muchen Jiang
- Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Teng Zhang
- Department of Orthopaedics and Traumatology, the University of Hong Kong, Hong Kong, China
- Cathie Sudlow
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Health Data Research UK, London, United Kingdom
- Keith Smith
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Health Data Research UK, London, United Kingdom
- Kristiina Rannikmae
- Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Health Data Research UK, London, United Kingdom
- Honghan Wu
- Health Data Research UK, London, United Kingdom
- Institute of Health Informatics, University College London, London, United Kingdom
|
30
|
Random walks on B distributed resting-state functional connectivity to identify Alzheimer's disease and Mild Cognitive Impairment. Clin Neurophysiol 2021; 132:2540-2550. [PMID: 34455312 DOI: 10.1016/j.clinph.2021.06.036] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Revised: 05/29/2021] [Accepted: 06/29/2021] [Indexed: 11/20/2022]
Abstract
OBJECTIVE Resting-state functional connectivity offers a promising route to the early detection of dementia. This study proposes a novel method to accurately classify Healthy Controls, Early Mild Cognitive Impairment, Late Mild Cognitive Impairment, and Alzheimer's Disease individuals. METHODS A novel mapping function based on the B distribution has been developed to map correlation matrices to robust functional connectivity. The node2vec algorithm is applied to the functional connectivity to produce node embeddings. The concatenation of these embeddings is used to derive the patients' feature vectors, which are then fed into Support Vector Machine and Logistic Regression classifiers. RESULTS The experiments show promising performance on the complex four-class classification problem, with an accuracy of 97.73% and a quadratic kappa score of 96.86% for the Support Vector Machine. These values are 97.32% and 96.74% for Logistic Regression. CONCLUSION This study presents an accurate automated method for dementia classification. The Default Mode Network and Dorsal Attention Network were found to play a significant role in the classification method. SIGNIFICANCE A new mapping function is proposed in this study; it improves accuracy by 10-11% on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
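The quadratic kappa score reported above penalises a misclassification more heavily the further the predicted class is from the true class on an ordinal scale. The sketch below uses the standard quadratic weighted kappa definition; the integer encoding of the four groups (0 = Healthy Control through 3 = Alzheimer's Disease) and the toy labels are assumptions for illustration, not data from the study.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer labels in range(n_classes)."""
    n = len(y_true)
    # Observed confusion matrix: rows are true labels, columns predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if true and predicted labels were independent.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    if weighted_expected == 0:
        return 1.0  # all labels fall in one class; no disagreement possible
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical labels: 0 = HC, 1 = EMCI, 2 = LMCI, 3 = AD.
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_classes=2)
print(perfect, chance)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the score complements raw accuracy in a four-class ordinal problem.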
|
31
|
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 407] [Impact Index Per Article: 101.8] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
|
32
|
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 156] [Impact Index Per Article: 39.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming-Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
33
|
Abstract
Parkinson’s disease (PD) gene identification plays an important role in improving the diagnosis and treatment of the disease. A number of machine learning methods have been proposed to identify disease-related genes, but only a few of these methods have been adopted for PD. This work puts forth a novel neural network-based ensemble (n-semble) method to identify Parkinson’s disease genes. The artificial neural network is trained in a unique way to ensemble the multiple model predictions. The proposed n-semble method is composed of four parts: (1) protein sequences are used to construct feature vectors using physicochemical properties of amino acids; (2) dimensionality reduction is achieved using the t-Distributed Stochastic Neighbor Embedding (t-SNE) method; (3) the Jaccard method is applied to find likely negative samples from unknown (candidate) genes; and (4) gene prediction is performed with the n-semble method. The proposed n-semble method has been compared with Smalter’s, ProDiGe, PUDI and EPU methods using various evaluation metrics. The proposed n-semble method outperforms these existing gene identification methods and achieves significantly higher precision, recall and F-score of 88.9%, 90.9% and 89.8%, respectively. The obtained results confirm the effectiveness and validity of the proposed framework.
|
34
|
Liu C, Han Z, Zhang ZK, Nussinov R, Cheng F. A network-based deep learning methodology for stratification of tumor mutations. Bioinformatics 2021; 37:82-88. [PMID: 33416857 PMCID: PMC8034530 DOI: 10.1093/bioinformatics/btaa1099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/30/2020] [Revised: 11/23/2020] [Accepted: 12/28/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Tumor stratification has a wide range of biomedical and clinical applications, including diagnosis, prognosis and personalized treatment. However, cancer is always driven by the combination of mutated genes, which are highly heterogeneous across patients. Accurately subdividing the tumors into subtypes is challenging. RESULTS We developed a network-embedding based stratification (NES) methodology to identify clinically relevant patient subtypes from large-scale patients' somatic mutation profiles. The central hypothesis of NES is that two tumors would be classified into the same subtypes if their somatic mutated genes located in the similar network regions of the human interactome. We encoded the genes on the human protein-protein interactome with a network embedding approach and constructed the patients' vectors by integrating the somatic mutation profiles of 7344 tumor exomes across 15 cancer types. We firstly adopted the lightGBM classification algorithm to train the patients' vectors. The AUC value is around 0.89 in the prediction of the patient's cancer type and around 0.78 in the prediction of the tumor stage within a specific cancer type. The high classification accuracy suggests that network embedding-based patients' features are reliable for dividing the patients. We conclude that we can cluster patients with a specific cancer type into several subtypes by using an unsupervised clustering algorithm to learn the patients' vectors. Among the 15 cancer types, the new patient clusters (subtypes) identified by the NES are significantly correlated with patient survival across 12 cancer types. In summary, this study offers a powerful network-based deep learning methodology for personalized cancer medicine. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/ChengF-Lab/NES. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chuang Liu
- Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Zhen Han
- Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Zi-Ke Zhang
- Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China
- College of Media and International Culture, Zhejiang University, Hangzhou 310028, China
- Ruth Nussinov
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
|
35
|
Gunduz H. An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson's disease classification. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102452] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/12/2022]
|
36
|
Peng J, Lu G, Shang X. A Survey of Network Representation Learning Methods for Link Prediction in Biological Network. Curr Pharm Des 2021; 26:3076-3084. [PMID: 31951161 DOI: 10.2174/1381612826666200116145057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Accepted: 01/09/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Networks are powerful resources for describing complex systems. Link prediction is an important issue in network analysis and has important practical application value. Network representation learning has proven to be useful for network analysis, especially for link prediction tasks. OBJECTIVE To review the application of network representation learning on link prediction in a biological network, we summarize recent methods for link prediction in a biological network and discuss the application and significance of network representation learning in link prediction task. METHOD & RESULTS We first introduce the widely used link prediction algorithms, then briefly introduce the development of network representation learning methods, focusing on a few widely used methods, and their application in biological network link prediction. Existing studies demonstrate that using network representation learning to predict links in biological networks can achieve better performance. In the end, some possible future directions have been discussed.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Guilin Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
|
37
|
Shi W, Chen X, Deng L. A Review of Recent Developments and Progress in Computational Drug Repositioning. Curr Pharm Des 2021; 26:3059-3068. [PMID: 31951162 DOI: 10.2174/1381612826666200116145559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/04/2019] [Accepted: 01/09/2020] [Indexed: 12/27/2022]
Abstract
Computational drug repositioning is an efficient approach towards discovering new indications for existing drugs. In recent years, with the accumulation of online health-related information and the extensive use of biomedical databases, computational drug repositioning approaches have achieved significant progress in drug discovery. In this review, we summarize recent advancements in drug repositioning. Firstly, we explicitly demonstrated the available data source information which is conducive to identifying novel indications. Furthermore, we provide a summary of the commonly used computing approaches. For each method, we briefly described techniques, case studies, and evaluation criteria. Finally, we discuss the limitations of the existing computing approaches.
Affiliation(s)
- Wanwan Shi
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xuegong Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China
|
38
|
Xicoy H, Vila M, Laguna A. Systems Medicine in Parkinson's Disease: Joining Efforts to Change History. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11612-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
39
|
Liu Y, Guo Y, Liu X, Wang C, Guo M. Pathogenic gene prediction based on network embedding. Brief Bioinform 2020; 22:6053103. [PMID: 33367541 DOI: 10.1093/bib/bbaa353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/10/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022] Open
Abstract
In disease research, the study of gene-disease correlation has always been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between the entities, which may be from different sets, to build a biological heterogeneous network and propose a new network embedded representation algorithm to calculate the correlation between disease and genes, using the correlation score to predict pathogenic genes. Then, we conduct several experiments to compare our method to other state-of-the-art methods. The results reveal that our method achieves better performance than the traditional methods.
Affiliation(s)
- Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuchen Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
40
|
Ata SK, Wu M, Fang Y, Ou-Yang L, Kwoh CK, Li XL. Recent advances in network-based methods for disease gene prediction. Brief Bioinform 2020; 22:6023077. [PMID: 33276376 DOI: 10.1093/bib/bbaa303] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/20/2020] [Revised: 09/29/2020] [Accepted: 10/10/2020] [Indexed: 01/28/2023] Open
Abstract
Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.
Affiliation(s)
- Sezin Kircali Ata
- School of Computer Science and Engineering, Nanyang Technological University (NTU)
- Min Wu
- Institute for Infocomm Research (I2R), A*STAR, Singapore
- Yuan Fang
- School of Information Systems, Singapore Management University, Singapore
- Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Li Li
- Department head and principal scientist at I2R, A*STAR, Singapore
|
41
|
Zhang T, Wang R, Jiang Q, Wang Y. An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191120141032] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background: Enhancers are cis-regulatory elements that enhance gene expression on DNA sequences. Since most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers contain a variety of features, which can help in enhancer recognition.
Objective: The classification power of features differs significantly, and the performance of existing methods that use one or a few features for identifying enhancers varies greatly. Therefore, evaluating the classification power of each feature can improve the predictive performance for enhancers.
Methods: We present an evaluation method based on Information Gain (IG) that captures the entropy change of enhancer recognition according to features. To validate the performance of our method, experiments using the Single Feature Prediction Accuracy (SFPA) were conducted on each feature.
Results: The average IG values of the sequence feature, transcriptional feature and epigenetic feature are 0.068, 0.213, and 0.299, respectively. Through SFPA, the average AUC values of the sequence feature, transcriptional feature and epigenetic feature are 0.534, 0.605, and 0.647, respectively. The verification results are consistent with our evaluation results.
Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
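The entropy change that Information Gain measures can be illustrated with a short sketch. It assumes a discretized feature and binary enhancer labels; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels inside each feature-value group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 0, 0]
separating_feature = ["high", "high", "low", "low"]   # splits the classes cleanly
uninformative_feature = ["high", "low", "high", "low"]  # tells us nothing

ig_separating = information_gain(separating_feature, labels)
ig_uninformative = information_gain(uninformative_feature, labels)
```

A feature that splits the classes cleanly removes all label uncertainty, while one that is independent of the labels removes none, which is the ranking principle the abstract describes.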
Affiliation(s)
- Tianjiao Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Rongjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
42
|
Meng L, Masuda N. Analysis of node2vec random walks on networks. Proc Math Phys Eng Sci 2020; 476:20200447. [PMID: 33362414 PMCID: PMC7735314 DOI: 10.1098/rspa.2020.0447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/08/2020] [Accepted: 10/23/2020] [Indexed: 01/25/2023] Open
Abstract
Random walks have been proven to be useful for constructing various algorithms to gain information on networks. Algorithm node2vec employs biased random walks to realize embeddings of nodes into low-dimensional spaces, which can then be used for tasks such as multi-label classification and link prediction. The performance of the node2vec algorithm in these applications is considered to depend on properties of random walks that the algorithm uses. In the present study, we theoretically and numerically analyse random walks used by the node2vec. Those random walks are second-order Markov chains. We exploit the mapping of its transition rule to a transition probability matrix among directed edges to analyse the stationary probability, relaxation times in terms of the spectral gap of the transition probability matrix, and coalescence time. In particular, we show that node2vec random walk accelerates diffusion when walkers are designed to avoid both backtracking and visiting a neighbour of the previously visited node but do not avoid them completely.
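The second-order nature of these walks can be sketched in a few lines. The parameters `p` (return) and `q` (in-out) follow the standard node2vec formulation rather than anything stated in this abstract, and the graph, function names, and parameter values are hypothetical.

```python
import random


def node2vec_step_probs(graph, prev, curr, p=1.0, q=1.0):
    """Transition probabilities out of `curr`, given the walk came from `prev`.

    Unnormalized weights: 1/p to backtrack to `prev`, 1 to move to a
    neighbour of `prev`, and 1/q to move to any other neighbour of `curr`.
    """
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p
        elif nxt in graph[prev]:
            weights[nxt] = 1.0
        else:
            weights[nxt] = 1.0 / q
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def node2vec_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Sample a walk of `length` nodes (length >= 2); the first step is uniform."""
    rng = random.Random(seed)
    walk = [start, rng.choice(sorted(graph[start]))]
    while len(walk) < length:
        probs = node2vec_step_probs(graph, walk[-2], walk[-1], p, q)
        nodes = sorted(probs)
        walk.append(rng.choices(nodes, weights=[probs[n] for n in nodes])[0])
    return walk


# Small undirected toy graph stored as adjacency sets.
graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}

# Large p discourages backtracking; small q favours nodes far from `prev`.
probs = node2vec_step_probs(graph, prev="a", curr="b", p=4.0, q=0.5)
walk = node2vec_walk(graph, start="a", length=6, p=4.0, q=0.5)
```

Because the next-step distribution depends on both the current and the previous node, the walk is a first-order Markov chain only on directed edges, which is the mapping the abstract refers to.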
Affiliation(s)
- Lingqi Meng
- Department of Mathematics, University at Buffalo, State University of New York, Buffalo, NY 14260-2900, USA
- Naoki Masuda
- Department of Mathematics, University at Buffalo, State University of New York, Buffalo, NY 14260-2900, USA
- Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, NY 14260-5030, USA
|
43
|
Li J, Chen X, Huang Q, Wang Y, Xie Y, Dai Z, Zou X, Li Z. Seq-SymRF: a random forest model predicts potential miRNA-disease associations based on information of sequences and clinical symptoms. Sci Rep 2020; 10:17901. [PMID: 33087810 PMCID: PMC7578641 DOI: 10.1038/s41598-020-75005-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/30/2020] [Accepted: 10/09/2020] [Indexed: 12/24/2022] Open
Abstract
Increasing evidence indicates that miRNAs play a vital role in biological processes and are closely related to various human diseases. Research on miRNA-disease associations is helpful not only for disease prevention, diagnosis and treatment, but also for new drug identification and lead compound discovery. A novel sequence- and symptom-based random forest algorithm model (Seq-SymRF) was developed to identify potential associations between miRNA and disease. Features derived from sequence information and clinical symptoms were utilized to characterize miRNA and disease, respectively. Moreover, the clustering method by calculating the Euclidean distance was adopted to construct reliable negative samples. Based on the fivefold cross-validation, Seq-SymRF achieved the accuracy of 98.00%, specificity of 99.43%, sensitivity of 96.58%, precision of 99.40% and Matthews correlation coefficient of 0.9604, respectively. The areas under the receiver operating characteristic curve and precision recall curve were 0.9967 and 0.9975, respectively. Additionally, case studies were implemented with leukemia, breast neoplasms and hsa-mir-21. Most of the top-25 predicted disease-related miRNAs (19/25 for leukemia; 20/25 for breast neoplasms) and 15 of top-25 predicted miRNA-related diseases were verified by literature and dbDEMC database. It is anticipated that Seq-SymRF could be regarded as a powerful high-throughput virtual screening tool for drug research and development. All source codes can be downloaded from https://github.com/LeeKamlong/Seq-SymRF.
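For readers less familiar with the metrics reported above, the following sketch shows how accuracy, sensitivity, specificity, precision and the Matthews correlation coefficient are derived from a binary confusion matrix. The counts are hypothetical and do not reproduce the Seq-SymRF results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "precision": tp / (tp + fp),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts for one cross-validation fold.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when negative samples are constructed rather than observed.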
Affiliation(s)
- Jinlong Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Xingyu Chen
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Qixing Huang
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Yang Wang
- School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Yun Xie
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Zong Dai
- School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Xiaoyong Zou
- School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou, 510006, People's Republic of China
|
44
|
Peng J, Li J, Shang X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinformatics 2020; 21:394. [PMID: 32938374 PMCID: PMC7495825 DOI: 10.1186/s12859-020-03677-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Drug-target interaction prediction is of great significance for narrowing down the scope of candidate medications, and thus is a vital step in drug discovery. Because of the particularity of biochemical experiments, the development of new drugs is not only costly, but also time-consuming. Therefore, the computational prediction of drug-target interactions has become an essential way in the process of drug discovery, aiming to greatly reduce the experimental cost and time. RESULTS We propose a learning-based method based on feature representation learning and deep neural network named DTI-CNN to predict the drug-target interactions. We first extract the relevant features of drugs and proteins from heterogeneous networks by using the Jaccard similarity coefficient and restart random walk model. Then, we adopt a denoising autoencoder model to reduce the dimension and identify the essential features. Third, based on the features obtained from the last step, we constructed a convolutional neural network model to predict the interaction between drugs and proteins. The evaluation results show that the average AUROC score and AUPR score of DTI-CNN were 0.9416 and 0.9499, which obtains better performance than the other three existing state-of-the-art methods. CONCLUSIONS All the experimental results show that the performance of DTI-CNN is better than that of the three existing methods and the proposed method is appropriately designed.
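The two feature-extraction ingredients named in the abstract, the Jaccard similarity coefficient and a random walk with restart, can be sketched as follows. This is a generic pure-Python illustration under assumed inputs (hypothetical target sets, a tiny adjacency matrix with no isolated nodes, and a restart probability of 0.5); it is not the DTI-CNN implementation.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e_seed with W the column-normalized adjacency.

    Assumes every node has at least one edge, so each column sum is non-zero.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(adj[i][j] / col_sums[j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        if max(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical drugs described by the sets of targets they bind.
similarity = jaccard({"T1", "T2", "T3"}, {"T2", "T3", "T4"})

# Path graph 0 - 1 - 2; the walk restarts at node 0.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
steady_state = random_walk_with_restart(adjacency, seed=0)
```

The steady-state vector scores every node by its proximity to the seed, so it can serve as a per-node feature row that a downstream model, such as the autoencoder mentioned above, then compresses.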
Affiliation(s)
- Jiajie Peng
- The School of Computer Science, Northwestern Polytechnical University, Xian, 710072, China
- The Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, the Ministry of Industry and Information Technology, Xian, 710072, China
- Jingyi Li
- The School of Computer Science, Northwestern Polytechnical University, Xian, 710072, China
- The Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, the Ministry of Industry and Information Technology, Xian, 710072, China
- Xuequn Shang
- The School of Computer Science, Northwestern Polytechnical University, Xian, 710072, China
- The Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, the Ministry of Industry and Information Technology, Xian, 710072, China
|
45
|
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a positively associated phenotype of type 2 diabetes mellitus (T2DM), but the causal relationship of which is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS To classify the study, a two-sample MR, using genetic instrumental variables (IVs) to explore the causal effect was applied to test the influence of IL on the risk of T2DM. In this study, MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted method for the assessment of the risk the shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of IVs. RESULTS The pooled odds ratio from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), low intercept was -0.477, P = 0.252, and small fluctuation of ORs ranged from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03) in leave-one-out validation. CONCLUSION We validated that the shorter IL causes no additional risk to T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of IVs.
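The inverse-variance weighted pooling and the leave-one-out fluctuation quoted in the results can be reproduced arithmetically. The sketch below assumes a fixed-effect IVW estimate over per-SNP log odds ratios; the per-SNP inputs are hypothetical, and only the values 1.03, 0.966 and 1.081 come from the abstract.

```python
import math


def ivw_pooled_or(log_ors, std_errs):
    """Fixed-effect inverse-variance weighted pooled odds ratio with a 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errs]
    beta = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci


def relative_change(leave_one_out_or, pooled_or):
    """Relative fluctuation of a leave-one-out OR around the pooled OR."""
    return (leave_one_out_or - pooled_or) / pooled_or


# Hypothetical per-SNP estimates with equal standard errors.
pooled, interval = ivw_pooled_or([0.0, math.log(1.21)], [0.1, 0.1])

# Fluctuation bounds reported in the abstract.
lower = relative_change(0.966, 1.03)
upper = relative_change(1.081, 1.03)
```

With equal standard errors the pooled log odds ratio is the plain mean of the per-SNP values, and the two `relative_change` calls recover the roughly -0.062 and 0.05 bounds stated in the results.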
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
- HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
- Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
- HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada
- Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China
- Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
|
46
|
Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020; 26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]
Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. There are two main factors in the frequent occurrence of drug safety problems; on the one hand, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions, while on the other hand, due to the long-term period and complexity of clinical trials, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and common databases in this field. We classified methods of identifying side effects into four categories: biological experimental, machine learning, text mining and network methods. We point out the key points of each kind of method. In addition, we also explain the advantages and disadvantages of each method. Finally, we propose future research directions.
Affiliation(s)
- Shuai Deng
- College of Science, Beijing Forestry University, Beijing, China
- Yige Sun
- Microbiology Department, Harbin Medical University, Harbin, 150081, China
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
47
|
Liu H, Guan J, Li H, Bao Z, Wang Q, Luo X, Xue H. Predicting the Disease Genes of Multiple Sclerosis Based on Network Representation Learning. Front Genet 2020; 11:328. [PMID: 32373160 PMCID: PMC7186413 DOI: 10.3389/fgene.2020.00328] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/23/2020] [Accepted: 03/19/2020] [Indexed: 02/02/2023] Open
Abstract
Multiple sclerosis (MS) is an autoimmune disease for which it is difficult to find exact disease-related genes. Effectively identifying disease-related genes would contribute to improving the treatment and diagnosis of multiple sclerosis. Current methods for identifying disease-related genes mainly focus on the hypothesis of guilt-by-association and pay little attention to the global topological information of the whole protein-protein-interaction (PPI) network. Besides, network representation learning (NRL) has attracted a huge amount of attention in the area of network analysis because of its promising performance in node representation and many downstream tasks. In this paper, we try to introduce NRL into the task of disease-related gene prediction and propose a novel framework for identifying the disease-related genes of multiple sclerosis. The proposed framework contains three main steps: capturing the topological structure of the PPI network using NRL-based methods, encoding learned features into low-dimensional space using a stacked autoencoder, and training a support vector machine (SVM) classifier to predict disease-related genes. Compared with three state-of-the-art algorithms, our proposed framework shows superior performance on the task of predicting disease-related genes of multiple sclerosis.
Affiliation(s)
- Haijie Liu
- Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Department of Physical Medicine and Rehabilitation, Tianjin Medical University General Hospital, Tianjin, China
- Stroke Biological Recovery Laboratory, Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, The Teaching Affiliate of Harvard Medical School Charlestown, Boston, MA, United States
- Jiaojiao Guan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- He Li
- Department of Automation, College of Information Science and Engineering, Tianjin Tianshi College, Tianjin, China
- Zhijie Bao
- School of Textile Science and Engineering, Tiangong University, Tianjin, China
- Qingmei Wang
- Stroke Biological Recovery Laboratory, Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, The Teaching Affiliate of Harvard Medical School Charlestown, Boston, MA, United States
- Xun Luo
- Kerry Rehabilitation Medicine Research Institute, Shenzhen, China
- Shenzhen Dapeng New District Nan'ao People's Hospital, Shenzhen, China
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
|
48
|
Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. PHYSICS REPORTS 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 01/03/2025]
|
49
|
Zhang D, Huo D, Xie H, Wu L, Zhang J, Liu L, Jin Q, Chen X. CHG: A Systematically Integrated Database of Cancer Hallmark Genes. Front Genet 2020; 11:29. [PMID: 32117445 PMCID: PMC7013921 DOI: 10.3389/fgene.2020.00029] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/01/2019] [Accepted: 01/09/2020] [Indexed: 12/20/2022] Open
Abstract
Background The analysis of cancer diversity based on a logical framework of hallmarks has greatly improved our understanding of the occurrence, development and metastasis of various cancers. Methods We designed Cancer Hallmark Genes (CHG) database which focuses on integrating hallmark genes in a systematic, standard way and annotates the potential roles of the hallmark genes in cancer processes. Following the conceptual criteria description of hallmark function the keywords for each hallmark were manually selected from the literature. Candidate hallmark genes collected were derived from 301 pathways of KEGG database by Lucene and manually corrected. Results Based on the variation data, we finally identified the hallmark genes of various types of cancer and constructed CHG. And we also analyzed the relationships among hallmarks and potential characteristics and relationships of hallmark genes based on the topological structures of their networks. We manually confirm the hallmark gene identified by CHG based on literature and database. We also predicted the prognosis of breast cancer, glioblastoma multiforme and kidney papillary cell carcinoma patients based on CHG data. Conclusions In summary, CHG, which was constructed based on a hallmark feature set, provides a new perspective for analyzing the diversity and development of cancers.
Affiliation(s)
- Denan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Diwei Huo
- The 2nd Affiliated Hospital of Harbin Medical University, Harbin, China
- Hongbo Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lingxiang Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Juan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lei Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qing Jin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiujie Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
50
|
Wang T, Peng Q, Liu B, Liu X, Liu Y, Peng J, Wang Y. eQTLMAPT: Fast and Accurate eQTL Mediation Analysis With Efficient Permutation Testing Approaches. Front Genet 2020; 10:1309. [PMID: 31998368 PMCID: PMC6970436 DOI: 10.3389/fgene.2019.01309] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/08/2019] [Accepted: 11/27/2019] [Indexed: 12/21/2022] Open
Abstract
Expression quantitative trait locus (eQTL) analyses are critical in understanding the complex functional regulatory natures of genetic variation and have been widely used in the interpretation of disease-associated variants identified by genome-wide association studies (GWAS). Emerging evidence has shown that trans-eQTL effects on remote gene expression could be mediated by local transcripts, which is known as the mediation effects. To discover the genome-wide eQTL mediation effects combining genomic and transcriptomic profiles, it is necessary to develop novel computational methods to rapidly scan a large number of candidate associations while controlling for multiple testing appropriately. Here, we present eQTLMAPT, an R package aiming to perform eQTL mediation analysis with implementation of efficient permutation procedures in multiple testing correction. eQTLMAPT is advantageous in three ways. First, it accelerates mediation analysis by effectively pruning the permutation process through an adaptive permutation scheme. Second, it can efficiently and accurately estimate the significance level of mediation effects by modeling the null distribution with a generalized Pareto distribution (GPD) trained from a few permutation statistics. Third, eQTLMAPT provides flexible interfaces for users to combine various permutation schemes with different confounding adjustment methods. Experiments on a real eQTL dataset demonstrate that eQTLMAPT provides higher resolution of estimated significance of mediation effects and is an order of magnitude faster than compared methods with similar accuracy.
Affiliation(s)
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qidi Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoli Liu
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|