Reference Citation Analysis

For: Opap K, Mulder N. Recent advances in predicting gene-disease associations. F1000Res 2017;6:578. [PMID: 28529714 PMCID: PMC5414807 DOI: 10.12688/f1000research.10788.1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Accepted: 04/24/2017] [Indexed: 12/14/2022] Open

Cited by Other Article(s)

Kumar P, Metzger VT, Purushotham ST, Kedia P, Bologa CG, Lambert CG, Yang JJ. KG2ML: Integrating Knowledge Graphs and Positive Unlabeled Learning for Identifying Disease-Associated Genes. medRxiv 2025:2025.03.17.25323906. [PMID: 40166563 PMCID: PMC11957101 DOI: 10.1101/2025.03.17.25323906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]

Abstract

Background

Biomedical knowledge graphs (KGs), such as the Data Distillery Knowledge Graph (DDKG), capture known relationships among entities (e.g., genes, diseases, proteins), providing valuable insights for research. However, these relationships are typically derived from prior studies, leaving potential unknown associations unexplored. Identifying such unknown associations, including previously unknown disease-associated genes, remains a critical challenge in bioinformatics and is crucial for advancing biomedical knowledge. Traditional methods, such as linkage analysis and genome-wide association studies (GWAS), can be time-consuming and resource-intensive. This highlights the need for efficient computational approaches to identify or predict new genes using known disease-gene associations. Recently, network-based methods and KGs, enhanced by advances in machine learning (ML) frameworks, have emerged as promising tools for inferring these unexplored associations. Given the technical limitations of the Neo4j Graph Data Science (GDS) machine learning pipeline, we developed a novel machine learning pipeline called KG2ML (Knowledge Graph to Machine Learning). This pipeline utilizes our Positive and Unlabeled (PU) learning algorithm, PULSNAR (Positive Unlabeled Learning Selected Not At Random), and incorporates path-based feature extraction from ProteinGraphML.

Results

KG2ML was applied to 12 diseases, including Bipolar Disorder, Coronary Artery Disease, and Parkinson's Disease, to infer disease-associated genes not explicitly recorded in DDKG. For several of these diseases, 14 out of the 15 top-ranked genes lacked prior explicit associations in the DDKG but were supported by literature and TINX (Target Importance and Novelty Explorer) evidence. Incorporating PULSNAR-imputed genes as positives enhanced XGBoost classification, demonstrating the potential of PU learning in identifying hidden gene-disease relationships.

Conclusion

The observed improvement in classification performance after the inclusion of PULSNAR-imputed genes as positive examples, along with the subject matter experts' (SME) evaluations of the top 15 imputed genes for 12 diseases, suggests that PU learning can effectively uncover disease-gene associations missing from existing knowledge graphs (KGs). By integrating KG data with ML-based inference, our KG2ML pipeline provides a scalable and interpretable framework to advance biomedical research while addressing the inherent limitations of current KGs.

Gualdi F, Oliva B, Piñero J. Genopyc: a Python library for investigating the functional effects of genomic variants associated to complex diseases. Bioinformatics 2024;40:btae379. [PMID: 38889282 PMCID: PMC11211212 DOI: 10.1093/bioinformatics/btae379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 05/21/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024] Open

Abstract

MOTIVATION

Integrative Biomedical Informatics, Research Program on Biomedical Informatics (IBI - GRIB), Hospital Del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), C/ del Dr. Aiguader 88, Barcelona 08003, Spain.

Understanding the genetic basis of complex diseases is one of the main challenges in modern genomics. However, current tools often lack the versatility to efficiently analyze the intricate relationships between genetic variations and disease outcomes. To address this, we introduce Genopyc, a novel Python library designed for comprehensive investigation of how the variants associated with complex diseases affect downstream pathways. Genopyc offers an extensive suite of functions for heterogeneous data mining and visualization, enabling researchers to delve into and integrate biological information from large-scale genomic datasets.

RESULTS

In this work, we present the Genopyc library through its application to variants from real-world genome-wide association studies. Using Genopyc to investigate the functional consequences of variants associated with intervertebral disc degeneration enabled a deeper understanding of the potentially dysregulated pathways involved in the disease, which can be explored and visualized using the functionalities featured in the package. Genopyc emerges as a powerful asset for researchers, facilitating the investigation of complex diseases and paving the way for more targeted therapeutic interventions.

AVAILABILITY AND IMPLEMENTATION

Genopyc is available on pip: https://pypi.org/project/genopyc/. The source code of Genopyc is available at https://github.com/freh-g/genopyc. A tutorial notebook is available at https://github.com/freh-g/genopyc/blob/main/tutorials/Genopyc_tutorial_notebook.ipynb. Finally, detailed documentation is available at https://genopyc.readthedocs.io/en/latest/.

Gualdi F, Oliva B, Piñero J. Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genom Bioinform 2024;6:lqae049. [PMID: 38745993 PMCID: PMC11091931 DOI: 10.1093/nargab/lqae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/08/2024] [Accepted: 04/24/2024] [Indexed: 05/16/2024] Open

Dwivedi K, Rajpal A, Rajpal S, Kumar V, Agarwal M, Kumar N. Enlightening the path to NSCLC biomarkers: Utilizing the power of XAI-guided deep learning. Comput Methods Programs Biomed 2024;243:107864. [PMID: 37866126 DOI: 10.1016/j.cmpb.2023.107864] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/29/2023] [Revised: 10/07/2023] [Accepted: 10/11/2023] [Indexed: 10/24/2023]

Abstract

BACKGROUND AND OBJECTIVE

The early diagnosis of non-small cell lung cancer (NSCLC) is of prime importance for improving patient survival and quality of life. Because NSCLC is heterogeneous at the molecular and cellular level, the biomarkers responsible for that heterogeneity help to distinguish its prominent subtypes, adenocarcinoma and squamous cell carcinoma. Moreover, if identified, these biomarkers could pave the path to targeted therapy. Through this work, a novel explainable AI (XAI)-guided deep learning framework is proposed that assists in discovering a set of significant NSCLC-relevant biomarkers using methylation data.

METHODS

The proposed framework is divided into two blocks. The first block combines an autoencoder and a neural network to classify NSCLC instances. The second block utilizes various eXplainable AI (XAI) methods, namely IntegratedGradients, GradientSHAP, and DeepLIFT, to discover a set of seven significant biomarkers.

RESULTS

The classification performance of the biomarkers discovered using the proposed framework is evaluated by employing multiple machine learning algorithms, among which the Multilayer Perceptron (MLP) algorithm-based model outperforms others, yielding a 10-fold cross-validation accuracy of 91.53%. An improved accuracy of 96.37% is achieved by integrating RNA-Seq, CNV, and methylation data. On performing statistical analysis using the Friedman and Nemenyi tests, the MLP model is found to be significantly better than other machine learning-based models. Further, the clinical efficacy of the resultant biomarkers is established based on their potential druggability, the likelihood of predicting NSCLC patients' survival, gene-disease association, and biological pathways targeted by them. While the biomarkers C18orf18, CCNT2, THOP1, and TNPO2 are found to be potentially druggable, the biomarkers CCDC15, SNORA9, THOP1, and TNPO2 are found to be prognostically relevant. On further analysis, some of the discovered biomarkers are found to be associated with around 104 diseases. Moreover, five KEGG, ten Reactome, and three Wiki pathways are found to be triggered by the biomarkers discovered.

CONCLUSION

In summary, the proposed framework uncovers a set of clinically effective biomarkers that accurately classify NSCLC. As a future course of work, efforts would be made to combine a variety of omics data with histopathological data to unveil more precise biomarkers for devising personalized therapy.

Nunes S, Sousa R, Pesquita C. Multi-domain knowledge graph embeddings for gene-disease association prediction. J Biomed Semantics 2023;14:11. [PMID: 37580835 PMCID: PMC10426189 DOI: 10.1186/s13326-023-00291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023] Open

Cinaglia P, Cannataro M. Identifying Candidate Gene-Disease Associations via Graph Neural Networks. Entropy (Basel) 2023;25:909. [PMID: 37372253 DOI: 10.3390/e25060909] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/03/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023]

Wang Z, Gu Y, Zheng S, Yang L, Li J. MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction. Comput Biol Med 2023;155:106642. [PMID: 36805231 DOI: 10.1016/j.compbiomed.2023.106642] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/24/2022] [Revised: 01/15/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]

Stolfi P, Mastropietro A, Pasculli G, Tieri P, Vergni D. NIAPU: network-informed adaptive positive-unlabeled learning for disease gene identification. Bioinformatics 2023;39:7023926. [PMID: 36727493 PMCID: PMC9933847 DOI: 10.1093/bioinformatics/btac848] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/01/2022] [Revised: 12/23/2022] [Indexed: 02/03/2023] Open

Systematic approach to identify therapeutic targets and functional pathways for the cervical cancer. J Genet Eng Biotechnol 2023;21:10. [PMID: 36723760 PMCID: PMC9892376 DOI: 10.1186/s43141-023-00469-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2022] [Accepted: 01/14/2023] [Indexed: 02/02/2023]

Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models. Brief Bioinform 2022;23:6686738. [PMID: 36056743 DOI: 10.1093/bib/bbac358] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 06/20/2022] [Revised: 07/24/2022] [Accepted: 07/30/2022] [Indexed: 12/12/2022] Open

Taguchi YH, Turki T. Integrated Analysis of Tissue-Specific Gene Expression in Diabetes by Tensor Decomposition Can Identify Possible Associated Diseases. Genes (Basel) 2022;13:1097. [PMID: 35741859 PMCID: PMC9222230 DOI: 10.3390/genes13061097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 06/14/2022] [Accepted: 06/17/2022] [Indexed: 01/27/2023] Open

Bhatnagar R, Sardar S, Beheshti M, Podichetty JT. How can natural language processing help model informed drug development?: a review. JAMIA Open 2022;5:ooac043. [PMID: 35702625 PMCID: PMC9188322 DOI: 10.1093/jamiaopen/ooac043] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/23/2022] [Revised: 04/28/2022] [Accepted: 05/26/2022] [Indexed: 01/20/2023] Open

Abstract

Objective

To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement.

Materials and Methods

Publications found on PubMed and Google Scholar, websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for the implementation of NLP functionalities in MIDD were identified.

Results

NLP has been utilized to aid various processes in the drug development lifecycle, such as gene-disease mapping, biomarker discovery, patient-trial matching, and adverse drug event detection. These applications commonly use the NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state of the art for implementing these functionalities in MIDD applications is transformer models that utilize transfer learning for enhanced performance. Various libraries in Python, R, and Java, such as huggingface, sparkNLP, and KoRpus, as well as open-source platforms such as DisGeNet, DeepEnroll, and Transmol, have enabled convenient implementation of NLP models in MIDD applications.

Discussion

Challenges such as reproducibility, explainability, fairness, limited data, limited language support, and security need to be overcome to ensure wider adoption of NLP in the MIDD landscape. There are opportunities to improve the performance of existing models and expand the use of NLP in newer areas of MIDD.

Conclusions

This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.

Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022;23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open

Abstract

MOTIVATION

Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure remains challenging in problems such as disease-gene prediction.

RESULTS

We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as the further performance improvement derived from the parameter estimation.

CONCLUSIONS

The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in problems such as disease-gene prediction.

Liu L, Mamitsuka H, Zhu S. HPODNets: deep graph convolutional networks for predicting human protein-phenotype associations. Bioinformatics 2022;38:799-808. [PMID: 34672333 DOI: 10.1093/bioinformatics/btab729] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/21/2021] [Revised: 09/18/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023] Open

Manoharan S, Iyyappan OR. A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. Methods Mol Biol 2022;2496:41-70. [PMID: 35713858 DOI: 10.1007/978-1-0716-2305-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Rosário-Ferreira N, Guimarães V, Costa VS, Moreira IS. SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations. BMC Bioinformatics 2021;22:482. [PMID: 34607568 PMCID: PMC8491382 DOI: 10.1186/s12859-021-04397-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2021] [Accepted: 09/24/2021] [Indexed: 12/24/2022] Open

Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021;12:596794. [PMID: 34484285 PMCID: PMC8415302 DOI: 10.3389/fgene.2021.596794] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 01/04/2023] Open

Abstract

Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of the known breast cancer–associated genes, based on which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer–associated genes from the Genome-Wide Association Studies catalog and Online Mendelian Inheritance in Man database and then studied the distribution of these genes on an integrated protein–protein interaction (PPI) network. We found that the breast cancer–associated genes are significantly closer to each other than random, which confirms the modularity property of disease genes in a PPI network as revealed by previous studies. We then retrieved PPI subnetworks spanning top breast cancer–associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer–associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics and subnetworks spanning KEGG pathways. Compared with disease gene prediction that does not use the information from the KEGG pathways, this method has better prediction performance in inferring breast cancer–associated genes, and the top predicted genes are better enriched in known breast cancer–associated gene ontologies. Finally, we performed a literature search on top predicted novel genes and found that most of them are supported by at least wet-lab experiments on cell lines.
In summary, we propose a robust computational framework to prioritize novel breast cancer–associated genes, which could be used for further in vitro and in vivo experimental validation.
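The random-walk-with-restart idea named in the abstract above can be illustrated with a minimal sketch. This is the generic textbook form of the algorithm, not the authors' RCRWR variant (no network reconstruction or KEGG subnetworks are modelled). The toy network, the gene labels G1 to G5, and the restart probability of 0.3 are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected graph.

    adj   : dict mapping each node to the set of its neighbours
    seeds : set of known disease genes where the walker restarts
    Assumes every node has at least one neighbour, so probability mass
    is conserved at each step.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy PPI network (illustrative gene labels only).
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}
toy_seeds = {"G1", "G2"}

scores = random_walk_with_restart(toy_network, toy_seeds)
# Rank the non-seed genes by steady-state visiting probability.
ranked = sorted((g for g in scores if g not in toy_seeds),
                key=lambda g: scores[g], reverse=True)
print(ranked)
```

Candidate genes that sit close to the seed set in the network receive higher steady-state scores, which is the property such methods use to prioritize novel disease genes.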

Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021;12:596794. [PMID: 34484285 DOI: 10.3389/fgene.2021.596794/full] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 05/28/2023] Open

Lin L, Lin X, Liu Z, Zhang H, Han Q, Chen R, Chen L, Yan J. Identification and analysis of key regulatory genes associated with pre-eclampsia: a systems biology approach. Minerva Biotecnol 2021. [DOI: 10.23736/s1120-4826.20.02687-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]

Wang J, Kuang Z, Ma Z, Han G. GBDTL2E: Predicting lncRNA-EF Associations Using Diffusion and HeteSim Features Based on a Heterogeneous Network. Front Genet 2020;11:272. [PMID: 32351537 PMCID: PMC7174746 DOI: 10.3389/fgene.2020.00272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/23/2019] [Accepted: 03/06/2020] [Indexed: 12/02/2022] Open

Zhang G, Wang W, Huang W, Xie X, Liang Z, Cao H. Cross-disease analysis identified novel common genes for both lung adenocarcinoma and lung squamous cell carcinoma. Oncol Lett 2019;18:3463-3470. [PMID: 31516564 PMCID: PMC6732964 DOI: 10.3892/ol.2019.10678] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/06/2018] [Accepted: 05/25/2019] [Indexed: 12/25/2022] Open

Abstract

Lung squamous cell carcinoma (LSCC) exhibits a number of similarities with lung adenocarcinoma (LA) in terms of copy number alterations. However, compared with LA, the range of genetic alterations in LSCC is less understood. In the present study, a large-scale literature-based search of LA-associated genes and LSCC-associated genes was performed to identify the genetic basis shared by these two diseases. For each of the LA-associated genes, a mega-analysis was performed to test its expression variations in LSCC using 11 RNA expression datasets, with significant genes identified using statistical analysis. Subsequently, a functional pathway analysis was performed to identify a possible association between any of the significant genes identified from the mega-analysis and LSCC, followed by a co-expression analysis. A multiple linear regression (MLR) model was employed to investigate the possible influence of sample size, country of origin and study date on gene expression in patients with LSCC. Disease-gene association data analysis identified 1,178 genes involved in LA and 334 in LSCC, with a significant overlap of 187 genes (P<1.02×10⁻¹⁶¹). Mega-analysis revealed that three LA-associated genes, namely solute carrier family 2 member 1 (SLC2A1), endothelial PAS domain protein 1 (EPAS1) and cyclin-dependent kinase 4 (CDK4), were significantly associated with LSCC (P<1.60×10⁻⁸), with multiple potential pathways identified by functional pathway analysis, which were further validated by co-expression analysis. The present MLR analysis suggested that the country of origin was a significant factor for the levels of expression of all three genes in patients with LSCC (P<4.0×10⁻³). Collectively, the present results suggested that genes associated with LA should be further investigated for their association with LSCC. In addition, SLC2A1, EPAS1 and CDK4 may be novel risk genes associated with LA and LSCC.
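One common way to judge whether an overlap between two gene sets is larger than chance is a one-sided hypergeometric test. The abstract does not say which test the authors used, so the sketch below is only an illustration of that general approach. The function name `overlap_p_value` is hypothetical, and the background of 20,000 genes is an assumption that is not taken from the paper, so the printed value should not be expected to reproduce the reported P value.

```python
from math import comb


def overlap_p_value(background, set_a, set_b, overlap):
    """One-sided hypergeometric P(X >= overlap).

    X is the number of set_a genes found when set_b genes are drawn
    at random, without replacement, from `background` genes.
    """
    upper = min(set_a, set_b)
    favourable = sum(
        comb(set_a, k) * comb(background - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; converted to float only at the end.
    return favourable / comb(background, set_b)


# Gene counts from the abstract; the background size is an ASSUMPTION.
ASSUMED_BACKGROUND = 20_000
p = overlap_p_value(ASSUMED_BACKGROUND, 1178, 334, 187)
print(f"P(overlap >= 187) = {p:.3g}")
```

Because the result depends strongly on the chosen background, any comparison with a published P value requires knowing the gene universe the authors actually used.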

Luo L, Zheng C, Wang J, Tan M, Li Y, Xu R. Analysis of disease organ as a novel phenotype towards disease genetics understanding. J Biomed Inform 2019;95:103235. [PMID: 31207382 PMCID: PMC6644057 DOI: 10.1016/j.jbi.2019.103235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/03/2018] [Revised: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 11/24/2022]

Mariappan R, Rajan V. Deep collective matrix factorization for augmented multi-view learning. Mach Learn 2019. [DOI: 10.1007/s10994-019-05801-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023]

Zheng C, Xu R. Large-scale mining disease comorbidity relationships from post-market drug adverse events surveillance data. BMC Bioinformatics 2018;19:500. [PMID: 30591027 PMCID: PMC6309066 DOI: 10.1186/s12859-018-2468-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/20/2022] Open

Abstract

Background

Systems approaches to studying disease relationships have wide applications in biomedical discovery, such as understanding disease mechanisms and drug discovery. The FDA Adverse Event Reporting System (FAERS) contains rich information about patient diseases, medications, drug adverse events and demographics from 17 million case reports. Here, we explored this data resource to mine disease comorbidity relationships using an association rule mining algorithm and constructed a disease comorbidity network.

Results

We constructed a disease comorbidity network with 1059 disease nodes and 12,608 edges using association rule mining of FAERS (14,157 rules). We evaluated the performance of comorbidity mining from FAERS using known disease comorbidities of multiple sclerosis (MS), psoriasis and obesity, which represent rare, moderate and common diseases, respectively. Comorbidities of MS, obesity and psoriasis obtained from our network achieved precisions of 58.6%, 73.7% and 56.2% and recalls of 87.5%, 69.2% and 72.7%, respectively. We performed a comparative analysis of the disease comorbidity network with a disease semantic network, a disease genetic network and a disease treatment network. We showed that (1) disease comorbidity clusters exhibit significantly higher semantic similarity than a random network (0.18 vs 0.10); (2) disease comorbidity clusters share significantly more genes (0.46 vs 0.06); and (3) disease comorbidity clusters share significantly more drugs (0.64 vs 0.17). Finally, we demonstrated that the disease comorbidity network has potential for uncovering novel disease relationships using asthma as a case study.
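The association rule mining mentioned above rests on a few standard quantities (support, confidence and lift) computed for a rule such as "disease A implies disease B". The following minimal sketch shows those textbook definitions only; it is not the authors' pipeline, and the function name `rule_metrics` and the toy case reports are hypothetical and do not come from FAERS.

```python
def rule_metrics(reports, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    reports : list of sets, each holding the diseases in one case report
    """
    n = len(reports)
    n_a = sum(1 for r in reports if antecedent in r)
    n_b = sum(1 for r in reports if consequent in r)
    n_ab = sum(1 for r in reports if antecedent in r and consequent in r)
    support = n_ab / n                         # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0    # P(B | A)
    lift = confidence / (n_b / n) if n_b else 0.0  # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical toy case reports (illustrative only).
toy_reports = [
    {"asthma", "rhinitis"},
    {"asthma", "rhinitis", "eczema"},
    {"asthma"},
    {"obesity", "diabetes"},
    {"obesity"},
]

support, confidence, lift = rule_metrics(toy_reports, "asthma", "rhinitis")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the two diseases co-occur more often than expected if they were independent; rules that pass chosen thresholds on these metrics could then be treated as edges of a comorbidity network.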

Conclusions

Our study presented the first comprehensive attempt to build a disease comorbidity network from the FDA Adverse Event Reporting System. This network correlates well with disease semantic similarity, disease genetics and disease treatment, and has great potential for disease genetics prediction and drug discovery.

Zheng C, Xu R. The Alzheimer's comorbidity phenome: mining from a large patient database and phenome-driven genetics prediction. JAMIA Open 2018;2:131-138. [PMID: 30944912 PMCID: PMC6434979 DOI: 10.1093/jamiaopen/ooy050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/24/2018] [Revised: 10/23/2018] [Accepted: 12/05/2018] [Indexed: 01/08/2023] Open

Haendel MA, McMurry JA, Relevo R, Mungall CJ, Robinson PN, Chute CG. A Census of Disease Ontologies. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013459] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/09/2022]

Khordad M, Mercer RE. Identifying genotype-phenotype relationships in biomedical text. J Biomed Semantics 2017;8:57. [PMID: 29212530 PMCID: PMC5719522 DOI: 10.1186/s13326-017-0163-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/12/2016] [Accepted: 10/28/2017] [Indexed: 11/10/2022] Open