1
Hsieh AR, Tsai CY. Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction. Eur J Med Res 2024; 29:404. [PMID: 39095899 PMCID: PMC11297645 DOI: 10.1186/s40001-024-01983-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 07/17/2024] [Indexed: 08/04/2024]
Abstract
Supervised machine learning is often used for biomedical relationship extraction, but it requires considerable time and money to build a manually annotated dataset. Distant supervision combines a knowledge base with a corpus so that the training corpus can be annotated automatically. Because many biomedical databases provide knowledge bases while annotated corpora remain limited, this approach is practical in biomedicine. A healthcare provider's genetic database can help clarify the clinical significance of each patient's genetic makeup. Unfortunately, few previous biomedical relationship extraction studies have focused on gene-gene interactions. The main purpose of this study is to develop methods for extracting gene-gene interactions that can help explain the heritability of complex human diseases. Using gene-gene interaction information from the KEGG PATHWAY database, this study generated the training sample set from PubMed abstracts and applied a graph kernel method to extract gene-gene interactions. The best result was an F1-score of 0.79. Our distant supervision method automatically finds relevant sentences in the corpus without manual labeling, which can substantially reduce the time spent on manual annotation; moreover, relationship extraction based on a graph kernel can be successfully applied to gene-gene interactions. The results of this study are therefore expected to help advance precision medicine.
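The distant supervision idea described above, in which known interacting gene pairs from a knowledge base are used to label sentences automatically, can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the gene lexicon, the two interaction pairs, and the helper names `find_genes` and `distant_label` are hypothetical, and gene mention detection is a naive dictionary lookup rather than a trained gene tagger.

```python
import itertools
import re

# Hypothetical knowledge base: unordered gene pairs assumed to interact
# (a stand-in for pairs that could be derived from a resource such as KEGG PATHWAY).
KNOWN_INTERACTIONS = {
    frozenset({"TP53", "MDM2"}),
    frozenset({"EGFR", "GRB2"}),
}

# Hypothetical gene lexicon used for naive dictionary-based mention detection.
GENE_LEXICON = {"TP53", "MDM2", "EGFR", "GRB2", "BRCA1"}


def find_genes(sentence):
    """Return lexicon gene symbols that appear as whole tokens, in order of first occurrence."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    found = []
    for tok in tokens:
        if tok in GENE_LEXICON and tok not in found:
            found.append(tok)
    return found


def distant_label(sentences):
    """Label each co-mentioned gene pair: 1 if the pair is in the knowledge base, else 0."""
    examples = []
    for sent in sentences:
        genes = find_genes(sent)
        for a, b in itertools.combinations(genes, 2):
            label = 1 if frozenset({a, b}) in KNOWN_INTERACTIONS else 0
            examples.append((sent, a, b, label))
    return examples
```

Every sentence that co-mentions a known pair becomes a positive example and every other co-mention becomes a negative one. This is what removes the manual annotation step, at the cost of noisy labels when a sentence mentions a known pair without actually describing the interaction.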
Affiliation(s)
- Ai-Ru Hsieh
- Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan.
- Chen-Yu Tsai
- Department of Statistics, Tamkang University, Tamsui District, New Taipei City, 251301, Taiwan
2
Alqahtani SM, Altharawi A, Alabbas A, Ahmad F, Ayaz H, Nawaz A, Rahman S, Alossaimi MA. System biology approach to identify the novel biomarkers in glioblastoma multiforme tumors by using computational analysis. Front Pharmacol 2024; 15:1364138. [PMID: 38841373 PMCID: PMC11150670 DOI: 10.3389/fphar.2024.1364138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 04/22/2024] [Indexed: 06/07/2024]
Abstract
Introduction: Glioblastoma multiforme (GBM) is the most common primary brain tumor in adults, accounting for 45.2% of all cases. GBM is a highly aggressive tumor characterized by rapid cell division and a propensity for necrosis. Regrettably, the prognosis is extremely poor, with only 5.5% of patients surviving after diagnosis. Methodology: To eradicate such complex diseases, significant effort is directed at developing more effective drugs and pinpointing precise pharmacological targets. Finding appropriate biomarkers for drug discovery requires considering a variety of factors, including disease states, gene expression levels, and protein-protein interactions. As the first step in identifying promising biomarkers in GBM, we identified differentially expressed genes (DEGs) using statistical criteria such as p-values and false discovery rates. Of the 132 genes, 13 were upregulated and only 29 were uniquely downregulated; no statistically significant expression changes were observed for the remaining genes. Results: In hub biomarker gene identification, matrix metallopeptidase 9 (MMP9) had the highest degree, followed by periostin (POSTN) at 11 and Hes family BHLH transcription factor 5 (HES5) at 9. Survival analysis highlighted the significance of each hub biomarker gene in the initiation and progression of GBM. Enrichment analysis showed that many of these genes participate in signaling networks and function in extracellular regions. We also identified the transcription factors and kinases that regulate proteins in the protein-protein interaction (PPI) network of the DEGs. Discussion: We identified drugs associated with each hub biomarker. Inhibiting MMP9 is an appealing therapeutic strategy in GBM.
Molecular docking showed that the selected complexes (carmustine, lomustine, marimastat, and temozolomide) had high binding affinities of -6.3, -7.4, -7.7, and -8.7 kcal/mol, respectively. The mean root-mean-square deviation (RMSD) was 4.2 Å for the carmustine complex and 4.9 Å for the marimastat complex, whereas the lomustine and temozolomide complexes showed average RMSDs of 1.2 Å and 1.6 Å, respectively. Root-mean-square fluctuation (RMSF) analysis also indicated high stability, with no structural conformational changes among the atoms. These in silico investigations thus open a new avenue for experimentalists to target lethal diseases in the future.
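Selecting DEGs with p-values and a false discovery rate, as described in the abstract above, is commonly done with the Benjamini-Hochberg procedure. The sketch below assumes that procedure, an FDR cutoff of 0.05, and an absolute log2 fold-change cutoff of 1; none of these choices, nor the function names, are taken from the study.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q-value is the minimum over itself and all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def call_degs(log2fc, pvals, fdr=0.05, min_abs_lfc=1.0):
    """Return boolean masks (up, down) for genes passing the FDR and fold-change cutoffs.

    The default cutoffs are illustrative assumptions, not values from the study.
    """
    q = benjamini_hochberg(pvals)
    lfc = np.asarray(log2fc, dtype=float)
    significant = q < fdr
    up = significant & (lfc >= min_abs_lfc)
    down = significant & (lfc <= -min_abs_lfc)
    return up, down
```

Counting `up.sum()` and `down.sum()` gives upregulated and downregulated totals of the kind reported in the abstract; genes failing either cutoff fall into the "no significant change" group.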
Affiliation(s)
- Safar M. Alqahtani
- Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
- Ali Altharawi
- Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
- Alhumaidi Alabbas
- Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
- Faisal Ahmad
- Foundation University Medical College, Foundation University Islamabad, Islamabad, Pakistan
- School of Biology Georgia Institute of Technology, Atlanta, GA, United States
- Hassan Ayaz
- Department of Biotechnology, Quaid-i-Azam University Islamabad, Islamabad, Pakistan
- Asia Nawaz
- Department of Biotechnology, Quaid-i-Azam University Islamabad, Islamabad, Pakistan
- Sidra Rahman
- Department of Biotechnology, Quaid-i-Azam University Islamabad, Islamabad, Pakistan
- Manal A. Alossaimi
- Department of Pharmaceutical Chemistry, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
3
Ayaz H, Ahmad F, Ahmad S, Arfan Q, Alasmari AF, Siddique F, Rehman B, Zeb A, Crovella S, Ali SS, Waheed Y, Suleman M. Network-base approaches to identify therapeutic biomarkers in hepatocellular carcinoma and search for drug hunting utilizing molecular dynamics simulations. J Biomol Struct Dyn 2024:1-17. [PMID: 38486461 DOI: 10.1080/07391102.2024.2326197] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2023] [Accepted: 02/27/2024] [Indexed: 12/06/2024]
Abstract
Conditions such as alpha-1 antitrypsin deficiency, hemochromatosis, non-alcoholic fatty liver disease, and metabolic syndrome can increase susceptibility to hepatocellular carcinoma (HCC). Network-based gene expression profiling via network analyst tools presents a novel approach to drug target discovery. The significance level (p-score) obtained through Cytoscape in the survival assessment of the intended center genes confirms the identification of all target center genes, which play a fundamental role in the formation and progression of HCC. A total of 1064 differentially expressed genes were found, including MCM2 with the highest degree, followed by MCM6 (degree score 4917) and MCM4 (degree score 3944). Using the X2K web tool, we investigated the regulatory kinases involved in establishing the protein-protein interaction network. Docking with AutoDock Vina yielded a favorable binding affinity of -8.7 kcal/mol against the target MCM2. After the complex was simulated with the AMBER16 package, the root-mean-square deviation values for the protein remained within 4.74 Å and were stable throughout the simulation. The fit of the ligand to the protein fluctuated at some intervals but remained stable overall. Finally, the Gibbs free energy reached its lowest value at 1 kcal/mol, reflecting real-time binding interactions between the inhibitor and protein residues. Measurements of ligand displacement showed stable movement within the active site. These findings improve our understanding of potential biomarkers in hepatocellular carcinoma, and experimental validation will further strengthen these outcomes in the future.
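Hub genes such as MCM2 are ranked by degree, the number of distinct interaction partners a node has in the protein-protein interaction network. A minimal sketch of that ranking over an undirected edge list follows; the function name `rank_hubs` and the toy network are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def rank_hubs(edges, top_k=3):
    """Rank nodes of an undirected network by degree (number of distinct partners)."""
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    # Highest degree first; ties broken alphabetically so the output is reproducible.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy edges, not the study's PPI network.
    toy_edges = [("MCM2", "MCM6"), ("MCM2", "MCM4"), ("MCM2", "CDC6"),
                 ("MCM6", "MCM4"), ("MCM6", "MCM2")]
    print(rank_hubs(toy_edges))  # [('MCM2', 3), ('MCM4', 2), ('MCM6', 2)]
```

Deduplicating undirected edges matters because interaction databases often list the same pair in both directions, which would otherwise double-count a node's degree.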
Affiliation(s)
- Hassan Ayaz
- Centre for Biotechnology and Microbiology, University of Swat, Mingora, Pakistan
- Department of Biotechnology, Quaid-I-Azam University, Islamabad, Pakistan
- Faisal Ahmad
- Foundation University Medical College, Foundation University Islamabad, DHA-I, Islamabad, Pakistan
- School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Sajjad Ahmad
- Department of Health and Biological Sciences, Abasyn University, Peshawar, Pakistan
- Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, Beirut, Lebanon
- Department of Natural Sciences, Lebanese American University, Beirut, Lebanon
- Qaiser Arfan
- Department of Bioinformatics, Hazara University, Mansehra, Pakistan
- Abdullah F Alasmari
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Farhan Siddique
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Bahauddin Zakriya University Multan, Multan, Pakistan
- Bushra Rehman
- Institute of Biotechnology and Microbiology, Bacha khan University, Charsadda, Pakistan
- Adnan Zeb
- Centre for Biotechnology and Microbiology, University of Swat, Mingora, Pakistan
- Sergio Crovella
- Laboratory Animal Research Centre, Qatar University, Doha, Qatar
- Syed Shujait Ali
- Centre for Biotechnology and Microbiology, University of Swat, Mingora, Pakistan
- Yasir Waheed
- Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, Beirut, Lebanon
- Bridging Health Foundation, Rawalpindi, Pakistan
- Muhammad Suleman
- Centre for Biotechnology and Microbiology, University of Swat, Mingora, Pakistan
- Institute of Biotechnology and Microbiology, Bacha khan University, Charsadda, Pakistan
4
Soares GH, Ribeiro Santiago PH, Biazevic MGH, Michel-Crosato E, Jamieson L. Dynamics in oral health-related factors of Indigenous Australian children: A network analysis of a randomized controlled trial. Community Dent Oral Epidemiol 2022; 50:251-259. [PMID: 34050531 DOI: 10.1111/cdoe.12661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 04/28/2021] [Accepted: 04/30/2021] [Indexed: 12/15/2022]
Abstract
OBJECTIVES Network analysis is an innovative analytic approach that represents variables visually as nodes and their statistical associations as edges. It also offers a new way of framing oral health-related questions as complex systems of variables. We aimed to generate networks of oral health variables from epidemiological data on Indigenous children and to compare the network structures of these variables between participants who received immediate or delayed delivery of an oral health intervention. METHODS Epidemiological data were obtained from 448 mother-child dyads enrolled in a randomized controlled trial of dental caries prevention in South Australia, Australia. Networks were estimated with nodes representing study variables and edges representing partial correlation coefficients between variables. Data included dental caries, impact on quality of life, self-rated general health, self-rated oral health, dental service utilization, knowledge of oral health, fatalism, and self-efficacy at three time points. Communities of nodes, centrality, clustering coefficients, and network stability were estimated. RESULTS The oral health intervention interacted with the network through self-rated general health and knowledge of oral health. Networks depicting groups shortly after receiving the intervention showed higher clustering coefficients and a similar arrangement of nodes. Networks tended to return to their pre-intervention state. CONCLUSION The intervention increased connectivity and changed the structure of communities of variables in both intervention groups. Our findings help elucidate the dynamics among variables in oral health networks over time.
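The network estimation described above (variables as nodes, partial correlations as edges, plus clustering coefficients) can be illustrated with a small NumPy sketch. It assumes continuous variables and computes partial correlations by inverting the sample covariance matrix; published network analyses often add regularization such as the graphical lasso, which this sketch omits, and the edge threshold of 0.1 and both function names are arbitrary assumptions, not the authors' settings.

```python
import numpy as np


def partial_correlation_network(data, threshold=0.1):
    """Estimate a partial correlation network from a (samples, variables) array.

    Returns the partial correlation matrix and a 0/1 adjacency matrix keeping
    edges whose absolute partial correlation exceeds `threshold` (an assumption).
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    # Partial correlation between i and j given all other variables.
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    adjacency = (np.abs(pcor) > threshold).astype(int)
    return pcor, adjacency


def clustering_coefficients(adjacency):
    """Local (unweighted) clustering coefficient of every node."""
    a = (np.asarray(adjacency) != 0).astype(int)
    np.fill_diagonal(a, 0)
    # diag(A^3) counts each triangle through a node twice.
    triangles = np.diag(a @ a @ a) / 2
    k = a.sum(axis=1)
    possible = k * (k - 1) / 2
    # Nodes with fewer than two neighbours get a coefficient of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(possible > 0, triangles / possible, 0.0)
```

Partial correlations, unlike raw correlations, drop the indirect link between x and z in a chain where x influences z only through y, so edges in such a network describe conditional rather than marginal associations.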
Affiliation(s)
- Lisa Jamieson
- Australian Research Centre for Population Oral Health, The University of Adelaide, Adelaide, SA, Australia
5
Khan A, Rehman Z, Hashmi HF, Khan AA, Junaid M, Sayaf AM, Ali SS, Hassan FU, Heng W, Wei DQ. An Integrated Systems Biology and Network-Based Approaches to Identify Novel Biomarkers in Breast Cancer Cell Lines Using Gene Expression Data. Interdiscip Sci 2020; 12:155-168. [DOI: 10.1007/s12539-020-00360-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/17/2019] [Revised: 12/31/2019] [Accepted: 01/18/2020] [Indexed: 12/12/2022]
6
Perscheid C, Grasnick B, Uflacker M. Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. J Integr Bioinform 2018; 16:/j/jib.ahead-of-print/jib-2018-0064/jib-2018-0064.xml. [PMID: 30785707 PMCID: PMC6798862 DOI: 10.1515/jib-2018-0064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/21/2018] [Accepted: 11/12/2018] [Indexed: 12/30/2022]
Abstract
The advance of high-throughput RNA sequencing techniques enables researchers to analyze the complete gene activity of particular cells. From such analyses, researchers can identify disease-specific expression profiles, thereby understand complex diseases such as cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges for computational analysis, which are addressed by gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which has several drawbacks for accurately identifying the underlying biological processes. In contrast, integrative approaches incorporate curated information on biological processes from external knowledge bases during gene selection, which promises better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrating external knowledge with traditional gene selection approaches. We introduce a framework that enables automatic external knowledge integration, gene selection, and evaluation. Evaluation results show that our framework is a useful evaluation tool and that integrating external knowledge improves overall analysis results.
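One simple way to combine a traditional statistical ranking with external knowledge is to score genes by a Welch t-statistic and then keep only genes annotated in a curated knowledge base. The sketch below is a hypothetical illustration of that general idea, not the framework proposed by the authors; the function names, gene names, and knowledge set are invented.

```python
import numpy as np


def welch_t(x, y):
    """Per-gene Welch t-statistic for two (samples, genes) expression arrays."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return (mx - my) / np.sqrt(vx / len(x) + vy / len(y))


def select_genes(x, y, genes, k, knowledge_set=None):
    """Rank genes by |t| and optionally keep only genes found in `knowledge_set`.

    `knowledge_set` is a hypothetical stand-in for genes annotated in an external
    knowledge base (for example, members of curated pathways).
    """
    scores = np.abs(welch_t(np.asarray(x, float), np.asarray(y, float)))
    # Stable sort keeps the input order for tied scores.
    order = np.argsort(-scores, kind="stable")
    ranked = [genes[i] for i in order]
    if knowledge_set is not None:
        ranked = [g for g in ranked if g in knowledge_set]
    return ranked[:k]
```

Filtering is only one design choice; a softer alternative would be to re-weight statistical scores for annotated genes instead of discarding unannotated ones, which trades interpretability against the risk of missing genes absent from the knowledge base.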
Affiliation(s)
- Cindy Perscheid
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Bastien Grasnick
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Matthias Uflacker
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany