1
Tuly KF, Hossen MB, Islam MA, Kibria MK, Alam MS, Harun-Or-Roshid M, Begum AA, Hasan S, Mahumud RA, Mollah MNH. Robust Identification of Differential Gene Expression Patterns from Multiple Transcriptomics Datasets for Early Diagnosis, Prognosis, and Therapies for Breast Cancer. Medicina (Kaunas, Lithuania) 2023; 59:1705. [PMID: 37893423 PMCID: PMC10608013 DOI: 10.3390/medicina59101705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/07/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023]
Abstract
Background and Objectives: Breast cancer (BC) is one of the major causes of cancer-related death in women globally. Proper identification of BC-causing hub genes (HubGs) for prognosis, diagnosis, and therapies at an earlier stage may reduce such death rates. However, most of the previous studies detected HubGs through non-robust statistical approaches that are sensitive to outlying observations. Therefore, the main objectives of this study were to explore BC-causing potential HubGs from robustness viewpoints, highlighting their early prognostic, diagnostic, and therapeutic performance. Materials and Methods: Integrated robust statistics and bioinformatics methods and databases were used to obtain the required results. Results: We robustly identified 46 common differentially expressed genes (cDEGs) between BC and control samples from three microarrays (GSE26910, GSE42568, and GSE65194) and one scRNA-seq (GSE235168) dataset. Then, we identified eight cDEGs (COL11A1, COL10A1, CD36, ACACB, CD24, PLK1, UBE2C, and PDK4) as the BC-causing HubGs by the protein-protein interaction (PPI) network analysis of cDEGs. The performance of BC and survival probability prediction models with the expressions of HubGs from two independent datasets (GSE45827 and GSE54002) and the TCGA (The Cancer Genome Atlas) database showed that our proposed HubGs might be considered as diagnostic and prognostic biomarkers, where two genes, COL11A1 and CD24, exhibit better performance. The expression analysis of HubGs by Box plots with the TCGA database in different stages of BC progression indicated their early diagnosis and prognosis ability. The HubGs set enrichment analysis with GO (Gene ontology) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways disclosed some BC-causing biological processes, molecular functions, and pathways. 
Finally, we suggested the top-ranked six drug molecules (Suramin, Rifaximin, Telmisartan, Tukysa Tucatinib, Lynparza Olaparib, and TG.02) for the treatment of BC by molecular docking analysis with the proposed HubGs-mediated receptors. Molecular docking analysis results also showed that these drug molecules may inhibit cancer-related post-translational modification (PTM) sites (succinylation, phosphorylation, and ubiquitination) of hub proteins. Conclusions: This study's findings might be valuable resources for diagnosis, prognosis, and therapies at an earlier stage of BC.
Affiliation(s)
- Khanis Farhana Tuly
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md. Bayazid Hossen
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md. Ariful Islam
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md. Kaderi Kibria
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
  - Department of Statistics, Hajee Mohammad Danesh Science & Technology University, Dinajpur 5200, Bangladesh
- Md. Shahin Alam
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md. Harun-Or-Roshid
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Anjuman Ara Begum
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Sohel Hasan
  - Molecular and Biomedical Health Science Lab, Department of Biochemistry and Molecular Biology, University of Rajshahi, Rajshahi 6205, Bangladesh
- Rashidul Alam Mahumud
  - NHMRC Clinical Trials Centre, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Md. Nurul Haque Mollah
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
2
Hossen MB, Islam MA, Reza MS, Kibria MK, Horaira MA, Tuly KF, Faruqe MO, Kabir F, Mollah MNH. Robust identification of common genomic biomarkers from multiple gene expression profiles for the prognosis, diagnosis, and therapies of pancreatic cancer. Comput Biol Med 2023; 152:106411. [PMID: 36502691 DOI: 10.1016/j.compbiomed.2022.106411] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/24/2022] [Revised: 11/17/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
Pancreatic cancer (PC) is one of the leading causes of cancer-related death globally, so identification of potential molecular signatures is required for diagnosis, prognosis, and therapies of PC. In this study, we detected 71 common differentially expressed genes (cDEGs) between PC and control samples from four microarray gene-expression datasets (GSE15471, GSE16515, GSE71989, and GSE22780) by using robust statistical and machine learning approaches, since microarray gene-expression datasets are often contaminated by outliers due to the several steps involved in the data-generating processes. Then we detected 8 cDEGs (ADAM10, COL1A2, FN1, P4HB, ITGB1, ITGB5, ANXA2, and MYOF) as the PC-causing key genes (KGs) by the protein-protein interaction (PPI) network analysis. We validated the expression patterns of KGs between case and control samples by box plot analysis with the TCGA and GTEx databases. The proposed KGs showed high prognostic power with the random forest (RF) based prediction model and Kaplan-Meier-based survival probability curve. The KGs regulatory network analysis detected a few transcriptional and post-transcriptional regulators of the KGs. The cDEGs-set enrichment analysis revealed some crucial PC-causing molecular functions, biological processes, cellular components, and pathways that are associated with KGs. Finally, we suggested five KGs-guided repurposable drug molecules (Linsitinib, CX5461, Irinotecan, Timosaponin AIII, and Olaparib) and a new molecule (NVP-BHG712) against PC by molecular docking. The stability of the top three protein-ligand complexes was confirmed by molecular dynamics (MD) simulation studies. Cross-validation and some literature reviews also supported our findings. Therefore, the findings of this study might be a useful resource for researchers and medical doctors for diagnosis, prognosis, and therapies of PC, subject to wet-lab validation.
Affiliation(s)
- Md Bayazid Hossen
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Ariful Islam
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Selim Reza
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Kaderi Kibria
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Abu Horaira
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Khanis Farhana Tuly
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Omar Faruqe
  - Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Firoz Kabir
  - Department of Ophthalmology and Visual Sciences, School of Medicine, University of Maryland, Baltimore, MD, USA
- Md Nurul Haque Mollah
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
3
Akond Z, Ahsan MA, Alam M, Mollah MNH. Robustification of GWAS to explore effective SNPs addressing the challenges of hidden population stratification and polygenic effects. Sci Rep 2021; 11:13060. [PMID: 34158546 PMCID: PMC8219685 DOI: 10.1038/s41598-021-90774-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Accepted: 05/12/2021] [Indexed: 11/24/2022] Open
Abstract
Genome-wide association studies (GWAS) play a vital role in identifying important genes that are associated with the phenotypic variations of living organisms. There are several statistical methods for GWAS, including the linear mixed model (LMM), which is popular for addressing the challenges of hidden population stratification and polygenic effects. However, most of these methods, including LMM, are sensitive to phenotypic outliers that may lead to misleading results. To overcome this problem, in this paper, we proposed a way to robustify the LMM approach for reducing the influence of outlying observations using the β-divergence method. The performance of the proposed method was investigated using both synthetic and real data analysis. Simulation results showed that the proposed method performs better than both the linear regression model (LRM) and LMM approaches in terms of power and false discovery rate in the presence of phenotypic outliers. On the other hand, the proposed method performed almost the same as the LMM approach but much better than the LRM approach in the absence of outliers. In the case of real data analysis, our proposed method identified 11 SNPs that are significantly associated with rice flowering time. Among the identified candidate SNPs, some were involved in seed development and flowering time pathways, and some were connected with flower and other developmental processes. These identified candidate SNPs could assist rice breeding programs effectively. Thus, our findings highlighted the importance of robust GWAS in identifying candidate genes.
Affiliation(s)
- Zobaer Akond
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
  - Institute of Environmental Science, University of Rajshahi, Rajshahi, 6205, Bangladesh
  - Agricultural Statistics and ICT Division, Bangladesh Agricultural Research Institute (BARI), Gazipur, 1701, Bangladesh
- Md Asif Ahsan
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Munirul Alam
  - Molecular Ecology and Metagenomic Laboratory, Infectious Diseases Division, International Centre for Diarrheal Disease Research (Icddr,b), Rajshahi, Bangladesh
- Md Nurul Haque Mollah
  - Bioinformatics Lab, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
4
Liu N, Song T, Zhang S, Liu H, Zhao X, Shao Y, Li C, Zhang W. Characterization of the Potential Probiotic Vibrio sp. V33 Antagonizing Vibrio Splendidus Based on Iron Competition. Iranian Journal of Biotechnology 2020; 18:e2259. [PMID: 32884955 PMCID: PMC7461713 DOI: 10.30498/ijb.2019.85192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Background: Vibrio splendidus Vs is an important aquaculture pathogen that can infect a broad range of marine organisms. In our previous study, an antagonistic bacterium, Vibrio sp. V33, that possessed inhibitory effects on the growth and virulence of a pathogenic isolate, V. splendidus Vs, was identified. Objectives: Here, we further explored the antagonistic substances and antagonistic effects from the viewpoint of iron competition. Materials and Methods: The main antagonistic substances in the supernatants from Vibrio sp. V33 were identified using the bioassay-guided method. The response of V. splendidus Vs under the challenge of cell-free supernatant from Vibrio sp. V33 was determined via sodium dodecyl sulfate-polyacrylamide gel electrophoresis and real-time reverse-transcription PCR. Results: The main antagonistic substances produced by Vibrio sp. V33 have low molecular weights, are water soluble, and are heat stable. Meanwhile, the iron uptake rate of Vibrio sp. V33 was higher than that of V. splendidus Vs. In the presence of cell-free supernatant from Vibrio sp. V33, expression of two functional genes, viuB and asbJ, related to ferric uptake processes in V. splendidus Vs, was up-regulated, whereas furVs, coding the ferric uptake repressor, was suppressed below 0.5-fold. One gene coding phosphopyruvate hydratase did not change at the mRNA level but was up-regulated at the protein level. Conclusions: Our results suggested that the antagonistic effect of Vibrio sp. V33 on the pathogenic isolate V. splendidus Vs was partially due to the stronger ability of Vibrio sp. V33 to seize iron. The cell-free supernatant from Vibrio sp. V33 created an iron-limited milieu for V. splendidus Vs, which led to changed expression profiles of genes related to iron uptake in V. splendidus Vs.
Affiliation(s)
- Ningning Liu
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Tongxiang Song
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Shanshan Zhang
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Huijie Liu
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Xuelin Zhao
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Yina Shao
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Chenghua Li
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
- Weiwei Zhang
  - School of Marine Sciences, Ningbo University, Ningbo 315211, P.R. China
5
A Robust Approach for Identification of Cancer Biomarkers and Candidate Drugs. Medicina (Kaunas) 2019; 55:269. [PMID: 31212673 PMCID: PMC6631768 DOI: 10.3390/medicina55060269] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/28/2018] [Revised: 03/17/2019] [Accepted: 06/01/2019] [Indexed: 12/31/2022]
Abstract
Background and objectives: Identification of cancer biomarkers that are differentially expressed (DE) between two biological conditions is an important task in many microarray studies. Several methods exist in the literature for this purpose, but most of them are designed for unpaired samples and are not suitable for paired samples. Furthermore, the traditional methods use p-values or fold change (FC) values to detect the DE genes. However, p-value based results sometimes do not agree with FC based results because of the smaller pooled variance of gene expressions, which occurs when the variance of each individual condition becomes smaller. There are some methods that combine both p-values and FC values to solve this problem. However, those methods also show weak performance for small-sample cases in the presence of outlying expressions. To overcome this problem, in this paper, an attempt is made to propose a hybrid robust SAM-FC approach, designed for paired samples, that combines the rank of FC values and the rank of p-values computed by the SAM statistic using the minimum β-divergence method. Materials and Methods: The proposed method introduces a weight function known as the β-weight function. This weight function produces larger weights for usual expressions and smaller weights for unusual expressions. The β-weight function plays a significant role in the performance of the proposed method. The proposed method uses the β-weight function as a measure of outlier detection by setting β = 0.2. We unify both classical and robust estimates using the β-weight function, such that maximum likelihood estimators (MLEs) are used in the absence of outliers and minimum β-divergence estimators are used in the presence of outliers to obtain reasonable p-values and FC values in the proposed method.
Results: We examined the performance of the proposed method in comparison with some popular methods (t-test, SAM, LIMMA, Wilcoxon, WAD, RP, and FCROS) using both simulated and real gene expression profiles for both small- and large-sample cases. From the simulation and real spike-in data analysis results, we observed that the proposed method outperforms the other methods for small-sample cases in the presence of outliers and keeps almost equal performance with other robust methods (Wilcoxon, RP, and FCROS) otherwise. From the head and neck cancer (HNC) gene expression dataset, the proposed method identified two additional genes (CYP3A4 and NOVA1) that are significantly enriched in linoleic acid metabolism, drug metabolism, steroid hormone biosynthesis, and metabolic pathways. The survival analysis through Kaplan-Meier curves revealed that the combined effect of these two genes has prognostic capability and that they might be promising biomarkers of HNC. Moreover, we retrieved 12 candidate drugs based on gene interaction from GLAD4U and DrugBank literature-based gene associations. Conclusions: Using pathway analysis, disease association study, protein-protein interactions, and survival analysis, we found that our proposed two additional genes might be involved in the critical pathways of cancer. Furthermore, the identified drugs showed statistical significance, which indicates that proteins associated with these genes might be therapeutic targets in cancer.
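The β-weight idea described in the Materials and Methods above can be illustrated with a small sketch. The abstract only states that the weight is large for usual expressions, small for unusual ones, and that β = 0.2; the Gaussian form of the weight, the iteratively reweighted update, the median/MAD starting values, and the names `beta_weight` and `min_beta_estimates` are assumptions made here for illustration and may differ from the authors' implementation.

```python
import math
from statistics import median


def beta_weight(x, mu, sigma2, beta=0.2):
    """Assumed Gaussian-form weight: near 1 close to mu, near 0 far from it."""
    return math.exp(-beta * (x - mu) ** 2 / (2.0 * sigma2))


def min_beta_estimates(xs, beta=0.2, iters=100):
    """Iteratively reweighted mean and variance (illustrative sketch only)."""
    # Robust starting values: median and squared, scaled MAD (an assumption).
    mu = median(xs)
    mad = median([abs(x - mu) for x in xs])
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)
    for _ in range(iters):
        w = [beta_weight(x, mu, sigma2, beta) for x in xs]
        total = sum(w)
        mu = sum(wi * x for wi, x in zip(w, xs)) / total
        sigma2 = (1.0 + beta) * sum(
            wi * (x - mu) ** 2 for wi, x in zip(w, xs)
        ) / total
        sigma2 = max(sigma2, 1e-12)  # guard against a degenerate variance
    return mu, sigma2


# Made-up expression values; 25.0 plays the role of an outlying expression.
expr = [10.0, 10.2, 9.9, 10.1, 9.8, 25.0]
mu, sigma2 = min_beta_estimates(expr)
weights = [beta_weight(x, mu, sigma2) for x in expr]
```

In this toy example the outlying value receives a weight close to zero, so it contributes almost nothing to the estimated mean and variance, while the remaining values keep weights close to one. Thresholding such weights is one plausible way to use them as an outlier flag.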
6
Robust Significance Analysis of Microarrays by Minimum β-Divergence Method. BioMed Research International 2017; 2017:5310198. [PMID: 28819626 PMCID: PMC5551475 DOI: 10.1155/2017/5310198] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/29/2017] [Accepted: 05/28/2017] [Indexed: 11/18/2022]
Abstract
Identification of differentially expressed (DE) genes with two or more conditions is an important task for the discovery of a few biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identification of DE genes for both small- and large-sample cases. However, it is sensitive to outlying gene expressions and produces low power in the presence of outliers. Therefore, in this paper, an attempt is made to robustify the SAM approach using the minimum β-divergence estimators instead of the maximum likelihood estimators of the parameters. We demonstrated the performance of the proposed method in comparison with some other popular statistical methods such as ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE using both simulated and real gene expression datasets. We observe that all methods show good and almost equal performance in the absence of outliers for the large-sample cases, while in the small-sample cases only three methods (SAM, LIMMA, and the proposed method) show almost equal and better performance than the others with two or more conditions. However, in the presence of outliers, on average, only the proposed method performs better than the others for both small- and large-sample cases with each condition.
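For orientation, the standard two-condition SAM statistic for gene $i$ can be written as below. The notation ($\bar{x}_{i1}$, $\bar{x}_{i2}$, $s_i$, $s_0$, $n_1$, $n_2$) is introduced here for illustration; the abstract does not give the formula, so the robust variant described in the final sentence of this note is a reading of the abstract rather than the authors' exact definition.

```latex
d_i = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{s_i + s_0},
\qquad
s_i = \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)
      \frac{\sum_{j \in C_1} (x_{ij} - \bar{x}_{i1})^2
          + \sum_{j \in C_2} (x_{ij} - \bar{x}_{i2})^2}{n_1 + n_2 - 2}}
```

Here $C_1$ and $C_2$ index the samples in the two conditions and $s_0$ is a small positive constant that stabilizes the statistic for genes with low variance. Under the approach summarized in the abstract, the sample means and variances, which are maximum likelihood estimates and therefore sensitive to outliers, would presumably be replaced by their minimum β-divergence counterparts.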
7
Taguchi YH. Principal Components Analysis Based Unsupervised Feature Extraction Applied to Gene Expression Analysis of Blood from Dengue Haemorrhagic Fever Patients. Sci Rep 2017; 7:44016. [PMID: 28276456 PMCID: PMC5343617 DOI: 10.1038/srep44016] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 08/22/2016] [Accepted: 02/02/2017] [Indexed: 12/12/2022] Open
Abstract
Dengue haemorrhagic fever (DHF) sometimes occurs after recovery from the disease caused by Dengue virus (DENV), and is often fatal. However, the mechanism of DHF has not been determined, possibly because no suitable methodologies are available to analyse this disease. Therefore, more innovative methods are required to analyse the gene expression profiles of DENV-infected patients. Principal components analysis (PCA)-based unsupervised feature extraction (FE) was applied to the gene expression profiles of DENV-infected patients, and an integrated analysis of two independent data sets identified 46 genes as critical for DHF progression. PCA using only these 46 genes rendered the two data sets highly consistent. The application of PCA to the 46 genes of an independent third data set successfully predicted the progression of DHF. A fourth in vitro data set confirmed the identification of the 46 genes. These 46 genes included interferon- and heme-biosynthesis-related genes. The former are enriched in binding sites for STAT1, STAT2, and IRF1, which are associated with DHF-promoting antibody-dependent enhancement, whereas the latter are considered to be related to the dysfunction of spliceosomes, which may mediate haemorrhage. These results are outcomes that other types of bioinformatic analysis could hardly achieve.
Affiliation(s)
- Y-h. Taguchi
  - Department of Physics, Chuo University, Tokyo, 112-8551, Japan
8
Chirumbolo S, Bjørklund G. Commentary: Arnica Montana Effects on Gene Expression in a Human Macrophage Cell Line: Evaluation by Quantitative Real-Time PCR. Front Immunol 2016; 7:280. [PMID: 27660630 PMCID: PMC5015595 DOI: 10.3389/fimmu.2016.00280] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/04/2016] [Accepted: 07/12/2016] [Indexed: 12/26/2022] Open
Affiliation(s)
- Geir Bjørklund
  - Council for Nutritional and Environmental Medicine, Mo i Rana, Norway