1. Yang J, Guan P, Yu D, Li Q, Wang X, Xu G, Liu X. MetCohort: Precise Feature Detection and Correspondence for Untargeted Metabolomics in Large-Scale Cohort Studies. Anal Chem 2025. [PMID: 40327722 DOI: 10.1021/acs.analchem.4c04906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]
Abstract
Liquid chromatography-high-resolution mass spectrometry (LC-HRMS)-based untargeted metabolomics is becoming increasingly popular in large-scale cohort studies. However, its data processing is complex and challenging. We present MetCohort, a computational tool for aligning raw metabolomics data across large sample sets and for accurate feature detection and quantification. Retention times of the raw data are aligned by combining chromatogram profile alignment and local anchor matching with an outlier removal algorithm. With retention times aligned across all samples, regions of interest (ROIs) are detected and stacked among samples to form a two-dimensional (2D) ROI-matrix. This 2D ROI-matrix, resembling an image with rows representing samples and columns corresponding to time, allows the application of image processing techniques. Since the peaks are already aligned in the alignment step, features can be accurately detected and quantified with automatic correspondence across all samples. Based on the 2D image processing technique, holistic-scale feature detection is performed, which not only significantly decreases the number of false positives and improves the detection of low-intensity compounds, but also avoids tricky peak matching and quantification uncertainty. Overall, MetCohort has the potential to enhance the accuracy and efficiency of data processing in large-scale LC-HRMS.
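The stacking step described in this abstract can be illustrated with a minimal sketch. It is not MetCohort's implementation: the function names, the fixed time grid, and the use of a column-wise mean to locate a shared apex are all assumptions made for illustration, and the retention times are taken to be already aligned.

```python
from typing import Dict, List, Tuple

def build_roi_matrix(traces: Dict[str, List[Tuple[float, float]]],
                     rt_start: float, rt_end: float, rt_step: float):
    """Bin each sample's (aligned_rt, intensity) points onto a shared time grid.

    Rows are samples (sorted by name), columns are time bins.
    Hypothetical helper, not part of any published tool.
    """
    n_cols = int(round((rt_end - rt_start) / rt_step)) + 1
    samples = sorted(traces)
    matrix = [[0.0] * n_cols for _ in samples]
    for row, name in enumerate(samples):
        for rt, intensity in traces[name]:
            col = int(round((rt - rt_start) / rt_step))
            if 0 <= col < n_cols:
                # Keep the largest intensity that falls into a bin.
                matrix[row][col] = max(matrix[row][col], intensity)
    return samples, matrix

def column_profile(matrix: List[List[float]]) -> List[float]:
    """Average intensity per time bin across all samples."""
    return [sum(col) / len(col) for col in zip(*matrix)]

def apex_column(profile: List[float]) -> int:
    """Index of the time bin with the highest average intensity."""
    return max(range(len(profile)), key=profile.__getitem__)

# Toy data: three samples sharing one peak near 10.1 min; s3 is low-intensity.
traces = {
    "s1": [(10.0, 5.0), (10.1, 50.0), (10.2, 6.0)],
    "s2": [(10.0, 4.0), (10.1, 40.0), (10.2, 3.0)],
    "s3": [(10.1, 8.0)],
}
samples, matrix = build_roi_matrix(traces, 10.0, 10.2, 0.1)
apex = apex_column(column_profile(matrix))
```

Because every sample shares the same column index for a given time bin, a peak found once in the stacked matrix can be quantified in every row, including the low-intensity sample, without a separate peak-matching step.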
Affiliation(s)
- Jun Yang
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Pengwei Guan
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Di Yu
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Qi Li
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Xiaolin Wang
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Guowang Xu
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xinyu Liu
  - State Key Laboratory of Medical Proteomics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
2. Gong Y, Dai Y, Wu Q, Guo L, Yao X, Yang Q. Benchmark of Data Integration in Single-Cell Proteomics. Anal Chem 2025; 97:1254-1263. [PMID: 39761355 DOI: 10.1021/acs.analchem.4c04933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Single-cell proteomics (SCP) data acquired with different technologies always contain batch-specific variation because of differences in sample processing and other potential biases. How to integrate SCP data effectively has become a great challenge. Integrating SCP data requires both conserving true biological variance and removing unwanted batch effects. In this study, a benchmarking analysis of popular data integration methods was conducted to determine the most suitable method for SCP data. To comprehensively evaluate the performance of these integration methods, a novel evaluation system was proposed for integrating SCP data. This evaluation system consists of three objective measures from different perspectives: category (a), the efficacy of correcting batch effects; category (b), the power of conserving biological variances; and category (c), the ability to identify consistent markers. For this comprehensive evaluation, five benchmark data sets under different scenarios (containing substantial proteins, substantial cells, multiple batches, multiple cell types, and unbalanced data) were used to select the most suitable data integration method. As a result, three methods, ComBat, Scanorama, and Seurat version 3 CCA, were identified as the most recommended methods for integrating SCP data. Overall, this systematic evaluation might provide valuable guidance in choosing the appropriate method for data integration in SCP.
Affiliation(s)
- Yaguo Gong
  - School of Pharmacy, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao 999078, China
- Yangbo Dai
  - State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Qibiao Wu
  - School of Pharmacy, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao 999078, China
- Li Guo
  - State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Xiaojun Yao
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
3. Hansen J, Kunert C, Raezke KP, Seifert S. Detection of Sugar Syrups in Honey Using Untargeted Liquid Chromatography-Mass Spectrometry and Chemometrics. Metabolites 2024; 14:633. [PMID: 39590869 PMCID: PMC11596609 DOI: 10.3390/metabo14110633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 11/28/2024] Open
Abstract
Background: Honey is one of the most adulterated foods worldwide, and several analytical methods have been developed over the last decade to detect syrup additions to honey. These include approaches based on stable isotopes and the specific detection of individual marker compounds or foreign enzymes. Proton nuclear magnetic resonance (1H-NMR) spectroscopy is applied as a rapid and comprehensive screening method, which also enables the detection of quality parameters and the analysis of the geographical and botanical origin. However, especially for the detection of foreign sugars, 1H-NMR has insufficient sensitivity. Methods: Since untargeted liquid chromatography-mass spectrometry (LC-MS) is more sensitive, we used this approach for the detection of positive and negative ions in combination with a recently developed data processing workflow for routine laboratories based on bucketing and random forest for the detection of rice, beet and high-fructose corn syrup in honey. Results: We show that the distinction between pure and adulterated honey is possible for all three syrups, with classification accuracies ranging from 98 to 100%, while the accuracy of the syrup content estimation depends on the respective syrup. For rice and beet syrup, the deviations from the true proportion were in the single-digit percentage range, while for high-fructose corn syrup they were much higher, in some cases exceeding 20%. Conclusions: The approach presented here is very promising for the robust and sensitive detection of syrup in honey applied in routine laboratories.
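The abstract mentions a workflow based on bucketing and random forest but does not describe the bucketing itself. The sketch below shows only the general idea of bucketing under stated assumptions: the function name, the bucket widths, and the choice to sum intensities per retention time and m/z bucket are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def bucket_signals(signals: Iterable[Tuple[float, float, float]],
                   rt_width: float, mz_width: float) -> Dict[Tuple[int, int], float]:
    """Sum intensities of (rt, mz, intensity) signals on a fixed rt x m/z grid.

    Each signal is assigned to the bucket whose index is the floor of its
    coordinate divided by the bucket width (illustrative choice).
    """
    buckets: Dict[Tuple[int, int], float] = defaultdict(float)
    for rt, mz, intensity in signals:
        key = (math.floor(rt / rt_width), math.floor(mz / mz_width))
        buckets[key] += intensity
    return dict(buckets)

# Toy signals from one measurement: two fall into the same bucket.
signals = [(1.25, 200.04, 10.0), (1.30, 200.06, 5.0), (3.10, 350.52, 7.0)]
buckets = bucket_signals(signals, rt_width=0.5, mz_width=0.1)
```

Bucketing every sample on the same grid yields feature vectors of equal length, which is what a classifier such as a random forest needs as input. Small shifts in retention time or m/z that stay within one bucket do not change the resulting vector.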
Affiliation(s)
- Jule Hansen
  - Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Christof Kunert
  - Eurofins Food Integrity Control Services GmbH, Berliner Str. 2, 27721 Ritterhude, Germany
- Kurt-Peter Raezke
  - Eurofins Food Integrity Control Services GmbH, Berliner Str. 2, 27721 Ritterhude, Germany
- Stephan Seifert
  - Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
4. Hansen J, Kunert C, Münstermann H, Raezke KP, Seifert S. Application of untargeted liquid chromatography-mass spectrometry to routine analysis of food using three-dimensional bucketing and machine learning. Sci Rep 2024; 14:16594. [PMID: 39026016 PMCID: PMC11258308 DOI: 10.1038/s41598-024-67459-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 07/11/2024] [Indexed: 07/20/2024] Open
Abstract
For the detection of food adulteration, sensitive and reproducible analytical methods are required. Liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) is a highly sensitive method that can be used to obtain analytical fingerprints consisting of a variety of different components. Since the comparability of measurements carried out with different devices and at different times is not given, specific adulterants are usually detected in targeted analyses instead of analyzing the entire fingerprint. However, this comprehensive analysis is desirable in order to stay ahead in the race against food fraudsters, who are constantly adapting their adulterations to the latest state of the art in analytics. We have developed and optimized an approach that enables the separate processing of untargeted LC‑HRMS data obtained from different devices and at different times. We demonstrate this by the successful determination of the geographical origin of honey samples using a random forest model. We then show that this approach can be applied to develop a continuously learning classification model and our final model, based on data from 835 samples, achieves a classification accuracy of 94% for 126 test samples from 6 different countries.
Affiliation(s)
- Jule Hansen
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
- Christof Kunert
  - Eurofins Food Integrity Control Services GmbH, Berliner Str. 2, 27721, Ritterhude, Germany
- Hella Münstermann
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
- Kurt-Peter Raezke
  - Eurofins Food Integrity Control Services GmbH, Berliner Str. 2, 27721, Ritterhude, Germany
- Stephan Seifert
  - Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
5. Cui Y, Hu M, Zhou H, Guo J, Wang Q, Xu Z, Chen L, Zhang W, Tang S. Identifying potential drug targets for varicose veins through integration of GWAS and eQTL summary data. Front Genet 2024; 15:1385293. [PMID: 38818040 PMCID: PMC11138158 DOI: 10.3389/fgene.2024.1385293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 04/17/2024] [Indexed: 06/01/2024] Open
Abstract
Background: Varicose veins (VV) are a common chronic venous disease that is influenced by multiple factors. It affects the quality of life of patients and imposes a huge economic burden on the healthcare system. This study aimed to use integrated analysis methods, including Mendelian randomization analysis, to identify potential pathogenic genes and drug targets for VV treatment. Methods: This study conducted Summary-data-based Mendelian Randomization (SMR) analysis and colocalization analysis on data collected from genome-wide association studies and cis-expression quantitative trait loci databases. Only genes with PP.H4 > 0.7 in colocalization were chosen from the significant SMR results. After the above analysis, we screened 12 genes and performed Mendelian Randomization (MR) analysis on them. After sensitivity analysis, we identified four genes with potential causal relationships with VV. Finally, we used transcriptome-wide association studies and The Drug-Gene Interaction Database data to identify and screen the remaining genes and identified four drug targets for the treatment of VV. Results: We identified four genes significantly associated with VV, namely, KRTAP5-AS1 [odds ratio (OR) = 1.08, 95% confidence interval (CI): 1.05-1.11, p = 1.42e-10], PLEKHA5 (OR = 1.13, 95% CI: 1.06-1.20, p = 6.90e-5), CBWD1 (OR = 1.05, 95% CI: 1.01-1.11, p = 1.42e-2), and CRIM1 (OR = 0.87, 95% CI: 0.81-0.95, p = 3.67e-3). Increased expression of three genes, namely, KRTAP5-AS1, PLEKHA5, and CBWD1, was associated with increased risk of the disease, and increased expression of CRIM1 was associated with decreased risk of the disease. These four genes could be targeted for VV therapy. Conclusion: We identified four potential causal proteins for varicose veins with MR. A comprehensive analysis indicated that KRTAP5-AS1, PLEKHA5, CBWD1, and CRIM1 might be potential drug targets for varicose veins.
Affiliation(s)
- Yu Cui
  - Shantou University Medical College, Shantou, Guangdong, China
- Mengting Hu
  - Shantou University Medical College, Shantou, Guangdong, China
- He Zhou
  - Shantou University Medical College, Shantou, Guangdong, China
- Jiarui Guo
  - Shantou University Medical College, Shantou, Guangdong, China
- Qijia Wang
  - Shantou University Medical College, Shantou, Guangdong, China
- Zaihua Xu
  - Shantou University Medical College, Shantou, Guangdong, China
- Liyun Chen
  - Research Center of Translational Medicine, Second Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, China
  - Plastic Surgery Institute of Shantou University Medical College, Shantou, Guangdong, China
  - Shantou Plastic Surgery Clinical Research Center, Shantou, Guangdong, China
- Wancong Zhang
  - Plastic Surgery Institute of Shantou University Medical College, Shantou, Guangdong, China
  - Shantou Plastic Surgery Clinical Research Center, Shantou, Guangdong, China
  - Department of Plastic Surgery and Burns Center, Second Affiliated Hospital, Shantou University Medical College, Shantou, Guangdong, China
- Shijie Tang
  - Plastic Surgery Institute of Shantou University Medical College, Shantou, Guangdong, China
  - Shantou Plastic Surgery Clinical Research Center, Shantou, Guangdong, China
  - Department of Plastic Surgery and Burns Center, Second Affiliated Hospital, Shantou University Medical College, Shantou, Guangdong, China
6. Fu Y, Wang C, Wu Z, Zhang X, Liu Y, Wang X, Liu F, Chen Y, Zhang Y, Zhao H, Wang Q. Discovery of the potential biomarkers for early diagnosis of endometrial cancer via integrating metabolomics and transcriptomics. Comput Biol Med 2024; 173:108327. [PMID: 38552279 DOI: 10.1016/j.compbiomed.2024.108327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/07/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
Endometrial cancer (EC) is one of the most common malignant tumors in women, and its increasing incidence and mortality pose a serious threat to public health. Early diagnosis of EC could prolong survival and improve survivorship, greatly alleviating patients' suffering and the burden on the healthcare system. In this study, we collected urine and serum samples from the recruited patients, analyzed the samples using an LC-MS approach, and identified the differential metabolites through metabolomic analysis. Then, the differentially expressed genes were identified through systematic transcriptomic analysis of an EC-related dataset from Gene Expression Omnibus (GEO), followed by network profiling of metabolic-reaction-enzyme-gene. In this experiment, a total of 83 differential metabolites and 19 hub genes were discovered, of which 10 differential metabolites and 3 hub genes were further evaluated as the more promising potential biomarkers based on network analysis. According to the KEGG enrichment analysis, the potential biomarkers and gene-encoded proteins were found to be involved in arginine and proline metabolism, histidine metabolism, and pyrimidine metabolism, which is of significance for the early diagnosis of EC. In particular, the combination of metabolites (histamine, 1-methylhistamine, and methylimidazole acetaldehyde) as well as the combination of RRM2, TYMS and TK1 discriminated more accurately between EC and healthy groups, providing more criteria for the early diagnosis of EC.
Affiliation(s)
- Yan Fu
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
  - Core Facilities and Centers, Hebei Medical University, Shijiazhuang, 050017, China
- Chengzhao Wang
  - College of Basic Medicine, Hebei Medical University, Shijiazhuang, 050017, China
- Zhimin Wu
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Xiaoguang Zhang
  - Core Facilities and Centers, Hebei Medical University, Shijiazhuang, 050017, China
  - College of Basic Medicine, Hebei Medical University, Shijiazhuang, 050017, China
- Yan Liu
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Xu Wang
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Fangfang Liu
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Yujuan Chen
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Yang Zhang
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
- Huanhuan Zhao
  - Department of Obstetrics and Gynecology, The Fourth Hospital of Hebei Medical University, 050011, China
- Qiao Wang
  - School of Pharmacy, Hebei Medical University, Shijiazhuang, 050017, China
7. Zhang W, Mou M, Hu W, Lu M, Zhang H, Zhang H, Luo Y, Xu H, Tao L, Dai H, Gao J, Zhu F. MOINER: A Novel Multiomics Early Integration Framework for Biomedical Classification and Biomarker Discovery. J Chem Inf Model 2024; 64:2720-2732. [PMID: 38373720 DOI: 10.1021/acs.jcim.4c00013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/21/2024]
Abstract
In the context of precision medicine, multiomics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. One commonly used integration method is early integration through concatenation of multiple dimensionally reduced omics matrices due to its simplicity and ease of implementation. However, this approach is seriously limited by information loss and lack of latent feature interaction. Herein, a novel multiomics early integration framework (MOINER) based on information enhancement and image representation learning is presented to address these challenges. MOINER employs the self-attention mechanism to capture the intrinsic correlations of omics-features, which makes it significantly outperform the existing state-of-the-art methods for multiomics data integration. Moreover, visualizing the attention embedding and identifying potential biomarkers offer interpretable insights into the prediction results. All source codes and the model for MOINER are freely available at https://github.com/idrblab/MOINER.
Affiliation(s)
- Wei Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wei Hu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongning Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Hongquan Xu
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Haibin Dai
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
8. Tang J, Mou M, Zheng X, Yan J, Pan Z, Zhang J, Li B, Yang Q, Wang Y, Zhang Y, Gao J, Li S, Yang H, Zhu F. Strategy for Identifying a Robust Metabolomic Signature Reveals the Altered Lipid Metabolism in Pituitary Adenoma. Anal Chem 2024; 96:4745-4755. [PMID: 38417094 DOI: 10.1021/acs.analchem.3c03796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Despite the well-established connection between systematic metabolic abnormalities and the pathophysiology of pituitary adenoma (PA), current metabolomic studies have reported an extremely limited number of metabolites associated with PA. Moreover, there was very little consistency in the identified metabolite signatures, resulting in a lack of robust metabolic biomarkers for the diagnosis and treatment of PA. Herein, we performed a global untargeted plasma metabolomic profiling on PA and identified a highly robust metabolomic signature based on a strategy. Specifically, this strategy is unique in (1) integrating repeated random sampling and a consensus evaluation-based feature selection algorithm and (2) evaluating the consistency of metabolomic signatures among different sample groups. This strategy demonstrated superior robustness and stronger discriminative ability compared with that of other feature selection methods including Student's t-test, partial least-squares-discriminant analysis, support vector machine recursive feature elimination, and random forest recursive feature elimination. More importantly, a highly robust metabolomic signature comprising 45 PA-specific differential metabolites was identified. Moreover, metabolite set enrichment analysis of these potential metabolic biomarkers revealed altered lipid metabolism in PA. In conclusion, our findings contribute to a better understanding of the metabolic changes in PA and may have implications for the development of diagnostic and therapeutic approaches targeting lipid metabolism in PA. We believe that the proposed strategy serves as a valuable tool for screening robust, discriminating metabolic features in the field of metabolomics.
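The first element of the strategy, repeated random sampling combined with consensus-based feature selection, can be sketched as follows. This is an illustration only and not the authors' algorithm: the ranking statistic (absolute difference of group means), the parameter names, and the default values are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean
from typing import Dict, List

Sample = Dict[str, float]  # feature name -> abundance

def consensus_features(group_a: List[Sample], group_b: List[Sample],
                       n_rounds: int = 50, fraction: float = 0.8,
                       top_k: int = 1, min_frequency: float = 0.9,
                       seed: int = 0) -> List[str]:
    """Keep features that rank in the top_k in at least min_frequency of rounds.

    Each round draws a random subsample of both groups and ranks features
    by the absolute difference of group means (a stand-in statistic).
    """
    rng = random.Random(seed)
    names = sorted(group_a[0])
    counts = {name: 0 for name in names}
    size_a = max(1, int(len(group_a) * fraction))
    size_b = max(1, int(len(group_b) * fraction))
    for _ in range(n_rounds):
        sub_a = rng.sample(group_a, size_a)
        sub_b = rng.sample(group_b, size_b)
        scores = {n: abs(mean(s[n] for s in sub_a) - mean(s[n] for s in sub_b))
                  for n in names}
        ranked = sorted(names, key=lambda n: scores[n], reverse=True)
        for n in ranked[:top_k]:
            counts[n] += 1
    return sorted(n for n in names if counts[n] / n_rounds >= min_frequency)

# Toy data: only f0 separates the two groups clearly.
patients = [{"f0": 10.0 + i, "f1": 5.0, "f2": 1.0} for i in range(5)]
controls = [{"f0": 0.0 + i, "f1": 5.0, "f2": 1.1} for i in range(5)]
signature = consensus_features(patients, controls)
```

A feature that ranks highly only in a few subsamples is dropped, so the returned signature depends less on any individual sample than a single ranking on the full data set would.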
Affiliation(s)
- Jing Tang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xin Zheng
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Jin Yan
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jinsong Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Bo Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Qingxia Yang
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Ying Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Song Li
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Hui Yang
  - Multidisciplinary Center for Pituitary Adenoma of Chongqing, Department of Neurosurgery, Xinqiao Hospital, Army Medical University, Chongqing 400037, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
9. Zheng T, Zheng Z, Zhou H, Guo Y, Li S. The multifaceted roles of COL4A4 in lung adenocarcinoma: An integrated bioinformatics and experimental study. Comput Biol Med 2024; 170:107896. [PMID: 38217972 DOI: 10.1016/j.compbiomed.2023.107896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 12/03/2023] [Accepted: 12/23/2023] [Indexed: 01/15/2024]
Abstract
Background: Abnormal expression of collagen IV subunits has been reported in cancers, but its significance is not clear. No study has reported the significance of COL4A4 in lung adenocarcinoma (LUAD). Methods: COL4A4 expression data, single-cell sequencing data and clinical data were downloaded from public databases. A range of bioinformatics and experimental methods were adopted to analyze the association of COL4A4 expression with clinical parameters, tumor microenvironment (TME), drug resistance and immunotherapy response, and to investigate the roles and underlying mechanism of COL4A4 in LUAD. Results: COL4A4 is differentially expressed in most of the cancers analyzed, being associated with prognosis, tumor stemness, immune checkpoint gene expression and TME parameters. In LUAD, COL4A4 expression is down-regulated and associated with various TME parameters, response to immunotherapy and drug resistance. LUAD patients with lower COL4A4 have worse prognosis. Knockdown of COL4A4 significantly inhibited the expression of cell-cycle associated genes, and the expression and activation of signaling pathways including JAK/STAT3, p38, and ERK pathways, and induced quiescence in LUAD cells. Besides, it significantly induced the expression of a range of bioactive molecule genes that have been shown to have critical roles in TME remodeling and immune regulation. Conclusions: COL4A4 is implicated in the pathogenesis of cancers including LUAD. Its function may be multifaceted. It can modulate the activity of LUAD cells, TME remodeling and tumor stemness, thus affecting the pathological process of LUAD. COL4A4 may be a prognostic molecular marker and a potential therapeutic target.
Affiliation(s)
- Tiaozhan Zheng
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region, 530021, PR China
- Zhiwen Zheng
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region, 530021, PR China
- Hanxi Zhou
  - Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai, Zhejiang Province, PR China
- Yiqing Guo
  - Department of Pathology, Taizhou Hospital, Wenzhou Medical University, Linhai, Zhejiang Province, PR China
- Shikang Li
  - Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Zhuang Autonomous Region, 530021, PR China
10. Yang Q, Chen S, Jiang W, Mi L, Liu J, Hu Y, Ji X, Wang J, Zhu F. MultiClassMetabo: A Superior Classification Model Constructed Using Metabolic Markers in Multiclass Metabolomics. Anal Chem 2024; 96:1410-1418. [PMID: 38221713 DOI: 10.1021/acs.analchem.3c03212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Multiclass metabolomics has become a popular technique for revealing the mechanisms underlying certain physiological processes, different tumor types, or different therapeutic responses. In multiclass metabolomics, it is highly important to uncover the underlying biological information on biosamples by identifying the metabolic markers with the most associations and classifying the different sample classes. The classification problem of multiclass metabolomics is more difficult than that of the binary problem. To date, various methods exist for constructing classification models and identifying metabolic markers consisting of well-established techniques and newly emerging machine learning algorithms. However, how to construct a superior classification model using these methods remains unclear for a given multiclass metabolomic data set. Herein, MultiClassMetabo has been developed for constructing a superior classification model using metabolic markers identified in multiclass metabolomics. MultiClassMetabo can enable online services, including (a) identifying metabolic markers by marker identification methods, (b) constructing classification models by classification methods, and (c) performing a comprehensive assessment from multiple perspectives to construct a superior classification model for multiclass metabolomics. In summary, MultiClassMetabo is distinguished for its capability to construct a superior classification model using the most appropriate method through a comprehensive assessment, which makes it an important complement to other available tools in multiclass metabolomics. MultiClassMetabo can be accessed at http://idrblab.cn/multiclassmetabo/.
Affiliation(s)
- Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Shuman Chen
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Wenyu Jiang
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Lan Mi
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jiarui Liu
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Yu Hu
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Xinglai Ji
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Jun Wang
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
|
11
|
Wang J, Zhang L, Sun J, Yang X, Wu W, Chen W, Zhao Q. Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment due to its potential to cause liver dysfunction or damage, which, in severe cases, can lead to liver failure or even fatality. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, it is imperative to devise methodologies and tools for anticipatory assessment of DILI risk in the initial phases of drug development. In this study, we present DMFPGA, a novel deep learning predictive model designed to predict DILI. To provide a comprehensive description of molecular properties, we employ a multi-head graph attention mechanism to extract features from the molecular graphs, representing characteristics at the level of compound nodes. Additionally, we combine multiple fingerprints of molecules to capture features at the molecular level of compounds. The fusion of molecular fingerprints and graph features can more fully express the properties of compounds. Subsequently, we employ a fully connected neural network to classify compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct a 5-fold cross-validation experiment. The obtained results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, exhibiting an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA is helpful for early-stage DILI prediction and assessment in drug development.
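The AUC reported above can be read as the probability that a randomly chosen DILI-positive compound receives a higher score than a randomly chosen DILI-negative one. A minimal sketch of that pairwise computation, and of averaging it over cross-validation folds, might look as follows. The labels, scores, and folds are made-up illustrations rather than data from the study, and `roc_auc` is a hypothetical helper name, not part of DMFPGA.

```python
from statistics import fmean


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for positive, 0 for negative.
    scores: predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two illustrative held-out folds (hypothetical labels and scores).
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
]
fold_aucs = [roc_auc(y, s) for y, s in folds]
mean_auc = fmean(fold_aucs)
```

A 5-fold experiment would produce five such per-fold values; the average is what an "average AUC" summarizes.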
Affiliation(s)
- Jifeng Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Li Zhang
- School of Life Science, Liaoning University, Shenyang 110036, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
|
12
|
Gong Y, Ding W, Wang P, Wu Q, Yao X, Yang Q. Evaluating Machine Learning Methods of Analyzing Multiclass Metabolomics. J Chem Inf Model 2023; 63:7628-7641. [PMID: 38079572 DOI: 10.1021/acs.jcim.3c01525] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/26/2023]
Abstract
Multiclass metabolomic studies have become popular for revealing the differences in multiple stages of complex diseases, various lifestyles, or the effects of specific treatments. In multiclass metabolomics, there are multiple data manipulation steps for analyzing raw data, which consist of data filtering, the imputation of missing values, data normalization, marker identification, sample separation, classification, and so on. In each step, several to dozens of machine learning methods can be chosen for the given data set, with potentially hundreds or thousands of method combinations in the whole data processing chain. Therefore, a clear understanding of these machine learning methods is helpful for selecting an appropriate method combination for obtaining stable and reliable analytical results of specific data. However, there has rarely been an overall introduction or evaluation of these methods based on multiclass metabolomic data. Herein, detailed descriptions of these machine learning methods in multiple data manipulation steps are reviewed. Moreover, an assessment of these methods was performed using a benchmark data set for multiclass metabolomics. First, 12 imputation methods for imputing missing values were evaluated based on the PSS (Procrustes statistical shape analysis) and NRMSE (normalized root-mean-square error) values. Second, 17 normalization methods for processing multiclass metabolomic data were evaluated by applying the PMAD (pooled median absolute deviation) value. Third, different methods of identifying markers of multiclass metabolomics were evaluated based on the CWrel (relative weighted consistency) value. Fourth, nine classification methods for constructing multiclass models were assessed using the AUC (area under the curve) value. Performance evaluations of machine learning methods are highly recommended to select the most appropriate method combination before performing the final analysis of the given data. Overall, detailed descriptions and evaluation of various machine learning methods are expected to improve analyses of multiclass metabolomic data.
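The NRMSE criterion mentioned for the imputation methods can be illustrated with a small sketch. It assumes one common definition (the square root of the mean squared imputation error divided by the variance of the true values); the review may normalize differently, and the function name and sample values below are hypothetical.

```python
import math
from statistics import fmean, pvariance


def nrmse(true_values, imputed_values):
    """Normalized root-mean-square error between true and imputed entries.

    Assumed definition: sqrt(mean((imputed - true)^2) / var(true)),
    evaluated only on the entries that were masked and then imputed.
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    variance = pvariance(true_values)
    if variance == 0:
        raise ValueError("true values have zero variance")
    mse = fmean([(t - i) ** 2 for t, i in zip(true_values, imputed_values)])
    return math.sqrt(mse / variance)


# Hypothetical intensities that were masked, then imputed two ways.
truth = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]    # ideal imputation
mean_fill = [2.5, 2.5, 2.5, 2.5]  # every gap filled with the mean

score_perfect = nrmse(truth, perfect)
score_mean_fill = nrmse(truth, mean_fill)
```

Under this definition a perfect imputation scores 0, and filling every gap with the mean of the true values scores 1, so useful methods are expected to fall below 1.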
Affiliation(s)
- Yaguo Gong
- State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Wei Ding
- State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Panpan Wang
- College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Qibiao Wu
- State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
|
13
|
Moslemi A, Ahmadian A. Dual regularized subspace learning using adaptive graph learning and rank constraint: Unsupervised feature selection on gene expression microarray datasets. Comput Biol Med 2023; 167:107659. [PMID: 37950946 DOI: 10.1016/j.compbiomed.2023.107659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/13/2023] [Accepted: 10/31/2023] [Indexed: 11/13/2023]
Abstract
High-dimensional problems have increasingly drawn attention in gene selection and analysis. Compounding the problem, the number of features usually exceeds the number of samples in microarray gene datasets, which leads to an ill-posed, underdetermined system of equations. Poor performance and high computational time for learning algorithms are consequences of redundant features in high-dimensional data. Feature selection is a noteworthy pre-processing method to ameliorate the curse of dimensionality, with the aim of preserving information of maximum relevancy and minimum redundancy. Likewise, unsupervised feature selection has been important since collecting labels for data is expensive. In this paper, we develop a novel, robust unsupervised feature selection method to select a discriminative subset of features for unlabeled data based on rank-constrained and dual-regularized nonnegative matrix factorization. The major focus of the proposed technique is to discard redundant features while keeping the informative features. The proposed feature selection technique consists of nonnegative matrix factorization to decompose the data into a feature weight matrix and a representation matrix, an inner product norm as regularization for both the feature weight matrix and the representation matrix, adaptive structure learning to preserve local information, and the Schatten-p norm as a rank constraint. To demonstrate the effectiveness of the proposed method, numerical studies are conducted on six benchmark microarray datasets. The results show that the proposed technique outperforms eight state-of-the-art unsupervised feature selection techniques in terms of clustering accuracy and normalized mutual information.
Affiliation(s)
- Amir Moslemi
- Imaging Research and Physical Sciences, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Arash Ahmadian
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
|
14
|
Manochkumar J, Cherukuri AK, Kumar RS, Almansour AI, Ramamoorthy S, Efferth T. A critical review of machine-learning for "multi-omics" marine metabolite datasets. Comput Biol Med 2023; 165:107425. [PMID: 37696182 DOI: 10.1016/j.compbiomed.2023.107425] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/30/2023] [Revised: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/13/2023]
Abstract
During the last decade, genomic, transcriptomic, proteomic, metabolomic, and other omics datasets have been generated for a wide range of marine organisms, and even more are still on the way. Marine organisms possess unique and diverse biosynthetic pathways contributing to the synthesis of novel secondary metabolites with significant bioactivities. As marine organisms have a greater tendency to adapt to stressed environmental conditions, the chance to identify novel bioactive metabolites with potential biotechnological application is very high. This review presents a comprehensive overview of the available "-omics" and "multi-omics" approaches employed for characterizing marine metabolites along with novel data integration tools. The need for the development of machine-learning algorithms for "multi-omics" approaches is briefly discussed. In addition, the challenges involved in the analysis of "multi-omics" data and recommendations for conducting "multi-omics" studies are discussed.
Affiliation(s)
- Janani Manochkumar
- School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Aswani Kumar Cherukuri
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, 632014, India
- Raju Suresh Kumar
- Department of Chemistry, College of Science, King Saud University, P. O. Box 2455, Riyadh, 11451, Saudi Arabia
- Abdulrahman I Almansour
- Department of Chemistry, College of Science, King Saud University, P. O. Box 2455, Riyadh, 11451, Saudi Arabia
- Siva Ramamoorthy
- School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Thomas Efferth
- Department of Pharmaceutical Biology, Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg University, Mainz, Germany
|
15
|
Meng R, Yin S, Sun J, Hu H, Zhao Q. scAAGA: Single cell data analysis framework using asymmetric autoencoder with gene attention. Comput Biol Med 2023; 165:107414. [PMID: 37660567 DOI: 10.1016/j.compbiomed.2023.107414] [Citation(s) in RCA: 63] [Impact Index Per Article: 31.5] [Received: 07/02/2023] [Revised: 08/02/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
In recent years, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for investigating cellular heterogeneity and structure. However, analyzing scRNA-seq data remains challenging, especially in the context of COVID-19 research. Single-cell clustering is a key step in analyzing scRNA-seq data, and deep learning methods have shown great potential in this area. In this work, we propose a novel scRNA-seq analysis framework called scAAGA. Specifically, we utilize an asymmetric autoencoder with a gene attention module to learn important gene features adaptively from scRNA-seq data, with the aim of improving the clustering effect. We apply scAAGA to COVID-19 peripheral blood mononuclear cell (PBMC) scRNA-seq data and compare its performance with state-of-the-art methods. Our results consistently demonstrate that scAAGA outperforms existing methods in terms of adjusted rand index (ARI), normalized mutual information (NMI), and adjusted mutual information (AMI) scores, achieving improvements ranging from 2.8% to 27.8% in NMI scores. Additionally, we discuss a data augmentation technology to expand the datasets and improve the accuracy of scAAGA. Overall, scAAGA presents a robust tool for scRNA-seq data analysis, enhancing the accuracy and reliability of clustering results in COVID-19 research.
Affiliation(s)
- Rui Meng
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Shuaidong Yin
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou, 350108, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
|
16
|
Hao JD, Chen YY, Wang YZ, An N, Bai PR, Zhu QF, Feng YQ. Novel Peak Shift Correction Method Based on the Retention Index for Peak Alignment in Untargeted Metabolomics. Anal Chem 2023; 95:13330-13337. [PMID: 37609864 DOI: 10.1021/acs.analchem.3c02583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Peak alignment is a crucial step in liquid chromatography-mass spectrometry (LC-MS)-based large-scale untargeted metabolomics workflows, as it enables the integration of metabolite peaks across multiple samples, which is essential for accurate data interpretation. Slight differences or fluctuations in chromatographic separation conditions, however, can cause the chromatographic retention time (RT) to shift between consecutive analyses, ultimately affecting the accuracy of peak alignment between samples. Here, we introduce a novel RT shift correction method based on the retention index (RI) and apply it to peak alignment. We synthesized a series of N-acyl glycine (C2-C23) homologues via the amidation reaction of glycine with normal saturated fatty acids (C2-C23) as calibrants able to respond proficiently in both mass spectrometric positive- and negative-ion modes. Using these calibrants, we established an N-acyl glycine RI system. This RI system is capable of covering a broad chromatographic space and addressing chromatographic RT shift caused by variations in flow rate, gradient elution, instrument systems, and LC separation columns. Moreover, based on the RI system, we developed a peak shift correction model to enhance peak alignment accuracy. Applying the model resulted in a significant improvement in the accuracy of peak alignment from 15.5 to 80.9% across long-term data spanning a period of 157 days. To facilitate practical application, we developed a Python-based program, which is freely available at https://github.com/WHU-Fenglab/RI-based-CPSC.
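The idea of replacing retention times with retention indices can be sketched with the classic linear interpolation between two bracketing homologue calibrants. This is an assumption for illustration only: the abstract does not state the formula used, the program at the URL above may work differently, and the calibrant retention times and the name `retention_index` below are hypothetical.

```python
from bisect import bisect_right


def retention_index(rt, calibrants):
    """Linear retention index of a peak from bracketing calibrants.

    calibrants: list of (carbon_number, retention_time) pairs sorted by
    retention time, e.g. a homologous series measured in the same run.
    Returns 100 * interpolated carbon number.
    """
    if len(calibrants) < 2:
        raise ValueError("need at least two calibrants")
    times = [t for _, t in calibrants]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    # Index of the first calibrant eluting after rt (clamped to the last one).
    hi = min(bisect_right(times, rt), len(times) - 1)
    lo = hi - 1
    (n_lo, t_lo), (n_hi, t_hi) = calibrants[lo], calibrants[hi]
    return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))


# Hypothetical runs: in run B every compound elutes 20% later than in run A.
run_a = [(2, 1.0), (3, 2.0), (4, 4.0)]
run_b = [(2, 1.2), (3, 2.4), (4, 4.8)]

ri_a = retention_index(1.5, run_a)  # peak observed at 1.5 min in run A
ri_b = retention_index(1.8, run_b)  # same compound at 1.8 min in run B
```

In this toy case the two retention times differ, but both map to the same index, which is why matching peaks on RI can be more stable across runs than matching on raw RT.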
Affiliation(s)
- Jun-Di Hao
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Yao-Yu Chen
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Yan-Zhen Wang
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Na An
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Pei-Rong Bai
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Quan-Fei Zhu
- School of Public Health, Wuhan University, Wuhan 430071, China
- Yu-Qi Feng
- Department of Chemistry, Wuhan University, Wuhan 430072, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan 430071, China
|
17
|
Bernardini M, Doinychko A, Romeo L, Frontoni E, Amini MR. A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets. Comput Biol Med 2023; 163:107188. [PMID: 37393785 DOI: 10.1016/j.compbiomed.2023.107188] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/27/2023] [Revised: 06/13/2023] [Accepted: 06/19/2023] [Indexed: 07/04/2023]
Abstract
The missing data mechanism is a relevant problem in the Machine Learning (ML) and biomedical informatics communities. Real-world Electronic Health Record (EHR) datasets contain many missing values, resulting in a high level of spatiotemporal sparsity in the predictors' matrix. Several state-of-the-art approaches have tried to deal with this problem by proposing data imputation strategies that (i) are often unrelated to the ML model, (ii) are not conceived for EHR data, where laboratory exams are not prescribed uniformly over time and the percentage of missing values is high, and (iii) exploit only univariate and linear information on the observed features. Our paper proposes a data imputation strategy based on a clinical conditional Generative Adversarial Network (ccGAN) capable of imputing missing values by exploiting non-linear and multivariate information across patients. Unlike other GAN-based data imputation approaches, our method deals explicitly with the high level of missingness of routine EHR data by conditioning the imputing strategy on the observable values and those that are fully annotated. We demonstrated the statistical significance of the ccGAN's gains over other state-of-the-art approaches in terms of imputation (a gain of around 19.79% over the best competitor) and predictive performance (a gain of up to 1.60% over the best competitor) on a real dataset from multiple diabetes centers. We also demonstrated its robustness across different missingness rates (a gain of up to 1.61% over the best competitor in the highest missingness rate condition) on an additional benchmark EHR dataset.
Affiliation(s)
- Michele Bernardini
- Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy
- Anastasiia Doinychko
- Grenoble Informatics Laboratory, Université Grenoble Alpes, Saint-Martin-d'Hères, France
- Luca Romeo
- Department of Economics and Law, University of Macerata, Macerata, Italy
- Emanuele Frontoni
- Department of Political Sciences, Communication and International Relations, University of Macerata, Macerata, Italy
- Massih-Reza Amini
- Grenoble Informatics Laboratory, Université Grenoble Alpes, Saint-Martin-d'Hères, France
|
18
|
Liang S, Zhao Y, Jin J, Qiao J, Wang D, Wang Y, Wei L. Rm-LR: A long-range-based deep learning model for predicting multiple types of RNA modifications. Comput Biol Med 2023; 164:107238. [PMID: 37515874 DOI: 10.1016/j.compbiomed.2023.107238] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent research has highlighted the pivotal role of RNA post-transcriptional modifications in the regulation of RNA expression and function. Accurate identification of RNA modification sites is important for understanding RNA function. In this study, we propose a novel RNA modification prediction method, namely Rm-LR, which leverages a long-range-based deep learning approach to accurately predict multiple types of RNA modifications using RNA sequences only. Rm-LR incorporates two large-scale pre-trained RNA language models to capture discriminative sequential information and learn important local features, which are subsequently integrated through a bilinear attention network. Rm-LR supports a total of ten RNA modification types (m6A, m1A, m5C, m5U, m6Am, Ψ, Am, Cm, Gm, and Um) and significantly outperforms the state-of-the-art methods in terms of predictive capability on benchmark datasets. Experimental results show the effectiveness and superiority of Rm-LR in the prediction of various RNA modifications, demonstrating the strong adaptability and robustness of our proposed model. We demonstrate that pre-trained RNA language models can learn dense biological sequential representations from large-scale long-range RNA corpora while also enhancing the interpretability of the models. This work contributes to the development of accurate and reliable computational models for RNA modification prediction, providing insights into the complex landscape of RNA modifications.
Affiliation(s)
- Sirui Liang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yanxi Zhao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Junru Jin
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Jianbo Qiao
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Ding Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Yu Wang
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
- Leyi Wei
- School of Software, Shandong University, Jinan, 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, 250101, China
|
19
|
Vahabzadeh V, Moattar MH. Robust microarray data feature selection using a correntropy based distance metric learning approach. Comput Biol Med 2023; 161:107056. [PMID: 37235945 DOI: 10.1016/j.compbiomed.2023.107056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/18/2023] [Accepted: 05/20/2023] [Indexed: 05/28/2023]
Abstract
Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. One of the challenging issues of feature selection is the presence of outliers. The Euclidean distance metric is sensitive to outliers. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. For this purpose, the metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach signifies the most important and robust features. After feature selection, different classification methods such as SVM, decision trees, and NN classifiers are used to investigate the classification accuracy of the proposed method as well as precision, recall, and F-measure. Experiments are carried out on 13 high-dimensional datasets and show that the proposed method outperforms the previous models in terms of accuracy and robustness.
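The contrast drawn above between the Euclidean distance and correntropy can be sketched as follows. The code uses the standard sample estimator of correntropy with a Gaussian kernel; the kernel width `sigma` and the example vectors are illustrative assumptions, and this is not the authors' metric-learning procedure.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correntropy(x, y, sigma=1.0):
    """Sample correntropy: mean Gaussian kernel of the coordinate-wise errors.

    Each coordinate contributes a value in (0, 1], so a single extreme
    error can lower the result by at most 1/len(x).
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    kernel_sum = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    )
    return kernel_sum / len(x)


# Hypothetical expression profiles; the last coordinate carries an outlier.
clean = [1.0, 2.0, 3.0, 4.0]
mild_outlier = [1.0, 2.0, 3.0, 14.0]
extreme_outlier = [1.0, 2.0, 3.0, 1004.0]

dist_mild = euclidean(clean, mild_outlier)        # grows with the outlier
dist_extreme = euclidean(clean, extreme_outlier)
sim_mild = correntropy(clean, mild_outlier)       # saturates near 3/4
sim_extreme = correntropy(clean, extreme_outlier)
```

Making the outlier a hundred times larger multiplies the Euclidean distance by a hundred, while the correntropy value barely changes, which is the robustness property the abstract relies on.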
Affiliation(s)
- Venus Vahabzadeh
- Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
|
20
|
Walakira A, Skubic C, Nadižar N, Rozman D, Režen T, Mraz M, Moškon M. Integrative computational modeling to unravel novel potential biomarkers in hepatocellular carcinoma. Comput Biol Med 2023; 159:106957. [PMID: 37116239 DOI: 10.1016/j.compbiomed.2023.106957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2022] [Revised: 03/17/2023] [Accepted: 04/16/2023] [Indexed: 04/30/2023]
Abstract
Hepatocellular carcinoma (HCC) is a major health problem around the world. The management of this disease is complicated by the lack of noninvasive diagnostic tools and the few treatment options available. Better clinical outcomes can be achieved if HCC is detected early, but unfortunately, clinical signs appear when the disease is in its late stages. We aim to identify novel genes that can be targeted for the diagnosis and therapy of HCC. We performed a meta-analysis of transcriptomics data to identify differentially expressed genes and applied network analysis to identify hub genes. Fatty acid metabolism, complement and coagulation cascade, chemical carcinogenesis and retinol metabolism were identified as key pathways in HCC. Furthermore, we integrated transcriptomics data into a reference human genome-scale metabolic model to identify key reactions and subsystems relevant in HCC. We conclude that fatty acid activation, purine metabolism, vitamin D, and E metabolism are key processes in the development of HCC and therefore need to be further explored for the development of new therapies. We provide the first evidence that GABRP, HBG1 and DAK (TKFC) genes are important in HCC in humans and warrant further studies.
Affiliation(s)
- Andrew Walakira
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Cene Skubic
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Nejc Nadižar
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Damjana Rozman
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Tadeja Režen
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Miha Mraz
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Moškon
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
|
21
|
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Collapse
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal.
| | - João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
| | - Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal.
| | - Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal.
|
22
|
Wang Y, Liu X, Dong L, Cheng KK, Lin C, Wang X, Dong J, Deng L, Raftery D. iMSEA: A Novel Metabolite Set Enrichment Analysis Strategy to Decipher Drug Interactions. Anal Chem 2023; 95:6203-6211. [PMID: 37023366 DOI: 10.1021/acs.analchem.2c04603] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 04/08/2023]
Abstract
Drug combinations are commonly used to treat various diseases to achieve synergistic therapeutic effects or to alleviate drug resistance. Nevertheless, some drug combinations might lead to adverse effects, and thus, it is crucial to explore the mechanisms of drug interactions before clinical treatment. Generally, drug interactions have been studied using nonclinical pharmacokinetics, toxicology, and pharmacology. Here, we propose a complementary strategy based on metabolomics, which we call interaction metabolite set enrichment analysis, or iMSEA, to decipher drug interactions. First, a digraph-based heterogeneous network model was constructed to model the biological metabolic network based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Second, treatment-specific influences on all detected metabolites were calculated and propagated across the whole network model. Third, pathway activity was defined and enriched to quantify the influence of each treatment on the predefined functional metabolite sets, i.e., metabolic pathways. Finally, drug interactions were identified by comparing the pathway activity enriched by the drug combination treatments and the single drug treatments. A data set consisting of hepatocellular carcinoma (HCC) cells that were treated with oxaliplatin (OXA) and/or vitamin C (VC) was used to illustrate the effectiveness of the iMSEA strategy for evaluation of drug interactions. Performance evaluation using synthetic noise data was also performed to evaluate sensitivities and parameter settings for the iMSEA strategy. The iMSEA strategy highlighted synergistic effects of combined OXA and VC treatments including the alterations in the glycerophospholipid metabolism pathway and glycine, serine, and threonine metabolism pathway. This work provides an alternative method to reveal the mechanisms of drug combinations from the viewpoint of metabolomics.
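The propagation and pathway-activity steps described above can be illustrated with a small sketch. The abstract does not give the propagation rule, so the damped spreading along directed edges used here, the function names `propagate` and `pathway_activity`, and the `damping` and `steps` parameters are all assumptions made for illustration, not the iMSEA algorithm itself.

```python
def propagate(edges, seed, damping=0.5, steps=20):
    """Spread seed influence along directed edges (illustrative rule only).

    edges: dict mapping a node to the list of its downstream nodes.
    seed:  dict mapping a node to its initial treatment-specific influence.
    """
    nodes = set(edges) | {v for targets in edges.values() for v in targets} | set(seed)
    score = {n: seed.get(n, 0.0) for n in nodes}
    for _ in range(steps):
        # Each node keeps a damped share of its own seed influence ...
        nxt = {n: (1.0 - damping) * seed.get(n, 0.0) for n in nodes}
        # ... and passes the rest of its current score to its successors.
        for src, targets in edges.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        score = nxt
    return score


def pathway_activity(score, members):
    """Mean propagated influence over the metabolites of one pathway."""
    present = [score[m] for m in members if m in score]
    return sum(present) / len(present) if present else 0.0


# Toy chain A -> B -> C with all influence seeded on A.
toy_edges = {"A": ["B"], "B": ["C"], "C": []}
toy_score = propagate(toy_edges, {"A": 1.0})
toy_activity = pathway_activity(toy_score, ["B", "C"])
```

Under this assumed rule, an interaction could be flagged by comparing the activity of a pathway under the combination treatment with its activity under each single treatment.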
Affiliation(s)
- Yongpei Wang
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Xingxing Liu
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Liheng Dong
- School of Computing and Data Science, Xiamen University Malaysia, Sepang 43600, Malaysia
| | - Kian-Kai Cheng
- Department of Bioprocess and Polymer Engineering, Universiti Teknologi Malaysia, Johor Bahru, Johor 81310, Malaysia
| | - Caigui Lin
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Xiaomin Wang
- Department of Hepatobiliary Surgery, Fujian Provincial Key Laboratory of Chronic Liver Disease and Hepatocellular Carcinoma, ZhongShan Hospital of Xiamen University, Xiamen 361005, China
| | - Jiyang Dong
- Department of Electronic Science, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Lingli Deng
- Department of Information Engineering, East China University of Technology, Nanchang 330013, China
| | - Daniel Raftery
- Northwest Metabolomics Research Center, University of Washington, Seattle, Washington 98109, United States
|
23
|
Baran Y, Doğan B. scMAGS: Marker gene selection from scRNA-seq data for spatial transcriptomics studies. Comput Biol Med 2023; 155:106634. [PMID: 36774895 DOI: 10.1016/j.compbiomed.2023.106634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 01/28/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided unprecedented opportunities for exploring gene expression and thus uncovering regulatory relationships between genes at the single-cell level. However, scRNA-seq relies on isolating cells from tissues. Therefore, the spatial context of the regulatory processes is lost. A recent technological innovation, spatial transcriptomics, allows for the measurement of gene expression while preserving spatial information. An initial step in spatial transcriptomic analysis is to identify the cell type, which requires a careful selection of cell-specific marker genes. For this purpose, scRNA-seq data are currently used to select, from among all genes, a limited number of marker genes that distinguish cell types from each other. This study proposes scMAGS (single-cell MArker Gene Selection), a novel method for marker gene selection from scRNA-seq data for spatial transcriptomics studies. scMAGS uses a filtering step in which the candidate genes are identified before the marker gene selection step. For the selection of marker genes, cluster validity indices, namely the Silhouette index or the Calinski-Harabasz index (for large datasets), are utilized. Experimental results showed that, in comparison to the existing methods, scMAGS is scalable, fast, and accurate. Even for large datasets with millions of cells, scMAGS could find the required number of marker genes in a reasonable amount of time with fewer memory requirements. scMAGS is made freely available at https://github.com/doganlab/scmags and can be downloaded from the Python Package Index (PyPI) software repository with the command pip install scmags.
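As a rough illustration of ranking genes by a cluster validity index, the sketch below scores each gene by the mean Silhouette width of its one-dimensional expression values under known cell-type labels and keeps the top-scoring genes. This is not the scMAGS implementation or its API: the functions `silhouette` and `select_markers`, the per-gene scoring scheme, and the toy data are assumptions made for illustration.

```python
def silhouette(values, labels):
    """Mean Silhouette width of 1-D points under the given cluster labels.

    Assumes at least two distinct labels are present.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i in range(len(values)):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a width of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(values[i] - values[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(abs(values[i] - values[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(values)


def select_markers(expression, labels, k):
    """Return the k genes whose expression best separates the labelled cell types."""
    ranked = sorted(
        expression,
        key=lambda gene: silhouette(expression[gene], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: four cells, two cell types, three hypothetical genes.
toy_labels = [0, 0, 1, 1]
toy_expression = {
    "g1": [0, 0, 10, 10],  # separates the two types perfectly
    "g2": [1, 2, 1, 2],    # unrelated to cell type
    "g3": [0, 1, 9, 10],   # separates the types with some spread
}
toy_markers = select_markers(toy_expression, toy_labels, 2)
```

For large datasets the abstract mentions the Calinski-Harabasz index instead, presumably because the pairwise distances needed for the Silhouette index scale quadratically with the number of cells.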
Affiliation(s)
- Yusuf Baran
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey
| | - Berat Doğan
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey.
|
24
|
Yagin FH, Cicek İB, Alkhateeb A, Yagin B, Colak C, Azzeh M, Akbulut S. Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. Comput Biol Med 2023; 154:106619. [PMID: 36738712 PMCID: PMC9889119 DOI: 10.1016/j.compbiomed.2023.106619] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Received: 11/17/2022] [Revised: 01/11/2023] [Accepted: 01/28/2023] [Indexed: 02/04/2023]
Abstract
AIM COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples. METHODS The data set used in the study contains 15,979 gene expression measurements from 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine - Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) methods was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability. RESULTS For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that a high level of IFI27 gene expression, in particular, increased the probability of the positive class. CONCLUSIONS The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and provide clinicians with an intuitive understanding and interpretability of the impact of risk factors in the model.
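The class-imbalance step can be illustrated with a minimal sketch of SMOTE-style interpolation. Note the substitution: this is plain SMOTE, which interpolates between a minority sample and one of its nearest minority neighbours, not the SVM-SMOTE variant named in the abstract, which additionally uses SVM support vectors to decide where to generate samples. The function name `smote_like`, its parameters, and the toy points are assumptions for illustration.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    minority: list of feature vectors (lists of floats), at least two of them.
    k:        number of nearest minority neighbours considered per base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + u * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy minority class with three two-dimensional samples.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_synthetic = smote_like(toy_minority, n_new=5)

# With the class counts reported in the abstract, balancing the classes
# before any train/test split would need 141 - 93 synthetic positives.
n_needed = 141 - 93
```

In practice, oversampling is normally applied to the training portion only, so the number of synthetic samples depends on how the data are split.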
Affiliation(s)
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - İpek Balikci Cicek
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Abedalrhman Alkhateeb
- Software Engineering Department, King Hussein School for Computing Sciences, Amman, Jordan.
| | - Burak Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Mohammad Azzeh
- Data Science Department, King Hussein School for Computing Sciences, Amman, Jordan.
| | - Sami Akbulut
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey; Inonu University, Faculty of Medicine, Department of Surgery, 44280, Malatya, Turkey; Inonu University, Faculty of Medicine, Department of Public Health, 44280, Malatya, Turkey.
|
25
|
Marjit S, Bhattacharyya T, Chatterjee B, Sarkar R. Simulated annealing aided genetic algorithm for gene selection from microarray data. Comput Biol Med 2023; 158:106854. [PMID: 37023541 DOI: 10.1016/j.compbiomed.2023.106854] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/07/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
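The simulated annealing component can be sketched for a gene-subset search as follows. This is a generic textbook version with a single bit-flip mutation and geometric cooling, not SAGA's two-way mutation or its coupling with the genetic algorithm, which the abstract does not specify. The function `anneal`, the toy fitness function, and all parameter values are assumptions for illustration.

```python
import math
import random


def anneal(fitness, n_genes, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Search for a boolean gene mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_genes)]
    cur_fit = fitness(current)
    best, best_fit = current[:], cur_fit
    t = t0
    for _ in range(steps):
        # Mutation: toggle one randomly chosen gene in or out of the subset.
        cand = current[:]
        i = rng.randrange(n_genes)
        cand[i] = not cand[i]
        cand_fit = fitness(cand)
        delta = cand_fit - cur_fit
        # Metropolis rule: always accept improvements, and accept a worse
        # candidate with probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = current[:], cur_fit
        t *= cooling
    return best, best_fit


# Toy fitness: reward three hypothetical informative genes, penalise subset size.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    hits = sum(1 for i, on in enumerate(mask) if on and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


toy_best, toy_best_fit = anneal(toy_fitness, n_genes=10)
```

In a real setting the fitness would typically be a cross-validated classification score on the selected genes, which is far more expensive to evaluate than this toy function.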
|
26
|
Rather AA, Chachoo MA. Robust correlation estimation and UMAP assisted topological analysis of omics data for disease subtyping. Comput Biol Med 2023; 155:106640. [PMID: 36774889 DOI: 10.1016/j.compbiomed.2023.106640] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/07/2022] [Revised: 01/08/2023] [Accepted: 02/05/2023] [Indexed: 02/10/2023]
Abstract
Deciphering information hidden in gene expression assays to identify disease subtypes is of significant importance in precision medicine. However, computational limitations thwart this process because of the intricacy of biological networks and the curse of dimensionality of gene expression data. Therefore, clustering often becomes the first choice of exploratory data analysis to identify natural structures and intrinsic patterns in the data. However, the sparse and high-dimensional nature of omics data prevents conventional clustering algorithms from discovering subtypes that are clinically relevant and statistically significant. Hence, coupling non-linear dimensionality reduction techniques with clustering often becomes imperative to improve the clustering results. In this study, we present a robust pipeline to discover disease subtypes with clinical relevance. Specifically, we focus on discovering patient sub-groups whose residual life patterns are remarkably different from those of other sub-groups. This is significant because, by refining prognosis, subtyping can reduce uncertainty in approximating a patient's expected outcome. The methodology presented is based on robust correlation estimation; UMAP, a non-linear dimensionality reduction method; and Mapper, a tool from topology. Notably, we suggest a method for improving the robustness of the correlation matrix of gene expression data in order to improve the clustering results. The performance of the model is evaluated by applying it to five cancer datasets obtained through TCGA, and comparisons are performed with the state-of-the-art methods NEMO, RSC-OTRI and SNF with regard to the log-rank test and Restricted Life Expectancy Difference. For example, in the GBM dataset, the minimum separation between any two discovered subtypes is 221 days, which is significantly higher than that of the other methodologies. We also compared the results without using the robust correlation-based estimate and observed that robust correlation improves separability between survival curves significantly. From the results we infer that our methodology performs better than the other methodologies at separating the survival curves of patient sub-groups, despite using single-omics profiles of patients whereas SNF and NEMO use multiple omics profiles. Pathway over-representation analysis is performed on the final clustering results to investigate the biological underpinnings characterizing each subtype.
Affiliation(s)
- Arif Ahmad Rather
- Department of Computer Sciences, University of Kashmir, Srinagar, JK, India.
|
27
|
Somadder PD, Hossain MA, Ahsan A, Sultana T, Soikot SH, Rahman MM, Ibrahim SM, Ahmed K, Bui FM. Drug Repurposing and Systems Biology approaches of Enzastaurin can target potential biomarkers and critical pathways in Colorectal Cancer. Comput Biol Med 2023; 155:106630. [PMID: 36774894 DOI: 10.1016/j.compbiomed.2023.106630] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/11/2022] [Revised: 01/28/2023] [Accepted: 02/04/2023] [Indexed: 02/10/2023]
Abstract
Colorectal cancer (CRC) is a severe health concern that results from a cocktail of genetic, epigenetic, and environmental abnormalities. It is the second most lethal malignancy in the world and the third most common malignant tumor, yet treatment remains unavailable. The goal of the current study was to use bioinformatics and systems biology techniques to determine the pharmacological mechanism underlying putative important genes and linked pathways in early-onset CRC. Computer-aided methods were used to uncover similar biological targets and signaling pathways associated with CRC, along with bioinformatics and network pharmacology techniques to assess the effects of enzastaurin on CRC. The KEGG and gene ontology (GO) pathway analysis revealed several significant pathways, including positive regulation of protein phosphorylation, negative regulation of the apoptotic process, nucleus, nucleoplasm, protein tyrosine kinase activity, PI3K-Akt signaling pathway, pathways in cancer, focal adhesion, HIF-1 signaling pathway, and Rap1 signaling pathway. Later, analysis of the hub protein module identified from the protein-protein interaction (PPI) network, together with molecular docking and molecular dynamics simulation, indicated that enzastaurin binds strongly to two hub proteins, CASP3 (-8.6 kcal/mol) and MCL1 (-8.6 kcal/mol), which were more strongly implicated in CRC management than the other five hub proteins. Moreover, the pharmacokinetic features of enzastaurin revealed that it is an effective therapeutic agent with minimal adverse effects. Enzastaurin may inhibit the potential biological targets that are thought to be responsible for the advancement of CRC, and this study suggests a potential novel therapeutic target for CRC.
Affiliation(s)
- Pratul Dipta Somadder
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Md Arju Hossain
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Asif Ahsan
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Tayeba Sultana
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Sadat Hossain Soikot
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Md Masuder Rahman
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1092, Bangladesh.
| | - Sobhy M Ibrahim
- Department of Biochemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia.
| | - Kawsar Ahmed
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada; Group of Biophotomatiχ, Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.
| | - Francis M Bui
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada.
|
28
|
Zafari N, Bathaei P, Velayati M, Khojasteh-Leylakoohi F, Khazaei M, Fiuji H, Nassiri M, Hassanian SM, Ferns GA, Nazari E, Avan A. Integrated analysis of multi-omics data for the discovery of biomarkers and therapeutic targets for colorectal cancer. Comput Biol Med 2023; 155:106639. [PMID: 36805214 DOI: 10.1016/j.compbiomed.2023.106639] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/30/2022] [Revised: 01/14/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The considerable burden of colorectal cancer and its rising incidence in young adults emphasize the necessity of understanding its underlying mechanisms, providing new diagnostic and prognostic markers, and improving therapeutic approaches. Precision medicine is a new trend all over the world, and the identification of novel biomarkers and therapeutic targets is a step towards it. In this context, multi-omics data and integrated analysis are being investigated to develop personalized medicine in the management of colorectal cancer. Given the large amount of data produced by multi-omics approaches, data integration and analysis are a great challenge. In this Review, we summarize how statistical and machine learning techniques are applied to analyze multi-omics data and how they contribute to the discovery of useful diagnostic and prognostic biomarkers and therapeutic targets. Moreover, we discuss the importance of these biomarkers and therapeutic targets in the future clinical management of colorectal cancer. Taken together, integrated analysis of multi-omics data has great potential for finding novel diagnostic and prognostic biomarkers and therapeutic targets; however, there are still challenges to overcome in future studies.
Affiliation(s)
- Nima Zafari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Parsa Bathaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Mahla Velayati
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Fatemeh Khojasteh-Leylakoohi
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Khazaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Hamid Fiuji
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Mohammadreza Nassiri
- Recombinant Proteins Research Group, The Research Institute of Biotechnology, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Seyed Mahdi Hassanian
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Gordon A Ferns
- Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex, BN1 9PH, UK
| | - Elham Nazari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
| | - Amir Avan
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
|
29
|
He H, Duo H, Hao Y, Zhang X, Zhou X, Zeng Y, Li Y, Li B. Computational drug repurposing by exploiting large-scale gene expression data: Strategy, methods and applications. Comput Biol Med 2023; 155:106671. [PMID: 36805225 DOI: 10.1016/j.compbiomed.2023.106671] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/25/2022] [Revised: 02/05/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023]
Abstract
De novo drug development is an extremely complex, time-consuming and costly task. Urgent needs for therapies for various diseases have greatly accelerated the search for more effective drug development methods. Drug repurposing provides a new and effective perspective on disease treatment. Rapidly growing large-scale transcriptome data paint a detailed picture of gene expression during disease onset and have thus received wide attention in the field of computational drug repurposing. However, how to efficiently mine transcriptome data and identify new indications for old drugs remains a critical challenge. This review discusses the irreplaceable role of transcriptome data in computational drug repurposing and summarizes some representative databases, tools and strategies. More importantly, it proposes a practical guideline that maps three gene expression data types to five strategies, which should help researchers adopt appropriate strategies to mine large-scale transcriptome data in depth and discover more effective therapies.
Affiliation(s)
- Hao He
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Institutes of Brain Science, Fudan University, Shanghai, 200032, PR China
| | - Hongrui Duo
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Xiaoxi Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Xinyi Zhou
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Yujie Zeng
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
| | - Yinghong Li
- The Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, PR China
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China.
|
30
|
Wang T, Sun J, Zhao Q. Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism. Comput Biol Med 2023; 153:106464. [PMID: 36584603 DOI: 10.1016/j.compbiomed.2022.106464] [Citation(s) in RCA: 149] [Impact Index Per Article: 74.5] [Received: 11/23/2022] [Revised: 12/12/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
Human ether-a-go-go-related gene (hERG) channel blockade by small molecules is a major concern during drug development in the pharmaceutical industry. Failure or inhibition of hERG channel activity caused by drug molecules can prolong the QT interval, which can result in serious cardiotoxicity. However, evaluating the hERG blocking activity of all these small-molecule compounds is technically challenging, and the relevant procedures are expensive and time-consuming. In this study, we develop a novel deep learning predictive model named DMFGAM for predicting hERG blockers. To characterize each molecule more comprehensively, we first fuse multiple molecular fingerprint features into its final molecular fingerprint features. Then, we use the multi-head attention mechanism to extract the molecular graph features. The molecular fingerprint features and molecular graph features are then fused as the final features of the compounds to make their feature representation more comprehensive. Finally, the molecules are classified as hERG blockers or hERG non-blockers by a fully connected neural network. We conduct a 5-fold cross-validation experiment to evaluate the performance of DMFGAM and verify its robustness on external validation datasets. We believe DMFGAM can serve as a powerful tool to predict hERG channel blockers in the early stages of drug discovery and development.
Affiliation(s)
- Tianyi Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jianqiang Sun
- School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
|
31
|
Cheng N, Liu J, Chen C, Zheng T, Li C, Huang J. Prediction of lung cancer metastasis by gene expression. Comput Biol Med 2023; 153:106490. [PMID: 36638618 DOI: 10.1016/j.compbiomed.2022.106490] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/18/2022] [Revised: 12/14/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Tumor metastasis is the main cause of death in cancer patients. Early prediction of tumor metastasis can allow for timely intervention. At present, research on tumor metastasis mainly focuses on manual diagnosis by imaging or diagnosis by computational methods. With the deterioration of the tumor, gene expression levels in blood change greatly. It is feasible to measure the transcripts of key genes to predict whether cancer will metastasize. Therefore, in this paper, we obtained gene expression data from 226 patients from TCGA. These data included 239,322 transcripts. Background screening and LASSO analysis were used to select 31 transcripts as features. Finally, a deep neural network (DNN) was used to determine whether or not lung cancer would metastasize. We compared our methods with several other methods and found that our method achieved the best precision. In addition, in a previous study, we identified 7 genes that play a vital role in lung cancer. We added those gene transcripts into the DNN and found that the AUC and AUPR of the model were increased.
Affiliation(s)
- Nitao Cheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Junliang Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
| | - Chen Chen
- Department of Biological Repositories, Zhongnan Hospital of Wuhan University, China
| | - Tang Zheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Changsheng Li
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Jingyu Huang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
|
32
|
Cheng L, Wu H, Zheng X, Zhang N, Zhao P, Wang R, Wu Q, Liu T, Yang X, Geng Q. GPGPS: a robust prognostic gene pair signature of glioma ensembling IDH mutation and 1p/19q co-deletion. Bioinformatics 2023; 39:6986965. [PMID: 36637205 PMCID: PMC9843586 DOI: 10.1093/bioinformatics/btac850] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/30/2022] [Revised: 12/14/2022] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Many studies have shown that IDH mutation and 1p/19q co-deletion can serve as prognostic signatures of glioma. Although these genetic variations affect the expression of one or more genes, the prognostic value of gene expression related to IDH and 1p/19q status is still unclear. RESULTS We constructed an ensemble gene pair signature for the risk evaluation and survival prediction of glioma based on the prior knowledge of the IDH and 1p/19q status. First, we separately built two gene pair signatures IDH-GPS and 1p/19q-GPS and elucidated that they were useful transcriptome markers projecting from corresponding genome variations. Then, the gene pairs in these two models were assembled to develop an integrated model named Glioma Prognostic Gene Pair Signature (GPGPS), which demonstrated high area under the curves (AUCs) to predict 1-, 3- and 5-year overall survival (0.92, 0.88 and 0.80) of glioma. GPGPS was superior to the single GPSs and other existing prognostic signatures (avg AUC = 0.70, concordance index = 0.74). In conclusion, the ensemble prognostic signature with 10 gene pairs could serve as an independent predictor for risk stratification and survival prediction in glioma. This study shed light on transferring knowledge from genetic alterations to expression changes to facilitate prognostic studies. AVAILABILITY AND IMPLEMENTATION Codes are available at https://github.com/Kimxbzheng/GPGPS.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
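A gene pair signature scores a sample from the relative ordering of two genes within that sample rather than from absolute expression values. The sketch below shows one common way to do this, a simple vote over pairs. The abstract does not state how GPGPS combines its 10 gene pairs, so the voting rule, the threshold, the function names `pair_score` and `risk_group`, and the gene names are assumptions for illustration.

```python
def pair_score(sample, pairs):
    """Fraction of gene pairs (a, b) for which gene a is expressed above gene b."""
    votes = sum(1 for a, b in pairs if sample[a] > sample[b])
    return votes / len(pairs)


def risk_group(sample, pairs, threshold=0.5):
    """Assign 'high' risk when more than `threshold` of the pairs vote for it."""
    return "high" if pair_score(sample, pairs) > threshold else "low"


# Hypothetical two-pair signature and two toy samples.
toy_pairs = [("G1", "G2"), ("G3", "G4")]
sample_a = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0}
sample_b = {"G1": 5.0, "G2": 1.0, "G3": 4.0, "G4": 3.0}

# Multiplying every value by a positive constant leaves the orderings intact.
rescaled_a = {gene: 100.0 * value for gene, value in sample_a.items()}
```

Because only within-sample orderings matter, such a score is unchanged by any monotone rescaling of a sample, which is one reason pair-based signatures are often preferred when combining cohorts measured on different platforms.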
Affiliation(s)
- Lixin Cheng
  - To whom correspondence should be addressed.
- Xubin Zheng
  - Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Ning Zhang
  - Guangdong Provincial Key Laboratory of Infectious Disease and Molecular Immunopathology, Shantou University Medical College, Shantou 515041, China
- Pengfei Zhao
  - Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
  - Department of Geriatrics, Shenzhen Clinical Research Center for Aging, Shenzhen 518020, China
- Ran Wang
  - Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Qiong Wu
  - Hong Kong Genome Institute, Shatin, New Territories, Hong Kong
- Tao Liu
  - International Digital Economy Academy, Shenzhen 518020, China
- Xiaojun Yang
  - Guangdong Provincial Key Laboratory of Infectious Disease and Molecular Immunopathology, Shantou University Medical College, Shantou 515041, China
|
33
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning, and it plays a fundamental role in achieving accurate results. With the progress of machine learning, protein representation methods have attracted growing attention and developed rapidly. In this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to their information extraction methods and information sources, these representation methods are broadly divided into structure-based and sequence-based methods. Each primary class can be further divided into specific subcategories. The specific representation methods covered include both traditional and the latest approaches. This review provides a comprehensive assessment of the various methods, which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Affiliation(s)
- Zi-Xuan Yue
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian-Ci Yan
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
|