1
Li Q, Wei X, Wu F, Qin C, Dong J, Chen C, Lin Y. Development and validation of preeclampsia predictive models using key genes from bioinformatics and machine learning approaches. Front Immunol 2024; 15:1416297. [PMID: 39544937 PMCID: PMC11560445 DOI: 10.3389/fimmu.2024.1416297] [Received: 07/09/2024] [Accepted: 09/27/2024] [Indexed: 11/17/2024] Open
Abstract
Background Preeclampsia (PE) poses significant diagnostic and therapeutic challenges. This study aims to identify novel genes as potential diagnostic and therapeutic targets and to illuminate the immune mechanisms involved. Methods Three GEO datasets were analyzed: two were merged to form the training set, and the third was used for external validation. Intersection analysis of differentially expressed genes (DEGs) and WGCNA modules highlighted candidate genes. These were further refined with LASSO, SVM-RFE, and RF algorithms to identify diagnostic hub genes. Diagnostic efficacy was assessed using ROC curves. A predictive nomogram and a fully connected neural network (FCNN) were developed for PE prediction. ssGSEA and correlation analysis were employed to investigate the immune landscape. Further validation was provided by qRT-PCR on human placental samples. Results Five biomarkers were identified, with the following validation AUCs: CGB5 (0.663, 95% CI: 0.577-0.750), LEP (0.850, 95% CI: 0.792-0.908), LRRC1 (0.797, 95% CI: 0.728-0.867), PAPPA2 (0.839, 95% CI: 0.775-0.902), and SLC20A1 (0.811, 95% CI: 0.742-0.880), all of which are involved in key biological processes. The nomogram showed strong predictive power (C-index 0.873), while the FCNN achieved an optimal AUC of 0.911 (95% CI: 0.732-1.000) in five-fold cross-validation. Immune infiltration analysis revealed the importance of T cell subsets, neutrophils, and NK cells in PE, linking these genes to immune mechanisms underlying PE pathogenesis. Conclusion CGB5, LEP, LRRC1, PAPPA2, and SLC20A1 are validated as key diagnostic biomarkers for PE. The nomogram and FCNN could credibly predict PE. The association of these genes with immune infiltration underscores the crucial role of immune responses in PE pathogenesis.
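The abstract reports per-gene AUCs with 95% confidence intervals but does not say how they were computed. As a minimal sketch of what such a number means, the AUC can be read as the probability that a randomly chosen PE sample scores higher than a randomly chosen control. The function names, the toy scores, and the percentile bootstrap used for the interval are all assumptions for illustration, not the authors' method.

```python
import random


def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos, scores_neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed choice of method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample cases and controls separately, with replacement.
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(roc_auc(pos, neg))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical expression scores for one gene (not data from the study).
cases = [0.9, 0.4, 0.8, 0.7]
controls = [0.5, 0.1, 0.3, 0.6]
auc = roc_auc(cases, controls)
ci_low, ci_high = bootstrap_ci(cases, controls)
```

With small groups the bootstrap interval is wide, which is consistent with the broad interval the abstract reports for the cross-validated FCNN.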
Affiliation(s)
- Qian Li
  - Reproductive Medicine Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiaowei Wei
  - Reproductive Medicine Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fan Wu
  - The International Peace Maternity and Child Health Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Chuanmei Qin
  - Reproductive Medicine Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Junpeng Dong
  - Reproductive Medicine Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Cailian Chen
  - Department of Automation, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yi Lin
  - Reproductive Medicine Center, Shanghai Sixth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
2
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has attracted significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous, high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes', as the reasoning behind their predictions is often unknown and not transparent to the user. This has stimulated a growing body of research addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of recent interpretable deep learning models applied across single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Affiliation(s)
- Manoj M Wagle
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Siqu Long
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Carissa Chen
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Chunlei Liu
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Pengyi Yang
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
3
Chen L, Wan Y, Yang T, Zhang Q, Zeng Y, Zheng S, Ling Z, Xiao Y, Wan Q, Liu R, Yang C, Huang G, Zeng Q. Bibliometric and visual analysis of single-cell sequencing from 2010 to 2022. Front Genet 2024; 14:1285599. [PMID: 38274109 PMCID: PMC10808606 DOI: 10.3389/fgene.2023.1285599] [Received: 09/04/2023] [Accepted: 12/31/2023] [Indexed: 01/27/2024] Open
Abstract
Background: Single-cell sequencing (SCS) is a technique used to analyze the genome, transcriptome, epigenome, and other genetic data at the level of a single cell. The technique is widely used in fields including neurobiology, immunology, and microbiology, and has emerged as a key focus of life science research. However, a thorough and impartial analysis of the current state and trends of SCS-related research is lacking. The current study aimed to map the development trends of studies on SCS during the years 2010-2022 using bibliometric software. Methods: Pertinent papers on SCS from 2010 to 2022 were retrieved from the Web of Science Core Collection. Research categories, nations/institutions, authors/co-cited authors, journals/co-cited journals, co-cited references, and keywords were analyzed using VOSviewer, the R package "bibliometrix", and CiteSpace. Results: The bibliometric analysis included 9,929 papers published between 2010 and 2022 and showed a consistent increase in the number of papers each year. The United States was the source of the largest number of articles and citations in this field. The largest share of articles was published in the journal Nature Communications. Butler A was the most frequently cited author on this topic, and his article "Integrating single-cell transcriptome data across diverse conditions, technologies, and species" has received numerous citations to date. The literature and keyword analysis showed that studies involving single-cell RNA sequencing (scRNA-seq) were prominent in this discipline during the study period. Conclusion: This study used bibliometric techniques to visualize research in SCS-related domains, which facilitated the identification of emerging patterns and future directions in the field. Current hot topics in SCS research include COVID-19, tumor microenvironment, scRNA-seq, and neuroscience. Our results are significant for scholars seeking to identify key issues and generate new research ideas.
Affiliation(s)
- Ling Chen
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yantong Wan
  - Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Tingting Yang
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Qi Zhang
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Yuting Zeng
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Shuqi Zheng
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Zhishan Ling
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Yupeng Xiao
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Qingyi Wan
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Ruili Liu
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Chun Yang
  - Dongguan Key Laboratory of Stem Cell and Regenerative Tissue Engineering, Guangdong Medical University, Dongguan, China
- Guozhi Huang
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
- Qing Zeng
  - Department of Rehabilitation Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - School of Rehabilitation Medicine, Southern Medical University, Guangzhou, China
4
Xiao D, Lin M, Liu C, Geddes TA, Burchfield J, Parker B, Humphrey SJ, Yang P. SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data. NAR Genom Bioinform 2023; 5:lqad099. [PMID: 37954574 PMCID: PMC10632189 DOI: 10.1093/nargab/lqad099] [Received: 05/29/2023] [Revised: 09/18/2023] [Accepted: 10/25/2023] [Indexed: 11/14/2023] Open
Abstract
A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a 'pseudo-positive' learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model ('SnapKin') by incorporating the above two learning strategies into a 'snapshot' ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
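The abstract names a 'snapshot' ensemble but gives no implementation details. As a rough illustration of the general snapshot-ensemble idea (a cyclic cosine learning-rate schedule, one saved model per cycle, and averaged predictions), the sketch below uses hypothetical function names and settings. It is not taken from the SnapKin code and omits the pseudo-positive and resampling strategies described above.

```python
import math


def snapshot_lr(step, total_steps, n_cycles, lr_max):
    """Cosine-annealed learning rate that restarts at the start of each cycle.

    `step` is 0-based; the rate starts at lr_max and decays towards 0
    within each of the n_cycles cycles.
    """
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = step % cycle_len
    return 0.5 * lr_max * (math.cos(math.pi * pos / cycle_len) + 1.0)


def snapshot_steps(total_steps, n_cycles):
    """0-based steps at which a model snapshot would be saved (end of each cycle)."""
    cycle_len = math.ceil(total_steps / n_cycles)
    return [min((k + 1) * cycle_len, total_steps) - 1 for k in range(n_cycles)]


def ensemble_average(prob_lists):
    """Average per-sample probabilities predicted by each snapshot."""
    n_models = len(prob_lists)
    return [sum(col) / n_models for col in zip(*prob_lists)]


# Hypothetical settings: 100 training steps split into 5 cycles.
schedule = [snapshot_lr(t, 100, 5, lr_max=0.1) for t in range(100)]
save_at = snapshot_steps(100, 5)

# Hypothetical predictions from two snapshots for two candidate substrates.
averaged = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```

Because every snapshot comes from a single training run, the ensemble costs about as much to train as one model, which is the usual motivation for this design.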
Affiliation(s)
- Di Xiao
  - Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Michael Lin
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Chunlei Liu
  - Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Thomas A Geddes
  - Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James G Burchfield
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Benjamin L Parker
  - Centre for Muscle Research, Department of Anatomy and Physiology, School of Biomedical Sciences, Melbourne, VIC 3010, Australia
- Sean J Humphrey
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Murdoch Children’s Research Institute, The Royal Children’s Hospital, Melbourne, VIC, 3052, Australia
- Pengyi Yang
  - Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia