1
Ye Y, Lu Y, Su H, Tian Y, Jin S, Li G, Yang Y, Jiang L, Zhou Z, Wei X, Tao TH, Sun L. A hybrid bioelectronic retina-probe interface for object recognition. Biosens Bioelectron 2025; 279:117408. [PMID: 40147085 DOI: 10.1016/j.bios.2025.117408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 02/05/2025] [Accepted: 03/23/2025] [Indexed: 03/29/2025]
Abstract
The retina converts light stimuli into spike firings, encoding abundant visual information critical for both fundamental studies of the visual system and therapies for visual diseases. However, probing these spikes directly from the retina is hindered by limited recording channels, insufficient contact between the retina and electrodes, and short operational lifetimes. In this study, we developed a perforated and flexible microelectrode array to achieve a robust retina-probe interface, ensuring high-quality detection of spike firings from hundreds of neurons. Leveraging the retina's natural light-sensing ability, we created a hybrid bioelectronic system that enables image recognition through machine learning integration. We systematically explored the system's spatial resolution and demonstrated its capability to recognize different colors and light intensities. Importantly, due to the perforated structure, the hybrid system maintained over 94% accuracy in distinguishing light on/off conditions for 9 h ex vivo. Finally, inspired by the eye's configuration, we developed a bioelectronic mimic eye capable of recognizing objects in real environments. This work demonstrated that the hybrid bioelectronic retina-probe interface is effective not only for light sensing but also for efficient image and object recognition.
Affiliation(s)
- Yifei Ye
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
| | - Yunxiao Lu
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai, 201306, China
| | - Haoyang Su
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Ye Tian
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Shuang Jin
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
| | - Gen Li
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yingkang Yang
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Luyue Jiang
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
| | - Zhitao Zhou
- School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
| | - Xiaoling Wei
- School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China
| | - Tiger H Tao
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, 100049, China; School of Physical Science and Technology, ShanghaiTech University, Shanghai, 201210, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong, 519031, China; Tianqiao and Chrissy Chen Institute for Translational Research, Shanghai, 200020, China.
| | - Liuyang Sun
- 2020 X-Lab, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China; School of Graduate Study, University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, 200050, China.
2
Zhang T, Ye Q, Liu Y, Liu Q, Han Z, Wu D, Chen Z, Li Y, Fan HJ. Data-driven discovery of biaxially strained single atoms array for hydrogen production. Nat Commun 2025; 16:3644. [PMID: 40240379 PMCID: PMC12003809 DOI: 10.1038/s41467-025-59053-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Accepted: 04/08/2025] [Indexed: 04/18/2025] [Open Access]
Abstract
The structure-performance relationship for single-atom catalysts has remained unclear because most single-atom catalysts yield only averaged coordination information. A periodic array of single atoms may provide a platform to tackle this inaccuracy. Here, we develop a data-driven approach that combines high-throughput density functional theory computations and machine learning to screen candidates from a library of 1248 sites on single-atom arrays anchored on biaxially strained transition metal dichalcogenides. Our screening identifies an Au atom anchored on a biaxially strained MoSe₂ surface via Au-Se₃ bonds. Machine learning analysis identifies four key structural features by classifying the ΔG_H* data. We show that the average band center of the adsorption sites can be a predictor for hydrogen adsorption energy. This prediction is validated by experiments showing that a single-atom Au array anchored on biaxially strained MoSe₂ achieves 1000-hour stability at 800 mA cm⁻² towards acidic hydrogen evolution. Moreover, an active hotspot consisting of the Au atom array and the neighboring Se atoms is revealed to account for the enhanced activity.
Affiliation(s)
- Tao Zhang
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
| | - Qitong Ye
- State Key Laboratory of Marine Resource Utilization in South China Sea, School of Materials Science and Engineering, Hainan University, Haikou, P. R. China
| | - Yipu Liu
- State Key Laboratory of Marine Resource Utilization in South China Sea, School of Materials Science and Engineering, Hainan University, Haikou, P. R. China.
| | - Qingyi Liu
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
| | - Zengyu Han
- School of Materials Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | - Dongshuang Wu
- School of Materials Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | - Zhiming Chen
- School of Physical Science and Technology, Tiangong University, XiQing District, Tianjin, P.R. of China
| | - Yue Li
- School of Physical Science and Technology, Tiangong University, XiQing District, Tianjin, P.R. of China
| | - Hong Jin Fan
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore.
3
Lin CW, Lin JJ, Tseng HH, Jang FL, Lu MK, Chen PS, Huang CC, Yao CY, Wang TY, Chang WH, Tan HP, Lin SH. Exploring Primary and Interaction Effects of Minor Physical Anomalies: Development and Validation of Prediction Models Using Explainable Machine Learning Algorithms for Early-Onset Schizophrenia. Schizophr Bull 2025:sbaf016. [PMID: 40178447 DOI: 10.1093/schbul/sbaf016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
BACKGROUND AND HYPOTHESIS Minor physical anomalies (MPAs) are neurodevelopmental markers that can be traced to prenatal events and may be significant features of early-onset schizophrenia (EOS). Therefore, our study aimed to (1) find the primary and interaction effects of MPAs for EOS and (2) develop and validate a prediction model for EOS based on explainable machine learning algorithms. STUDY DESIGN The study included 549 patients with schizophrenia (193 EOS and 356 adult-onset schizophrenia [AOS]) and 420 healthy controls (HC) in southern Taiwan. For feature selection, variable selection using random forests (varSelRF) and recursive feature elimination (RFE) were applied to identify the important MPA variables. We used different machine learning algorithms to build the prediction models based on the selected MPA variables. STUDY RESULTS The results showed that mouth anomalies are significant MPA variables and have interaction effects with craniofacial MPA variables for EOS. The prediction models using the selected MPA variables performed better in discriminating EOS vs HC than AOS vs HC. In the validation sets, the AUC values were 0.85-0.93 for EOS vs HC, 0.80-0.87 for AOS vs HC, and 0.67-0.77 for EOS vs AOS. CONCLUSIONS This risk prediction model provides a clinical decision support system for detecting patients at high risk of developing EOS and enables early intervention in clinical practice.
Affiliation(s)
- Chih-Wei Lin
- Institute of Clinical Medicine, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
- Department of Public Health, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
| | - Jin-Jia Lin
- Department of Psychiatry, Chi Mei Medical Center, Tainan 702010, Taiwan
| | - Huai-Hsuan Tseng
- Department of Psychiatry, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
| | - Fong-Lin Jang
- Department of Psychiatry, Chi Mei Medical Center, Tainan 702010, Taiwan
| | - Ming-Kun Lu
- Jianan Psychiatric Center, Ministry of Health and Welfare, Tainan 717204, Taiwan
| | - Po-See Chen
- Department of Psychiatry, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
| | - Chih-Chun Huang
- Department of Psychiatry, National Cheng Kung University Hospital, Dou-Liou Branch, Yunlin 640003, Taiwan
| | - Chi-Yu Yao
- Department of Psychiatry, Tainan Municipal An-Nan Hospital, Tainan 709204, Taiwan
| | - Tzu-Yun Wang
- Department of Psychiatry, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
| | - Wei-Hung Chang
- Department of Psychiatry, National Cheng Kung University Hospital, Dou-Liou Branch, Yunlin 640003, Taiwan
| | - Hung-Pin Tan
- Jianan Psychiatric Center, Ministry of Health and Welfare, Tainan 717204, Taiwan
| | - Sheng-Hsiang Lin
- Institute of Clinical Medicine, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
- Department of Public Health, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
- Biostatistics Consulting Center, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704302, Taiwan
4
Li J, Shi Q, Yang Y, Xie J, Xie Q, Ni M, Wang X. Prediction of EGFR mutations in non-small cell lung cancer: a nomogram based on 18F-FDG PET and thin-section CT radiomics with machine learning. Front Oncol 2025; 15:1510386. [PMID: 40242240 PMCID: PMC11999825 DOI: 10.3389/fonc.2025.1510386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2024] [Accepted: 03/14/2025] [Indexed: 04/18/2025] [Open Access]
Abstract
Background This study aimed to develop and validate radiomics-based nomograms for the identification of EGFR mutations in non-small cell lung cancer (NSCLC). Methods A retrospective analysis was performed on 313 NSCLC patients, who were randomly divided into training (n = 250) and validation (n = 63) groups. Radiomic features were extracted from 18F-fluorodeoxyglucose positron emission tomography (18F-FDG PET) and thin-section computed tomography (CT) scans. After selecting optimal radiomic features, four machine learning algorithms, including logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were used to develop and validate radiomics models. A combined model, incorporating the Rad score from the best performing radiomics model with clinical and radiological features, was then formulated. Finally, the integrated nomogram was generated. Its predictive performance and clinical utility were evaluated using receiver operating characteristic curves, calibration curves, and decision curve analysis. Results Among the radiomics models, the RF model showed the best performance with AUCs of 0.785 (95% CI, 0.726-0.844) and 0.776 (95% CI, 0.662-0.889) in the training and validation groups, respectively. The AUCs of the clinical and radiological models in both groups were 0.711 (95% CI, 0.645-0.776) and 0.758 (95% CI, 0.627-0.890), and 0.632 (95% CI, 0.564-0.699) and 0.677 (95% CI, 0.531-0.822), respectively. The combined model achieved the highest AUCs of 0.872 (95% CI, 0.829-0.915) and 0.831 (95% CI, 0.723-0.940) in the training and validation groups, respectively. The DeLong test confirmed the superiority of the combined model over the other three models. Both the calibration curve and the DCA indicated that the radiomics nomogram was consistent and clinically useful. 
Conclusions Radiomics combined with machine learning and based on 18F-FDG PET/CT images can effectively determine EGFR mutation status in NSCLC patients. Radiomics-based nomograms provide a non-invasive and visually intuitive prediction tool for screening NSCLC patients with EGFR mutations in a clinical setting.
Affiliation(s)
- Jianbo Li
- Department of Nuclear Medicine, The Affiliated Hospital of Inner Mongolia Medical University, Hohhot, China
| | - Qin Shi
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
| | - Yi Yang
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
| | - Jikui Xie
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
| | - Qiang Xie
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
| | - Ming Ni
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
| | - Xuemei Wang
- Department of Nuclear Medicine, The Affiliated Hospital of Inner Mongolia Medical University, Hohhot, China
- Department of Nuclear Medicine, Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei, China
5
Augusto L, Borelle R, Boča A, Bon L, Orazio C, Arias-González A, Bakker MR, Gartzia-Bengoetxea N, Auge H, Bernier F, Cantero A, Cavender-Bares J, Correia AH, De Schrijver A, Diez-Casero JJ, Eisenhauer N, Fotelli MN, Gâteblé G, Godbold DL, Gomes-Caetano-Ferreira M, Gundale MJ, Jactel H, Koricheva J, Larsson M, Laudicina VA, Legout A, Martín-García J, Mason WL, Meredieu C, Mereu S, Montgomery RA, Musch B, Muys B, Paillassa E, Paquette A, Parker JD, Parker WC, Ponette Q, Reynolds C, Rozados-Lorenzo MJ, Ruiz-Peinado R, Santesteban-Insausti X, Scherer-Lorenzen M, Silva-Pando FJ, Smolander A, Spyroglou G, Teixeira-Barcelos EB, Vanguelova EI, Verheyen K, Vesterdal L, Charru M. Widespread slow growth of acquisitive tree species. Nature 2025; 640:395-401. [PMID: 40108455 DOI: 10.1038/s41586-025-08692-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/04/2024] [Accepted: 01/23/2025] [Indexed: 03/22/2025]
Abstract
Trees are an important carbon sink as they accumulate biomass through photosynthesis [1]. Identifying tree species that grow fast is therefore commonly considered to be essential for effective climate change mitigation through forest planting. Although species characteristics are key information for plantation design and forest management, field studies often fail to detect clear relationships between species functional traits and tree growth [2]. Here, by consolidating four independent datasets and classifying the acquisitive and conservative species based on their functional trait values, we show that acquisitive tree species, which are supposedly fast-growing species, generally grow slowly in field conditions. This discrepancy between the current paradigm and field observations is explained by the interactions with environmental conditions that influence growth. Acquisitive species require moist mild climates and fertile soils, conditions that are generally not met in the field. By contrast, conservative species, which are supposedly slow-growing species, show generally higher realized growth due to their ability to tolerate unfavourable environmental conditions. In general, conservative tree species grow more steadily than acquisitive tree species in non-tropical forests. We recommend planting acquisitive tree species in areas where they can realize their fast-growing potential. In other regions, where environmental stress is higher, conservative tree species have a larger potential to fix carbon in their biomass.
Affiliation(s)
- L Augusto
- INRAE, Bordeaux Sciences Agro, UMR 1391 ISPA, Villenave d'Ornon, France.
| | - R Borelle
- INRAE, Bordeaux Sciences Agro, UMR 1391 ISPA, Villenave d'Ornon, France
| | - A Boča
- Latvia University of Life Sciences and Technologies, Jelgava, Latvia
| | - L Bon
- INRAE, Bordeaux Sciences Agro, UMR 1391 ISPA, Villenave d'Ornon, France
| | - C Orazio
- Institut Européen de la Forêt Cultivée (IEFC), Cestas, France
| | - A Arias-González
- NEIKER, Basque Institute for Agricultural Research and Development, Department of Forest Sciences, Bizkaia, Spain
| | - M R Bakker
- INRAE, Bordeaux Sciences Agro, UMR 1391 ISPA, Villenave d'Ornon, France
| | - N Gartzia-Bengoetxea
- NEIKER, Basque Institute for Agricultural Research and Development, Department of Forest Sciences, Bizkaia, Spain
| | - H Auge
- Helmholtz Centre for Environmental Research-UFZ, Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | | | | | - J Cavender-Bares
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - A H Correia
- Forest Research Centre, School of Agriculture, University of Lisbon, Lisbon, Portugal
| | - A De Schrijver
- Research Centre AgroFoodNature, HOGENT University of Applied Sciences and Arts, Ghent, Belgium
| | - J J Diez-Casero
- Sustainable Forest Management Research Institute (iuFOR), University of Valladolid, Palencia, Spain
| | - N Eisenhauer
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Institute of Biology, Leipzig University, Leipzig, Germany
| | - M N Fotelli
- Forest Research Institute, Hellenic Agricultural Organization Dimitra, Thessaloniki, Greece
| | - G Gâteblé
- INRAE, UEVT, Antibes Juan-les-Pins, France
| | - D L Godbold
- Department of Forest Protection and Wildlife Management, Mendel University in Brno, Brno, Czech Republic
- Institute of Forest Ecology, Department of Ecosystem Management, Climate and Biodiversity, BOKU University, Vienna, Austria
| | - M Gomes-Caetano-Ferreira
- SRAAC, Azores Regional Ministry for Environment and Climate Change, Angra do Heroísmo, Azores, Portugal
| | - M J Gundale
- Swedish University of Agricultural Sciences, Umeå, Sweden
| | - H Jactel
- INRAE, University of Bordeaux, BIOGECO, Cestas, France
| | - J Koricheva
- Department of Biological Sciences, Royal Holloway University of London, Egham, UK
| | - M Larsson
- Swedish University of Agricultural Sciences, Umeå, Sweden
| | - V A Laudicina
- Department of Agricultural, Food and Forest Sciences, University of Palermo, Palermo, Italy
| | | | - J Martín-García
- Sustainable Forest Management Research Institute (iuFOR), University of Valladolid, Palencia, Spain
- Department of Plant Production and Forest Resources, University of Valladolid, Palencia, Spain
| | - W L Mason
- Forest Research, Northern Research Station, Roslin, UK
| | - C Meredieu
- INRAE, University of Bordeaux, BIOGECO, Cestas, France
| | - S Mereu
- CNR-IBE, Consiglio Nazionale delle Ricerche, Istituto per la BioEconomia, Sassari, Italy
| | - R A Montgomery
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - B Musch
- ONF, UMR 0588 BioForA, Orléans, France
| | - B Muys
- Department of Earth & Environmental Sciences, KU Leuven, Leuven, Belgium
- Leuven Plant Institute, KU Leuven, Leuven, Belgium
| | - E Paillassa
- Institut pour le Développement Forestier (IDF), Paris, France
| | - A Paquette
- Centre for Forest Research, Université du Québec à Montréal, Montreal, Quebec, Canada
| | - J D Parker
- Smithsonian Environmental Research Center, Edgewater, MD, USA
| | - W C Parker
- Ontario Ministry of Natural Resources and Forestry, Sault Ste. Marie, Ontario, Canada
| | - Q Ponette
- Earth and Life Institute, UCLouvain-Université Catholique de Louvain, Louvain-la-Neuve, Belgium
| | - C Reynolds
- Forest Research, Alice Holt Lodge, Farnham, UK
| | | | - R Ruiz-Peinado
- Institute of Forest Science (ICIFOR-INIA), CSIC, Madrid, Spain
| | | | - M Scherer-Lorenzen
- Geobotany, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - F J Silva-Pando
- AGACAL-Centro de Investigación Forestal de Lourizán, Pontevedra, Spain
| | - A Smolander
- Natural Resources Institute Finland (Luke), Helsinki, Finland
| | - G Spyroglou
- Forest Research Institute, Hellenic Agricultural Organization Dimitra, Thessaloniki, Greece
| | | | | | - K Verheyen
- Forest & Nature Lab, Department of Environment, Ghent University, Melle-Gontrode, Belgium
| | - L Vesterdal
- Department of Geosciences and Natural Resource Management, University of Copenhagen, Frederiksberg, Denmark
| | - M Charru
- INRAE, Bordeaux Sciences Agro, UMR 1391 ISPA, Villenave d'Ornon, France.
6
Wardrope A, Ferrar M, Goodacre S, Habershon D, Heaton TJ, Howell SJ, Reuber M. Validation of a Machine-Learning Clinical Decision Aid for the Differential Diagnosis of Transient Loss of Consciousness. Neurol Clin Pract 2025; 15:e200448. [PMID: 40196464 PMCID: PMC11975300 DOI: 10.1212/cpj.0000000000200448] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/20/2024] [Accepted: 01/15/2025] [Indexed: 04/09/2025]
Abstract
Background and Objectives The aim of this study was to develop and validate a machine-learning classifier based on patient and witness questionnaires to support differential diagnosis of transient loss of consciousness (TLOC) at first presentation. Methods We prospectively recruited patients newly presenting with TLOC to an emergency department, an acute medical unit, and a first seizure or syncope clinic. We invited participants to complete an online questionnaire, either at home or at the time of initial assessment. Two expert raters determined the cause of participants' TLOC after 6-month follow-up. We used independent development and validation samples to train a random forest classifier to predict diagnosis from participants' questionnaire responses and to validate classifier performance. We compared classifier performance against penalized linear regression and referrer diagnosis. Results We included 178 participants in the final analysis, of whom 46 identified a witness able to complete an additional witness questionnaire. Given low witness recruitment, we developed a classifier based on patient answers only. A classifier trained on 9 items correctly identified 63 of 78 diagnoses (80.8%; 95% CI 70.0-88.5), an increase over the accuracy of the initial assessing clinicians, who diagnosed only 70.5% correctly. Within this, 96% (87.0%-99.4%) of those expertly rated as having syncope were correctly classified by the classifier (classifier sensitivity); 40% (20%-63.6%) of those expertly rated after follow-up as having either epilepsy or functional/dissociative seizures were similarly classified as being nonsyncope (classifier specificity). Discussion A machine-learning classifier for differential diagnosis of TLOC performs comparably to the current standard of care in differentiating between the 3 main causes of primary TLOC but is insufficiently accurate in its current form to warrant incorporation into routine care. A system including information from witnesses might improve classification performance.
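As a rough check on the headline figure in this abstract, the sketch below recomputes the accuracy from the reported counts (63 of 78) and attaches a Wilson score interval. The abstract does not say how its interval was computed, so the Wilson method and the helper name `wilson_interval` are assumptions made here for illustration; the resulting bounds need not match the published 70.0-88.5 exactly.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 63 correct diagnoses out of 78.
correct, total = 63, 78
accuracy = correct / total
low, high = wilson_interval(correct, total)

if __name__ == "__main__":
    print(f"accuracy = {accuracy:.1%}")
    print(f"Wilson 95% interval = {low:.1%} to {high:.1%}")
```

The Wilson interval is used here only because it needs nothing beyond the standard library; an exact (Clopper-Pearson) interval would be an equally reasonable choice and gives slightly wider bounds.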
Affiliation(s)
- Alistair Wardrope
- Department of Neurology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
- Division of Neuroscience, Royal Hallamshire Hospital, University of Sheffield, Sheffield, United Kingdom
| | - Melloney Ferrar
- Syncope and Postural Tachycardia Syndrome Service, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
| | - Steve Goodacre
- Directorate of Acute and Emergency Medicine, Sheffield Teaching Hospitals NHS Foundation Trust, Northern General Hospital, Sheffield, United Kingdom
- Division of Population Health, University of Sheffield, Sheffield, United Kingdom
| | - Daniel Habershon
- Specialised Cancer Services, Sheffield Teaching Hospitals NHS Foundation Trust, Weston Park Cancer Centre, Sheffield, United Kingdom
| | - Timothy J Heaton
- Department of Statistics, School of Mathematics, University of Leeds, United Kingdom
| | - Stephen J Howell
- Department of Neurology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
| | - Markus Reuber
- Department of Neurology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
- Division of Neuroscience, Royal Hallamshire Hospital, University of Sheffield, Sheffield, United Kingdom
7
Lange TM, Gültas M, Schmitt AO, Heinrich F. optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics 2025; 26:95. [PMID: 40165065 PMCID: PMC11959736 DOI: 10.1186/s12859-025-06097-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Accepted: 02/26/2025] [Indexed: 04/02/2025] [Open Access]
Abstract
Machine learning is frequently used to make decisions based on big data. Among these techniques, random forest is particularly prominent. Although random forest is known to have many advantages, one aspect that is often overlooked is that it is a non-deterministic method that can produce different models from the same input data. This can have severe consequences for decision-making processes. In this study, we introduce a method to quantify the impact of non-determinism on predictions, variable importance estimates, and decisions based on the predictions or variable importance estimates. Our findings demonstrate that increasing the number of trees in random forests enhances the stability in a non-linear way while computation time increases linearly. Consequently, we conclude that there exists an optimal number of trees for any given data set that maximises the stability without unnecessarily increasing the computation time. Based on these findings, we have developed the R package optRF, which models the relationship between the number of trees and the stability of random forest, providing recommendations for the optimal number of trees for any given data set.
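The non-determinism described in this abstract can be illustrated with a toy ensemble. The sketch below is not the optRF package or its stability measure: it is a minimal pure-Python stand-in that bags one-split regression stumps on synthetic data and reports how much predictions differ between forests trained on identical data with different seeds. All names (`fit_stump`, `fit_forest`, `seed_disagreement`) and the data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys, rng):
    """Fit a one-split regression stump on a bootstrap sample."""
    n = len(xs)
    sample = [(xs[i], ys[i]) for i in (rng.randrange(n) for _ in range(n))]
    threshold = sample[rng.randrange(n)][0]  # random split point
    left = [y for x, y in sample if x < threshold]
    right = [y for x, y in sample if x >= threshold]  # never empty
    overall = mean(y for _, y in sample)
    return threshold, (mean(left) if left else overall), mean(right)


def fit_forest(xs, ys, n_trees, seed):
    """Train n_trees stumps; the seed is the only source of randomness."""
    rng = random.Random(seed)
    return [fit_stump(xs, ys, rng) for _ in range(n_trees)]


def predict(forest, x):
    """Average the stump predictions for one input value."""
    return mean(lo if x < t else hi for t, lo, hi in forest)


def seed_disagreement(xs, ys, n_trees, seeds):
    """Mean absolute prediction gap between forests that differ only by seed."""
    preds = [[predict(fit_forest(xs, ys, n_trees, s), x) for x in xs] for s in seeds]
    gaps = [abs(a - b)
            for i, p in enumerate(preds)
            for q in preds[i + 1:]
            for a, b in zip(p, q)]
    return mean(gaps)


# Synthetic data (an assumption for illustration): noisy quadratic.
data_rng = random.Random(0)
xs = [i / 50 for i in range(50)]
ys = [x * x + data_rng.gauss(0, 0.05) for x in xs]

if __name__ == "__main__":
    for n_trees in (1, 10, 100):
        gap = seed_disagreement(xs, ys, n_trees, seeds=[1, 2, 3])
        print(f"{n_trees:>3} trees: mean seed-to-seed gap = {gap:.4f}")
```

Running the loop for several tree counts gives a feel for the trade-off the abstract describes: training cost grows with the number of trees, while the seed-to-seed gap is what additional trees are meant to shrink.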
Affiliation(s)
- Thomas M Lange
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany.
| | - Mehmet Gültas
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494, Soest, Germany
- Center for Integrated Breeding Research (Cibreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
| | - Armin O Schmitt
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany
- Center for Integrated Breeding Research (Cibreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
| | - Felix Heinrich
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany
8
Yan B, Liao P, Zhang W, Han Z, Wang C, Chen F, Lei P. Identification of Key Fatty Acid Metabolism-Related Genes in Alzheimer's Disease. Mol Neurobiol 2025:10.1007/s12035-025-04857-x. [PMID: 40108056 DOI: 10.1007/s12035-025-04857-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 03/14/2025] [Indexed: 03/22/2025]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder, and the role of fatty acid metabolism in its pathogenesis remains incompletely understood. Using AD transcriptome sequencing data from the GEO database, we initially screened for differentially expressed genes and applied Weighted Gene Correlation Network Analysis (WGCNA) to identify crucial gene modules. By intersecting these genes with fatty acid metabolism-related genes (FAMRGs), we obtained AD-related fatty acid metabolism genes (AD-FAMRGs). Subsequently, we conducted KEGG, GO, and Single-sample Gene Set Enrichment Analysis (ssGSEA). Furthermore, we employed three machine learning algorithms to determine the key AD-FAMRGs. Risk genes were thus identified, leading to the construction of a risk model which was subsequently validated through receiver operating characteristic (ROC) curve analysis. Additionally, protein docking studies were performed to assess interactions between key AD-FAMRGs and Tau as well as amyloid beta (Aβ) proteins. To explore potential therapeutic avenues, we searched the DrugBank database for agents targeting these AD-FAMRGs, followed by molecular docking and dynamics simulations. Our investigations highlighted three key AD-FAMRGs: DLD, ELOVL5, and HMGCS1. Functional enrichment analysis indicated their association with metabolism, oxidative stress, and AD pathogenesis. ZDOCK analysis further suggested their interactions with Tau and Aβ proteins, pointing to their possible involvement in AD's pathological processes. ROC analysis demonstrated the predictive accuracy of these AD-FAMRGs, with AUC values ranging from 0.764 to 0.876. Molecular docking and dynamic simulations confirmed the favorable binding of predicted therapeutic agents to these key AD-FAMRGs. Our findings suggest that fatty acid metabolism may be involved in AD pathogenesis, and DLD, ELOVL5, and HMGCS1 may serve as potential therapeutic targets for AD.
Affiliation(s)
- Bo Yan
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin, 300052, China
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China
- Pan Liao
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China
- School of Medicine, Nankai University, Tianjin, 300192, China
- Wei Zhang
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin, 300052, China
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China
- Zhaoli Han
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin, 300052, China
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China
- Conglin Wang
- First Department of General Medicine, Tianjin First Central Hospital, Tianjin, 300190, China
- Fanglian Chen
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin, 300052, China.
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China.
- Ping Lei
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin, 300052, China.
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin, 300052, China.
- School of Medicine, Nankai University, Tianjin, 300192, China.
|
9
|
Chen X, Yu B, Zhang Y, Wang X, Huang D, Gong S, Hu W. A machine learning model based on emergency clinical data predicting 3-day in-hospital mortality for stroke and trauma patients. Front Neurol 2025; 16:1512297. [PMID: 40183016 PMCID: PMC11966482 DOI: 10.3389/fneur.2025.1512297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Accepted: 03/05/2025] [Indexed: 04/05/2025] Open
Abstract
BACKGROUND Accurately predicting the short-term in-hospital mortality risk for patients with stroke and traumatic brain injury (TBI) is crucial for improving the quality of emergency medical care. METHOD This study analyzed data from 2,125 emergency admission patients with stroke and TBI at two Grade A hospitals in China from January 2021 to March 2024. LASSO regression was used for feature selection, and the predictive performance of logistic regression was compared with six machine learning algorithms. A 70:30 ratio was applied for cross-validation, and confidence intervals were calculated using the bootstrap method. Temporal validation was performed on the best-performing model. SHAP values were employed to assess variable importance. RESULTS The random forest algorithm excelled in predicting in-hospital 3-day mortality, achieving an AUC of 0.978 (95% CI: 0.966-0.986). Temporal validation demonstrated the model's strong generalization capability, with an AUC of 0.975 (95% CI: 0.963-0.986). Key predictive factors in the final model included metabolic syndrome, NEWS2 score, Glasgow Coma Scale (GCS), whether surgery was performed, bowel movement status, potassium level (K), aspartate transaminase (AST) level, and temporal factors. SHAP value analysis further confirmed the significant contributions of these variables to the predictive outcomes. CONCLUSION The random forest model developed in this study demonstrates good accuracy in predicting short-term in-hospital mortality for stroke and TBI patients. The model integrates emergency scores, clinical signs, and key biochemical indicators, providing a comprehensive perspective for risk assessment. This approach, which incorporates emergency data, holds promise for assisting decision-making in clinical practice, thereby improving patient outcomes.
Affiliation(s)
- Xu Chen
- Shangrao People's Hospital, Shangrao, China
- Bin Yu
- Shangrao People's Hospital, Shangrao, China
- Xin Wang
- Huaian Hospital of Huaian City, Huai'an, China
- Wei Hu
- School of Nursing, Jinzhou Medical University, Jinzhou, China
|
10
|
Turky MA, Youssef I, El Amir A. Identifying behavior regulatory leverage over mental disorders transcriptomic network hubs toward lifestyle-dependent psychiatric drugs repurposing. Hum Genomics 2025; 19:29. [PMID: 40102990 PMCID: PMC11921594 DOI: 10.1186/s40246-025-00733-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 02/19/2025] [Indexed: 03/20/2025] Open
Abstract
BACKGROUND Mental disorders are highly prevalent, but patient responses to psychiatric medication fluctuate. As food choices and daily habits play a fundamental role in this fluctuation, integrating machine learning with network medicine can provide valuable insights into disease systems and the regulatory leverage of lifestyle in mental health. METHODS This study analyzed coexpression network modules of major depressive disorder (MDD) and post-traumatic stress disorder (PTSD) blood transcriptomic profiles using a modularity optimization method, the first runner-up of the Disease Module Identification DREAM challenge. The top disease genes of both MDD and PTSD modules were detected using a random forest model. Afterward, the regulatory signatures of two predominant habitual phenotypes, diet-induced obesity and smoking, were identified. The signals of these transcription/translation regulating factors (TRFs) were transduced toward the two disorders' disease genes. A bipartite network of drugs that target the TRFs together with PTSD or MDD hubs was constructed. RESULTS The research revealed one MDD hub, CENPJ, which is known to influence intellectual ability. This observation paves the way for additional investigations into the potential of CENPJ as a novel target for the development of MDD therapeutic agents. Additionally, most of the predicted PTSD hubs were associated with multiple carcinomas, of which the most notable was SHCBP1. SHCBP1 is a known risk factor for glioma, suggesting the importance of continuous monitoring of patients with PTSD to mitigate potential cancer comorbidities. The signaling network illustrated that two PTSD and three MDD biomarkers were co-regulated by habitual phenotype TRFs. 6-Prenylnaringenin and Aflibercept were identified as potential candidates for targeting the MDD and PTSD hubs ATP6V0A1 and PIGF. However, habitual phenotype TRFs have no leverage over ATP6V0A1 and PIGF.
CONCLUSION Combining machine learning and network biology succeeded in revealing biomarkers for two notoriously spreading disorders, MDD and PTSD. This approach offers a non-invasive diagnostic pipeline and identifies potential drug targets that could be repurposed under further investigation. These findings contribute to our understanding of the complex interplay between mental disorders, daily habits, and psychiatric interventions, thereby facilitating more targeted and personalized treatment strategies.
Affiliation(s)
- Ibrahim Youssef
- Faculty of Engineering, Biomedical Engineering Department, Cairo University, Giza, 12613, Egypt
- Azza El Amir
- Faculty of Science, Biotechnology Department, Cairo University, Giza, 12613, Egypt
|
11
|
Zadorozhny BS, Petrides KV, Cheng Y, Cuppello S, van der Linden D. Predicting Leadership Status Through Trait Emotional Intelligence and Cognitive Ability. Behav Sci (Basel) 2025; 15:345. [PMID: 40150239 PMCID: PMC11939709 DOI: 10.3390/bs15030345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 02/20/2025] [Accepted: 03/06/2025] [Indexed: 03/29/2025] Open
Abstract
Many interconnected factors have been implicated in the prediction of whether a given individual occupies a managerial role. These include an assortment of demographic variables such as age and gender as well as trait emotional intelligence (trait EI) and cognitive ability. In order to disentangle their respective effects on formal leadership position, the present study compares a traditional linear approach in the form of a logistic regression with the results of a set of supervised machine learning (SML) algorithms. Beyond extending the analysis past linear effects, a series of techniques, including feature importance and interactions, were incorporated to apply ML approaches in practice and interpret their results. The results demonstrated the superior predictive strength of trait EI over cognitive ability, especially of its sociability factor, and supported the predictive utility of the random forest (RF) algorithm in this context. We thereby hope to contribute to and support a developing trend of acknowledging the genuine complexity of real-world contexts such as leadership and to provide direction for future investigations, including more sophisticated ML approaches.
Affiliation(s)
- K. V. Petrides
- Department of Psychology, University College London (UCL), London WC1E 6BT, UK
- Yongtian Cheng
- Department of Psychology, University College London (UCL), London WC1E 6BT, UK
|
12
|
Shah SNA, Parveen R. Differential gene expression analysis and machine learning identified structural, TFs, cytokine and glycoproteins, including SOX2, TOP2A, SPP1, COL1A1, and TIMP1 as potential drivers of lung cancer. Biomarkers 2025; 30:200-215. [PMID: 39888730 DOI: 10.1080/1354750x.2025.2461698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 01/26/2025] [Indexed: 02/02/2025]
Abstract
BACKGROUND Lung cancer is a primary global health concern, responsible for a considerable portion of cancer-related fatalities worldwide. Understanding its molecular complexities is crucial for identifying potential targets for treatment. The goal is to slow disease progression and intervene early to prevent the development of advanced lung cancer cases. Hence, there is an urgent need for new biomarkers that can detect lung cancer in its early stages. METHODS The study conducted RNA-Seq analysis of lung cancer samples from the publicly available SRA database (NCBI SRP009408), including both control and tumour samples. The genes with differential expression between tumour and healthy tissues were identified using R and Bioconductor. Machine learning (ML) techniques (Random Forest, Lasso, XGBoost, Gradient Boosting and Elastic Net) were employed to pinpoint significant genes, followed by the classifiers Multilayer Perceptron (MLP), Support Vector Machines (SVM) and k-Nearest Neighbours (k-NN). Gene ontology and pathway analyses were performed on the significant differentially expressed genes (DEGs). The top genes from DEG and machine learning analyses were combined for protein-protein interaction (PPI) analysis, identifying 10 hub genes essential for lung cancer progression. RESULTS The integrated analysis of ML and DEGs revealed the significance of specific genes in lung cancer samples and identified the top 5 upregulated genes (COL11A1, TOP2A, SULF1, DIO2, MIR196A2) and the top 5 downregulated genes (PDK4, FOSB, FLYWCH1, CYB5D2, MIR328), along with their associated genes implicated in pathways or co-expression networks. Among the various algorithms employed, Random Forest and XGBoost proved effective in identifying common genes, underscoring their potential significance in lung cancer pathogenesis. The MLP exhibited the highest accuracy in classifying samples using all genes. Additionally, the PPI analysis identified 10 hub genes that are pivotal in lung cancer pathogenesis: COL1A1, SOX2, SPP1, THBS2, POSTN, COL5A1, COL11A1, TIMP1, TOP2A and PKP1.
CONCLUSION The study contributes to the early prediction of lung cancer by identifying potential biomarkers that could enhance early diagnosis and pave the way for practical clinical applications in the future. Integrating DEGs and machine learning-derived significant genes for PPI analysis offers a robust approach to uncovering critical molecular targets for lung cancer treatment.
Affiliation(s)
- Rafat Parveen
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
|
13
|
Shen L, Jin Y, Pan AX, Wang K, Ye R, Lin Y, Anwar S, Xia W, Zhou M, Guo X. Machine learning-based predictive models for perioperative major adverse cardiovascular events in patients with stable coronary artery disease undergoing noncardiac surgery. Comput Methods Programs Biomed 2025; 260:108561. [PMID: 39708562 DOI: 10.1016/j.cmpb.2024.108561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 11/17/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024]
Abstract
BACKGROUND AND OBJECTIVE Accurate prediction of perioperative major adverse cardiovascular events (MACEs) is crucial, as it not only aids clinicians in comprehensively assessing patients' surgical risks and tailoring personalized surgical and perioperative management plans, but also supports information-based shared decision-making with patients and efficient allocation of medical resources. This study developed and validated a machine learning (ML) model using accessible preoperative clinical data to predict perioperative MACEs in stable coronary artery disease (SCAD) patients undergoing noncardiac surgery (NCS). METHODS We collected data from 9,171 adult SCAD patients who underwent NCS and extracted 64 preoperative variables. First, data imputation, resampling, and feature selection methods were compared, and the best-performing ones were selected to handle missing values and class imbalance. Then, nine independent machine learning models (logistic regression (LR), support vector machine, Gaussian Naive Bayes (GNB), random forest, gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient boosting machine, categorical boosting (CatBoost), and deep neural network) and a stacking ensemble model were constructed and compared with the validated Revised Cardiac Risk Index (RCRI) model for predictive performance, which was evaluated using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), calibration curve, and decision curve analysis (DCA). To reduce overfitting and enhance robustness, we performed hyperparameter tuning and 5-fold cross-validation. Finally, the Shapley additive explanations (SHAP) method and partial dependence plots (PDP) were used to interpret the optimal ML model. RESULTS Of the 9,171 patients, 514 (5.6%) developed MACEs. Twenty-four significant preoperative features were selected for model development and evaluation. All ML models performed well, with AUROC above 0.88 and AUPRC above 0.39, outperforming the AUROC (0.716) and AUPRC (0.185) of RCRI (P < 0.001). The best independent model was XGBoost (AUROC = 0.898, AUPRC = 0.479). The calibration curve showed accurate prediction of the risk of MACEs (Brier score = 0.040), and the DCA results showed that XGBoost had a high net benefit for predicting MACEs. The top-ranked stacking ensemble model, consisting of CatBoost, GBDT, GNB, and LR, proved to be the best (AUROC 0.894, AUPRC 0.485). We identified the top 20 most important features using the mean absolute SHAP values and depicted their effects on model predictions using PDP.
CONCLUSIONS This study combined missing-value imputation, feature screening, unbalanced data processing, and advanced machine learning methods to successfully develop and verify the first ML-based perioperative MACEs prediction model for patients with SCAD, which is more accurate than RCRI and enables effective identification of high-risk patients and implementation of targeted interventions to reduce the incidence of MACEs.
Affiliation(s)
- Liang Shen
- Department of Information Technology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- YunPeng Jin
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- AXiang Pan
- Department of Information Technology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- Kai Wang
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- RunZe Ye
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- YangKai Lin
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- Safraz Anwar
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- WeiCong Xia
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
- Min Zhou
- Department of Information Technology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China.
- XiaoGang Guo
- Department of Cardiovascular Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China.
|
14
|
Pan B, Li X, Weng J, Xu X, Yu P, Zhao Y, Yu D, Zhang X, Tang X. Identifying periphery biomarkers of first-episode drug-naïve patients with schizophrenia using machine-learning-based strategies. Prog Neuropsychopharmacol Biol Psychiatry 2025; 137:111302. [PMID: 40015618 DOI: 10.1016/j.pnpbp.2025.111302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2024] [Revised: 02/17/2025] [Accepted: 02/23/2025] [Indexed: 03/01/2025]
Abstract
Schizophrenia is a complex mental disorder. Accurate diagnosis and classification of schizophrenia have always been a major challenge in the clinic due to the lack of biomarkers. Therefore, identifying molecular biomarkers, particularly in the peripheral blood, is of great significance. This study aimed to identify immune-related molecular biomarkers of schizophrenia in peripheral blood. Peripheral blood leukocytes from 84 first-episode drug-naïve (FEDN) patients with schizophrenia and 97 healthy controls were collected and examined using high-throughput RNA sequencing. Differentially expressed genes (DEGs) were analysed. Weighted correlation network analysis (WGCNA) was employed to identify schizophrenia-associated module genes. The CIBERSORT algorithm was adopted to analyse immune cell proportions. Then, machine-learning algorithms including random forest, LASSO, and SVM-RFE were employed to screen immune-related predictive genes of schizophrenia. The RNA-seq analyses revealed 734 DEGs. Further machine-learning-based bioinformatic analyses screened out three immune-related predictive genes of schizophrenia (FOSB, NUP43, and H3C1), all of which were correlated with neutrophils and resting natural killer cells. Lastly, external GEO datasets were used to verify the performance of the machine-learning models with these predictive genes. In conclusion, by analysing the peripheral mRNA expression profiles of FEDN patients with schizophrenia, this study identified three predictive genes that could be potential molecular biomarkers for schizophrenia.
Affiliation(s)
- Bo Pan
- Jiangsu Key Laboratory of Integrated Traditional Chinese and Western Medicine for Prevention and Treatment of Senile Diseases, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China
- Xueying Li
- Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China
- Jianjun Weng
- Jiangsu Key Laboratory of Integrated Traditional Chinese and Western Medicine for Prevention and Treatment of Senile Diseases, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China
- Xiaofeng Xu
- Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China
- Ping Yu
- Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China
- Yaqin Zhao
- Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China
- Doudou Yu
- Department of Pharmacy, Yangzhou University Medical College, Yangzhou, Jiangsu 225001, PR China; Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China
- Xiangrong Zhang
- Department of Geriatric Psychiatry, Nanjing Brain Hospital Affiliated to Nanjing Medical University, Nanjing, Jiangsu 210029, PR China.
- Xiaowei Tang
- Affiliated WuTaiShan Hospital of Yangzhou University Medical College, Yangzhou, Jiangsu 225003, PR China; Department of Psychiatry, Yangzhou WuTaiShan Hospital of Jiangsu Province, Yangzhou, Jiangsu 225003, PR China.
|
15
|
Benfatto S, Sill M, Jones DTW, Pfister SM, Sahm F, von Deimling A, Capper D, Hovestadt V. Explainable artificial intelligence of DNA methylation-based brain tumor diagnostics. Nat Commun 2025; 16:1787. [PMID: 39979307 PMCID: PMC11842776 DOI: 10.1038/s41467-025-57078-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 02/07/2025] [Indexed: 02/22/2025] Open
Abstract
We have recently developed a machine learning classifier that enables fast, accurate, and affordable classification of brain tumors based on genome-wide DNA methylation profiles that is widely employed in the clinic. Neuro-oncology research would benefit greatly from understanding the underlying artificial intelligence decision process, which currently remains unclear. Here, we describe an interpretable framework to explain the classifier's decisions. We show that functional genomic regions of various sizes are predominantly employed to distinguish between different tumor classes, ranging from enhancers and CpG islands to large-scale heterochromatic domains. We detect a high degree of genomic redundancy, with many genes distinguishing individual tumor classes, explaining the robustness of the classifier and revealing potential targets for further therapeutic investigation. We anticipate that our resource will build up trust in machine learning in clinical settings, foster biomarker discovery and development of compact point-of-care assays, and enable further epigenome research of brain tumors. Our interpretable framework is accessible to the research community via an interactive web application ( https://hovestadtlab.shinyapps.io/shinyMNP/ ).
Affiliation(s)
- Salvatore Benfatto
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Martin Sill
- Division of Pediatric Neurooncology, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and Heidelberg University Hospital, Heidelberg, Germany
- German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- David T W Jones
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and Heidelberg University Hospital, Heidelberg, Germany
- German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Pediatric Glioma Research, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
- Stefan M Pfister
- Division of Pediatric Neurooncology, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and Heidelberg University Hospital, Heidelberg, Germany
- German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Department of Pediatric Oncology, Hematology & Immunology, Heidelberg University Hospital, Heidelberg, Germany
- Felix Sahm
- Department of Neuropathology, Heidelberg University Hospital, Heidelberg, Germany
- Clinical Cooperation Unit Neuropathology, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- Andreas von Deimling
- Department of Neuropathology, Heidelberg University Hospital, Heidelberg, Germany
- Clinical Cooperation Unit Neuropathology, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, Germany
- David Capper
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Volker Hovestadt
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
16
|
Alharbi F, Vakanski A, Zhang B, Elbashir MK, Mohammed M. Comparative Analysis of Multi-Omics Integration Using Graph Neural Networks for Cancer Classification. IEEE Access 2025; 13:37724-37736. [PMID: 40123934 PMCID: PMC11928009 DOI: 10.1109/access.2025.3540769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
Recent studies on integrating multiple omics data highlighted the potential to advance our understanding of the cancer disease process. Computational models based on graph neural networks and attention-based architectures have demonstrated promising results for cancer classification due to their ability to model complex relationships among biological entities. However, challenges related to addressing the high dimensionality and complexity in integrating multi-omics data, as well as in constructing graph structures that effectively capture the interactions between nodes, remain active areas of research. This study evaluates graph neural network architectures for multi-omics (MO) data integration based on graph-convolutional networks (GCN), graph-attention networks (GAT), and graph-transformer networks (GTN). Differential gene expression and LASSO (Least Absolute Shrinkage and Selection Operator) regression are employed for reducing the omics data dimensionality and feature selection; hence, the developed models are referred to as LASSO-MOGCN, LASSO-MOGAT, and LASSO-MOGTN. Graph structures constructed using sample correlation matrices and protein-protein interaction networks are investigated. Experimental validation is performed with a dataset of 8,464 samples from 31 cancer types and normal tissue, comprising messenger-RNA, micro-RNA, and DNA methylation data. The results show that the models integrating multi-omics data outperformed the models trained on single omics data, where LASSO-MOGAT achieved the best overall performance, with an accuracy of 95.9%. The findings also suggest that correlation-based graph structures enhance the models' ability to identify shared cancer-specific signatures across patients in comparison to protein-protein interaction networks-based graph structures. The code and data used in this study are available in the link (https://github.com/FadiAlharbi2024/Graph_Based_Architecture.git).
Affiliation(s)
- Fadi Alharbi
- College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Aleksandar Vakanski
- College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Boyu Zhang
- College of Engineering, Department of Computer Science, University of Idaho, Moscow, ID 83844, USA
- Murtada K Elbashir
- College of Computer and Information Sciences, Department of Information Systems, Jouf University, Sakaka, Al-Jouf 72441, Saudi Arabia
- Mohanad Mohammed
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa
|
17
|
Zhang L, Yu T, Zheng G, Tang Q, Peng M, Li C, Hou Q, Yang Z. Using machine learning to predict selenium content in crops: Implications for soil health and agricultural land utilization in longevity regions. Sci Total Environ 2025; 964:178520. [PMID: 39842296 DOI: 10.1016/j.scitotenv.2025.178520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 01/11/2025] [Accepted: 01/12/2025] [Indexed: 01/24/2025]
Abstract
Selenium (Se) is an indispensable trace element for human health, yet its biological tolerance threshold is relatively narrow. The potential application of machine learning methods to indirectly predict the Se content in crops across regional areas, thereby validating the reasonableness of soil health thresholds, remains to be explored. This study analyzed the factors influencing Se absorption in crops from longevity regions and employed machine learning models to predict the bioconcentration factor of Se, thereby obtaining the Se content in these crops and ultimately estimating the Se threshold for healthy soils. The results indicated that the Artificial Neural Network (ANN) model demonstrated the best predictive performance for the bioaccumulation factor (BAF) of Se in crops. The maximum permissible concentration of Se in rice was 0.17 mg/kg, while the minimum was 0.03 mg/kg; for maize, the maximum permissible concentration was 0.25 mg/kg, and the minimum was 0.04 mg/kg. Approximately 68% of the arable land in the study area was suitable for cultivating Se-rich crops, providing important insights for the optimization of crop cultivation.
Affiliation(s)
- Liyue Zhang
- School of Science, China University of Geosciences, Beijing 100083, PR China
| | - Tao Yu
- School of Science, China University of Geosciences, Beijing 100083, PR China; Research Center of Geochemical Survey and Assessment on Land Quality, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang, Hebei 065000, PR China; Key Laboratory of Ecogeochemistry, Ministry of Natural Resources, Beijing 100037, PR China.
| | - Guodong Zheng
- Guangxi Institute of Geological Survey, Nanning, Guangxi 530023, PR China
| | - Qifeng Tang
- Key Laboratory of Ecogeochemistry, Ministry of Natural Resources, Beijing 100037, PR China.
| | - Min Peng
- Research Center of Geochemical Survey and Assessment on Land Quality, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang, Hebei 065000, PR China; Key Laboratory of Geochemical Cycling of Carbon and Mercury in the Earth's Critical Zone, Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang, Hebei 065000, PR China.
| | - Chang Li
- School of Science, China University of Geosciences, Beijing 100083, PR China
| | - Qingye Hou
- Key Laboratory of Ecogeochemistry, Ministry of Natural Resources, Beijing 100037, PR China; School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, PR China
| | - Zhongfang Yang
- Key Laboratory of Ecogeochemistry, Ministry of Natural Resources, Beijing 100037, PR China; School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, PR China
| |
|
18
|
Gilholm P, Lister P, Irwin A, Harley A, Raman S, Schlapbach LJ, Gibbons KS. Comparison of Random Forest and Stepwise Regression for Variable Selection Using Low Prevalence Predictors: A Case Study in Paediatric Sepsis. Matern Child Health J 2025:10.1007/s10995-025-04038-1. [PMID: 39812888 DOI: 10.1007/s10995-025-04038-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/01/2025] [Indexed: 01/16/2025]
Abstract
INTRODUCTION Variable selection is a common technique to identify the most predictive variables from a pool of candidate predictors. Low prevalence predictors (LPPs) are frequently found in clinical data, yet few studies have explored their impact on model performance during variable selection. This study compared the Random Forest (RF) algorithm and stepwise regression (SWR) for variable selection using data from a paediatric sepsis screening tool, where 18 out of 32 predictors had a prevalence < 10%. METHODS Variable selection using RF was compared to forward and backward SWR. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the variables retained. Additionally, a simulation study assessed how increasing the prevalence of the predictors impacted the variable selection results. RESULTS The best-fitting RF and SWR models retained 22 and 17 predictors, respectively, of which 14 and 10 had a prevalence < 10%. The RF and SWR models had similar predictive performance (RF: AUC [95% Confidence Interval] 0.79 [0.77, 0.81], LR: 0.80 [0.78, 0.82]). The simulation study revealed differences for both RF and SWR models in variable importance rankings and predictor selection with increasing prevalence thresholds, particularly for moderately and strongly associated predictors. DISCUSSION The RF algorithm retained a number of very low prevalence predictors compared to SWR. However, the predictive performance of both models was comparable, demonstrating that when applied correctly and when the number of candidate predictors is small, both methods are suitable for variable selection when using low prevalence predictors.
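For readers unfamiliar with the AUC used to compare the two models, the sketch below computes it in plain Python as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and toy data are illustrative and not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations use a rank-based shortcut that gives the same value.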
Affiliation(s)
- Patricia Gilholm
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia.
| | - Paula Lister
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
- Paediatric Critical Care Unit, Sunshine Coast University Hospital, Birtinya, QLD, Australia
- School of Medicine, Griffith University, Nathan, QLD, Australia
| | - Adam Irwin
- UQ Centre for Clinical Research, The University of Queensland, Brisbane, QLD, Australia
- Queensland Children's Hospital, Brisbane, QLD, Australia
| | - Amanda Harley
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
- Queensland Children's Hospital, Brisbane, QLD, Australia
| | - Sainath Raman
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
- Queensland Children's Hospital, Brisbane, QLD, Australia
| | - Luregn J Schlapbach
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
- Department of Intensive Care and Neonatology, and Children's Research Center, University Children's Hospital Zurich, Zurich, Switzerland
| | - Kristen S Gibbons
- Children's Intensive Care Research Program, Child Health Research Centre, The University of Queensland, Brisbane, QLD, Australia
| |
|
19
|
Yan B, Liao P, Han Z, Zhao J, Gao H, Liu Y, Chen F, Lei P. Association of aging related genes and immune microenvironment with major depressive disorder. J Affect Disord 2025; 369:706-717. [PMID: 39419187 DOI: 10.1016/j.jad.2024.10.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 09/06/2024] [Accepted: 10/14/2024] [Indexed: 10/19/2024]
Abstract
OBJECTIVE To study the relationship between aging related genes (ARGs) and Major Depressive Disorder (MDD). METHODS The datasets GSE98793, GSE52790 and GSE39653 for MDD were obtained from the GEO database, and ARGs were obtained from the Human Aging Genome Resources database. Differentially expressed gene (DEG) screening and GO and KEGG enrichment analyses were performed to uncover the underlying mechanisms. To identify key ARGs associated with MDD (key ARG-DEGs), we employed machine learning methods such as LASSO, SVM, and Random Forest, as well as the CytoHubba-MCC and MCODE plug-in methods. ssGSEA was used to analyze immune infiltration in MDD and healthy controls. Furthermore, we constructed a risk prediction nomogram model and ROC curves to assess the ability of the key ARG-DEGs to diagnose MDD, and predicted miRNAs and transcription factors (TFs) that might interact with them. Finally, a two-sample Mendelian randomization (MR) study was performed to confirm the association of the identified key ARG-DEGs with depression. RESULTS Screening for DEGs among ARGs between MDD and healthy controls identified eight ARG-DEGs. GO and KEGG analysis revealed that the pathways associated with these eight ARG-DEGs were primarily concentrated in the FoxO pathway, JAK-STAT pathway, PI3K-AKT pathway, and metabolic diseases. A comprehensive analysis further narrowed the eight ARG-DEGs down to four key ARG-DEGs: MMP9, IL7R, S100B, and EGF. Immune infiltration analysis indicated significant differences in CD8(+) T cells, macrophages, neutrophils, Th2 cells, and TIL cells between MDD and control groups, correlating with these four key ARG-DEGs. Based on these four key ARG-DEGs, a risk prediction model for MDD was developed. The miRNA-TF-mRNA interaction network of the key ARG-DEGs highlights the complexity of the regulatory process, providing valuable insights for future related research. The MR study suggested a potential causal relationship between MMP9 and the risk of depression.
CONCLUSION The process of aging, immune dysregulation, and MDD are closely interconnected. MMP9, IL7R, S100B, and EGF may be used as novel diagnostic biomarkers and potential therapeutic targets for MDD, especially MMP9.
Affiliation(s)
- Bo Yan
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
| | - Pan Liao
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China; School of Medicine, Nankai University, Tianjin 300192, China
| | - Zhaoli Han
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
| | - Jing Zhao
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
| | - Han Gao
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
| | - Yuan Liu
- Institute of Mental Health, Tianjin Anding Hospital, Mental Health Center of Tianjin Medical University, Tianjin 300222, China
| | - Fanglian Chen
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China.
| | - Ping Lei
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China; School of Medicine, Nankai University, Tianjin 300192, China.
| |
|
20
|
Sung K, Hwang S, Lee J, Cho J. Prognostic factors in patients with gastrointestinal perforation under the acute care surgery model: a retrospective cohort study. BMC Surg 2024; 24:406. [PMID: 39709362 DOI: 10.1186/s12893-024-02687-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 11/27/2024] [Indexed: 12/23/2024] Open
Abstract
BACKGROUND Gastrointestinal perforation (GIP) is a life-threatening condition that necessitates immediate surgical intervention. This study aims to identify prognostic factors in patients with GIP treated within a standardized acute care surgery (ACS) framework. MATERIALS AND METHODS This single-center retrospective cohort study analyzed patients diagnosed with GIP who underwent emergent surgery and were admitted to the intensive care unit between January 2013 and March 2023. RESULTS Among 354 patients, mortality was 11%, and 38% of survivors experienced significant complications (Clavien-Dindo class III or higher). Independent prognostic factors for mortality included initial sequential organ failure assessment (SOFA) scores (at the time of admission or ACS activation), postoperative SOFA (p-SOFA) scores, and postoperative body temperatures. For morbidity, independent predictors were the extent of peritonitis, open surgery, postoperative albumin levels, and p-SOFA scores. These factors showed significant predictive accuracy for patient outcomes, as evidenced by the area under the receiver operating characteristic curve. The Random Forest model identified p-SOFA scores and postoperative albumin levels as the most significant predictors for both survival and complications, with feature importances of 40.46% and 36.61% for survival, and 39.97% and 37.28% for complications, respectively. Postoperative body temperature also played a moderately important role, contributing 14.63% to mortality and 15.9% to morbidity predictions. Patients with a p-SOFA score ≥ 7, postoperative albumin ≤ 2, and body temperature ≤ 36 °C, as well as those with a p-SOFA score ≥ 10, albumin ≤ 2.9, and body temperature ≤ 36 °C, had a 100% mortality rate. These factors are critical indicators for predicting patient outcomes.
CONCLUSION It is crucial to establish a system that ensures rapid preoperative work-up, accurate surgical intervention, and evidence-based postoperative critical care. Implementing such a system and assessing patient outcomes after surgery using the identified factors could provide a more detailed evaluation.
Affiliation(s)
- Kiyoung Sung
- Department of Surgery, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Sanguk Hwang
- Department of Artificial Intelligence, The Catholic University of Korea, Bucheon, Republic of Korea
| | - Jaeheon Lee
- Department of Surgery, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Jinbeom Cho
- Department of Surgery, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
| |
|
21
|
Qian L, Yu X, Zhang Z, Wu L, Fan J, Xiang Y, Chen J, Liu X. Assessing and improving the high uncertainty of global gross primary productivity products based on deep learning under extreme climatic conditions. Sci Total Environ 2024; 957:177344. [PMID: 39521074 DOI: 10.1016/j.scitotenv.2024.177344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 10/30/2024] [Accepted: 10/31/2024] [Indexed: 11/16/2024]
Abstract
Gross Primary Productivity (GPP) is a crucial indicator of the carbon fixed by plants through photosynthesis, playing a vital role in understanding and managing ecological and environmental processes. However, global warming, characterized by elevated temperatures, water shortage, and increased drought stress, has significantly impacted GPP. Various GPP products based on different algorithms and input data have been developed, but their performance under extreme climatic conditions remains unverified. This study evaluated the consistency and accuracy of eight global GPP products from 2003 to 2014 using flux tower data. The results show that GPP products performed well under overall conditions, with an average coefficient of determination (R²) of 0.604, and Penman-Monteith-Leuning-version-2 (PMLv2) showed the best performance (R² = 0.664). However, under extreme climatic conditions such as high temperature, high vapor pressure deficit (VPD), and drought, the accuracy dropped significantly (R² = 0.3), with Global-dataset-of-solar-induced-chlorophyll-fluorescence (GOSIF) being the most affected. Accuracy was lower in croplands (CRO) and grasslands (GRA). To enhance accuracy under extreme climatic conditions, GPP products were used as inputs to a Convolutional Neural Network (CNN) based on ECMWF-Reanalysis-5th-Generation (ERA5) meteorological data and compared with random forests (RF). Four GPP products significantly contributed to the model, with a cumulative contribution of 80.3 %. Under extreme climatic conditions, CNN significantly improved the estimation accuracy of GPP and outperformed RF. The optimal values for R² and the root mean square error (RMSE) were 0.905 (an increase of at least 201.7 %) and 7.708 gC m⁻² 8d⁻¹ (a decrease of at least 50.7 %). The model also performed well at 20 independent validation sites (R² = 0.783). This study offers a method to improve GPP estimation under extreme climatic conditions, unrestricted by time and space.
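The two accuracy metrics quoted above can be sketched in plain Python. This is a minimal illustration that assumes R² is computed as 1 − SS_res/SS_tot; the study may instead use the squared Pearson correlation, which the abstract does not specify. The function names are illustrative.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Going from R² = 0.3 to R² = 0.905 is a relative increase of about 201.7 %,
# which is consistent with the figure quoted in the abstract.
gain = percent_change(0.3, 0.905)
```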
Affiliation(s)
- Long Qian
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China; School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China; Faculty of Modern Agricultural Engineering, Kunming University of Science and Technology, Kunming 650500, China.
| | - Xingjiao Yu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China
| | - Zhitao Zhang
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China.
| | - Lifeng Wu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China; Faculty of Modern Agricultural Engineering, Kunming University of Science and Technology, Kunming 650500, China.
| | - Junliang Fan
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China
| | - Youzhen Xiang
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China
| | - Junying Chen
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China
| | - Xiaogang Liu
- Faculty of Modern Agricultural Engineering, Kunming University of Science and Technology, Kunming 650500, China
| |
|
22
|
Pelletier MC, Latimer JS, Rashleigh B, Tilburg C, Charpentier MA. Monitoring data compilations can be leveraged to highlight relationships between estuarine and watershed factors influencing eutrophication in estuaries. Environ Monit Assess 2024; 197:80. [PMID: 39707068 PMCID: PMC11753031 DOI: 10.1007/s10661-024-13564-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 12/09/2024] [Indexed: 12/23/2024]
Abstract
Estuaries have been adversely impacted by increased nutrient loads. Eutrophication impacts from these loads include excess algal blooms and low oxygen conditions. In this study, we leveraged data from 28 monitoring programs in the northeastern US to explore the relationships between eutrophication response variables and watershed and estuarine variables. Extensive effort was needed to locate, harmonize, and assure the quality of the data. Random forest regression allowed us to identify the most important variables that could predict summer total nitrogen (TN), chlorophyll (chl), and bottom dissolved oxygen (DO). Several different summaries of the data were assessed. The best models for TN and chl used data summarized by estuary and year, explaining > 70% and > 60% of the variation, respectively. The best model for DO used data that were averaged by estuary across all years and explained > 55% of the variation. All models showed the importance of variables related to nutrient loading, such as population density and % development, and variables related to flushing rate, such as tidal range, length:width at mouth, and estuary openness. Future work will examine the impacts of climate on eutrophication response variables. This study demonstrates the utility of combining data from multiple unrelated routine monitoring programs to understand eutrophication impacts at regional scales.
Affiliation(s)
- Marguerite C Pelletier
- Office of Research and Development, U.S. Environmental Protection Agency, Narragansett, RI, USA.
| | - James S Latimer
- Office of Research and Development, U.S. Environmental Protection Agency, Narragansett, RI, USA
| | - Brenda Rashleigh
- Office of Research and Development, U.S. Environmental Protection Agency, Narragansett, RI, USA
| | - Christine Tilburg
- Gulf of Maine Council, Ecosystem Indicator Partnership, Buxton, ME, USA
| | - Michael A Charpentier
- General Dynamics Information Technology, U.S. Environmental Protection Agency, Narragansett, RI, USA
| |
|
23
|
Sinha K, Chakraborty S, Bardhan A, Saha R, Chakraborty S, Biswas S. A New Differential Gene Expression Based Simulated Annealing for Solving Gene Selection Problem: A Case Study on Eosinophilic Esophagitis and Few Other Gastro-intestinal Diseases. Biochem Genet 2024:10.1007/s10528-024-10987-z. [PMID: 39643769 DOI: 10.1007/s10528-024-10987-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Accepted: 11/25/2024] [Indexed: 12/09/2024]
Abstract
Identifying the set of genes collectively responsible for causing a disease from differential gene expression data is called the gene selection problem. Although many complex methodologies have been applied to gene selection formulated as an optimization problem, this study introduces a new, simple, efficient, and biologically plausible solution procedure that focuses on the collective power of the targeted gene set to discriminate between diseased and normal gene expression profiles. It uses Simulated Annealing to solve the underlying optimization problem and is termed here Differential Gene Expression Based Simulated Annealing (DGESA). The Ranked Variance (RV) method was applied to prioritize genes and form a reference set for comparison with the outcome of DGESA. In a case study on Eosinophilic Esophagitis (EoE) and other gastrointestinal diseases, RV identified the top 40 high-variance genes, overlapping with disease-causing genes from DGESA. DGESA identified 40 gene pathways each for EoE, Crohn's Disease (CD), and Ulcerative Colitis (UC), with 10 genes for EoE, 8 for CD, and 7 for UC confirmed in the literature. For EoE, confirmed genes include KRT79, CRISP2, IL36G, SPRR2B, SPRR2D, and SPRR2E. For CD, validated genes are NPDC1, SLC2A4RG, LGALS8, CDKN1A, XAF1, and CYBA. For UC, confirmed genes include TRAF3, BAG6, CCDC80, CDC42SE2, and HSPA9. RV and DGESA effectively elucidate molecular signatures in gastrointestinal diseases. Validating genes like SPRR2B, SPRR2D, SPRR2E, and STAT6 for EoE demonstrates DGESA's efficacy, highlighting potential targets for future research.
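To make the general idea concrete, here is a minimal simulated annealing sketch for choosing a fixed-size gene subset. It is not the authors' DGESA: the objective (`centroid_gap`, the summed absolute difference of class means), the swap neighbourhood, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def centroid_gap(subset, diseased, normal):
    """Hypothetical objective: summed |mean diseased - mean normal| over the subset."""
    gap = 0.0
    for g in subset:
        mean_d = sum(row[g] for row in diseased) / len(diseased)
        mean_n = sum(row[g] for row in normal) / len(normal)
        gap += abs(mean_d - mean_n)
    return gap


def anneal_gene_subset(diseased, normal, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for k gene indices that maximise centroid_gap (requires k < gene count)."""
    rng = random.Random(seed)
    n_genes = len(diseased[0])
    current = set(rng.sample(range(n_genes), k))
    current_score = centroid_gap(current, diseased, normal)
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(steps):
        # Neighbour move: swap one selected gene for one unselected gene.
        out_gene = rng.choice(sorted(current))
        in_gene = rng.choice([g for g in range(n_genes) if g not in current])
        candidate = (current - {out_gene}) | {in_gene}
        cand_score = centroid_gap(candidate, diseased, normal)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp *= cooling  # geometric cooling
    return sorted(best), best_score


# Toy expression profiles (rows are samples, columns are genes); genes 1 and 4 differ most.
diseased = [[1.0, 5.0, 1.1, 0.9, 4.0, 1.0], [1.1, 5.2, 0.9, 1.0, 4.1, 1.0]]
normal = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.2, 1.0, 0.9, 0.9, 1.1]]
genes, score = anneal_gene_subset(diseased, normal, k=2)
```

Accepting some worse moves while the temperature is high is what lets the search leave a poor local optimum; as the temperature falls the procedure behaves more and more like greedy hill climbing.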
Affiliation(s)
- Koushiki Sinha
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India
| | - Sanchari Chakraborty
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India
| | - Arohit Bardhan
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India
| | - Riju Saha
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India
| | - Srijan Chakraborty
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India
| | - Surama Biswas
- Department of CSE, Meghnad Saha Institute of Technology, Behind Urbana Complex Near Ruby General Hospital, Anandapur Rd, Uchhepota, Kolkata, West Bengal, 700150, India.
| |
|
24
|
Shamraeva M, Visvikis T, Zoidis S, Anthony IGM, Van Nuffel S. The Application of a Random Forest Classifier to ToF-SIMS Imaging Data. J Am Soc Mass Spectrom 2024; 35:2801-2814. [PMID: 39455427 PMCID: PMC11622239 DOI: 10.1021/jasms.4c00324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 10/11/2024] [Accepted: 10/18/2024] [Indexed: 10/28/2024]
Abstract
Time-of-flight secondary ion mass spectrometry (ToF-SIMS) imaging is a potent analytical tool that provides spatially resolved chemical information on surfaces at the microscale. However, the hyperspectral nature of ToF-SIMS datasets can be challenging to analyze and interpret. Both supervised and unsupervised machine learning (ML) approaches are increasingly useful to help analyze ToF-SIMS data. Random Forest (RF) has emerged as a robust and powerful algorithm for processing mass spectrometry data. This machine learning approach offers several advantages, including accommodating nonlinear relationships, robustness to outliers in the data, managing the high-dimensional feature space, and mitigating the risk of overfitting. The application of RF to ToF-SIMS imaging facilitates the classification of complex chemical compositions and the identification of features contributing to these classifications. This tutorial aims to assist nonexperts in either machine learning or ToF-SIMS to apply Random Forest to complex ToF-SIMS datasets.
Affiliation(s)
- Mariya A. Shamraeva
- Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
| | - Theodoros Visvikis
- Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
| | - Stefanos Zoidis
- Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
| | - Ian G. M. Anthony
- Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
| | - Sebastiaan Van Nuffel
- Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
| |
|
25
|
Myers CE, Dave CV, Chesin MS, Marx BP, St Hill LM, Reddy V, Miller RB, King A, Interian A. Initial evaluation of a personalized advantage index to determine which individuals may benefit from mindfulness-based cognitive therapy for suicide prevention. Behav Res Ther 2024; 183:104637. [PMID: 39306938 PMCID: PMC11620942 DOI: 10.1016/j.brat.2024.104637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 08/09/2024] [Accepted: 09/16/2024] [Indexed: 09/26/2024]
Abstract
OBJECTIVE Develop and evaluate a treatment matching algorithm to predict differential treatment response to Mindfulness-Based Cognitive Therapy for suicide prevention (MBCT-S) versus enhanced treatment-as-usual (eTAU). METHODS Analyses used data from Veterans at high risk for suicide assigned to either MBCT-S (n = 71) or eTAU (n = 69) in a randomized clinical trial. Potential predictors (n = 55) included available demographic, clinical, and neurocognitive variables. Random forest models were used to predict risk of a suicidal event (suicidal behaviors, or ideation resulting in hospitalization or emergency department visit) within 12 months following randomization, characterize the prediction, and develop a Personalized Advantage Index (PAI). RESULTS A slightly better prediction model emerged for MBCT-S (AUC = 0.70) than eTAU (AUC = 0.63). Important outcome predictors for participants in the MBCT-S arm included PTSD diagnosis, decisional efficiency on a neurocognitive task (Go/No-Go), prior-year mental health residential treatment, and non-suicidal self-injury. Significant predictors for participants in the eTAU arm included past-year acute psychiatric hospitalizations, past-year outpatient psychotherapy visits, past-year suicidal ideation severity, and attentional control (indexed by Stroop task). A moderation analysis showed that fewer suicidal events occurred among those randomized to their PAI-indicated optimal treatment. CONCLUSIONS PAI-guided treatment assignment may enhance suicide prevention outcomes. However, prior to real-world application, additional research is required to improve model accuracy and evaluate model generalization.
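A PAI is commonly computed as the difference between a person's predicted outcomes under the two treatments. The sketch below assumes that definition, with each arm's model supplying a predicted risk of a suicidal event; the abstract does not give the exact formula, and the function names, the sign convention, and the tie-breaking rule are illustrative.

```python
def personalized_advantage_index(risk_mbct_s: float, risk_etau: float) -> float:
    """Assumed PAI: predicted risk under eTAU minus predicted risk under MBCT-S.

    A positive value means the models predict lower risk with MBCT-S.
    """
    return risk_etau - risk_mbct_s


def indicated_treatment(pai: float) -> str:
    """Pick the arm with the lower predicted risk (ties go to eTAU by assumption)."""
    return "MBCT-S" if pai > 0 else "eTAU"


# Hypothetical predicted 12-month risks for one participant from the two arm-specific models.
pai = personalized_advantage_index(risk_mbct_s=0.20, risk_etau=0.35)
choice = indicated_treatment(pai)
```

The magnitude of the index matters as well as its sign: values near zero indicate that the two models predict nearly the same risk, so the indicated treatment is not a strong recommendation.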
Affiliation(s)
- Catherine E Myers
- Research and Development Service, VA New Jersey Health Care System, East Orange, NJ, USA; Department of Pharmacology, Physiology & Neuroscience, New Jersey Medical School, Rutgers, The State University of New Jersey, Newark, NJ, USA
| | - Chintan V Dave
- Center for Pharmacoepidemiology and Treatment Science, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, USA
| | - Megan S Chesin
- Department of Psychology, William Paterson University, USA
| | - Brian P Marx
- National Center for PTSD, Behavioral Sciences Division at the VA Boston Health Care System, Boston, MA, USA; Boston University School of Medicine, Boston, MA, USA
| | - Lauren M St Hill
- Mental Health and Behavioral Sciences, VA New Jersey Health Care System, Lyons, NJ, USA
| | - Vibha Reddy
- Research and Development Service, VA New Jersey Health Care System, East Orange, NJ, USA
| | - Rachael B Miller
- Mental Health and Behavioral Sciences, VA New Jersey Health Care System, Lyons, NJ, USA
| | - Arlene King
- Mental Health and Behavioral Sciences, VA New Jersey Health Care System, Lyons, NJ, USA
| | - Alejandro Interian
- Mental Health and Behavioral Sciences, VA New Jersey Health Care System, Lyons, NJ, USA; Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
| |
|
26
|
Sun C, Liu ZP. Discovering explainable biomarkers for breast cancer anti-PD1 response via network Shapley value analysis. Comput Methods Programs Biomed 2024; 257:108481. [PMID: 39488042 DOI: 10.1016/j.cmpb.2024.108481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 10/20/2024] [Accepted: 10/24/2024] [Indexed: 11/04/2024]
Abstract
BACKGROUND AND OBJECTIVE Immunotherapy holds promise in enhancing pathological complete response rates in breast cancer, albeit confined to a select cohort of patients. Consequently, pinpointing factors predictive of treatment responsiveness is of paramount importance. Gene expression and regulation, inherently operating within intricate networks, constitute fundamental molecular machinery for cellular processes and often serve as robust biomarkers. Nevertheless, contemporary feature selection approaches grapple with two key challenges: opacity in modeling and inadequate accounting for gene-gene interactions. METHODS To address these limitations, we devise a novel feature selection methodology grounded in cooperative game theory, harmoniously integrating with sophisticated machine learning models. This approach identifies interconnected gene regulatory network biomarker modules with a priori genetic linkage architecture. Specifically, we leverage Shapley values on the network to quantify feature importance, while strategically constraining their integration based on network expansion principles and nodal adjacency, thereby fostering enhanced interpretability in feature selection. We apply our methods to a publicly available single-cell RNA sequencing dataset of breast cancer immunotherapy responses, using the identified feature gene set as biomarkers. Functional enrichment analysis with independent validations further illustrates their effective predictive performance. RESULTS We demonstrate the sophistication and excellence of the proposed method in data with network structure. It unveiled a cohesive biomarker module encompassing 27 genes for immunotherapy response. Notably, this module proves adept at precisely predicting anti-PD1 therapeutic outcomes in breast cancer patients with a classification accuracy of 0.905 and an AUC of 0.971, underscoring its unique capacity to illuminate gene functionalities. CONCLUSION The proposed method is effective for identifying network module biomarkers, and the detected anti-PD1 response biomarkers can enrich our understanding of the underlying physiological mechanisms of immunotherapy, which have a promising application for realizing precision medicine.
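To make the game-theoretic idea concrete, here is a minimal sketch of exact Shapley values for a toy cooperative game, computed by averaging each player's marginal contribution over all orderings. The gene names, the toy value function, and the function names are hypothetical. The sketch also omits the network-adjacency constraint the abstract describes, so it shows plain Shapley values only, not the authors' method.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff: A and B are informative only together; C helps alone."""
    score = 0.0
    if {"A", "B"} <= coalition:
        score += 0.6
    if "C" in coalition:
        score += 0.2
    return score


phi = shapley_values(["A", "B", "C"], toy_value)
```

In this toy game the interaction payoff is split evenly between A and B, which is the property that makes Shapley values attractive when features act jointly. Enumerating all orderings scales factorially, so practical use relies on sampling or structural constraints.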
Affiliation(s)
- Chenxi Sun
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
|
27
|
Pradhan UK, Behera P, Das R, Naha S, Gupta A, Parsad R, Pradhan SK, Meher PK. AScirRNA: A novel computational approach to discover abiotic stress-responsive circular RNAs in plant genome. Comput Biol Chem 2024; 113:108205. [PMID: 39265460 DOI: 10.1016/j.compbiolchem.2024.108205] [Received: 03/19/2024] [Revised: 07/12/2024] [Accepted: 09/04/2024] [Indexed: 09/14/2024]
Abstract
In the realm of plant biology, understanding the intricate regulatory mechanisms governing stress responses stands as a pivotal pursuit. Circular RNAs (circRNAs), emerging as critical players in gene regulation, have recently garnered attention for their potential roles in abiotic stress adaptation. A comprehensive grasp of circRNAs' functions in stress response offers avenues for breeders to manipulate plants and develop abiotic stress-resistant crop cultivars that thrive in challenging climates. This study pioneers a machine learning-based model for predicting abiotic stress-responsive circRNAs. The K-tuple nucleotide composition (KNC) and Pseudo KNC (PKNC) features were utilized to numerically represent circRNAs. Three different feature selection strategies were employed to select relevant and non-redundant features. Eight shallow and four deep learning algorithms were evaluated to build the final predictive model. Following a five-fold cross-validation process, the XGBoost learning algorithm demonstrated superior performance with 260 LightGBM-chosen KNC features (Accuracy: 74.55 %, auROC: 81.23 %, auPRC: 76.52 %) and 160 PKNC features (Accuracy: 74.32 %, auROC: 81.04 %, auPRC: 76.43 %), over other combinations of learning algorithms and feature selection techniques. Further, the robustness of the developed models was evaluated using an independent test dataset, where the overall accuracy, auROC and auPRC were found to be 73.13 %, 72.34 % and 72.68 % for the KNC feature set and 73.52 %, 79.53 % and 73.09 % for the PKNC feature set, respectively. This computational approach was also integrated into an online prediction tool, AScirRNA (https://iasri-sg.icar.gov.in/ascirna/), for easy prediction by users. Both the proposed model and the developed tool are poised to augment ongoing efforts in identifying stress-responsive circRNAs in plants.
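As an illustration of the K-tuple nucleotide composition idea, the sketch below turns a sequence into normalised k-mer frequencies. The function name, the RNA alphabet handling (T converted to U), and the example sequence are assumptions made for illustration. The abstract does not specify the exact feature definition, and the pseudo-KNC variant is not covered.

```python
from itertools import product


def k_tuple_composition(seq: str, k: int) -> dict:
    """Normalised frequency of every possible k-mer over the alphabet A, C, G, U."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return {kmer: (c / windows if windows else 0.0) for kmer, c in counts.items()}


# Hypothetical short sequence; real circRNAs are far longer.
features = k_tuple_composition("AUGGCAUG", k=2)
```

Each sequence maps to a fixed-length vector (4^k entries), which is what allows feature selection and tree-based learners to be applied afterwards.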
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Prasanjit Behera
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Ritwika Das
- Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sukanta Kumar Pradhan
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
|
28
|
Raveendrakumar E, Gopichand B, Bhosale H, Melethadathil N, Valadi J. Uncovering blood-brain barrier permeability: a comparative study of machine learning models using molecular fingerprints, and SHAP explainability. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2024; 35:1155-1171. [PMID: 39773123 DOI: 10.1080/1062936x.2024.2446352] [Received: 11/01/2024] [Accepted: 12/17/2024] [Indexed: 01/11/2025]
Abstract
This study illustrates the use of chemical fingerprints with machine learning for blood-brain barrier (BBB) permeability prediction. Employing the Blood Brain Barrier Database (B3DB) dataset for BBB permeability prediction, we extracted nine different fingerprints. Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost) algorithms were used to develop models for permeability prediction. The Random Forest recursive Feature Selection (RF-RFS) method was used for extracting informative attributes. An additional database was employed for the validation phase. The results indicate that all nine datasets achieved good performance in the training, test and validation stages. We further took the MACCS keys fingerprints, one of the best-performing models, for explainability analysis. For this purpose, we used SHapley Additive exPlanations (SHAP) analysis on this dataset to identify key structural features influencing BBB permeability prediction. These features include aliphatic carbons, methyl groups and oxygen-containing groups. This study highlights the effectiveness of different fingerprint descriptors in predicting BBB permeability. SHAP analysis adds interpretive value to the models. These models will be of significant help in drug discovery processes, particularly in developing Central Nervous System (CNS) therapeutics.
Affiliation(s)
- E Raveendrakumar
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, India
- B Gopichand
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, India
- H Bhosale
- School of Computing and Data Sciences, FLAME University, Pune, India
- N Melethadathil
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, India
- J Valadi
- School of Computing and Data Sciences, FLAME University, Pune, India
|
29
|
Redeker I, Tsiami S, Eicker J, Kiltz U, Kiefer D, Andreica I, Sewerin P, Baraliakos X. Identification of a machine learning-based diagnostic model for axial spondyloarthritis in rheumatological routine care using a random forest approach. RMD Open 2024; 10:e004702. [PMID: 39608866 PMCID: PMC11603692 DOI: 10.1136/rmdopen-2024-004702] [Received: 06/28/2024] [Accepted: 10/29/2024] [Indexed: 11/30/2024] Open
Abstract
OBJECTIVES In axial spondyloarthritis (axSpA), early diagnosis is crucial, but diagnostic delay remains long and diagnostic criteria do not exist. We aimed to identify a diagnostic model that distinguishes patients with axSpA from patients without axSpA with chronic back pain based on clinical data in routine care. METHODS Clinical data from patients with chronic back pain were used, with information on rheumatological examinations based on clinical indications. The total dataset was randomly divided into training and test datasets at a 7:3 ratio. A machine learning-based model was built to distinguish axSpA from non-axSpA using the random forest algorithm. Overall accuracy, sensitivity, specificity and the area under the receiver operating characteristic curve (ROC-AUC) in the test dataset were calculated. The contribution of each variable to the accuracy of the model was assessed. RESULTS Data from 939 randomly selected patients were available: 659 diagnosed with axSpA and 280 with non-axSpA. In the test dataset, the model reached an accuracy of 0.9234, a sensitivity of 0.9586, a specificity of 0.8438 and a ROC-AUC of 0.9717. Human leucocyte antigen B27 (HLA-B27) contributed most to the accuracy of the model; that is, the accuracy would suffer most from not using HLA-B27, followed by insidious onset of back pain and erosions in the sacroiliac joint. CONCLUSIONS We provide a machine learning-based model that shows high performance in distinguishing patients with chronic back pain with axSpA from those without axSpA based on information from a tertiary rheumatology practice. This model has the potential to reduce diagnostic delay in patients with axSpA in daily routine settings.
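The test-set metrics reported above follow directly from a confusion matrix. The sketch below computes accuracy, sensitivity, and specificity for binary labels. The function name and the toy labels are hypothetical and unrelated to the study's data, and ROC-AUC is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels coded 1 (case) / 0 (non-case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 5 cases and 3 non-cases.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

With imbalanced classes such as 659 versus 280, sensitivity and specificity are worth reporting alongside accuracy, since accuracy alone is dominated by the larger class.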
Affiliation(s)
- Imke Redeker
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Styliani Tsiami
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Jan Eicker
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Uta Kiltz
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- David Kiefer
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Ioana Andreica
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Philipp Sewerin
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
- Xenofon Baraliakos
- Rheumatology, Ruhr-Universität Bochum and Rheumazentrum Ruhrgebiet, Bochum, Germany
|
30
|
Aghdam R, Tang X, Shan S, Lankau R, Solís-Lemus C. Human limits in machine learning: prediction of potato yield and disease using soil microbiome data. BMC Bioinformatics 2024; 25:366. [PMID: 39592933 PMCID: PMC11600749 DOI: 10.1186/s12859-024-05977-2] [Received: 07/03/2024] [Accepted: 11/06/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide one of the first comprehensive investigations into the predictive potential of machine learning models for understanding the connections between soil and biological phenotypes. We investigate an integrative framework performing accurate machine learning-based prediction of plant performance from biological, chemical, and physical properties of the soil via two models: random forest and Bayesian neural network. RESULTS Prediction improves when we add environmental features, such as soil properties and microbial density, along with microbiome data. Different preprocessing strategies show that human decisions significantly impact predictive performance. We show that the naive total sum scaling normalization that is commonly used in microbiome research is one of the optimal strategies to maximize predictive power. Also, we find that accurately defined labels are more important than normalization, taxonomic level, or model characteristics. ML performance is limited when humans cannot classify samples accurately. Lastly, we provide domain scientists with a full model selection decision tree to identify the human choices that optimize model prediction power. CONCLUSIONS Our study highlights the importance of incorporating diverse environmental features and careful data preprocessing in enhancing the predictive power of machine learning models for soil and biological phenotype connections. This approach can significantly contribute to advancing agricultural practices and soil health management.
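Total sum scaling, the normalization highlighted above, divides each taxon count in a sample by that sample's total so that the values become relative abundances. A minimal sketch follows; the function name, the all-zero handling, and the example counts are hypothetical.

```python
def total_sum_scaling(counts):
    """Convert one sample's taxon counts into relative abundances that sum to 1."""
    total = sum(counts)
    if total == 0:
        # Assumption for this sketch: an empty sample maps to all zeros.
        return [0.0 for _ in counts]
    return [c / total for c in counts]


# Hypothetical read counts for three taxa in one soil sample.
sample = [30, 50, 20]
relative = total_sum_scaling(sample)
```

The transform removes differences in sequencing depth between samples, which is why it is a common default before feeding microbiome tables to a learner.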
Affiliation(s)
- Rosa Aghdam
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Xudong Tang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Shan Shan
- Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, USA
- Richard Lankau
- Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, USA
- Claudia Solís-Lemus
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, USA.
|
31
|
Smith HL, Biggs PJ, French NP, Smith ANH, Marshall JC. Out of (the) bag-encoding categorical predictors impacts out-of-bag samples. PeerJ Comput Sci 2024; 10:e2445. [PMID: 39650463 PMCID: PMC11623134 DOI: 10.7717/peerj-cs.2445] [Received: 06/13/2024] [Accepted: 10/01/2024] [Indexed: 12/11/2024]
Abstract
Performance of random forest classification models is often assessed and interpreted using out-of-bag (OOB) samples. Observations which are OOB when a tree is trained may serve as a test set for that tree and predictions from the OOB observations used to calculate OOB error and variable importance measures (VIM). OOB errors are popular because they are fast to compute and, for large samples, are a good estimate of the true prediction error. In this study, we investigate how target-based vs. target-agnostic encoding of categorical predictor variables for random forest can bias performance measures based on OOB samples. We show that, when categorical variables are encoded using a target-based encoding method, and when the encoding takes place prior to bagging, the OOB sample can underestimate the true misclassification rate, and overestimate variable importance. We recommend using a separate test data set when evaluating variable importance and/or predictive performance of tree based methods that utilise a target-based encoding method.
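The leakage mechanism described above can be sketched without any modelling library. The example contrasts a mean-target encoding fitted on all rows before bagging with one fitted on the in-bag rows of a single bootstrap sample. All names and the toy data are hypothetical, and the sketch illustrates only how out-of-bag labels can reach the encoded feature, not the authors' experiments.

```python
import random
from collections import defaultdict


def target_encode(categories, labels):
    """Map each category to the mean label among the rows supplied."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in counts}


def bootstrap_split(n, rng):
    """Return (in-bag indices drawn with replacement, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


# Hypothetical toy data: one categorical predictor and a binary target.
cats = ["a", "a", "b", "b", "c", "c"]
ys = [1, 0, 1, 1, 0, 0]

rng = random.Random(0)
in_bag, oob = bootstrap_split(len(cats), rng)

# Leaky: the encoding sees every row, so OOB labels shape the OOB features.
leaky = target_encode(cats, ys)
# Leak-free for this tree: the encoding is fitted on in-bag rows only.
clean = target_encode([cats[i] for i in in_bag], [ys[i] for i in in_bag])
```

Because `leaky` was computed with the out-of-bag rows' own labels, those rows no longer behave like unseen test data, which is consistent with the optimistic OOB error the abstract reports.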
Affiliation(s)
- Helen L. Smith
- School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
- Patrick J. Biggs
- School of Food Technology and Natural Sciences, Massey University, Palmerston North, New Zealand
- NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
- School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Nigel P. French
- NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
- School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Adam N. H. Smith
- School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Jonathan C. Marshall
- School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
|
32
|
Li J, Guo S, Zhang X, He Y, Wang Y, Tian H, Zhang Q. Identification of Key Genes Involved in Seed Germination of Astragalus mongholicus. Int J Mol Sci 2024; 25:12342. [PMID: 39596407 PMCID: PMC11595215 DOI: 10.3390/ijms252212342] [Received: 10/14/2024] [Revised: 11/12/2024] [Accepted: 11/14/2024] [Indexed: 11/28/2024] Open
Abstract
Seed germination is a fundamental process in plant reproduction, and it involves a series of complex physiological mechanisms. The germination rate of Astragalus mongholicus (AM) seeds is significantly lower under natural conditions. To investigate the key genes associated with AM seed germination, seeds from AM plants were collected at 0, 12, 24, and 48 h for a transcriptomic analysis, weighted gene co-expression network analysis (WGCNA), and machine learning (ML) analysis. The primary pathways involved in AM seed germination include plant-pathogen interactions and plant hormone signaling. Four key genes were identified through the WGCNA and ML: Cluster-28,554.0, FAS4, T10O24.10, and EPSIN2. These findings were validated using real-time quantitative reverse transcription PCR (qRT-PCR), and results from RNA sequencing demonstrated a high degree of concordance. This study reveals, for the first time, the key genes related to AM seed germination, providing potential gene targets for further research. The discovery of N4-acetylcytidine (ac4C) modification during seed germination not only enhances our understanding of plant ac4C but also offers valuable insights for future functional research and application exploration.
Affiliation(s)
- Junlin Li
- Industrial Crop Institute, Shanxi Agricultural University, Fenyang 032200, China
- School of Pharmacy, Shanxi Medical University, Taiyuan 030001, China
- Shuhong Guo
- Industrial Crop Institute, Shanxi Agricultural University, Fenyang 032200, China
- Xian Zhang
- College of Agriculture, Shanxi Agricultural University, Jinzhong 030801, China
- Yuhao He
- School of Pharmacy, Shanxi Medical University, Taiyuan 030001, China
- Yaoqin Wang
- Industrial Crop Institute, Shanxi Agricultural University, Fenyang 032200, China
- Hongling Tian
- Industrial Crop Institute, Shanxi Agricultural University, Fenyang 032200, China
- Qiong Zhang
- School of Pharmacy, Shanxi Medical University, Taiyuan 030001, China
|
33
|
Cattelani L, Fortino V. Triple and quadruple optimization for feature selection in cancer biomarker discovery. J Biomed Inform 2024; 159:104736. [PMID: 39395708 DOI: 10.1016/j.jbi.2024.104736] [Received: 05/24/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 10/14/2024]
Abstract
The proliferation of omics data has advanced cancer biomarker discovery but often falls short in external validation, mainly due to a narrow focus on prediction accuracy that neglects clinical utility and validation feasibility. We introduce three- and four-objective optimization strategies based on genetic algorithms to identify clinically actionable biomarkers in omics studies, addressing classification tasks aimed at distinguishing hard-to-differentiate cancer subtypes beyond histological analysis alone. Our hypothesis is that by optimizing more than one characteristic of cancer biomarkers, we may identify biomarkers that are more likely to succeed in external validation. Our objectives are to: (i) assess the biomarker panel's accuracy using a machine learning (ML) framework; (ii) ensure the biomarkers exhibit significant fold-changes across subtypes, thereby boosting the success rate of PCR or immunohistochemistry validations; (iii) select a concise set of biomarkers to simplify the validation process and reduce clinical costs; and (iv) identify biomarkers crucial for predicting overall survival, which plays a significant role in determining the prognostic value of cancer subtypes. We implemented and applied triple and quadruple optimization algorithms to renal carcinoma gene expression data from TCGA. The study targets kidney cancer subtypes that are difficult to distinguish through histopathology methods. Selected RNA-seq biomarkers were assessed against the gold standard method, which relies solely on clinical information, and in external microarray-based validation datasets. Notably, these biomarkers achieved an accuracy above 0.8 in external validations and added significant value to survival predictions, outperforming the use of clinical data alone with a superior c-index. The provided tool also helps explore the trade-off between objectives, offering multiple solutions for clinical evaluation before proceeding to costly validation or clinical trials.
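The trade-off between objectives mentioned above is usually expressed as a Pareto front: the set of candidates that no other candidate beats on every objective. The sketch below filters hypothetical biomarker panels scored on three maximised objectives. The panel names, scores, and function names are invented for illustration, and the genetic-algorithm search itself is not shown.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates (all objectives maximised)."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical panels scored as (accuracy, fold-change score, negative panel size).
panels = {
    "P1": (0.85, 1.2, -10),
    "P2": (0.80, 1.5, -5),
    "P3": (0.78, 1.1, -12),
    "P4": (0.85, 1.2, -8),
}
front = pareto_front(panels)
```

Panel size is negated so that smaller panels score higher, which lets a single "maximise everything" rule cover objectives that pull in different directions.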
Affiliation(s)
- L Cattelani
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, 70210 Kuopio, Finland
- V Fortino
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, 70210 Kuopio, Finland.
|
34
|
Lu C, Wang X, Ye P, Lu Z, Ma J, Luo W, Wang S, Chen X. Antimicrobial Peptides From the Gut Microbiome of the Centenarians: Diversification of Biosynthesis and Youthful Development of Resistance Genes. J Gerontol A Biol Sci Med Sci 2024; 79:glae218. [PMID: 39207726 DOI: 10.1093/gerona/glae218] [Received: 05/15/2024] [Indexed: 09/04/2024] Open
Abstract
Antimicrobial peptides (AMPs) offer a potential solution to the antibiotic crisis owing to their antimicrobial properties, and the human gut biome may be a source of these peptides. However, the potential AMPs and AMP resistance genes (AMPRGs) of gut microbes in different age groups have not been thoroughly assessed. Here, we investigated the potential development of AMPs and the distribution pattern of AMPRGs in the gut microbiome at different ages by analyzing the intestinal metagenomic data of healthy individuals at different life stages (CG: centenarians group n = 20; OAG: older adults group: n = 15; YG: young group: n = 15). Age-related increases were observed in the potential AMPs within the gut microbiome, with centenarians showing a greater diversity of these peptides. However, the gut microbiome of the CG group had a lower level of AMPRGs compared to that of the OAG group, and it was similar to the level found in the YG group. Additionally, conventional probiotic strains showed a significant positive correlation with certain potential AMPs and were associated with a lower detection of resistance genes. Furthermore, comparing potential AMPs with existing libraries revealed limited similarity, indicating that current machine learning models can identify novel peptides in the gut microbiota. These results indicate that longevity may benefit from the diversity of AMPs and lower resistance genes. Our findings help explain the age advantage of the centenarians and identify the potential for antimicrobial peptide biosynthesis in the human gut microbiome, offering insights into the development of antimicrobial peptide resistance and the screening of probiotic strains.
Affiliation(s)
- Chunrong Lu
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Xiaojun Wang
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Pengpeng Ye
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Zhilong Lu
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Jie Ma
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Weifei Luo
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- Shuai Wang
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
- State Key Laboratory for Animal Disease Control and Prevention, College of Veterinary Medicine, Lanzhou University, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, China
- Xiaochun Chen
- AIage Life Science Corporation Ltd., Guangxi Free Trade Zone Aisheng Biotechnology Corporation Ltd., Nanning, Guangxi, China
|
35
|
Leclerc H, Lee AKW, Kunicki ZJ, Alber J. Added value of inflammatory plasma biomarkers to pathologic biomarkers in predicting preclinical Alzheimer's disease. J Alzheimers Dis 2024; 102:89-98. [PMID: 39497301 PMCID: PMC11540337 DOI: 10.1177/13872877241283692] [Indexed: 11/07/2024]
Abstract
BACKGROUND Plasma biomarkers have recently emerged for the diagnosis, assessment, and disease monitoring of Alzheimer's disease (AD), but have yet to be fully validated in preclinical AD. In addition to AD pathologic plasma biomarkers (amyloid-β (Aβ) and phosphorylated tau (p-tau) species), a proteomic panel can discriminate between symptomatic AD and cognitively unimpaired older adults in a dementia clinic population. OBJECTIVE Examine the added value of a plasma proteomic panel, validated in symptomatic AD, over standard AD pathologic plasma biomarkers and demographic and genetic (apolipoprotein E (APOE) ɛ4 status) risk factors in detecting preclinical AD. METHODS Data from 125 cognitively unimpaired older adults (mean age = 66 years) who completed Aβ PET and a plasma draw were analyzed using multiple regression with Aβ PET status (positive versus negative) as the outcome to determine the best fit for predicting preclinical AD. Model 1 included age, education, and gender. Models 2 and 3 added APOE ɛ4 status (carrier versus non-carrier) and AD pathologic blood biomarkers (Aβ42/40 ratio, p-tau181), respectively. Random forest modeling established the 5 proteomic markers from the proteomic panel that best predicted Aβ PET status, and these markers were added in Model 4. RESULTS The best model for predicting Aβ PET status included age, years of education, APOE ɛ4 status, Aβ42/40 ratio, and p-tau181. Adding the top 5 proteomic markers did not significantly improve the model. CONCLUSIONS Proteomic markers in plasma did not add predictive value to standard AD pathologic plasma biomarkers in predicting preclinical AD in this sample.
Affiliation(s)
- Haley Leclerc
- Interdisciplinary Neuroscience Program, University of Rhode Island, Kingston, RI, USA
- Athene KW Lee
- Butler Hospital Memory & Aging Program, Providence, RI, USA
- Department of Psychiatry and Human Behavior, Alpert Medical School of Brown University, Providence, RI, USA
- Zachary J Kunicki
- Department of Psychiatry and Human Behavior, Alpert Medical School of Brown University, Providence, RI, USA
- Jessica Alber
- Interdisciplinary Neuroscience Program, University of Rhode Island, Kingston, RI, USA
- Butler Hospital Memory & Aging Program, Providence, RI, USA
- Department of Biomedical and Pharmaceutical Sciences, University of Rhode Island, Kingston, RI, USA
|
36
|
Ravindran U, Gunavathi C. Deep learning assisted cancer disease prediction from gene expression data using WT-GAN. BMC Med Inform Decis Mak 2024; 24:311. [PMID: 39449042 PMCID: PMC11515488 DOI: 10.1186/s12911-024-02712-y] [Received: 07/12/2024] [Accepted: 10/08/2024] [Indexed: 10/26/2024] Open
Abstract
Several diverse fields including the healthcare system and drug development sectors have benefited immensely through the adoption of deep learning (DL), which is a subset of artificial intelligence (AI) and machine learning (ML). Cancer makes up a significant percentage of the illnesses that cause early human mortality across the globe, and this situation is likely to rise in the coming years, especially when non-communicable illnesses are not considered. As a result, cancer patients would greatly benefit from precise and timely diagnosis and prediction. Deep learning (DL) has become a common technique in healthcare due to the abundance of computational power. Gene expression datasets are frequently used in major DL-based applications for illness detection, notably in cancer therapy. The quantity of medical data, on the other hand, is often insufficient to fulfill deep learning requirements. Microarray gene expression datasets are used for training procedures despite their extreme dimensionality, limited volume of data samples, and sparsely available information. Data augmentation is commonly used to expand the training sample size for gene data. The Wasserstein Tabular Generative Adversarial Network (WT-GAN) model is used for the data augmentation process for generating synthetic data in this proposed work. The correlation-based feature selection technique selects the most relevant characteristics based on threshold values. Deep FNN and ML algorithms train and classify the gene expression samples. The augmented data give better classification results (> 97%) when using WT-GAN for cancer diagnosis.
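One simple reading of threshold-based correlation feature selection is to keep the features whose absolute Pearson correlation with the class label meets a cut-off. The sketch below follows that reading. The abstract does not give the exact criterion or threshold, so the function names, the 0.8 cut-off, and the toy expression values are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, labels, threshold):
    """Keep features whose |correlation with the label| is at least the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Hypothetical expression values for three genes across four samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 2.0, 2.0, 2.0],
    "gene3": [4.0, 1.0, 3.0, 2.0],
}
labels = [0, 0, 1, 1]
selected = select_features(expression, labels, threshold=0.8)
```

When synthetic samples are added by a generative model, fitting a filter like this on the original training split only is the safer design choice, since selecting on augmented or test data can inflate reported accuracy.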
Affiliation(s)
- U Ravindran
- School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
- C Gunavathi
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
|
37
|
Li X, Kouznetsova VL, Tsigelny IF. miRNA in Machine-Learning-Based Diagnostics of Oral Cancer. Biomedicines 2024; 12:2404. [PMID: 39457716 PMCID: PMC11504892 DOI: 10.3390/biomedicines12102404] [Received: 08/22/2024] [Revised: 09/23/2024] [Accepted: 10/12/2024] [Indexed: 10/28/2024] Open
Abstract
BACKGROUND MicroRNAs (miRNAs) are crucial regulators of gene expression, playing significant roles in various cellular processes, including cancer pathogenesis. Traditional cancer diagnostic methods, such as biopsies and histopathological analyses, while effective, are invasive, costly, and require specialized skills. With the rising global incidence of cancer, there is a pressing need for more accessible and less invasive diagnostic alternatives. OBJECTIVE This research investigates the potential of machine-learning (ML) models based on miRNA attributes as non-invasive diagnostic tools for oral cancer. METHODS AND TOOLS We utilized a comprehensive methodological framework involving the generation of miRNA attributes, including sequence characteristics, target gene associations, and cancer-specific signaling pathways. RESULTS The miRNAs were classified using various ML algorithms, with the BayesNet classifier demonstrating superior performance, achieving an accuracy of 95% and an area under receiver operating characteristic curve (AUC) of 0.98 during cross-validation. The model's effectiveness was further validated using independent datasets, confirming its potential clinical utility. DISCUSSION Our findings highlight the promise of miRNA-based ML models in enhancing early cancer detection, reducing healthcare burdens, and potentially saving lives. CONCLUSIONS This study paves the way for future research into miRNA biomarkers, offering a scalable and adaptable diagnostic approach for various cancers.
Affiliation(s)
- Xinghang Li
- IUL Scientific Program, La Jolla, CA 92038, USA
| | - Valentina L. Kouznetsova
- IUL Scientific Program, La Jolla, CA 92038, USA
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Igor F. Tsigelny
- IUL Scientific Program, La Jolla, CA 92038, USA
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA 92093, USA
38
|
Han Y, Ding C, Yang S, Ge Y, Yin J, Zhao Y, Zhang J. Comparison of Electrocardiogram between Dilated Cardiomyopathy and Ischemic Cardiomyopathy Based on Empirical Mode Decomposition and Variational Mode Decomposition. Bioengineering (Basel) 2024; 11:1012. [PMID: 39451388 PMCID: PMC11505311 DOI: 10.3390/bioengineering11101012] [Received: 09/13/2024] [Revised: 09/29/2024] [Accepted: 10/08/2024] [Indexed: 10/26/2024] Open
Abstract
The clinical manifestations of ischemic cardiomyopathy (ICM) bear resemblance to dilated cardiomyopathy (DCM), yet their treatments and prognoses are quite different. Early differentiation between these conditions yields positive outcomes, but the gold standard (coronary angiography) is invasive. The potential use of ECG signals based on variational mode decomposition (VMD) as an alternative remains underexplored. An ECG dataset containing 87 subjects (44 DCM, 43 ICM) is pre-processed for denoising and heartbeat division. First, the ECG signal is processed by empirical mode decomposition (EMD) and VMD, and five modes are then determined by correlation analysis. Second, bispectral analysis is conducted on these modes, extracting corresponding bispectral and nonlinear features. Finally, the features are processed using five machine learning classification models, and their classification efficacy is compared. The results show that the proposed technique provides a better categorization of DCM and ICM using ECG signals compared to previous approaches, with a highest classification accuracy of 98.30%. Moreover, VMD consistently outperforms EMD under diverse conditions such as different modes, leads, and classifiers. The superiority of VMD for ECG analysis is verified.
Affiliation(s)
- Yuduan Han
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
- Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
| | - Chonglong Ding
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
| | - Shuo Yang
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
| | - Yingfeng Ge
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
| | - Jianan Yin
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
| | - Yunyue Zhao
- Department of Cardiology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
| | - Jinxin Zhang
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
39
|
Than NG, Romero R, Fitzgerald W, Gudicha DW, Gomez-Lopez N, Posta M, Zhou F, Bhatti G, Meyyazhagan A, Awonuga AO, Chaiworapongsa T, Matthies D, Bryant DR, Erez O, Margolis L, Tarca AL. Proteomic Profiles of Maternal Plasma Extracellular Vesicles for Prediction of Preeclampsia. Am J Reprod Immunol 2024; 92:e13928. [PMID: 39347565 DOI: 10.1111/aji.13928] [Received: 02/15/2024] [Revised: 08/30/2024] [Accepted: 09/01/2024] [Indexed: 10/01/2024] Open
Abstract
PROBLEM Preeclampsia is a heterogeneous syndrome of diverse etiologies and molecular pathways leading to distinct clinical subtypes. Herein, we aimed to characterize the extracellular vesicle (EV)-associated and soluble fractions of the maternal plasma proteome in patients with preeclampsia and to assess their value for disease prediction. METHOD OF STUDY This case-control study included 24 women with term preeclampsia, 23 women with preterm preeclampsia, and 94 healthy pregnant controls. Blood samples were collected from cases on average 7 weeks before the diagnosis of preeclampsia and were matched to control samples. Soluble and EV fractions were separated from maternal plasma; EVs were confirmed by cryo-EM, NanoSight, and flow cytometry; and 82 proteins were analyzed with bead-based, multiplexed immunoassays. Quantile regression analysis and random forest models were implemented to evaluate protein concentration differences and their predictive accuracy. Preeclampsia subgroups defined by molecular profiles were identified by hierarchical cluster analysis. Significance was set at p < 0.05 or false discovery rate-adjusted q < 0.1. RESULTS In preterm preeclampsia, PlGF, PTX3, and VEGFR-1 displayed differential abundance in both soluble and EV fractions, whereas angiogenin, CD40L, endoglin, galectin-1, IL-27, CCL19, and TIMP1 were changed only in the soluble fraction (q < 0.1). The direction of changes in the EV fraction was consistent with that in the soluble fraction for nine proteins. In term preeclampsia, CCL3 had increased abundance in both fractions (q < 0.1). The combined EV and soluble fraction proteomic profiles predicted preterm and term preeclampsia with an AUC of 78% (95% CI, 66%-90%) and 68% (95% CI, 56%-80%), respectively. Three clusters of preeclampsia featuring distinct clinical characteristics and placental pathology were identified based on combined protein data. 
CONCLUSIONS Our findings reveal distinct alterations of the maternal EV-associated and soluble plasma proteome in preterm and term preeclampsia and identify molecular subgroups of patients with distinct clinical and placental histopathologic features.
Affiliation(s)
- Nándor Gábor Than
- Systems Biology of Reproduction Research Group, Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Department of Obstetrics and Gynecology, Semmelweis University, Budapest, Hungary
- Maternity Private Clinic of Obstetrics and Gynecology, Budapest, Hungary
| | - Roberto Romero
- Pregnancy Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, Michigan, USA
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, USA
| | - Wendy Fitzgerald
- Section on Intercellular Interactions, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
| | - Dereje W Gudicha
- Pregnancy Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
| | - Nardhy Gomez-Lopez
- Department of Obstetrics and Gynecology & Department of Pathology and Immunology, Washington University, St. Louis, Missouri, USA
| | - Máté Posta
- Systems Biology of Reproduction Research Group, Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences, Budapest, Hungary
- Semmelweis University Doctoral School, Budapest, Hungary
| | - Fei Zhou
- Unit on Structural Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
| | - Gaurav Bhatti
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, USA
| | - Arun Meyyazhagan
- Pregnancy Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
| | - Awoniyi O Awonuga
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, USA
| | - Tinnakorn Chaiworapongsa
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, USA
| | - Doreen Matthies
- Unit on Structural Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
| | - David R Bryant
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, USA
| | - Offer Erez
- Department of Obstetrics and Gynecology, Ben Gurion University of the Negev, Beer-Sheva, Israel
| | - Leonid Margolis
- Faculty of Natural Sciences and Medicine, Ilia State University, Tbilisi, Georgia
| | - Adi L Tarca
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, Michigan, USA
- Department of Computer Science, Wayne State University College of Engineering, Detroit, Michigan, USA
40
|
Porreca A, Ibrahimi E, Maturo F, Marcos Zambrano LJ, Meto M, Lopes MB. Robust prediction of colorectal cancer via gut microbiome 16S rRNA sequencing data. J Med Microbiol 2024; 73. [PMID: 39377779 DOI: 10.1099/jmm.0.001903] [Indexed: 10/09/2024] Open
Abstract
Introduction. The study addresses the challenge of utilizing human gut microbiome data for the early detection of colorectal cancer (CRC). The research emphasizes the potential of using machine learning techniques to analyze complex microbiome datasets, providing a non-invasive approach to identifying CRC-related microbial markers. Hypothesis/Gap Statement. The primary hypothesis is that a robust machine learning-based analysis of 16S rRNA microbiome data can identify specific microbial features that serve as effective biomarkers for CRC detection, overcoming the limitations of classical statistical models in high-dimensional settings. Aim. The primary objective of this study is to explore and validate the potential of the human microbiome, specifically in the colon, as a valuable source of biomarkers for colorectal cancer (CRC) detection and progression. The focus is on developing a classifier that effectively predicts the presence of CRC and normal samples based on the analysis of three previously published faecal 16S rRNA sequencing datasets. Methodology. To achieve the aim, various machine learning techniques are employed, including random forest (RF), recursive feature elimination (RFE) and a robust correlation-based technique known as the fuzzy forest (FF). The study utilizes these methods to analyse the three datasets, comparing their performance in predicting CRC and normal samples. The emphasis is on identifying the most relevant microbial features (taxa) associated with CRC development via partial dependence plots, i.e. a machine learning tool focused on explainability, visualizing how a feature influences the predicted outcome. Results. The analysis of the three faecal 16S rRNA sequencing datasets reveals the consistent and superior predictive performance of the FF compared to the RF and RFE. Notably, FF proves effective in addressing the correlation problem when assessing the importance of microbial taxa in explaining the development of CRC.
The results highlight the potential of the human microbiome as a non-invasive means to detect CRC and underscore the significance of employing FF for improved predictive accuracy. Conclusion. In conclusion, this study underscores the limitations of classical statistical techniques in handling high-dimensional information such as human microbiome data. The research demonstrates the potential of the human microbiome, specifically in the colon, as a valuable source of biomarkers for CRC detection. Applying machine learning techniques, particularly the FF, is a promising approach for building a classifier to predict CRC and normal samples. The findings advocate for integrating FF to overcome the challenges associated with correlation when identifying crucial microbial features linked to CRC development.
Affiliation(s)
- Annamaria Porreca
- Department of Economics, Statistics and Business, Faculty of Economics and Law, Universitas Mercatorum, Rome, Italy
| | - Eliana Ibrahimi
- Department of Biology, University of Tirana, Tirana, Albania
| | - Fabrizio Maturo
- Department of Economics, Statistics and Business, Faculty of Technological and Innovation Sciences, Universitas Mercatorum, Rome, Italy
| | - Laura Judith Marcos Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
| | - Melisa Meto
- Department of Biology, University of Tirana, Tirana, Albania
| | - Marta B Lopes
- Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- UNIDEMI, Research and Development Unit for Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
41
|
Meher PK, Pradhan UK, Sethi PL, Naha S, Gupta A, Parsad R. PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants. Plant Molecular Biology 2024; 114:106. [PMID: 39316155 DOI: 10.1007/s11103-024-01500-6] [Received: 02/16/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024]
Abstract
Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within C3 and C4 pathways, holds promise for improving crops in challenging environments. Despite existing models, a comprehensive computational framework specifically targeting plant photosynthetic proteins is lacking. The underutilization of plant datasets in computational algorithms accentuates the gap this study aims to fill by introducing a novel sequence-based computational method for identifying these proteins. The scope of this study encompassed diverse plant species, ensuring comprehensive representation across C3 and C4 pathways. Utilizing six deep learning models and seven shallow learning algorithms, paired with six sequence-derived feature sets followed by a feature selection strategy, this study developed a comprehensive model for the prediction of plant-specific photosynthetic proteins. Following 5-fold cross-validation analysis, LightGBM with 65 and 90 LGBM-VIM selected features, respectively, emerged as the best models for C3 (auROC: 91.78%, auPRC: 92.55%) and C4 (auROC: 99.05%, auPRC: 99.18%) plants. Validation using an independent dataset confirmed the robustness of the proposed model for both C3 (auROC: 87.23%, auPRC: 88.40%) and C4 (auROC: 92.83%, auPRC: 92.29%) categories. Comparison with existing methods demonstrated the superiority of the proposed model in predicting plant-specific photosynthetic proteins. This study further established a free online prediction server, PredPSP (https://iasri-sg.icar.gov.in/predpsp/), to facilitate ongoing efforts for identifying photosynthetic proteins in C3 and C4 plants. Being the first of its kind, this study offers valuable insights into predicting plant-specific photosynthetic proteins, which holds significant implications for plant biology.
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Padma Lochan Sethi
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, 751003, Odisha, India
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
42
|
Malamon JS. DNA N-gram analysis framework (DNAnamer): A generalized N-gram frequency analysis framework for the supervised classification of DNA sequences. Heliyon 2024; 10:e36914. [PMID: 39281454 PMCID: PMC11399624 DOI: 10.1016/j.heliyon.2024.e36914] [Received: 04/11/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 09/18/2024] Open
Abstract
In 1948, Claude Shannon published a mathematical system describing the probabilistic relationships between the letters of a natural language and their subsequent order or syntax structure. By counting unique, reoccurring sequences of letters called N-grams, this language model was used to generate recognizable English sentences from N-gram frequency probability tables. More recently, N-gram analysis methodologies have been successfully applied to address many complex problems in a variety of domains, from language processing to genomics. One such example is the common use of N-gram frequency patterns and supervised classification models to determine authorship and plagiarism. In this paradigm, DNA is a language model where nucleotides are analogous to the letters of a word and nucleotide N-grams are analogous to the words of a sentence. Because DNA contains highly conserved and identifiable nucleotide sequence frequency patterns, this approach can be applied to a variety of classification and data reduction problems, such as identifying species based on unknown DNA segments. Other useful applications of this methodology include the identification of functional gene elements, microorganisms, sequence contamination, and sequencing artifacts. To this end, I present DNAnamer, a generalized and extensible methodological framework and analysis toolkit for the supervised classification of DNA sequences based on their N-gram frequency patterns.
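The N-gram frequency idea described in this abstract can be sketched in a few lines. This is only an illustration of counting overlapping nucleotide N-grams and normalizing them to relative frequencies; it is not DNAnamer's code or API, and the function name `ngram_frequencies` and the example sequence are assumptions.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Relative frequency of each overlapping N-gram in a nucleotide string.

    Hypothetical helper for illustration; returns {} when the sequence
    is shorter than n.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Bigram (N = 2) profile of a short made-up sequence.
freqs = ngram_frequencies("ACGTACGA", 2)
```

A frequency table of this kind, computed per sequence, is the sort of feature vector that a supervised classifier could then be trained on.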
Affiliation(s)
- John S Malamon
- University of Colorado Anschutz Medical Campus, Department of Surgery, Division of Transplant Surgery, 1635 Aurora Court, Aurora, CO, 80045, USA
- Colorado Center for Transplantation Care, Research and Education (CCTCARE), Division of Transplant Surgery, Aurora, CO, 80045, USA
43
|
Sun T, Yan N, Zhu W, Zhuang Q. Assessing a machine learning-based downscaling framework for obtaining 1km daily precipitation from GPM data. Heliyon 2024; 10:e36368. [PMID: 39286221 PMCID: PMC11403431 DOI: 10.1016/j.heliyon.2024.e36368] [Received: 01/17/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/19/2024] Open
Abstract
Hydro-meteorological monitoring through satellites in arid and semi-arid regions is constrained by the coarse spatial resolution of precipitation data, which impedes detailed analyses. The objective of this study is to evaluate various machine learning techniques for developing a downscaling framework that generates high spatio-temporal resolution precipitation products. Focusing on the Hai River Basin, we evaluated three machine learning approaches: Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Back Propagation (BP) neural networks. These methods integrate environmental variables including land surface temperature (LST), Normalized Difference Vegetation Index (NDVI), Digital Elevation Model (DEM), Precipitable Water Vapor (PWV), and albedo, to downscale the 0.1° spatial resolution Global Precipitation Measurement (GPM) product to a 1 km resolution. We further refined the results with residual correction and calibration using terrestrial rain gauge data. Subsequently, utilizing the 1 km annual precipitation, we employed the moving average window method to derive monthly and daily precipitation. The results demonstrated that the XGBoost method, calibrated with Geographical Difference Analysis (GDA) and Kriging spatial interpolation, proved to be the most accurate, achieving a Mean Absolute Error (MAE) of 58.40 mm for the annual product, representing a 14% improvement over the original data. The monthly and daily products achieved MAE values of 11.61 mm and 1.79 mm, respectively, thus enhancing spatial resolution while maintaining accuracy comparable to the original product. In the Hai River Basin, key factors including longitude, latitude, DEM, LST_night, and PWV demonstrated greater importance and stability than other factors, thereby enhancing the model's precipitation prediction capabilities.
This study provides a comprehensive assessment of the annual, monthly, and daily high-temporal and high-spatial resolution downscaling processes of precipitation, serving as an important reference for hydrology and related fields.
Affiliation(s)
- Tao Sun
- College of Geomatics Science and Technology, Nanjing Tech University, Nanjing, 211816, China
| | - Nana Yan
- Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
| | - Weiwei Zhu
- Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
| | - Qifeng Zhuang
- College of Geomatics Science and Technology, Nanjing Tech University, Nanjing, 211816, China
44
|
Turchi M, Galmarini S, Lunati I. Learning Adsorption Patterns on Amorphous Surfaces. J Chem Theory Comput 2024; 20:7597-7610. [PMID: 39186282 PMCID: PMC11391580 DOI: 10.1021/acs.jctc.4c00702] [Indexed: 08/27/2024]
Abstract
The physicochemical heterogeneity found on amorphous surfaces leads to a complex interaction of adsorbate molecules with topological and undercoordinated defects, which enhance the adsorption capacity and can participate in catalytic reactions. The identification and analysis of the adsorption structure observed on amorphous surfaces require novel tools that allow the segmentation of the surfaces into complex-shaped regions that contrast with the periodic patterns found on crystalline surfaces. We propose a Random Forest (RF) classifier that segments the surface into regions that can then be further analyzed and classified to reveal the dynamics of the interaction with the adsorbate. The RF segmentation is applied to the surface density map of the adsorbed molecules and employs multiple features (intensity, gradient, and the eigenvalues of the Hessian matrix) which are nonlocal and allow a better identification of the adsorption structures. The segmentation depends on a set of parameters that specify the training set and can be tailored to serve the specific purpose of the segmentation. Here, we consider an example in which we aim to separate highly heterogeneous regions from weakly heterogeneous regions. We demonstrate that the RF segmentation is able to separate the surface into a fully connected weakly heterogeneous region (whose behavior is somewhat similar to that of crystalline surfaces and has an exponential distribution of the residence time) and a very heterogeneous region characterized by a complex residence-time distribution, which is generated by the undercoordinated defects and is responsible for the peculiar characteristics of the amorphous surface.
Affiliation(s)
- Mattia Turchi
- Laboratory for Computational Engineering, Swiss Federal Laboratories for Materials Science and Technology, Empa, Überlandstrasse 129, 8600 Dübendorf, Switzerland
| | - Sandra Galmarini
- Laboratory for Building Energy Materials and Components, Swiss Federal Laboratories for Materials Science and Technology, Empa, Überlandstrasse 129, 8600 Dübendorf, Switzerland
| | - Ivan Lunati
- Laboratory for Computational Engineering, Swiss Federal Laboratories for Materials Science and Technology, Empa, Überlandstrasse 129, 8600 Dübendorf, Switzerland
45
|
Wei S, Richard R, Hogue D, Mondal I, Xu T, Boyer T, Hamilton K. High resolution data visualization and machine learning prediction of free chlorine residual in a green building water system. Water Research X 2024; 24:100244. [PMID: 39188328 PMCID: PMC11345929 DOI: 10.1016/j.wroa.2024.100244] [Received: 03/09/2024] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 08/28/2024]
Abstract
People spend most of their time indoors and are exposed to numerous contaminants in the built environment. Water management plans implemented in buildings are designed to manage the risks of preventable diseases caused by drinking water contaminants such as opportunistic pathogens (e.g., Legionella spp.), metals, and disinfection by-products (DBPs). However, specialized training required to implement water management plans and heterogeneity in building characteristics limit their widespread adoption. Implementation of machine learning and artificial intelligence (ML/AI) models in building water settings presents an opportunity for faster, more widespread use of data-driven water quality management approaches. We demonstrate the utility of Random Forest and Long Short-Term Memory (LSTM) ML models for predicting a key public health parameter, free chlorine residual, as a function of data collected from building water quality sensors (ORP, pH, conductivity, and temperature) as well as WiFi signals as a proxy for building occupancy and water usage in a "green" Leadership in Energy and Environmental Design (LEED) commercial and institutional building. The models successfully predicted free chlorine residual declines below 0.2 ppm, a common minimum reference level for public health protection in drinking water distribution systems. The predictions were valid up to 5 min in advance, and in some cases reasonably accurate up to 24 h in advance, presenting opportunities for proactive water quality management as part of a sense-analyze-decide framework. An online data dashboard for visualizing water quality in the building is presented, with the potential to link these approaches for real-time water quality management.
Affiliation(s)
- S. Wei
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
| | - R. Richard
- Wilson & Company Engineers, United States
| | - D. Hogue
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
| | - I. Mondal
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
- Biodesign Center for Environmental Health Engineering, Arizona State University, Tempe, AZ 85281, United States
| | - T. Xu
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
| | - T.H. Boyer
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
| | - K.A. Hamilton
- School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85281, United States
- Biodesign Center for Environmental Health Engineering, Arizona State University, Tempe, AZ 85281, United States
46
|
Fouquier J, Stanislawski M, O'Connor J, Scadden A, Lozupone C. EXPLANA: A user-friendly workflow for EXPLoratory ANAlysis and feature selection in cross-sectional and longitudinal microbiome studies. bioRxiv 2024:2024.03.20.585968. [PMID: 39185201 PMCID: PMC11343137 DOI: 10.1101/2024.03.20.585968] [Indexed: 08/27/2024]
Abstract
Motivation Longitudinal microbiome studies (LMS) are increasingly common but have analytic challenges including non-independent data requiring mixed-effects models and large amounts of data that motivate exploratory analysis to identify factors related to outcome variables. Although change analysis (i.e. calculating deltas between values at different timepoints) can be powerful, how to best conduct these analyses is not always clear. For example, observational LMS measurements show natural fluctuations, so baseline might not be a reference of primary interest; whereas, for interventional LMS, baseline is a key reference point, often indicating the start of treatment. Results To address these challenges, we developed a feature selection workflow for cross-sectional and LMS that supports numerical and categorical data called EXPLANA (EXPLoratory ANAlysis). Machine-learning methods were combined with different types of change calculations and downstream interpretation methods to identify statistically meaningful variables and explain their relationship to outcomes. EXPLANA generates an interactive report that textually and graphically summarizes methods and results. EXPLANA had good performance on simulated data, with an average area under the curve (AUC) of 0.91 (range: 0.79-1.0, SD = 0.05), outperformed an existing tool (AUC: 0.95 vs. 0.56), and identified novel order-dependent categorical feature changes. EXPLANA is broadly applicable and simplifies analytics for identifying features related to outcomes of interest.
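The distinction this abstract draws between reference points for change analysis can be made concrete with a small sketch. It contrasts deltas taken against baseline (natural for interventional studies) with deltas taken against the previous timepoint (one option when baseline is not of primary interest). The function names and the toy values are assumptions for illustration; the abstract does not list which change calculations EXPLANA implements.

```python
def deltas_from_baseline(values):
    """Change of each later timepoint relative to the first (baseline) value."""
    baseline = values[0]
    return [value - baseline for value in values[1:]]


def deltas_from_previous(values):
    """Change of each timepoint relative to the one immediately before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Made-up measurements of one feature for one subject at four timepoints.
abundance = [10, 12, 9, 15]
baseline_deltas = deltas_from_baseline(abundance)   # [2, -1, 5]
previous_deltas = deltas_from_previous(abundance)   # [2, -3, 6]
```

The same series yields different delta features depending on the reference, which is why the choice of reference matters before any feature selection is run.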
Affiliation(s)
- Jennifer Fouquier
- Department of Biomedical Informatics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO
- Maggie Stanislawski
- Department of Biomedical Informatics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO
- John O'Connor
- Department of Biomedical Informatics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO
- Ashley Scadden
- Department of Biomedical Informatics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO
- Catherine Lozupone
- Department of Biomedical Informatics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO
|
47
|
Feng S, Wang Z, Jin Y, Xu S. TabDEG: Classifying differentially expressed genes from RNA-seq data based on feature extraction and deep learning framework. PLoS One 2024; 19:e0305857. [PMID: 39037985 PMCID: PMC11262683 DOI: 10.1371/journal.pone.0305857] [Received: 09/21/2023] [Accepted: 06/05/2024] [Indexed: 07/24/2024]
Abstract
Traditional models for identifying differentially expressed genes (DEGs) have limitations on datasets with small sample sizes because they require distributional assumptions to be met; otherwise, sample variation leads to high false positive/negative rates. In contrast, tabular data models based on deep learning (DL) frameworks do not need to consider the data distribution type or sample variation. However, applying DL to RNA-Seq data is still a challenge due to the lack of proper labeling and the small sample size compared to the number of genes. Data augmentation (DA) extracts data features using different methods and procedures, which can significantly increase complementary pseudo-values from limited data without significant additional cost. Based on this, we combine DA with a DL framework-based tabular data model and propose a model, TabDEG, to predict DEGs and their up-regulation/down-regulation directions from gene expression data obtained from the Cancer Genome Atlas database. Compared to five counterpart methods, TabDEG has high sensitivity and low misclassification rates. Experiments show that TabDEG is robust and effective in enhancing data features to facilitate classification of high-dimensional datasets with small sample sizes, and validate that TabDEG-predicted DEGs are mapped to important gene ontology terms and pathways associated with cancer.
Affiliation(s)
- Sifan Feng
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Zhenyou Wang
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Yinghua Jin
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Shengbin Xu
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
|
48
|
Pradhan UK, Meher PK, Naha S, Sharma NK, Agarwal A, Gupta A, Parsad R. DBPMod: a supervised learning model for computational recognition of DNA-binding proteins in model organisms. Brief Funct Genomics 2024; 23:363-372. [PMID: 37651627 DOI: 10.1093/bfgp/elad039] [Received: 05/08/2023] [Revised: 08/09/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023]
Abstract
DNA-binding proteins (DBPs) play critical roles in many biological processes, including gene expression, DNA replication, recombination and repair. Understanding the molecular mechanisms underlying these processes depends on the precise identification of DBPs. Several computational methods have recently been developed to identify DBPs. However, because of their generic nature, these models are unable to identify species-specific DBPs with high accuracy. Therefore, a species-specific computational model is needed to predict species-specific DBPs. In this paper, we introduce the computational DBPMod method, which uses a machine learning approach to identify species-specific DBPs. For prediction, both shallow learning algorithms and deep learning models were used, with shallow learning models achieving higher accuracy. Additionally, the evolutionary features outperformed sequence-derived features in terms of accuracy. Five model organisms, Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Homo sapiens and Mus musculus, were used to assess the performance of DBPMod. Five-fold cross-validation and independent test set analyses were used to evaluate the prediction accuracy in terms of area under receiver operating characteristic curve (auROC) and area under precision-recall curve (auPRC), which were found to be ~89-92% and ~89-95%, respectively. The comparative results demonstrate that DBPMod outperforms 12 current state-of-the-art computational approaches in identifying DBPs for all five model organisms. We further developed a DBPMod web server to make it easier for researchers to detect DBPs; it is publicly available at https://iasri-sg.icar.gov.in/dbpmod/. DBPMod is expected to be an invaluable tool for discovering DBPs, supplementing the current experimental and computational methods.
Affiliation(s)
- Upendra K Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina K Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Nitesh K Sharma
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Aarushi Agarwal
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh 201313, India
- Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
|
49
|
Fu C, Ji W, Cui Q, Chen A, Weng H, Lu N, Yang W. GSDME-mediated pyroptosis promotes anti-tumor immunity of neoadjuvant chemotherapy in breast cancer. Cancer Immunol Immunother 2024; 73:177. [PMID: 38954046 PMCID: PMC11219631 DOI: 10.1007/s00262-024-03752-z] [Received: 02/23/2024] [Accepted: 06/02/2024] [Indexed: 07/04/2024]
Abstract
Paclitaxel and anthracycline-based chemotherapy is one of the standard treatment options for breast cancer. However, only about 6-30% of breast cancer patients achieve a pathological complete response (pCR), and the mechanism responsible for the difference is still unclear. In this study, a random forest algorithm was used to screen feature genes, and an artificial neural network (ANN) algorithm was used to construct an ANN model for predicting the efficacy of neoadjuvant chemotherapy for breast cancer. Furthermore, digital pathology, cytology, and molecular biology experiments were used to verify the relationship between the efficacy of neoadjuvant chemotherapy and immune ecology. It was found that paclitaxel and doxorubicin, an anthracycline, could induce typical pyroptosis and bubbling in breast cancer cells, accompanied by gasdermin E (GSDME) cleavage. Paclitaxel treatment was associated with LDH release and Annexin V/PI double-positive cell populations, accompanied by increased release of the damage-associated molecular patterns HMGB1 and ATP. Cell coculture experiments also demonstrated enhanced phagocytosis by macrophages and increased levels of IFN-γ and IL-2 secretion after paclitaxel treatment. Mechanistically, GSDME may mediate paclitaxel and doxorubicin-induced pyroptosis in breast cancer cells through the caspase-9/caspase-3 pathway, activate anti-tumor immunity, and promote the efficacy of paclitaxel and anthracycline-based neoadjuvant chemotherapy. This study offers practical guidance for the precision treatment of breast cancer and can also provide ideas for understanding the molecular mechanisms related to chemotherapy sensitivity.
Affiliation(s)
- Changfang Fu
- Department of Pharmacy, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, Anhui, China
- Anhui Provincial Key Laboratory of Precision Pharmaceutical Preparations and Clinical Pharmacy, Hefei, 230001, Anhui, China
- Wenbo Ji
- Clinical Pharmacy Department, Anhui Provincial Children's Hospital, Hefei, 230000, Anhui, China
- Qianwen Cui
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China
- Anling Chen
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China
- Haiyan Weng
- Department of Pathology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, Anhui, China
- Nannan Lu
- Department of Oncology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, Anhui, China
- Wulin Yang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, 230031, China
|
50
|
Pelletier M, Oczkowski A, Hagy J. Deciphering patterns in whole fish nitrogen isotopes on a continental scale. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 931:172684. [PMID: 38663629 PMCID: PMC11109980 DOI: 10.1016/j.scitotenv.2024.172684] [Received: 11/28/2023] [Revised: 04/02/2024] [Accepted: 04/20/2024] [Indexed: 05/06/2024]
Abstract
Nitrogen isotopes (δ15N) have been used as an indicator of anthropogenic nitrogen loading at local and regional scales. We examined δ15N in fish from estuaries across the continental United States. In the summer of 2015, the U.S. Environmental Protection Agency's National Coastal Condition Assessment (NCCA) collected fish in 136 coastal waterbodies throughout the United States. Whole fish were analyzed by NCCA for metals, organic contaminants, and lipids. For this study, we also analyzed these fish for isotopes of nitrogen (N). NCCA collected water quality, nutrients, chlorophyll a, and sediment chemistry at each site. We used these data, along with fish life history and watershed land use, to examine how whole fish δ15N was related to these environmental variables using random forest regression models at national and ecoregional scales. At the national scale, fish δ15N was negatively related to total N:total phosphorus (P) ratios (TN:TP) in surface water and reflected differences between the P-limited, δ15N-depleted sites in the Floridian ecoregion and sites in other regions. δ15N was lower on the Atlantic relative to the Pacific coast. When considered by region, TN:TP was an important predictor of fish δ15N in 4 of 9 ecoregions, with higher δ15N observed with increasing N limitation (lower TN:TP). Fish life history was also an important predictor of fish δ15N at both the national and ecoregional scales. Whole fish δ15N was positively associated with bioaccumulative contaminants such as PCBs and mercury. Although land use was related to δ15N in fish, the relationship was location specific. This study showed that N stable isotopes reflected ecological conditions at both regional and continental scales.
Affiliation(s)
- Marguerite Pelletier
- Atlantic Coastal Environmental Sciences Division, Center for Environmental Measurement and Modeling, US Environmental Protection Agency, United States of America
- Autumn Oczkowski
- Atlantic Coastal Environmental Sciences Division, Center for Environmental Measurement and Modeling, US Environmental Protection Agency, United States of America
- James Hagy
- Atlantic Coastal Environmental Sciences Division, Center for Environmental Measurement and Modeling, US Environmental Protection Agency, United States of America
|