1
Lanjewar MG, Panchbhai KG, Patle LB. Sugar detection in adulterated honey using hyper-spectral imaging with stacking generalization method. Food Chem 2024; 450:139322. [PMID: 38613963 DOI: 10.1016/j.foodchem.2024.139322] [Received: 12/29/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/15/2024]
Abstract
This paper develops a new hybrid, automated, and non-invasive approach to detect sugar in honey by combining hyperspectral imaging, a Savitzky-Golay (SG) filter, Principal Component Analysis (PCA), Machine Learning (ML) classifiers/regressors, and stacking generalization. First, 32 sugar concentration levels in honey were predicted using various ML regressors. Second, six sugar ranges were classified using various classifiers. Third, 11 types of honey and 100% sugar were classified. For predicting the 32 sugar concentrations, the stacking model (STM) obtained R²: 0.999, RMSE: 0.493 ml (v/v), and RPD: 40.2, with a 10-fold average R² of 0.996 and RMSE of 1.27 ml (v/v). For classifying the six sugar ranges and the 12 categories (11 honey types plus sugar), the STM achieved a Matthews Correlation Coefficient (MCC) of 99.7% and a Kappa score of 99.7%, with a 10-fold average MCC of 98.9% and Kappa score of 98.9%.
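The figures of merit quoted above can be computed as in the minimal Python sketch below. It assumes common textbook definitions that the abstract does not spell out: RPD as the sample standard deviation of the reference values divided by the RMSE (conventions differ on the SD estimator and on RMSE versus SEP), Cohen's kappa for the Kappa score, and the multiclass generalisation of MCC.

```python
import math
import statistics
from collections import Counter

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: sample SD of the reference values / RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)

def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    p_e = sum(ct[c] * cp[c] for c in ct) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient; returns 0.0 when undefined."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    classes = set(ct) | set(cp)
    num = c * s - sum(cp[k] * ct[k] for k in classes)
    den = math.sqrt((s * s - sum(cp[k] ** 2 for k in classes)) *
                    (s * s - sum(ct[k] ** 2 for k in classes)))
    return num / den if den else 0.0
```

For two classes, `mcc` reduces to the familiar (TP·TN − FP·FN) / sqrt(...) form; the multiclass version is what a six-range or 12-category task needs.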
Affiliation(s)
- Madhusudan G Lanjewar
- School of Physical and Applied Sciences, Goa University, Taleigao Plateau, Goa 403206, India.
- Lalchand B Patle
- PG Department of Electronics, MGSM's DDSGP College Chopda, KBCNMU, Jalgaon 425107, Maharashtra, India

2
Huang H, Fang Z, Xu Y, Lu G, Feng C, Zeng M, Tian J, Ping Y, Han Z, Zhao Z. Stacking and ridge regression-based spectral ensemble preprocessing method and its application in near-infrared spectral analysis. Talanta 2024; 276:126242. [PMID: 38761656 DOI: 10.1016/j.talanta.2024.126242] [Received: 02/23/2024] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/20/2024]
Abstract
Spectral preprocessing techniques can, to a certain extent, remove irrelevant information such as current noise and stray light from spectral data, thereby improving the performance of prediction models. However, most current preprocessing approaches seek the best single preprocessing method or combination of methods, overlooking the complementary information among different preprocessing methods. As a result, they fail to make full use of the useful information in spectral data, which limits the performance of prediction models. To address these issues, this study proposed a spectral ensemble preprocessing method, named stacking preprocessing ridge regression (SPRR), that builds on recent ensemble learning methods and the ridge regression (RR) model. Unlike conventional ensemble learning methods, SPRR applied multiple different preprocessing techniques to the original spectral data, generating multiple preprocessed datasets. Each dataset was then used to train its own RR base model, and another RR model served as the meta-model, integrating the outputs of the base models through stacking. This approach not only produced diversity among the base models but also achieved higher accuracy and lower computational complexity by using a single type of base model. On the apple spectral dataset collected by our team, correlation analysis showed significant complementary information among the data produced by different preprocessing techniques, providing robust theoretical support for SPRR.
In comparative experiments that also included the currently popular averaging ensemble preprocessing method, SPRR was applied to six datasets (apple, meat, wheat, olive oil, tablet, and corn) and yielded the best accuracy and reliability on all six, outperforming both the single preprocessing methods and the averaging ensemble preprocessing method. Furthermore, with identical training and test datasets, SPRR performed better than four commonly used ensemble preprocessing methods.
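The SPRR idea of "one ridge base model per preprocessed copy of the spectra, stacked by a ridge meta-model" can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the three preprocessors (raw spectra, standard normal variate, first difference), the ridge penalty `lam`, and the interleaved k-fold scheme used to produce out-of-fold base predictions for the meta-model are all assumptions, since the abstract does not list the preprocessing techniques or the training protocol.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = statistics.fmean(spectrum)
    sd = statistics.pstdev(spectrum)
    return [(v - mu) / sd for v in spectrum]

def first_diff(spectrum):
    """First-order difference between adjacent wavelengths (a crude derivative)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical preprocessing set; `list` is the identity (raw spectra).
PREPROCESSORS = [list, snv, first_diff]

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_fit(X, y, lam):
    """Ridge regression; the intercept is left unpenalised by centring X and y."""
    p = len(X[0])
    xm = [statistics.fmean(col) for col in zip(*X)]
    ym = statistics.fmean(y)
    Xc = [[v - m for v, m in zip(row, xm)] for row in X]
    yc = [v - ym for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, rhs)
    return w, ym - sum(wi * mi for wi, mi in zip(w, xm))

def ridge_predict(model, X):
    w, b0 = model
    return [b0 + sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def fit_sprr_like(spectra, y, lam=1e-3, k=3):
    """Level 0: one ridge model per preprocessed dataset.
    Level 1: a ridge meta-model fitted on out-of-fold level-0 predictions."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    oof = [[0.0] * len(PREPROCESSORS) for _ in range(n)]
    for j, prep in enumerate(PREPROCESSORS):
        Xp = [prep(s) for s in spectra]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            model = ridge_fit([Xp[i] for i in train_idx], [y[i] for i in train_idx], lam)
            for i, pred in zip(test_idx, ridge_predict(model, [Xp[i] for i in test_idx])):
                oof[i][j] = pred
    bases = [ridge_fit([prep(s) for s in spectra], y, lam) for prep in PREPROCESSORS]
    meta = ridge_fit(oof, y, lam)
    return bases, meta

def predict_sprr_like(fitted, spectra):
    bases, meta = fitted
    level0 = [ridge_predict(m, [prep(s) for s in spectra]) for m, prep in zip(bases, PREPROCESSORS)]
    return ridge_predict(meta, [list(row) for row in zip(*level0)])
```

Fitting the meta-model on out-of-fold predictions rather than in-sample ones keeps it from simply trusting whichever base model overfits most; using ridge at both levels mirrors the "single type of base model" design described above.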
Affiliation(s)
- Haowen Huang
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zile Fang
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Yuelong Xu
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Guosheng Lu
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Can Feng
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Min Zeng
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Jiaju Tian
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Yongfu Ping
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zhuolin Han
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zhigang Zhao
- College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China.

3
He H, Yang H, Mercaldo F, Santone A, Huang P. Isolation forest-voting fusion-multioutput: A stroke risk classification method based on the multidimensional output of abnormal sample detection. Comput Methods Programs Biomed 2024; 253:108255. [PMID: 38833760 DOI: 10.1016/j.cmpb.2024.108255] [Received: 03/19/2021] [Revised: 12/23/2023] [Accepted: 05/26/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND AND OBJECTIVE Stroke has become a major disease threatening the health of people around the world, characterized by high incidence, high fatality, and a high recurrence rate. At present, stroke screening based on electronic medical records suffers from poor recognition accuracy and insufficient recognition of stroke risk levels. These problems arise from systematic errors of medical equipment and the characteristics of the collectors during electronic medical record collection, from misreporting or underreporting by the collection personnel, and from the strong subjectivity of the evaluation indicators. METHODS This paper proposes an isolation forest-voting fusion-multioutput algorithm model. First, the screening data are collected, numerically processed, and normalized. The composite feature score index proposed in this paper is used to analyze the importance of risk factors, and an isolation forest is then used to detect abnormal samples. The voting fusion algorithm proposed in this article performs decision-fusion prediction and classification, and the model outputs multidimensional results (risk factor importance scores, abnormal sample labels, risk level classification, and stroke prediction) that doctors and medical staff can use as auxiliary decision information. RESULTS The proposed isolation forest-voting fusion-multioutput algorithm distinguishes five categories (zero risk, low risk, high risk, ischemic stroke (TIA), and hemorrhagic stroke (HE)). The average accuracy of stroke prediction reached 79.59%.
CONCLUSIONS The isolation forest-voting fusion-multioutput algorithm model proposed in this paper can not only accurately identify the various stroke risk levels and predict stroke but also output multidimensional auxiliary decision-making information to help medical staff make decisions, thereby greatly improving screening efficiency.
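The anomaly-detection and fusion steps can be illustrated with the Python sketch below. It implements the standard isolation forest scoring scheme (random axis-parallel splits, path lengths normalised by c(n), and score 2^(-E[h]/c(n)), where values near 1 indicate likely anomalies) together with plain majority voting. The tree count, subsample size, and simple voting rule are assumptions; the paper's own voting fusion algorithm and composite feature score index are not reproduced here.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow an isolation tree using a random feature and a random split value per node."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Depth at which x lands, plus c(size) to account for points left unsplit in the leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per point in (0, 1); values near 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm) for x in data]

def majority_vote(*label_lists):
    """Plain hard voting across classifiers; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]
```

Anomalies are isolated after few random splits, so their average path length is short and their score is high; samples flagged this way could be labelled or down-weighted before the fused classifiers vote.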
Affiliation(s)
- Hai He
- School of Big Data and Information Industry, Chongqing City Management College, Chongqing 401331, China
- Haibo Yang
- Information Center, Chongqing Medical University, Chongqing 400016, China.
- Francesco Mercaldo
- Department of Medicine and Health Sciences "Vincenzo Tiberio", University of Molise, 86100 Campobasso, Italy.
- Antonella Santone
- Department of Medicine and Health Sciences "Vincenzo Tiberio", University of Molise, 86100 Campobasso, Italy
- Pan Huang
- School of Microelectronics and Communication Engineering, Chongqing University 400044, China

4
Akbar MN, Ruf SF, Singh A, Faghihpirayesh R, Garner R, Bennett A, Alba C, Rocca ML, Imbiriba T, Erdoğmuş D, Duncan D. Advancing post-traumatic seizure classification and biomarker identification: Information decomposition based multimodal fusion and explainable machine learning with missing neuroimaging data. Comput Med Imaging Graph 2024; 115:102386. [PMID: 38718562 DOI: 10.1016/j.compmedimag.2024.102386] [Received: 10/03/2023] [Revised: 04/16/2024] [Accepted: 04/16/2024] [Indexed: 06/03/2024]
Abstract
A late post-traumatic seizure (LPTS), a consequence of traumatic brain injury (TBI), can evolve into a lifelong condition known as post-traumatic epilepsy (PTE). The mechanism that triggers epileptogenesis in TBI patients remains elusive, prompting the epilepsy community to devise ways to predict which TBI patients will develop PTE and to identify potential biomarkers. In response to this need, our study collected comprehensive, longitudinal multimodal data from 48 TBI patients across multiple participating institutions. A supervised binary classification task was created, contrasting data from patients with LPTS against those without LPTS. To accommodate missing modalities in some subjects, we took a two-pronged approach. First, we extended a graphical model-based Bayesian estimator to directly classify subjects with incomplete modalities. Second, we explored conventional imputation techniques. The imputed multimodal information was then combined using several fusion and dimensionality reduction techniques from the literature and fitted to a kernel- or tree-based classifier. For this fusion, we proposed two new algorithms: recursive elimination of correlated components (RECC), which filters information based on the correlation between the already selected features, and information decomposition and selective fusion (IDSF), which recombines information from decomposed multimodal features. Our cross-validation findings showed that the proposed IDSF algorithm delivers superior performance based on the area under the curve (AUC) score.
Ultimately, after rigorous statistical comparisons and interpretable machine learning examination of the most frequently selected features using Shapley values, we recommend the following two magnetic resonance imaging (MRI) abnormalities as potential biomarkers: the left anterior limb of the internal capsule in diffusion MRI (dMRI), and the right middle temporal gyrus in functional MRI (fMRI).
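The abstract does not give RECC's exact procedure. As a hedged illustration of the general idea of filtering features by their correlation with those already selected, the Python sketch below walks a feature ranking from most to least important and keeps a feature only if its absolute Pearson correlation with every previously kept feature is below a threshold. The ranking, the 0.8 threshold, and the feature names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def select_uncorrelated(features, ranking, threshold=0.8):
    """Walk features from most to least important and keep one only if
    |r| with every already-kept feature is below threshold."""
    kept = []
    for name in ranking:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because selection is greedy in ranking order, a redundant feature is discarded in favour of the higher-ranked one it correlates with, which keeps the fused feature set compact for a small cohort.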
Affiliation(s)
- Md Navid Akbar
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America.
- Sebastian F Ruf
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
- Ashutosh Singh
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
- Razieh Faghihpirayesh
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
- Rachael Garner
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
- Alexis Bennett
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
- Celina Alba
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
- Marianna La Rocca
- Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari "A. Moro", Bari, Italy
- Tales Imbiriba
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
- Deniz Erdoğmuş
- Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
- Dominique Duncan
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America

5
Miao J, Chen T, Misir M, Lin Y. Deep learning for predicting 16S rRNA gene copy number. Sci Rep 2024; 14:14282. [PMID: 38902329 PMCID: PMC11190246 DOI: 10.1038/s41598-024-64658-5] [Received: 01/24/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]
Abstract
Culture-independent 16S rRNA gene metabarcoding is a commonly used method for microbiome profiling. To obtain more quantitative estimates of cell fractions, it is important to account for the 16S rRNA gene copy number (hereafter 16S GCN) of different community members. Several bioinformatic tools are currently available to estimate 16S GCN values, based on either taxonomy assignment or phylogeny. Here we present ANNA16 (Artificial Neural Network Approximator for 16S rRNA gene copy number), a novel deep learning-based method that estimates 16S GCN values directly from 16S gene sequence strings. Using 27,579 16S rRNA gene sequences and gene copy number data from the rrnDB database, we show that ANNA16 outperforms the commonly used 16S GCN prediction algorithms. Interestingly, Shapley Additive exPlanations (SHAP) analysis shows that ANNA16 can identify unexpected informative positions in 16S rRNA gene sequences without any prior phylogenetic knowledge, suggesting potential applications beyond 16S GCN prediction.
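ANNA16 works directly on sequence strings, but the abstract does not say how those strings are encoded for the network. One common choice, shown here purely as an assumption, is one-hot encoding each position into four indicators (A, C, G, T), truncating or zero-padding to a fixed length so every input has the same shape. Per-position features like these make it natural for an attribution method such as SHAP to score individual sequence positions.

```python
BASES = "ACGT"

def one_hot(seq, length):
    """Encode a DNA string as `length` rows of 4 indicators (A, C, G, T).

    Letters outside ACGT (e.g. the ambiguity code N) give all-zero rows, and the
    sequence is truncated or right-padded with zero rows to exactly `length`."""
    seq = seq.upper()[:length]
    rows = [[1 if ch == base else 0 for base in BASES] for ch in seq]
    rows.extend([0, 0, 0, 0] for _ in range(length - len(seq)))
    return rows

def flatten(rows):
    """Flatten the length x 4 matrix into one feature vector for a dense network."""
    return [v for row in rows for v in row]
```

Whether sequences are aligned before encoding changes what a "position" means; with unaligned input, the same index may correspond to different homologous sites across taxa.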
Affiliation(s)
- Jiazheng Miao
- Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
- Department of Biomedical Informatics, Harvard Medical School, Boston, USA
- Tianlai Chen
- Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China
- Department of Biomedical Engineering, Duke University, Durham, USA
- Mustafa Misir
- Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China.
- Yajuan Lin
- Division of Applied and Natural Sciences, Duke Kunshan University, Suzhou, China.
- Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, USA.

6
Ameksa M, Elamrani Abou Elassad Z, Lamjadli S, Mousannif H. Predicting stroke events with a proactive fusion system: a comprehensive study on imbalance class handling in computational biomechanics. Comput Methods Biomech Biomed Engin 2024:1-18. [PMID: 38902976 DOI: 10.1080/10255842.2024.2363946] [Received: 05/09/2023] [Accepted: 05/28/2024] [Indexed: 06/22/2024]
Abstract
Stroke, a critical global health concern and the second leading cause of death, occurs when blood flow to the brain is interrupted. Although machine learning has made advances in medical safety, there is limited research on stroke prediction using information fusion systems. This study presents a fusion framework that combines multiple base classifiers with a meta-classifier to improve stroke prediction performance. The research uses grid-search-optimized models, namely Random Forest, Support Vector Machine, K-Nearest Neighbors, AdaBoost, Gradient Boosting, Light Gradient Boosting, Categorical Boosting, and eXtreme Gradient Boosting, for in-depth stroke analysis. Because stroke events are rare and unpredictable, classification outcomes on the resulting imbalanced data can be deceptive. This article therefore examines stroke probability by comparing three data balancing methods (over-sampling, under-sampling, and synthetic minority over-sampling with Tomek links, SMOTE-TL) to improve prediction accuracy. The findings reveal that AdaBoost as the meta-classifier attains the highest performance in the fusion framework, with a peak Recall of 88.09% and F1 score of 83.66%. This approach provides insights into stroke prediction and can be a valuable resource for strengthening intervention efforts in advanced healthcare systems.
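The SMOTE-TL balancing step can be sketched in plain Python as below. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours, and a Tomek link is a pair of mutual nearest neighbours with different labels. Removing both members of each link after oversampling is one common variant; the neighbour count, random seed, and removal rule here are assumptions rather than the study's settings.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points on segments joining a minority sample
    to one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic

def tomek_links(points, labels):
    """Index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and labels[i] != labels[j]]

def smote_tomek(points, labels, minority_label, n_new, k=3, seed=0):
    """Oversample the minority class with SMOTE, then drop both members of every Tomek link."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    pts = list(points) + smote(minority, n_new, k, seed)
    ys = list(labels) + [minority_label] * n_new
    drop = {idx for pair in tomek_links(pts, ys) for idx in pair}
    keep = [i for i in range(len(pts)) if i not in drop]
    return [pts[i] for i in keep], [ys[i] for i in keep]
```

Resampling of this kind should be applied only to training folds, so that the reported recall and F1 are measured on the original class distribution.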
Affiliation(s)
- Mohammed Ameksa
- LISI Laboratory, Computer Science Department, FSSM, Cadi Ayyad University, Marrakesh, Morocco
- Saad Lamjadli
- Immunology Laboratory, Arrazi Hospital, CHU Mohamed VI, Marrakech, Morocco
- Hajar Mousannif
- LISI Laboratory, Computer Science Department, FSSM, Cadi Ayyad University, Marrakesh, Morocco

7
Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024; 178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]
Abstract
Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. However, NSSs also raise concerns about their mutagenicity. Evaluating the mutagenic potential of NSSs is crucial for food safety, and this step is required for every new chemical registration in the food and pharmaceutical industries. Computational assessment requires less time and money and fewer animals than in vivo experiments; this study therefore developed a novel computational method, called DeepRA, that combines an ensemble convolutional deep neural network with read-across algorithms to classify the mutagenicity of chemicals. The mutagenicity data were obtained from the curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints. On an independent test set, the DeepRA model provided accurate and reliable mutagenicity classification. The model was then used to examine NSS-related chemicals, enabling the evaluation of mutagenicity for NSS-like substances. Finally, the model is publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.
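DeepRA pairs an ensemble CNN with read-across; the Python sketch below illustrates only a generic read-across step, not the published model. It predicts a binary mutagenicity label as the Tanimoto-similarity-weighted vote of the k most similar training compounds. Fingerprints are represented, as an assumption, as Python sets of on-bit indices; in practice they would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def read_across(query, training, k=3):
    """Similarity-weighted vote of the k most similar training compounds.

    training: list of (fingerprint_set, label) with label 1 = mutagenic, 0 = not.
    Returns (predicted_label, weighted_score); the label is None when no
    neighbour shares any bits, i.e. the query is outside the applicability domain."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in ranked]
    total = sum(weights)
    if total == 0:
        return None, 0.0
    score = sum(w * label for w, (_, label) in zip(weights, ranked)) / total
    return int(score >= 0.5), score
```

Returning no label for structurally unrelated queries, rather than defaulting to "not mutagenic", reflects the safety-first framing of mutagenicity screening.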
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.

8
Fereidooni D, Karimi Z, Ghasemi F. Non-destructive test-based assessment of uniaxial compressive strength and elasticity modulus of intact carbonate rocks using stacking ensemble models. PLoS One 2024; 19:e0302944. [PMID: 38857272 PMCID: PMC11164374 DOI: 10.1371/journal.pone.0302944] [Received: 12/18/2023] [Accepted: 04/14/2024] [Indexed: 06/12/2024]
Abstract
The uniaxial compressive strength (UCS) and elasticity modulus (E) of intact rock are two fundamental parameters in engineering applications. They can be measured directly with the uniaxial compressive strength test or estimated indirectly with soft computing predictive models. In the present research, the UCS and E of intact carbonate rocks were predicted from simple non-destructive laboratory test results by introducing two stacking ensemble learning models. For this purpose, dry unit weight, porosity, P-wave velocity, Brinell surface hardness, UCS, and static E were measured for 70 carbonate rock samples. Two stacking ensemble learning models were then developed to estimate the UCS and E of the rocks. Each model combines two base models at the first level: a multi-layer perceptron (MLP) and random forest (RF) for predicting UCS, and a support vector regressor (SVR) and extreme gradient boosting (XGBoost) for predicting E. Grid search with k-fold cross-validation was used to tune the parameters of both the base models and the meta-learner. The results demonstrate the generalization ability of the stacking ensemble method compared with the base models in terms of common performance measures. The coefficient of determination (R²) values obtained from the stacking ensemble are 0.909 and 0.831 for predicting UCS and E, respectively, and the corresponding Root Mean Squared Error (RMSE) values are 1.967 and 0.621. Accordingly, the proposed models outperform SVR and MLP as single models and RF and XGBoost as two representative ensemble models. Furthermore, a sensitivity analysis was carried out to investigate the impact of the input parameters.
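Grid search with k-fold cross-validation, as described for tuning the base models and meta-learner, follows the pattern sketched below in plain Python. A k-nearest-neighbour regressor with a single hyperparameter stands in for the paper's MLP, RF, SVR, and XGBoost models; the interleaved fold assignment and the candidate grid are assumptions chosen for brevity.

```python
import math
import statistics

def knn_predict(train_x, train_y, x, n_neighbors):
    """Mean target of the n_neighbors training samples closest to x (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return statistics.fmean(train_y[i] for i in order[:n_neighbors])

def cv_rmse(xs, ys, n_neighbors, n_folds=5):
    """Average held-out RMSE over n_folds interleaved folds."""
    scores = []
    for f in range(n_folds):
        test = set(range(f, len(xs), n_folds))
        tr_x = [xs[i] for i in range(len(xs)) if i not in test]
        tr_y = [ys[i] for i in range(len(xs)) if i not in test]
        sq = [(ys[i] - knn_predict(tr_x, tr_y, xs[i], n_neighbors)) ** 2 for i in test]
        scores.append(math.sqrt(statistics.fmean(sq)))
    return statistics.fmean(scores)

def grid_search(xs, ys, grid=(1, 3, 5, 7), n_folds=5):
    """Return the grid value with the lowest cross-validated RMSE, plus all scores."""
    scores = {g: cv_rmse(xs, ys, g, n_folds) for g in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

With only 70 samples, every held-out fold is small, so the spread of fold scores is worth inspecting alongside the mean before trusting a single "best" setting.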
Affiliation(s)
- Zohre Karimi
- School of Engineering, Damghan University, Damghan, Semnan, Iran
- Fatemeh Ghasemi
- School of Earth Sciences, Damghan University, Damghan, Semnan, Iran

9
Razlivina J, Dmitrenko A, Vinogradov V. AI-Powered Knowledge Base Enables Transparent Prediction of Nanozyme Multiple Catalytic Activity. J Phys Chem Lett 2024; 15:5804-5813. [PMID: 38781458 DOI: 10.1021/acs.jpclett.4c00959] [Indexed: 05/25/2024]
Abstract
Nanozymes are unique materials with many valuable properties for applications in biomedicine, biosensing, environmental monitoring, and beyond. In this work, we developed a machine learning (ML) approach to search for new nanozymes and deployed a web platform, DiZyme, featuring a state-of-the-art database of 1210 experimental nanozyme samples, catalytic activity prediction, and a DiZyme Assistant interface powered by a large language model (LLM). For the first time, we enable the prediction of multiple catalytic activities of nanozymes by training an ensemble learning algorithm that achieves R² = 0.75 for the Michaelis-Menten constant and R² = 0.77 for the maximum velocity on unseen test data. We envision that accurate prediction of multiple catalytic activities (peroxidase, oxidase, and catalase) will promote novel applications for a wide range of surface-modified inorganic nanozymes. The DiZyme Assistant, based on the ChatGPT model, provides users with supporting information on experimental samples, such as synthesis procedures and measurement protocols. DiZyme (dizyme.aicidlab.itmo.ru) is now openly available worldwide.
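DiZyme predicts the Michaelis-Menten constant (Km) and maximum velocity (Vmax) with an ensemble ML model. For readers unfamiliar with these targets, the Python sketch below shows the rate law they parameterise, v = Vmax·[S]/(Km + [S]), and a classical Lineweaver-Burk estimate of both constants from noise-free rate data. This is textbook enzyme kinetics, not part of DiZyme.

```python
def michaelis_menten(s, vmax, km):
    """Initial reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

def lineweaver_burk_fit(substrate, rates):
    """Estimate (Vmax, Km) by least squares on 1/v = (Km / Vmax) * (1/[S]) + 1 / Vmax."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    vmax = 1.0 / intercept
    return vmax, slope * vmax  # Km = slope * Vmax
```

With noisy measurements a direct nonlinear fit of the rate law is usually preferred, because the double-reciprocal transform amplifies errors at low substrate concentrations.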
Affiliation(s)
- Julia Razlivina
- Center for AI in Chemistry, SCAMT institute, ITMO University, Saint-Petersburg 191002, Russian Federation
- Andrei Dmitrenko
- Center for AI in Chemistry, SCAMT institute, ITMO University, Saint-Petersburg 191002, Russian Federation
- Vladimir Vinogradov
- Center for AI in Chemistry, SCAMT institute, ITMO University, Saint-Petersburg 191002, Russian Federation

10
James Jensen A, Silva CS, Costello KE, Banks S. A novel post-processing technique for correcting symmetric implant ambiguity in measuring total knee arthroplasty kinematics from single-plane fluoroscopy. J Biomech 2024; 170:112172. [PMID: 38833908 DOI: 10.1016/j.jbiomech.2024.112172] [Received: 11/23/2023] [Revised: 05/17/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
Recent advancements in computer vision and machine learning enable autonomous measurement of total knee arthroplasty kinematics from single-plane fluoroscopy. However, symmetric components pose challenges for optimization routines, causing "symmetry traps" and ambiguous poses. Achieving clinically robust kinematics measurement requires addressing this issue. We devised an algorithm that converts a "true" pose to its corresponding "symmetry trap" orientation. From a dataset of nearly 13,000 human-supervised kinematics, this algorithm constructs an augmented dataset of "true" and "symmetry trap" kinematics, which was used to train eight machine learning classification algorithms. The outputs of the highest-performing algorithm classify kinematics sequences as 'obviously true' or 'potentially ambiguous.' We construct a spline through the 'obviously true' poses and compare the 'ambiguous' poses with the spline to determine the correct orientation. The machine learning algorithms achieved 88-94% accuracy on our internal test set and 91-93% on our external test set. Applying our spline algorithm to kinematics sequences yielded 91.1% accuracy and 94% specificity, but only 67% sensitivity. The accuracy of standard ML algorithms for implants within 5 degrees of a pure-lateral view was 71%, rising to 88% beyond 5 degrees. This study systematically addresses model-image registration issues with symmetric tibial implants. The high accuracy suggests that ML algorithms could be used to mitigate shape-ambiguity errors in pose measurements from single-plane fluoroscopy. Our results also suggest an imaging protocol for measuring kinematics that favors more oblique viewing angles, which could further disambiguate "true" and "symmetry trap" poses.
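The spline-based disambiguation step can be sketched in plain Python under simplifying assumptions: each pose is reduced to a single hypothetical joint angle per frame, and for each ambiguous frame both the candidate angle and its "symmetry trap" counterpart are assumed to be already available (the paper's conversion algorithm and ML classifier are not reproduced). A natural cubic spline is fitted through the 'obviously true' frames, and each ambiguous frame keeps whichever of its two angles lies closer to the spline.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys).
    xs must be strictly increasing; second derivatives are zero at both ends."""
    n = len(xs)
    y2 = [0.0] * n  # second derivatives at the knots
    u = [0.0] * n   # scratch space for the tridiagonal solve
    for i in range(1, n - 1):
        sig = (xs[i] - xs[i - 1]) / (xs[i + 1] - xs[i - 1])
        p = sig * y2[i - 1] + 2.0
        y2[i] = (sig - 1.0) / p
        d = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) - (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        u[i] = (6.0 * d / (xs[i + 1] - xs[i - 1]) - sig * u[i - 1]) / p
    for k in range(n - 2, -1, -1):
        y2[k] = y2[k] * y2[k + 1] + u[k]

    def evaluate(t):
        # Locate the bracketing interval; clamping makes the end segments extrapolate.
        hi = min(max(bisect.bisect_right(xs, t), 1), n - 1)
        lo = hi - 1
        h = xs[hi] - xs[lo]
        a = (xs[hi] - t) / h
        b = (t - xs[lo]) / h
        return (a * ys[lo] + b * ys[hi]
                + ((a ** 3 - a) * y2[lo] + (b ** 3 - b) * y2[hi]) * h * h / 6.0)

    return evaluate

def resolve_ambiguous(true_frames, ambiguous_frames):
    """true_frames: (time, angle) pairs classified as 'obviously true'.
    ambiguous_frames: (time, candidate_angle, symmetric_angle) triples.
    For each ambiguous frame, keep the angle closer to the spline."""
    times, angles = zip(*sorted(true_frames))
    spline = natural_cubic_spline(list(times), list(angles))
    resolved = []
    for t, cand, sym in ambiguous_frames:
        ref = spline(t)
        resolved.append((t, cand if abs(cand - ref) <= abs(sym - ref) else sym))
    return resolved
```

Because the spline is anchored only by confidently classified frames, a misclassified 'obviously true' frame can pull the curve toward the wrong branch, which is one plausible reason sensitivity lags specificity.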
Affiliation(s)
- Andrew James Jensen
- Department of Mechanical & Aerospace Engineering, PO Box 116250, Gainesville, FL 32611, USA.
- Catia S Silva
- Department of Electrical & Computer Engineering, 968 Center Drive, Gainesville, FL 32611, USA.
- Kerry E Costello
- Department of Mechanical & Aerospace Engineering, PO Box 116250, Gainesville, FL 32611, USA.
- Scott Banks
- Department of Mechanical & Aerospace Engineering, PO Box 116250, Gainesville, FL 32611, USA.

11
Zhang J, Jin A, Han X, Chen Z, Diao C, Zhang Y, Liu X, Xu F, Liu J, Qiu X, Tan X, Luo L, Liu Y. The LISA-PPV Formula: An Ensemble Artificial Intelligence-Based Thick Intraocular Lens Calculation Formula for Vitrectomized Eyes. Am J Ophthalmol 2024; 262:237-245. [PMID: 38452920 DOI: 10.1016/j.ajo.2024.02.037] [Received: 12/19/2023] [Revised: 02/20/2024] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
PURPOSE To investigate the relationship between effective lens position (ELP) and patient characteristics, and to develop a new intraocular lens (IOL) calculation formula for cataract patients with previous pars plana vitrectomy (PPV). DESIGN Cross-sectional study. METHODS A total of 2793 age-related cataract patients (group 1) and 915 post-PPV cataract patients (group 2) who underwent phacoemulsification with IOL implantation were included. The ELP of the 2 groups was compared, and the association between ELP and patient characteristics was further evaluated using standardized multivariate regression coefficients. An ensemble artificial intelligence-based ELP prediction model was developed using a training set of 810 vitrectomized eyes, and a thick-lens IOL formula (LISA-PPV) was constructed and compared with 7 existing formulas on an external multicenter testing set of 105 eyes. RESULTS Compared with eyes with age-related cataract, vitrectomized eyes showed a similar ELP distribution (P = .19) but different standardized coefficients of preoperative biometry for ELP. The standardized coefficients also varied with the type of vitreous tamponade, history of scleral buckling, and ciliary sulcus IOL implantation. In the testing dataset, the LISA-PPV formula showed the lowest mean and median absolute prediction errors (MAE: 0.63 D; MedAE: 0.44 D) and the highest percentage of eyes with a prediction error within ±0.5 D (57.14%). CONCLUSIONS ELP prediction requires optimization specifically for vitrectomized eyes based on their biometric and surgical characteristics. The LISA-PPV formula is a useful and accurate tool for determining IOL power in cataract patients with previous PPV (available at http://ppv-iolcalculator.com/).
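The accuracy measures reported for LISA-PPV can be computed as in the Python sketch below. Prediction error is taken, as an assumption, as actual minus predicted postoperative refraction in diopters; MAE and MedAE are the mean and median of its absolute value, and `within_pct` is the percentage of eyes with |error| ≤ 0.5 D. Some formula-comparison studies first adjust each formula's mean error to zero; that step is omitted here.

```python
import statistics

def refraction_error_summary(predicted, actual, window=0.5):
    """Summarise IOL refraction prediction errors in diopters (D)."""
    errors = [a - p for p, a in zip(predicted, actual)]  # actual minus predicted
    abs_err = [abs(e) for e in errors]
    return {
        "ME": statistics.fmean(errors),       # mean (signed) error
        "MAE": statistics.fmean(abs_err),     # mean absolute error
        "MedAE": statistics.median(abs_err),  # median absolute error
        "within_pct": 100.0 * sum(e <= window for e in abs_err) / len(abs_err),
    }
```

Reporting MedAE next to MAE is useful because a few eyes with large refractive surprises can inflate the mean while leaving the median largely unchanged.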
Collapse
Affiliation(s)
- Jiaqing Zhang
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
| | - Aixia Jin
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
| | - Xiaotong Han
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
| | - Zhixin Chen
- Shenzhen Key Laboratory of Ophthalmology, Shenzhen Eye Hospital, Affiliated Hospital of Jinan University (Z.C.), Shenzhen, China
| | - Chunli Diao
- Department Of Ophthalmology, The People's Hospital of Guangxi Zhuang Autonomous Region and Institute of Ophthalmic Diseases, Guangxi Academy Of Medical Sciences (C.D.), Nanning, China; Department of Ophthalmology, The First Affiliated Hospital of Guangxi University of Chinese Medicine (C.D.), Nanning, China
| | - Yu Zhang
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Cataract Department, Shanxi Eye Hospital (Y.Z., J.L.), Taiyuan, China
- Xinhua Liu
- Shenzhen Key Laboratory of Ophthalmology, Shenzhen Eye Hospital, Affiliated Hospital of Jinan University (Z.C.), Shenzhen, China
- Fan Xu
- Department of Ophthalmology, The People's Hospital of Guangxi Zhuang Autonomous Region and Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences (C.D.), Nanning, China
- Jiewei Liu
- Cataract Department, Shanxi Eye Hospital (Y.Z., J.L.), Taiyuan, China
- Xiaozhang Qiu
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
- Xuhua Tan
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China.
- Lixia Luo
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China.
- Yizhi Liu
- From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
12
Matougui Z, Djerbal L, Bahar R. A comparative study of heterogeneous and homogeneous ensemble approaches for landslide susceptibility assessment in the Djebahia region, Algeria. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:40554-40580. [PMID: 36892699 DOI: 10.1007/s11356-023-26247-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
This study compares the performance of ensembles according to their inherent diversity in the context of landslide susceptibility assessment. Two ensemble types are distinguished, heterogeneous and homogeneous, and four ensembles of each type were implemented in the Djebahia region. The heterogeneous ensembles include stacking (ST), voting (VO), weighting (WE), and meta-dynamic ensemble selection (DES), an approach new to landslide assessment; the homogeneous ensembles include AdaBoost (ADA), bagging (BG), random forest (RF), and random subspace (RSS). To ensure a consistent comparison, each ensemble was built from individual base learners: the heterogeneous ensembles combined eight different machine learning algorithms, whereas each homogeneous ensemble used a single base learner and achieved diversity by resampling the training dataset. The spatial dataset consisted of 115 landslide events and 12 conditioning factors, randomly divided into training and testing datasets. The models were evaluated using receiver operating characteristic (ROC) curves, root mean squared error (RMSE), landslide density distribution (LDD), threshold-dependent metrics (Kappa index, accuracy, and recall), and a global visual representation using a Taylor diagram. A sensitivity analysis (SA) was also conducted on the best-performing models to assess the importance of the factors and the resilience of the ensembles. Homogeneous ensembles outperformed heterogeneous ensembles in terms of AUC and threshold-dependent metrics, with test-set AUC ranging from 0.962 to 0.971. ADA performed best on these metrics but weakest in terms of RMSE (0.366). In contrast, the heterogeneous ensemble ST achieved a lower RMSE (0.272), and DES showed the best LDD, indicating a stronger potential to generalize the phenomenon.
The Taylor diagram was consistent with the other results, indicating that ST was the best-performing model, followed by RSS. The SA showed that RSS was the most robust model (mean AUC variation of -0.022) and ADA the least robust (mean AUC variation of -0.038).
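The Kappa index listed among the threshold-dependent metrics is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal stdlib Python sketch, assuming binary landslide/non-landslide labels stored as plain lists; the function name and toy labels are illustrative, not taken from the study:

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label lists (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples where both labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        # Both labelings use one identical label; kappa is undefined, report 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# 1 = landslide, 0 = no landslide (toy values, not data from the study)
observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
print(round(cohen_kappa(observed, predicted), 4))  # 0.3333
```

Here raw agreement is 4/6, but half of that is expected by chance from the 50/50 marginals, so kappa is only about 0.33.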
Affiliation(s)
- Zakaria Matougui
- Department of Civil Engineering, Laboratory of LEEGO, University of Sciences and Technology Houari Boumediene (USTHB), 16111, Bab Ezzouar, Algiers, Algeria.
- Lynda Djerbal
- Department of Civil Engineering, Laboratory of LEEGO, University of Sciences and Technology Houari Boumediene (USTHB), 16111, Bab Ezzouar, Algiers, Algeria
- Ramdane Bahar
- Department of Civil Engineering, Laboratory of LEEGO, University of Sciences and Technology Houari Boumediene (USTHB), 16111, Bab Ezzouar, Algiers, Algeria
13
Zarbakhsh S, Shahsavar AR, Soltani M. Optimizing PGRs for in vitro shoot proliferation of pomegranate with bayesian-tuned ensemble stacking regression and NSGA-II: a comparative evaluation of machine learning models. PLANT METHODS 2024; 20:82. [PMID: 38822411 PMCID: PMC11143642 DOI: 10.1186/s13007-024-01211-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/17/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Optimizing in vitro shoot proliferation is complicated because it is influenced by interactions among many factors, including genotype. This study investigated the role of various concentrations of plant growth regulators (zeatin and gibberellic acid) in the in vitro shoot proliferation of three Punica granatum cultivars ('Faroogh', 'Atabaki' and 'Shirineshahvar'). It also evaluated five machine learning (ML) algorithms as tools for modeling in vitro multiplication of pomegranate: Support Vector Regression (SVR), Random Forest (RF), Extreme Gradient Boosting (XGB), Ensemble Stacking Regression (ESR), and Elastic Net Multivariate Linear Regression (ENMLR). A new automatic hyperparameter optimization method, the Adaptive Tree Parzen Estimator (ATPE), was developed to tune the hyperparameters. The models were evaluated and compared using statistical indicators (MAE, RMSE, RRMSE, MAPE, R and R2), and a Global Performance Indicator (GPI) was introduced to rank the models by a single value. The Non-dominated Sorting Genetic Algorithm-II (NSGA-II) was then employed to optimize the selected prediction model. RESULTS The ESR algorithm showed higher predictive accuracy than the other ML algorithms and was therefore coupled with NSGA-II for optimization. ESR-NSGA-II indicated that the highest proliferation rate (3.47, 3.84, and 3.22), shoot length (2.74, 3.32, and 1.86 cm), leaf number (18.18, 19.76, and 18.77), and explant survival (84.21%, 85.49%, and 56.39%) could be achieved with a medium containing 0.750, 0.654, and 0.705 mg/L zeatin and 0.50, 0.329, and 0.347 mg/L gibberellic acid in the 'Atabaki', 'Faroogh', and 'Shirineshahvar' cultivars, respectively. CONCLUSIONS The 'Shirineshahvar' cultivar exhibited lower shoot proliferation success than the other cultivars.
The results indicate that ESR-NSGA-II performs well in modeling and optimizing in vitro propagation and can serve as an up-to-date, reliable computational tool for future studies of plant in vitro culture.
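NSGA-II ranks candidate solutions (here, growth-regulator combinations scored by the ESR model) into Pareto fronts before selection. The plain-Python sketch below shows only that fast non-dominated sorting step, assuming every objective is minimized; objectives the study maximizes, such as proliferation rate, would be negated first. Crowding distance, crossover, and mutation are omitted, and the points are toy values, not results from the study:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Split solutions into Pareto fronts; each front is a list of indices into points."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[p]: solutions that p dominates
    count = [0] * n                     # count[p]: number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1  # p's front is settled, so it no longer blocks q
                if count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Toy objective vectors, already negated wherever the real goal is maximization.
points = [(1, 5), (2, 2), (3, 1), (4, 4), (5, 5)]
print(fast_non_dominated_sort(points))  # [[0, 1, 2], [3], [4]]
```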
Affiliation(s)
- Saeedeh Zarbakhsh
- Department of Horticultural Science, College of Agriculture, Faculty of Agriculture, Shiraz University, Shiraz, Iran
- Ali Reza Shahsavar
- Department of Horticultural Science, College of Agriculture, Faculty of Agriculture, Shiraz University, Shiraz, Iran.
14
Hosseinzadeh M, Hussain D, Zeki Mahmood FM, A. Alenizi F, Varzeghani AN, Asghari P, Darwesh A, Malik MH, Lee SW. A model for skin cancer using combination of ensemble learning and deep learning. PLoS One 2024; 19:e0301275. [PMID: 38820401 PMCID: PMC11142560 DOI: 10.1371/journal.pone.0301275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 03/13/2024] [Indexed: 06/02/2024]
Abstract
Skin cancer affects many people every year and is recognized as the most prevalent type of cancer. In the United States alone, an estimated 3.5 million people are diagnosed with skin cancer each year, and survival rates decline substantially as the disease advances. This paper aims to help healthcare experts distinguish between benign and malignant skin lesions by applying a range of machine learning and deep learning techniques, together with different feature extractors and feature selectors, to improve the evaluation metrics. Several transfer learning models are used as feature extractors, and a feature selection layer is designed that includes diverse techniques such as Univariate, Mutual Information, ANOVA, PCA, XGB, Lasso, Random Forest, and Variance. Among the transfer models, DenseNet-201 was selected as the primary feature extractor. The Lasso method was then applied for feature selection, followed by diverse machine learning classifiers such as MLP, XGB, RF, and NB. To improve accuracy and precision, ensemble methods were used to identify and combine the best-performing models. The resulting model achieves an accuracy of 87.72% and a sensitivity of 92.15%.
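Accuracy and sensitivity are confusion-matrix ratios; for this task it is natural to treat "malignant" as the positive class, which is an assumption here. A minimal Python sketch with an illustrative function name and toy labels:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall) and specificity for binary labels (illustrative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        # Sensitivity: share of truly malignant cases the model flags as malignant.
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Specificity: share of benign cases correctly left unflagged.
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# 1 = malignant, 0 = benign (toy labels)
print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

A sensitivity above accuracy, as in the reported figures, means the model errs more often on benign lesions than on malignant ones.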
Affiliation(s)
- Mehdi Hosseinzadeh
- Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
- School of Medicine and Pharmacy, Duy Tan University, Da Nang, Vietnam
- Dildar Hussain
- Department of AI and Data Science, Sejong University, Seoul, Republic of Korea
- Farhan A. Alenizi
- Electrical Engineering Department, College of Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Parvaneh Asghari
- Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- Aso Darwesh
- Department of Information Technology, University of Human Development, Sulaymaniyah, Kurdistan region of Iraq
- Mazhar Hussain Malik
- School of Computer Science and Creative Technologies, College of Arts, Technology and Environment (CATE), University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol, United Kingdom
- Sang-Woong Lee
- Pattern Recognition and Machine Learning Lab, Gachon University, Seongnamdaero, Sujeonggu, Seongnam, Republic of Korea
15
Kushwaha NL, Kudnar NS, Vishwakarma DK, Subeesh A, Jatav MS, Gaddikeri V, Ahmed AA, Abdelaty I. Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India. Heliyon 2024; 10:e31085. [PMID: 38784559 PMCID: PMC11112320 DOI: 10.1016/j.heliyon.2024.e31085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
Water quality assessment is paramount for environmental monitoring and resource management, particularly in regions experiencing rapid urbanization and industrialization. This study introduces Artificial Neural Networks (ANN) and their hybrid machine learning models, namely ANN-RF (Random Forest), ANN-SVM (Support Vector Machine), ANN-RSS (Random Subspace), ANN-M5P (M5 Pruned), and ANN-AR (Additive Regression), for water quality assessment in the rapidly urbanizing and industrializing Bagh River Basin, India. The Relief algorithm was employed to select the most influential water quality input parameters: nitrate (NO₃⁻), magnesium (Mg²⁺), sulphate (SO₄²⁻), calcium (Ca²⁺), and potassium (K⁺). The ANN and its hybrid models were compared using statistical indicators (Nash-Sutcliffe Efficiency (NSE), Pearson Correlation Coefficient (PCC), Coefficient of Determination (R2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Root Square Error (RRSE), Relative Absolute Error (RAE), and Mean Bias Error (MBE)) and graphical representations (Taylor diagram). The results indicate that integrating the support vector machine (SVM) with the ANN significantly improves performance, yielding NSE = 0.879, R2 = 0.904, MAE = 22.349, and MBE = 12.548. The methodology outlined in this study can serve as a template for enhancing the predictive capabilities of ANN models in other environmental and ecological applications, contributing to sustainable development and the safeguarding of natural resources.
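The indicators listed above have standard closed forms. A stdlib Python sketch follows, assuming MBE is mean(predicted - observed) (sign conventions differ between papers) and R2 is the squared Pearson correlation (some papers use an NSE-style R2 instead). The function name and values are illustrative, not the study's:

```python
import math


def regression_metrics(obs, pred):
    """Goodness-of-fit indicators for paired observed/predicted values (illustrative)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    err = [p - o for o, p in zip(obs, pred)]      # prediction minus observation
    ss_res = sum(e * e for e in err)               # residual sum of squares
    ss_tot = sum((o - mean_o) ** 2 for o in obs)   # spread of the observations
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    pcc = cov / math.sqrt(ss_tot * var_p)
    return {
        "NSE": 1 - ss_res / ss_tot,
        "PCC": pcc,
        "R2": pcc ** 2,
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": math.sqrt(ss_res / n),
        "RRSE": math.sqrt(ss_res / ss_tot),
        "RAE": sum(abs(e) for e in err) / sum(abs(o - mean_o) for o in obs),
        "MBE": sum(err) / n,
    }


# A constant +1 bias: perfect correlation, but NSE drops well below 1.
print(regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```

The toy call shows why NSE and MBE are useful alongside R2: a systematically biased model can still reach R2 = 1.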
Affiliation(s)
- Nand Lal Kushwaha
- Department of Soil and Water Engineering, Punjab Agricultural University Ludhiana, Punjab, 141004, India
- Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Nanabhau S. Kudnar
- Department of Geography, C. J. Patel College Tirora, Gondia, Maharashtra, 441911, India
- Dinesh Kumar Vishwakarma
- Department of Irrigation and Drainage Engineering, G.B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India
- A. Subeesh
- ICAR-Central Institute of Agricultural Engineering, Bhopal, Madhya Pradesh, 462038, India
- Malkhan Singh Jatav
- National Institute of Hydrology, North Western Regional Centre, Jodhpur, Rajasthan, 342003, India
- Venkatesh Gaddikeri
- Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Ashraf A. Ahmed
- Department of Civil and Environmental Engineering, Brunel University London, Kingston Lane, Uxbridge UB8 3PH, UK
- Ismail Abdelaty
- Water and Water Structures Engineering Department, Faculty of Engineering, Zagazig University, Zagazig, 44519, Egypt
16
Pratiwi NKC, Tayara H, Chong KT. An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction. Int J Mol Sci 2024; 25:5957. [PMID: 38892144 PMCID: PMC11172808 DOI: 10.3390/ijms25115957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024]
Abstract
In this study, we present an approach to improve the prediction of protein-protein interactions (PPIs) using an ensemble classifier, focusing specifically on distinguishing native from non-native interactions. Our ensemble leverages the strengths of several base models, including random forest, gradient boosting, extreme gradient boosting, and light gradient boosting, and integrates their predictions using a logistic regression meta-classifier. The model was evaluated on a comprehensive dataset generated from molecular dynamics simulations. Although the gains in AUC and other metrics may seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of the various approaches, we compared the performance of logistic regression with that of the four baseline models. Logistic regression consistently underperformed across all evaluated metrics, suggesting that it may not be well suited to capturing the complex relationships in this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handle datasets effectively, and incorporate regularization to avoid overfitting. Our findings indicate that the ensemble method enhances the prediction of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.
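The abstract describes a stacked ensemble in which base classifiers produce probabilities that a logistic regression meta-classifier combines. A dependency-free Python sketch of that pattern follows. It substitutes two toy base learners (k-nearest neighbours and a nearest-centroid score) for the study's random forest and gradient boosting models, and it trains the meta-classifier on out-of-fold predictions so it never sees scores from a model fitted on the same row. All function names are hypothetical, and the centroid learner assumes every training fold contains both classes:

```python
import math


def fit_knn(X, y, k=3):
    """Base learner 1 (illustrative): share of positives among the k nearest training rows."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_centroid(X, y):
    """Base learner 2 (illustrative): logistic score of the distance gap to the class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        # Closer to the positive centroid c1 gives a score above 0.5.
        return 1 / (1 + math.exp(math.dist(x, c1) - math.dist(x, c0)))
    return predict


def out_of_fold_features(X, y, learners, k=5):
    """Meta-features for each row come from base models that never saw that row."""
    meta = [[0.0] * len(learners) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        for j, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Meta-classifier: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the intercept

    def prob(z):
        return 1 / (1 + math.exp(-(sum(a * b for a, b in zip(w, z)) + w[-1])))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = prob(z) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return prob


def fit_stacking(X, y, learners, k=5):
    meta_model = fit_logistic(out_of_fold_features(X, y, learners, k), y)
    base_models = [fit(X, y) for fit in learners]  # refit base learners on all rows
    return lambda x: meta_model([m(x) for m in base_models])


X = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5), (1, 1), (6, 6), (0, 2), (5, 7)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
model = fit_stacking(X, y, [fit_knn, fit_centroid])
print(model((0.5, 0.5)) < 0.5, model((5.5, 5.5)) > 0.5)  # True True
```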
Affiliation(s)
- Nor Kumalasari Caecar Pratiwi
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Department of Electrical Engineering, Telkom University, Bandung 40257, West Java, Indonesia
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Advanced Electronics and Information Research Centre, Jeonbuk National University, Jeonju 54896, Republic of Korea
17
Dayimu A, Simidjievski N, Demiris N, Abraham J. Sample size determination for prediction models via learning-type curves. Stat Med 2024. [PMID: 38803150 DOI: 10.1002/sim.10121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 05/07/2024] [Accepted: 05/10/2024] [Indexed: 05/29/2024]
Abstract
This article is concerned with sample size determination methodology for prediction models. We propose combining the individual calculations via learning-type curves, in two distinct ways: a deterministic skeleton of a learning curve, and a Gaussian process centered on its deterministic counterpart. We employ several learning algorithms for modeling the primary endpoint and distinct measures of trial efficacy. We find that performance may vary with sample size, but borrowing information across sample sizes universally improves the performance of such calculations. The Gaussian process-based learning curve appears more robust and statistically efficient, while computational efficiency is comparable. We recommend anchoring against historical evidence when extrapolating sample sizes whenever such data are available. The methods are illustrated on binary and survival endpoints.
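The paper fits learning-type curves (a deterministic skeleton and a Gaussian process around it) to borrow information across sample sizes. As a much simpler stand-in, the sketch below assumes an inverse power law, error(n) = a * n**(-b), fits it by least squares on the log-log scale, and inverts it to estimate the sample size needed for a target error. The functional form, function names, and numbers are assumptions for illustration only; the paper's own curve model and any asymptote term are not reproduced:

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = a * n**(-b) by ordinary least squares on log(error) vs log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # b is the decay exponent


def required_sample_size(a, b, target_error):
    """Smallest integer n with a * n**(-b) <= target_error (requires b > 0)."""
    n = (a / target_error) ** (1 / b)
    return math.ceil(n - 1e-9)  # tolerance so floating-point noise does not add one


sizes = [100, 200, 400, 800]
errors = [2 * n ** -0.5 for n in sizes]  # synthetic errors that follow the assumed law
a, b = fit_power_law(sizes, errors)
print(round(a, 6), round(b, 6), required_sample_size(a, b, 0.05))  # 2.0 0.5 1600
```

Extrapolating far beyond the observed sizes is exactly where the paper's Gaussian process and historical anchoring matter; a bare power law gives no uncertainty band.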
Affiliation(s)
- Alimu Dayimu
- Cambridge Clinical Trials Unit Cancer Theme, University of Cambridge, Cambridge, UK
- Nikola Simidjievski
- Cambridge Precision Breast Cancer Institute, University of Cambridge, Cambridge, UK
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Nikolaos Demiris
- Department of Statistics, Athens University of Economics and Business, Athens, Greece
- Jean Abraham
- Cambridge Precision Breast Cancer Institute, University of Cambridge, Cambridge, UK
18
Chen R, Yan Q, Tuoheti T, Xu L, Gao Q, Zhang Y, Ren H, Zheng L, Wang F, Liu Y. A prediction model of rubber content in the dried root of Taraxacum kok-saghyz Rodin based on near-infrared spectroscopy. PLANT METHODS 2024; 20:77. [PMID: 38797847 PMCID: PMC11128126 DOI: 10.1186/s13007-024-01183-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 04/12/2024] [Indexed: 05/29/2024]
Abstract
BACKGROUND Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because of its wide range of suitable planting areas, strong adaptability, and suitability for mechanized planting and harvesting. However, current methods for detecting NR content are relatively cumbersome, so a rapid detection model is needed. This study used near-infrared spectroscopy to establish a rapid detection model for NR content in TKS root segments and powder samples. Near-infrared spectra were acquired from the K445 strain at different growth stages within a year and from 129 TKS samples hybridized with dandelion. The rubber content of the roots was measured using the alkaline boiling method. Monte Carlo sampling (MCS) was used to filter abnormal data from the root-segment and powder samples, respectively, and the SPXY algorithm was used to divide the samples into training and validation sets at a 3:1 ratio. The original spectra were preprocessed using moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. The competitive adaptive reweighted sampling (CARS) algorithm and the chemical characteristic bands of NR were used to screen the bands. Partial least squares (PLS), random forest (RF), Light Gradient Boosting Machine (LightGBM), and convolutional neural network (CNN) models were built with the optimal spectral preprocessing for three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands of NR. The model with the best predictive performance in the high rubber content range (rubber content > 15%) was identified. RESULT The optimal rubber content prediction models for TKS root segments and powder samples were MWS-FD CARS-RF and MWS-FD chemical-characteristic-band RF, respectively.
Their $R_P^2$ values were 0.951 and 0.979, their RMSEP values 1.814 and 1.133, and their RPDP values 4.498 and 6.845, respectively. In the high rubber content range, the LightGBM-based model had the best prediction performance, with RMSEP values of 0.752 for root segments and 0.918 for powder samples. CONCLUSIONS Dried TKS root powder samples are more appropriate than segmented samples for constructing a rubber content prediction model, and their predictive capability is superior. In the high rubber content range in particular, the LightGBM model has superior predictive performance, which could offer a theoretical basis for future rapid detection of rubber content in TKS.
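Two of the preprocessing steps named above, SNV and MSC, are simple per-spectrum transforms. A plain-Python sketch follows, assuming each spectrum is a list of absorbance values on a shared wavelength grid. It is illustrative rather than the authors' pipeline, and MWS, FD, CARS, and SPXY are not shown:

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD; some implementations use the population SD
    return [(v - mu) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction of each spectrum against the mean spectrum."""
    ref = [statistics.fmean(col) for col in zip(*spectra)]
    ref_mean = statistics.fmean(ref)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Least-squares fit s = intercept + slope * ref, then undo offset and scaling.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sum(
            (r - ref_mean) ** 2 for r in ref
        )
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


print(snv([1.0, 2.0, 3.0]))
print(msc([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
```

In the toy MSC call the second spectrum is just a scaled copy of the first, so after correction both collapse onto the mean spectrum, which is the scatter effect MSC is meant to remove.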
Affiliation(s)
- Runfeng Chen
- Agricultural College, Xinjiang Agricultural University, Urumqi, 830052, People's Republic of China
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
- Qingqing Yan
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
- National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
- Tuhanguli Tuoheti
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
- National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
- Lin Xu
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China.
- National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China.
- Qiang Gao
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China.
- National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China.
- Yan Zhang
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
- National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
- Hailong Ren
- Crops Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Provincial Key Laboratory of Crop Genetic Improvement, Guangzhou, 510308, People's Republic of China
- Lipeng Zheng
- Agricultural College, Xinjiang Agricultural University, Urumqi, 830052, People's Republic of China
- Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
- Feng Wang
- Beijing Linglong Tyre Company Limited, Beijing, 101102, People's Republic of China
- Ya Liu
- Comprehensive Testing Ground, Xinjiang Academy of Agricultural Sciences, Urumqi, 830052, People's Republic of China
19
Kim SB, Kang JH, Cheon M, Kim DJ, Lee BC. Stacked neural network for predicting polygenic risk score. Sci Rep 2024; 14:11632. [PMID: 38773257 PMCID: PMC11109142 DOI: 10.1038/s41598-024-62513-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/17/2024] [Indexed: 05/23/2024]
Abstract
In recent years, the utility of polygenic risk scores (PRS) for forecasting disease susceptibility from genome-wide association study (GWAS) results has been widely recognised. However, these models are limited by overfitting and by the potential overestimation of effect sizes in correlated variants. To overcome these obstacles, we devised the Stacked Neural Network Polygenic Risk Score (SNPRS). This approach combines the outputs of multiple neural network models, each trained on genetic variants selected at a different p-value threshold. In this way, SNPRS captures a broader array of genetic variants and allows a more nuanced interpretation of their combined effects. We assessed SNPRS on UK Biobank data, focusing on the genetic risks of breast and prostate cancers as well as quantitative traits such as height and BMI, and extended the analysis to the Korea Genome and Epidemiology Study (KoGES) dataset. Our results indicate that SNPRS surpasses traditional PRS models and a single deep neural network in accuracy, highlighting its promise for improving the efficacy and relevance of PRS in genetic studies.
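SNPRS starts from variant sets chosen at several GWAS p-value thresholds. The thresholding step can be illustrated with a plain additive score, PRS = sum of beta_j * dosage_j over variants with p_j <= t, computed per threshold. In this Python sketch the neural networks, the stacking layer, and LD clumping are all omitted, and the names and toy values are hypothetical:

```python
def prs_by_threshold(genotypes, betas, pvalues, thresholds):
    """Map each p-value threshold to per-individual scores sum_j beta_j * dosage_j."""
    scores = {}
    for t in thresholds:
        kept = [j for j, p in enumerate(pvalues) if p <= t]  # variants passing threshold t
        scores[t] = [sum(g[j] * betas[j] for j in kept) for g in genotypes]
    return scores


genotypes = [[0, 1, 2], [2, 0, 1]]  # allele dosages (0/1/2): two individuals, three variants
betas = [0.1, 0.2, -0.3]            # GWAS effect sizes (toy values)
pvalues = [1e-8, 1e-3, 0.2]
for t, s in prs_by_threshold(genotypes, betas, pvalues, [5e-8, 0.05, 1.0]).items():
    print(t, [round(v, 3) for v in s])
```

Each threshold yields a differently composed input; in SNPRS a separate network is fitted per threshold and their outputs are then stacked.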
Affiliation(s)
- Sun Bin Kim
- Genoplan Korea Inc., Seoul, Republic of Korea
20
Reinen JM, Polosecki P, Castro E, Corcoran CM, Cecchi GA, Colibazzi T. Multimodal fusion of brain signals for robust prediction of psychosis transition. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2024; 10:54. [PMID: 38773120 PMCID: PMC11109212 DOI: 10.1038/s41537-024-00464-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/15/2024] [Indexed: 05/23/2024]
Abstract
The prospective study of youths at clinical high risk (CHR) for psychosis, including neuroimaging, can identify neural signatures predictive of psychosis outcomes using algorithms that integrate complex information. Here, to identify risk and psychosis conversion, we implemented multiple kernel learning (MKL), a multimodal machine learning approach allowing patterns from each modality to inform each other. Baseline multimodal scans (n = 74, 11 converters) included structural, resting-state functional imaging, and diffusion-weighted data. Multimodal MKL outperformed unimodal models (AUC = 0.73 vs. 0.66 in predicting conversion). Moreover, patterns learned by MKL were robust to training set variations, suggesting it can identify cross-modality redundancies and synergies to stabilize the predictive pattern. We identified many predictors consistent with the literature, including frontal cortices, cingulate, thalamus, and striatum. This highlights the advantage of methods that leverage the complex pathophysiology of psychosis.
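In MKL, each modality contributes its own kernel (similarity) matrix, and the classifier works with a weighted combination of those kernels. The Python sketch below only builds that combination with fixed, hand-chosen weights, assuming an RBF kernel per modality; real MKL learns the weights jointly with the classifier, which is not shown. Function names and toy feature vectors are hypothetical:

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one modality."""
    return [[math.exp(-gamma * math.dist(a, b) ** 2) for b in X] for a in X]


def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m * K_m; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Three subjects, two toy modalities (e.g. a structural and a functional feature vector).
structural = [(0.0,), (1.0,), (3.0,)]
functional = [(0.2, 0.1), (0.2, 0.1), (0.9, 0.4)]
K = combine_kernels([rbf_kernel(structural, 1.0), rbf_kernel(functional, 1.0)], [0.6, 0.4])
print([[round(v, 3) for v in row] for row in K])
```

Because a convex combination of valid kernels is itself a valid kernel, any kernel classifier (for example an SVM) can consume the fused matrix unchanged.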
Affiliation(s)
- Jenna M Reinen
- IBM T.J. Watson Research Center, Yorktown Heights, NY, USA.
- Eduardo Castro
- IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
- Cheryl M Corcoran
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Tiziano Colibazzi
- Department of Psychiatry, The New York State Psychiatric Institute, Columbia College of Physicians and Surgeons, New York, NY, USA
21
Chao X, Kai Z, Wu H, Wang J, Chen X, Su H, Shang X, Lin R, Huang L, He H, Lang J, Li L. Fragmentomics features of ovarian cancer. Int J Cancer 2024. [PMID: 38769763 DOI: 10.1002/ijc.34981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/14/2024] [Accepted: 04/02/2024] [Indexed: 05/22/2024]
Abstract
Ovarian cancer (OC) is a major cause of cancer mortality in women worldwide. Due to the occult onset of OC, its nonspecific clinical symptoms in the early phase, and a lack of effective early diagnostic tools, most OC patients are diagnosed at an advanced stage. In this study, shallow whole-genome sequencing was utilized to characterize fragmentomics features of circulating tumor DNA (ctDNA) in OC patients. By applying a machine learning model, multiclass fragmentomics data achieved a mean area under the curve (AUC) of 0.97 (95% CI 0.962-0.976) for diagnosing OC. OC scores derived from this model strongly correlated with the disease stage. Further comparative analysis of OC scores illustrated that the fragmentomics-based technology provided additional clinical benefits over the traditional serum biomarkers cancer antigen 125 (CA125) and the Risk of Ovarian Malignancy Algorithm (ROMA) index. In conclusion, fragmentomics features in ctDNA are potential biomarkers for the accurate diagnosis of OC.
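The abstract reports a mean AUC with a 95% CI. The sketch below computes AUC as the Mann-Whitney probability that a randomly chosen cancer case scores above a randomly chosen control, plus a percentile bootstrap interval that resamples cases and controls separately. The abstract does not say how its CI was obtained, so the bootstrap is an assumption, and the names and scores are hypothetical:

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos_scores, k=len(pos_scores)),
            rng.choices(neg_scores, k=len(neg_scores)))
        for _ in range(n_boot)
    )
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


cases = [0.91, 0.84, 0.40, 0.77, 0.66]     # hypothetical model scores for OC patients
controls = [0.30, 0.52, 0.21, 0.45, 0.12]  # hypothetical scores for non-cancer controls
print(auc(cases, controls), bootstrap_auc_ci(cases, controls))
```

Resampling the two groups separately keeps the case/control ratio fixed in every replicate, which is a common choice when the groups were recruited separately.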
Affiliation(s)
- Xiaopei Chao
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
- Zhentian Kai
- Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
- Huanwen Wu
- Department of Pathology, Peking Union Medical College Hospital, Beijing, China
- Jing Wang
- Department of Pathology, Peking Union Medical College Hospital, Beijing, China
- Xiaojing Chen
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
- Haiqi Su
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
- Xiao Shang
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
- Ruijue Lin
- Department of Technology, Zhejiang Topgen Clinical Laboratory Co., LTD., Huzhou, China
- Lisha Huang
- Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
- Hongsheng He
- Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
- Jinghe Lang
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
- Lei Li
- Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China
- Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China
- State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
22
Lee HJ, Schwamm LH, Sansing LH, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records. NPJ Digit Med 2024; 7:130. [PMID: 38760474 PMCID: PMC11101464 DOI: 10.1038/s41746-024-01120-w] [Received: 10/02/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024]
Abstract
Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict a 4-level stroke etiology outcome, adjudicated by agreement between at least 2 board-certified vascular neurologists reviewing the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary text by natural language processing. StrokeClassifier was externally validated on 406 discharge summaries from the MIMIC-III dataset, reviewed by a vascular neurologist to ascertain stroke etiology. Compared with vascular neurologists' diagnoses, StrokeClassifier achieved a mean cross-validated accuracy of 0.74 and a weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, the two metrics ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic that grades the confidence of StrokeClassifier's non-cryptogenic diagnosis by the degree of consensus among the 9 classifiers, and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications, including use as a clinical decision support system.
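The certainty heuristic grades a diagnosis by how strongly the 9 classifiers agree. The sketch below is one plausible reading of that idea, not the published heuristic: it takes the majority vote, reports the fraction of classifiers that agree with it, and maps that fraction to a grade. The etiology labels and the cut-offs (7/9 and 5/9) are hypothetical.

```python
from collections import Counter


def consensus(predictions):
    """Return the majority label and the fraction of classifiers voting for it."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


def certainty_grade(agreement, high=7 / 9, moderate=5 / 9):
    """Map an agreement fraction to a confidence grade (hypothetical thresholds)."""
    if agreement >= high:
        return "high"
    if agreement >= moderate:
        return "moderate"
    return "low"


# Hypothetical outputs of 9 base classifiers for one patient.
votes = ["cardioembolic"] * 7 + ["large-artery"] * 2
label, agreement = consensus(votes)
print(label, round(agreement, 3), certainty_grade(agreement))  # cardioembolic 0.778 high
```

Under this reading, only patients whose grade clears a chosen threshold would be relabeled from cryptogenic to a specific etiology.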
Affiliation(s)
- Ho-Joon Lee
- Department of Genetics and Yale Center for Genome Analysis, Yale School of Medicine, New Haven, CT, USA.
- Lee H Schwamm
- Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Lauren H Sansing
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Hooman Kamel
- Department of Neurology, Weill Cornell Medicine, New York City, NY, USA
- Adam de Havenon
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Ashby C Turner
- Department of Neurology and Comprehensive Stroke Center, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Kevin N Sheth
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA
- Smita Krishnaswamy
- Departments of Genetics and Computer Science, Yale School of Medicine, New Haven, CT, USA
- Cynthia Brandt
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Harlan Krumholz
- Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Richa Sharma
- Department of Neurology, Yale School of Medicine, New Haven, CT, USA.
23
Ebrahimi A, Henriksen MBH, Brasen CL, Hilberg O, Hansen TF, Jensen LH, Peimankar A, Wiil UK. Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study. BMC Med Res Methodol 2024; 24:114. [PMID: 38760718 PMCID: PMC11100078 DOI: 10.1186/s12874-024-02231-4] [Received: 11/25/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024]
Abstract
BACKGROUND Smoking is a critical risk factor responsible for over eight million deaths worldwide each year. Information on smoking habits is essential for advancing research and implementing preventive measures such as screening of high-risk individuals. In most countries, including Denmark, smoking habits are not systematically recorded and are at best documented in unstructured free-text segments of electronic health records (EHRs). Researchers and clinicians must therefore navigate manually through large amounts of unstructured data, which is one of the main reasons smoking habits are rarely integrated into larger studies. Our aim is to develop machine learning models that classify patients' smoking status from their EHRs. METHODS This study proposes an efficient natural language processing (NLP) pipeline capable of classifying patients' smoking status and providing explanations for its decisions. The pipeline comprises four components: (1) preprocessing techniques that address abbreviations, punctuation, and other textual irregularities; (2) four feature extraction techniques (Embedding, BERT, Word2Vec, and Count Vectorizer) used to extract the optimal features; (3) a Stacking-based Ensemble (SE) model and a Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) for identifying smoking status; and (4) local interpretable model-agnostic explanations (LIME) to explain the decisions of the detection models. The EHRs of 23,132 patients with suspected lung cancer were collected from the Region of Southern Denmark during the period 1/1/2009-31/12/2018. A medical professional annotated the data as 'Smoker' or 'Non-Smoker', with further classification into 'Active-Smoker', 'Former-Smoker', and 'Never-Smoker'. The annotated dataset was then used to develop binary and multiclass classification models.
An extensive comparison of detection performance was conducted across various model architectures. RESULTS The results of experimental validation confirm the consistency among the models. For binary classification, the BERT method with the CNN-LSTM architecture outperformed the other models, achieving precision, recall, and F1-scores between 97% and 99% for both Never-Smokers and Active-Smokers. In multiclass classification, the Embedding technique with the CNN-LSTM architecture yielded the most favorable class-specific results, with equal performance measures of 97% for Never-Smoker and measures in the range of 86 to 89% for Active-Smoker and 91-92% for Never-Smoker. CONCLUSION Our proposed NLP pipeline achieved a high level of classification performance. In addition, we presented explanations of the decisions made by the best-performing detection model. Future work will expand the model's capabilities to analyze longer notes and a broader range of categories to maximize its utility in further research and screening applications.
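Of the four feature extraction techniques, Count Vectorizer is the simplest: each note becomes a vector of token counts over a fixed vocabulary. The sketch below is a stdlib re-implementation that mirrors scikit-learn's `CountVectorizer` defaults (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary). The notes are hypothetical English snippets, whereas the study used Danish EHR text.

```python
import re

# Same token pattern as scikit-learn's CountVectorizer default.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase, then keep tokens of two or more word characters."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def transform(docs, vocabulary):
    """Dense count-vector rows; tokens unseen during fitting are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows


notes = ["Patient is a former smoker.", "Never smoker, no tobacco."]
vocab = fit_vocabulary(notes)
print(sorted(vocab, key=vocab.get))
# ['former', 'is', 'never', 'no', 'patient', 'smoker', 'tobacco']
print(transform(notes, vocab))
# [[1, 1, 0, 0, 1, 1, 0], [0, 0, 1, 1, 0, 1, 1]]
```

scikit-learn returns a sparse matrix rather than dense lists; the dense form here is only for readability. Rows like these would then be passed to a downstream classifier such as the stacking ensemble.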
Affiliation(s)
- Ali Ebrahimi
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, 5230, Denmark.
- Claus Lohman Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, 7100, Denmark
- Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Ole Hilberg
- Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Department of Internal Medicine, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, 7100, Denmark
- Torben Frøstrup Hansen
- Department of Oncology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, 7100, Denmark
- Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Lars Henrik Jensen
- Department of Oncology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, 7100, Denmark
- Abdolrahman Peimankar
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, 5230, Denmark
- Uffe Kock Wiil
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, 5230, Denmark
24
Li X, Jones P, Zhao M. Identifying potential (re)hemorrhage among sporadic cerebral cavernous malformations using machine learning. Sci Rep 2024; 14:11022. [PMID: 38745042 PMCID: PMC11094099 DOI: 10.1038/s41598-024-61851-4] [Received: 06/08/2023] [Accepted: 05/10/2024] [Indexed: 05/16/2024]
Abstract
Preventing (re)hemorrhage in patients with sporadic cerebral cavernous malformations (CCM) is the primary aim of CCM management. However, accurately identifying potential (re)hemorrhage among sporadic CCM patients in advance remains a challenge. This study aims to develop machine learning models to detect potential (re)hemorrhage in sporadic CCM patients. The study was based on a dataset of 731 sporadic CCM patients from the open data platform Dryad. Sporadic CCM patients were followed up for 5 years between January 2003 and December 2018. Support vector machine (SVM), stacked generalization, and extreme gradient boosting (XGBoost) were used to construct models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (PR-AUC), and other metrics. A total of 517 patients with sporadic CCM were included (330 female [63.8%]; mean [SD] age at diagnosis, 42.1 [15.5] years). Seventy-six (re)hemorrhages (14.7%) occurred during follow-up. Among the 3 machine learning models, the XGBoost model yielded the highest mean (SD) AUROC (0.87 [0.06]) in cross-validation. The top 4 features of the XGBoost model were ranked with SHAP (SHapley Additive exPlanations). The All-Elements XGBoost model achieved an AUROC of 0.84 and a PR-AUC of 0.49 in the testing set, with a sensitivity of 0.86 and a specificity of 0.76. Importantly, the 4-Elements XGBoost model, developed using the top 4 features, achieved an AUROC of 0.83 and a PR-AUC of 0.40, with a sensitivity of 0.79 and a specificity of 0.72 in the testing set. Both machine learning-based models performed accurately in identifying potential (re)hemorrhages within 5 years in sporadic CCM patients. These models may provide insights for clinical decision-making.
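Sensitivity and specificity depend on a decision threshold, while PR-AUC summarizes how well the positive (re)hemorrhage class is ranked. A minimal stdlib sketch, assuming binary labels and continuous risk scores (all values hypothetical), computes sensitivity and specificity at a threshold and PR-AUC as average precision, the mean precision at the rank of each positive. With an event rate of about 14.7%, a random ranking would be expected to reach a PR-AUC near that prevalence, which helps put the reported 0.40-0.49 in context.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(labels, scores):
    """Step-wise PR-AUC: mean of precision@k over the ranks k of the positives.

    Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(sensitivity_specificity(labels, scores, threshold=0.5))  # both equal 2/3
print(average_precision(labels, scores))  # (1/1 + 2/3 + 3/6) / 3 = 13/18, about 0.722
```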
Affiliation(s)
- Xiaopeng Li
- Department of Neurology, The First Affiliated Hospital of Henan University, Kaifeng, China
- Peng Jones
- Independent Researcher, Xinyang, Henan, China
- Mei Zhao
- Department of Neurology, The First Affiliated Hospital of Nanchang University, No. 17 Yongwai Street, Nanchang, 330006, Jiangxi, China.
25
Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: Leveraging Multiple Reference Panels to Improve TWAS Power by Ensemble Machine Learning. medRxiv 2024:2023.06.20.23291605. [PMID: 37425698 PMCID: PMC10327185 DOI: 10.1101/2023.06.20.23291605] [Indexed: 07/11/2023]
Abstract
Multiple reference panels of a given tissue, or of multiple tissues, often exist, and multiple regression methods can be used to train gene expression imputation models for transcriptome-wide association studies (TWAS). To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we developed a Stacked Regression based TWAS (SR-TWAS) tool that obtains optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies showed that SR-TWAS improved power, owing to increased effective training sample sizes and strength borrowed across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real application studies identified 6 independent significant risk genes for Alzheimer's disease (AD) dementia in supplementary motor area tissue and 9 independent significant risk genes for Parkinson's disease (PD) in substantia nigra tissue. Relevant biological interpretations were found for these significant risk genes.
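SR-TWAS combines base models linearly using a validation dataset. One common way to fit such stacked-regression weights is non-negative least squares (NNLS); whether SR-TWAS uses exactly this objective is an assumption here. The stdlib sketch below solves NNLS by projected gradient descent, where each column holds one base model's predicted expression for the validation samples (toy numbers).

```python
def nnls_stack(columns, y, n_iter=5000):
    """Find weights w >= 0 minimising 0.5 * ||X w - y||^2 by projected gradient descent.

    columns[k][i] is base model k's prediction for validation sample i.
    """
    n, m = len(y), len(columns)
    # 1 / trace(X^T X) is a safe step size because trace >= largest eigenvalue.
    step = 1.0 / sum(v * v for col in columns for v in col)
    w = [0.0] * m
    for _ in range(n_iter):
        residual = [sum(w[k] * columns[k][i] for k in range(m)) - y[i] for i in range(n)]
        grad = [sum(columns[k][i] * residual[i] for i in range(n)) for k in range(m)]
        # Gradient step, then project back onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(m)]
    return w


base_a = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical base-model predictions
base_b = [2.0, 1.0, 0.0, 1.0, 2.0]
observed = [0.7 * a + 0.3 * b for a, b in zip(base_a, base_b)]
print([round(w, 4) for w in nnls_stack([base_a, base_b], observed)])  # [0.7, 0.3]
```

The non-negativity constraint means a base model that only hurts the fit on the validation data receives weight zero rather than a negative weight.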
26
Zhang X, Chen S, Zhang P, Wang C, Wang Q, Zhou X. Staging of Liver Fibrosis Based on Energy Valley Optimization Multiple Stacking (EVO-MS) Model. Bioengineering (Basel) 2024; 11:485. [PMID: 38790352 PMCID: PMC11117710 DOI: 10.3390/bioengineering11050485] [Received: 04/23/2024] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024]
Abstract
Currently, staging the degree of liver fibrosis relies predominantly on liver biopsy, a method with potential risks such as bleeding and infection. With the rapid development of medical imaging devices, quantifying liver fibrosis through image processing has become feasible. Stacking is an effective ensemble technique for this purpose, but manually tuning it to find the optimal configuration is challenging. Therefore, this paper proposes the EVO-MS model, a multiple stacking ensemble learning model optimized by the energy valley optimization (EVO) algorithm, which selects the most informative features for fibrosis quantification. Liver contours are profiled from 415 biopsy-proven CT cases, from which 10 shape features are calculated and input into a Support Vector Machine (SVM) classifier to generate predictions. The EVO algorithm is then applied to find the optimal parameter combination for fusing six base models into a well-performing ensemble: K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), Extreme Gradient Boosting (XGB), Gradient Boosting Decision Tree (GBDT), and Random Forest (RF). Experimental results indicate that selecting 3-5 feature parameters yields satisfactory classification results, with features such as contour roundness non-uniformity (Rmax), maximum peak height of contour (Rp), and maximum valley depth of contour (Rm) significantly influencing classification accuracy. The improved EVO algorithm combined with the multiple stacking model achieves an accuracy of 0.864, a precision of 0.813, a sensitivity of 0.912, a specificity of 0.824, and an F1-score of 0.860, demonstrating the effectiveness of the EVO-MS model in staging the degree of liver fibrosis.
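The reported F1-score can be checked against the reported precision and sensitivity, since F1 is their harmonic mean. The short check below uses only the numbers stated in the abstract; it does not reconstruct the EVO-MS model, and accuracy and specificity cannot be checked this way without the class counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(precision=0.813, recall=0.912)
print(round(f1, 3))  # 0.86, consistent with the reported F1-score of 0.860
```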
Affiliation(s)
- Xuejun Zhang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
- Shengxiang Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Pengfei Zhang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Chun Wang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Qibo Wang
- School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Xiangrong Zhou
- Department of Electrical, Electronic and Computer Engineering, Gifu University, Gifu 501-1193, Japan
27
Shen J, Wang S, Sun H, Huang J, Bai L, Wang X, Dong Y, Tang Z. A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data. BMC Med Res Methodol 2024; 24:105. [PMID: 38702624 PMCID: PMC11067084 DOI: 10.1186/s12874-024-02232-3] [Received: 08/21/2023] [Accepted: 04/23/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND Survival prediction using high-dimensional molecular data is an active topic in genomics and precision medicine, especially in cancer studies. Because carcinogenesis follows a pathway-based pathogenesis, models that use such group structures more closely mimic disease progression and prognosis. Many approaches can integrate group information; however, most are single-model methods, which may produce unstable predictions. METHODS We introduced a novel survival stacking method that uses group structure information to improve the robustness of cancer survival prediction with high-dimensional omics data. Using a super learner, survival stacking combines the predictions of multiple sub-models, each trained independently on the features of a pre-grouped biological pathway. In addition to a non-negative linear combination of sub-models, we extended the super learner to a non-negative Bayesian hierarchical generalized linear model and an artificial neural network (ANN). We compared the proposed modeling strategy with the widely used penalized survival method Lasso Cox and several group-penalized methods, e.g., group Lasso Cox, in a simulation study and a real-world data application. RESULTS The proposed survival stacking method showed superior and robust discrimination compared with single-model methods on high-noise simulated data and real-world data. The non-negative Bayesian stacking method can identify important biological signaling pathways and genes associated with cancer prognosis. CONCLUSIONS This study proposed a novel survival stacking strategy that incorporates biological group information into cancer prognosis models. Additionally, this study extended the super learner to a non-negative Bayesian model and an ANN, enriching the ways sub-models can be combined.
The proposed Bayesian stacking strategy exhibited favorable properties in the prediction and interpretation of complex survival data, which may aid in discovering cancer targets.
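The survival stacking comparisons are framed in terms of discrimination. A common discrimination measure for survival models is Harrell's concordance index (C-index); the abstract does not name its metric, so using the C-index here is an assumption. The stdlib sketch below treats a pair as comparable when the subject with the shorter follow-up had an event, and counts it concordant when that subject also has the higher predicted risk (ties in risk count 1/2; pairs with tied times are skipped for simplicity). The data and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an
    event; it is concordant when risk_scores[i] > risk_scores[j].
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators, and stacked risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.5, 0.7, 0.1]
print(harrell_c_index(times, events, risk))  # 0.8 (4 of 5 comparable pairs)
```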
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, 79085, Freiburg, Germany
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Jie Huang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Lu Bai
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China.
28
Pratyush P, Bahmani S, Pokharel S, Ismail HD, KC DB. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model. Bioinformatics 2024; 40:btae290. [PMID: 38662579 PMCID: PMC11088740 DOI: 10.1093/bioinformatics/btae290] [Received: 11/14/2023] [Revised: 02/13/2024] [Accepted: 04/24/2024] [Indexed: 05/13/2024]
Abstract
MOTIVATION Recent advances in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies for encoding the site-of-interest with pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely unexplored. RESULTS Herein, we explore a range of approaches for using pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence) and assessing the implications of using the per-residue embedding of the site-of-interest as well as the embeddings of window residues centered around it. Building on these insights, we developed a novel residual ConvBiLSTM network that processes window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM from full-length input sequences. This model, termed T5ResConvBiLSTM, outperforms existing state-of-the-art Kcr predictors across three diverse datasets. To validate our approach of using full sequence-based window-level embeddings, we also examined the interpretability of ProtT5-derived embedding tensors in two ways: first, by scrutinizing the attention weights obtained from the transformer's encoder block; and second, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhanced the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from a supervised embedding layer, through an intermediate-fusion stacked generalization approach using an n-mer window sequence (i.e., a peptide/fragment). The resulting stacked model, dubbed LMCrot, shows a more pronounced improvement in predictive performance across the tested datasets.
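Window-level encoding needs, for each candidate lysine (K), the residues (or per-residue embedding rows) centred on it, padded where the window runs past a terminus. The sketch below shows that indexing on a toy sequence. The window half-width and padding token are hypothetical parameters, and in LMCrot the per-residue items would be ProtT5 embeddings computed from the full-length sequence rather than letters.

```python
def site_windows(sequence, per_residue, residue="K", half_width=15, pad=None):
    """Yield (position, window) for every occurrence of `residue` in `sequence`.

    `per_residue` is aligned with `sequence` (letters or embedding vectors);
    each window holds 2 * half_width + 1 items, with `pad` beyond either end.
    """
    if len(per_residue) != len(sequence):
        raise ValueError("per_residue must have one item per residue")
    for center, aa in enumerate(sequence):
        if aa != residue:
            continue
        window = [
            per_residue[i] if 0 <= i < len(sequence) else pad
            for i in range(center - half_width, center + half_width + 1)
        ]
        yield center, window


seq = "MKTAYK"
for pos, win in site_windows(seq, list(seq), half_width=2, pad="-"):
    print(pos, "".join(win))
# 1 -MKTA
# 5 AYK--
```

Passing a list of embedding vectors as `per_residue` (with a zero vector as `pad`) yields the corresponding window-level embedding instead of a peptide string.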
AVAILABILITY AND IMPLEMENTATION LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.
Affiliation(s)
- Pawel Pratyush
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Soufia Bahmani
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Suresh Pokharel
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Hamid D Ismail
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Dukka B KC
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
29
Cao D, Hu M, Zhi D, Liang J, Tan Q, Lei Q, Li M, Cheng H, Wang L, Dai W. Systematic evaluation of machine learning-enhanced trifocal IOL power selection for axial myopia cataract patients. Comput Biol Med 2024; 173:108245. [PMID: 38531253 DOI: 10.1016/j.compbiomed.2024.108245] [Received: 10/15/2023] [Revised: 03/03/2024] [Accepted: 03/04/2024] [Indexed: 03/28/2024]
Abstract
PURPOSE This study aimed to evaluate and optimize intraocular lens (IOL) power selection for cataract patients with high axial myopia receiving trifocal IOLs. DESIGN A multi-center, retrospective observational case series was conducted. Patients with an axial length ≥26 mm who underwent cataract surgery with a trifocal IOL implanted were studied. METHODS Preoperative biometric and postoperative outcome data from 139 eyes were collected to train and test various machine learning (ML) models (support vector machine, linear regression, and stacking regressor) using five-fold cross-validation. The models' performance was further validated externally using data from 48 eyes enrolled at other hospitals. The performance of seven IOL calculation formulas (BUII, Kane, EVO, K6, DGS, Holladay I, and SRK/T) was examined with and without ML models. RESULTS Cross-validation revealed improvements across all IOL calculation formulas, especially for K6 and Holladay I. The model increased the percentage of eyes with a prediction error (PE) within ±0.50 D from 71.94% to 79.14% for K6, and from 35.25% to 51.80% for Holladay I. In external validation involving 48 patients from other centers, six of the seven formulas showed a reduction in the mean absolute error (MAE). The percentage of eyes with PE within ±0.50 D improved from 62.50% to 77.08% for K6, and from 16.67% to 58.33% for Holladay I. CONCLUSIONS In this study, we conducted a comprehensive evaluation of seven IOL power calculation formulas in high axial myopia and explored the effectiveness of the Stacking Regressor model in improving their accuracy. K6 and Holladay I showed the most significant improvements, suggesting that integrating ML may vary in effectiveness across formulas but holds substantial promise for improving the predictability of IOL power calculations in patients with long eyes.
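The PE-based outcomes above reduce to simple arithmetic. A minimal sketch, assuming the common convention PE = actual postoperative refraction minus predicted refraction (in diopters), computes the MAE and the percentage of eyes with |PE| ≤ 0.50 D. The refraction values are hypothetical.

```python
def pe_summary(predicted, actual, tolerance=0.50):
    """Return (MAE in D, % of eyes with |PE| <= tolerance), with PE = actual - predicted."""
    errors = [a - p for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    within = 100.0 * sum(1 for e in errors if abs(e) <= tolerance) / len(errors)
    return mae, within


predicted = [-0.50, -0.25, 0.00, -1.00]  # hypothetical formula predictions (D)
actual = [-0.25, -0.75, 0.75, -1.00]     # hypothetical postoperative refractions (D)
print(pe_summary(predicted, actual))  # (0.375, 75.0)
```

Comparing this summary for a formula's raw predictions and for its ML-adjusted predictions on the same eyes gives before/after percentages of the kind reported above.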
Affiliation(s)
- Danmin Cao
- Aier Institute of Digital Ophthalmology & Visual Science, Changsha Aier Eye Hospital, Changsha, China; Department of Ophthalmology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China; Aier Eye Hospital of Wuhan University, Wuhan, China
- Min Hu
- Aier Institute of Digital Ophthalmology & Visual Science, Changsha Aier Eye Hospital, Changsha, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Danlin Zhi
- The First Affiliated Hospital of University of South China, Hengyang, China
- Jianheng Liang
- Guangzhou Aier Eye Hospital, Jinan University, Guangzhou, China
- Qian Tan
- Aier Institute of Digital Ophthalmology & Visual Science, Changsha Aier Eye Hospital, Changsha, China
- Qiong Lei
- Aier Eye Hospital of Wuhan University, Wuhan, China
- Maoyan Li
- Aier Institute of Digital Ophthalmology & Visual Science, Changsha Aier Eye Hospital, Changsha, China
- Hao Cheng
- Department of Ophthalmology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
- Li Wang
- Cullen Eye Institute, Department of Ophthalmology, Baylor College of Medicine, Houston, TX, USA
- Weiwei Dai
- Aier Institute of Digital Ophthalmology & Visual Science, Changsha Aier Eye Hospital, Changsha, China.
30
Shen Z, Zhong Y, Wang Y, Zhu H, Liu R, Yu S, Zhang H, Wang M, Yang T, Zhang M. A computational approach to estimate postmortem interval using postmortem computed tomography of multiple tissues based on animal experiments. Int J Legal Med 2024; 138:1093-1107. [PMID: 37999765 DOI: 10.1007/s00414-023-03127-6] [Received: 05/05/2023] [Accepted: 10/27/2023] [Indexed: 11/25/2023]
Abstract
The estimation of postmortem interval (PMI) is a complex and challenging problem in forensic medicine. In recent years, many studies have begun to use machine learning methods to estimate PMI. However, research combining postmortem computed tomography (PMCT) with machine learning models for PMI estimation is still at an early stage. This study aims to establish a multi-tissue machine learning model for PMI estimation using PMCT data from various tissues. We collected PMCT data of seven tissues (brain, eyeballs, myocardium, liver, kidneys, erector spinae, and quadriceps femoris) from 10 rabbits after death. CT images were taken every 12 h until 192 h after death, and Hounsfield unit (HU) values were extracted from the CT images of each tissue as a dataset. Support vector machine, random forest, and K-nearest neighbors models were built to estimate PMI; after their parameters were tuned, they served as first-level classifiers in a stacking model to further improve PMI estimation accuracy. The accuracy and generalized area under the receiver operating characteristic curve of the multi-tissue stacking model reached 93% and 0.96, respectively. The results indicate that PMCT can capture postmortem changes in the density of different tissues, and the stacking model demonstrated strong predictive and generalization abilities. This approach provides new methods and ideas for research on PMI estimation.
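The two-level design described above can be sketched as follows: first-level classifiers produce out-of-fold predictions, so every training sample is predicted by models that never saw it, and a second-level (meta) model learns from those predictions. This stdlib sketch substitutes a k-nearest-neighbours vote and a nearest-centroid rule for the study's SVM, random forest, and KNN, and uses a simple lookup-table meta-learner; the two-tissue HU features and PMI classes are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train, x, k=3):
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, x):
    """Label whose mean feature vector is closest to x."""
    groups = defaultdict(list)
    for feats, label in train:
        groups[label].append(feats)
    centroids = {lbl: [sum(col) / len(rows) for col in zip(*rows)]
                 for lbl, rows in groups.items()}
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], x))


BASE_MODELS = [knn_predict, centroid_predict]  # stand-ins for SVM / RF / KNN


def out_of_fold_features(data, n_folds=5):
    """Level-1 training rows: (tuple of base predictions, true label)."""
    rows = [None] * len(data)
    for fold in range(n_folds):
        train = [d for j, d in enumerate(data) if j % n_folds != fold]
        for j, (feats, label) in enumerate(data):
            if j % n_folds == fold:
                rows[j] = (tuple(m(train, feats) for m in BASE_MODELS), label)
    return rows


def fit_meta(rows):
    """Meta-learner: most frequent true label for each combination of base outputs."""
    table = defaultdict(Counter)
    for preds, label in rows:
        table[preds][label] += 1
    return {preds: counts.most_common(1)[0][0] for preds, counts in table.items()}


def stack_predict(data, meta_table, x):
    """Refit base models on all data, then let the meta-learner decide."""
    preds = tuple(m(data, x) for m in BASE_MODELS)
    return meta_table.get(preds, preds[0])  # fall back to the first base model


# Hypothetical (tissue A HU, tissue B HU) features with coarse PMI classes.
data = [((30.0, 60.0), "0-48h"), ((31.0, 58.0), "0-48h"), ((29.5, 61.0), "0-48h"),
        ((30.5, 59.0), "0-48h"), ((32.0, 62.0), "0-48h"),
        ((20.0, 40.0), ">48h"), ((21.0, 41.5), ">48h"), ((19.5, 39.0), ">48h"),
        ((22.0, 42.0), ">48h"), ((20.5, 38.5), ">48h")]
meta = fit_meta(out_of_fold_features(data))
print(stack_predict(data, meta, (30.2, 59.5)))  # 0-48h
print(stack_predict(data, meta, (20.8, 40.2)))  # >48h
```

Training the meta-learner on out-of-fold rather than in-sample predictions is what keeps it from simply trusting whichever base model overfits most.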
Affiliation(s)
- Zefang Shen
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Yue Zhong
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Yucong Wang
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Haibiao Zhu
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Ran Liu
- Forensic Science Center of Beijing Huatong Junjian Science and Technology Company Limited, Beijing, 100016, China
- Shengnan Yu
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Haidong Zhang
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Min Wang
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Tiantong Yang
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China
- Mengzhou Zhang
- Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, No. 25 Xitucheng Road, Haidian District, Beijing, 100088, China

31
Lee Y, Park J, Lee CO. Parareal Neural Networks Emulating a Parallel-in-Time Algorithm. IEEE Trans Neural Netw Learn Syst 2024; 35:6353-6364. [PMID: 36173779 DOI: 10.1109/tnnls.2022.3206797] [Indexed: 06/16/2023]
Abstract
As deep neural networks (DNNs) become deeper, their training time increases. Multi-GPU parallel computing has therefore become a key tool for accelerating DNN training. In this article, we introduce a novel methodology for constructing, from a given DNN, a parallel neural network that can use multiple GPUs simultaneously. We observe that the layers of a DNN can be interpreted as the time steps of a time-dependent problem. They can therefore be parallelized by emulating a parallel-in-time algorithm called parareal. The parareal algorithm consists of fine structures, which can be run in parallel, and a coarse structure that provides suitable approximations to the fine structures. By emulating it, the layers of the DNN are split into a parallel structure whose parts are connected by a suitable coarse network. We report accelerated, accuracy-preserving results when the proposed methodology is applied to VGG-16 and ResNet-1001 on several datasets.
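The parareal iteration that the network emulates can be shown on a scalar test problem. The sketch below is the classical numerical algorithm, not the authors' neural network. It assumes:
- the ODE y' = λy;
- a single forward-Euler step as the coarse propagator G;
- many small Euler steps as the fine propagator F.

The fine solves inside one iteration are independent of each other, which is the parallelism the paper maps onto GPUs. Only the cheap coarse correction is sequential.

```python
def coarse(y, dt, lam):
    """Coarse propagator G: one explicit Euler step for y' = lam * y (cheap, inaccurate)."""
    return y + dt * lam * y


def fine(y, dt, lam, substeps=100):
    """Fine propagator F: many small explicit Euler steps (expensive, accurate)."""
    h = dt / substeps
    for _ in range(substeps):
        y = y + h * lam * y
    return y


def parareal(y0, t_end, n_slices, n_iter, lam):
    """Return the solution at the n_slices + 1 slice boundaries after n_iter iterations."""
    dt = t_end / n_slices
    u = [y0]
    for n in range(n_slices):  # initial guess: one sequential coarse sweep
        u.append(coarse(u[n], dt, lam))
    for _ in range(n_iter):
        # Independent across slices, so these fine solves could run in parallel.
        f_old = [fine(u[n], dt, lam) for n in range(n_slices)]
        g_old = [coarse(u[n], dt, lam) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):
            # Sequential correction: U[n+1] <- G(U_new[n]) + F(U_old[n]) - G(U_old[n])
            new.append(coarse(new[n], dt, lam) + f_old[n] - g_old[n])
        u = new
    return u


def sequential_fine(y0, t_end, n_slices, lam):
    """Reference solution: the fine propagator applied slice by slice."""
    dt = t_end / n_slices
    u = [y0]
    for n in range(n_slices):
        u.append(fine(u[n], dt, lam))
    return u
```

After k iterations, the first k slice values coincide with the sequential fine solution. With as many iterations as slices, parareal therefore reproduces that solution (up to rounding). Any speedup comes from stopping after far fewer iterations.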

32
Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024; 56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND Metabolic disturbances adversely affect the productive and reproductive performance of dairy cattle through changes in endocrine status and immune function, which increase the risk of disease. They may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improving health and resilience in dairy cattle. Genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels of metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENET). It was also evaluated using the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), which combines the ENET and GBM approaches. RESULTS The Stack approach outperformed the other methods. Its relative difference (RD), calculated as the increment in prediction accuracy, was approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RDs in prediction accuracy relative to GBLUP were observed for the following traits:
- haptoglobin (hapto): from 17.7% for BayesB to 41.2% for Stack;
- Zn: from 9.8% (BayesB) to 29.3% (Stack);
- ceruloplasmin (CuCp): from 9.3% (BayesB) to 27.9% (Stack);
- ferric reducing antioxidant power (FRAP): from 8.0% (BayesB) to 40.0% (Stack);
- total protein (PROTt): from 5.7% (BayesB) to 22.9% (Stack).

Using a subset of top SNPs (1.5k) selected with the GBM approach improved the accuracy of GBLUP by 1.8 to 76.5%. However, for the other models, reductions in prediction accuracy were observed: 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits).
CONCLUSIONS Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM. It appears competitive for predicting complex phenotypes with different modes of inheritance, i.e., additive and non-additive effects. Selecting markers with GBM improved the accuracy of GBLUP.
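The relative difference (RD) used above is a simple percentage change. The sketch below assumes RD = 100 × (accuracy of model − accuracy of reference) / accuracy of reference. This matches the description of RD as an increment in prediction accuracy, but the abstract does not give the formula. The accuracy values in the example are hypothetical; in genomic prediction, accuracy is often a correlation between predicted and observed values.

```python
def relative_difference(acc_model, acc_reference):
    """Relative difference (RD, %) of a model's prediction accuracy versus a reference
    model such as GBLUP. Assumed definition: 100 * (acc_model - acc_reference) / acc_reference."""
    if acc_reference == 0:
        raise ValueError("reference accuracy must be non-zero")
    return 100.0 * (acc_model - acc_reference) / acc_reference


if __name__ == "__main__":
    # Hypothetical accuracies: 0.295 for Stack versus 0.250 for GBLUP.
    print(round(relative_difference(0.295, 0.250), 1))  # 18.0
```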
Affiliation(s)
- Lucio F M Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Diana Giannuzzi
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Sara Pegolo
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Enrico Sturaro
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Daniel Gianola
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Riccardo Negrini
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erminio Trevisi
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Paolo Ajmone Marsan
- Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Alessio Cecchinato
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy

33
Zhang Z, Zhou L, Wu Y, Wang N. The meta-learning method for the ensemble model based on situational meta-task. Front Neurorobot 2024; 18:1391247. [PMID: 38736985 PMCID: PMC11082275 DOI: 10.3389/fnbot.2024.1391247] [Received: 02/25/2024] [Accepted: 04/04/2024] [Indexed: 05/14/2024]
Abstract
Introduction Meta-learning methods have been widely used to solve the problem of few-shot learning. Generally, meta-learners are trained on a variety of tasks and then generalized to novel tasks. Methods However, existing meta-learning methods do not consider the relationship between meta-tasks and novel tasks during meta-training. As a result, the initial models of the meta-learner provide less useful meta-knowledge for the novel tasks, which weakens generalization on those tasks. Meanwhile, different initial models contain different meta-knowledge, which leads to differences in how well novel tasks are learned during meta-testing. Therefore, this article puts forward a meta-optimization method based on situational meta-task construction and the cooperation of multiple initial models. First, for meta-training, a method for constructing situational meta-tasks is proposed, and the selected candidate task sets provide more effective meta-knowledge for novel tasks. Then, for meta-testing, an ensemble model method based on meta-optimization is proposed to minimize the loss of inter-model cooperation in prediction, so that multiple cooperating models can learn novel tasks. Results The methods were applied to popular few-shot character datasets and image recognition datasets. The experimental results indicate that the proposed method performs well in few-shot classification tasks. Discussion In future work, we will extend our methods to provide more generalized and useful meta-knowledge to the model during meta-training when the novel few-shot tasks are completely unseen.
Affiliation(s)
- Zhengchao Zhang
- College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
- Modeling and Emulation in E-Government National Engineering Laboratory, Harbin Engineering University, Harbin, Heilongjiang, China
- Lianke Zhou
- College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
- Modeling and Emulation in E-Government National Engineering Laboratory, Harbin Engineering University, Harbin, Heilongjiang, China
- Yuyang Wu
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong, China
- Nianbin Wang
- College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, China
- Modeling and Emulation in E-Government National Engineering Laboratory, Harbin Engineering University, Harbin, Heilongjiang, China

34
Tran XTD, Phan TL, To VT, Tran NVN, Nguyen NNS, Nguyen DNH, Tran NTN, Truong TN. Integration of the Butina algorithm and ensemble learning strategies for the advancement of a pharmacophore ligand-based model: an in silico investigation of apelin agonists. Front Chem 2024; 12:1382319. [PMID: 38690013 PMCID: PMC11058650 DOI: 10.3389/fchem.2024.1382319] [Received: 02/05/2024] [Accepted: 03/18/2024] [Indexed: 05/02/2024]
Abstract
Introduction: 3D pharmacophore models describe a ligand's chemical interactions in its bioactive conformation. They offer a simple but sophisticated way to decipher chemically encoded ligand information, making them a valuable tool in drug design. Methods: Our research summarized the key studies applying 3D pharmacophore models to the virtual screening of 6,944 APJ receptor agonist compounds. Recent advances in clustering algorithms and ensemble methods have enabled classical pharmacophore modeling to evolve into more flexible, knowledge-driven techniques. Butina clustering groups molecules by structural similarity (measured by the Tanimoto coefficient) to create a structurally diverse training dataset. The ensemble learning method combines individual pharmacophore models into a set of pharmacophore models to optimize the pharmacophore space used in virtual screening. Results: This approach was evaluated on Apelin datasets and showed good screening performance:
- area under the receiver operating characteristic curve (AUC): 0.994 ± 0.007;
- enrichment factor at 1% (EF1%): 50.07 ± 0.211;
- Güner-Henry (GH) score: 0.956 ± 0.015;
- F-measure: 0.911 ± 0.031.

Discussion: One of the high-scoring individual models achieved statistically superior results in each dataset (AUC of 0.82; EF1% of 19.466; GH of 0.131; F1-score of 0.071). Even so, the ensemble learning method, including voting and stacking, balanced the shortcomings of the individual models and achieved close performance measures.
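The techniques named above can be sketched briefly:
- Butina (sphere-exclusion) clustering on Tanimoto distances;
- the standard enrichment-factor formula;
- the standard Güner-Henry formula used to report screening performance.

Fingerprints are assumed to be Python sets of on-bit indices, and the distance cutoff is an illustrative parameter. Cheminformatics toolkits such as RDKit ship their own fingerprint and Butina implementations; this is not the authors' code. In the formulas:
- Ha is the number of actives in the hit list;
- Ht is the hit-list size;
- A is the number of actives in the database;
- D is the database size.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina(fps, cutoff):
    """Butina clustering. `cutoff` is a Tanimoto distance (1 - similarity).
    Returns clusters as tuples of indices with the centroid first."""
    n = len(fps)
    neighbours = [
        [j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff]
        for i in range(n)
    ]
    # Molecules with the most neighbours become centroids first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbours[i] if j not in assigned]
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


def enrichment_factor(ha, ht, a, d):
    """EF = (Ha / Ht) / (A / D); for EF1%, Ht is the top 1% of the ranked database."""
    return (ha / ht) / (a / d)


def gh_score(ha, ht, a, d):
    """Güner-Henry score: [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    return (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a))
```

Picking one centroid per cluster (the first index of each tuple) is one simple way to assemble a structurally diverse training set from the clusters.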
Affiliation(s)
- Xuan-Truc Dinh Tran
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Tieu-Long Phan
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Van-Thinh To
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Nhu-Ngoc Song Nguyen
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Dong-Nghi Hoang Nguyen
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Ngoc-Tam Nguyen Tran
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Tuyen Ngoc Truong
- Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam

35
Li C, Wang L, Sun D, Chen Y. An ensemble framework-based approach for modeling stability of expansive soil slopes: fusion of machine learning algorithms and protection structure disease data. Environ Sci Pollut Res Int 2024; 31:24375-24397. [PMID: 38441739 DOI: 10.1007/s11356-024-32583-9] [Received: 10/31/2023] [Accepted: 02/18/2024] [Indexed: 04/07/2024]
Abstract
Slope failures have catastrophic consequences in many countries, so accurate evaluation of slope stability is critical for geological disaster prevention and control. In this study, the types and characteristics of slope protection structure diseases were determined through a field investigation in an expansive soil area. This information was incorporated into numerical simulations used to develop slope stability prediction models. Four base machine learning (ML) methods were used to capture the relationship between protection structure diseases and the factor of safety (FOS). These four ML models were then combined by stacked generalization (SG), and the final SG model was used to predict the FOS. The results show that ML methods can effectively use this information and achieve excellent predictions. The proposed SG model predicts FOS with higher accuracy and robustness than the other ML methods. With FOS as the regression target, the main feature contributions are slope height (37.05%) > slip distance of the retaining wall (25.43%) > expansive force (18.03%) > slope gradient (12.00%). The proposed model also captures the coupling among features. We conclude that the SG method is particularly suitable for slope stability modeling with small samples. In addition, the SG-based model effectively captures the impact of protection structure diseases on slope stability, enhances the interpretability of the ML model, and provides a reference for the maintenance and repair of protection structures.
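Stacked generalization for a continuous target such as FOS follows the same pattern as stacking for classification, but with a regression meta-learner. The sketch below is a minimal illustration, not the paper's model:
- Two base learners (ordinary least squares and k-nearest-neighbour regression) stand in for the four ML methods.
- A least-squares meta-learner is fitted on out-of-fold base predictions.
- All names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, ys, ridge=1e-8):
    """Least squares with an intercept; a tiny ridge term keeps the normal equations solvable."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * y for xi, y in zip(x, ys)) for i in range(p)]
    w = solve(xtx, xty)
    return lambda r: w[0] + sum(wi * v for wi, v in zip(w[1:], r))


def fit_knn(rows, ys, k=2):
    """k-nearest-neighbour regression: mean target of the k closest training rows."""
    def predict(r):
        nearest = sorted((math.dist(r, q), y) for q, y in zip(rows, ys))
        return sum(y for _, y in nearest[:k]) / k
    return predict


BASE_LEARNERS = (fit_ols, fit_knn)


def fit_stacking(rows, ys, n_folds=5):
    """Fit the meta-learner on out-of-fold base predictions, then refit base learners on all data."""
    n = len(rows)
    meta_rows = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        models = [fit([rows[i] for i in train], [ys[i] for i in train]) for fit in BASE_LEARNERS]
        for i in range(fold, n, n_folds):
            meta_rows[i] = [m(rows[i]) for m in models]
    meta = fit_ols(meta_rows, ys)
    full = [fit(rows, ys) for fit in BASE_LEARNERS]
    return lambda r: meta([m(r) for m in full])


if __name__ == "__main__":
    # Hypothetical toy data: two features and a target that is exactly linear in them.
    rows = [(float(i % 5), float(i // 5)) for i in range(20)]
    ys = [2 * a + 3 * b + 1 for a, b in rows]
    model = fit_stacking(rows, ys)
    print(round(model((2.5, 1.5)), 3))  # 10.5
```

Because the least-squares base learner fits this toy target exactly, the meta-learner puts nearly all its weight on it. With noisy data, the weights would typically be spread across the base models.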
Affiliation(s)
- Chao Li
- School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai, 201620, China
- Lei Wang
- School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai, 201620, China
- De'an Sun
- School of Mechanics and Engineering Science, Shanghai University, Shanghai, 200444, China
- Yang Chen
- School of Naval Architecture and Civil Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China

36
Shen C, Zhan W, Xin K, Li M, Sun Z, Cong H, Xu C, Tang J, Wu Z, Xu B, Wei Z, Xue C, Zhao C, Wang Z. Machine-learning-assisted and real-time-feedback-controlled growth of InAs/GaAs quantum dots. Nat Commun 2024; 15:2724. [PMID: 38553435 PMCID: PMC10980817 DOI: 10.1038/s41467-024-47087-w] [Received: 07/06/2023] [Accepted: 03/15/2024] [Indexed: 04/02/2024]
Abstract
The applications of self-assembled InAs/GaAs quantum dots (QDs) in lasers and single-photon sources rely strongly on their density and quality. Establishing the molecular beam epitaxy (MBE) process parameters for a specific QD density is a multidimensional optimization challenge, usually addressed through time-consuming, iterative trial and error. Here, we report a fully automated, intelligent real-time feedback control method for growing QDs with arbitrary density. We developed a machine learning (ML) model, named 3D ResNet 50, that is trained on reflection high-energy electron diffraction (RHEED) videos rather than static images and provides real-time feedback on surface morphology for process control. We demonstrate that ML trained on previous growths can predict the post-growth QD density. Using this feedback, we tuned QD densities in near-real time from 1.5 × 10¹⁰ cm⁻² down to 3.8 × 10⁸ cm⁻² or up to 1.4 × 10¹¹ cm⁻². Compared with traditional methods, our approach can dramatically expedite the optimization process and improve the reproducibility of MBE. The concepts and methodologies demonstrated in this work are promising for a variety of material growth processes and will revolutionize semiconductor manufacturing for the optoelectronic and microelectronic industries.
Affiliation(s)
- Chao Shen
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- School of Physics Science and Technology, Xinjiang University, Urumqi, Xinjiang, 830046, China
- Wenkang Zhan
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Kaiyao Xin
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Manyang Li
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Zhenyu Sun
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Hui Cong
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Chi Xu
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Jian Tang
- School of Physical and Electronic Engineering, Yancheng Teachers University, Yancheng, 224002, China
- Zhaofeng Wu
- School of Physics Science and Technology, Xinjiang University, Urumqi, Xinjiang, 830046, China
- Bo Xu
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Zhongming Wei
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Chunlai Xue
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Chao Zhao
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
- Zhanguo Wang
- Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China

37
Wang J, Wu DD, DeLorenzo C, Yang J. Examining factors related to low performance of predicting remission in participants with major depressive disorder using neuroimaging data and other clinical features. PLoS One 2024; 19:e0299625. [PMID: 38547128 PMCID: PMC10977765 DOI: 10.1371/journal.pone.0299625] [Received: 11/29/2023] [Accepted: 02/13/2024] [Indexed: 04/02/2024]
Abstract
Major depressive disorder (MDD), a prevalent mental health issue, affects more than 8% of the US population and almost 17% of young adults aged 18-25 years. Its prevalence has become even higher since COVID-19. However, remission (being free of depression) rates after first-line antidepressant treatment for MDD are only about 30%. To improve treatment outcomes, researchers have built various models to predict treatment response, yet none has been adopted in clinical use. One reason is that most predictive models are based on data from subjective questionnaires, which are less reliable. Neuroimaging data are promising objective prognostic factors, but they are expensive to obtain. Predictive models using neuroimaging data are therefore limited, and such studies are usually small (N < 100). In this paper, we propose an advanced machine learning (ML) pipeline for small training datasets with a large number of features. We implemented multiple imputation for missing data and repeated K-fold cross-validation (CV) to robustly estimate predictive performance. We examined different feature selection methods and stacking methods built on six general ML models: random forest, gradient boosting decision tree, XGBoost, penalized logistic regression, support vector machine (SVM), and neural network. All predictive models were compared using performance metrics such as accuracy, balanced accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Applied to a training dataset, our ML pipeline obtained an accuracy and AUC above 0.80. However, this high performance did not hold when the pipeline was applied to an external validation dataset from EMBARC, a multi-center study. We further examined the possible reasons, especially site heterogeneity.
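Repeated K-fold cross-validation, as used in the pipeline above, reshuffles the data and reruns K-fold CV several times. This makes the performance estimate depend less on one particular split. The generator below is a standard-library sketch. Stratifying each fold by class is an added assumption, since the abstract says only "repeated K-fold". scikit-learn's `RepeatedStratifiedKFold` provides a maintained implementation of the same idea.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for n_repeats reshuffled, class-stratified K-fold splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)  # a fresh shuffle per repeat gives a different partition
            for j, i in enumerate(idx):
                # Deal each class round-robin so every fold gets a near-equal share of it.
                folds[(offset + j) % n_splits].append(i)
            offset += len(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, test
```

A model's score would then be averaged over all `n_splits * n_repeats` test folds.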
Affiliation(s)
- Junying Wang
- Department of Applied Mathematics and Statistics, Stony Brook University, New York, New York, United States of America
- David D. Wu
- School of Engineering, University of Michigan, Ann Arbor, Michigan, United States of America
- Christine DeLorenzo
- Department of Psychiatry and Behavioral Health, Stony Brook University, Stony Brook, New York, United States of America
- Department of Biomedical Engineering, Stony Brook University, Stony Brook, New York, United States of America
- Jie Yang
- Department of Family, Population & Preventive Medicine, Stony Brook University, Stony Brook, New York, United States of America

38
Bacon KL, Felson DT, Jafarzadeh SR, Kolachalama VB, Hausdorff JM, Gazit E, Stefanik JJ, Corrigan P, Segal NA, Lewis CE, Nevitt MC, Kumar D. Gait Alterations and Association With Worsening Knee Pain and Physical Function: A Machine Learning Approach With Wearable Sensors in the Multicenter Osteoarthritis Study. Arthritis Care Res (Hoboken) 2024. [PMID: 38523250 DOI: 10.1002/acr.25327] [Received: 06/16/2023] [Revised: 01/23/2024] [Accepted: 03/21/2024] [Indexed: 03/26/2024]
Abstract
OBJECTIVE The objective of this study was to identify gait alterations related to worsening knee pain and worsening physical function, using machine learning approaches applied to wearable sensor-derived data from a large observational cohort. METHODS Participants in the Multicenter Osteoarthritis Study (MOST) completed a 20-m walk test wearing inertial sensors on their lower back and ankles. Parameters describing spatiotemporal features of gait were extracted from these data. We used an ensemble machine learning technique ("super learning") to optimally discriminate between those with and without worsening physical function and, separately, those with and without worsening pain over two years. We then used log-binomial regression to evaluate associations of the top 10 influential variables selected with super learning with each outcome. We also assessed whether the relation of altered gait with worsening function was mediated by changes in pain. RESULTS Of 2,324 participants, 29% and 24% had worsening knee pain and function over two years, respectively. From the super learner, several gait parameters were found to be influential for worsening pain and for worsening function. After adjusting for confounders, greater gait asymmetry, longer average step length, and lower dominant frequency were associated with worsening pain, and lower cadence was associated with worsening function. Worsening pain partially mediated the association of cadence with function. CONCLUSION We identified gait alterations associated with worsening knee pain and those associated with worsening physical function. These alterations could be assessed with wearable sensors in clinical settings. Further research should determine whether they might be therapeutic targets to prevent worsening pain and worsening function.
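Log-binomial regression models the log of the outcome probability, so its exponentiated coefficients are risk ratios. With a single binary exposure and no covariates, the fitted risk ratio equals the crude risk ratio from the 2×2 table. The sketch below computes that ratio together with the usual log-scale 95% Wald interval. The counts are hypothetical. The study's confounder-adjusted models need a full GLM fit with a binomial family and log link, which this sketch does not attempt.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.959963984540054):
    """Crude risk ratio with a 95% Wald confidence interval computed on the log scale.

    For one binary exposure, exp(beta) from an unadjusted log-binomial model equals this RR.
    """
    risk1 = events_exposed / n_exposed
    risk0 = events_unexposed / n_unexposed
    rr = risk1 / risk0
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


if __name__ == "__main__":
    # Hypothetical counts: worsening function in 60/200 low-cadence vs 40/200 other walkers.
    rr, lo, hi = risk_ratio(60, 200, 40, 200)
    print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```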
Affiliation(s)
- Jeffrey M Hausdorff
- Tel Aviv University and Tel Aviv Sourasky Medical Center, Tel Aviv, Israel, and Rush University Medical Center, Chicago, Illinois
- Eran Gazit
- Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Neil A Segal
- University of Kansas Medical Center, Kansas City

39
Shen J, Wang S, Dong Y, Sun H, Wang X, Tang Z. A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data. BMC Bioinformatics 2024; 25:119. [PMID: 38509499 PMCID: PMC10953151 DOI: 10.1186/s12859-024-05741-6] [Received: 07/02/2023] [Accepted: 03/11/2024] [Indexed: 03/22/2024]
Abstract
BACKGROUND High-dimensional omics data are increasingly used in clinical and public health research for disease risk prediction. Many sparse methods have been proposed that use prior knowledge, e.g., biological group structure information, to guide model building. However, these methods are still based on a single model, which often leads to overconfident inferences and poor generalization. RESULTS We propose a novel stacking strategy based on a non-negative spike-and-slab Lasso (nsslasso) generalized linear model (GLM) for disease risk prediction with high-dimensional omics data. Briefly, we used prior biological knowledge to segment the omics data into a set of sub-datasets. Each sub-model was trained separately, with an appropriate base learner, on the features from its group. The predictions of the sub-models were then ensembled by a super learner using the nsslasso GLM. The proposed method was compared with several competitors, such as the Lasso, grlasso, and gsslasso, using simulated data and two open-access breast cancer datasets. The proposed method showed robustly better prediction performance than the optimal single-model method on high-noise simulated data and on real-world data. Furthermore, compared with the traditional stacking method, the proposed nsslasso stacking method can efficiently handle redundant sub-models and identify important ones. CONCLUSIONS The proposed nsslasso method showed favorable predictive accuracy, stability, and biological interpretability. It can also be used to detect new biomarkers and key group structures.
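A key idea in the super learner above is that non-negativity plus sparsity lets the meta-model drop redundant or unhelpful sub-models. The sketch below illustrates that idea with a simpler, clearly different estimator. It fits a non-negative lasso (not a spike-and-slab lasso) by coordinate descent on a squared-error loss, with sub-model predictions as its inputs. The penalty `lam`, the Gaussian loss, and all names are assumptions for illustration. For a binary disease outcome, the paper's GLM would use a logistic link instead.

```python
def nonneg_lasso_stack(preds, y, lam=0.01, n_sweeps=200):
    """Meta-learner over sub-model predictions (one row per sample, one column per sub-model).

    Minimises (1/2n) * ||y - b - P w||^2 + lam * sum(w) subject to w >= 0,
    by cyclic coordinate descent. Returns (intercept, weights).
    """
    n, m = len(preds), len(preds[0])
    # Centre the response and each sub-model's predictions so the intercept drops out.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in preds) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in preds]
    resid = [v - y_mean for v in y]  # residual of the centred model, kept up to date
    sq = [sum(r[j] ** 2 for r in x) for j in range(m)]
    w = [0.0] * m
    for _ in range(n_sweeps):
        for j in range(m):
            if sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual that excludes column j.
            rho = sum(x[i][j] * (resid[i] + w[j] * x[i][j]) for i in range(n))
            # Soft-threshold at n * lam, then clip at zero (non-negativity constraint).
            new = max(0.0, (rho - n * lam) / sq[j])
            if new != w[j]:
                for i in range(n):
                    resid[i] -= (new - w[j]) * x[i][j]
                w[j] = new
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, col_means))
    return intercept, w
```

Suppose a sub-model's predictions move in the wrong direction, or add nothing beyond the others. The non-negativity constraint and the L1 penalty then hold its weight at exactly zero, which is the "identify important sub-models" behaviour described above.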
Collapse
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
| | - Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79085, Freiburg, Germany
| | - Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China.

40
Datta S, Nabeel Asim M, Dengel A, Ahmed S. NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences. Brief Funct Genomics 2024; 23:163-179. [PMID: 37248673 DOI: 10.1093/bfgp/elad018] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 05/02/2023] [Indexed: 05/31/2023]
Abstract
Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade it, which can lead to the failure of intracellular processes. Tyrosine nitration (NT) degrades protein activity, which initiates and propagates various diseases, including neurodegenerative, cardiovascular, and autoimmune diseases and carcinogenesis. Identifying NT modifications supports the development of novel therapies and drug discovery for the associated diseases. Identifying NT modifications in biochemical labs is expensive, time-consuming, and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to identify NT modifications precisely because they extract irrelevant, redundant, and less discriminative features from protein sequences. This paper presents the NTpred framework, which extracts comprehensive features from raw protein sequences using four different sequence encoders. To exploit the strengths of the different encoders, it generates four additional feature spaces by fusing different combinations of the individual encodings. Furthermore, it removes irrelevant and redundant features from the eight feature spaces through Recursive Feature Elimination. The selected features of the four individual encodings and the four fused feature vectors are used to train eight Gradient Boosted Tree classifiers. The probability scores from the trained classifiers form a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross-validation and independent test evaluation, with a combined improvement of 13.7% in MCC and 20.1% in AUC.
Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor, with a combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.
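The final stage described above, in which base-classifier probability scores become the input features of a logistic-regression meta-classifier, can be sketched in plain Python. This is a minimal illustration and not the NTpred code: the probability matrix is hypothetical (two base classifiers instead of eight), and the meta-learner is fitted by simple batch gradient descent on the log loss.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_logistic(prob_features, labels, lr=1.0, epochs=2000):
    """Fit a logistic-regression meta-learner on base-classifier probability scores."""
    n, d = len(prob_features), len(prob_features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(prob_features, labels):
            # Gradient of the mean log loss: (predicted probability - label) * input.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_meta(w, b, x):
    # Meta-level probability that the site is positive.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Each row holds the scores of two hypothetical base classifiers for one candidate site.
train_probs = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.3], [0.1, 0.2]]
train_labels = [1, 1, 0, 0]
w, b = fit_meta_logistic(train_probs, train_labels)
print(predict_meta(w, b, [0.85, 0.75]) > 0.5)  # True
```

In a full stacking pipeline the rows of `train_probs` would be out-of-fold scores, so the meta-learner learns how far to trust each base classifier on data that classifier did not see during training.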
Affiliation(s)
- Sourajyoti Datta
- Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
- Andreas Dengel
- Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
- Sheraz Ahmed
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany

41
Brar AS, Singh K. A multi-objective stacked regression method for distance based colour measuring device. Sci Rep 2024; 14:5530. [PMID: 38448462 PMCID: PMC10918078 DOI: 10.1038/s41598-024-54785-4] [Received: 04/30/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
Identifying colour from a distance is challenging because of the external noise associated with the measurement process. The present study focuses on developing a colour measuring system and a novel Multi-target Regression (MTR) model for accurate colour measurement from a distance. Herein, a novel MTR method, referred to as Multi-Objective Stacked Regression (MOSR), is proposed. MOSR combines stacking, as an ensemble approach, with multi-objective evolutionary learning using NSGA-II. A multi-objective optimization approach selects base learners so as to maximise prediction accuracy while minimising ensemble complexity, and the resulting method is compared with six state-of-the-art methods on the colour dataset. Classification and regression tree (CART), Random Forest (RF), and Support Vector Machine (SVM) were used as regressor algorithms. MOSR outperformed all compared methods, with the highest coefficient of determination values for all three targets of the colour dataset. A rigorous comparison with state-of-the-art methods on 18 benchmark datasets showed that MOSR outperformed the compared methods on 15 datasets when CART was used as the regressor algorithm and on 11 datasets when RF and SVM were used. MOSR was statistically superior to the compared methods and can be used effectively to measure accurate colour values in the distance-based colour measuring device.
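The multi-objective selection of base learners rests on Pareto dominance between candidate ensembles. The sketch below shows only the non-dominated filtering that NSGA-II applies to its population (its first front); it omits crowding distance, selection, crossover, and mutation. The candidate labels and their (error, size) scores are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere.

    All objectives are minimised (here: validation error and ensemble size).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other_name, other in candidates.items() if other_name != name)
    ]


# Hypothetical candidate ensembles scored as (validation error, number of base learners).
candidates = {
    "A": (0.10, 3),
    "B": (0.12, 2),
    "C": (0.15, 1),
    "D": (0.11, 3),  # dominated by A: same size, higher error
    "E": (0.20, 2),  # dominated by B: same size, higher error
}
print(pareto_front(candidates))  # ['A', 'B', 'C']
```

The front returned here is the set of accuracy/complexity trade-offs; a multi-objective method then still needs a rule for picking one ensemble from it.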
Affiliation(s)
- Amrinder Singh Brar
- Department of Computer Science and Engineering, Punjabi University, Patiala, 147002, India.
- Kawaljeet Singh
- University Computer Centre, Punjabi University, Patiala, 147002, India

42
M'hamdi O, Takács S, Palotás G, Ilahy R, Helyes L, Pék Z. A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data. Plants (Basel, Switzerland) 2024; 13:746. [PMID: 38475592 DOI: 10.3390/plants13050746] [Received: 02/16/2024] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024]
Abstract
The tomato is a globally important raw material for processing and is pivotal in dietary and agronomic research because of its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, using data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio), using extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and an RMSE of 0.03. ANN lagged behind, particularly in colour prediction, where it showed a negative R² of -0.35. Shapley additive explanations (SHAP) summary plots indicated that both models are effective in predicting °Brix and lycopene content in tomatoes, while highlighting different aspects of the data. The SHAP analysis also underscored the significant influence of cultivar choice and of environmental factors such as climate and soil. These findings emphasize the importance of selecting and fine-tuning an appropriate ML model for precision agriculture and underline XGBoost's superiority in handling complex agronomic data for quality assessment.
Affiliation(s)
- Oussama M'hamdi
- Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary
- Doctoral School of Plant Science, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary
- Sándor Takács
- Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary
- Gábor Palotás
- Univer Product Zrt, Szolnoki út 35, 6000 Kecskemét, Hungary
- Riadh Ilahy
- Laboratory of Horticulture, National Agricultural Research Institute of Tunisia (INRAT), University of Carthage, Ariana 1004, Tunisia
- Lajos Helyes
- Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary
- Zoltán Pék
- Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllő, Hungary

43
Wang S, Yam C, Chen S, Hu L, Li L, Hung FF, Fan J, Che CM, Chen G. Predictions of photophysical properties of phosphorescent platinum(II) complexes based on ensemble machine learning approach. J Comput Chem 2024; 45:321-330. [PMID: 37861354 DOI: 10.1002/jcc.27238] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/23/2023] [Indexed: 10/21/2023]
Abstract
Cyclometalated Pt(II) complexes are popular phosphorescent emitters with color-tunable emissions. For practical application as emitters in organic light-emitting diodes, Pt(II) complexes with a high radiative decay rate constant and a high photoluminescence (PL) quantum yield need to be developed. Here, a general protocol is developed for accurate prediction of the emission wavelength, radiative decay rate constant, and PL quantum yield by combining a first-principles quantum mechanical method, machine learning, and experimental calibration. A new dataset of phosphorescent Pt(II) emitters is constructed, with more than 200 samples collected from the literature. Features containing pertinent electronic properties of the complexes are chosen, and ensemble learning models combined with stacking-based approaches exhibit the best performance, with squared correlation coefficients of 0.96, 0.81, and 0.67 for the predictions of emission wavelength, PL quantum yield, and radiative decay rate constant, respectively. The accuracy of the protocol is further confirmed using 24 recently reported Pt(II) complexes, demonstrating its reliability for a broad palette of Pt(II) emitters.
Affiliation(s)
- Shuai Wang
- Department of Chemistry, The University of Hong Kong, Hong Kong, China
- ChiYung Yam
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China
- Shuguang Chen
- Department of Chemistry, The University of Hong Kong, Hong Kong, China
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- LiHong Hu
- School of Information Science and Technology, Northeast Normal University, Changchun, China
- Liping Li
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- Faan-Fung Hung
- Department of Chemistry, The University of Hong Kong, Hong Kong, China
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- State Key Laboratory of Synthetic Chemistry, HKU-CAS Joint Laboratory on New Materials, The University of Hong Kong, Hong Kong, China
- Jiaqi Fan
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- Chi-Ming Che
- Department of Chemistry, The University of Hong Kong, Hong Kong, China
- Hong Kong Quantum AI Lab Limited, Hong Kong, China
- State Key Laboratory of Synthetic Chemistry, HKU-CAS Joint Laboratory on New Materials, The University of Hong Kong, Hong Kong, China
- GuanHua Chen
- Department of Chemistry, The University of Hong Kong, Hong Kong, China
- Hong Kong Quantum AI Lab Limited, Hong Kong, China

44
Tu JB, Liao WJ, Liu WC, Gao XH. Using machine learning techniques to predict the risk of osteoporosis based on nationwide chronic disease data. Sci Rep 2024; 14:5245. [PMID: 38438569 PMCID: PMC10912338 DOI: 10.1038/s41598-024-56114-1] [Received: 11/22/2023] [Accepted: 03/01/2024] [Indexed: 03/06/2024]
Abstract
Osteoporosis is a major public health concern that significantly increases the risk of fractures. The aim of this study was to develop a machine learning-based predictive model to screen for individuals at high risk of osteoporosis using chronic disease data, thus facilitating early detection and personalized management. A total of 10,000 complete patient records of primary healthcare data from the German Disease Analyzer database (IMS HEALTH) were included, of which 1293 had a diagnosis of osteoporosis and 8707 did not. Demographic characteristics and chronic disease data, including age, gender, lipid disorder, cancer, COPD, hypertension, heart failure, CHD, diabetes, chronic kidney disease, and stroke, were collected from electronic health records. Ten different machine learning algorithms were used to construct the predictive model. The performance of the model was further validated, and the relative importance of the features in the model was analyzed. Of the ten machine learning algorithms, the Stacker model based on Logistic Regression, AdaBoost Classifier, and Gradient Boosting Classifier demonstrated superior performance. The Stacker model demonstrated excellent performance in ten-fold cross-validation on the training set and in ROC curve analysis on the test set. The confusion matrix, lift curve, and calibration curves indicated that the Stacker model had optimal clinical utility. Further analysis of feature importance highlighted age, gender, lipid metabolism disorders, cancer, and COPD as the top five influential variables. In this study, a predictive model for osteoporosis based on chronic disease data was developed using machine learning. The model shows great potential for early detection and risk stratification of osteoporosis, ultimately facilitating personalized prevention and management strategies.
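A stacking model such as the Stacker described above is usually trained on out-of-fold predictions, so that the meta-learner never sees a base model's predictions on data that model was fitted to. The helper below is a generic, minimal sketch of that step under stated assumptions: contiguous, unshuffled folds and a hypothetical `fit` callable that returns a prediction function. It is not the study's pipeline, and the mean-predictor base learner is only a placeholder for models such as Logistic Regression or AdaBoost.

```python
def out_of_fold_predictions(X, y, fit, k=5):
    """Return one out-of-fold prediction per sample.

    fit(X_train, y_train) must return a function mapping one sample to a prediction.
    Folds are contiguous index blocks (no shuffling, for simplicity).
    """
    n = len(X)
    oof = [None] * n
    for f in range(k):
        start, end = f * n // k, (f + 1) * n // k
        train_idx = [i for i in range(n) if not start <= i < end]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in range(start, end):
            oof[i] = predict(X[i])  # prediction from a model that never saw sample i
    return oof


def fit_mean(X_train, y_train):
    # Placeholder base learner: always predicts the training-set mean label.
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
print(out_of_fold_predictions(X, y, fit_mean, k=3))  # [0.75, 0.75, 0.5, 0.5, 0.25, 0.25]
```

Running this helper once per base learner and placing the resulting columns side by side gives the training matrix for the meta-learner; the base learners are then refitted on all training data for use at prediction time.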
Affiliation(s)
- Jun-Bo Tu
- Department of Orthopaedics, Xinfeng County People's Hospital, Jiangxi, 341600, Xinfeng, China
- Wei-Jie Liao
- Department of ICU, GanZhou People's Hospital, GanZhou, 341000, Jiangxi, China
- Wen-Cai Liu
- Department of Orthopaedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 600 Yishan Road, Shanghai, 200233, China.
- Xing-Hua Gao
- Department of Orthopaedics, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, 510180, China.

45
Öhlschuster M, Comiskey D, Kavanagh M, Kickinger F, Scaldaferri C, Sigler M, Nilsen P. On the prediction of SAV transmission among Norwegian aquaculture sites. Prev Vet Med 2024; 224:106095. [PMID: 38232517 DOI: 10.1016/j.prevetmed.2023.106095] [Received: 06/15/2023] [Revised: 11/28/2023] [Accepted: 12/12/2023] [Indexed: 01/19/2024]
Abstract
Pancreas Disease (PD) is a viral disease that affects Atlantic salmon (Salmo salar) in Norwegian, Scottish, and Irish aquaculture. It is caused by salmonid alphavirus (SAV) and represents a significant problem in salmonid farming. Infection with SAV leads to reduced growth, mortality, and product downgrading, and it has a significant financial impact on farms. The overall aim of this study is to evaluate the effect of various factors on the transmission of SAV and to create a predictive model that can serve as an early warning system for salmon farms in Norwegian waters. Using a combination of publicly available databases, specifically BarentsWatch, and privately held PCR analyses, a feature set of 11 unique features was created from the input parameters of the databases. An ensemble model was developed on this feature set using XGBoost, AdaBoost, Random Forest, and a Multilayer Perceptron. SAV transmission was predicted with 94.4% accuracy. Moreover, SAV transmission could be predicted 8 weeks before a 'PD registration' at individual salmon farming sites. Important predictors included well boat movement, environmental factors, proximity to sites with a 'PD registration', and seasonality.
Affiliation(s)
- D Comiskey
- Zoetis, Cherrywood Business Park, Loughlinstown, D18 T3Y1 Dublin, Ireland
- M Kavanagh
- Zoetis, Cherrywood Business Park, Loughlinstown, D18 T3Y1 Dublin, Ireland
- M Sigler
- Zoetis, Jutogasse 3, 4675 Weibern, Austria
- P Nilsen
- Pharmaq Analytiq, Thormøhlensgate 53D, Bergen 5006, Norway.

46
Ibrahim M, Beneyto A, Contreras I, Vehi J. An ensemble machine learning approach for the detection of unannounced meals to enhance postprandial glucose control. Comput Biol Med 2024; 171:108154. [PMID: 38382387 DOI: 10.1016/j.compbiomed.2024.108154] [Received: 09/05/2023] [Revised: 02/02/2024] [Accepted: 02/12/2024] [Indexed: 02/23/2024]
Abstract
BACKGROUND: Hybrid automated insulin delivery systems enhance postprandial glucose control in type 1 diabetes, but meal announcements are burdensome. To overcome this, we propose a machine learning-based automated meal detection approach. METHODS: A heterogeneous ensemble combining an artificial neural network, a random forest, and logistic regression was employed. It was trained and tested on data from two in-silico cohorts of 20 and 47 patients, accounting for various meal sizes (moderate to high) and glucose appearance rates (slow and rapid absorption). To produce an optimal prediction model, three ensemble configurations were used: logical AND, majority voting, and logical OR. In addition to the in-silico data, the proposed meal detector was also trained and tested on the OhioT1DM dataset. Finally, the meal detector was combined with a bolus insulin compensation scheme. RESULTS: Ensemble majority voting obtained the best meal detection results for both in-silico cohorts and the OhioT1DM cohort, with sensitivities of 77%, 94%, and 61%, precisions of 96%, 89%, and 72%, F1-scores of 85%, 91%, and 66%, and 0.05, 0.19, and 0.17 false positives per day, respectively. Automatic meal detection with insulin compensation was performed in open-loop insulin therapy using the AND ensemble, chosen for its lower false positive rate. Time in range increased significantly, by 10.48% and 16.03%, and time above range was reduced by 5.16% and 11.85%, with minimal increases in time below range of 0.35% and 2.69% for the two in-silico cohorts, respectively, compared with the results without a meal detector. CONCLUSION: This ensemble methodology aims to exploit each base model's strengths to increase the overall accuracy and robustness of the predictions. All of the results point to the potential application of the proposed meal detector as a separate module for meal detection in automated insulin delivery systems to achieve improved glycemic control.
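The three ensemble configurations can be sketched as simple rules over the binary outputs of the base detectors at each time step. This is a minimal illustration with hypothetical outputs, not the authors' implementation. It also shows why AND tends to produce fewer false positives (it fires only when every detector agrees) and OR tends to give higher sensitivity (it fires when any detector does).

```python
def combine_detections(predictions, rule):
    """Combine 0/1 meal-detection outputs from several models at each time step.

    predictions: one list of 0/1 outputs per base model, all the same length.
    rule: "and", "majority", or "or".
    """
    combined = []
    for votes in zip(*predictions):
        positives = sum(votes)
        if rule == "and":
            combined.append(int(positives == len(votes)))
        elif rule == "majority":
            combined.append(int(positives > len(votes) / 2))
        elif rule == "or":
            combined.append(int(positives > 0))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return combined


# Hypothetical outputs of three base models over four time steps.
ann = [1, 1, 0, 0]
rf = [1, 1, 1, 0]
logreg = [1, 0, 0, 0]
for rule in ("and", "majority", "or"):
    print(rule, combine_detections([ann, rf, logreg], rule))
# and [1, 0, 0, 0]
# majority [1, 1, 0, 0]
# or [1, 1, 1, 0]
```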
Affiliation(s)
- Muhammad Ibrahim
- Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Aleix Beneyto
- Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Ivan Contreras
- Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain
- Josep Vehi
- Modeling, Identification and Control Engineering Laboratory (MICELab), Institut d'Informàtica i Aplicacions, Universitat de Girona, Girona, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid, Spain.

47
Chen D, Gu X, Guo H, Cheng T, Yang J, Zhan Y, Fu Q. Spatiotemporally continuous PM2.5 dataset in the Mekong River Basin from 2015 to 2022 using a stacking model. The Science of the Total Environment 2024; 914:169801. [PMID: 38184264 DOI: 10.1016/j.scitotenv.2023.169801] [Received: 10/03/2023] [Revised: 12/13/2023] [Accepted: 12/29/2023] [Indexed: 01/08/2024]
Abstract
With the potential to cause millions of deaths, PM2.5 pollution has become a global concern. In Southeast Asia, the Mekong River Basin (MRB) is experiencing heavy PM2.5 pollution, and existing PM2.5 studies in the MRB are limited in accuracy and spatiotemporal coverage. To achieve high-accuracy, long-term PM2.5 monitoring of the MRB, fused aerosol optical depth (AOD) data and multi-source auxiliary data are fed into a stacking model to estimate PM2.5 concentrations. The proposed stacking model takes advantage of convolutional neural network (CNN) and Light Gradient Boosting Machine (LightGBM) models and represents the spatiotemporal heterogeneity of the PM2.5-AOD relationship well. In cross-validation (CV), comparison with the CNN and LightGBM models shows that the stacking model better suppresses overfitting, with a higher coefficient of determination (R2) of 0.92, a lower root mean square error (RMSE) of 5.58 μg/m3, and a lower mean absolute error (MAE) of 3.44 μg/m3. For the first time, the high-accuracy PM2.5 dataset reveals spatially and temporally continuous PM2.5 pollution and its variations in the MRB from 2015 to 2022. Moreover, the spatiotemporal variations of annual and monthly PM2.5 pollution are investigated at the regional and national scales. The dataset will contribute to analysing the causes of PM2.5 pollution and developing mitigation policies in the MRB.
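The three cross-validation metrics reported above (R2, RMSE, and MAE) follow their standard definitions. A minimal plain-Python sketch is given below; the observed and predicted concentrations are hypothetical and only illustrate the arithmetic.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) using the standard definitions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical PM2.5 concentrations in μg/m3.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [10.0, 20.0, 30.0, 50.0]
print(regression_metrics(obs, pred))  # (0.8, 5.0, 2.5)
```

Because RMSE squares the residuals, a single large miss (here the last point) raises RMSE relative to MAE; that is why a model can show a lower MAE and a lower RMSE by different margins.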
Affiliation(s)
- Debao Chen
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Xingfa Gu
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China; School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China
- Hong Guo
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China.
- Tianhai Cheng
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Jian Yang
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Yulin Zhan
- National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
- Qiming Fu
- School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China

48
Kabir E, Guikema SD, Quiring SM. Power outage prediction using data streams: An adaptive ensemble learning approach with a feature- and performance-based weighting mechanism. Risk Analysis: An Official Publication of the Society for Risk Analysis 2024; 44:686-704. [PMID: 37666505 DOI: 10.1111/risa.14211] [Indexed: 09/06/2023]
Abstract
A wide variety of weather conditions, from windstorms to prolonged heat events, can substantially affect power systems, posing many risks and inconveniences through power outages. Accurately estimating the probability distribution of the number of customers without power, using data on the power utility system and on environmental and weather conditions, can help utilities restore power more quickly and efficiently. However, the critical shortcoming of current models lies in the difficulty of handling (i) data streams and (ii) model uncertainty arising from combining data from various weather events. Accordingly, this article proposes an adaptive ensemble learning algorithm for data streams that uses a feature- and performance-based weighting mechanism to adaptively combine the outputs of multiple competitive base learners. As a proof of concept, we use a large, real data set of daily customer interruptions to develop the first adaptive all-weather outage prediction model using data streams. We benchmark several approaches to demonstrate the advantage of our approach in offering more accurate probabilistic predictions. The results show that the proposed algorithm reduces the error of the base learners' probabilistic predictions by between 4% and 22%, with an average of 8%, which also results in substantially more accurate point predictions. The improvement made by our algorithm grows when the base learners are replaced with simpler models.
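One simple form of performance-based weighting for a data stream is sketched below: each base learner's recent error is tracked with an exponentially weighted moving average, and the learners are combined with weights inversely proportional to that average. This is an assumption-laden illustration, not the article's algorithm. In particular, it ignores the feature-based part of the weighting mechanism and combines point predictions rather than probability distributions. The class name and the `decay` parameter are hypothetical.

```python
class PerformanceWeightedEnsemble:
    """Hypothetical streaming combiner: weights are inverse to each learner's recent error."""

    def __init__(self, n_learners, decay=0.9, eps=1e-9):
        self.decay = decay          # how strongly older errors are remembered
        self.eps = eps              # guards against division by zero
        self.avg_error = [None] * n_learners

    def update(self, errors):
        """Feed the latest per-learner errors (e.g. absolute errors on the newest batch)."""
        for k, e in enumerate(errors):
            if self.avg_error[k] is None:
                self.avg_error[k] = e
            else:
                self.avg_error[k] = self.decay * self.avg_error[k] + (1 - self.decay) * e

    def weights(self):
        # Equal weights until every learner has been scored at least once.
        if any(a is None for a in self.avg_error):
            n = len(self.avg_error)
            return [1.0 / n] * n
        inv = [1.0 / (a + self.eps) for a in self.avg_error]
        total = sum(inv)
        return [v / total for v in inv]

    def predict(self, preds):
        """Weighted combination of the base learners' current predictions."""
        return sum(w * p for w, p in zip(self.weights(), preds))


ens = PerformanceWeightedEnsemble(n_learners=2)
ens.update([1.0, 3.0])  # errors of the two learners on the latest batch
print([round(w, 3) for w in ens.weights()])  # [0.75, 0.25]
print(round(ens.predict([10.0, 20.0]), 3))   # 12.5
```

The moving average lets the weights drift as weather regimes change in the stream, which is the property a static stacking combiner lacks; the decay value trades responsiveness against stability.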
Affiliation(s)
- Elnaz Kabir
- Department of Engineering Technology & Industrial Distribution, Texas A&M University, College Station, Texas, USA
- Seth D Guikema
- Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Steven M Quiring
- Department of Geography, The Ohio State University, Columbus, Ohio, USA

49
Kikuchi Y, Kawczynski MG, Anegondi N, Neubert A, Dai J, Ferrara D, Quezada-Ruiz C. Machine Learning to Predict Faricimab Treatment Outcome in Neovascular Age-Related Macular Degeneration. Ophthalmology Science 2024; 4:100385. [PMID: 37868796 PMCID: PMC10585644 DOI: 10.1016/j.xops.2023.100385] [Received: 06/28/2022] [Revised: 08/07/2023] [Accepted: 08/10/2023] [Indexed: 10/24/2023]
Abstract
Purpose To develop machine learning (ML) models to predict, at baseline, treatment outcomes at month 9 in patients with neovascular age-related macular degeneration (nAMD) receiving faricimab. Design Retrospective proof-of-concept study. Participants Patients enrolled in the phase II AVENUE trial (NCT02484690) of faricimab in nAMD. Methods Baseline characteristics and spectral-domain OCT (SD-OCT) image data from 185 faricimab-treated eyes were split at the patient level into 80% training and 20% test sets. Input variables were baseline age, sex, best-corrected visual acuity (BCVA), central subfield thickness (CST), low luminance deficit, treatment arm, and SD-OCT images. A regression problem (BCVA) and a binary classification problem (reduction of CST by 35%) were considered. Overall, 10 models were developed and tested for each problem. Benchmark classical ML models (linear, random forest, extreme gradient boosting) were trained on baseline characteristics; benchmark deep neural networks (DNNs) were trained on baseline SD-OCT B-scans. Baseline characteristics and SD-OCT data were merged using 2 approaches: model stacking (using the DNN prediction as an input feature for classical ML models) and model averaging (averaging the predictions of the DNN using the SD-OCT volume and of classical ML models using baseline characteristics). Main Outcome Measures Treatment outcomes were defined by 2 target variables: functional (BCVA letter score) and anatomical (percent decrease in CST from baseline) outcomes at month 9. Results The best-performing BCVA regression model with respect to the test coefficient of determination (R2) was the linear model in the model-stacking approach, with an R2 of 0.31. The best-performing CST classification model with respect to the test area under the receiver operating characteristic curve (AUROC) was the benchmark linear model, with an AUROC of 0.87.
A post hoc analysis showed that baseline BCVA and baseline CST had the greatest effect on predictions across all models for BCVA regression and CST classification, respectively. Conclusions Promising signals for predicting treatment outcomes from baseline characteristics were detected; however, the predictive benefit of baseline images was unclear in this proof-of-concept study. Further testing and validation with larger, independent datasets are required to fully explore the predictive capacity of ML models using baseline imaging data. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
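The two fusion strategies differ only in where the image model's prediction enters. The sketch below, with hypothetical values and function names, shows model averaging (the final prediction is the unweighted mean of the image-model and tabular-model predictions) and model stacking (the image-model prediction is appended as an extra input column, and the classical model is then trained on the augmented rows). It does not reproduce the study's models or feature set.

```python
def model_averaging(image_preds, tabular_preds):
    # Final prediction = unweighted mean of the two models' predictions.
    return [(a + b) / 2.0 for a, b in zip(image_preds, tabular_preds)]


def stack_as_feature(tabular_rows, image_preds):
    # Append the image-model prediction as one more input column for the classical model.
    # New lists are built, so the original rows are left unchanged.
    return [list(row) + [p] for row, p in zip(tabular_rows, image_preds)]


# Hypothetical baseline features per eye: [age, baseline BCVA letters].
rows = [[72, 60], [80, 45]]
dnn_bcva = [66.0, 50.0]      # hypothetical DNN predictions from SD-OCT
tabular_bcva = [64.0, 48.0]  # hypothetical classical-model predictions
print(model_averaging(dnn_bcva, tabular_bcva))  # [65.0, 49.0]
print(stack_as_feature(rows, dnn_bcva))         # [[72, 60, 66.0], [80, 45, 50.0]]
```

With averaging, the image model's influence is fixed at one half; with stacking, the downstream model learns how much weight the image-derived feature deserves, which can be close to zero if the images add little beyond baseline characteristics.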
Affiliation(s)
- Yusuke Kikuchi
- Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California
- Michael G. Kawczynski
- Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Neha Anegondi
- Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Clinical Imaging Group, Genentech, Inc., South San Francisco, California
- Ales Neubert
- Data & Analytics, Roche Pharma Research and Early Development, Basel, Switzerland
- Jian Dai
- Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Daniela Ferrara
- Roche Personalized Healthcare Program, Genentech, Inc., South San Francisco, California
- Carlos Quezada-Ruiz
- Clinical Science, Genentech, Inc., South San Francisco, California
- Department of Ophthalmology, Clínica de Ojos Garza Viejo, San Pedro Garza García, Nuevo León, Mexico

50
Song A, Lusk JB, Roh KM, Hsu ST, Valikodath NG, Lad EM, Muir KW, Engelhard MM, Limkakeng AT, Izatt JA, McNabb RP, Kuo AN. RobOCTNet: Robotics and Deep Learning for Referable Posterior Segment Pathology Detection in an Emergency Department Population. Transl Vis Sci Technol 2024; 13:12. [PMID: 38488431 PMCID: PMC10946693 DOI: 10.1167/tvst.13.3.12] [Received: 09/26/2023] [Accepted: 01/31/2024] [Indexed: 03/19/2024]
Abstract
Purpose To evaluate the diagnostic performance of a robotically aligned optical coherence tomography (RAOCT) system coupled with a deep learning model in detecting referable posterior segment pathology in OCT images of emergency department patients. Methods A deep learning model, RobOCTNet, was trained and internally tested to classify OCT images as referable versus non-referable for ophthalmology consultation. For external testing, emergency department patients with signs or symptoms warranting evaluation of the posterior segment were imaged with RAOCT. RobOCTNet was used to classify the images. Model performance was evaluated against a reference standard based on clinical diagnosis and retina specialist OCT review. Results We included 90,250 OCT images for training and 1489 images for internal testing. RobOCTNet achieved an area under the curve (AUC) of 1.00 (95% confidence interval [CI], 0.99-1.00) for detection of referable posterior segment pathology in the internal test set. For external testing, RAOCT was used to image 72 eyes of 38 emergency department patients. In this set, RobOCTNet had an AUC of 0.91 (95% CI, 0.82-0.97), a sensitivity of 95% (95% CI, 87%-100%), and a specificity of 76% (95% CI, 62%-91%). The model's performance was comparable to that of two human experts. Conclusions A robotically aligned OCT coupled with a deep learning model demonstrated high diagnostic performance in detecting referable posterior segment pathology in a cohort of emergency department patients. Translational Relevance Robotically aligned OCT coupled with a deep learning model may improve emergency department patient triage for ophthalmology referral.
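The abstract reports AUC, sensitivity, and specificity for a binary referable/non-referable decision. The sketch below shows how these metrics are conventionally computed. It is illustrative only: the labels and scores are hypothetical toy values, not RobOCTNet outputs, and the 0.5 decision threshold is an assumption. The abstract does not say how the confidence intervals were obtained, so none are computed here.

```python
from typing import Sequence


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity), treating label 1 as 'referable'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # referable, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # referable, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-referable, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-referable, flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not study data): 3 referable and 3 non-referable eyes.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Unlike sensitivity and specificity, AUC is threshold-free: it summarizes ranking quality across all cut-offs. This is why a model can combine a high AUC with a specificity well below its sensitivity at a single operating point.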
Affiliation(s)
- Ailin Song
- Duke University School of Medicine, Durham, NC, USA
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Jay B. Lusk
- Duke University School of Medicine, Durham, NC, USA
- Kyung-Min Roh
- Department of Ophthalmology, Duke University, Durham, NC, USA
- S. Tammy Hsu
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Eleonora M. Lad
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Kelly W. Muir
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Matthew M. Engelhard
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Joseph A. Izatt
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Ryan P. McNabb
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Anthony N. Kuo
- Department of Ophthalmology, Duke University, Durham, NC, USA
- Department of Biomedical Engineering, Duke University, Durham, NC, USA