For: Wolpert DH. Stacked generalization. Neural Netw 1992. [DOI: 10.1016/s0893-6080(05)80023-1] [Citation(s) in RCA: 1732] [Impact Index Per Article: 54.1] [Indexed: 10/25/2022]

Cited by Other Article(s)

Lanjewar MG, Panchbhai KG, Patle LB. Sugar detection in adulterated honey using hyper-spectral imaging with stacking generalization method. Food Chem 2024;450:139322. [PMID: 38613963 DOI: 10.1016/j.foodchem.2024.139322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/15/2024]

Huang H, Fang Z, Xu Y, Lu G, Feng C, Zeng M, Tian J, Ping Y, Han Z, Zhao Z. Stacking and ridge regression-based spectral ensemble preprocessing method and its application in near-infrared spectral analysis. Talanta 2024;276:126242. [PMID: 38761656 DOI: 10.1016/j.talanta.2024.126242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/20/2024]

Abstract

Spectral preprocessing techniques can, to a certain extent, eliminate irrelevant information, such as current noise and stray light from spectral data, thereby enhancing the performance of prediction models. However, current preprocessing techniques mostly attempt to find the best single preprocessing method or their combination, overlooking the complementary information among different preprocessing methods. These preprocessing techniques fail to maximize the utilization of useful information in spectral data and restrict the performance of prediction models. This study proposed a spectral ensemble preprocessing method based on the rapidly developing ensemble learning methods in recent years and the ridge regression (RR) model, named stacking preprocessing ridge regression (SPRR), to address the aforementioned issues. Different from conventional ensemble learning methods, the proposed SPRR method applied multiple different preprocessing techniques to the original spectral data, generating multiple preprocessed datasets. These datasets were then individually inputted into RR base models for training. Ultimately, RR still served as the meta-model, integrating the output results of each RR base model through stacking. This approach not only produced diversity in base models but also achieved higher accuracy and lower computational complexity by using a single type of base model. On the apple spectral dataset collected by our team, correlation analysis showed significant complementary information among the data produced by different preprocessing techniques. This provided robust theoretical support for the proposed SPRR method. 
In comparative experiments on six datasets (apple, meat, wheat, olive oil, tablet, and corn) that included the currently popular averaging ensemble preprocessing method, SPRR yielded the best accuracy and reliability on all six datasets, outperforming both the single preprocessing methods and the averaging ensemble preprocessing method. Furthermore, with identical training and test datasets, the proposed SPRR method performed better than the four commonly used ensemble preprocessing methods.
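The stacking scheme described in this abstract (several preprocessed views of the same spectra, one ridge base model per view, and a ridge meta-model over the base outputs) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the three preprocessing functions, the fold scheme, the regularization value `lam`, and the synthetic data are all assumptions chosen to keep the example small and dependency-free.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge weights; a bias column is appended (and penalized, for simplicity)."""
    Xb = [row + [1.0] for row in X]
    p = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(A, b)

def ridge_predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row + [1.0])) for row in X]

# Hypothetical preprocessing views (assumptions, not the set used in the paper).
def raw(row):
    return list(row)

def snv(row):
    """Standard normal variate: center and scale each spectrum."""
    mu = sum(row) / len(row)
    sd = (sum((v - mu) ** 2 for v in row) / len(row)) ** 0.5 or 1.0
    return [(v - mu) / sd for v in row]

def first_diff(row):
    """First difference along the wavelength axis."""
    return [b - a for a, b in zip(row, row[1:])]

PREPROCESSORS = [raw, snv, first_diff]

def fit_stack(X, y, k=3, lam=1e-3):
    """Train one ridge base model per view and a ridge meta-model on out-of-fold predictions."""
    n = len(X)
    views = [[p(r) for r in X] for p in PREPROCESSORS]
    oof = [[0.0] * len(views) for _ in range(n)]
    for j, V in enumerate(views):
        for fold in range(k):
            test = [i for i in range(n) if i % k == fold]
            train = [i for i in range(n) if i % k != fold]
            w = ridge_fit([V[i] for i in train], [y[i] for i in train], lam)
            for i, pred in zip(test, ridge_predict(w, [V[i] for i in test])):
                oof[i][j] = pred
    meta = ridge_fit(oof, y, lam)                       # meta-model on base outputs
    bases = [ridge_fit(V, y, lam) for V in views]       # refit bases on all data
    return bases, meta

def predict_stack(model, X):
    bases, meta = model
    level1 = [ridge_predict(w, [p(r) for r in X]) for w, p in zip(bases, PREPROCESSORS)]
    return ridge_predict(meta, [list(col) for col in zip(*level1)])

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

# Synthetic "spectra" with a linear target, purely for demonstration.
random.seed(0)
X = [[random.uniform(0.0, 1.0) for _ in range(5)] for _ in range(40)]
true_w = [0.5, -1.0, 2.0, 0.0, 1.5]
y = [sum(w * v for w, v in zip(true_w, row)) + random.gauss(0.0, 0.01) for row in X]
model = fit_stack(X, y)
pred = predict_stack(model, X)
```

Training the meta-model on out-of-fold predictions rather than in-sample predictions follows the usual stacked-generalization practice of keeping the second level from simply rewarding base models that overfit.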

He H, Yang H, Mercaldo F, Santone A, Huang P. Isolation forest-voting fusion-multioutput: A stroke risk classification method based on the multidimensional output of abnormal sample detection. Computer Methods and Programs in Biomedicine 2024;253:108255. [PMID: 38833760 DOI: 10.1016/j.cmpb.2024.108255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 12/23/2023] [Accepted: 05/26/2024] [Indexed: 06/06/2024]

Abstract

BACKGROUND AND OBJECTIVE

Stroke has become a major disease threatening the health of people around the world. It has the characteristics of high incidence, high fatality, and a high recurrence rate. At this stage, problems such as poor recognition accuracy of stroke screening based on electronic medical records and insufficient recognition of stroke risk levels exist. These problems occur because of the systematic errors of medical equipment and the characteristics of the collectors during the process of electronic medical record collection. Errors can also occur due to misreporting or underreporting by the collection personnel and the strong subjectivity of the evaluation indicators.

METHODS

This paper proposes an isolation forest-voting fusion-multioutput algorithm model. First, the screening data are collected for numerical processing and normalization. The composite feature score index of this paper is then used to analyze the importance of risk factors. Next, the isolation forest algorithm detects abnormal samples, and the voting fusion algorithm proposed in this article performs decision-fusion prediction and classification. The model outputs multidimensional results (risk factor importance score, abnormal sample label, risk level classification, and stroke prediction) that doctors and medical staff can use as auxiliary decision information.

RESULTS

The isolation forest-voting fusion-multioutput algorithm proposed in this article distinguishes five categories (zero risk, low risk, high risk, ischemic stroke (TIA), and hemorrhagic stroke (HE)). The average accuracy of stroke prediction reached 79.59%.

CONCLUSIONS

The isolation forest-voting fusion-multioutput algorithm model proposed in this paper can not only accurately identify the various categories of stroke risk levels and stroke prediction but can also output multidimensional auxiliary decision-making information to help medical staff make decisions, thereby greatly improving the screening efficiency.

Akbar MN, Ruf SF, Singh A, Faghihpirayesh R, Garner R, Bennett A, Alba C, Rocca ML, Imbiriba T, Erdoğmuş D, Duncan D. Advancing post-traumatic seizure classification and biomarker identification: Information decomposition based multimodal fusion and explainable machine learning with missing neuroimaging data. Comput Med Imaging Graph 2024;115:102386. [PMID: 38718562 DOI: 10.1016/j.compmedimag.2024.102386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 04/16/2024] [Accepted: 04/16/2024] [Indexed: 06/03/2024]

Abstract

A late post-traumatic seizure (LPTS), a consequence of traumatic brain injury (TBI), can potentially evolve into a lifelong condition known as post-traumatic epilepsy (PTE). Presently, the mechanism that triggers epileptogenesis in TBI patients remains elusive, inspiring the epilepsy community to devise ways to predict which TBI patients will develop PTE and to identify potential biomarkers. In response to this need, our study collected comprehensive, longitudinal multimodal data from 48 TBI patients across multiple participating institutions. A supervised binary classification task was created, contrasting data from LPTS patients with those without LPTS. To accommodate missing modalities in some subjects, we took a two-pronged approach. Firstly, we extended a graphical model-based Bayesian estimator to directly classify subjects with incomplete modality. Secondly, we explored conventional imputation techniques. The imputed multimodal information was then combined, following several fusion and dimensionality reduction techniques found in the literature, and subsequently fitted to a kernel- or a tree-based classifier. For this fusion, we proposed two new algorithms: recursive elimination of correlated components (RECC) that filters information based on the correlation between the already selected features, and information decomposition and selective fusion (IDSF), which effectively recombines information from decomposed multimodal features. Our cross-validation findings showed that the proposed IDSF algorithm delivers superior performance based on the area under the curve (AUC) score. 
Ultimately, after rigorous statistical comparisons and interpretable machine learning examination using Shapley values of the most frequently selected features, we recommend the two following magnetic resonance imaging (MRI) abnormalities as potential biomarkers: the left anterior limb of internal capsule in diffusion MRI (dMRI), and the right middle temporal gyrus in functional MRI (fMRI).

Affiliation(s)

Md Navid Akbar Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America.
Sebastian F Ruf Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
Ashutosh Singh Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
Razieh Faghihpirayesh Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
Rachael Garner Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
Alexis Bennett Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
Celina Alba Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America
Marianna La Rocca Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari "A. Moro", Bari, Italy
Tales Imbiriba Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
Deniz Erdoğmuş Cognitive Systems Lab, Dept. of Electrical and Computer Engineering, College of Engineering, Northeastern University, 360 Huntington Ave, Boston, MA 02115, United States of America
Dominique Duncan Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Ave. 210, Los Angeles, CA 90033, United States of America

Miao J, Chen T, Misir M, Lin Y. Deep learning for predicting 16S rRNA gene copy number. Sci Rep 2024;14:14282. [PMID: 38902329 PMCID: PMC11190246 DOI: 10.1038/s41598-024-64658-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]

Ameksa M, Elamrani Abou Elassad Z, Lamjadli S, Mousannif H. Predicting stroke events with a proactive fusion system: a comprehensive study on imbalance class handling in computational biomechanics. Comput Methods Biomech Biomed Engin 2024:1-18. [PMID: 38902976 DOI: 10.1080/10255842.2024.2363946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 05/28/2024] [Indexed: 06/22/2024]

Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024;178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]

Fereidooni D, Karimi Z, Ghasemi F. Non-destructive test-based assessment of uniaxial compressive strength and elasticity modulus of intact carbonate rocks using stacking ensemble models. PLoS One 2024;19:e0302944. [PMID: 38857272 PMCID: PMC11164374 DOI: 10.1371/journal.pone.0302944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/14/2024] [Indexed: 06/12/2024]

Abstract

The uniaxial compressive strength (UCS) and elasticity modulus (E) of intact rock are two fundamental requirements in engineering applications. These parameters can be measured either directly from the uniaxial compressive strength test or indirectly by using soft computing predictive models. In the present research, the UCS and E of intact carbonate rocks have been predicted from non-destructive simple laboratory test results by introducing two stacking ensemble learning models. For this purpose, dry unit weight, porosity, P-wave velocity, Brinell surface hardness, UCS, and static E were measured for 70 carbonate rock samples. Then, two stacking ensemble learning models were developed for estimating the UCS and E of the rocks. The applied stacking ensemble learning method integrates the advantages of two base models in the first level, where the base models are multi-layer perceptron (MLP) and random forest (RF) for predicting UCS, and support vector regressor (SVR) and extreme gradient boosting (XGBoost) for predicting E. Grid search with k-fold cross-validation is applied to tune the parameters of both the base models and the meta-learner. The results demonstrate the generalization ability of the stacking ensemble method in comparison with the base models in terms of common performance measures. The coefficients of determination (R²) obtained from the stacking ensemble are 0.909 and 0.831 for predicting UCS and E, respectively. Similarly, the stacking ensemble yielded root mean squared error (RMSE) values of 1.967 and 0.621 for the prediction of UCS and E, respectively. Accordingly, the proposed models are superior to SVR and MLP as single models and to RF and XGBoost as two representative ensemble models. Furthermore, sensitivity analysis is carried out to investigate the impact of input parameters.
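The two performance measures reported in this abstract, the coefficient of determination and the root mean squared error, follow from their standard definitions. A minimal sketch is given below; the function names are illustrative and the numbers used in the usage note are made-up values, not data from the study.

```python
def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured vs. predicted values, for illustration only.
measured = [42.0, 55.5, 61.2, 70.8]
predicted = [44.1, 53.9, 60.0, 72.3]
print(rmse(measured, predicted), r2_score(measured, predicted))
```

A model that always predicts the mean of the measured values scores an R² of 0, so values near 1 indicate that most of the variance is explained, while RMSE stays in the units of the target.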

Razlivina J, Dmitrenko A, Vinogradov V. AI-Powered Knowledge Base Enables Transparent Prediction of Nanozyme Multiple Catalytic Activity. J Phys Chem Lett 2024;15:5804-5813. [PMID: 38781458 DOI: 10.1021/acs.jpclett.4c00959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]

James Jensen A, Silva CS, Costello KE, Banks S. A novel post-processing technique for correcting symmetric implant ambiguity in measuring total knee arthroplasty kinematics from single-plane fluoroscopy. J Biomech 2024;170:112172. [PMID: 38833908 DOI: 10.1016/j.jbiomech.2024.112172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 05/17/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]

Zhang J, Jin A, Han X, Chen Z, Diao C, Zhang Y, Liu X, Xu F, Liu J, Qiu X, Tan X, Luo L, Liu Y. The LISA-PPV Formula: An Ensemble Artificial Intelligence-Based Thick Intraocular Lens Calculation Formula for Vitrectomized Eyes. Am J Ophthalmol 2024;262:237-245. [PMID: 38452920 DOI: 10.1016/j.ajo.2024.02.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/20/2024] [Accepted: 02/27/2024] [Indexed: 03/09/2024]

Affiliation(s)

Jiaqing Zhang From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
Aixia Jin From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
Xiaotong Han From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
Zhixin Chen Shenzhen Key Laboratory of Ophthalmology, Shenzhen Eye Hospital, Affiliated Hospital of Jinan University (Z.C.), Shenzhen, China
Chunli Diao Department Of Ophthalmology, The People's Hospital of Guangxi Zhuang Autonomous Region and Institute of Ophthalmic Diseases, Guangxi Academy Of Medical Sciences (C.D.), Nanning, China; Department of Ophthalmology, The First Affiliated Hospital of Guangxi University of Chinese Medicine (C.D.), Nanning, China
Yu Zhang From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Cataract Department, Shanxi Eye Hospital (Y.Z., J.L.), Taiyuan, China
Xinhua Liu Shenzhen Key Laboratory of Ophthalmology, Shenzhen Eye Hospital, Affiliated Hospital of Jinan University (Z.C.), Shenzhen, China
Fan Xu Department Of Ophthalmology, The People's Hospital of Guangxi Zhuang Autonomous Region and Institute of Ophthalmic Diseases, Guangxi Academy Of Medical Sciences (C.D.), Nanning, China
Jiewei Liu Cataract Department, Shanxi Eye Hospital (Y.Z., J.L.), Taiyuan, China
Xiaozhang Qiu From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China
Xuhua Tan From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China.
Lixia Luo From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China.
Yizhi Liu From the State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China; Guangdong Provincial Clinical Research Center for Ocular Diseases (J.Z., A.J., X.H., Y.Z., X.Q., X.T., L.L., Y.L.), Guangzhou, China

Matougui Z, Djerbal L, Bahar R. A comparative study of heterogeneous and homogeneous ensemble approaches for landslide susceptibility assessment in the Djebahia region, Algeria. Environmental Science and Pollution Research International 2024;31:40554-40580. [PMID: 36892699 DOI: 10.1007/s11356-023-26247-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Accepted: 02/27/2023] [Indexed: 06/18/2023]

Abstract

This study aims to compare the performance of ensembles according to their inherent diversity in the context of landslide susceptibility assessment. Heterogeneous and homogeneous ensemble types can be distinguished; four ensembles of each approach were implemented in the Djebahia region. The heterogeneous ensembles include stacking (ST), voting (VO), weighting (WE), and a new approach in landslide assessment called meta-dynamic ensemble selection (DES), while the homogeneous ensembles include AdaBoost (ADA), bagging (BG), random forest (RF), and random subspace (RSS). To ensure a consistent comparison, each ensemble was implemented using individual base learners. The heterogeneous ensembles were generated by combining eight different machine learning algorithms, while the homogeneous ensembles only used a single base learner, with diversity achieved through resampling the training dataset. The spatial dataset used in this study consisted of 115 landslide events and 12 conditioning factors, which were randomly divided into training and testing datasets. The models were evaluated through various aspects, including receiver operating characteristic (ROC) curves, root mean squared error (RMSE), landslide density distribution (LDD), threshold-dependent metrics (Kappa index, accuracy, and recall scores), and a global visual representation using the Taylor diagram. Additionally, a sensitivity analysis (SA) was conducted for the best performing models to assess the importance of the factors and the resilience of the ensembles. The results revealed that homogeneous ensembles outperformed heterogeneous ensembles in terms of AUC and threshold-dependent metrics, with AUC ranging from 0.962 to 0.971 for the test dataset. ADA was the best performing model for these metrics and the least in terms of RMSE (0.366). However, the heterogeneous ensemble ST provided a finer RMSE (0.272), and DES showed the best LDD, indicating a stronger potential to generalize the phenomenon. 
The Taylor diagram was consistent with the other results, indicating that ST was the best performing model, followed by RSS. The SA demonstrated that RSS was the most robust (mean AUC variation of -0.022) and ADA was the least robust (mean AUC variation of -0.038).

Zarbakhsh S, Shahsavar AR, Soltani M. Optimizing PGRs for in vitro shoot proliferation of pomegranate with bayesian-tuned ensemble stacking regression and NSGA-II: a comparative evaluation of machine learning models. Plant Methods 2024;20:82. [PMID: 38822411 PMCID: PMC11143642 DOI: 10.1186/s13007-024-01211-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/17/2024] [Indexed: 06/03/2024]

Abstract

BACKGROUND

Optimizing in vitro shoot proliferation is a complicated task, as it is influenced by the interactions of many factors as well as by genotype. This study investigated the role of various concentrations of plant growth regulators (zeatin and gibberellic acid) in the successful in vitro shoot proliferation of three Punica granatum cultivars ('Faroogh', 'Atabaki' and 'Shirineshahvar'). In addition, the utility of five Machine Learning (ML) algorithms, namely Support Vector Regression (SVR), Random Forest (RF), Extreme Gradient Boosting (XGB), Ensemble Stacking Regression (ESR) and Elastic Net Multivariate Linear Regression (ENMLR), as modeling tools for in vitro multiplication of pomegranate was evaluated. A new automatic hyperparameter optimization method named Adaptive Tree Parzen Estimator (ATPE) was developed to tune the hyperparameters. The performance of the models was evaluated and compared using statistical indicators (MAE, RMSE, RRMSE, MAPE, R and R²), while a specific Global Performance Indicator (GPI) was introduced to rank the models based on a single parameter. Moreover, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) was employed to optimize the selected prediction model.

RESULTS

The results demonstrated that the ESR algorithm exhibited higher predictive accuracy than the other ML algorithms. The ESR model was subsequently introduced for optimization by NSGA-II. ESR-NSGA-II revealed that the highest proliferation rate (3.47, 3.84, and 3.22), shoot length (2.74, 3.32, and 1.86 cm), leaf number (18.18, 19.76, and 18.77), and explant survival (84.21%, 85.49%, and 56.39%) could be achieved with a medium containing 0.750, 0.654, and 0.705 mg/L zeatin, and 0.50, 0.329, and 0.347 mg/L gibberellic acid in the 'Atabaki', 'Faroogh', and 'Shirineshahvar' cultivars, respectively.

CONCLUSIONS

This study demonstrates that the 'Shirineshahvar' cultivar exhibited lower shoot proliferation success compared to the other cultivars. The results indicated the good performance of ESR-NSGA-II in modeling and optimizing in vitro propagation. ESR-NSGA-II can be applied as an up-to-date and reliable computational tool for future studies in plant in vitro culture.

Hosseinzadeh M, Hussain D, Zeki Mahmood FM, A. Alenizi F, Varzeghani AN, Asghari P, Darwesh A, Malik MH, Lee SW. A model for skin cancer using combination of ensemble learning and deep learning. PLoS One 2024;19:e0301275. [PMID: 38820401 PMCID: PMC11142560 DOI: 10.1371/journal.pone.0301275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 03/13/2024] [Indexed: 06/02/2024]

Kushwaha NL, Kudnar NS, Vishwakarma DK, Subeesh A, Jatav MS, Gaddikeri V, Ahmed AA, Abdelaty I. Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India. Heliyon 2024;10:e31085. [PMID: 38784559 PMCID: PMC11112320 DOI: 10.1016/j.heliyon.2024.e31085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]

Pratiwi NKC, Tayara H, Chong KT. An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction. Int J Mol Sci 2024;25:5957. [PMID: 38892144 PMCID: PMC11172808 DOI: 10.3390/ijms25115957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024]

Dayimu A, Simidjievski N, Demiris N, Abraham J. Sample size determination for prediction models via learning-type curves. Stat Med 2024. [PMID: 38803150 DOI: 10.1002/sim.10121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 05/07/2024] [Accepted: 05/10/2024] [Indexed: 05/29/2024]

Chen R, Yan Q, Tuoheti T, Xu L, Gao Q, Zhang Y, Ren H, Zheng L, Wang F, Liu Y. A prediction model of rubber content in the dried root of Taraxacum kok-saghyz Rodin based on near-infrared spectroscopy. Plant Methods 2024;20:77. [PMID: 38797847 PMCID: PMC11128126 DOI: 10.1186/s13007-024-01183-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 04/12/2024] [Indexed: 05/29/2024]

Abstract

BACKGROUND

Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) due to its wide range of suitable planting areas, strong adaptability, and suitability for mechanized planting and harvesting. However, current methods for detecting NR content are relatively cumbersome, necessitating the development of a rapid detection model. This study used near-infrared spectroscopy technology to establish a rapid detection model for NR content in TKS root segments and powder samples. The K445 strain at different growth stages within a year and 129 TKS samples hybridized with dandelion were used to obtain their near-infrared spectral data. The rubber content in the root of the samples was detected using the alkaline boiling method. The Monte Carlo sampling method (MCS) was used to filter abnormal data from the root segments of TKS and powder samples, respectively. The SPXY algorithm was used to divide the training set and validation set in a 3:1 ratio. The original spectrum was preprocessed using moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. The competitive adaptive reweighted sampling (CARS) algorithm and the corresponding chemical characteristic bands of NR were used to screen the bands. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) algorithms were employed to establish a model using the optimal spectral processing method for three different bands: full band, CARS algorithm, and chemical characteristic bands corresponding to NR. The model with the best predictive performance for high rubber content intervals (rubber content > 15%) was identified.

RESULT

The results indicated that the optimal rubber content prediction models for TKS root segments and powder samples were MWS-FD CARS-RF and MWS-FD chemical characteristic band RF, respectively. Their respective $R_P^2$, RMSEP, and RPDP values were 0.951 and 0.979, 1.814 and 1.133, and 4.498 and 6.845. In the high rubber content range, the model based on the LightGBM algorithm had the best prediction performance, with the RMSEP of the root segments and powder samples being 0.752 and 0.918, respectively.

CONCLUSIONS

This research indicates that dried TKS root powder samples are more appropriate for constructing a rubber content prediction model than segmented samples, and the predictive capability of root powder samples is superior to that of root segmented samples. Especially in the elevated rubber content range, the model formulated using the LightGBM algorithm has superior predictive performance, which could offer a theoretical basis for the rapid detection technology of TKS content in the future.

Affiliation(s)

Runfeng Chen Agricultural College, Xinjiang Agricultural University, Urumqi, 830052, People's Republic of China Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
Qingqing Yan Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
Tuhanguli Tuoheti Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
Lin Xu Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China. National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China.
Qiang Gao Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China. National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China.
Yan Zhang Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China National Central Asian Characteristic Crop Germplasm Resources Medium-Term Gene Bank (Urumqi), Urumqi, 830091, People's Republic of China
Hailong Ren Crops Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Provincial Key Laboratory of Crop Genetic Improvement, Guangzhou, 510308, People's Republic of China
Lipeng Zheng Agricultural College, Xinjiang Agricultural University, Urumqi, 830052, People's Republic of China Institute of Crop Germplasm Resource, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, People's Republic of China
Feng Wang Beijing Linglong Tyre Company Limited, Beijing, 101102, People's Republic of China
Ya Liu Comprehensive Testing Ground, Xinjiang Academy of Agricultural Sciences, Urumqi, 830052, People's Republic of China

Kim SB, Kang JH, Cheon M, Kim DJ, Lee BC. Stacked neural network for predicting polygenic risk score. Sci Rep 2024;14:11632. [PMID: 38773257 PMCID: PMC11109142 DOI: 10.1038/s41598-024-62513-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Accepted: 05/17/2024] [Indexed: 05/23/2024] Open

Reinen JM, Polosecki P, Castro E, Corcoran CM, Cecchi GA, Colibazzi T. Multimodal fusion of brain signals for robust prediction of psychosis transition. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2024;10:54. [PMID: 38773120 PMCID: PMC11109212 DOI: 10.1038/s41537-024-00464-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Accepted: 03/15/2024] [Indexed: 05/23/2024]

Chao X, Kai Z, Wu H, Wang J, Chen X, Su H, Shang X, Lin R, Huang L, He H, Lang J, Li L. Fragmentomics features of ovarian cancer. Int J Cancer 2024. [PMID: 38769763 DOI: 10.1002/ijc.34981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Revised: 03/14/2024] [Accepted: 04/02/2024] [Indexed: 05/22/2024]

Affiliation(s)

Xiaopei Chao Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
Zhentian Kai Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
Huanwen Wu Department of Pathology, Peking Union Medical College Hospital, Beijing, China
Jing Wang Department of Pathology, Peking Union Medical College Hospital, Beijing, China
Xiaojing Chen Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
Haiqi Su Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
Xiao Shang Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
Ruijue Lin Department of Technology, Zhejiang Topgen Clinical Laboratory Co., LTD., Huzhou, China
Lisha Huang Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
Hongsheng He Department of Bioinformatics, Zhejiang Shaoxing Topgen Biomedical Technology CO., LTD, Shanghai, China
Jinghe Lang Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China
Lei Li Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Beijing, China Department of Gynecologic Oncology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Beijing, China State Key Laboratory for Complex, Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, China

Lee HJ, Schwamm LH, Sansing LH, Kamel H, de Havenon A, Turner AC, Sheth KN, Krishnaswamy S, Brandt C, Zhao H, Krumholz H, Sharma R. StrokeClassifier: ischemic stroke etiology classification by ensemble consensus modeling using electronic health records. NPJ Digit Med 2024;7:130. [PMID: 38760474 PMCID: PMC11101464 DOI: 10.1038/s41746-024-01120-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024] Open

Abstract

Determining acute ischemic stroke (AIS) etiology is fundamental to secondary stroke prevention efforts but can be diagnostically challenging. We trained and validated an automated classification tool, StrokeClassifier, using electronic health record (EHR) text from 2039 non-cryptogenic AIS patients at 2 academic hospitals to predict the 4-level outcome of stroke etiology adjudicated by agreement of at least 2 board-certified vascular neurologists' review of the EHR. StrokeClassifier is an ensemble consensus meta-model of 9 machine learning classifiers applied to features extracted from discharge summary texts by natural language processing. StrokeClassifier was externally validated in 406 discharge summaries from the MIMIC-III dataset reviewed by a vascular neurologist to ascertain stroke etiology. Compared with vascular neurologists' diagnoses, StrokeClassifier achieved the mean cross-validated accuracy of 0.74 and weighted F1 of 0.74 for multi-class classification. In MIMIC-III, its accuracy and weighted F1 were 0.70 and 0.71, respectively. In binary classification, the two metrics ranged from 0.77 to 0.96. The top 5 features contributing to stroke etiology prediction were atrial fibrillation, age, middle cerebral artery occlusion, internal carotid artery occlusion, and frontal stroke location. We designed a certainty heuristic to grade the confidence of StrokeClassifier's diagnosis as non-cryptogenic by the degree of consensus among the 9 classifiers and applied it to 788 cryptogenic patients, reducing cryptogenic diagnoses from 25.2% to 7.2%. StrokeClassifier is a validated artificial intelligence tool that rivals the performance of vascular neurologists in classifying ischemic stroke etiology. With further training, StrokeClassifier may have downstream applications including its use as a clinical decision support system.
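The idea of grading confidence by the degree of consensus among several classifiers can be illustrated with a simple vote count. The sketch below is not the StrokeClassifier heuristic, which the abstract does not specify; the function name, the agreement threshold, and the placeholder labels are all hypothetical.

```python
from collections import Counter

def consensus_vote(predictions, min_agreement=0.7):
    """Return (majority label, fraction of classifiers agreeing, confident flag).

    Ties are broken in favor of the label seen first; min_agreement is a
    hypothetical threshold, not a value taken from the study.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    agreement = votes / len(predictions)
    return label, agreement, agreement >= min_agreement

# Nine hypothetical base-classifier outputs for one patient.
votes = ["etiology_A"] * 6 + ["etiology_B"] * 2 + ["etiology_C"]
label, agreement, confident = consensus_vote(votes)
```

Here six of nine classifiers agree, so the agreement is about 0.67 and the case is not flagged as confident at the assumed 0.7 threshold.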

Ebrahimi A, Henriksen MBH, Brasen CL, Hilberg O, Hansen TF, Jensen LH, Peimankar A, Wiil UK. Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study. BMC Med Res Methodol 2024;24:114. [PMID: 38760718 PMCID: PMC11100078 DOI: 10.1186/s12874-024-02231-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2023] [Accepted: 04/23/2024] [Indexed: 05/19/2024] Open

Abstract

BACKGROUND

Smoking is a critical risk factor responsible for over eight million annual deaths worldwide. It is essential to obtain information on smoking habits to advance research and implement preventive measures such as screening of high-risk individuals. In most countries, including Denmark, smoking habits are not systematically recorded and at best documented within unstructured free-text segments of electronic health records (EHRs). This would require researchers and clinicians to manually navigate through extensive amounts of unstructured data, which is one of the main reasons that smoking habits are rarely integrated into larger studies. Our aim is to develop machine learning models to classify patients' smoking status from their EHRs.

METHODS

This study proposes an efficient natural language processing (NLP) pipeline capable of classifying patients' smoking status and providing explanations for the decisions. The proposed NLP pipeline comprises four distinct components: (1) preprocessing techniques to address abbreviations, punctuation, and other textual irregularities; (2) four cutting-edge feature extraction techniques, i.e., Embedding, BERT, Word2Vec, and Count Vectorizer, employed to extract the optimal features; (3) a Stacking-based Ensemble (SE) model and a Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) for the identification of smoking status; and (4) a local interpretable model-agnostic explanation to explain the decisions rendered by the detection models. The EHRs of 23,132 patients with suspected lung cancer were collected from the Region of Southern Denmark during the period 1/1/2009-31/12/2018. A medical professional annotated the data into 'Smoker' and 'Non-Smoker' with further classifications as 'Active-Smoker', 'Former-Smoker', and 'Never-Smoker'. Subsequently, the annotated dataset was used for the development of binary and multiclass classification models. An extensive comparison was conducted of the detection performance across various model architectures.

RESULTS

The results of experimental validation confirm the consistency among the models. However, for binary classification, the BERT method with the CNN-LSTM architecture outperformed other models, achieving precision, recall, and F1-scores between 97% and 99% for both Never-Smokers and Active-Smokers. In multiclass classification, the Embedding technique with the CNN-LSTM architecture yielded the most favorable results in class-specific evaluations, with equal performance measures of 97% for Never-Smoker and measures in the range of 86 to 89% for Active-Smoker and 91-92% for Never-Smoker.

CONCLUSION

Our proposed NLP pipeline achieved a high level of classification performance. In addition, we presented the explanation of the decision made by the best performing detection model. Future work will expand the model's capabilities to analyze longer notes and a broader range of categories to maximize its utility in further research and screening applications.

Li X, Jones P, Zhao M. Identifying potential (re)hemorrhage among sporadic cerebral cavernous malformations using machine learning. Sci Rep 2024;14:11022. [PMID: 38745042 PMCID: PMC11094099 DOI: 10.1038/s41598-024-61851-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 05/10/2024] [Indexed: 05/16/2024] Open

Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: Leveraging Multiple Reference Panels to Improve TWAS Power by Ensemble Machine Learning. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.06.20.23291605. [PMID: 37425698 PMCID: PMC10327185 DOI: 10.1101/2023.06.20.23291605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]

Zhang X, Chen S, Zhang P, Wang C, Wang Q, Zhou X. Staging of Liver Fibrosis Based on Energy Valley Optimization Multiple Stacking (EVO-MS) Model. Bioengineering (Basel) 2024;11:485. [PMID: 38790352 PMCID: PMC11117710 DOI: 10.3390/bioengineering11050485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2024] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024] Open

Shen J, Wang S, Sun H, Huang J, Bai L, Wang X, Dong Y, Tang Z. A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data. BMC Med Res Methodol 2024;24:105. [PMID: 38702624 PMCID: PMC11067084 DOI: 10.1186/s12874-024-02232-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Accepted: 04/23/2024] [Indexed: 05/06/2024] Open

Abstract

BACKGROUND

Survival prediction using high-dimensional molecular data is a hot topic in the field of genomics and precision medicine, especially for cancer studies. Considering that carcinogenesis has a pathway-based pathogenesis, developing models using such group structures is a closer mimic of disease progression and prognosis. Many approaches can be used to integrate group information; however, most of them are single-model methods, which may account for unstable prediction.

METHODS

We introduced a novel survival stacking method that uses group structure information to improve the robustness of cancer survival prediction in the context of high-dimensional omics data. With a super learner, survival stacking combines the predictions from multiple sub-models that are independently trained using the features in pre-grouped biological pathways. In addition to a non-negative linear combination of sub-models, we extended the super learner to a non-negative Bayesian hierarchical generalized linear model and an artificial neural network. We compared the proposed modeling strategy with the widely used penalized survival method Lasso Cox and several group penalized methods, e.g., group Lasso Cox, via a simulation study and a real-world data application.
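A non-negative linear combination of sub-model predictions can be illustrated with a small projected gradient descent. This is a simplified stand-in, not the authors' method: it minimizes squared error on a continuous target rather than a survival loss, it assumes the rows of `preds` are out-of-fold predictions from each sub-model, and the function name, step size, and iteration count are hypothetical choices for this sketch.

```python
def nonneg_stack_weights(preds, y, steps=5000, lr=0.01):
    """Fit weights w >= 0 that minimize the mean squared error of
    sum_k w[k] * preds[k][i] against y[i], by projected gradient descent."""
    k, n = len(preds), len(y)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        # Residuals of the current combined prediction.
        resid = [sum(w[j] * preds[j][i] for j in range(k)) - y[i] for i in range(n)]
        for j in range(k):
            grad = 2.0 / n * sum(resid[i] * preds[j][i] for i in range(n))
            w[j] = max(0.0, w[j] - lr * grad)  # gradient step, then clip at zero
    return w

# Hypothetical example: sub-model 1 is perfect, sub-model 2 is anti-correlated.
y = [1.0, 2.0, 3.0, 4.0]
weights = nonneg_stack_weights([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]], y)
```

In this toy case the weights converge toward 1 for the informative sub-model and 0 for the other. The clipping step is what keeps every weight non-negative, so an unhelpful sub-model is dropped rather than given a negative coefficient.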

RESULTS

The proposed survival stacking method showed superior and robust performance in terms of discrimination compared with single-model methods in case of high-noise simulated data and real-world data. The non-negative Bayesian stacking method can identify important biological signal pathways and genes that are associated with the prognosis of cancer.

CONCLUSIONS

This study proposed a novel survival stacking strategy that incorporates biological group information into cancer prognosis models. Additionally, this study extended the super learner to a non-negative Bayesian model and an ANN, enriching the ways in which sub-models can be combined. The proposed Bayesian stacking strategy exhibited favorable properties in the prediction and interpretation of complex survival data, which may aid in discovering cancer targets.

Affiliation(s)

Junjie Shen Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Shuo Wang Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, 79085, Freiburg, Germany
Hao Sun Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Jie Huang Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Lu Bai Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Xichao Wang Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Yongfei Dong Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
Zaixiang Tang Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China.

Pratyush P, Bahmani S, Pokharel S, Ismail HD, KC DB. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model. Bioinformatics 2024;40:btae290. [PMID: 38662579 PMCID: PMC11088740 DOI: 10.1093/bioinformatics/btae290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 02/13/2024] [Accepted: 04/24/2024] [Indexed: 05/13/2024] Open

Abstract

MOTIVATION

Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted.

RESULTS

Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer's encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (or, peptide/fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets.

AVAILABILITY AND IMPLEMENTATION

LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.

Cao D, Hu M, Zhi D, Liang J, Tan Q, Lei Q, Li M, Cheng H, Wang L, Dai W. Systematic evaluation of machine learning-enhanced trifocal IOL power selection for axial myopia cataract patients. Comput Biol Med 2024;173:108245. [PMID: 38531253 DOI: 10.1016/j.compbiomed.2024.108245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Revised: 03/03/2024] [Accepted: 03/04/2024] [Indexed: 03/28/2024]

Abstract

PURPOSE

This study aimed to evaluate and optimize intraocular lens (IOL) power selection for cataract patients with high axial myopia receiving trifocal IOLs.

DESIGN

A multi-center, retrospective observational case series was conducted. Patients having an axial length ≥26 mm and undergoing cataract surgery with trifocal IOL implanted were studied.

METHODS

Preoperative biometric and postoperative outcome data from 139 eyes were collected to train and test various machine learning (ML) models (support vector machine, linear regression, and stacking regressor) using five-fold cross-validation. The models' performance was further validated externally using data from 48 eyes enrolled from other hospitals. The performance of seven IOL calculation formulas (BUII, Kane, EVO, K6, DGS, Holladay I, and SRK/T) was examined with and without ML models.

RESULTS

The results of cross-validation revealed improvements across all IOL calculation formulas, especially for K6 and Holladay I. The model increased the percentage of eyes with a prediction error (PE) within ±0.50 D from 71.94% to 79.14% for K6, and from 35.25% to 51.80% for Holladay I. In external validation involving 48 patients from other centers, six out of seven formulas demonstrated a reduction in the mean absolute error (MAE). K6's PE within ±0.50 D improved from 62.50% to 77.08%, and Holladay I from 16.67% to 58.33%.
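The two summary statistics used in these results, the mean absolute error and the percentage of eyes with a prediction error within ±0.50 D, are straightforward to compute. The sketch below assumes the tolerance is inclusive and uses made-up error values; the function name is hypothetical. The final lines check, as arithmetic only, which eye counts out of 139 and 48 reproduce the percentages reported for K6.

```python
def summarize_prediction_errors(errors, tol=0.50):
    """Return (MAE, percentage of errors with |PE| <= tol) for errors in diopters."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    pct_within = 100.0 * sum(1 for e in errors if abs(e) <= tol) / n
    return mae, pct_within

# Hypothetical prediction errors (D), not data from the study.
mae, pct = summarize_prediction_errors([0.10, -0.45, 0.60, -0.75])

# Counts that reproduce the reported K6 percentages (an inference, not stated in the abstract).
k6_internal = (round(100 * 100 / 139, 2), round(100 * 110 / 139, 2))  # 100 and 110 of 139 eyes
k6_external = (round(100 * 30 / 48, 2), round(100 * 37 / 48, 2))      # 30 and 37 of 48 eyes
```

Read this way, the cross-validated gain for K6 corresponds to about ten additional eyes out of 139 falling within ±0.50 D, and the external gain to about seven additional eyes out of 48.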

CONCLUSIONS

In this study, we conducted a comprehensive evaluation of seven IOL power calculation formulas in high axial myopia cases and explored the effectiveness of the Stacking Regressor model in augmenting their accuracy. Of these formulas, K6 and Holladay I exhibited the most significant improvements, suggesting that integrating ML may have varying levels of effectiveness across different formulas but holds substantial promise in improving the predictability of IOL power calculations in patients with long eyes.

Shen Z, Zhong Y, Wang Y, Zhu H, Liu R, Yu S, Zhang H, Wang M, Yang T, Zhang M. A computational approach to estimate postmortem interval using postmortem computed tomography of multiple tissues based on animal experiments. Int J Legal Med 2024;138:1093-1107. [PMID: 37999765 DOI: 10.1007/s00414-023-03127-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 10/27/2023] [Indexed: 11/25/2023]

Lee Y, Park J, Lee CO. Parareal Neural Networks Emulating a Parallel-in-Time Algorithm. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024;35:6353-6364. [PMID: 36173779 DOI: 10.1109/tnnls.2022.3206797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]

Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024;56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024] Open

Abstract

BACKGROUND

Metabolic disturbances adversely impact productive and reproductive performance of dairy cattle due to changes in endocrine status and immune function, which increase the risk of disease. This may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improved health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. The GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENET), and the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), which combines the ENET and GBM approaches.

RESULTS

The results show that the Stack approach outperformed other methods with a relative difference (RD), calculated as an increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RD in prediction accuracy between other models with respect to GBLUP was observed for haptoglobin (hapto) from 17.7% for BayesB to 41.2% for Stack; for Zn from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp) from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP) from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt) from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected from the GBM approach improved the accuracy for GBLUP from 1.8 to 76.5%. However, for the other models reductions in prediction accuracy of 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits) were observed.

CONCLUSIONS

Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM and seemed to be competitive for predicting complex phenotypes with various degrees of mode of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved accuracy of GBLUP.

Affiliation(s)

Lucio F M Mota Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
Diana Giannuzzi Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Sara Pegolo Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
Enrico Sturaro Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
Daniel Gianola Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
Riccardo Negrini Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Erminio Trevisi Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Paolo Ajmone Marsan Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
Alessio Cecchinato Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy

Zhang Z, Zhou L, Wu Y, Wang N. The meta-learning method for the ensemble model based on situational meta-task. Front Neurorobot 2024;18:1391247. [PMID: 38736985 PMCID: PMC11082275 DOI: 10.3389/fnbot.2024.1391247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2024] [Accepted: 04/04/2024] [Indexed: 05/14/2024] Open

Tran XTD, Phan TL, To VT, Tran NVN, Nguyen NNS, Nguyen DNH, Tran NTN, Truong TN. Integration of the Butina algorithm and ensemble learning strategies for the advancement of a pharmacophore ligand-based model: an in silico investigation of apelin agonists. Front Chem 2024;12:1382319. [PMID: 38690013 PMCID: PMC11058650 DOI: 10.3389/fchem.2024.1382319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 03/18/2024] [Indexed: 05/02/2024] Open

Li C, Wang L, Sun D, Chen Y. An ensemble framework-based approach for modeling stability of expansive soil slopes: fusion of machine learning algorithms and protection structure disease data. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024;31:24375-24397. [PMID: 38441739 DOI: 10.1007/s11356-024-32583-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 02/18/2024] [Indexed: 04/07/2024]

Shen C, Zhan W, Xin K, Li M, Sun Z, Cong H, Xu C, Tang J, Wu Z, Xu B, Wei Z, Xue C, Zhao C, Wang Z. Machine-learning-assisted and real-time-feedback-controlled growth of InAs/GaAs quantum dots. Nat Commun 2024;15:2724. [PMID: 38553435 PMCID: PMC10980817 DOI: 10.1038/s41467-024-47087-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Accepted: 03/15/2024] [Indexed: 04/02/2024] Open

Affiliation(s)

Chao Shen Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; School of Physics Science and Technology, Xinjiang University, Urumqi, Xinjiang, 830046, China
Wenkang Zhan Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
Kaiyao Xin College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
Manyang Li Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
Zhenyu Sun Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
Hui Cong College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
Chi Xu College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
Jian Tang School of Physical and Electronic Engineering, Yancheng Teachers University, Yancheng, 224002, China
Zhaofeng Wu School of Physics Science and Technology, Xinjiang University, Urumqi, Xinjiang, 830046, China
Bo Xu Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
Zhongming Wei College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
Chunlai Xue College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China; Key Laboratory of Optoelectronic Materials and Devices, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
Chao Zhao Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China
Zhanguo Wang Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Science, Beijing, 101804, China

Wang J, Wu DD, DeLorenzo C, Yang J. Examining factors related to low performance of predicting remission in participants with major depressive disorder using neuroimaging data and other clinical features. PLoS One 2024;19:e0299625. [PMID: 38547128 PMCID: PMC10977765 DOI: 10.1371/journal.pone.0299625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Accepted: 02/13/2024] [Indexed: 04/02/2024] Open

Abstract

Major depressive disorder (MDD), a prevalent mental health issue, affects more than 8% of the US population and almost 17% of young adults aged 18-25 years. Since Covid-19, its prevalence has become even more significant. However, the remission (being free of depression) rates of first-line antidepressant treatments for MDD are only about 30%. To improve treatment outcomes, researchers have built various predictive models for treatment response, yet none of them has been adopted in clinical use. One reason is that most predictive models are based on data from subjective questionnaires, which are less reliable. Neuroimaging data are promising objective prognostic factors, but they are expensive to obtain; hence predictive models using neuroimaging data are limited, and such studies have usually been small in scale (N<100). In this paper, we proposed an advanced machine learning (ML) pipeline for small training datasets with a large number of features. We implemented multiple imputation for missing data and repeated K-fold cross validation (CV) to robustly estimate predictive performance. Different feature selection methods and stacking methods using 6 general ML models, including random forest, gradient boosting decision tree, XGBoost, penalized logistic regression, support vector machine (SVM), and neural network, were examined to evaluate model performance. All predictive models were compared using performance metrics such as accuracy, balanced accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Our proposed ML pipeline was applied to a training dataset and obtained an accuracy and AUC above 0.80. However, this performance did not hold when the pipeline was applied to an external validation dataset from the multi-center EMBARC study. We further examined the possible reasons, especially the site heterogeneity issue.
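
The repeated K-fold cross validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `repeated_kfold`, the fold count, the number of repeats, and the seed are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated K-fold cross validation.

    Hypothetical helper: each repeat reshuffles the sample indices and
    partitions them into k disjoint folds; every fold is the test set once.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                        # a fresh random partition per repeat
        folds = [idx[i::k] for i in range(k)]   # k disjoint, near-equal folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test

# 20 samples, 5 folds, 3 repeats: 15 train/test splits in total.
splits = list(repeated_kfold(n_samples=20, k=5, repeats=3))
```

Averaging a metric over all splits reduces the dependence of the estimate on any single random partition, which is the usual motivation for repeating K-fold CV on small datasets.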

Bacon KL, Felson DT, Jafarzadeh SR, Kolachalama VB, Hausdorff JM, Gazit E, Stefanik JJ, Corrigan P, Segal NA, Lewis CE, Nevitt MC, Kumar D. Gait Alterations and Association With Worsening Knee Pain and Physical Function: A Machine Learning Approach With Wearable Sensors in the Multicenter Osteoarthritis Study. Arthritis Care Res (Hoboken) 2024. [PMID: 38523250 DOI: 10.1002/acr.25327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 01/23/2024] [Accepted: 03/21/2024] [Indexed: 03/26/2024]

Shen J, Wang S, Dong Y, Sun H, Wang X, Tang Z. A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data. BMC Bioinformatics 2024;25:119. [PMID: 38509499 PMCID: PMC10953151 DOI: 10.1186/s12859-024-05741-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2023] [Accepted: 03/11/2024] [Indexed: 03/22/2024] Open

Datta S, Nabeel Asim M, Dengel A, Ahmed S. NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences. Brief Funct Genomics 2024;23:163-179. [PMID: 37248673 DOI: 10.1093/bfgp/elad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 05/02/2023] [Indexed: 05/31/2023] Open

Abstract

Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade it, which leads toward failure of intracellular processes. Tyrosine nitration (NT) modification degrades a protein's activity, which initiates and propagates various diseases including neurodegenerative, cardiovascular, and autoimmune diseases and carcinogenesis. Identification of NT modification supports development of novel therapies and drug discoveries for associated diseases. Identification of NT modification in biochemical labs is expensive, time consuming and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to precisely identify NT modification, due to the extraction of irrelevant, redundant and less discriminative features from protein sequences. This paper presents the NTpred framework that is competent in extracting comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of individual encodings. Furthermore, it eradicates irrelevant and redundant features from eight different feature spaces through a Recursive Feature Elimination process. Selected features of four individual encodings and four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation with combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.
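
The two-level design described in this abstract (base classifiers per feature space, whose probability scores become the inputs of a logistic regression) can be sketched in a few lines. This is an illustration under stated assumptions, not the NTpred code: the data are synthetic, three feature views stand in for the eight feature spaces, a toy nearest-centroid scorer stands in for the Gradient Boosted Tree classifiers, and the use of out-of-fold scores for the meta-features is an assumption the abstract does not confirm. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_centroids(X, y):
    """Toy base learner (stand-in for a gradient boosted tree): class means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def predict_proba(model, x):
    """Probability-like score: nearer the positive centroid gives a value nearer 1."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return sigmoid(d_neg - d_pos)

def take(X, cols):
    return [[x[c] for c in cols] for x in X]

def out_of_fold_scores(X, y, views, k=5):
    """One meta-feature column per view; each row is scored by a model that never saw it."""
    n = len(X)
    Z = [[0.0] * len(views) for _ in range(n)]
    for v, cols in enumerate(views):
        Xv = take(X, cols)
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            model = fit_centroids([Xv[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):
                Z[i][v] = predict_proba(model, Xv[i])
    return Z

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b

# Synthetic data (assumption): 60 samples, 4 features, class 1 shifted by +1.5.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(1.5 * t, 1.0) for _ in range(4)] for t in y]

views = [(0, 1), (2, 3), (0, 1, 2, 3)]  # two individual "encodings" and one fused view
Z = out_of_fold_scores(X, y, views)     # the new probabilistic feature space
w, b = fit_logistic(Z, y)
pred = [int(sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5) for z in Z]
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
```

The `accuracy` value here is measured on the same rows used to fit the meta-learner, so it only confirms that the sketch runs end to end; it is not a performance estimate.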

Brar AS, Singh K. A multi-objective stacked regression method for distance based colour measuring device. Sci Rep 2024;14:5530. [PMID: 38448462 PMCID: PMC10918078 DOI: 10.1038/s41598-024-54785-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open

M'hamdi O, Takács S, Palotás G, Ilahy R, Helyes L, Pék Z. A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data. PLANTS (BASEL, SWITZERLAND) 2024;13:746. [PMID: 38475592 DOI: 10.3390/plants13050746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2024] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024]

Wang S, Yam C, Chen S, Hu L, Li L, Hung FF, Fan J, Che CM, Chen G. Predictions of photophysical properties of phosphorescent platinum(II) complexes based on ensemble machine learning approach. J Comput Chem 2024;45:321-330. [PMID: 37861354 DOI: 10.1002/jcc.27238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/23/2023] [Indexed: 10/21/2023]

Tu JB, Liao WJ, Liu WC, Gao XH. Using machine learning techniques to predict the risk of osteoporosis based on nationwide chronic disease data. Sci Rep 2024;14:5245. [PMID: 38438569 PMCID: PMC10912338 DOI: 10.1038/s41598-024-56114-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Accepted: 03/01/2024] [Indexed: 03/06/2024] Open

Abstract

Osteoporosis is a major public health concern that significantly increases the risk of fractures. The aim of this study was to develop a machine learning-based predictive model to screen individuals at high risk of osteoporosis based on chronic disease data, thus facilitating early detection and personalized management. A total of 10,000 complete patient records of primary healthcare data in the German Disease Analyzer database (IMS HEALTH) were included, of which 1293 were diagnosed with osteoporosis and 8707 were not. The demographic characteristics and chronic disease data, including age, gender, lipid disorder, cancer, COPD, hypertension, heart failure, CHD, diabetes, chronic kidney disease, and stroke, were collected from electronic health records. Ten different machine learning algorithms were employed to construct the predictive model. The performance of the model was further validated and the relative importance of features in the model was analyzed. Out of the ten machine learning algorithms, the Stacker model based on Logistic Regression, AdaBoost Classifier, and Gradient Boosting Classifier demonstrated superior performance. The Stacker model demonstrated excellent performance through ten-fold cross-validation on the training set and ROC curve analysis on the test set. The confusion matrix, lift curve and calibration curves indicated that the Stacker model had optimal clinical utility. Further analysis of feature importance highlighted age, gender, lipid metabolism disorders, cancer, and COPD as the top five influential variables. In this study, a predictive model for osteoporosis based on chronic disease data was developed using machine learning. The model shows great potential in early detection and risk stratification of osteoporosis, ultimately facilitating personalized prevention and management strategies.

Öhlschuster M, Comiskey D, Kavanagh M, Kickinger F, Scaldaferri C, Sigler M, Nilsen P. On the prediction of SAV transmission among Norwegian aquaculture sites. Prev Vet Med 2024;224:106095. [PMID: 38232517 DOI: 10.1016/j.prevetmed.2023.106095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2023] [Revised: 11/28/2023] [Accepted: 12/12/2023] [Indexed: 01/19/2024]

Ibrahim M, Beneyto A, Contreras I, Vehi J. An ensemble machine learning approach for the detection of unannounced meals to enhance postprandial glucose control. Comput Biol Med 2024;171:108154. [PMID: 38382387 DOI: 10.1016/j.compbiomed.2024.108154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2023] [Revised: 02/02/2024] [Accepted: 02/12/2024] [Indexed: 02/23/2024]

Abstract

BACKGROUND

Hybrid automated insulin delivery systems enhance postprandial glucose control in type 1 diabetes; however, meal announcements are burdensome. To overcome this, we propose a machine learning-based automated meal detection approach.

METHODS

A heterogeneous ensemble method combining an artificial neural network, random forest, and logistic regression was employed. It was trained and tested on data from two in-silico cohorts comprising 20 and 47 patients, and accounted for various meal sizes (moderate to high) and glucose appearance rates (slow and rapid absorbing). To produce an optimal prediction model, three ensemble configurations were used: logical AND, majority voting, and logical OR. In addition to the in-silico data, the proposed meal detector was also trained and tested using the OhioT1DM dataset. Finally, the meal detector was combined with a bolus insulin compensation scheme.

RESULTS

The ensemble majority voting obtained the best meal detector results for both the in-silico and OhioT1DM cohorts with a sensitivity of 77%, 94%, 61%, precision of 96%, 89%, 72%, F1-score of 85%, 91%, 66%, and with false positives per day values of 0.05, 0.19, 0.17, respectively. Automatic meal detection with insulin compensation was performed in open-loop insulin therapy using the AND ensemble, chosen for its lower false positive rate. Time-in-range increased significantly, by 10.48% and 16.03%, time above range was reduced by 5.16% and 11.85%, and time below range increased minimally, by 0.35% and 2.69%, for the two in-silico cohorts, respectively, compared to the results without a meal detector.

CONCLUSION

To increase the overall accuracy and robustness of the predictions, this ensemble methodology aims to take advantage of each base model's strengths. All of the results point to the potential application of the proposed meal detector as a separate module for the detection of meals in automated insulin delivery systems to achieve improved glycemic control.
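
The three ensemble configurations named in this abstract (logical AND, majority voting, logical OR) differ only in how many base models must flag a meal before the ensemble does. A minimal sketch, assuming each base model emits a binary detection (1 for meal, 0 for no meal); the function name `combine` and the example votes are hypothetical and not taken from the paper:

```python
def combine(votes, rule):
    """Combine binary detections (0 or 1) from several base models."""
    if rule == "and":        # every model must agree: fewest detections
        return int(all(votes))
    if rule == "or":         # any single model suffices: most detections
        return int(any(votes))
    if rule == "majority":   # strictly more than half of the models
        return int(2 * sum(votes) > len(votes))
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical outputs of three base models for one time window.
votes = (1, 0, 1)
decisions = {rule: combine(votes, rule) for rule in ("and", "majority", "or")}
```

Because AND fires only when all base models agree, it tends to produce the fewest false positives at the cost of sensitivity, which is consistent with the abstract's stated reason for choosing the AND ensemble for insulin compensation.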

Chen D, Gu X, Guo H, Cheng T, Yang J, Zhan Y, Fu Q. Spatiotemporally continuous PM2.5 dataset in the Mekong River Basin from 2015 to 2022 using a stacking model. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024;914:169801. [PMID: 38184264 DOI: 10.1016/j.scitotenv.2023.169801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/03/2023] [Revised: 12/13/2023] [Accepted: 12/29/2023] [Indexed: 01/08/2024]

Affiliation(s)

Debao Chen National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Xingfa Gu National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China; School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China
Hong Guo National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Tianhai Cheng National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Jian Yang National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Yulin Zhan National Engineering Laboratory for Satellite Remote Sensing Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Qiming Fu School of Remote Sensing and Information Engineering, North China Institute of Aerospace Engineering, Langfang, China

Kabir E, Guikema SD, Quiring SM. Power outage prediction using data streams: An adaptive ensemble learning approach with a feature- and performance-based weighting mechanism. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2024;44:686-704. [PMID: 37666505 DOI: 10.1111/risa.14211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/06/2023]

Kikuchi Y, Kawczynski MG, Anegondi N, Neubert A, Dai J, Ferrara D, Quezada-Ruiz C. Machine Learning to Predict Faricimab Treatment Outcome in Neovascular Age-Related Macular Degeneration. OPHTHALMOLOGY SCIENCE 2024;4:100385. [PMID: 37868796 PMCID: PMC10585644 DOI: 10.1016/j.xops.2023.100385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Revised: 08/07/2023] [Accepted: 08/10/2023] [Indexed: 10/24/2023]

Abstract

Purpose

To develop machine learning (ML) models to predict, at baseline, treatment outcomes at month 9 in patients with neovascular age-related macular degeneration (nAMD) receiving faricimab.

Design

Retrospective proof of concept study.

Participants

Patients enrolled in the phase II AVENUE trial (NCT02484690) of faricimab in nAMD.

Methods

Baseline characteristics and spectral domain-OCT (SD-OCT) image data from 185 faricimab-treated eyes were split into 80% training and 20% test sets at the patient level. Input variables were baseline age, sex, best-corrected visual acuity (BCVA), central subfield thickness (CST), low luminance deficit, treatment arm, and SD-OCT images. A regression problem (BCVA) and a binary classification problem (reduction of CST by 35%) were considered. Overall, 10 models were developed and tested for each problem. Benchmark classical ML models (linear, random forest, extreme gradient boosting) were trained on baseline characteristics; benchmark deep neural networks (DNNs) were trained on baseline SD-OCT B-scans. Baseline characteristics and SD-OCT data were merged using 2 approaches: model stacking (using DNN prediction as an input feature for classical ML models) and model averaging (which averaged predictions from the DNN using SD-OCT volume and from classical ML models using baseline characteristics).

Main Outcome Measures

Treatment outcomes were defined by 2 target variables: functional (BCVA letter score) and anatomical (percent decrease in CST from baseline) outcomes at month 9.

Results

The best-performing BCVA regression model with respect to the test coefficient of determination (R²) was the linear model in the model-stacking approach, with an R² of 0.31. The best-performing CST classification model with respect to the test area under the receiver operating characteristic curve (AUROC) was the benchmark linear model, with an AUROC of 0.87. A post hoc analysis showed that the baseline BCVA and the baseline CST had the most effect in the all-model prediction for BCVA regression and CST classification, respectively.

Conclusions

Promising signals for predicting treatment outcomes from baseline characteristics were detected; however, the predictive benefit of baseline images was unclear in this proof-of-concept study. Further testing and validation with larger, independent datasets is required to fully explore the predictive capacity of ML models using baseline imaging data.

Financial Disclosures

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Song A, Lusk JB, Roh KM, Hsu ST, Valikodath NG, Lad EM, Muir KW, Engelhard MM, Limkakeng AT, Izatt JA, McNabb RP, Kuo AN. RobOCTNet: Robotics and Deep Learning for Referable Posterior Segment Pathology Detection in an Emergency Department Population. Transl Vis Sci Technol 2024;13:12. [PMID: 38488431 PMCID: PMC10946693 DOI: 10.1167/tvst.13.3.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 01/31/2024] [Indexed: 03/19/2024] Open