Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021;61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 12/21/2022]

Cited by Other Article(s)

Garg A, Ramamurthi N, Das SS. Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML. J Chem Inf Model 2025;65:3976-3989. [PMID: 40230275 DOI: 10.1021/acs.jcim.5c00023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]

Abstract

The classification models built on class imbalanced data sets tend to prioritize the accuracy of the majority class, and thus, the minority class generally has a higher misclassification rate. Different techniques are available to address the class imbalance in classification models and can be categorized as data-level, algorithm-level, and hybrid methods. But to the best of our knowledge, an in-depth analysis of the performance of these techniques against the class ratio is not available in the literature. We have addressed these shortcomings in this study and have performed a detailed analysis of the performance of four different techniques to address imbalanced class distribution using machine learning (ML) methods and AutoML tools. To carry out our study, we have selected four such techniques: (a) threshold optimization using (i) GHOST and (ii) the area under the precision-recall curve (AUPR), (b) the internal balancing method of AutoML and the class-weight option of machine learning methods, and (c) data balancing using SMOTETomek. We generated 27 data sets considering nine different class ratios (i.e., the ratio of the positive class to total samples) from three data sets that belong to the drug discovery and development field. We have employed random forest (RF) and support vector machine (SVM) as representatives of ML classifiers and AutoGluon-Tabular (version 0.6.1) and H2O AutoML (version 3.40.0.4) as representatives of AutoML tools.
The important findings of our studies are as follows: (i) there is no effect of threshold optimization on ranking metrics such as AUC and AUPR, but AUC and AUPR are affected by class-weighting and SMOTETomek; (ii) for ML methods RF and SVM, significant percentage improvement up to 375, 33.33, and 450 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy, which are suitable for performance evaluation of imbalanced data sets; (iii) for AutoML libraries AutoGluon-Tabular and H2O AutoML, significant percentage improvement up to 383.33, 37.25, and 533.33 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy; (iv) the general pattern of percentage improvement in balanced accuracy is that the percentage improvement increases when the class ratio is systematically decreased from 0.5 to 0.1; in the case of F1 score and MCC, maximum improvement is achieved at the class ratio of 0.3; (v) for both ML and AutoML with balancing, it is observed that any individual class-balancing technique does not outperform all other methods on a significantly higher number of data sets based on F1 score; (vi) the three external balancing techniques combined outperformed the internal balancing methods of the ML and AutoML; (vii) AutoML tools perform as well as the ML models and in some cases perform even better for handling imbalanced classification when applied with imbalance handling techniques. In summary, exploration of multiple data balancing techniques is recommended for classifying imbalanced data sets to achieve optimal performance, as neither the external techniques nor the internal techniques outperform the others significantly. The results are specific to the ML methods and AutoML libraries used in this study, and for generalization, a study can be carried out considering a sizable number of ML methods and AutoML libraries.
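
Finding (i) follows from what threshold optimization does: moving the decision threshold changes the predicted labels but not the ordering of the scores, so ranking metrics stay put while threshold-dependent metrics such as balanced accuracy can shift. The Python sketch below illustrates this with a plain grid search over candidate thresholds. The labels, scores, candidate grid, and function names are all hypothetical, and the search is a simplified stand-in, not the GHOST procedure, which the abstract does not describe in detail.

```python
def balanced_accuracy(labels, scores, threshold):
    """Mean of sensitivity and specificity at a given decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_threshold(labels, scores, candidates):
    """Return the candidate threshold with the highest balanced accuracy."""
    return max(candidates, key=lambda t: balanced_accuracy(labels, scores, t))


# Hypothetical imbalanced data: 2 positives, 8 negatives (class ratio 0.2).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02]

# Hypothetical candidate grid; the scores themselves are never modified,
# so any metric computed from their ranking is unaffected by the choice.
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]
default_ba = balanced_accuracy(labels, scores, 0.5)
chosen = best_threshold(labels, scores, candidates)
tuned_ba = balanced_accuracy(labels, scores, chosen)
print(f"default BA={default_ba}, chosen threshold={chosen}, tuned BA={tuned_ba}")
```

In practice the threshold would be selected on training or validation data only and then applied to held-out data, so that the reported improvement is not optimistic.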

Al-Ahmari S, Nadeem F. Improving Surgical Site Infection Prediction Using Machine Learning: Addressing Challenges of Highly Imbalanced Data. Diagnostics (Basel) 2025;15:501. [PMID: 40002652 PMCID: PMC11854898 DOI: 10.3390/diagnostics15040501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 02/13/2025] [Accepted: 02/17/2025] [Indexed: 02/27/2025] Open

Abstract

Background: Surgical site infections (SSIs) lead to higher hospital readmission rates and healthcare costs, representing a significant global healthcare burden. Machine learning (ML) has demonstrated potential in predicting SSIs; however, the challenge of addressing imbalanced class ratios remains. Objectives: The aim of this study is to evaluate and enhance the predictive capabilities of machine learning models for SSIs by assessing the effects of feature selection, resampling techniques, and hyperparameter optimization. Methods: Using routine SSI surveillance data from multiple hospitals in Saudi Arabia, we analyzed a dataset of 64,793 surgical patients, of whom 1632 developed SSI. Seven machine learning algorithms were created and tested: Decision Tree (DT), Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Stochastic Gradient Boosting (SGB), and K-Nearest Neighbors (KNN). We also improved several resampling strategies, such as undersampling and oversampling. Grid search with five-fold cross-validation was employed for comprehensive hyperparameter optimization, in conjunction with balanced sampling techniques. Features were selected using a filter method based on their relationships with the target variable. Results: Our findings revealed that RF achieves the highest performance, with an MCC of 0.72. The synthetic minority oversampling technique (SMOTE) is the best-performing resampling technique, consistently enhancing the performance of most machine learning models, except for LR and GNB. LR struggles with class imbalance due to its linear assumptions and bias toward the majority class, while GNB's reliance on feature independence and Gaussian distribution makes it unreliable for under-represented minority classes. For computational efficiency, the Instance Hardness Threshold (IHT) offers a viable alternative undersampling technique, though it may compromise performance to some extent.
Conclusions: This study underscores the potential of ML models as effective tools for assessing SSI risk, warranting further clinical exploration to improve patient outcomes. By employing advanced ML techniques and robust validation methods, these models demonstrate promising accuracy and reliability in predicting SSI events, even in the face of significant class imbalances. In addition, using MCC in this study ensures a more reliable and robust evaluation of the model's predictive performance, particularly in the presence of an imbalanced dataset, where other metrics may fail to provide an accurate evaluation.
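
The point about MCC can be made concrete with the cohort sizes reported above (64,793 patients, 1632 with SSI). The Python sketch below scores a hypothetical classifier that predicts "no SSI" for every patient; this classifier is an illustrative assumption, not a model from the study, and the function names are made up for the example. Returning 0.0 when the MCC denominator is zero is a common convention rather than part of the definition.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Cohort sizes taken from the abstract.
total, positives = 64793, 1632
negatives = total - positives

# Hypothetical classifier that always predicts the majority class.
tp, fp, tn, fn = 0, 0, negatives, positives

acc = accuracy(tp, fp, tn, fn)
score = mcc(tp, fp, tn, fn)
print(f"accuracy={acc:.3f}, MCC={score:.2f}")
```

Accuracy comes out above 0.97 even though no SSI case is detected, while MCC is 0, which is why a correlation-style metric is the more informative summary at this class ratio.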

Siddique F, Lee BK. Predicting adolescent psychopathology from early life factors: A machine learning tutorial. Global Epidemiology 2024;8:100161. [PMID: 39279846 PMCID: PMC11402309 DOI: 10.1016/j.gloepi.2024.100161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 07/10/2024] [Accepted: 08/27/2024] [Indexed: 09/18/2024] Open

Abstract

Objective

The successful implementation and interpretation of machine learning (ML) models in epidemiological studies can be challenging without an extensive programming background. We provide a didactic example of machine learning for risk prediction in this study by determining whether early life factors could be useful for predicting adolescent psychopathology.

Methods

In total, 9643 adolescents ages 9-10 from the Adolescent Brain and Cognitive Development (ABCD) Study were included in ML analysis to predict high Child Behavior Checklist (CBCL) scores (i.e., t-scores ≥ 60). ML models were constructed using a series of predictor combinations (prenatal, family history, sociodemographic) across 5 different algorithms. We assessed ML performance through sensitivity, specificity, F1-score, and area under the curve (AUC) metrics.

Results

A total of 1267 adolescents (13.1%) were found to have high CBCL scores. The best performing algorithms were elastic net and gradient boosted trees. The best performing elastic net models included prenatal and family history factors (Sensitivity 0.654, Specificity 0.713; AUC 0.742, F1-score 0.401) and prenatal, family history, and sociodemographic factors (Sensitivity 0.668, Specificity 0.704; AUC 0.745, F1-score 0.402). Across all 5 ML algorithms, family history factors (e.g., either parent had nervous breakdowns, trouble holding jobs/fights/police encounters, and counseling for mental issues) and sociodemographic covariates (e.g., maternal age, child's sex, caregiver income and caregiver education) tended to be better predictors of adolescent psychopathology. The most important prenatal predictors were unplanned pregnancy, birth complications, and pregnancy complications.

Conclusion

Our results suggest that inclusion of prenatal, family history, and sociodemographic factors in ML models can generate moderately accurate predictions of adolescent psychopathology. Issues associated with model overfitting, hyperparameter tuning, and system seed setting should be considered throughout model training, testing, and validation. Future early risk prediction models may improve with the inclusion of additional relevant covariates.

Ostojic D, Lalousis PA, Donohoe G, Morris DW. The challenges of using machine learning models in psychiatric research and clinical practice. Eur Neuropsychopharmacol 2024;88:53-65. [PMID: 39232341 DOI: 10.1016/j.euroneuro.2024.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 08/06/2024] [Accepted: 08/12/2024] [Indexed: 09/06/2024]

Hou YF, Zhang L, Zhang Q, Ge F, Dral PO. Physics-Informed Active Learning for Accelerating Quantum Chemical Simulations. J Chem Theory Comput 2024. [PMID: 39264419 DOI: 10.1021/acs.jctc.4c00821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2024]

Affiliation(s)

Yi-Fan Hou: State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
Lina Zhang: State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
Quanhao Zhang: State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
Fuchun Ge: State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
Pavlo O Dral: State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China; Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland

Kuo PF, Hsu WT, Lord D, Putra IGB. Classification of autonomous vehicle crash severity: Solving the problems of imbalanced datasets and small sample size. Accident; Analysis and Prevention 2024;205:107666. [PMID: 38901160 DOI: 10.1016/j.aap.2024.107666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 05/21/2024] [Accepted: 06/03/2024] [Indexed: 06/22/2024]

Abstract

Only a few researchers have shown how environmental factors and road features relate to Autonomous Vehicle (AV) crash severity levels, and none have focused on the data limitation problems, such as small sample sizes, imbalanced datasets, and high dimensional features. To address these problems, we analyzed an AV crash dataset (2019 to 2021) from the California Department of Motor Vehicles (CA DMV), which included 266 collision reports (51 of those causing injuries). We included external environmental variables by collecting various points of interest (POIs) and roadway features from OpenStreetMap (OSM) and Data San Francisco (SF). Random Over-Sampling Examples (ROSE) and the Synthetic Minority Over-Sampling Technique (SMOTE) methods were used to balance the dataset and increase the sample size. These two balancing methods were used to expand the dataset and solve the small sample size problem simultaneously. Mutual information, random forest, and XGBoost were utilized to address the high dimensional feature and the selection problem caused by including a variety of types of POIs as predictive variables. Because existing studies do not use consistent procedures, we compared the effectiveness of using the feature-selection preprocessing method as the first process to employing the data-balance technique as the first process. Our results showed that AV crash severity levels are related to vehicle manufacturers, vehicle damage level, collision type, vehicle movement, the parties involved in the crash, speed limit, and some types of POIs (areas near transportation, entertainment venues, public places, schools, and medical facilities). Both resampling methods and three data preprocessing methods improved model performance, and the model that used SMOTE and data-balancing first was the best. The results suggest that over-sampling and the feature selection method can improve model prediction performance and define new factors related to AV crash severity levels.

Bronstein MV, Kummerfeld E, MacDonald A, Vinogradov S. Identifying psychological predictors of SARS-CoV-2 vaccination: A machine learning study. Vaccine 2024;42:126198. [PMID: 39106578 DOI: 10.1016/j.vaccine.2024.126198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/09/2024]

Abstract

BACKGROUND

Major barriers to addressing SARS-CoV-2 vaccine hesitancy include limited knowledge of what causes delay/refusal of SARS-CoV-2 vaccination and limited ability to predict who will remain unvaccinated over significant time periods despite vaccine availability. The present study begins to address these barriers by developing a machine learning model that prospectively predicts who will persist in not vaccinating against SARS-CoV-2.

METHOD

Unvaccinated individuals (n = 325) who completed a baseline survey were followed over the six-month period when vaccines against SARS-CoV-2 were first widely available (April-October 2021). A random forest model was used to predict who would remain unvaccinated against SARS-CoV-2 from their baseline measures, including demographic information (e.g., age), medical history (e.g., of influenza vaccination), Health-Belief Model constructs (e.g., perceived vaccine dangerousness), conspiracist ideation, and task-based metrics of vulnerability to conspiracist ideation (e.g., tendency toward illusory pattern perception).

RESULTS

The resulting model significantly predicted vaccination status (AUC-PR = 0.77, 95% CI [0.56, 0.90]). At the optimal probability threshold determined by the Generalized Threshold Shifting Protocol, the model was moderately precise (0.83) when identifying individuals who remained unvaccinated (n = 80), and had a very low rate (0.04) of false-positives (incorrectly suggesting that individuals remained unvaccinated). Permutational importance tests suggested that baseline SARS-CoV-2 vaccine intentions conveyed the most information about future SARS-CoV-2 vaccination status. Conspiracist ideation was the second most informative predictor, suggesting that misinformation influences vaccination behavior. Other important predictors included perceived vaccine dangerousness, as expected under the Health Belief Model, and influenza vaccination history.

CONCLUSIONS

The model we developed can accurately and prospectively identify individuals who remain unvaccinated against SARS-CoV-2. It could therefore facilitate empirically-informed allocation of interventions that encourage vaccine uptake. The predictive value of conspiracist ideation, perceived vaccine dangerousness, and vaccine intentions in our model is consistent with potential causal relations between these variables and SARS-CoV-2 vaccine uptake.

Wang HE, Weiner JP, Saria S, Lehmann H, Kharrazi H. Assessing racial bias in healthcare predictive models: Practical lessons from an empirical evaluation of 30-day hospital readmission models. J Biomed Inform 2024;156:104683. [PMID: 38925281 DOI: 10.1016/j.jbi.2024.104683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]

Abstract

OBJECTIVE

Despite increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. Therefore, this study proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting and interpreting bias measures, determining disparity impact, and identifying potential mitigations.

METHODS

This retrospective analysis evaluated racial bias of four common models predicting 30-day unplanned readmission (i.e., LACE Index, HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models' risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias.

RESULTS

Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and generalized entropy index. Based on these measures, the HOSPITAL score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR while Black patients resulted in a higher FPR and zero-one-loss. As the models' risk threshold changed, trade-offs between models' fairness and overall performance were observed, and the assessment showed all models' default thresholds were reasonable for balancing accuracy and bias.
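
Two of the selected measures, FNR parity and FPR parity, compare error rates across groups. The Python sketch below shows one way such rates could be computed; the record format, group names, and toy predictions are hypothetical and are not the study's data or code.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false negative and false positive rates per group.

    records: iterable of (group, true_label, predicted_label), labels 0 or 1.
    Assumes every group has at least one positive and one negative case.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "fnr": c["fn"] / (c["fn"] + c["tp"]),
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
        }
    return rates


# Hypothetical toy predictions for two groups, A and B.
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 1), ("B", 0, 0),
]
rates = error_rates_by_group(records)

# Parity holds when the gaps are close to zero.
fnr_gap = rates["A"]["fnr"] - rates["B"]["fnr"]
fpr_gap = rates["A"]["fpr"] - rates["B"]["fpr"]
print(rates, fnr_gap, fpr_gap)
```

Because both rates depend on the predicted labels, recomputing them at several risk thresholds would expose the kind of fairness versus performance trade-off the abstract describes.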

CONCLUSIONS

This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as the example. It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. It is evident that a combination of qualitative and quantitative methods and a multidisciplinary team are necessary to identify, understand and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and with consideration of the larger operational, clinical, and policy context.

Wossnig L, Furtmann N, Buchanan A, Kumar S, Greiff V. Best practices for machine learning in antibody discovery and development. Drug Discov Today 2024;29:104025. [PMID: 38762089 DOI: 10.1016/j.drudis.2024.104025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/25/2024] [Accepted: 05/13/2024] [Indexed: 05/20/2024]

Liu J, Luo J, Chen X, Xie J, Wang C, Wang H, Yuan Q, Li S, Zhang Y, Hu J, Shi C. Opioid Nonadherence Risk Prediction of Patients with Cancer-Related Pain Based on Five Machine Learning Algorithms. Pain Res Manag 2024;2024:7347876. [PMID: 38872993 PMCID: PMC11175844 DOI: 10.1155/2024/7347876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 04/03/2024] [Accepted: 05/02/2024] [Indexed: 06/15/2024]

Affiliation(s)

Jinmei Liu: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Juan Luo: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Xu Chen: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Jiyi Xie: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Cong Wang: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Hanxiang Wang: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Qi Yuan: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Shijun Li: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Yu Zhang: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China
Jianli Hu: Cancer Center, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
Chen Shi: Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science & Technology (HUST), Wuhan, China; Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan 430022, China

De Abreu Ferreira R, Zhong S, Moureaud C, Le MT, Rothstein A, Li X, Wang L, Patwardhan M. A Pilot, Predictive Surveillance Model in Pharmacovigilance Using Machine Learning Approaches. Adv Ther 2024;41:2435-2445. [PMID: 38704799 PMCID: PMC11133112 DOI: 10.1007/s12325-024-02870-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024]

Mirzaee Moghaddam Kasmaee A, Ataei A, Moravvej SV, Alizadehsani R, Gorriz JM, Zhang YD, Tan RS, Acharya UR. ELRL-MD: a deep learning approach for myocarditis diagnosis using cardiac magnetic resonance images with ensemble and reinforcement learning integration. Physiol Meas 2024;45:055011. [PMID: 38697206 DOI: 10.1088/1361-6579/ad46e2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 05/02/2024] [Indexed: 05/04/2024]

Almazroi AA, Ayub N. Enhancing aspect-based multi-labeling with ensemble learning for ethical logistics. PLoS One 2024;19:e0295248. [PMID: 38771789 PMCID: PMC11108219 DOI: 10.1371/journal.pone.0295248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 11/20/2023] [Indexed: 05/23/2024] Open

Abstract

In the dynamic domain of logistics, effective communication is essential for streamlined operations. Our innovative solution, the Multi-Labeling Ensemble (MLEn), tackles the intricate task of extracting multi-labeled data, employing advanced techniques for accurate preprocessing of textual data through the NLTK toolkit. This approach is carefully tailored to the prevailing language used in logistics communication. MLEn utilizes innovative methods, including sentiment intensity analysis, Word2Vec, and Doc2Vec, ensuring comprehensive feature extraction. This proves particularly suitable for logistics in e-commerce, capturing nuanced communication essential for efficient operations. Ethical considerations are a cornerstone in logistics communication, and MLEn plays a pivotal role in detecting and categorizing inappropriate language, aligning inherently with ethical norms. Leveraging TF-IDF and VADER for feature enhancement, MLEn adeptly discerns and labels ethically sensitive content in logistics communication. Across diverse datasets, including Emotions, MLEn consistently achieves impressive accuracy levels ranging from 92% to 97%, establishing its superiority in the logistics context. Particularly, our proposed method, DenseNet-EHO, outperforms BERT by 8% and surpasses other techniques by 15-25% in efficiency. A comprehensive analysis, considering metrics such as precision, recall, F1-score, Ranking Loss, Jaccard Similarity, AUC-ROC, sensitivity, and time complexity, underscores DenseNet-EHO's efficiency, aligning with the practical demands within the logistics track. Our research significantly contributes to enhancing precision, diversity, and computational efficiency in aspect-based sentiment analysis within logistics.
By integrating cutting-edge preprocessing, sentiment intensity analysis, and vectorization, MLEn emerges as a robust framework for multi-label datasets, consistently outperforming conventional approaches and giving outstanding precision, accuracy, and efficiency in the logistics field.

Ansari M, White AD. Learning peptide properties with positive examples only. Digital Discovery 2024;3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]

Miley K, Bronstein MV, Ma S, Lee H, Green MF, Ventura J, Hooker CI, Nahum M, Vinogradov S. Trajectories and predictors of response to social cognition training in people with schizophrenia: A proof-of-concept machine learning study. Schizophr Res 2024;266:92-99. [PMID: 38387253 PMCID: PMC11005939 DOI: 10.1016/j.schres.2024.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 12/15/2023] [Accepted: 02/17/2024] [Indexed: 02/24/2024]

Zhou Q, Ye W, Yu X, Bao YJ. A pathway-based computational framework for identification of a new modal of multi-omics biomarkers and its application in esophageal cancer. Computer Methods and Programs in Biomedicine 2024;247:108077. [PMID: 38382307 DOI: 10.1016/j.cmpb.2024.108077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 01/14/2024] [Accepted: 02/10/2024] [Indexed: 02/23/2024]

Abstract

BACKGROUND

The pathway-based strategy has been recently proposed for identifying biomarkers with the advantages of higher biological interpretability and cross-data robustness than the conventional gene-based strategy. However, its utility in clinical applications has been limited due to the high computational complexity and ill-defined performance.

OBJECTIVE

The current study presents a machine learning-based computational framework using multi-omics data for identifying a new modal of biomarkers, called pathway-derived core biomarkers, which have the advantages of both gene-based and pathway-based biomarkers.

METHODS

Machine-learning methods and gene-pathway network were integrated to select the pathway-derived core biomarkers. Multiple machine-learning algorithms were used to construct and validate the diagnostic models of the biomarkers based on more than 1400 multi-omics clinical samples of esophageal squamous cell carcinoma (ESCC).

RESULTS

The results showed that the classifier models based on the new modal biomarkers achieved superior performance in the training datasets with an average AUC/accuracy of 0.98/0.95 and 0.89/0.81 for mRNAs and miRNA, respectively, higher than the currently known classifier models based on the conventional gene-based strategy and pathway-based strategy. In the testing cohorts, the AUC/accuracy increased by 6.1 %/7.3 % compared with the models based on the native gene-based biomarkers. The improved performance was further confirmed in independent validation cohorts. Specifically, the sensitivity/specificity increased by ∼3 % and the variance significantly decreased by ∼69 % compared with that of the native gene-based biomarkers. Importantly, the pathway-derived core biomarkers also recovered 45 % more previously reported biomarkers than the gene-based biomarkers and are more functionally relevant to the ESCC etiology (involved in 14 versus 7 pathways related to ESCC or other cancers), highlighting the cross-data robustness of this new modal of biomarkers via enhanced functional relevance.

CONCLUSIONS

The results demonstrated that the new modal of biomarkers not only has improved predictive performance and robustness but also exhibits higher functional interpretability, thus leading to potential application in cancer diagnosis.


Hang NT, Long NT, Duy ND, Chien NN, Van Phuong N. Towards safer and efficient formulations: Machine learning approaches to predict drug-excipient compatibility. Int J Pharm 2024;653:123884. [PMID: 38341049 DOI: 10.1016/j.ijpharm.2024.123884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 01/28/2024] [Accepted: 02/03/2024] [Indexed: 02/12/2024]

Akinola LK, Uzairu A, Shallangwa GA, Abechi SE, Umar AB. Identification of estrogen receptor agonists among hydroxylated polychlorinated biphenyls using classification-based quantitative structure-activity relationship models. Curr Res Toxicol 2024;6:100158. [PMID: 38435023 PMCID: PMC10907392 DOI: 10.1016/j.crtox.2024.100158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/22/2024] [Accepted: 02/22/2024] [Indexed: 03/05/2024] Open

Lee JH, Shin J, Min JH, Jeong WK, Kim H, Choi SY, Lee J, Hong S, Kim K. Preoperative prediction of early recurrence in resectable pancreatic cancer integrating clinical, radiologic, and CT radiomics features. Cancer Imaging 2024;24:6. [PMID: 38191489 PMCID: PMC10775464 DOI: 10.1186/s40644-024-00653-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 12/29/2023] [Indexed: 01/10/2024] Open

Affiliation(s)

Jeong Hyun Lee Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea
Jaeseung Shin Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea
Ji Hye Min Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea
Woo Kyoung Jeong Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea
Honsoul Kim Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea
Seo-Youn Choi Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea Department of Radiology, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon, Republic of Korea
Jisun Lee Department of Radiology, College of Medicine, Chungbuk National University, Chungbuk National University Hospital, Cheongju, Republic of Korea
Sungjun Hong Department of Digital Health, Samsung Advanced Institute of Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
Kyunga Kim Department of Digital Health, Samsung Advanced Institute of Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea Biomedical Statistics Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea


van Heerden A, Turon G, Duran-Frigola M, Pillay N, Birkholtz LM. Machine Learning Approaches Identify Chemical Features for Stage-Specific Antimalarial Compounds. ACS OMEGA 2023;8:43813-43826. [PMID: 38027377 PMCID: PMC10666252 DOI: 10.1021/acsomega.3c05664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/18/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023]

Abstract

Efficacy data from diverse chemical libraries, screened against the various stages of the malaria parasite Plasmodium falciparum, including asexual blood stage (ABS) parasites and transmissible gametocytes, serve as a valuable reservoir of information on the chemical space of compounds that are either active (or not) against the parasite. We postulated that this data can be mined to define chemical features associated with the sole ABS activity and/or those that provide additional life cycle activity profiles like gametocytocidal activity. Additionally, this information could provide chemical features associated with inactive compounds, which could eliminate any future unnecessary screening of similar chemical analogs. Therefore, we aimed to use machine learning to identify the chemical space associated with stage-specific antimalarial activity. We collected data from various chemical libraries that were screened against the asexual (126 374 compounds) and sexual (gametocyte) stages of the parasite (93 941 compounds), calculated the compounds' molecular fingerprints, and trained machine learning models to recognize stage-specific active and inactive compounds. We were able to build several models that predict compound activity against ABS and dual activity against ABS and gametocytes, with Support Vector Machines (SVM) showing superior abilities with high recall (90 and 66%) and low false-positive predictions (15 and 1%). This allowed the identification of chemical features enriched in active and inactive populations, an important outcome that could be mined for essential chemical features to streamline hit-to-lead optimization strategies of antimalarial candidates. The predictive capabilities of the models held true in diverse chemical spaces, indicating that the ML models are therefore robust and can serve as a prioritization tool to drive and guide phenotypic screening and medicinal chemistry programs.


Handa K, Thomas MC, Kageyama M, Iijima T, Bender A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J Cheminform 2023;15:112. [PMID: 37990215 PMCID: PMC10664602 DOI: 10.1186/s13321-023-00781-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/06/2023] [Accepted: 11/10/2023] [Indexed: 11/23/2023] Open

Abstract

While a multitude of deep generative models have recently emerged there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand prospective validation is expensive and then often biased by the human selection process. In this case study, we frame retrospective validation as the ability to mimic human drug design, by answering the following question: Can a generative model trained on early-stage project compounds generate middle/late-stage compounds de novo? To this end, we used experimental data that contains the elapsed time of a synthetic expansion following hit identification from five public (where the time series was pre-processed to better reflect realistic synthetic expansions) and six in-house project datasets, and used REINVENT as a widely adopted RNN-based generative model. After splitting the dataset and training REINVENT on early-stage compounds, we found that rediscovery of middle/late-stage compounds was much higher in public projects (at 1.60%, 0.64%, and 0.21% of the top 100, 500, and 5000 scored generated compounds) than in in-house projects (where the values were 0.00%, 0.03%, and 0.04%, respectively). Similarly, average single nearest neighbour similarity between early- and middle/late-stage compounds in public projects was higher between active compounds than inactive compounds; however, for in-house projects the converse was true, which makes rediscovery (if so desired) more difficult. We hence show that the generative model recovers very few middle/late-stage compounds from real-world drug discovery projects, highlighting the fundamental difference between purely algorithmic design and drug discovery as a real-world process. 
Evaluating de novo compound design approaches appears, based on the current study, difficult or even impossible to do retrospectively. Scientific Contribution: This contribution hence illustrates aspects of evaluating the performance of generative models in a real-world setting which have not been extensively described previously and which hopefully contribute to their further future development.


Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023;13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]

Affiliation(s)

Petr Kouba Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
Pavel Kohout Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Faraneh Haddadi Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Anton Bushuiev Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Raman Samusevich Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
Jiri Sedlar Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Jiri Damborsky Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
Tomas Pluskal Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
Josef Sivic Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
Stanislav Mazurenko Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic


Ma K, He S, Sinha G, Ebadi A, Florea A, Tremblay S, Wong A, Xi P. Towards Building a Trustworthy Deep Learning Framework for Medical Image Analysis. SENSORS (BASEL, SWITZERLAND) 2023;23:8122. [PMID: 37836952 PMCID: PMC10574977 DOI: 10.3390/s23198122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/05/2023] [Accepted: 08/30/2023] [Indexed: 10/15/2023]

Ma C, Wolfinger R. A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing. Brief Bioinform 2023;24:bbad399. [PMID: 37985456 DOI: 10.1093/bib/bbad399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 09/26/2023] [Accepted: 10/17/2023] [Indexed: 11/22/2023] Open

Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023;15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open

Smajić A, Rami I, Sosnin S, Ecker GF. Identifying Differences in the Performance of Machine Learning Models for Off-Targets Trained on Publicly Available and Proprietary Data Sets. Chem Res Toxicol 2023;36:1300-1312. [PMID: 37439496 PMCID: PMC10445286 DOI: 10.1021/acs.chemrestox.3c00042] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/13/2023] [Indexed: 07/14/2023]

Lanini J, Santarossa G, Sirockin F, Lewis R, Fechner N, Misztela H, Lewis S, Maziarz K, Stanley M, Segler M, Stiefl N, Schneider N. PREFER: A New Predictive Modeling Framework for Molecular Discovery. J Chem Inf Model 2023;63:4497-4504. [PMID: 37487018 DOI: 10.1021/acs.jcim.3c00523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023]

Ansari M, White AD. Learning Peptide Properties with Positive Examples Only. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.01.543289. [PMID: 37333233 PMCID: PMC10274696 DOI: 10.1101/2023.06.01.543289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]

Wang H, Zhu G, Izu LT, Chen-Izu Y, Ono N, Altaf-Ul-Amin MD, Kanaya S, Huang M. On QSAR-based cardiotoxicity modeling with the expressiveness-enhanced graph learning model and dual-threshold scheme. Front Physiol 2023;14:1156286. [PMID: 37228825 PMCID: PMC10203956 DOI: 10.3389/fphys.2023.1156286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 04/05/2023] [Indexed: 05/27/2023] Open

Almukadi H, Jadkarim GA, Mohammed A, Almansouri M, Sultana N, Shaik NA, Banaganapalli B. Combining machine learning and structure-based approaches to develop oncogene PIM kinase inhibitors. Front Chem 2023;11:1137444. [PMID: 36970406 PMCID: PMC10036574 DOI: 10.3389/fchem.2023.1137444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 02/09/2023] [Indexed: 03/12/2023] Open

Behnoush AH, Khalaji A, Rezaee M, Momtahen S, Mansourian S, Bagheri J, Masoudkabir F, Hosseini K. Machine learning-based prediction of 1-year mortality in hypertensive patients undergoing coronary revascularization surgery. Clin Cardiol 2023;46:269-278. [PMID: 36588391 PMCID: PMC10018097 DOI: 10.1002/clc.23963] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/17/2022] [Revised: 12/12/2022] [Accepted: 12/19/2022] [Indexed: 01/03/2023] Open

Affiliation(s)

Amir Hossein Behnoush Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran School of Medicine, Tehran University of Medical Sciences, Tehran, Iran Non‐Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
Amirmohammad Khalaji Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran School of Medicine, Tehran University of Medical Sciences, Tehran, Iran Non‐Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
Malihe Rezaee Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Non‐Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Shahram Momtahen Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
Soheil Mansourian Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
Jamshid Bagheri Department of Surgery, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
Farzad Masoudkabir Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran
Kaveh Hosseini Tehran Heart Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research Institute, Tehran University of Medical Sciences, Tehran, Iran


Andrade KM, Silva BPM, de Oliveira LR, Cury PR. Automatic dental biofilm detection based on deep learning. J Clin Periodontol 2023;50:571-581. [PMID: 36635042 DOI: 10.1111/jcpe.13774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 12/06/2022] [Accepted: 01/09/2023] [Indexed: 01/14/2023]

Leventi-Peetz AM, Weber K. Probabilistic machine learning for breast cancer classification. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:624-655. [PMID: 36650782 DOI: 10.3934/mbe.2023029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]

Cheminformatics analysis of chemicals that increase estrogen and progesterone synthesis for a breast cancer hazard assessment. Sci Rep 2022;12:20647. [PMID: 36450809 PMCID: PMC9712655 DOI: 10.1038/s41598-022-24889-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/10/2022] [Accepted: 11/22/2022] [Indexed: 12/03/2022] Open

Abstract

Factors that increase estrogen or progesterone (P4) action are well-established as increasing breast cancer risk, and many first-line treatments to prevent breast cancer recurrence work by blocking estrogen synthesis or action. In previous work, using data from an in vitro steroidogenesis assay developed for the U.S. Environmental Protection Agency (EPA) ToxCast program, we identified 182 chemicals that increased estradiol (E2up) and 185 that increased progesterone (P4up) in human H295R adrenocortical carcinoma cells, an OECD validated assay for steroidogenesis. Chemicals known to induce mammary effects in vivo were very likely to increase E2 or P4 synthesis, further supporting the importance of these pathways for breast cancer. To identify additional chemical exposures that may increase breast cancer risk through E2 or P4 steroidogenesis, we developed a cheminformatics approach to identify structural features associated with these activities and to predict other E2 or P4 steroidogens from their chemical structures. First, we used molecular descriptors and physicochemical properties to cluster the 2,012 chemicals screened in the steroidogenesis assay using a self-organizing map (SOM). Structural features such as triazine, phenol, or more broadly benzene ramified with halide, amine or alcohol, are enriched for E2 or P4up chemicals. Among E2up chemicals, phenol and benzenone are found as significant substructures, along with nitrogen-containing biphenyls. For P4up chemicals, phenol and complex aromatic systems ramified with oxygen-based groups such as flavone or phenolphthalein are significant substructures. Chemicals that are active for both E2up and P4up are enriched with substructures such as dihydroxy phosphanedithione or are small chemicals that contain one benzene ramified with chlorine, alcohol, methyl or primary amine. These results are confirmed with a chemotype ToxPrint analysis. 
Then, we used machine learning and artificial intelligence algorithms to develop and validate predictive classification QSAR models for E2up and P4up chemicals. These models gave reasonable external prediction performances (balanced accuracy ~ 0.8 and Matthews correlation coefficient ~ 0.5) on an external validation. The QSAR models were enriched by adding a confidence score that considers the chemical applicability domain and a ToxPrint assessment of the chemical. This profiling and these models may be useful to direct future testing and risk assessments for chemicals related to breast cancer and other hormonally-mediated outcomes.


Boldini D, Friedrich L, Kuhn D, Sieber SA. Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions. J Cheminform 2022;14:80. [DOI: 10.1186/s13321-022-00657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/30/2022] [Indexed: 11/12/2022] Open

Thomas M, O’Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 2022;14:68. [PMID: 36192789 PMCID: PMC9531503 DOI: 10.1186/s13321-022-00646-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022] Open

Abstract

A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (version 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~ 1.5-fold and sample-efficiency is improved ~ 45-fold compared to REINVENT while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to overcome observed failure modes that take advantage of certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioning via reinforcement learning, compared to the current state-of-the-art. 
This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.


Machine learning algorithms identify demographics, dietary features, and blood biomarkers associated with stroke records. J Neurol Sci 2022;440:120335. [PMID: 35863116 DOI: 10.1016/j.jns.2022.120335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 05/26/2022] [Accepted: 07/05/2022] [Indexed: 11/22/2022]

Abstract

OBJECTIVE

We conducted a comprehensive evaluation of features associated with stroke records.

METHODS

We screened the dietary nutrients, blood biomarkers, and clinical information from the National Health and Nutrition Examination Survey (NHANES) 2015-16 database to assess a self-reported history of all strokes (136 strokes, n = 4381). We computed feature importance, built machine learning (ML) models, developed a nomogram, and validated the nomogram on NHANES 2007-08, 2017-18, and the baseline UK Biobank. We calculated the odds ratios with/without adjusting sampling weights (OR/OR_w).

RESULTS

The clinical features have the best predictive power compared to dietary nutrients and blood biomarkers, with 22.8% increased average area under the receiver operating characteristic curves (AUROC) in ML models. We further modeled with ten most important clinical features without compromising the predictive performance. The key features positively associated with stroke include age, cigarette smoking, tobacco smoking, Caucasian or African American race, hypertension, diabetes mellitus, asthma history; the negatively associated feature is the family income. The nomogram based on these key features achieved good performances (AUROC between 0.753 and 0.822) on the test set, the NHANES 2007-08, 2017-18, and the UK Biobank. Key features from the nomogram model include age (OR = 1.05, OR_w = 1.06), Caucasian/African American (OR = 2.68, OR_w = 2.67), diabetes mellitus (OR = 2.30, OR_w = 1.99), asthma (OR = 2.10, OR_w = 2.41), hypertension (OR = 1.86, OR_w = 2.10), and income (OR = 0.83, OR_w = 0.81).

CONCLUSIONS

We identified clinical key features and built predictive models for assessing stroke records with high performance. A nomogram consisting of questionnaire-based variables would help identify stroke survivors and evaluate the potential risk of stroke.


Gaber A, Taher MF, Wahed MA, Shalaby NM, Gaber S. Classification of facial paralysis based on machine learning techniques. Biomed Eng Online 2022;21:65. [PMID: 36071434 PMCID: PMC9449956 DOI: 10.1186/s12938-022-01036-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Accepted: 08/24/2022] [Indexed: 11/11/2022] Open

Alsaui AA, Alghofaili YA, Alghadeer M, Alharbi FH. Resampling Techniques for Materials Informatics: Limitations in Crystal Point Groups Classification. J Chem Inf Model 2022;62:3514-3523. [DOI: 10.1021/acs.jcim.2c00666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Walter M, Allen LN, de la Vega de León A, Webb SJ, Gillet VJ. Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction. J Cheminform 2022;14:32. [PMID: 35672779 PMCID: PMC9172131 DOI: 10.1186/s13321-022-00611-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/01/2022] [Accepted: 05/12/2022] [Indexed: 11/21/2022] Open

Xu M, Yang H, Liu G, Tang Y, Li W. In Silico Prediction of Chemical Aquatic Toxicity by Multiple Machine Learning and Deep Learning Approaches. J Appl Toxicol 2022;42:1766-1776. [PMID: 35653511 DOI: 10.1002/jat.4354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 05/16/2022] [Accepted: 05/31/2022] [Indexed: 11/08/2022]

H Attia M, H Attia M, Tarek Farghaly Y, Ahmed El-Sayed Abulnoor B, Curate F. Performance of the supervised learning algorithms in sex estimation of the proximal femur: A comparative study in contemporary Egyptian and Turkish samples. Sci Justice 2022;62:288-309. [PMID: 35598923 DOI: 10.1016/j.scijus.2022.03.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/26/2021] [Revised: 12/27/2021] [Accepted: 03/06/2022] [Indexed: 10/18/2022]

García-Cebollada H, López A, Sancho J. Protposer: the web server that readily proposes protein stabilizing mutations with high PPV. Comput Struct Biotechnol J 2022;20:2415-2433. [PMID: 35664235 PMCID: PMC9133766 DOI: 10.1016/j.csbj.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/16/2022] [Revised: 05/05/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023]

Deep learning model calibration for improving performance in class-imbalanced medical image classification tasks. PLoS One 2022;17:e0262838. [PMID: 35085334 PMCID: PMC8794113 DOI: 10.1371/journal.pone.0262838] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/11/2021] [Accepted: 01/05/2022] [Indexed: 11/19/2022]

Bhat HS, Reeves ME, Goldman‐Mellor S. Equity‐Weighted Bootstrapping: Examples and Analysis. Stat (Int Stat Inst) 2022. [DOI: 10.1002/sta4.456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Jeong W, Gaggioli CA, Gagliardi L. Active Learning Configuration Interaction for Excited-State Calculations of Polycyclic Aromatic Hydrocarbons. J Chem Theory Comput 2021;17:7518-7530. [PMID: 34787422 PMCID: PMC8675132 DOI: 10.1021/acs.jctc.1c00769] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/31/2021] [Indexed: 11/30/2022]

Patrick Walters W. Comparing classification models: a practical tutorial. J Comput Aided Mol Des 2021;36:381-389. [PMID: 34549368 DOI: 10.1007/s10822-021-00417-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/27/2021] [Accepted: 08/18/2021] [Indexed: 01/17/2023]

Amendola G, Cosconati S. PyRMD: A New Fully Automated AI-Powered Ligand-Based Virtual Screening Tool. J Chem Inf Model 2021;61:3835-3845. [PMID: 34270903 DOI: 10.1021/acs.jcim.1c00653] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/28/2022]